Re: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS
On Sat, Sep 22, 2007 at 12:32:18AM +0200, Andi Kleen wrote: > > Also allow to set svm lock. > > TBD double check, documentation, i386 support > > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> Could we have this patch tagged with "x86" instead of "Experimental" in the subject? Sam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources
On 9/21/07, Yinghai Lu <[EMAIL PROTECTED]> wrote: > On 9/21/07, Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > From: Robert Hancock <[EMAIL PROTECTED]> > > > > This path adds validation of the MMCONFIG table against the ACPI reserved > > motherboard resources. If the MMCONFIG table is found to be reserved in > > ACPI, we don't bother checking the E820 table. The PCI Express firmware > > spec apparently tells BIOS developers that reservation in ACPI is required > > and E820 reservation is optional, so checking against ACPI first makes > > sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though > > it is perfectly functional, the existing check needlessly disables MMCONFIG > > in these cases. > > > > In order to do this, MMCONFIG setup has been split into two phases. If PCI > > configuration type 1 is not available then MMCONFIG is enabled early as > > before. Otherwise, it is enabled later after the ACPI interpreter is > > enabled, since we need to be able to execute control methods in order to > > check the ACPI reserved resources. Presently this is just triggered off > > the end of ACPI interpreter initialization. > > > > There are a few other behavioral changes here: > > > > - Validate all MMCONFIG configurations provided, not just the first one. > > > > - Validate the entire required length of each configuration according to > > the provided ending bus number is reserved, not just the minimum required > > allocation. > > > > - Validate that the area is reserved even if we read it from the chipset > > directly and not from the MCFG table. This catches the case where the > > BIOS didn't set the location properly in the chipset and has mapped it > > over other things it shouldn't have. > > > > This also cleans up the MMCONFIG initialization functions so that they > > simply do nothing if MMCONFIG is not compiled in. > > > > Based on an original patch by Rajesh Shah from Intel. 
> > > > [EMAIL PROTECTED]: many fixes and cleanups] > > Signed-off-by: Robert Hancock <[EMAIL PROTECTED]> > > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> > > Cc: Rajesh Shah <[EMAIL PROTECTED]> > > Cc: Jesse Barnes <[EMAIL PROTECTED]> > > Acked-by: Linus Torvalds <[EMAIL PROTECTED]> > > Cc: Andi Kleen <[EMAIL PROTECTED]> > > Cc: Greg KH <[EMAIL PROTECTED]> > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Also, the title is misleading: it should say x86 rather than i386, because the change affects x86_64 too. YH
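The changelog says each MMCONFIG configuration is now checked over "the entire required length ... according to the provided ending bus number". The sketch below shows that arithmetic. It assumes the standard PCI Express ECAM layout, where every bus number decodes 1 MiB of configuration space (32 devices × 8 functions × 4 KiB). It also assumes, as Linux does, that the MCFG base address corresponds to bus 0. The containment test mirrors the inclusive-end comparison in the quoted check_mcfg_resource(). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one MCFG allocation entry. */
struct mcfg_entry {
	uint64_t base;      /* ECAM base; assumed to correspond to bus 0 */
	uint8_t start_bus;
	uint8_t end_bus;
};

/* Each bus decodes 1 MiB of ECAM space: 32 devices * 8 functions * 4 KiB. */
#define ECAM_BUS_SHIFT 20

/* Compute the inclusive [start, end] range covering start_bus..end_bus. */
static void mcfg_required_range(const struct mcfg_entry *e,
				uint64_t *start, uint64_t *end)
{
	uint64_t nbus = (uint64_t)e->end_bus - e->start_bus + 1;

	*start = e->base + ((uint64_t)e->start_bus << ECAM_BUS_SHIFT);
	*end = *start + (nbus << ECAM_BUS_SHIFT) - 1;
}

/* Same containment test as check_mcfg_resource(): the whole required
 * range must sit inside the reserved window [res_base, res_base + res_len). */
static bool mcfg_range_reserved(const struct mcfg_entry *e,
				uint64_t res_base, uint64_t res_len)
{
	uint64_t start, end;

	mcfg_required_range(e, &start, &end);
	return start >= res_base && end < res_base + res_len;
}
```

With buses 0-255 the required length is 256 MiB. A reservation that covers only the first 1 MiB (the "minimum required allocation" case) therefore fails the check.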
Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu
On Fri, Sep 21, 2007 at 06:45:39PM -0400, Dave Jones wrote: > On Sat, Sep 22, 2007 at 12:32:02AM +0200, Andi Kleen wrote: > > > > +Select this for: > > + Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename: > > + -Willamette > > + -Northwood > > + -Mobile Pentium 4 > > + -Mobile Pentium 4 M > > + -Extreme Edition (Gallatin) > > + -Prescott > > + -Prescott 2M > > + -Cedar Mill > > + -Presler > > + -Smithfiled > > + Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename: > > + -Foster > > + -Prestonia > > + -Gallatin > > + -Nocona > > + -Irwindale > > + -Cranford > > + -Potomac > > + -Paxville > > + -Dempsey > > This seems like yet another list that will need to be perpetually > kept up to date, and given 99% of users don't know the codename > of their core, just the marketing name, I question its value. As a bare minimum, the list presented here should use the same names as /proc/cpuinfo. On this box I read: vendor_id : GenuineIntel model name : Pentium III (Coppermine) This info must be present in the Kconfig help text too. I have always had trouble selecting the right CPU, so I welcome this patch, which gives me more info (and maybe a bit too much). Sam
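Sam's request is that the help text use the names users actually see in /proc/cpuinfo. The sketch below shows where that name comes from by pulling the "model name" value out of cpuinfo-style text. It assumes the usual "key<tabs>: value" line format. The helper names are hypothetical and not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Copy the value of the first "model name" line of /proc/cpuinfo-style
 * text into out (always NUL-terminated, truncated if needed).
 * Returns 0 on success, -1 if no such line exists. */
static int cpuinfo_model_name(const char *text, char *out, size_t outlen)
{
	const char *line = text;

	if (outlen == 0)
		return -1;
	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (strncmp(line, "model name", 10) == 0) {
			const char *val = memchr(line, ':', len);
			size_t vlen;

			if (!val)
				return -1;
			val++;	/* skip ':' and the padding after it */
			while (*val == ' ' || *val == '\t')
				val++;
			vlen = len - (size_t)(val - line);
			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, val, vlen);
			out[vlen] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}

/* Read /proc/cpuinfo and extract the model name of the first CPU listed. */
static int read_model_name(char *out, size_t outlen)
{
	static char buf[65536];
	FILE *f = fopen("/proc/cpuinfo", "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return cpuinfo_model_name(buf, out, outlen);
}
```

On Sam's box this would yield "Pentium III (Coppermine)". That marketing-style string is what a user could match against the Kconfig help text.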
Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig
On Thu, 20 Sep 2007, Satyam Sharma wrote: > > BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ... > IIRC I got build failures in: > drivers/net/spider_net.c [PATCH -mm] spider_net: Misc build fixes after recent netdev stats changes Unbreak the following: drivers/net/spider_net.c: In function 'spider_net_release_tx_chain': drivers/net/spider_net.c:818: error: 'dev' undeclared (first use in this function) drivers/net/spider_net.c:818: error: (Each undeclared identifier is reported only once drivers/net/spider_net.c:818: error: for each function it appears in.) drivers/net/spider_net.c: In function 'spider_net_xmit': drivers/net/spider_net.c:922: error: 'dev' undeclared (first use in this function) drivers/net/spider_net.c: In function 'spider_net_pass_skb_up': drivers/net/spider_net.c:1018: error: 'dev' undeclared (first use in this function) drivers/net/spider_net.c: In function 'spider_net_decode_one_descr': drivers/net/spider_net.c:1215: error: 'dev' undeclared (first use in this function) make[2]: *** [drivers/net/spider_net.o] Error 1 Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]> --- drivers/net/spider_net.c | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff -ruNp a/drivers/net/spider_net.c b/drivers/net/spider_net.c --- a/drivers/net/spider_net.c 2007-09-22 06:26:39.0 +0530 +++ b/drivers/net/spider_net.c 2007-09-22 12:12:23.0 +0530 @@ -795,6 +795,7 @@ spider_net_set_low_watermark(struct spid static int spider_net_release_tx_chain(struct spider_net_card *card, int brutal) { + struct net_device *dev = card->netdev; struct spider_net_descr_chain *chain = &card->tx_chain; struct spider_net_descr *descr; struct spider_net_hw_descr *hwdescr; @@ -919,7 +920,7 @@ spider_net_xmit(struct sk_buff *skb, str spider_net_release_tx_chain(card, 0); if (spider_net_prepare_tx_descr(card, skb) != 0) { - dev->stats.tx_dropped++; + netdev->stats.tx_dropped++; netif_stop_queue(netdev); return NETDEV_TX_BUSY; } @@ -979,16 +980,12 @@ static void 
spider_net_pass_skb_up(struct spider_net_descr *descr, struct spider_net_card *card) { - struct spider_net_hw_descr *hwdescr= descr->hwdescr; - struct sk_buff *skb; - struct net_device *netdev; - u32 data_status, data_error; - - data_status = hwdescr->data_status; - data_error = hwdescr->data_error; - netdev = card->netdev; + struct spider_net_hw_descr *hwdescr = descr->hwdescr; + struct sk_buff *skb = descr->skb; + struct net_device *netdev = card->netdev; + u32 data_status = hwdescr->data_status; + u32 data_error = hwdescr->data_error; - skb = descr->skb; skb_put(skb, hwdescr->valid_size); /* the card seems to add 2 bytes of junk in front @@ -1015,8 +1012,8 @@ spider_net_pass_skb_up(struct spider_net } /* update netdevice statistics */ - dev->stats.rx_packets++; - dev->stats.rx_bytes += skb->len; + netdev->stats.rx_packets++; + netdev->stats.rx_bytes += skb->len; /* pass skb up to stack */ netif_receive_skb(skb); @@ -1184,6 +1181,7 @@ static int spider_net_resync_tail_ptr(st static int spider_net_decode_one_descr(struct spider_net_card *card) { + struct net_device *dev = card->netdev; struct spider_net_descr_chain *chain = &card->rx_chain; struct spider_net_descr *descr = chain->tail; struct spider_net_hw_descr *hwdescr = descr->hwdescr; @@ -1210,7 +1208,7 @@ spider_net_decode_one_descr(struct spide (status == SPIDER_NET_DESCR_PROTECTION_ERROR) || (status == SPIDER_NET_DESCR_FORCE_END) ) { if (netif_msg_rx_err(card)) - dev_err(&card->netdev->dev, + dev_err(&dev->dev, "dropping RX descriptor with state %d\n", status); dev->stats.rx_dropped++; goto bad_desc;
Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources
On 9/21/07, Andi Kleen <[EMAIL PROTECTED]> wrote: > > From: Robert Hancock <[EMAIL PROTECTED]> > > This path adds validation of the MMCONFIG table against the ACPI reserved > motherboard resources. If the MMCONFIG table is found to be reserved in > ACPI, we don't bother checking the E820 table. The PCI Express firmware > spec apparently tells BIOS developers that reservation in ACPI is required > and E820 reservation is optional, so checking against ACPI first makes > sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though > it is perfectly functional, the existing check needlessly disables MMCONFIG > in these cases. > > In order to do this, MMCONFIG setup has been split into two phases. If PCI > configuration type 1 is not available then MMCONFIG is enabled early as > before. Otherwise, it is enabled later after the ACPI interpreter is > enabled, since we need to be able to execute control methods in order to > check the ACPI reserved resources. Presently this is just triggered off > the end of ACPI interpreter initialization. > > There are a few other behavioral changes here: > > - Validate all MMCONFIG configurations provided, not just the first one. > > - Validate the entire required length of each configuration according to > the provided ending bus number is reserved, not just the minimum required > allocation. > > - Validate that the area is reserved even if we read it from the chipset > directly and not from the MCFG table. This catches the case where the > BIOS didn't set the location properly in the chipset and has mapped it > over other things it shouldn't have. > > This also cleans up the MMCONFIG initialization functions so that they > simply do nothing if MMCONFIG is not compiled in. > > Based on an original patch by Rajesh Shah from Intel. 
> > [EMAIL PROTECTED]: many fixes and cleanups] > Signed-off-by: Robert Hancock <[EMAIL PROTECTED]> > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> > Cc: Rajesh Shah <[EMAIL PROTECTED]> > Cc: Jesse Barnes <[EMAIL PROTECTED]> > Acked-by: Linus Torvalds <[EMAIL PROTECTED]> > Cc: Andi Kleen <[EMAIL PROTECTED]> > Cc: Greg KH <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> > --- > > arch/i386/pci/init.c|4 - > arch/i386/pci/mmconfig-shared.c | 151 > +++- > arch/i386/pci/pci.h |1 > drivers/acpi/bus.c |2 > include/linux/pci.h |8 ++ > 5 files changed, 144 insertions(+), 22 deletions(-) > > Index: linux/arch/i386/pci/init.c > === > --- linux.orig/arch/i386/pci/init.c > +++ linux/arch/i386/pci/init.c > @@ -11,9 +11,7 @@ static __init int pci_access_init(void) > #ifdef CONFIG_PCI_DIRECT > type = pci_direct_probe(); > #endif > -#ifdef CONFIG_PCI_MMCONFIG > - pci_mmcfg_init(type); > -#endif > + pci_mmcfg_early_init(type); > if (raw_pci_ops) > return 0; > #ifdef CONFIG_PCI_BIOS > Index: linux/arch/i386/pci/mmconfig-shared.c > === > --- linux.orig/arch/i386/pci/mmconfig-shared.c > +++ linux/arch/i386/pci/mmconfig-shared.c > @@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso > pci_mmcfg_resources_inserted = 1; > } > > -static void __init pci_mmcfg_reject_broken(int type) > +static acpi_status __init check_mcfg_resource(struct acpi_resource *res, > + void *data) > +{ > + struct resource *mcfg_res = data; > + struct acpi_resource_address64 address; > + acpi_status status; > + > + if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) { > + struct acpi_resource_fixed_memory32 *fixmem32 = > + &res->data.fixed_memory32; > + if (!fixmem32) > + return AE_OK; > + if ((mcfg_res->start >= fixmem32->address) && > + (mcfg_res->end < (fixmem32->address + > + fixmem32->address_length))) { > + mcfg_res->flags = 1; > + return AE_CTRL_TERMINATE; > + } > + } > + if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) && > + (res->type != ACPI_RESOURCE_TYPE_ADDRESS64)) > + 
return AE_OK; > + > + status = acpi_resource_to_address64(res, &address); > + if (ACPI_FAILURE(status) || > + (address.address_length <= 0) || > + (address.resource_type != ACPI_MEMORY_RANGE)) > + return AE_OK; > + > + if ((mcfg_res->start >= address.minimum) && > + (mcfg_res->end < (address.minimum + address.address_length))) { > + mcfg_res->flags = 1; > + return AE_CTRL_TERMINATE; > + } > + return AE_OK; > +} > + > +static acpi_status __init find_mboard_resource(acpi_handle handle, u
Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig
On Thu, 20 Sep 2007, Satyam Sharma wrote: > > BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ... > IIRC I got build failures in: > drivers/md/raid6int8.c This turned out to be a gcc bug -- I was using an old cross-compiler.
Re: Distributed storage. Security attributes and documentation update.
Hi! > I'm pleased to announce third release of the distributed storage > subsystem, which allows to form a storage on top of remote and local > nodes, which in turn can be exported to another storage as a node to > form tree-like storages. How is this different from raid0/1 over nbd? Or raid0/1 over ata-over-ethernet? > +| DST storate ---| storage? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig
On Thu, 20 Sep 2007, Satyam Sharma wrote: > > BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ... > IIRC I got build failures in: > drivers/ata/pata_scc.c http://lkml.org/lkml/2007/9/21/557
Re: [patch 7/7] Add documentation for extended crashkernel syntax
Hi! > This adds the documentation for the extended crashkernel syntax into > Documentation/kdump/kdump.txt. Should you also update kernel-parameters.txt? > +For example: > + > +crashkernel=512M-2G:64M,2G-:128M > + > +This would mean: > + > +1) if the RAM is smaller than 512M, then don't reserve anything > + (this is the "rescue" case) > +2) if the RAM size is between 512M and 2G, then reserve 64M > +3) if the RAM size is larger than 2G, then reserve 128M Why is this useful? I mean... if 64M is enough to save a dump, why use 128M? ...or does the required size somehow scale with the amount of memory in the machine? (pagetables?) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
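The quoted example maps RAM size to a reservation through a list of "start-[end]:size" ranges. The sketch below shows how such a string could be evaluated. It assumes each range is half-open [start, end), with an omitted end meaning "no upper bound", and that K/M/G are binary multiples as in the kernel's memparse(). The kernel's own parser may differ in detail, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix (binary multiples),
 * in the style of the kernel's memparse(). */
static uint64_t parse_size(const char *s, const char **end)
{
	char *p;
	uint64_t v = strtoull(s, &p, 0);

	switch (*p) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		p++;
		break;
	default:
		break;
	}
	*end = p;
	return v;
}

/* Return how many bytes "start-[end]:size[,...]" reserves for a machine
 * with ram bytes of RAM, or 0 if no range matches (or the string is
 * malformed). Each range is treated as [start, end). */
static uint64_t crashkernel_size(const char *spec, uint64_t ram)
{
	const char *p = spec;

	while (*p) {
		uint64_t start, end = UINT64_MAX, size;

		start = parse_size(p, &p);
		if (*p++ != '-')
			return 0;
		if (*p != ':')
			end = parse_size(p, &p);
		if (*p++ != ':')
			return 0;
		size = parse_size(p, &p);
		if (ram >= start && ram < end)
			return size;
		if (*p != ',')
			break;
		p++;
	}
	return 0;
}
```

For "512M-2G:64M,2G-:128M" this gives 0 for 256M of RAM, 64M for 1G, and 128M for exactly 2G or more. That matches cases 1) to 3) above under the half-open assumption.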
Re: clockevents: fix resume logic
Hi! > > Ok, here we are. The bad one uses C2 which stops the local apic on the > > VAIO. I suspect we end up in the suspend/resume with going into C2 > > without the broadcast active. > > > > Can you try to get the output of SysRq-Q during the "it needs help from > > keyboard" period ? > > > > That's a bit tricky because hitting the keyboard is what unsticks things. > And the video is black after resume-from-RAM (has always been thus) and we Ok, can we try to fix the video issue for you? That should make the development easier... I assume you tried s2ram from suspend.sf.net, and no combination of switches helped? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [NFS] NFS on loopback locks up entire system(2.6.23-rc6)?
On 9/21/07, Trond Myklebust <[EMAIL PROTECTED]> wrote: > No. The requirement for 'hard' mounts is not that the server be up all > the time. The server can go up and down as it pleases: the client can > happily recover from that. > > The requirement is rather that nobody remove it permanently before the > application is done with it, and the partition is unmounted. That is > hardly unreasonable (it is the only way I know of to ensure data > integrity), and it is much less strict than the requirements for local > disks. Yes. I completely agree. This is required for data consistency. But in my testing, if one of the NFS servers/mounts goes offline for some period of time, the entire system slows down, especially I/O. In my test program, I forked off 50 threads to do 4K writes on 50 different files in an NFS-mounted directory. Now, I have turned off the NFS server and started another dd process on local disk ("dd if=/dev/zero of=/tmp/x count=1000") and this dd process progresses. I see I/O wait of 100% in vmstat.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id  wa st
 0 21      0 2628416  15152 551024    0    0     0     0   28  344  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0    8  340  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0   26  343  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0    8  341  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0   26  357  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0    8  325  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0   26  343  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0    8  325  0  0  0 100  0

I have about 4Gig of RAM in the system and most of the memory is free. I see only about 550MB in buffers; the rest is pretty much available.

[EMAIL PROTECTED] ~]# free
             total       used       free     shared    buffers     cached
Mem:       3238004     609340    2628664          0      15136     551024
-/+ buffers/cache:      43180    3194824
Swap:      4096532          0    4096532

Here is the stack trace for one of my test program threads and for the dd process; both of them are stuck in congestion_wait.

--
PID: 3552  TASK: cb1fc610  CPU: 0  COMMAND: "dd"
 #0 [f5c04c38] schedule at c0624a34
 #1 [f5c04cac] schedule_timeout at c06250ee
 #2 [f5c04cf0] io_schedule_timeout at c0624c15
 #3 [f5c04d04] congestion_wait at c045eb7d
 #4 [f5c04d28] balance_dirty_pages_ratelimited_nr at c045ab91
 #5 [f5c04d7c] generic_file_buffered_write at c0457148
 #6 [f5c04e10] __generic_file_aio_write_nolock at c04576e5
 #7 [f5c04e84] generic_file_aio_write at c0457799
 #8 [f5c04eb4] ext3_file_write at ffd7
 #9 [f5c04ed0] do_sync_write at c0472e27
#10 [f5c04f7c] vfs_write at c0473689
#11 [f5c04f98] sys_write at c0473c95
#12 [f5c04fb4] sysenter_entry at c0404ddf
--
 #0 [f6050c10] schedule at c0624a34
 #1 [f6050c84] schedule_timeout at c06250ee
 #2 [f6050cc8] io_schedule_timeout at c0624c15
 #3 [f6050cdc] congestion_wait at c045eb7d
 #4 [f6050d00] balance_dirty_pages_ratelimited_nr at c045ab91
 #5 [f6050d54] generic_file_buffered_write at c0457148
 #6 [f6050de8] __generic_file_aio_write_nolock at c04576e5
 #7 [f6050e40] enqueue_entity at c042131f
 #8 [f6050e5c] generic_file_aio_write at c0457799
 #9 [f6050e8c] nfs_file_write at f8f90cee
#10 [f6050e9c] getnstimeofday at c043d3f7
#11 [f6050ed0] do_sync_write at c0472e27
#12 [f6050f7c] vfs_write at c0473689
#13 [f6050f98] sys_write at c0473c95
#14 [f6050fb4] sysenter_entry at c0404ddf
---

Can this be worked around? Since most of the RAM is available, the dd process could in fact find more memory for its buffers rather than waiting on NFS requests. I believe this could be one reason why file systems like VxFS use their own buffer cache, separate from the system-wide buffer cache. Thanks --Chakri
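The test program itself was not posted. Below is a hypothetical reconstruction of the workload described: N writers, each appending 4 KiB blocks to its own file in one directory. It uses fork()ed processes rather than threads, which is an assumption. The directory, file names and block count are all placeholders.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

/* Fork nproc writers; writer i creates dir/writer.<i> and writes nblocks
 * blocks of BLOCK_SIZE bytes to it. Returns the number of writers that
 * could not be started or did not exit successfully. */
static int run_writers(const char *dir, int nproc, int nblocks)
{
	int started = 0, failed;

	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			char path[4096], buf[BLOCK_SIZE];
			int fd;

			memset(buf, 'x', sizeof(buf));
			snprintf(path, sizeof(path), "%s/writer.%d", dir, i);
			fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
			if (fd < 0)
				_exit(1);
			for (int b = 0; b < nblocks; b++)
				if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
					_exit(1);
			_exit(close(fd) == 0 ? 0 : 1);
		}
		started++;
	}
	failed = nproc - started;

	/* Reap every child that was started. */
	for (int i = 0; i < started; i++) {
		int status;

		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed++;
	}
	return failed;
}
```

To approximate the report, one might call run_writers("/mnt/nfs/test", 50, 100000) with a hypothetical NFS mount path. Then stop the server and start the local-disk dd in another shell. The stack traces above suggest both kinds of writer end up waiting in balance_dirty_pages_ratelimited_nr().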
Re: [PATCH 2/3] missing null termination in power supply uevent
On Thu, Sep 20, 2007 at 12:06:10PM -0700, Stephen Hemminger wrote: > Need to null terminate environment. Found by inspection > while looking for similar problems to platform uevent bug > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> Many thanks, git-applymbox'ed to battery-2.6.git. I suppose this is serious enough that it should hit 2.6.23, though I'll wait a bit before asking for a pull. Thanks, -- Anton Vorontsov email: [EMAIL PROTECTED] backup email: [EMAIL PROTECTED] irc://irc.freenode.net/bd2
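The power supply uevent code is not quoted here. The sketch below only illustrates the general rule the patch enforces: an environment array is read until a NULL entry, so the builder must always leave room for that terminator and store it. The helper name and the example strings are hypothetical.

```c
#include <stddef.h>

/* Copy entries from the NULL-terminated list vars into envp, which has
 * room for max pointers. At most max - 1 entries are copied so that the
 * final slot can always hold the terminating NULL that consumers walking
 * the array (as execve() does with its envp) rely on.
 * Returns the number of entries copied. */
static size_t fill_envp(char *envp[], size_t max, char *const vars[])
{
	size_t i = 0;

	if (max == 0)
		return 0;
	while (i < max - 1 && vars[i]) {
		envp[i] = vars[i];
		i++;
	}
	envp[i] = NULL;	/* without this, readers run past the last entry */
	return i;
}
```

The design choice is to reserve the last slot up front. Truncating the list is then safe, while forgetting the terminator is not.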
Re: [PATCH v2] pcmcia: Convert io_req_t to use kio_addr_t
On Fri, Sep 21, 2007 at 11:39:36PM +0100, Alan Cox wrote: > On Fri, 21 Sep 2007 17:15:16 -0500 > Olof Johansson <[EMAIL PROTECTED]> wrote: > > > Convert the io_req_t members to kio_addr_t, to allow use on machines with > > more than 16 bits worth of IO ports (i.e. secondary busses on ppc64, etc). > > What about the formatting and field widths ? > > ulong would probably be a lot saner than kio_addr_t and yet more type > obfuscation. I don't think anyone uses ioports > 32bit. Certainly i386 takes an int port as parameter to {in,out}[bwl] (and it really only uses 16 bits). parisc uses 24 bits. I don't know what the various ppcs do, but PCI BARs can only be 32-bit for ioports. So my opinion is that ioports should be uint, not ulong. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step."
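The underlying problem is that a 16-bit field silently drops the upper bits of any port above 0xffff. The sketch below makes that concrete using port 0x10100, a hypothetical port in a secondary bus's I/O window, with hypothetical structures that differ only in field width. An unsigned int field, as suggested above, keeps the full value.

```c
#include <stdint.h>

/* Hypothetical request structures differing only in the port field width. */
struct io_req16 { uint16_t base_port; };
struct io_req32 { unsigned int base_port; };

/* Store a port number and read it back through the 16-bit structure. */
static unsigned int roundtrip16(unsigned int port)
{
	struct io_req16 r = { .base_port = (uint16_t)port };

	return r.base_port;
}

/* Store a port number and read it back through the 32-bit structure. */
static unsigned int roundtrip32(unsigned int port)
{
	struct io_req32 r = { .base_port = port };

	return r.base_port;
}
```

roundtrip16(0x10100) comes back as 0x100, which could alias a legacy ISA port. roundtrip32(0x10100) is unchanged. Legacy ports such as 0x3f8 survive either way.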
Re: RTC wakealarm write-only, still has 644 permissions
On Thursday 20 September 2007, Pavel Machek wrote: > Hi! > > > ...should they be changed to 200? Or perhaps file should be readable? No, mode 644 is fine. No reason to prevent "other" people from reading the alarm time (is there?) and if you write a legal value, that will work. So $SUBJECT is no problem at all. > > > > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm > > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# echo 132719 > wakealarm At which point I'd expect # echo $? would indicate the write failed. That's a LONG time in the past (January 2, 1970), so that setting would be rejected. > > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# ls -al wakealarm > > -rw-r--r-- 1 root root 0 Sep 20 12:30 wakealarm > > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm > > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm The alarm isn't set, so no value gets displayed. > > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# > > > > > > ...standard PC with reasonably recent kernel... Yeah, well a "standard PC" is chock full of fairly bizarrely glitchy hardware. Clocks and timers have more than their fair share, or x86_64 NOHZ support would be merged by now! > Hmm, something is definitely wrong in here. I sometimes _do_ get > something back. > > [EMAIL PROTECTED]:~# s2ram > Switching from vt9 to vt1 > > > switching back to vt9 > [EMAIL PROTECTED]:~# > [EMAIL PROTECTED]:~# > [EMAIL PROTECTED]:~# cd /sys/class/rtc/rtc0 > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# ls > date dev device@ name power/ since_epoch subsystem@ time > uevent wakealarm > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm > 2051629528 > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat power/wakeup > > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm > 2051629528 > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# date +%s > 1190285030 OK, in that situation you've definitely got some buglike behavior. My question is: how to fix it? The problem is that the RTC is reporting an alarm value with some fields flagged as "wildcard" -- e.g.
day/month/year "out of range" so the hardware ignores those fields. This is very common on PC-based RTCs, and much less common on embedded systems. (For some reason, those don't tend to cheap out on full date specs the way PCs do.) And those cause date reports to look like garbage; /proc/driver/rtc would show "**" in those fields, rather than trying to display the canonical "seconds since POSIX epoch" value. But the wakealarm code just calls rtc_tm_to_time(), which doesn't validate its fields and so will gladly spew the garbage you saw. (On PCs especially. This code was originally tested on sane embedded hardware.) Now, in the /dev/rtcX code there's some code working with a similar problem: ioctl(RTC_ALM_SET) morphs partial alarm dates into valid form before passing them down. This needs the same kind of fix, but going in the other direction -- and not always kicking in. That could go into either the wakealarm display code, or rtc_read_alarm(), or maybe someplace else. I'm not sure which fix would be best; maybe Alessandro has an opinion. I'd lean towards just fixing the wakealarm display code, except that would force anyone using that other routine to know about this rude "wildcard" convention, which is rather hardware-specific... and that's not really aligned with the goal of an RTC framework that "just works" without needing to know about such quirks. > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# echo 1190285050 > wakealarm That is, 20 seconds from "now" modulo timezone offsets. Better might be echo $(( $(cat since_epoch) + 20 )) > wakealarm which has no timezone offset issues. > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# s2ram > Switching from vt9 to vt1 > > > switching back to vt9 > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm There's some weirdness related to ACPI that crept in sometime late in 2.6.21 (or thereabouts) ...
where the RTC wake mechanism got broken by redefining the pm_ops functions, for hibernation at least. That MIGHT be related to what you observe here ... unclear what that was supposed to show. If the RTC alarm woke that system after 20 seconds, that's what you requested and all is fine. If not, and you had to wake it by hand, then you're seeing that issue with the redefinition of hibernation ops having borked the RTC wake mechanism interactions with ACPI. In both cases, I'd expect that the result is that no alarm is pending any more, so there's nothing to display. > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# date +%s > 1190285229 ... which BTW should be what the "since_epoch" file shows, other than the timezone offsets on some system RTCs. > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# > > Also, is there some documentation for wakealarm? "git show 3925a5ce44
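One option discussed above is fixing the wakealarm display side. The sketch below expresses that option as a check: refuse to convert an alarm to seconds unless every field names a concrete value. It assumes the driver reports "wildcard" alarm fields as out-of-range numbers. The struct merely mirrors the relevant struct rtc_time fields, and all names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical mirror of the relevant struct rtc_time fields. A driver
 * is assumed to report "wildcard" alarm fields as out-of-range values. */
struct alarm_tm {
	int tm_sec, tm_min, tm_hour;
	int tm_mday;	/* 1..31 */
	int tm_mon;	/* 0..11 */
	int tm_year;	/* years since 1900 */
};

static bool is_leap(int year)
{
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days in month mon (0-based) of the given year; mon must be valid. */
static int month_days(int mon, int year)
{
	static const int days[12] = { 31, 28, 31, 30, 31, 30,
				      31, 31, 30, 31, 30, 31 };

	return days[mon] + (mon == 1 && is_leap(year));
}

/* True only if no field is a wildcard, i.e. the alarm names one concrete
 * moment that can safely be converted to seconds since the epoch. */
static bool alarm_tm_complete(const struct alarm_tm *t)
{
	int year = t->tm_year + 1900;

	if (t->tm_sec < 0 || t->tm_sec > 59 ||
	    t->tm_min < 0 || t->tm_min > 59 ||
	    t->tm_hour < 0 || t->tm_hour > 23)
		return false;
	if (t->tm_mon < 0 || t->tm_mon > 11 || year < 1970)
		return false;
	return t->tm_mday >= 1 && t->tm_mday <= month_days(t->tm_mon, year);
}
```

With such a check, a display path could print nothing (or fill in the wildcard fields first) instead of passing a partial date to rtc_tm_to_time(). That would avoid values like the 2051629528 seen above.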
Re: [patches] [PATCH] [12/50] x86_64: Untangle __init references between IO data
On 9/21/07, Andi Kleen <[EMAIL PROTECTED]> wrote: > > Earlier patch added IO APIC setup into local APIC setup. This caused > modpost warnings. Fix them by untangling setup_local_APIC() and splitting > it into smaller functions. The IO APIC initialization is only called > for the BP init. > > Also removed some outdated debugging code and minor cleanup. > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> > > --- > arch/x86_64/kernel/apic.c| 46 > --- > arch/x86_64/kernel/smpboot.c |8 +++ > include/asm-x86_64/apic.h|1 > 3 files changed, 31 insertions(+), 24 deletions(-) > > Index: linux/arch/x86_64/kernel/apic.c > === > --- linux.orig/arch/x86_64/kernel/apic.c > +++ linux/arch/x86_64/kernel/apic.c > @@ -323,7 +323,7 @@ void __init init_bsp_APIC(void) > > void __cpuinit setup_local_APIC (void) > { > - unsigned int value, maxlvt; > + unsigned int value; > int i, j; > > value = apic_read(APIC_LVR); > @@ -417,33 +417,22 @@ void __cpuinit setup_local_APIC (void) > else > value = APIC_DM_NMI | APIC_LVT_MASKED; > apic_write(APIC_LVT1, value); > +} > > +void __cpuinit lapic_setup_esr(void) static ? > +{ > + unsigned maxlvt = get_maxlvt(); > + apic_write(APIC_LVTERR, ERROR_APIC_VECTOR); > /* > -* Now enable IO-APICs, actually call clear_IO_APIC > -* We need clear_IO_APIC before enabling vector on BP > +* spec says clear errors after enabling vector. > */ > - if (!smp_processor_id()) > - if (!skip_ioapic_setup && nr_ioapics) > - enable_IO_APIC(); > - > - { > - unsigned oldvalue; > - maxlvt = get_maxlvt(); > - oldvalue = apic_read(APIC_ESR); > - value = ERROR_APIC_VECTOR; // enables sending errors > - apic_write(APIC_LVTERR, value); > - /* > -* spec says clear errors after enabling vector. 
> -*/ > - if (maxlvt > 3) > - apic_write(APIC_ESR, 0); > - value = apic_read(APIC_ESR); > - if (value != oldvalue) > - apic_printk(APIC_VERBOSE, > - "ESR value after enabling vector: %08x, after %08x\n", > - oldvalue, value); > - } > + if (maxlvt > 3) > + apic_write(APIC_ESR, 0); > +} > > +void __cpuinit end_local_APIC_setup(void) > +{ > + lapic_setup_esr(); > nmi_watchdog_default(); > setup_apic_nmi_watchdog(NULL); > apic_pm_activate(); > @@ -1178,6 +1167,15 @@ int __init APIC_init_uniprocessor (void) > > setup_local_APIC(); > > + /* > +* Now enable IO-APICs, actually call clear_IO_APIC > +* We need clear_IO_APIC before enabling vector on BP here it is uniprocessor... so +* We need clear_IO_APIC before enabling error vector > +*/ > + if (!skip_ioapic_setup && nr_ioapics) > + enable_IO_APIC(); could it cause modpost warning too? > + > + end_local_APIC_setup(); > + > if (smp_found_config && !skip_ioapic_setup && nr_ioapics) > setup_IO_APIC(); > else > Index: linux/arch/x86_64/kernel/smpboot.c > === > --- linux.orig/arch/x86_64/kernel/smpboot.c > +++ linux/arch/x86_64/kernel/smpboot.c > @@ -211,6 +211,7 @@ void __cpuinit smp_callin(void) > > Dprintk("CALLIN, before setup_local_APIC().\n"); > setup_local_APIC(); > + end_local_APIC_setup(); > > /* > * Get our bogomips. > @@ -870,6 +871,13 @@ void __init smp_prepare_cpus(unsigned in > */ > setup_local_APIC(); > > + /* > +* Enable IO APIC before setting up error vector > +*/ > + if (!skip_ioapic_setup && nr_ioapics) > + enable_IO_APIC(); > + end_local_APIC_setup(); > + > if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) { > panic("Boot APIC ID in local APIC unexpected (%d vs %d)", > GET_APIC_ID(apic_read(APIC_ID)), boot_cpu_id); > Index: linux/include/asm-x86_64/apic.h > === > --- linux.orig/include/asm-x86_64/apic.h > +++ linux/include/asm-x86_64/apic.h > @@ -73,6 +73,7 @@ extern void cache_APIC_registers (void); > extern void sync_Arb_IDs (void); sync_Arb_IDs is still left there? 
YH
Re: [PATCH] [13/50] x86: Fix and reenable CLFLUSH support in change_page_attr()
* Sat, 22 Sep 2007 00:32:11 +0200 (CEST) [] > - flush_map(&l); > + flush_map(&arg); + flush_map(&arg.l); CC arch/x86_64/mm/pageattr.o arch/x86_64/mm/pageattr.c: In function 'global_flush_tlb': arch/x86_64/mm/pageattr.c:274: warning: passing argument 1 of 'flush_map' from incompatible pointer type (for i386 seems too) [] > +#define PageFlush(p) test_bit(PG_owner_priv_1, &(p)->flags) > +#define SetPageFlush(p) set_bit(PG_owner_priv_1, &(p)->flags) > +#define TestClearPageFlush(p) test_and_clear_bit(PG_owner_priv_1, > &(p)->flags) Is it worth introducing more of that Pascal style? Yes, page stuff is all about it, but still. [] > +static struct page *flush_page(unsigned long address) > { > - if (!test_and_set_bit(PG_arch_1, &kpte_page->flags)) > - list_add(&kpte_page->lru, &df_list); > + struct page *p; > + if (!(pfn_valid(__pa(address) >> PAGE_SHIFT))) > + return NULL; > + p = virt_to_page(address); > + if ((PageFlush(p) || PageLRU(p)) && !test_bit(PG_arch_1, &p->flags)) > + return NULL; > + return p; > } Saves 16 bytes in non optimized compile (if tcc will ever do this :) static struct page *flush_page(unsigned long address) { struct page *p = NULL; if (pfn_valid(__pa(address) >> PAGE_SHIFT)) { p = virt_to_page(address); if (PageFlush(p) || PageLRU(p)) if (!test_bit(PG_arch_1, &p->flags)) p = NULL; } return p; } > static int > @@ -158,6 +185,18 @@ __change_page_attr(struct page *page, pg > kpte_page = virt_to_page(kpte); > BUG_ON(PageLRU(kpte_page)); > BUG_ON(PageCompound(kpte_page)); > + BUG_ON(PageLRU(kpte_page)); > + > + /* Do caching attributes change? > +Note: this will need changes if the PAT bit is used (it isn't > +currently) because that one varies between 2MB and 4K pages. 
*/ > + if ((pte_val(*kpte)&_PAGE_CACHE) != (pgprot_val(prot)&_PAGE_CACHE)) { > + struct page *p = flush_page(address); > + if (!p) > + full_flush = 1; > + else > + save_page(p, 1); > + } > > if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { > if (!pte_huge(*kpte)) { > @@ -189,7 +228,7 @@ __change_page_attr(struct page *page, pg >* replace it with a largepage. >*/ > > - save_page(kpte_page); > + save_page(kpte_page, 0); > if (!PageReserved(kpte_page)) { > if (cpu_has_pse && (page_private(kpte_page) == 0)) { > paravirt_release_pt(page_to_pfn(kpte_page)); > @@ -235,18 +274,22 @@ int change_page_attr(struct page *page, > > void global_flush_tlb(void) > { > - struct list_head l; > + struct flush_arg arg; > struct page *pg, *next; > > BUG_ON(irqs_disabled()); > > spin_lock_irq(&cpa_lock); > - list_replace_init(&df_list, &l); > + arg.full_flush = full_flush; > + full_flush = 0; > + list_replace_init(&df_list, &arg.l); > spin_unlock_irq(&cpa_lock); > - flush_map(&l); > - list_for_each_entry_safe(pg, next, &l, lru) { > + flush_map(&arg); i386 case here.
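The review above proposes restructuring flush_page() into a single-exit form. As a rough userspace check (a sketch, not kernel code), the four conditions it tests can be modeled as booleans; the struct page_state type and the helper names below are hypothetical stand-ins for pfn_valid(), PageFlush(), PageLRU() and the PG_arch_1 test. Exhausting all 16 input combinations suggests that both shapes return a page in exactly the same cases.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the state flush_page() inspects. In the kernel these
 * come from pfn_valid() and bits in struct page->flags; plain booleans are
 * used here so both control-flow shapes can be compared in userspace.
 */
struct page_state {
	bool pfn_valid;	/* pfn_valid(__pa(address) >> PAGE_SHIFT) */
	bool flush;	/* PageFlush(p), i.e. PG_owner_priv_1 */
	bool lru;	/* PageLRU(p) */
	bool arch_1;	/* test_bit(PG_arch_1, &p->flags) */
};

/* Early-return shape from the patch: true means "return p", false NULL. */
static bool flush_page_patch(const struct page_state *s)
{
	if (!s->pfn_valid)
		return false;
	if ((s->flush || s->lru) && !s->arch_1)
		return false;
	return true;
}

/* Single-exit shape suggested in the review. */
static bool flush_page_review(const struct page_state *s)
{
	bool have_page = false;

	if (s->pfn_valid) {
		have_page = true;
		if (s->flush || s->lru)
			if (!s->arch_1)
				have_page = false;
	}
	return have_page;
}

/* Returns true if both shapes agree on all 16 input combinations. */
static bool shapes_agree(void)
{
	for (unsigned int bits = 0; bits < 16; bits++) {
		struct page_state s = {
			.pfn_valid = bits & 1,
			.flush = bits & 2,
			.lru = bits & 4,
			.arch_1 = bits & 8,
		};

		if (flush_page_patch(&s) != flush_page_review(&s))
			return false;
	}
	return true;
}
```

Whether the single-exit form actually saves text depends on the compiler and flags; the sketch only speaks to logical equivalence.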
2.6.23-rc7 + radeonfb/s2ram
Hi, Today, out of curiosity, I pulled 2.6.23-rc7 (living on the edge on a quiet weekend). It seems that radeonfb and my "01:05.0 VGA compatible controller: ATI Technologies Inc ATI Radeon XPRESS 200M 5955 (PCIE)" don't get along anymore: a) X somehow fails to initialize the card and everything moves really slowly (I can see surfaces being drawn pixel by pixel); furthermore, garbage appears on the screen; b) after resuming from s2ram, the system freezes. b) is not that bad; s2ram never worked on my machine (kjournald and some other kernel processes enter disk sleep and, within seconds, everything just... freezes. I can type a few commands at the normal console, but that is all). Following the advice in 'Documentation/power/s2ram.txt' helped: using the regular VGA console got X back on track (no more slowness). Now that I have got my hands "dirty", I'm in the mood to make s2ram work (I've been using Linux exclusively for three years now; it's about time I made a small contribution). What kernel option must I enable to determine why some processes enter (and stay in) disk sleep? I'm on a laptop and I don't think it will withstand too many reboots :) I've also attached the output of lspci and dmesg. Maybe someone will spot something.
Thanks, -- Mihai Donțu [0.00] Linux version 2.6.23-rc7 ([EMAIL PROTECTED]) (gcc version 4.1.2 (Gentoo 4.1.2)) #3 PREEMPT Sat Sep 22 07:27:11 EEST 2007 [0.00] Command line: vga=791 nohz=on [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009fc00 (usable) [0.00] BIOS-e820: 0009fc00 - 000a (reserved) [0.00] BIOS-e820: 000e - 0010 (reserved) [0.00] BIOS-e820: 0010 - 37fd (usable) [0.00] BIOS-e820: 37fd - 37fefc00 (reserved) [0.00] BIOS-e820: 37fefc00 - 37ffb000 (ACPI NVS) [0.00] BIOS-e820: 37ffb000 - 4000 (reserved) [0.00] BIOS-e820: e000 - f000 (reserved) [0.00] BIOS-e820: fec0 - fec02000 (reserved) [0.00] BIOS-e820: ffb8 - ffc0 (reserved) [0.00] BIOS-e820: fff8 - 0001 (reserved) [0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used [0.00] Entering add_active_range(0, 256, 229328) 1 entries of 256 used [0.00] end_pfn_map = 1048576 [0.00] DMI 2.3 present. [0.00] ACPI: RSDP 000FE270, 0014 (r0 HP) [0.00] ACPI: RSDT 37FEFC84, 0034 (r1 HP 0944 22110520 HP 1) [0.00] ACPI: FACP 37FEFC00, 0084 (r2 HP 09442 HP 1) [0.00] ACPI: DSDT 37FEFD50, 7489 (r1 HPSB4001 MSFT 10E) [0.00] ACPI: FACS 37FFAE80, 0040 [0.00] ACPI: APIC 37FEFCB8, 005A (r1 HP 09441 HP 1) [0.00] ACPI: MCFG 37FEFD14, 003C (r1 HP 09441 HP 1) [0.00] ACPI: SSDT 37FF71D9, 0382 (r1 HP HPQPpc 1001 MSFT 10E) [0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used [0.00] Entering add_active_range(0, 256, 229328) 1 entries of 256 used [0.00] No mptable found. 
[0.00] Zone PFN ranges: [0.00] DMA 0 -> 4096 [0.00] DMA324096 -> 1048576 [0.00] Normal1048576 -> 1048576 [0.00] Movable zone start PFN for each node [0.00] early_node_map[2] active PFN ranges [0.00] 0:0 -> 159 [0.00] 0: 256 -> 229328 [0.00] On node 0 totalpages: 229231 [0.00] DMA zone: 56 pages used for memmap [0.00] DMA zone: 1960 pages reserved [0.00] DMA zone: 1983 pages, LIFO batch:0 [0.00] DMA32 zone: 3079 pages used for memmap [0.00] DMA32 zone: 222153 pages, LIFO batch:31 [0.00] Normal zone: 0 pages used for memmap [0.00] Movable zone: 0 pages used for memmap [0.00] ATI board detected. Disabling timer routing over 8254. [0.00] ACPI: PM-Timer IO Port: 0x8008 [0.00] ACPI: Local APIC address 0xfec01000 [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) [0.00] Processor #0 (Bootup-CPU) [0.00] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) [0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0]) [0.00] IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23 [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level) [0.00] ACPI: IRQ0 used by override. [0.00] ACPI: IRQ2 used by override. [0.00] Setting APIC routing to flat [0.00] Using ACPI (MADT) for SMP configuration information [0.00]
Re: [PATCH] [34/50] i386: Fix argument signedness warnings
Hi, On Sat, 22 Sep 2007, Andi Kleen wrote: > > From: Satyam Sharma <[EMAIL PROTECTED]> > > > These build warnings: > > In file included from include/asm/thread_info.h:16, > from include/linux/thread_info.h:21, > from include/linux/preempt.h:9, > from include/linux/spinlock.h:49, > from include/linux/vmalloc.h:4, > from arch/i386/boot/compressed/misc.c:14: > include/asm/processor.h: In function $B!F(Jcpuid_count$B!G(J: ^^ ^^ > include/asm/processor.h:615: warning: pointer targets in passing > argument 1 of $B!F(Jnative_cpuid$B!G(J differ in signedness > include/asm/processor.h:615: warning: pointer targets in passing > argument 2 of $B!F(Jnative_cpuid$B!G(J differ in signedness > include/asm/processor.h:615: warning: pointer targets in passing > argument 3 of $B!F(Jnative_cpuid$B!G(J differ in signedness > include/asm/processor.h:615: warning: pointer targets in passing > argument 4 of $B!F(Jnative_cpuid$B!G(J differ in signedness ^^^^ Yikes. My bad, I had faulty (default) alpine settings (and a sad combination of LANG=en_US.UTF-8) when I made and sent out that patch. Please ensure that this finally gets committed in a somewhat saner and more readable state to the tree. Thanks, Satyam
Re: [patch 04/28] Add cmpxchg64 and cmpxchg64_local to powerpc
Mathieu Desnoyers writes: > Make sure that at least cmpxchg64_local is available on all architectures to > use > for unsigned long long values. > > Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]> Acked-by: Paul Mackerras <[EMAIL PROTECTED]>
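For readers unfamiliar with the primitive: cmpxchg64() compares a 64-bit location with an expected value and, only if they match, stores a new value; either way it returns whatever was there before. A minimal userspace analogue using C11 atomics might look like the following. cmpxchg64_sketch is a hypothetical name, and this models the fully atomic variant; the _local variant is only expected to be atomic with respect to the current CPU.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Userspace analogue (an assumption, not the powerpc implementation) of the
 * cmpxchg64() contract: replace *ptr with new_val only if it currently
 * equals old_val, and return the value that was actually found there.
 */
static uint64_t cmpxchg64_sketch(_Atomic uint64_t *ptr, uint64_t old_val,
				 uint64_t new_val)
{
	uint64_t expected = old_val;

	/* On failure, 'expected' is overwritten with the current value. */
	atomic_compare_exchange_strong(ptr, &expected, new_val);
	return expected;
}
```

Callers detect success by comparing the return value with old_val, which is the usual kernel idiom for cmpxchg-style helpers.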
[PATCH] 2.6.22.6 user-mode linux: use address instead of value as argument in os_free_irq_by_cb
Hi, There is a bug in os_free_irq_by_cb(): when the first element of the active_fds list is freed, the caller's active_fds pointer is not updated; only the copy on the stack is. Interestingly, without this patch, a poweroff in a user-mode Linux guest halts the host Linux system. It seems that after the tracing thread is dead, the traced thread's sys_reboot syscall is executed by the host. I don't know whether that is a separate bug. Signed-off-by: Lepton Wu <[EMAIL PROTECTED]> diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/arch/um/include/os.h linux-2.6.22.6-lepton/arch/um/include/os.h --- linux-2.6.22.6/arch/um/include/os.h 2007-09-14 17:41:10.0 +0800 +++ linux-2.6.22.6-lepton/arch/um/include/os.h 2007-09-22 12:15:28.0 +0800 @@ -325,7 +325,7 @@ extern void reboot_skas(void); extern int os_waiting_for_events(struct irq_fd *active_fds); extern int os_create_pollfd(int fd, int events, void *tmp_pfd, int size_tmpfds); extern void os_free_irq_by_cb(int (*test)(struct irq_fd *, void *), void *arg, - struct irq_fd *active_fds, struct irq_fd ***last_irq_ptr2); + struct irq_fd **active_fds_ptr, struct irq_fd ***last_irq_ptr2); extern void os_free_irq_later(struct irq_fd *active_fds, int irq, void *dev_id); extern int os_get_pollfd(int i); diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/arch/um/kernel/irq.c linux-2.6.22.6-lepton/arch/um/kernel/irq.c --- linux-2.6.22.6/arch/um/kernel/irq.c 2007-09-14 17:41:10.0 +0800 +++ linux-2.6.22.6-lepton/arch/um/kernel/irq.c 2007-09-22 12:15:05.0 +0800 @@ -218,7 +218,7 @@ static void free_irq_by_cb(int (*test)(s unsigned long flags; spin_lock_irqsave(&irq_lock, flags); - os_free_irq_by_cb(test, arg, active_fds, &last_irq_ptr); + os_free_irq_by_cb(test, arg, &active_fds, &last_irq_ptr); spin_unlock_irqrestore(&irq_lock, flags); } diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/arch/um/os-Linux/irq.c linux-2.6.22.6-lepton/arch/um/os-Linux/irq.c --- linux-2.6.22.6/arch/um/os-Linux/irq.c
2007-09-14 17:41:10.0 +0800 +++ linux-2.6.22.6-lepton/arch/um/os-Linux/irq.c2007-09-22 12:15:42.0 +0800 @@ -84,12 +84,12 @@ int os_create_pollfd(int fd, int events, } void os_free_irq_by_cb(int (*test)(struct irq_fd *, void *), void *arg, - struct irq_fd *active_fds, struct irq_fd ***last_irq_ptr2) + struct irq_fd **active_fds_ptr, struct irq_fd ***last_irq_ptr2) { struct irq_fd **prev; int i = 0; - prev = &active_fds; + prev = active_fds_ptr; while (*prev != NULL) { if ((*test)(*prev, arg)) { struct irq_fd *old_fd = *prev;
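The bug described above is the classic pointer-to-pointer list-removal pitfall: taking the address of a by-value parameter yields the address of a stack copy, so unlinking the head never reaches the caller. The userspace sketch below reproduces both shapes. struct irq_fd here is a hypothetical two-field stand-in for UML's structure, and the unlink_* helpers are illustrative names that only unlink matching nodes (they do not free them).

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for UML's struct irq_fd. */
struct irq_fd {
	struct irq_fd *next;
	int fd;
};

/* Buggy shape: 'head' is a copy, so &head points into this stack frame. */
static void unlink_matching_by_value(struct irq_fd *head, int fd)
{
	struct irq_fd **prev = &head;

	while (*prev != NULL) {
		if ((*prev)->fd == fd)
			*prev = (*prev)->next;	/* head case only edits the copy */
		else
			prev = &(*prev)->next;
	}
}

/* Fixed shape: 'head_ptr' points at the caller's list head itself. */
static void unlink_matching_by_ref(struct irq_fd **head_ptr, int fd)
{
	struct irq_fd **prev = head_ptr;

	while (*prev != NULL) {
		if ((*prev)->fd == fd)
			*prev = (*prev)->next;	/* updates caller's head if needed */
		else
			prev = &(*prev)->next;
	}
}
```

Unlinking a non-head element works in both versions, because that writes through a next pointer stored inside a list node; only the head update is lost, which matches the symptom in the report.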
Re: [PATCH 2/3] user.c: use kmem_cache_zalloc()
On Fri, 21 Sep 2007, Andrew Morton wrote: > > On Fri, 21 Sep 2007 13:39:06 +0400 > Alexey Dobriyan <[EMAIL PROTECTED]> wrote: > > > Quite a few fields are zeroed during user_struct creation, so use > > kmem_cache_zalloc() -- save a few lines and #ifdef. Also will help avoid > > #ifdef CONFIG_POSIX_MQUEUE in next patch. > > > > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]> > > --- > > > > kernel/user.c | 13 + > > 1 file changed, 1 insertion(+), 12 deletions(-) > > > > --- a/kernel/user.c > > +++ b/kernel/user.c > > @@ -129,21 +129,11 @@ struct user_struct * alloc_uid(struct user_namespace > > *ns, uid_t uid) > > if (!up) { > > struct user_struct *new; > > > > - new = kmem_cache_alloc(uid_cachep, GFP_KERNEL); > > + new = kmem_cache_zalloc(uid_cachep, GFP_KERNEL); > > if (!new) > > return NULL; > > new->uid = uid; > > atomic_set(&new->__count, 1); > > - atomic_set(&new->processes, 0); > > - atomic_set(&new->files, 0); > > - atomic_set(&new->sigpending, 0); > > -#ifdef CONFIG_INOTIFY_USER > > - atomic_set(&new->inotify_watches, 0); > > - atomic_set(&new->inotify_devs, 0); > > -#endif > > - > > - new->mq_bytes = 0; > > - new->locked_shm = 0; > > > This assumes that setting an atomic_t to the all-zeroes pattern is > equivalent to atomic_set(v, 0). > > This happens to be true for all present architectures, afaik. But an > architecture which has crappy primitives could quite legitimately implement > its atomic_t as: > > typedef struct { > int counter; > spinlock_t lock; > } atomic_t; > > in which case your assumption breaks. Agreed, and this (implementing atomic ops using spinlocks) is already true for the CRIS platform. However, cris' implementation explicitly takes care to ensure that atomic_t contains just a solitary int member, and no spinlock_t's inside the atomic_t itself. 
[ include/asm-cris/arch-v32/atomic.h ] Of course, that "128" limits scalability, so no more than 128 CPUs can be executing atomic ops at any given instant of time, but admittedly I'm getting anal here myself ... (but probably that's often perfectly the right attitude to have too) > So it's all a bit theoretical and a bit anal, and I'm sure we're making the > same mistake in other places, but it's not a change I particularly like.. Hmm, it's borderline. Such changes make text smaller (in terms of LOC as well as vmlinux size). But they also hurt grepping. Often we (at least I) want to grep for where a variable/struct member/etc. gets initialized or set/assigned. Take this case, for example -- I bet it's important (for overall logic) that those variables get initialized to zero. But *zalloc() functions do that implicitly, so it wastes precious seconds or minutes of developer time when grepping that code. OTOH, we could make it standard practice to put a little comment on top of such *zalloc() usages, explicitly enumerating the struct members that the *zalloc() is assumed to initialize to zero. Satyam
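Satyam's point is that a spinlock-based atomic_t stays safe to zero-fill as long as the locks live outside atomic_t. The sketch below illustrates that general technique (a fixed table of locks indexed by a hash of the variable's address) in userspace C11. It is not the CRIS code: SKETCH_LOCK_HASH, struct sketch_user and all sketch_* names are hypothetical, and atomic_flag spinlocks stand in for kernel spinlocks. Because sketch_atomic_t holds nothing but the counter, a calloc()-ed structure (standing in for kmem_cache_zalloc()) starts with every counter at a valid 0.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_LOCK_HASH 8	/* hypothetical table size; bounds parallelism */

/* The atomic type holds only the counter: no lock state inside it. */
typedef struct {
	int counter;
} sketch_atomic_t;

/* Locks live outside sketch_atomic_t, in a table indexed by address hash. */
static atomic_flag sketch_locks[SKETCH_LOCK_HASH] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
};

static atomic_flag *sketch_lock_for(const sketch_atomic_t *v)
{
	return &sketch_locks[((uintptr_t)v / sizeof(*v)) % SKETCH_LOCK_HASH];
}

static int sketch_atomic_add_return(sketch_atomic_t *v, int i)
{
	atomic_flag *lock = sketch_lock_for(v);
	int ret;

	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the hashed lock is free */
	v->counter += i;
	ret = v->counter;
	atomic_flag_clear_explicit(lock, memory_order_release);
	return ret;
}

/* Locked read, kept simple for the sketch. */
static int sketch_atomic_read(sketch_atomic_t *v)
{
	return sketch_atomic_add_return(v, 0);
}

/* Hypothetical subset of user_struct's counters. */
struct sketch_user {
	sketch_atomic_t processes;
	sketch_atomic_t files;
};

/* Zeroing allocation in the spirit of kmem_cache_zalloc(). */
static struct sketch_user *sketch_alloc_user(void)
{
	return calloc(1, sizeof(struct sketch_user));
}
```

If the lock were embedded in the atomic type instead, whether all-zero bytes form a valid unlocked state would depend on the lock implementation, which is exactly the portability concern raised in the quoted reply.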
[Patch 1/2] Trace code and documentation (resend)
My last posting was mangled by my mailer; I hope this one is better. I have also addressed Randy's concerns. Please see the previous posting for more information: http://lkml.org/lkml/2007/9/19/4 (PATCH 0/2) Note: this patch requires that "[Patch 2/2] Relay reset consumed" be applied. - Trace - Provides tracing primitives Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]> Signed-off-by: Martin Hunt <[EMAIL PROTECTED]> Signed-off-by: David Wilder <[EMAIL PROTECTED]> --- Documentation/trace/src/Makefile |7 + Documentation/trace/src/README | 18 + Documentation/trace/src/fork_trace.c | 119 +++ Documentation/trace/trace.txt| 164 ++ include/linux/trace.h| 99 ++ lib/Kconfig |9 + lib/Makefile |2 + lib/trace.c | 563 ++ 8 files changed, 981 insertions(+), 0 deletions(-) diff --git a/Documentation/trace/src/Makefile b/Documentation/trace/src/Makefile new file mode 100644 index 000..9ee4c72 --- /dev/null +++ b/Documentation/trace/src/Makefile @@ -0,0 +1,7 @@ +obj-m := fork_trace.o +KDIR := /lib/modules/$(shell uname -r)/build +PWD := $(shell pwd) +default: + $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules +clean: + rm -f *.mod.c *.ko *.o diff --git a/Documentation/trace/src/README b/Documentation/trace/src/README new file mode 100644 index 000..f538491 --- /dev/null +++ b/Documentation/trace/src/README @@ -0,0 +1,18 @@ +This small sample module creates a trace channel. It places a kprobe +on the function do_fork(). The value of current->pid is written to +the trace channel each time the kprobe is hit.. + +How to run the example: +$ mount -t debugfs /debug +$ make +$ insmod fork_trace.ko + +To view the data produced by the module: +$ cat /debug/trace_example/do_fork/trace0 + +Remove the module. +$ rmmod fork_trace + +The function trace_cleanup() is called when the module +is removed. This will cause the TRACE channel to be destroyed and the +corresponding files to disappear from the debug file system.
diff --git a/Documentation/trace/src/fork_trace.c b/Documentation/trace/src/fork_trace.c new file mode 100644 index 000..8a7bc55 --- /dev/null +++ b/Documentation/trace/src/fork_trace.c @@ -0,0 +1,119 @@ +/* + * An example of using trace in a kprobes module + * + * Copyright (C) 2007 IBM Inc. + * + * David Wilder <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include +#include +#include +#include + +#define USE_GLOBAL_BUFFERS 1 +#define USE_FLIGHT 1 + +#define PROBE_POINT "do_fork" + +static struct kprobe kp; +static struct trace_info *kprobes_trace; + +#ifdef USE_GLOBAL_BUFFERS +static DEFINE_SPINLOCK(trace_lock); +#endif + +/* + * Send formatted trace data to trace channel. + * @note Preemption must be disabled to use this. + */ +static void trace_printf(struct trace_info *trace, const char *format, ...) 
+{ + va_list ap, aq; + char *record; + unsigned long flags; + int len; + + if (!trace) + return; + +#ifdef USE_GLOBAL_BUFFERS + spin_lock_irqsave(&trace_lock, flags); +#endif + if (trace_running(trace)) { + va_start(ap, format); + va_copy(aq, ap); + len = vsnprintf(NULL, 0, format, aq); + va_end(aq); + record = relay_reserve(trace->rchan, ++len); + if (record) + vsnprintf(record, len, format, ap); + va_end(ap); + } +#ifdef USE_GLOBAL_BUFFERS + spin_unlock_irqrestore(&trace_lock, flags); +#endif +} + +static int handler_pre(struct kprobe *p, struct pt_regs *regs) +{ + rcu_read_lock(); + trace_printf(kprobes_trace, "%d\n", current->pid); + rcu_read_unlock(); + return 0; +} + +int init_module(void) +{ + int ret; + u32 flags = 0; + +#ifdef USE_GLOBAL_BUFFERS + flags |= TRACE_GLOBAL_CHANNEL; +#endif + +#ifdef USE_FLIGHT + flags |= TRACE_FLIGHT_CHANNEL; +#endif + + /* setup the trace */ + kprobes_trace = trace_setup("trace_example", PROBE_POINT, +1024, 8, flags); + if (IS_ERR(kprobes_trace)) + return PTR_ERR(kprobes_trace); + + trace_start(kprobes_trace); + + /* setup the
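trace_printf() in the sample module uses a two-pass vsnprintf() pattern: measure the formatted length with a va_copy()'d argument list, reserve exactly that many bytes plus one for the terminating NUL, then format into the reserved slot. va_copy() is needed because the first vsnprintf() call may consume the va_list. The sketch below shows the same pattern in plain userspace C; sketch_reserve() is a hypothetical bump allocator standing in for relay_reserve(), and the module's optional global-buffer locking is omitted.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed-size buffer standing in for a relay sub-buffer. */
static char sketch_buf[64];
static size_t sketch_used;

/* Stand-in for relay_reserve(): hand out len bytes, or NULL if full. */
static char *sketch_reserve(size_t len)
{
	if (len > sizeof(sketch_buf) - sketch_used)
		return NULL;
	char *p = sketch_buf + sketch_used;
	sketch_used += len;
	return p;
}

/* Measure with a copy of the va_list, reserve exactly, then format. */
static char *sketch_trace_printf(const char *format, ...)
{
	va_list ap, aq;
	char *record = NULL;
	int len;

	va_start(ap, format);
	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, format, aq);	/* length without the NUL */
	va_end(aq);
	if (len >= 0) {
		record = sketch_reserve((size_t)len + 1);	/* +1 for the NUL */
		if (record)
			vsnprintf(record, (size_t)len + 1, format, ap);
	}
	va_end(ap);
	return record;
}
```

As in the module, a failed reservation simply drops the record rather than truncating it.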
[Patch 2/2] Relay reset consumed
This patch allows relay channels to be reset i.e. unconsumed. Basically allows a 'rewind' function for flight-recorder tracing. Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]> Signed-off-by: David Wilder <[EMAIL PROTECTED]> --- Documentation/filesystems/relay.txt | 11 ++ include/linux/relay.h |1 + kernel/relay.c | 58 --- 3 files changed, 65 insertions(+), 5 deletions(-) diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt index 18d23f9..d31113a 100644 --- a/Documentation/filesystems/relay.txt +++ b/Documentation/filesystems/relay.txt @@ -161,6 +161,7 @@ TBD(curr. line MT:/API/) relay_close(chan) relay_flush(chan) relay_reset(chan) +relay_reset_consumed(chan) channel management typically called on instigation of userspace: @@ -452,6 +453,16 @@ state without reallocating channel buffer memory or destroying existing mappings. It should however only be called when it's safe to do so, i.e. when the channel isn't currently being written to. +The read(2) implementation always 'consumes' the bytes read, +i.e. those bytes won't be available again to subsequent reads. +Certain applications may nonetheless wish to allow the 'consumed' data +to be re-read; relay_reset_consumed() is provided for that purpose - +it resets the internal consumed counters for all buffers in the +channel. For example, if a first set of reads 'drains' the channel, +and then relay_reset_consumed() is called, a second set of reads will +get the exact same data (assuming no new data was written between the +first set of reads and the second). + Finally, there are a couple of utility callbacks that can be used for different purposes. 
buf_mapped() is called whenever a channel buffer is mmapped from user space and buf_unmapped() is called when it's diff --git a/include/linux/relay.h b/include/linux/relay.h index 6cd8c44..aca45fa 100644 --- a/include/linux/relay.h +++ b/include/linux/relay.h @@ -175,6 +175,7 @@ extern void relay_subbufs_consumed(struct rchan *chan, unsigned int cpu, size_t consumed); extern void relay_reset(struct rchan *chan); +extern void relay_reset_consumed(struct rchan *chan); extern int relay_buf_full(struct rchan_buf *buf); extern size_t relay_switch_subbuf(struct rchan_buf *buf, diff --git a/kernel/relay.c b/kernel/relay.c index 61134eb..6b55eaa 100644 --- a/kernel/relay.c +++ b/kernel/relay.c @@ -383,6 +383,57 @@ void relay_reset(struct rchan *chan) } EXPORT_SYMBOL_GPL(relay_reset); +/** + * __relay_reset_consumed - reset a channel buffer's consumed count + * @buf: the channel buffer + * + * See relay_reset_consumed for description of effect. + */ +static inline void __relay_reset_consumed(struct rchan_buf *buf) +{ + size_t n_subbufs = buf->chan->n_subbufs; + size_t produced = buf->subbufs_produced; + size_t consumed = buf->subbufs_consumed; + + if (produced < n_subbufs) + buf->subbufs_consumed = 0; + else { + consumed = produced - n_subbufs; + if (buf->offset) + consumed++; + buf->subbufs_consumed = consumed; + } + buf->bytes_consumed = 0; +} + +/** + * relay_reset_consumed - reset the channel's consumed counts + * @chan: the channel + * + * This has the effect of making all data previously read (and + * not overwritten by subsequent writes) from a channel available + * for reading again. + * + * NOTE: Care should be taken that the channel isn't actually + * being used by anything when this call is made. 
+ */ +void relay_reset_consumed(struct rchan *chan) +{ + unsigned int i; + struct rchan_buf *prev = NULL; + + if (!chan) + return; + + for (i = 0; i < NR_CPUS; i++) { + if (!chan->buf[i] || chan->buf[i] == prev) + break; + __relay_reset_consumed(chan->buf[i]); + prev = chan->buf[i]; + } +} +EXPORT_SYMBOL_GPL(relay_reset_consumed); + /* * relay_open_buf - create a new relay channel buffer * @@ -845,11 +896,8 @@ static int relay_file_read_avail(struct rchan_buf *buf, size_t read_pos) return 1; } - if (unlikely(produced - consumed >= n_subbufs)) { - consumed = produced - n_subbufs + 1; - buf->subbufs_consumed = consumed; - buf->bytes_consumed = 0; - } + if (unlikely(produced - consumed >= n_subbufs)) + __relay_reset_consumed(buf); produced = (produced % n_subbufs) * subbuf_size + buf->offset; consumed = (consumed % n_subbufs) * subbuf_size + buf->bytes_consumed;
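The arithmetic in __relay_reset_consumed() decides which sub-buffer becomes the new read start. Restated as a pure function (an illustrative sketch; reset_subbufs_consumed is a hypothetical name, and the rchan_buf fields are passed as plain values) with n_subbufs = 8: if only 3 sub-buffers have been produced, reading rewinds to 0. After 10, sub-buffers 0 and 1 have been overwritten, so reading restarts at 2, or at 3 if the writer has already put data (offset != 0) into the slot that sub-buffer 2 used to occupy.

```c
#include <stddef.h>

/*
 * Userspace restatement of the __relay_reset_consumed() arithmetic:
 * returns the new subbufs_consumed value for a ring of n_subbufs
 * sub-buffers after 'produced' sub-buffer switches, where 'offset' is the
 * write position inside the sub-buffer currently being filled.
 */
static size_t reset_subbufs_consumed(size_t produced, size_t n_subbufs,
				     size_t offset)
{
	size_t consumed;

	if (produced < n_subbufs)
		return 0;	/* ring never wrapped: rewind to the start */

	/* The current slot previously held sub-buffer produced - n_subbufs. */
	consumed = produced - n_subbufs;
	/* If writing into that slot has begun, its old contents are gone. */
	if (offset)
		consumed++;
	return consumed;
}
```

The bytes_consumed reset to 0 in the patch is not modeled here, since it does not depend on the inputs.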
Re: [PATCH -mm] sb16: Shut up uninitialized var build warning
On 09/20/2007 07:52 PM, Denys Vlasenko wrote: On Sunday 02 September 2007 23:06, Rene Herman wrote: Blah. Your message has: Content-Type: TEXT/PLAIN; charset=iso-2022-jp This apparently is caused by a combination of GCC using groovy UTF tickmarks in its error messages when in a UTF locale and alpine believing it to be a great idea to automatically try for the "simplest" character set it can encode the content in. No idea why that means that iso-2022-jp is picked, but it is. While I could actually read the message this time, you should see what iso-2022-jp does to my font. It's scary. Best solution as far as I'm concerned is to slap a few GCC developers (not that it will help, but it'll certainly feel good) and then teach alpine to go for UTF-8 directly if US-ASCII won't do. rotfl. Kindly give me permission to convert your email into a gcc bug report and/or to forward it to the gcc mailing list. Blessings be upon thou, oh courageous one... Rene.
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 10:56:56PM -0400, Steven Rostedt wrote: > > [ sneaks away from the family for a bit to answer emails ] [ same here, now that you mention it... ] > -- > On Fri, 21 Sep 2007, Paul E. McKenney wrote: > > > On Fri, Sep 21, 2007 at 09:19:22PM -0400, Steven Rostedt wrote: > > > > > > -- > > > On Fri, 21 Sep 2007, Paul E. McKenney wrote: > > > > > > > > > > In any case, I will be looking at the scenarios more carefully. If > > > > > it turns out that GP_STAGES can indeed be cranked down a bit, well, > > > > > that is an easy change! I just fired off a POWER run with GP_STAGES > > > > > set to 3, will let you know how it goes. > > > > > > > > The first attempt blew up during boot badly enough that ABAT was unable > > > > to recover the machine (sorry, grahal!!!). Just for grins, I am trying > > > > it again on a machine that ABAT has had a better record of reviving... > > > > > > This still frightens the hell out of me. Going through 15 states and > > > failing. Seems the CPU is holding off writes for a long long time. That > > > means we flipped the counter 4 times, and that still wasn't good enough? > > > > Might be that the other machine has its 2.6.22 version of .config messed > > up. I will try booting it on a stock 2.6.22 kernel when it comes back > > to life -- not sure I ever did that before. Besides, the other similar > > machine seems to have gone down for the count, but without me torturing > > it... > > > > Also, keep in mind that various stages can "record" a memory misordering, > > for example, by incrementing the wrong counter. > > > > > Maybe I'll boot up my powerbook to see if it has the same issues. > > > > > > Well, I'm still finishing up on moving into my new house, so I won't be > > > available this weekend. > > > > The other machine not only booted, but has survived several minutes of > > rcutorture thus far.
I am also trying a POWER5 machine as well, as the > > one currently running is a POWER4, which is a bit less aggressive about > > memory reordering than is the POWER5. > > > > Even if they pass, I refuse to reduce GP_STAGES until proven safe. > > Trust me, you -don't- want to be unwittingly making use of a subtly > > busted RCU implementation!!! > > I totally agree. This is the same reason I want to understand -why- it > fails with 3 stages. To make sure that adding a 4th stage really does fix > it, and doesn't just make the chances for the bug smaller. Or if it really does break, as opposed to my having happened upon a sick or misconfigured machine. > I just have that funny feeling that we are band-aiding this for POWER with > extra stages and not really solving the bug. > > I could be totally out in left field on this. But the more people have a > good understanding of what is happening (this includes why things fail) > the more people in general can trust this code. Right now I'm thinking > you may be the only one that understands this code enough to trust it. I'm > just wanting you to help people like me to trust the code by understanding > and not just having faith in others. Agreed. Trusting me is grossly insufficient. For one thing, the Linux kernel has a reasonable chance of outliving me. > If you ever decide to give up jogging, we need to make sure that there are > people here that can still fill those running shoes (-: Well, I certainly don't jog as fast or as far as I used to! ;-) Thanx, Paul
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 11:15:42PM -0400, Steven Rostedt wrote: > On Fri, 21 Sep 2007, Paul E. McKenney wrote: > > On Fri, Sep 21, 2007 at 09:15:03PM -0400, Steven Rostedt wrote: > > > On Fri, 21 Sep 2007, Paul E. McKenney wrote: > > > > On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote: > > > > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote: [ . . . ] > > > Are we sure that adding all these grace-period stages is better than just > > > biting the bullet and putting in a memory barrier? > > > > Good question. I believe so, because the extra stages don't require > > much additional processing, and because the ratio of rcu_read_lock() > > calls to the number of grace periods is extremely high. But, if I > > can prove it is safe, I will certainly decrease GP_STAGES or otherwise > > optimize the state machine. > > But until others besides yourself understand that state machine (doesn't > really need to be me) I would be worried about applying it without > barriers. The barriers may add a bit of overhead, but they add some > confidence in the code. I'm arguing that we have barriers in there until > there's a fine understanding of why we fail with 3 stages and not 4. > Perhaps you don't have a box with enough cpus to fail at 4. > > I don't know how the higher-ups in the kernel command line feel, but I > think that memory barriers on critical sections are justified. But if you > can show a proof that adding extra stages is sufficient to deal with > CPUS moving memory writes around, then so be it. But I'm still not > convinced that these extra stages are really solving the bug instead of > just making it much less likely to happen. > > Ingo praised this code since it had several years of testing in the RT > tree. But that version has barriers, so this new version without the > barriers has not had that "run it through the grinder" feeling to it. Fair point...
Though the -rt variant has its shortcomings as well, such as being unusable from NMI/SMI handlers. How about this: I continue the GP_STAGES=3 run on the pair of POWER machines (which are both going strong), and I also get a document together describing the new version (and of course apply the changes we have discussed, and merge with recent CPU-hotplug changes -- Gautham Shenoy is currently working on this), work out a good answer to "how big exactly does GP_STAGES need to be", test whatever that number is (assuming it is neither 3 nor 4), and figure out why the gekko-lp1 machine choked on GP_STAGES=3. Then we can work out the best path forward from wherever that ends up being. [ . . . ] Thanx, Paul
Re: possible corrections in the docs (Re: [PATCH] [7/50] x86: expand /proc/interrupts to include missing vectors, v2)
Looks good to me. Joe Acked-by: Joe Korty <[EMAIL PROTECTED]>
Call to Action: Don't Touch My Brother, Abolish the Death Laws
The killing of Nigerian immigrant Festus Okey has shown that none of us should remain indifferent to this problem. To demand the abolition of the death laws and to stand in solidarity with our immigrant brothers and sisters, we are meeting on Sunday, September 23 at 14:00 at the Taksim Tram Stop. We invite everyone to the press statement. http://www.anarsistkomunizm.org/karakizilforum/viewtopic.php?t=736
Re: [git] CFS-devel, group scheduler, fixes
Mike,

Could you try this patch to see if it solves the latency problem?

tong

Changes:

1. Modified vruntime adjustment logic in set_task_cpu(). See comments in code. This fixed the negative vruntime problem.

2. This code in update_curr() seems to be wrong:

	if (unlikely(!curr))
		return sync_vruntime(cfs_rq);

cfs_rq->curr can be NULL even if cfs_rq->nr_running is non-zero (e.g., when an RT task is running). We only want to call sync_vruntime when cfs_rq->nr_running is 0. This fixed the large latency problem (at least in my tests).

Signed-off-by: Tong Li <[EMAIL PROTECTED]>
---
diff -uprN linux-2.6-sched-devel-orig/kernel/sched.c linux-2.6-sched-devel/kernel/sched.c
--- linux-2.6-sched-devel-orig/kernel/sched.c	2007-09-20 12:15:41.0 -0700
+++ linux-2.6-sched-devel/kernel/sched.c	2007-09-21 19:40:08.0 -0700
@@ -1033,9 +1033,20 @@ void set_task_cpu(struct task_struct *p,
 	if (p->se.block_start)
 		p->se.block_start -= clock_offset;
 #endif
-	if (likely(new_rq->cfs.min_vruntime))
-		p->se.vruntime -= old_rq->cfs.min_vruntime -
-						new_rq->cfs.min_vruntime;
+	/*
+	 * Reset p's vruntime if it moves to new_cpu whose min_vruntime is
+	 * 100,000,000 (equivalent to 100ms for nice-0 tasks) larger or
+	 * smaller than p's vruntime. This improves interactivity when
+	 * pinned and unpinned tasks co-exist. For example, pinning a few
+	 * tasks to a CPU can cause its min_vruntime much smaller than the
+	 * other CPUs. If a task moves to this CPU, its vruntime can be so
+	 * large it won't be scheduled until the locally pinned tasks'
+	 * vruntimes catch up, causing large delays.
+	 */
+	if (unlikely(old_cpu != new_cpu && p->se.vruntime &&
+		(p->se.vruntime > new_rq->cfs.min_vruntime + 100000000 ||
+		 p->se.vruntime + 100000000 < new_rq->cfs.min_vruntime)))
+		p->se.vruntime = new_rq->cfs.min_vruntime;
 
 	__set_task_cpu(p, new_cpu);
 }
@@ -1599,6 +1610,7 @@ static void __sched_fork(struct task_str
 	p->se.exec_start		= 0;
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
+	p->se.vruntime			= 0;
 
 #ifdef CONFIG_SCHEDSTATS
 	p->se.wait_start		= 0;
diff -uprN linux-2.6-sched-devel-orig/kernel/sched_fair.c linux-2.6-sched-devel/kernel/sched_fair.c
--- linux-2.6-sched-devel-orig/kernel/sched_fair.c	2007-09-20 12:15:41.0 -0700
+++ linux-2.6-sched-devel/kernel/sched_fair.c	2007-09-21 17:23:09.0 -0700
@@ -306,8 +306,10 @@ static void update_curr(struct cfs_rq *c
 	u64 now = rq_of(cfs_rq)->clock;
 	unsigned long delta_exec;
 
-	if (unlikely(!curr))
+	if (unlikely(!cfs_rq->nr_running))
 		return sync_vruntime(cfs_rq);
+	if (unlikely(!curr))
+		return;
 
 	/*
 	 * Get the amount of time the current task was running
possible corrections in the docs (Re: [PATCH] [7/50] x86: expand /proc/interrupts to include missing vectors, v2)
* Sat, 22 Sep 2007 00:32:05 +0200 (CEST)
[]
> Index: linux/Documentation/filesystems/proc.txt
> ===
> --- linux.orig/Documentation/filesystems/proc.txt
> +++ linux/Documentation/filesystems/proc.txt
> @@ -347,7 +347,40 @@ connects the CPUs in a SMP system. This
>  the IO-APIC automatically retry the transmission, so it should not be a big
>  problem, but you should read the SMP-FAQ.
>
> -In this context it could be interesting to note the new irq directory in 2.4.
> +In 2.6.2* /proc/interrupts was expanded again. This time the goal was for
> +/proc/interrupts to display every IRQ vector in use by the system, not
> +just those considered 'most important'. The new vectors are:
> +
> +  THR -- a threshold interrupt occurs when ECC memory correction is occuring
> +  at too high a frequency. Threshold interrupt machinery is often put
> +  into the ECC logic, as occasional ECC memory corrections are part of
> +  normal operation (due to random alpha particles), but sequences of
> +  ECC corrections or outright failures over some short interval usually
> +  indicate a memory chip that is about to fail. Note that not every
> +  platform has ECC threshold logic, and those that do generally require
> +  it to be explicitly turned on.

+  THR -- a threshold interrupt happens when the frequency of ECC memory
+  corrections is too high. Threshold interrupt machinery is often put
+  into the ECC hardware and, where present, must be explicitly enabled.
+  Occasional ECC memory corrections are part of normal operation
+  (background ionizing radiation). Sequences of ECC corrections or
+  outright failures over some short interval usually indicate a memory
+  chip that is about to fail completely.

(that "random alpha particles" bs must be killed anyway)

> +  TRM -- a thermal event interrupt occurs when a temperature threshold
> +  has been exceeded for some CPU chip. This interrupt may also be generated
> +  when the temperature drops back to normal.
> +
> +  SPU -- a spurious interrupt is some interrupt that was raised then lowered
> +  by some IO device before it could be fully processed by the APIC. Hence
> +  the APIC sees the interrupt but does not know what device it came from.
> +  For this case the APIC will generate the interrupt with a IRQ vector
> +  of 0xff.

+  SPU -- a spurious interrupt. This is an interrupt that was raised then
+  lowered so quickly that it was not fully processed by the APIC. Hence,
+  its origin is unknown. In this case, an interrupt with IRQ vector 0xff
+  will be generated.

> +  RES, CAL, TLB -- rescheduling, call and tlb flush interrupts are
> +  sent from one CPU to another per the needs of the OS. Typically,
> +  their statistics are used by kernel developers and interested users to
> +  determine the occurance of interrupt floods of the given type.

+  RES, CAL, TLB -- rescheduling, call and TLB flush interrupts,
+  produced by normal OS operation. Typically, this information is used
+  by kernel developers and interested users to determine the occurrence
+  of interrupt floods of the given type.

> +The above IRQ vectors are displayed only when relevent. For example,

available?

> +the threshold vector does not exist on x86_64 platforms. Others are
> +suppressed when the system is a uniprocessor. As of this writing, only
> +i386 and x86_64 platforms support the new IRQ vector displays.
> +
> +Of some interest is the introduction of the /proc/irq directory to 2.4.
>  It could be used to set IRQ to CPU affinity, this means that you can "hook" an
>  IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
>  irq subdir is one subdir for each IRQ, and one file; prof_cpu_mask
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
[took off Ingo, because he has my ISP blacklisted, and I'm tired of getting those return mail messages. He can read LKML or you can re-CC him. Sad since this is a topic he should be reading. ]

--
On Fri, 21 Sep 2007, Paul E. McKenney wrote:

> On Fri, Sep 21, 2007 at 09:15:03PM -0400, Steven Rostedt wrote:
> > On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > > On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote:
> > > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:
> >
> > [ . . . ]
> >
> > > > > +	/*
> > > > > +	 * Take the next transition(s) through the RCU grace-period
> > > > > +	 * flip-counter state machine.
> > > > > +	 */
> > > > > +
> > > > > +	switch (rcu_try_flip_state) {
> > > > > +	case rcu_try_flip_idle_state:
> > > > > +		if (rcu_try_flip_idle())
> > > > > +			rcu_try_flip_state = rcu_try_flip_waitack_state;
> > > >
> > > > Just trying to understand all this. Here at flip_idle, only a CPU with
> > > > no pending RCU calls will flip it. Then all the cpus flags will be set
> > > > to rcu_flipped, and the ctrl.completed counter is incremented.
> > >
> > > s/no pending RCU calls/at least one pending RCU call/, but otherwise
> > > spot on.
> > >
> > > So if the RCU grace-period machinery is idle, the first CPU to take
> > > a scheduling-clock interrupt after having posted an RCU callback will
> > > get things going.
> >
> > I said 'no' because of this:
> >
> > +rcu_try_flip_idle(void)
> > +{
> > +	int cpu;
> > +
> > +	RCU_TRACE_ME(rcupreempt_trace_try_flip_i1);
> > +	if (!rcu_pending(smp_processor_id())) {
> > +		RCU_TRACE_ME(rcupreempt_trace_try_flip_ie1);
> > +		return 0;
> > +	}
> >
> > But now I'm a bit more confused. :-/
> >
> > Looking at the caller in kernel/timer.c I see
> >
> > 	if (rcu_pending(cpu))
> > 		rcu_check_callbacks(cpu, user_tick);
> >
> > And rcu_check_callbacks is the caller of rcu_try_flip. The confusion is
> > that we call this when we have a pending rcu, but if we have a pending
> > rcu, we won't flip the counter ??
>
> We don't enter unless there is something for RCU to do (might be a
> pending callback, for example, but might also be needing to acknowledge
> a counter flip). If, by the time we get to rcu_try_flip_idle(), there
> is no longer anything to do (!rcu_pending()), we bail.
>
> So a given CPU kicks the state machine out of idle only if it -still-
> has something to do once it gets to rcu_try_flip_idle(), right?
>

Now I can slap my forehead! Duh, I wasn't seeing that ! in front of the rcu_pending condition in rcu_try_flip_idle. We only flip if we do indeed have something pending.

I need some sleep. I also need to re-evaluate some of my analysis of that code. But it doesn't change my opinion of the stages.

> >
> > Are we sure that adding all these grace periods stages is better than just
> > biting the bullet and put in a memory barrier?
>
> Good question. I believe so, because the extra stages don't require
> much additional processing, and because the ratio of rcu_read_lock()
> calls to the number of grace periods is extremely high. But, if I
> can prove it is safe, I will certainly decrease GP_STAGES or otherwise
> optimize the state machine.

But until others besides yourself understand that state machine (doesn't really need to be me), I would be worried about applying it without barriers. The barriers may add a bit of overhead, but they add some confidence in the code. I'm arguing that we keep barriers in there until there's a fine understanding of why we fail with 3 stages and not 4. Perhaps you don't have a box with enough CPUs to fail at 4.

I don't know how the higher ups in the kernel command line feel, but I think that memory barriers on critical sections are justified. But if you can show a proof that adding extra stages is sufficient to deal with CPUs moving memory writes around, then so be it. But I'm still not convinced that these extra stages are really solving the bug instead of just making it much less likely to happen.

Ingo praised this code since it had several years of testing in the RT tree. But that version has barriers, so this new version without the barriers has not had that "run it through the grinder" feeling to it.

> > [ . . . ]
>
> > > > OK, that's all I have on this patch (will take a bit of a break before
> > > > reviewing your other patches). But I will say that RCU has grown quite
> > > > a bit, and is looking very good.
> > >
> > > Glad you like it, and thank you again for the careful and thorough review.
> >
> > I'm scared to do the preempt portion %^O
>
> Ummm... This -was- the preempt portion. ;-)

hehe, I do need sleep. I meant the boosting portion.

> >
> > > > Basically, what I'm saying is "Great work, Paul!". This is looking
> > > > good. Seems that we just need a little bit better explanation for those
> > > > that are not up at the IQ level of you. I can write som
Re: [PATCH] [11/45] x86_64: Remove rogue default m in drivers/video/Kconfig
Acked-by: Len Brown <[EMAIL PROTECTED]>

Sorry, I thought we fixed this earlier.

thanks,
-Len

On Friday 21 September 2007 16:44, Andi Kleen wrote:
>
> default m is near always wrong, like here. For some reason ACPI
> likes to reintroduce these and I like to immediately squash them again
> before they pollute too many .configs.
>
> Cc: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
>
> ---
>  drivers/video/Kconfig |    1 -
>  1 file changed, 1 deletion(-)
>
> Index: linux/drivers/video/Kconfig
> ===
> --- linux.orig/drivers/video/Kconfig
> +++ linux/drivers/video/Kconfig
> @@ -14,7 +14,6 @@ config VGASTATE
>
>  config VIDEO_OUTPUT_CONTROL
>  	tristate "Lowlevel video output switch controls"
> -	default m
>  	help
>  	  This framework adds support for low-level control of the video
>  	  output switch.
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
[ sneaks away from the family for a bit to answer emails ]

--
On Fri, 21 Sep 2007, Paul E. McKenney wrote:

> On Fri, Sep 21, 2007 at 09:19:22PM -0400, Steven Rostedt wrote:
> >
> > --
> > On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > >
> > > > In any case, I will be looking at the scenarios more carefully. If
> > > > it turns out that GP_STAGES can indeed be cranked down a bit, well,
> > > > that is an easy change! I just fired off a POWER run with GP_STAGES
> > > > set to 3, will let you know how it goes.
> > >
> > > The first attempt blew up during boot badly enough that ABAT was unable
> > > to recover the machine (sorry, grahal!!!). Just for grins, I am trying
> > > it again on a machine that ABAT has had a better record of reviving...
> >
> > This still frightens the hell out of me. Going through 15 states and
> > failing. Seems the CPU is holding off writes for a long long time. That
> > means we flipped the counter 4 times, and that still wasn't good enough?
>
> Might be that the other machine has its 2.6.22 version of .config messed
> up. I will try booting it on a stock 2.6.22 kernel when it comes back
> to life -- not sure I ever did that before. Besides, the other similar
> machine seems to have gone down for the count, but without me torturing
> it...
>
> Also, keep in mind that various stages can "record" a memory misordering,
> for example, by incrementing the wrong counter.
>
> > Maybe I'll boot up my powerbook to see if it has the same issues.
> >
> > Well, I'm still finishing up on moving into my new house, so I won't be
> > available this weekend.
>
> The other machine not only booted, but has survived several minutes of
> rcutorture thus far. I am also trying a POWER5 machine as well, as the
> one currently running is a POWER4, which is a bit less aggressive about
> memory reordering than is the POWER5.
>
> Even if they pass, I refuse to reduce GP_STAGES until proven safe.
> Trust me, you -don't- want to be unwittingly making use of a subtly
> busted RCU implementation!!!

I totally agree. This is the same reason I want to understand -why- it fails with 3 stages: to make sure that adding a 4th stage really does fix it, and doesn't just make the chances for the bug smaller.

I just have that funny feeling that we are band-aiding this for POWER with extra stages and not really solving the bug.

I could be totally out in left field on this. But the more people have a good understanding of what is happening (this includes why things fail), the more people in general can trust this code. Right now I'm thinking you may be the only one that understands this code enough to trust it. I'm just wanting you to help people like me to trust the code by understanding, and not just by having faith in others.

If you ever decide to give up jogging, we need to make sure that there are people here that can still fill those running shoes (-:

-- Steve
Re: 2.6.23-rc6-mm1: Build failure on ppc64 drivers/ata/pata_scc.c
Hi,

On Thu, 20 Sep 2007, Alan Cox wrote:
>
> On Thu, 20 Sep 2007 14:13:15 +0100
> [EMAIL PROTECTED] (Mel Gorman) wrote:
>
> > PPC64 building allmodconfig fails to compile drivers/ata/pata_scc.c . It
> > doesn't show up on other arches because this driver is specific to the
> > architecture.
> >
> > drivers/ata/pata_scc.c: In function `scc_bmdma_status'
>
> It's not been updated to match the libata core changes. Try something like
> this. Whoever is maintaining it should also remove the prereset cable handling
> code and use the proper cable detect method.

It appears you forgot to fix scc_std_softreset() and one of its callsites in scc_bmdma_stop(). A complete patch is attempted below -- please review.


[PATCH -mm] pata_scc: Keep up with libata core API changes

Little fixlets that the build started erroring / warning about:

drivers/ata/pata_scc.c: In function 'scc_bmdma_status':
drivers/ata/pata_scc.c:734: error: structure has no member named 'active_tag'
drivers/ata/pata_scc.c: In function 'scc_pata_prereset':
drivers/ata/pata_scc.c:866: warning: passing arg 1 of 'ata_std_prereset' from incompatible pointer type
drivers/ata/pata_scc.c: In function 'scc_error_handler':
drivers/ata/pata_scc.c:908: warning: passing arg 2 of 'ata_bmdma_drive_eh' from incompatible pointer type
drivers/ata/pata_scc.c:908: warning: passing arg 3 of 'ata_bmdma_drive_eh' from incompatible pointer type
drivers/ata/pata_scc.c:908: warning: passing arg 5 of 'ata_bmdma_drive_eh' from incompatible pointer type
make[2]: *** [drivers/ata/pata_scc.o] Error 1

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
Cc: Mel Gorman <[EMAIL PROTECTED]>

---

Andrew, this includes (supersedes) the previous two ones from Mel / Alan.

 drivers/ata/pata_scc.c |   21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff -ruNp a/drivers/ata/pata_scc.c b/drivers/ata/pata_scc.c
--- a/drivers/ata/pata_scc.c	2007-09-22 06:26:37.0 +0530
+++ b/drivers/ata/pata_scc.c	2007-09-22 08:04:42.0 +0530
@@ -594,16 +594,17 @@ static unsigned int scc_bus_softreset(st
  *	Note: Original code is ata_std_softreset().
  */
 
-static int scc_std_softreset (struct ata_port *ap, unsigned int *classes,
-                              unsigned long deadline)
+static int scc_std_softreset(struct ata_link *link, unsigned int *classes,
+			     unsigned long deadline)
 {
+	struct ata_port *ap = link->ap;
 	unsigned int slave_possible = ap->flags & ATA_FLAG_SLAVE_POSS;
 	unsigned int devmask = 0, err_mask;
 	u8 err;
 
 	DPRINTK("ENTER\n");
 
-	if (ata_link_offline(&ap->link)) {
+	if (ata_link_offline(link)) {
 		classes[0] = ATA_DEV_NONE;
 		goto out;
 	}
@@ -692,7 +693,7 @@ static void scc_bmdma_stop (struct ata_q
 			printk(KERN_WARNING "%s: Internal Bus Error\n", DRV_NAME);
 			out_be32(bmid_base + SCC_DMA_INTST, INTSTS_BMSINT);
 			/* TBD: SW reset */
-			scc_std_softreset(ap, &classes, deadline);
+			scc_std_softreset(&ap->link, &classes, deadline);
 			continue;
 		}
 
@@ -731,7 +732,7 @@ static u8 scc_bmdma_status (struct ata_p
 	void __iomem *mmio = ap->ioaddr.bmdma_addr;
 	u8 host_stat = in_be32(mmio + SCC_DMA_STATUS);
 	u32 int_status = in_be32(mmio + SCC_DMA_INTST);
-	struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->active_tag);
+	struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->link.active_tag);
 	static int retry = 0;
 
 	/* return if IOS_SS is cleared */
@@ -860,10 +861,10 @@ static void scc_bmdma_freeze (struct ata
  *	@deadline: deadline jiffies for the operation
  */
 
-static int scc_pata_prereset(struct ata_port *ap, unsigned long deadline)
+static int scc_pata_prereset(struct ata_link *link, unsigned long deadline)
 {
-	ap->cbl = ATA_CBL_PATA80;
-	return ata_std_prereset(ap, deadline);
+	link->ap->cbl = ATA_CBL_PATA80;
+	return ata_std_prereset(link, deadline);
 }
 
 /**
@@ -874,8 +875,10 @@ static int scc_pata_prereset(struct ata_
  *	Note: Original code is ata_std_postreset().
  */
 
-static void scc_std_postreset (struct ata_port *ap, unsigned int *classes)
+static void scc_std_postreset(struct ata_link *link, unsigned int *classes)
 {
+	struct ata_port *ap = link->ap;
+
 	DPRINTK("ENTER\n");
 
 	/* is double-select really necessary? */
Killing printk calls for size (Re: [PATCH] [6/50] i386: clean up oops/bug reports)
* Sat, 22 Sep 2007 00:32:04 +0200 (CEST)
[]
>  arch/i386/kernel/traps.c |   16
>  arch/i386/mm/fault.c     |   13 +++--
>  2 files changed, 11 insertions(+), 18 deletions(-)

It seems like size can be reduced even more now:

[]
>  	report_bug(regs->eip, regs);
>
> -	printk(KERN_EMERG "%s: %04lx [#%d]\n", str, err & 0xffff, ++die_counter);
> +	printk(KERN_EMERG "%s: %04lx [#%d] ", str, err & 0xffff, ++die_counter);
+	printk(KERN_EMERG "%s: %04lx [#%d] %s", str, err & 0xffff, ++die_counter,
> #ifdef CONFIG_PREEMPT
> -	printk(KERN_EMERG "PREEMPT ");
> -	nl = 1;
> +	printk("PREEMPT ");
+		"PREEMPT "\
> #endif
> #ifdef CONFIG_SMP
> -	if (!nl)
> -		printk(KERN_EMERG);
>  	printk("SMP ");
+		"SMP "\
> -	nl = 1;
> #endif
> #ifdef CONFIG_DEBUG_PAGEALLOC
> -	if (!nl)
> -		printk(KERN_EMERG);
>  	printk("DEBUG_PAGEALLOC");
+		"DEBUG_PAGEALLOC"\
> -	nl = 1;
> #endif
> -	if (nl)
> -		printk("\n");
> +	printk("\n");
+		"\n");

Just hand waving. FWIW, with more flexible kconfig, ifdiffery can be removed also...
Re: linux 2.6.23-rc7 - 14 compile warnings
On Fri, Sep 21, 2007 at 10:33:56AM +0100, Dave Haywood wrote:
>Contents:
>	ver_linux output
>	Summary of compile warnings
>	Full compile log
>	.config
>
>
>
>Linux s1 2.6.23-rc7-g335fb8fc #9 SMP Fri Sep 21 09:31:01 BST 2007 i686 Pentium
>III (Coppermine) GenuineIntel GNU/Linux
>
>Gnu C                  4.2.0
>Gnu make               3.81
>binutils               2.17
>util-linux             2.12r
>mount                  2.12r
>module-init-tools      3.2.2
>e2fsprogs              1.40.2
>PPP                    2.4.4
>Linux C Library        2.5
>Dynamic linker (ldd)   2.5
>Procps                 3.2.7
>Net-tools              1.60
>Kbd                    1.12
>Sh-utils               6.9
>udev                   114
>
>
>
>Summary:
>  CC      mm/slub.o
>mm/slub.c: In function 'kfree':
>mm/slub.c:2491: warning: passing argument 3 of 'slab_free' discards
>qualifiers from pointer target type
>
>  CC      fs/autofs4/symlink.o
>fs/autofs4/symlink.c: In function 'autofs4_follow_link':
>fs/autofs4/symlink.c:18: warning: passing argument 2 of 'nd_set_link'
>discards qualifiers from pointer target type

These two warnings are suspicious. Explicit casts are already there, so how do they come out? Or are they gcc bugs?

{snip}
[-rc7 Patch] fs/isofs/namei.c: mark variables as uninitialized_var
Fix may-be-used-uninitialized warnings.

Signed-off-by: WANG Cong <[EMAIL PROTECTED]>

---
 fs/isofs/namei.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc7/fs/isofs/namei.c
===
--- linux-2.6.23-rc7.orig/fs/isofs/namei.c
+++ linux-2.6.23-rc7/fs/isofs/namei.c
@@ -158,7 +158,7 @@ isofs_find_entry(struct inode *dir, stru
 struct dentry *isofs_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd)
 {
 	int found;
-	unsigned long block, offset;
+	unsigned long uninitialized_var(block), uninitialized_var(offset);
 	struct inode *inode;
 	struct page *page;
[GIT PATCH] ACPI patches for 2.6.23-rc7
Hi Linus,

Before 2.6.23, please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release

This restores some dmesg to what folks had in 2.6.22, and prevents a possible system hang on video switching.

This will update the files shown below.

thanks!

-Len

ps. individual patches are available on [EMAIL PROTECTED]
and a consolidated plain patch is available here:
ftp://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/2.6.23/acpi-release-20070126-2.6.23-rc7.diff.gz

 drivers/acpi/sleep/Makefile   |    2
 drivers/acpi/sleep/main.c     |   57 --
 drivers/acpi/sleep/poweroff.c |   75 --
 drivers/acpi/video.c          |    3 -
 4 files changed, 53 insertions(+), 84 deletions(-)

through these commits:

Alexey Starikovskiy (1):
      ACPI: suspend: consolidate handling of Sx states.

Frans Pop (1):
      ACPI: suspend: consolidate handling of Sx states addendum

Maik Broemme (1):
      ACPI: video: remove dmesg spam

Zhang Rui (1):
      ACPI: video: _DOS=0 by default to prevent hotkey hang

with this log:

commit e5c86b5d4a517d10db89456426590ecba1597f1f
Merge: 19adc6b... 5a50fe7...
Author: Len Brown <[EMAIL PROTECTED]>
Date:   Fri Sep 21 21:55:34 2007 -0400

    Pull suspend.now into release branch

commit 19adc6ba6c6a23e07617fe791db40c1b0668d123
Merge: 335fb8f... 7f10cc4...
Author: Len Brown <[EMAIL PROTECTED]>
Date:   Fri Sep 21 21:55:29 2007 -0400

    Pull now into release branch

commit 5a50fe709d527f31169263e36601dd83446d5744
Author: Frans Pop <[EMAIL PROTECTED]>
Date:   Thu Sep 20 22:27:44 2007 +0200

    ACPI: suspend: consolidate handling of Sx states addendum

    Make the S0 state be always reported as supported

    Signed-off: Frans Pop <[EMAIL PROTECTED]>
    Acked-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
    Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
    Signed-off-by: Len Brown <[EMAIL PROTECTED]>

commit f216cc3748a3a22c2b99390fddcdafa0583791a2
Author: Alexey Starikovskiy <[EMAIL PROTECTED]>
Date:   Thu Sep 20 21:32:35 2007 +0400

    ACPI: suspend: consolidate handling of Sx states.

    Recent changes to sleep initialization in ACPI dropped reporting
    of supported Sx states above S3. Fix that and also move S5 init
    into the same file as the other Sx states.
    The only functional change is adding printk() for S4 and S5 cases.

    Signed-off-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
    Acked-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
    Signed-off-by: Len Brown <[EMAIL PROTECTED]>

commit 7f10cc4e838c2b2d7272031954c56c407569d497
Author: Maik Broemme <[EMAIL PROTECTED]>
Date:   Fri Sep 14 22:12:34 2007 +0200

    ACPI: video: remove dmesg spam

    I am actually heavily using the ACPI video extension for my
    Thinkpad X61 Tablet. I have bound the input events triggered by
    the brightness up/down keys to a simple

    echo > /sys/class/backlight/acpi_video1/brightness

    but every time the event is triggered and
    acpi_video_device_lcd_set_level() is called, I got a notification
    in my kernel log like:

    set_level status: 0
    set_level status: 0
    set_level status: 0
    set_level status: 0
    ...

    Signed-off-by: Maik Broemme <[EMAIL PROTECTED]>
    Signed-off-by: Len Brown <[EMAIL PROTECTED]>

commit a21101c46ca5b4320e31408853cdcbf7cb1ce4ed
Author: Zhang Rui <[EMAIL PROTECTED]>
Date:   Fri Sep 14 11:46:22 2007 +0800

    ACPI: video: _DOS=0 by default to prevent hotkey hang

    In the past, the Linux/ACPI video driver invoked _DOS
    (Display Output Switch) with the parameter 1
    to tell the BIOS to switch the video output display for us.

    But this conflicts with Linux native graphics drivers,
    and can cause all sorts of issues, including hanging the system.
    http://bugzilla.kernel.org/show_bug.cgi?id=6001

    Here we change the Linux default to evaluate _DOS=0,
    which tells the BIOS to simply send us a hotkey event
    and not touch the graphics hardware.

    The acpi video driver sends the display switch hotkey event
    up through the input layer, and X can interpret that
    and use its native graphics driver to switch the display.

    For the case where Linux has no native graphics driver running,
    or the graphics driver doesn't know how to switch video
    and the BIOS (safely) does, the previous behaviour can be restored with:

    # echo 1 > /proc/acpi/video/*/DOS

    Signed-off-by: Zhang Rui <[EMAIL PROTECTED]>
    Signed-off-by: Len Brown <[EMAIL PROTECTED]>
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Thu, Sep 20, 2007 at 12:31:39PM +0100, Hugh Dickins wrote:
> On Wed, 19 Sep 2007, Peter Zijlstra wrote:
> > On Wed, 19 Sep 2007 21:03:19 +0100 (BST) Hugh Dickins
> > <[EMAIL PROTECTED]> wrote:
> >
> > > On Wed, 19 Sep 2007, Andy Whitcroft wrote:
> > > > Seems I have a case of a largish i386 NUMA (NUMA-Q) which has a mkfs
> > > > stuck in a 'D' wait:
> > > >
> > > >  ===
> > > >  mkfs.ext2     D c10220f4     0  6233   6222
> > > >   [] io_schedule_timeout+0x1e/0x28
> > > >   [] congestion_wait+0x62/0x7a
> > > >   [] get_dirty_limits+0x16a/0x172
> > > >   [] balance_dirty_pages+0x154/0x1be
> > > >   [] generic_perform_write+0x168/0x18a
> > > >   [] generic_file_buffered_write+0x73/0x107
> > > >   [] __generic_file_aio_write_nolock+0x47a/0x4a5
> > > >   [] generic_file_aio_write_nolock+0x48/0x9b
> > > >   [] do_sync_write+0xbf/0xfc
> > > >   [] vfs_write+0x8d/0x108
> > > >   [] sys_write+0x41/0x67
> > > >   [] syscall_call+0x7/0xb
> > > >  ===
> > >
> > > [edited out some bogus lines from stale stack]
> > >
> > > > This machine and others have run numerous test runs on this kernel and
> > > > this is the first time I've seen a hang like this.
> > >
> > > I've been seeing something like that on 4-way PPC64: in my case I've
> > > shells hanging in D state trying to append to kernel build log on ext3
> > > (the builds themselves going on elsewhere, in tmpfs): one of the shells
> > > holding i_mutex and stuck doing congestion_waits from balance_dirty_pages.
> > >
> > > > I wonder if this is the ultimate cause of the couple of mainline hangs
> > > > which were seen, but not diagnosed.
> > >
> > > My *guess* is that this is peculiar to 2.6.23-rc6-mm1, and from Peter's
> > > mm-per-device-dirty-threshold.patch. printks showed bdi_nr_reclaimable
> > > 0, bdi_nr_writeback 24, bdi_thresh 1 in balance_dirty_pages (though I've
> > > not done enough to check if those really correlate with the hangs),
> > > and I'm wondering if the bdi_stat_sum business is needed on the
> > > !nr_reclaimable path.
> >
> > FWIW my tired brain seems to think it the !nr_reclaimable path needs it
> > just the same. So this change seems to make sense for now :-)
>
> Thanks.
>
> > > So I'm running now with the patch below, good so far, but can't judge
> > > until tomorrow whether it has actually addressed the problem seen.
>
> Last night's run went well: that patch does indeed seem to have fixed it.
> Looking at the timings (some variance but _very_ much less than the night
> before), there does appear to be some other occasional slight slowdown -
> but I've no reason to suspect your patch for it, nor to suppose it's
> something new: it may just be an artifact of my heavy swap thrashing.
>
>
> [PATCH mm] mm per-device dirty threshold fix
>
> Fix occasional hang when a task couldn't get out of balance_dirty_pages:
> mm-per-device-dirty-threshold.patch needs to reevaluate bdi_nr_writeback
> across all cpus when bdi_thresh is low, even in the case when there was
> no bdi_nr_reclaimable.
>
> Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>

Thank you, Hugh. I ran into similar problems with many dd (large file) operations. This patch seems to fix it.

But now my desktop locks up again when writing a lot of small files.
The problem is repeatable with the command

	$ ketchup 2.6.23-rc6-mm1

I wrote up two debug patches:

---
 mm/page-writeback.c |    9 +
 1 file changed, 9 insertions(+)

--- linux-2.6.22.orig/mm/page-writeback.c
+++ linux-2.6.22/mm/page-writeback.c
@@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a
 			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
 		}
 
+		printk(KERN_DEBUG "balance_dirty_pages written %lu %lu congested %d limits %lu %lu %lu %lu %lu %ld\n",
+			pages_written,
+			write_chunk - wbc.nr_to_write,
+			bdi_write_congested(bdi),
+			background_thresh, dirty_thresh,
+			bdi_thresh, bdi_nr_reclaimable, bdi_nr_writeback,
+			bdi_thresh - bdi_nr_reclaimable - bdi_nr_writeback);
+
 		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
 			break;
 		if (pages_written >= write_chunk)

---
 mm/page-writeback.c |    5 +
 1 file changed, 5 insertions(+)

--- linux-2.6.22.orig/mm/page-writeback.c
+++ linux-2.6.22/mm/page-writeback.c
@@ -373,6 +373,7 @@ static void balance_dirty_pages(struct a
 	long bdi_thresh;
 	unsigned long pages_written = 0;
 	unsigned long write_chunk = sync_writeback_pages();
+	int i = 0;
 
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 
@@ -434,6 +435,10 @@ static void balance_dirty_pages(struct a
 			bdi_thresh, bdi_nr_reclaimable, bdi_nr_writeback,
 			bdi_thresh - bdi
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 09:15:03PM -0400, Steven Rostedt wrote:
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote:
> > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > > +	/*
> > > > +	 * Take the next transition(s) through the RCU grace-period
> > > > +	 * flip-counter state machine.
> > > > +	 */
> > > > +
> > > > +	switch (rcu_try_flip_state) {
> > > > +	case rcu_try_flip_idle_state:
> > > > +		if (rcu_try_flip_idle())
> > > > +			rcu_try_flip_state = rcu_try_flip_waitack_state;
> > >
> > > Just trying to understand all this. Here at flip_idle, only a CPU with
> > > no pending RCU calls will flip it. Then all the cpus flags will be set
> > > to rcu_flipped, and the ctrl.completed counter is incremented.
> >
> > s/no pending RCU calls/at least one pending RCU call/, but otherwise
> > spot on.
> >
> > So if the RCU grace-period machinery is idle, the first CPU to take
> > a scheduling-clock interrupt after having posted an RCU callback will
> > get things going.
>
> I said 'no' because of this:
>
> +rcu_try_flip_idle(void)
> +{
> +	int cpu;
> +
> +	RCU_TRACE_ME(rcupreempt_trace_try_flip_i1);
> +	if (!rcu_pending(smp_processor_id())) {
> +		RCU_TRACE_ME(rcupreempt_trace_try_flip_ie1);
> +		return 0;
> +	}
>
> But now I'm a bit more confused. :-/
>
> Looking at the caller in kernel/timer.c I see
>
> 	if (rcu_pending(cpu))
> 		rcu_check_callbacks(cpu, user_tick);
>
> And rcu_check_callbacks is the caller of rcu_try_flip. The confusion is
> that we call this when we have a pending rcu, but if we have a pending
> rcu, we won't flip the counter ??

We don't enter unless there is something for RCU to do (might be a pending callback, for example, but might also be needing to acknowledge a counter flip). If, by the time we get to rcu_try_flip_idle(), there is no longer anything to do (!rcu_pending()), we bail.
So a given CPU kicks the state machine out of idle only if it -still- has something to do once it gets to rcu_try_flip_idle(), right? [ . . . ] > > > Is there a chance that overflow of a counter (although probably very > > > very unlikely) would cause any problems? > > > > The only way it could cause a problem would be if there was ever > > more than 4,294,967,296 outstanding rcu_read_lock() calls. I believe > > that lockdep screams if it sees more than 30 nested locks within a > > single task, so for systems that support no more than 100M tasks, we > > should be OK. It might sometime be necessary to make this be a long > > rather than an int. Should we just do that now and be done with it? > > Sure, why not. More and more and more overkill!!! > > (rostedt hears in his head the Monty Python "Spam" song). ;-) OK! > > > Also, all the CPUs have their "check_mb" set. > > > > > > > + rcu_try_flip_state = rcu_try_flip_waitmb_state; > > > > + break; > > > > + case rcu_try_flip_waitmb_state: > > > > + if (rcu_try_flip_waitmb()) > > > > > > I have to admit that this seems a bit of an overkill, but I guess you > > > know what you are doing. After going through three states, we still > > > need to do a memory barrier on each CPU? > > > > Yep. Because there are no memory barriers in rcu_read_unlock(), the > > CPU is free to reorder the contents of the RCU read-side critical section > > to follow the counter decrement. This means that this CPU would still > > be referencing RCU-protected data after it had told the world that it > > was no longer doing so. Forcing a memory barrier on each CPU guarantees > > that if we see the memory-barrier acknowledge, we also see any prior > > RCU read-side critical section. > > And this seems reasonable to me that this would be enough to satisfy a > grace period. But the CPU moving around the rcu_read_(un)lock's around.
> > Are we sure that adding all these grace periods stages is better than just > biting the bullet and put in a memory barrier? Good question. I believe so, because the extra stages don't require much additional processing, and because the ratio of rcu_read_lock() calls to the number of grace periods is extremely high. But, if I can prove it is safe, I will certainly decrease GP_STAGES or otherwise optimize the state machine. [ . . . ] > > > OK, that's all I have on this patch (will take a bit of a break before > > > reviewing your other patches). But I will say that RCU has grown quite > > > a bit, and is looking very good. > > > > Glad you like it, and thank you again for the careful and thorough review. > > I'm scared to do the preempt portion %^O Ummm... This -was- the preempt portion. ;-) > > > Basically, what I'm saying is "Great work, Paul!". This is looking > > > good. Seems that we just need a little bit better explanation for those > > >
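The flip state machine under discussion can be pictured with a small user-space sketch. The idle, waitack and waitmb state names appear in the quoted patch; the waitzero state between waitack and waitmb, and the struct flip_inputs predicates, are assumptions standing in for the real per-CPU checks. It illustrates the behavior Paul describes (leave idle only if rcu_pending() is still true on arrival), not the actual implementation, which is driven from rcu_check_callbacks().

```c
#include <stdbool.h>

/* State names from the patch; waitzero is assumed (not in the quoted excerpt). */
enum rcu_try_flip_states {
	rcu_try_flip_idle_state,     /* no grace-period work in progress */
	rcu_try_flip_waitack_state,  /* counters flipped; wait for every CPU to ack */
	rcu_try_flip_waitzero_state, /* wait for the old counters to drain to zero */
	rcu_try_flip_waitmb_state,   /* wait for every CPU to execute a memory barrier */
};

/* Hypothetical inputs standing in for the real per-CPU checks. */
struct flip_inputs {
	bool pending;           /* rcu_pending() still true on arrival */
	bool all_acked;         /* every CPU acknowledged the flip */
	bool old_counters_zero; /* no readers left on the old counter set */
	bool all_mb_done;       /* every CPU executed its memory barrier */
};

/* One call per invocation: advance at most one state. */
static enum rcu_try_flip_states
rcu_try_flip_step(enum rcu_try_flip_states state, const struct flip_inputs *in)
{
	switch (state) {
	case rcu_try_flip_idle_state:
		/* Bail if there is no longer anything to do (the !rcu_pending() check). */
		return in->pending ? rcu_try_flip_waitack_state : state;
	case rcu_try_flip_waitack_state:
		return in->all_acked ? rcu_try_flip_waitzero_state : state;
	case rcu_try_flip_waitzero_state:
		return in->old_counters_zero ? rcu_try_flip_waitmb_state : state;
	case rcu_try_flip_waitmb_state:
		return in->all_mb_done ? rcu_try_flip_idle_state : state;
	}
	return state;
}
```

With every predicate true, four calls bring the machine back to idle: the three states Steve mentions plus the per-CPU memory-barrier wait.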
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Fri, 2007-09-21 at 10:49 -0700, David Miller wrote: > From: Denys Vlasenko <[EMAIL PROTECTED]> > Date: Fri, 21 Sep 2007 18:03:55 +0100 > > > Do patches look ok to you? > > I'm travelling so I haven't looked closely yet :-) > > Michael can take a look and I'll try to do so as well > tonight. > I've already reviewed the earlier versions of the patch and have made some suggestions. This latest one looks ok to me and tested ok. I'll follow up later with another patch to remove all the zeros in other firmware sections, and to remove the gzip headers completely. Acked-by: Michael Chan <[EMAIL PROTECTED]>
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 09:19:22PM -0400, Steven Rostedt wrote: > > -- > On Fri, 21 Sep 2007, Paul E. McKenney wrote: > > > > > > In any case, I will be looking at the scenarios more carefully. If > > > it turns out that GP_STAGES can indeed be cranked down a bit, well, > > > that is an easy change! I just fired off a POWER run with GP_STAGES > > > set to 3, will let you know how it goes. > > > > The first attempt blew up during boot badly enough that ABAT was unable > > to recover the machine (sorry, grahal!!!). Just for grins, I am trying > > it again on a machine that ABAT has had a better record of reviving... > > This still frightens the hell out of me. Going through 15 states and > failing. Seems the CPU is holding off writes for a long long time. That > means we flipped the counter 4 times, and that still wasn't good enough? Might be that the other machine has its 2.6.22 version of .config messed up. I will try booting it on a stock 2.6.22 kernel when it comes back to life -- not sure I ever did that before. Besides, the other similar machine seems to have gone down for the count, but without me torturing it... Also, keep in mind that various stages can "record" a memory misordering, for example, by incrementing the wrong counter. > Maybe I'll boot up my powerbook to see if it has the same issues. > > Well, I'm still finishing up on moving into my new house, so I won't be > available this weekend. The other machine not only booted, but has survived several minutes of rcutorture thus far. I am also trying a POWER5 machine, as the one currently running is a POWER4, which is a bit less aggressive about memory reordering than is the POWER5. Even if they pass, I refuse to reduce GP_STAGES until proven safe. Trust me, you -don't- want to be unwittingly making use of a subtly busted RCU implementation!!!
Thanx, Paul
Re: Message codes (Re: [Announce] Linux-tiny project revival)
On Fri, Sep 21, 2007 at 04:15:39PM -0500, Rob Landley wrote: [] > > >Not all, but critical info, that must exist in human-readable form of > > >course. > > > > I disagree. For a production product you want minimal information > > to reduce the communication bandwidth required between the remote > > customer and the support organization. > > > > In fact there is a good argument that you don't want the remote customer > > to know enough to start guessing. > > Don't use Linux then. Open source is a horrible fit for the way you think. > > I'm sympathetic to "shrink the binary size" arguments. I'm not really > sympathetic to "keep the customer in the dark intentionally" arguments, whether > the justification is "because they're stupid", "to increase dependency on our > support staff", or any other reason. {1} > > >Seriously. In Windows there are only messages like: > > > > > >"Error (Code:0x2012)". > > > > Now it's been ~8 years since I did any serious windows work, but if I > > recall correctly ALL THE FRICKING TIME!!! When was the last time you've > > seen a bug check on windows? This is about all you get. > > I believe he was holding it up as a bad example, and definitely not something > we want to emulate. I tried to show that keeping users in a complete information vacuum is a bad thing. Even without sources, _configuration_ is another source of malfunctions and bugs, usually addressed by reinstalling. That may be a bad example, because here we are talking about developers and testers, who are not just ordinary users. But by applying Torvalds's Law, all users are such to some degree. That's why {1} in your reply, Rob, makes perfect sense. If Mark has had bad experiences only with lusers, then I can just say: what a pity! AFAIK nobody can read somebody's plain-bin OOP output. Anyway, everything should be selectable via config options, even schedulers. But maintenance and flame wars rule otherwise :).
What I can propose is a form of binary-only "printk", where all info (diagnostic, error, bug and statistics messages, in a non-debugging environment, of course) is just fed right into the output buffer (see, pa, no kmallocs). The info itself must have structured content that makes it easy to extract and locate the human-readable representation of both message and data. This doesn't address loglevels, though. The implementation seems as easy as feeding the output to `od` to get an unambiguous form of various troublesome bytes, like "0x00" and "0x0A". The structure, i.e. who is printing (arch code, fs, driver, whatever), must be agreed upon:

* Profiles[0]: the originator's ID of a message is a byte (or word, or double word): 0x01 - arch, 0x02 - fs, 0x03 - net, 0x04 - hw drivers, etc.
* The data itself can be sent in the form of [0]

[0] Banana -- extendable protocol for sending and receiving s-expressions
http://twistedmatrix.com/projects/core/documentation/specifications/banana.html

and having a shell script with functions whose names correspond to the actual structured content:

_*_
[EMAIL PROTECTED]:/tmp$ sh banana.sh < banana.c >bb
[EMAIL PROTECTED]:/tmp$ sh -c '. ./bb ; _07080'
start
[EMAIL PROTECTED]:/tmp$ sh -c '. ./bb ; _07081'
ti_startup - product , num configurations 0, configuration value 0
[EMAIL PROTECTED]:/tmp$ sh -c '. ./bb ; _07082'
not reached
[EMAIL PROTECTED]:/tmp$
[EMAIL PROTECTED]:/tmp$ sh -c '. ./bb ; _07081 777 7 8'
ti_startup - product 0x0309, num configurations 7, configuration value 8
[EMAIL PROTECTED]:/tmp$
_*_

_(banana.c and banana.sh can be found in the ftp /upload on my server)_

From file linux/drivers/usb/serial/ti_usb*c with

[...]
	dbg("%s - product 0x%4X, num configurations %d, configuration value %d",
	    __FUNCTION__, le16_to_cpu(dev->descriptor.idProduct),
	    dev->descriptor.bNumConfigurations,
	    dev->actconfig->desc.bConfigurationValue);
[...]
let's take one particular function (transformed a little bit):

_*_
#include <stdio.h>

#define dbg printf
#define ti_startup(foo) main (int argc, char **argv)
#define dev_descriptor_idProduct 3
#define dev_descriptor_bNumConfigurations 4
#define dev_actconfig_desc_bConfigurationValue 5

/* declaration */
int ti_startup(void);

/* implementation */
int ti_startup(void)
{
	dbg("start\n");
	return dbg("%s - product %#.4x, num configurations %d, "
		   "configuration value %d\n", __FUNCTION__,
		   dev_descriptor_idProduct,
		   dev_descriptor_bNumConfigurations,
		   dev_actconfig_desc_bConfigurationValue);
	/* bla bla */
	dbg("not reached\n");
}
_*_

* Process this file with this script: *

_*_
# just as an example
USB_SERIAL_ID=07
TI_USB_ID=08
__FILE__="ti_usb_3410_5052.c" # possible
i=0
sed -n '
# finding function body
/^[[:alpha:]]/{
# found, print it for __FUNCTION__ keyword
s_[^ ]* *\([^ (]*\).*[^;]$_\1_p; t_func ; b ;
# walking inside of a function
:_func;
# load ne
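The binary-record idea can be sketched in C under a few stated assumptions: a fixed-size record holding an originator byte (as in the Profiles list above), a 16-bit message ID and up to three 32-bit arguments, with all human-readable text kept only in a user-space table. The enum values, message IDs, and the bin_emit()/bin_decode() helpers are hypothetical; records are copied with memcpy, so the layout is host-specific, and a real format would need a fixed wire encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical originator IDs, after the "Profiles" list above. */
enum msg_origin { ORIG_ARCH = 0x01, ORIG_FS = 0x02, ORIG_NET = 0x03, ORIG_DRV = 0x04 };

#define MSG_MAX_ARGS 3

/* One fixed-size record: the emitting side stores no text at all. */
struct bin_msg {
	uint8_t origin;              /* who is printing */
	uint16_t msg_id;             /* index into the user-space format table */
	uint32_t args[MSG_MAX_ARGS]; /* raw argument values */
};

/* Emitting side: append a record to a preallocated buffer, no allocation. */
static size_t bin_emit(uint8_t *buf, size_t cap, size_t pos, const struct bin_msg *m)
{
	if (pos + sizeof(*m) > cap)
		return pos;          /* buffer full: record dropped */
	memcpy(buf + pos, m, sizeof(*m));
	return pos + sizeof(*m);
}

/* User-space side: the human-readable text lives only here. */
struct msg_fmt {
	uint8_t origin;
	uint16_t msg_id;
	const char *fmt;
};

static const struct msg_fmt fmt_table[] = {
	{ ORIG_DRV, 0x0800, "ti_startup - start" },
	{ ORIG_DRV, 0x0801, "ti_startup - product %#.4x, num configurations %u, "
			    "configuration value %u" },
};

/* Turn the record at buf + pos back into text. */
static int bin_decode(const uint8_t *buf, size_t pos, char *out, size_t outlen)
{
	struct bin_msg m;
	size_t i;

	memcpy(&m, buf + pos, sizeof(m));
	for (i = 0; i < sizeof(fmt_table) / sizeof(fmt_table[0]); i++)
		if (fmt_table[i].origin == m.origin && fmt_table[i].msg_id == m.msg_id)
			/* Unused trailing arguments are evaluated and ignored. */
			return snprintf(out, outlen, fmt_table[i].fmt,
					(unsigned)m.args[0], (unsigned)m.args[1],
					(unsigned)m.args[2]);
	return snprintf(out, outlen, "unknown message %02x:%04x",
			(unsigned)m.origin, (unsigned)m.msg_id);
}
```

With arguments 777, 7 and 8, decoding the 0x0801 entry yields the same line as the banana.sh demo: "ti_startup - product 0x0309, num configurations 7, configuration value 8".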
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
-- On Fri, 21 Sep 2007, Paul E. McKenney wrote: > > > > In any case, I will be looking at the scenarios more carefully. If > > it turns out that GP_STAGES can indeed be cranked down a bit, well, > > that is an easy change! I just fired off a POWER run with GP_STAGES > > set to 3, will let you know how it goes. > > The first attempt blew up during boot badly enough that ABAT was unable > to recover the machine (sorry, grahal!!!). Just for grins, I am trying > it again on a machine that ABAT has had a better record of reviving... This still frightens the hell out of me. Going through 15 states and failing. Seems the CPU is holding off writes for a long long time. That means we flipped the counter 4 times, and that still wasn't good enough? Maybe I'll boot up my powerbook to see if it has the same issues. Well, I'm still finishing up on moving into my new house, so I won't be available this weekend. Thanks, -- Steve
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, 21 Sep 2007, Paul E. McKenney wrote: > On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote: > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote: > > Covering the pieces that weren't in Peter's reply. ;-) > > And thank you -very- much for the careful and thorough review!!! > > > > #endif /* __KERNEL__ */ > > > #endif /* __LINUX_RCUCLASSIC_H */ > > > diff -urpNa -X dontdiff > > > linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h > > > linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h > > > --- linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h 2007-07-19 > > > 14:02:36.0 -0700 > > > +++ linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h 2007-08-22 > > > 15:21:06.0 -0700 > > > @@ -52,7 +52,11 @@ struct rcu_head { > > > void (*func)(struct rcu_head *head); > > > }; > > > > > > +#ifdef CONFIG_CLASSIC_RCU > > > #include <linux/rcuclassic.h> > > > +#else /* #ifdef CONFIG_CLASSIC_RCU */ > > > +#include <linux/rcupreempt.h> > > > +#endif /* #else #ifdef CONFIG_CLASSIC_RCU */ > > > > A bit extreme on the comments here. > > My fingers do this without any help from the rest of me, but I suppose > it is a bit of overkill in this case. Heck, why stop the overkill here, the whole patch is overkill ;-) > > > + > > > +#define GP_STAGES 4 > > > > I take it that GP stands for "grace period". Might want to state that > > here. /* Grace period stages */ When I was looking at this code at 1am, > > I kept asking myself "what's this GP?" (General Protection??). But > > that's what happens when looking at code like this after midnight ;-) > > Good point, will add a comment. You did get it right, "grace period". Thanks, so many places in the kernel have acronyms that are just supposed to be "obvious". I hate them, because they make me feel so stupid when I don't know what they are. After I find out, I usually slap my forehead and say "duh!". My mind is set on reading code, not deciphering TLAs. > > > > Can you have a pointer somewhere that explains these states.
And not a > > "it's in this paper or directory". Either have a short description here, > > or specify where exactly to find the information (perhaps a > > Documentation/RCU/preemptible_states.txt?). > > > > Trying to understand these states has caused me the most agony in > > reviewing these patches. > > Good point, perhaps a comment block above the enum giving a short > description of the purpose of each state. Maybe more detail in > Documentation/RCU as well, as you suggest above. That would be great. > > > + > > > +/* > > > + * Return the number of RCU batches processed thus far. Useful for debug > > > + * and statistics. The _bh variant is identical to straight RCU. > > > + */ > > > > If they are identical, then why the separation? > > I apologize for the repetition in this email. > > I apologize for the repetition in this email. > > I apologize for the repetition in this email. > > Yep, will fix with either #define or static inline, as you suggested > in a later email. You're starting to sound like me ;-) > > > + struct task_struct *me = current; > > > > Nitpick, but other places in the kernel usually use "t" or "p" as a > > variable to assign current to. It's just that "me" throws me off a > > little while reviewing this. But this is just a nitpick, so do as you > > will. > > Fair enough, as discussed earlier. Who's on first, What's on second, and I-dont-know is on third. > > > + unsigned long oldirq; > > > > Nitpick, "flags" is usually used for saving irq state. > > A later patch in the series fixes these -- I believe I got all of them. > (The priority-boost patch, IIRC.) OK > > > > + > > > + /* > > > + * Disable local interrupts to prevent the grace-period > > > + * detection state machine from seeing us half-done. > > > + * NMIs can still occur, of course, and might themselves > > > + * contain rcu_read_lock(). > > > + */ > > > + > > > + local_irq_save(oldirq); > > > > Isn't the GP detection done via a tasklet/softirq.
So wouldn't a > > local_bh_disable be sufficient here? You already cover NMIs, which would > > also handle normal interrupts. > > We beat this into the ground in other email. Nothing like kicking a dead horse on LKML ;-) > > > + > > > + /* > > > + * It is now safe to decrement this task's nesting count. > > > + * NMIs that occur after this statement will route their > > > + * rcu_read_lock() calls through this "else" clause, and > > > + * will thus start incrementing the per-CPU coutner on > > > > s/coutner/counter/ > > wlli fxi!!! snousd oogd > > > + > > > +/* > > > + * Attempt a single flip of the counters. Remember, a single flip does > > > + * -not- constitute a grace period. Instead, the interval between > > > + * at least three consecutive flips is a grace period. > > > + * > > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation > > > > Oh, come now! It
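A sketch of the two small cleanups agreed on in this exchange, in user-space C: express the identical _bh variant in terms of the plain one (the "#define or static inline" option), and give GP_STAGES the explanatory comment Steve asked for. The counter variable is a hypothetical stand-in for the real grace-period bookkeeping, so only the shape of the fix is meant here.

```c
/* Number of grace-period ("GP") stages a callback waits through. */
#define GP_STAGES 4

/* Hypothetical stand-in for the real completed-batches counter. */
static long rcu_completed_batches;

/* Number of RCU batches processed thus far; useful for debug and statistics. */
static inline long rcu_batches_completed(void)
{
	return rcu_completed_batches;
}

/* The _bh variant is identical to straight RCU here, so simply forward. */
static inline long rcu_batches_completed_bh(void)
{
	return rcu_batches_completed();
}
```

A static inline keeps type checking and a real symbol for debuggers, which is the usual reason to prefer it over a bare #define alias.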
Re: MTRR initialization
Siddha, Suresh B wrote:
> On Fri, Sep 14, 2007 at 09:33:30AM -0700, Howard Chu wrote:
>> So now I have this, which is pretty much what I wanted:
>>
>> reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
>> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
>> reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
>> reg03: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
>> reg04: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
>> reg05: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
>
> BTW, having overlapping WC, UC regions makes the end result UC. So in this
> case, you may not be getting the desired performance.

Thanks, I noticed that later. I simply deleted the UC mapping since it was no longer needed.

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP     http://www.openldap.org/project/
RE: [PATCH 2.6.23-rc7 0/3] async_tx and md-accel fixes for 2.6.23
> From: Neil Brown [mailto:[EMAIL PROTECTED]
> On Friday September 21, [EMAIL PROTECTED] wrote:
> > On Thu, 20 Sep 2007 18:27:35 -0700
> > Dan Williams <[EMAIL PROTECTED]> wrote:
> >
> > > Fix a couple bugs and provide documentation for the async_tx api.
> > >
> > > Neil, please 'ack' patch #3.
> > >
> > > git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus
> >
> > Well it looks like Neil is on vacation or is hiding from us or something.
>
> Neil is just not coping well with jet-lag
>
> Patch #3 looks good and necessary
> Acked-By: NeilBrown <[EMAIL PROTECTED]>
>
> I know that should probably be a "reviewed-by"

I went ahead and added reviewed-by.

> I was a bit surprised that the "handle_completed_read_requests" call was so early
> in handle_stripe5 - I don't think the code was originally that early.

It is slightly earlier than 2.6.22 (outside the '/* now count some things */'
loop) to make sure the R5_Wantfill flags from the last request have been
cleared before starting a new one:

	/* maybe we can request a biofill operation
	 *
	 * new wantfill requests are only permitted while
	 * STRIPE_OP_BIOFILL is clear
	 */
	if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread &&
		!test_bit(STRIPE_OP_BIOFILL, &sh->ops.pending))
		set_bit(R5_Wantfill, &dev->flags);

> But it is probably right. Hopefully my brain will have cleared by
> Monday and I'll review it again then.

Ok, the tree is updated with 'Reviewed-by' tags and the proposed
documentation updates from Randy and Shannon.

git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus

Dan Williams (3):
      async_tx: usage documentation and developer notes (v2)
      async_tx: fix dma_wait_for_async_tx
      raid5: fix ops_complete_biofill

> NeilBrown

--
Dan
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 04:03:43PM -0700, Paul E. McKenney wrote: > On Fri, Sep 21, 2007 at 11:20:48AM -0400, Steven Rostedt wrote: > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote: [ . . . ] > > Paul, > > > > Looking further into this, I still think this is a bit of overkill. We > > go through 20 states from call_rcu to list->func(). > > > > On call_rcu we put our stuff on the next list. Before we move stuff from > > next to wait, we need to go through 4 states. So we have > > > > next -> 4 states -> wait[0] -> 4 states -> wait[1] -> 4 states -> > > wait[2] -> 4 states -> wait[3] -> 4 states -> done. > > > > That's 20 states that we go through from the time we add our function to > > the list to the time it actually gets called. Do we really need the 4 > > wait lists? > > > > Seems a bit overkill to me. > > > > What am I missing? > > "Nothing kills like overkill!!!" ;-) > > Seriously, I do expect to be able to squeeze this down over time, but > feel the need to be a bit on the cowardly side at the moment. > > In any case, I will be looking at the scenarios more carefully. If > it turns out that GP_STAGES can indeed be cranked down a bit, well, > that is an easy change! I just fired off a POWER run with GP_STAGES > set to 3, will let you know how it goes. The first attempt blew up during boot badly enough that ABAT was unable to recover the machine (sorry, grahal!!!). Just for grins, I am trying it again on a machine that ABAT has had a better record of reviving... Thanx, Paul
Re: MTRR initialization
On Fri, Sep 14, 2007 at 09:33:30AM -0700, Howard Chu wrote:
> Hi, was wondering if anyone else has been tripped up by this... I've got 4GB of
> RAM in my Asus A8V Deluxe and memory hole mapping enabled in the BIOS. By
> default, my system boots up with these MTRR settings:
>
> reg00: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
> reg01: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
> reg02: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
> reg03: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
>
> The X server and various other programs try to add a mapping for my video
> card's buffer, at 0xd0000000, size=256MB, type=write-combining, and this always
> fails with a type mismatch error (old type is write-back). Apparently it's
> conflicting with mapping register 0. I can't just disable the existing settings
> and re-add them; the system hangs soon after disabling reg01.
>
> I guess the kernel must be getting the initial setup from the BIOS. I've hacked
> around this in mtrr/generic.c by explicitly changing the MTRR state in
> get_mtrr_state to split the first mapping into two; one at base 0 size 2048M
> and one at base 2048M size 1024M. So now I have this, which is pretty much what
> I wanted:
>
> reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
> reg03: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
> reg04: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
> reg05: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1

BTW, having overlapping WC, UC regions makes the end result UC. So in this
case, you may not be getting the desired performance.

>
> So the question is - was there an easier/correct way to do this?
>
> It might have been nice if the MTRR ioctls allowed the register number to be
> specified on the Set commands, though I'm not sure that would have helped in
> this case.
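Suresh's point follows from the architectural precedence rules for overlapping variable-range MTRRs (Intel SDM): if any matching range is UC the result is UC, WT combined with WB gives WT, identical types give that type, and any other overlap is undefined. The small C sketch below applies those rules to Howard's table. The struct and function names are hypothetical, fixed-range MTRRs are ignored, and the power-of-two size and alignment constraints are not checked.

```c
#include <stddef.h>
#include <stdint.h>

#define MB (1ULL << 20)

/* Memory-type encodings used by the MTRRs. */
enum mtrr_type { MTRR_UC = 0, MTRR_WC = 1, MTRR_WT = 4, MTRR_WP = 5, MTRR_WB = 6 };

/* One variable-range MTRR (hypothetical in-memory form). */
struct var_mtrr {
	uint64_t base;
	uint64_t size;
	enum mtrr_type type;
};

/*
 * Effective memory type at addr from the variable ranges only.
 * Returns def_type if nothing matches and -1 for an undefined overlap.
 */
static int mtrr_effective_type(const struct var_mtrr *r, size_t n,
			       uint64_t addr, int def_type)
{
	unsigned int seen = 0; /* bit t is set if a matching range has type t */
	size_t i;
	int t;

	for (i = 0; i < n; i++)
		if (addr >= r[i].base && addr - r[i].base < r[i].size)
			seen |= 1u << r[i].type;

	if (seen == 0)
		return def_type;                /* no range covers addr */
	if (seen & (1u << MTRR_UC))
		return MTRR_UC;                 /* UC wins over everything */
	if (seen == ((1u << MTRR_WT) | (1u << MTRR_WB)))
		return MTRR_WT;                 /* WT + WB gives WT */
	for (t = 0; t < 8; t++)
		if (seen == 1u << t)
			return t;               /* all matching ranges agree */
	return -1;                              /* e.g. WC + WB: undefined */
}
```

At 3328MB, the start of the frame buffer, reg03 (UC) and reg05 (WC) both match, so the result is UC; once the UC entry is deleted, as Howard did, the same address comes out WC.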
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote: > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote: Covering the pieces that weren't in Peter's reply. ;-) And thank you -very- much for the careful and thorough review!!! > > #endif /* __KERNEL__ */ > > #endif /* __LINUX_RCUCLASSIC_H */ > > diff -urpNa -X dontdiff linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h > > linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h > > --- linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h 2007-07-19 > > 14:02:36.0 -0700 > > +++ linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h 2007-08-22 > > 15:21:06.0 -0700 > > @@ -52,7 +52,11 @@ struct rcu_head { > > void (*func)(struct rcu_head *head); > > }; > > > > +#ifdef CONFIG_CLASSIC_RCU > > #include <linux/rcuclassic.h> > > +#else /* #ifdef CONFIG_CLASSIC_RCU */ > > +#include <linux/rcupreempt.h> > > +#endif /* #else #ifdef CONFIG_CLASSIC_RCU */ > > A bit extreme on the comments here. My fingers do this without any help from the rest of me, but I suppose it is a bit of overkill in this case.
> > #define RCU_HEAD_INIT { .next = NULL, .func = NULL } > > #define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT > > @@ -218,10 +222,13 @@ extern void FASTCALL(call_rcu_bh(struct > > /* Exported common interfaces */ > > extern void synchronize_rcu(void); > > extern void rcu_barrier(void); > > +extern long rcu_batches_completed(void); > > +extern long rcu_batches_completed_bh(void); > > > > /* Internal to kernel */ > > extern void rcu_init(void); > > extern void rcu_check_callbacks(int cpu, int user); > > +extern int rcu_needs_cpu(int cpu); > > > > #endif /* __KERNEL__ */ > > #endif /* __LINUX_RCUPDATE_H */ > > diff -urpNa -X dontdiff > > linux-2.6.22-b-fixbarriers/include/linux/rcupreempt.h > > linux-2.6.22-c-preemptrcu/include/linux/rcupreempt.h > > --- linux-2.6.22-b-fixbarriers/include/linux/rcupreempt.h 1969-12-31 > > 16:00:00.0 -0800 > > +++ linux-2.6.22-c-preemptrcu/include/linux/rcupreempt.h2007-08-22 > > 15:21:06.0 -0700 > > @@ -0,0 +1,78 @@ > > +/* > > + * Read-Copy Update mechanism for mutual exclusion (RT implementation) > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation; either version 2 of the License, or > > + * (at your option) any later version. > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > + * GNU General Public License for more details. > > + * > > + * You should have received a copy of the GNU General Public License > > + * along with this program; if not, write to the Free Software > > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, > > USA. 
> > + * > > + * Copyright (C) IBM Corporation, 2006 > > + * > > + * Author: Paul McKenney <[EMAIL PROTECTED]> > > + * > > + * Based on the original work by Paul McKenney <[EMAIL PROTECTED]> > > + * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen. > > + * Papers: > > + * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf > > + * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf > > (OLS2001) > > + * > > + * For detailed explanation of Read-Copy Update mechanism see - > > + * Documentation/RCU > > + * > > + */ > > + > > +#ifndef __LINUX_RCUPREEMPT_H > > +#define __LINUX_RCUPREEMPT_H > > + > > +#ifdef __KERNEL__ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#define rcu_qsctr_inc(cpu) > > +#define rcu_bh_qsctr_inc(cpu) > > +#define call_rcu_bh(head, rcu) call_rcu(head, rcu) > > + > > +extern void __rcu_read_lock(void); > > +extern void __rcu_read_unlock(void); > > +extern int rcu_pending(int cpu); > > +extern int rcu_needs_cpu(int cpu); > > + > > +#define __rcu_read_lock_bh() { rcu_read_lock(); local_bh_disable(); } > > +#define __rcu_read_unlock_bh() { local_bh_enable(); rcu_read_unlock(); > > } > > + > > +#define __rcu_read_lock_nesting() (current->rcu_read_lock_nesting) > > + > > +extern void __synchronize_sched(void); > > + > > +extern void __rcu_init(void); > > +extern void rcu_check_callbacks(int cpu, int user); > > +extern void rcu_restart_cpu(int cpu); > > + > > +#ifdef CONFIG_RCU_TRACE > > +struct rcupreempt_trace; > > +extern int *rcupreempt_flipctr(int cpu); > > +extern long rcupreempt_data_completed(void); > > +extern int rcupreempt_flip_flag(int cpu); > > +extern int rcupreempt_mb_flag(int cpu); > > +extern char *rcupreempt_try_flip_state_name(void); > > +extern struct rcupreempt_trace *rcupreempt_trace_cpu(int cpu); > > +#endif > > + > > +struct softirq_action; > > + > > +#endif /* __KERNEL__ */ > > +#endif /* __LINUX_RCUPREEMPT_H */ > > diff -urpNa -X don
Re: [PATCH 2.6.23-rc7 0/3] async_tx and md-accel fixes for 2.6.23
On Friday September 21, [EMAIL PROTECTED] wrote:
> On Thu, 20 Sep 2007 18:27:35 -0700
> Dan Williams <[EMAIL PROTECTED]> wrote:
>
> > Fix a couple bugs and provide documentation for the async_tx api.
> >
> > Neil, please 'ack' patch #3.
> >
> > git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus
>
> Well it looks like Neil is on vacation or is hiding from us or something.

Neil is just not coping well with jet-lag.

Patch #3 looks good and necessary.
Acked-By: NeilBrown <[EMAIL PROTECTED]>

I know that should probably be a "reviewed-by".

I was a bit surprised that the "handle_completed_read_requests" call was so early
in handle_stripe5 - I don't think the code was originally that early.
But it is probably right. Hopefully my brain will have cleared by
Monday and I'll review it again then.

NeilBrown
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Fri, Sep 21, 2007 at 11:48:05PM +0100, Alan Cox wrote:
> > According to an earlier thread, dgrs was never really maintained,
> > written for hardware that was never really distributed widely, and very
> > likely hasn't had users in years... if ever.
> >
> > If that picture is accurate (it's a story I was told), then I am
> > definitely queueing up a deletion patch.
>
> I think that's sensible. If someone whines it can be put back but I really
> don't think anyone will

Nobody did yet; please yell if you need a rebased patch.

--
maks
[no subject]
A new cell phone for the price of a used one.

Dear Sirs,

We offer for your attention "refurbished" mobile phones. This means that the mobile phone has:

- a used but fully functional motherboard
- a new original case
- new accessories (charger, battery, handsfree)
- new original packaging

Externally, the phone is completely new and comes with a 12-month warranty.

Selected prices for reference:

Model                  Price, USD
Motorola V3                 93.96
Motorola V3x               160.08
Motorola K1                195.39
Nokia 6111                 110.16
Nokia 6131                 160.20
Nokia 6280                 179.06
Nokia 7360                 126.12
Nokia 7373                 202.96
Samsung E530               120.84
Samsung E900               155.14
Sony Ericsson K800i        266.23
Sony Ericsson W300i        117.36
Sony Ericsson W800i        172.56
Sony Ericsson Z550i        135.12
Sony Ericsson W810i        214.94

The full price list contains about 180 models of Motorola, Nokia, Panasonic, Samsung and Sony Ericsson cell phones.

Each of the offered phones comes with the following package:
Phone + charger + 2 batteries + HandsFree + manual + packaging

I will send the full price list on request.

Best regards,
AVK Plus spol. s.r.o.
Nam Svobody 1626
27201 Kladno
Czech Republic
edac_mc: sleeping function called from invalid context
Kernel 2.6.22.6:

# echo "1" >/sys/devices/system/edac/pci/check_pci_parity
# dmesg | tail -14
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [] down_read+0x15/0x24
 [] pci_get_subsys+0x81/0x113
 [] schedule_timeout+0x85/0xad
 [] :edac_mc:edac_kernel_thread+0x9e/0x104
 [] :edac_mc:edac_kernel_thread+0x0/0x104
 [] kthread+0x47/0x73
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x73
 [] child_rip+0x0/0x12
Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump
Hi.

On Saturday 22 September 2007 09:19:18 Kyle Moffett wrote:
> I think that in order for this to work, there would need to be some
> ABI whereby the resume-ing kernel can pass its entire ACPI state and
> a bunch of other ACPI-related device details to the resume-ed kernel,
> which I believe it does not do at the moment.  I believe that what
> causes problems is the ACPI state data that the kernel stores is
> *different* between identical sequential boots, especially when you
> add/remove/replace batteries, AC, etc.

That's certainly possible. We already pass a very small amount of data
between the boot and resuming kernels at the moment, and it's done quite
simply: by putting the variables we want to 'transfer' in a nosave
page/section. I could conceive of a scheme wherein this was extended for
driver data. Since the memory needed would depend on the drivers loaded,
it would probably require that the space be allocated when hibernating,
that the locations of the structures be stored in the image header, and
that drivers then be notified of the locations to use when preparing to
resume, but it could work...

> Since we currently throw away most of that in-kernel ACPI interpreter
> state data when we load the to-be-resumed image and replace it with
> the state from the previous boot it looks to the ACPI code and
> firmware like our system's hardware magically changed behind its
> back.  The result is that the ACPI and firmware code is justifiably
> confused (although probably it should be more idempotent to begin
> with).  There's 2 potential solutions:
>   1) Formalize and copy a *lot* of ACPI state from the resume-ing
>      kernel to the resume-ed kernel.
>   2) Properly call the ACPI S4 methods in the proper order

... that said, I don't think the above should be necessary in most cases.
I believe we're already calling the ACPI S4 methods in the proper order.
If I understood correctly, Rafael put a lot of effort into learning what
that order was, and into ensuring it does get done.
> Neither one is particularly easy or particularly pleasant, especially
> given all the vendor bugs in this general area.  Theoretically we
> should be able to do both, since one will be more reliable than the
> other on different systems depending on what kinds of firmware bugs
> they have.

Regards,

Nigel
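The driver-data scheme Nigel floats (allocate the space when hibernating, record where each driver's structures live in the image header, then tell each driver at resume time where to find its data) can be pictured as a small lookup table. This is a user-space sketch under stated assumptions only: `struct image_header`, `struct drv_data_desc`, `header_reserve()` and `header_find()` are hypothetical names, not kernel interfaces, and the simple bump allocator over a reserved area stands in for whatever allocation a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit on how many drivers may hand data across resume. */
#define MAX_DRV_RECORDS 16

/* One record: which driver owns a chunk of the reserved area, and where. */
struct drv_data_desc {
	char name[32];   /* driver name, used as the lookup key */
	size_t offset;   /* offset into the reserved (nosave-style) area */
	size_t len;      /* bytes reserved for this driver */
};

/* Hypothetical image header carrying the table of driver records. */
struct image_header {
	unsigned int nr_records;
	size_t used;     /* bytes of the reserved area handed out so far */
	struct drv_data_desc rec[MAX_DRV_RECORDS];
};

/*
 * Hibernate side: reserve len bytes for a driver and record where they are.
 * Returns the offset, or (size_t)-1 if the table or the area is full.
 */
size_t header_reserve(struct image_header *h, size_t area_size,
		      const char *name, size_t len)
{
	struct drv_data_desc *d;

	assert(h->used <= area_size);
	if (h->nr_records == MAX_DRV_RECORDS || len > area_size - h->used)
		return (size_t)-1;

	d = &h->rec[h->nr_records++];
	strncpy(d->name, name, sizeof(d->name) - 1);
	d->name[sizeof(d->name) - 1] = '\0';
	d->offset = h->used;
	d->len = len;
	h->used += len;
	return d->offset;
}

/* Resume side: find where a driver should read its preserved data from. */
const struct drv_data_desc *header_find(const struct image_header *h,
					const char *name)
{
	for (unsigned int i = 0; i < h->nr_records; i++)
		if (strcmp(h->rec[i].name, name) == 0)
			return &h->rec[i];
	return NULL;
}
```

Because the table lives in the header rather than at fixed addresses, the amount of space can vary with the set of loaded drivers, which is the constraint Nigel points out.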
Re: [PATCH 07/25] r/o bind mounts: elevate write count for some ioctls
On Fri, 21 Sep 2007 16:39:40 -0700 Dave Hansen <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-09-21 at 16:03 -0700, Andrew Morton wrote:
> > Dave Hansen <[EMAIL PROTECTED]> wrote:
> >
> > > Some ioctl()s can cause writes to the filesystem.  Take
> > > these, and make them use mnt_want/drop_write() instead.
> > >
> > > We need to pass the filp one layer deeper in XFS, but
> > > somebody _just_ pulled it out in February because nobody
> > > was using it, so I don't feel guilty for adding it back.
> >
> > Note that -mm's ext2-reservations.patch adds EXT2_IOC_SETRSVSZ,
> > and it doesn't do mnt_want_write().
>
> That doesn't quite apply to mainline (at least after the patches I just
> sent).  I'll wait and send you one on top of the next -mm so that I can
> get a coherent view of what's going on if that's all right.
>

Sure, that's OK.  But I only noticed it because I happened to have my
nose in there fixing a reject.
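The pattern Dave describes brackets any filesystem-modifying ioctl with a write reference on the mount. The toy model below is a sketch of why that matters for read-only bind mounts, not the VFS code: it assumes, for illustration, that `mnt_want_write()` fails with `-EROFS` on a read-only mount and that switching a mount to read-only is refused with `-EBUSY` while write references are held. All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Toy mount: a read-only flag plus a count of active writers. */
struct mount_model {
	int readonly;
	int writers;
};

/* Model of mnt_want_write(): refuse new writers on a read-only mount. */
int model_want_write(struct mount_model *m)
{
	if (m->readonly)
		return -EROFS;
	m->writers++;
	return 0;
}

/* Model of mnt_drop_write(): release a previously taken reference. */
void model_drop_write(struct mount_model *m)
{
	assert(m->writers > 0);
	m->writers--;
}

/* Going read-only must fail while anyone still holds a write reference. */
int model_make_readonly(struct mount_model *m)
{
	if (m->writers > 0)
		return -EBUSY;
	m->readonly = 1;
	return 0;
}

/* A hypothetical ioctl that modifies the filesystem (e.g. sets flags). */
int model_setflags_ioctl(struct mount_model *m, unsigned int *flags,
			 unsigned int newflags)
{
	int err = model_want_write(m);

	if (err)
		return err;      /* no reference taken, nothing to undo */
	*flags = newflags;   /* the real on-disk change would happen here */
	model_drop_write(m);
	return 0;
}
```

An ioctl like the `EXT2_IOC_SETRSVSZ` Andrew mentions that skips the bracket would, in this model, modify the filesystem without ever being counted, so a concurrent switch to read-only could not notice it.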
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 07:23:09PM -0400, Steven Rostedt wrote:
> --
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> >
> > If you do a synchronize_rcu() it might well have to wait through the
> > following sequence of states:
> >
> > Stage 0: (might have to wait through part of this to get out of "next"
> > queue)
> >	rcu_try_flip_idle_state,	/* "I" */
> >	rcu_try_flip_waitack_state,	/* "A" */
> >	rcu_try_flip_waitzero_state,	/* "Z" */
> >	rcu_try_flip_waitmb_state	/* "M" */
> > Stage 1:
> >	rcu_try_flip_idle_state,	/* "I" */
> >	rcu_try_flip_waitack_state,	/* "A" */
> >	rcu_try_flip_waitzero_state,	/* "Z" */
> >	rcu_try_flip_waitmb_state	/* "M" */
> > Stage 2:
> >	rcu_try_flip_idle_state,	/* "I" */
> >	rcu_try_flip_waitack_state,	/* "A" */
> >	rcu_try_flip_waitzero_state,	/* "Z" */
> >	rcu_try_flip_waitmb_state	/* "M" */
> > Stage 3:
> >	rcu_try_flip_idle_state,	/* "I" */
> >	rcu_try_flip_waitack_state,	/* "A" */
> >	rcu_try_flip_waitzero_state,	/* "Z" */
> >	rcu_try_flip_waitmb_state	/* "M" */
> > Stage 4:
> >	rcu_try_flip_idle_state,	/* "I" */
> >	rcu_try_flip_waitack_state,	/* "A" */
> >	rcu_try_flip_waitzero_state,	/* "Z" */
> >	rcu_try_flip_waitmb_state	/* "M" */
> >
> > So yes, grace periods do indeed have some latency.
>
> Yes they do. I'm now at the point that I'm just "trusting" you that you
> understand that each of these stages are needed. My IQ level only lets me
> understand next -> wait -> done, but not the extra 3 shifts in wait.
>
> ;-)

In the spirit of full disclosure, I am not -absolutely- certain that
they are needed, only that they are sufficient.  Just color me paranoid.

> > > True, but the "me" confused me. Since that task struct is not me ;-)
> >
> > Well, who is it, then?  ;-)
>
> It's the app I watch sitting there waiting it's turn for it's callback to
> run.

:-)

> > > > > Isn't the GP detection done via a tasklet/softirq. So wouldn't a
> > > > > local_bh_disable be sufficient here? You already cover NMIs, which would
> > > > > also handle normal interrupts.
> > > >
> > > > This is also my understanding, but I think this disable is an
> > > > 'optimization' in that it avoids the regular IRQs from jumping through
> > > > these hoops outlined below.
> > >
> > > But isn't disabling irqs slower than doing a local_bh_disable? So the
> > > majority of times (where irqs will not happen) we have this overhead.
> >
> > The current code absolutely must exclude the scheduling-clock hardirq
> > handler.
>
> ACKed,
> The reasoning you gave in Peter's reply most certainly makes sense.
>
> > > > > > + *
> > > > > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation
> > > > >
> > > > > Oh, come now! It's not "nuts" to use this ;-)
> > > > >
> > > > > > + * on a large SMP, they might want to use a hierarchical organization of
> > > > > > + * the per-CPU-counter pairs.
> > > > > > + */
> > > >
> > > > Its the large SMP case that's nuts, and on that I have to agree with
> > > > Paul, its not really large SMP friendly.
> > >
> > > Hmm, that could be true. But on large SMP systems, you usually have a
> > > large amounts of memory, so hopefully a really long synchronize_rcu
> > > would not be a problem.
> >
> > Somewhere in the range from 64 to a few hundred CPUs, the global lock
> > protecting the try_flip state machine would start sucking air pretty
> > badly.  But the real problem is synchronize_sched(), which loops through
> > all the CPUs -- this would likely cause problems at a few tens of
> > CPUs, perhaps as early as 10-20.
>
> hehe, From someone who's largest box is 4 CPUs, to me 16 CPUS is large.
> But I can see hundreds, let alone thousands of CPUs would make a huge
> grinding halt on things like synchronize_sched. God, imaging if all CPUs
> did that approximately at the same time. The system would should a huge
> jitter.

Well, the first time the SGI guys tried to boot a 1024-CPU Altix, I got
an email complaining about RCU overheads.  ;-)  Manfred Spraul fixed
things up for them, though.

						Thanx, Paul
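Paul's listing is the same four-state try_flip cycle ("I", "A", "Z", "M") repeated once per stage. The toy model below only walks those states in order, to make the worst-case count concrete (five stages of four states each, twenty states in all). In the real state machine each transition waits for a condition, for example every CPU acknowledging the counter flip or the old per-CPU counters draining to zero; this sketch does not model any of that, and its enum and function names are hypothetical.

```c
#include <assert.h>

/* The four try_flip states, in the order Paul lists them. */
enum try_flip_state {
	TRY_FLIP_IDLE,     /* "I" */
	TRY_FLIP_WAITACK,  /* "A" */
	TRY_FLIP_WAITZERO, /* "Z" */
	TRY_FLIP_WAITMB,   /* "M" */
	TRY_FLIP_NSTATES
};

static const char state_letter[TRY_FLIP_NSTATES] = { 'I', 'A', 'Z', 'M' };

/* Each state advances to the next; "M" wraps back to "I" for a new flip. */
enum try_flip_state next_state(enum try_flip_state s)
{
	return (enum try_flip_state)((s + 1) % TRY_FLIP_NSTATES);
}

/*
 * Record the letters of the states visited over `stages` full stages,
 * starting from "I". buf must hold 4 * stages + 1 chars.
 * Returns the number of states visited.
 */
int trace_stages(int stages, char *buf)
{
	enum try_flip_state s = TRY_FLIP_IDLE;
	int n = 0;

	assert(stages >= 0);
	for (int i = 0; i < stages * TRY_FLIP_NSTATES; i++) {
		buf[n++] = state_letter[s];
		s = next_state(s);
	}
	buf[n] = '\0';
	return n;
}
```

Calling `trace_stages(5, buf)` yields the string "IAZM" repeated five times, matching Stages 0 through 4 above.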
Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu
> > config MPSC
> >        bool "Intel P4 / older Netburst based Xeon"
> >        help
>
> sidenote: I always wondered what 'PSC' stood for ?

Produces Smoke and Cooks ?
[PATCH Kbuild] Call only one make with all targets for O=
Change the invocations of make in the output directory Makefile and the
main Makefile for separate object trees to pass all goals to one $(MAKE)
via a new phony target "sub-make" and the existing target _all.

When compiling with separate object directories, a separate make is
called in the context of another directory: the main Makefile is invoked
from the output directory, and it then restarts itself with the current
directory set to the object tree. Before this patch, when multiple make
command goals are specified, each target results in a separate make
invocation. With make -j, these invocations may run in parallel,
resulting in multiple commands running in the same directory and
clobbering each other's results.

I did not try to address make -j for mixed dot-config and no-dot-config
targets. Because the order does matter, a solution was not obvious.
Perhaps a simple check for MAKEFLAGS having -j and refusing to run would
be appropriate.

Signed-off-by: Milton Miller <[EMAIL PROTECTED]>
---
I chose @: as the phony command after the sub-make target does all the
work; is there a better alternative? It looks like :; is used for
Makefile.
Index: kernel/Makefile
===================================================================
--- kernel.orig/Makefile	2007-09-19 01:55:45.0 -0500
+++ kernel/Makefile	2007-09-19 02:01:16.0 -0500
@@ -116,12 +116,16 @@ KBUILD_OUTPUT := $(shell cd $(KBUILD_OUT
 $(if $(KBUILD_OUTPUT),, \
      $(error output directory "$(saved-output)" does not exist))
 
-PHONY += $(MAKECMDGOALS)
+PHONY += $(MAKECMDGOALS) sub-make
 
-$(filter-out _all,$(MAKECMDGOALS)) _all:
+$(filter-out _all sub-make,$(MAKECMDGOALS)) _all: sub-make
+	@:
+
+sub-make: FORCE
 	$(if $(KBUILD_VERBOSE:1=),@)$(MAKE) -C $(KBUILD_OUTPUT) \
 	KBUILD_SRC=$(CURDIR) \
-	KBUILD_EXTMOD="$(KBUILD_EXTMOD)" -f $(CURDIR)/Makefile $@
+	KBUILD_EXTMOD="$(KBUILD_EXTMOD)" -f $(CURDIR)/Makefile \
+	$(filter-out _all sub-make,$(MAKECMDGOALS))
 
 # Leave processing to above invocation of make
 skip-makefile := 1
Index: kernel/scripts/mkmakefile
===================================================================
--- kernel.orig/scripts/mkmakefile	2007-09-19 01:55:45.0 -0500
+++ kernel/scripts/mkmakefile	2007-09-19 02:01:53.0 -0500
@@ -26,11 +26,13 @@ MAKEFLAGS += --no-print-directory
 .PHONY: all \$(MAKECMDGOALS)
 
+all := \$(filter-out all Makefile,\$(MAKECMDGOALS))
+
 all:
-	\$(MAKE) -C \$(KERNELSRC) O=\$(KERNELOUTPUT)
+	\$(MAKE) -C \$(KERNELSRC) O=\$(KERNELOUTPUT) \$(all)
 
 Makefile:;
 
-\$(filter-out all Makefile,\$(MAKECMDGOALS)) %/:
-	\$(MAKE) -C \$(KERNELSRC) O=\$(KERNELOUTPUT) \$@
+\$(all) %/: all
+	@:
 
 EOF
Re: [PATCH 07/25] r/o bind mounts: elevate write count for some ioctls
On Fri, 2007-09-21 at 16:03 -0700, Andrew Morton wrote:
> Dave Hansen <[EMAIL PROTECTED]> wrote:
>
> > Some ioctl()s can cause writes to the filesystem.  Take
> > these, and make them use mnt_want/drop_write() instead.
> >
> > We need to pass the filp one layer deeper in XFS, but
> > somebody _just_ pulled it out in February because nobody
> > was using it, so I don't feel guilty for adding it back.
>
> Note that -mm's ext2-reservations.patch adds EXT2_IOC_SETRSVSZ,
> and it doesn't do mnt_want_write().

That doesn't quite apply to mainline (at least after the patches I just
sent).  I'll wait and send you one on top of the next -mm so that I can
get a coherent view of what's going on if that's all right.

-- Dave
Re: [PATCH 14/22] NFS: Use local caching
David Howells <[EMAIL PROTECTED]> wrote:

> Peter Staubach <[EMAIL PROTECTED]> wrote:
>
> > Did I miss the section where the modified semantics about which
> > mounted file systems can use the cache and which ones can not
> > was implemented?
>
> Yes.

fs/nfs/super.c:

		case Opt_sharecache:
			mnt->flags &= ~NFS_MOUNT_UNSHARED;
			break;
		case Opt_nosharecache:
			mnt->flags |= NFS_MOUNT_UNSHARED;
			mnt->options &= ~NFS_OPTION_FSCACHE;
			break;
		case Opt_fscache:
			/* sharing is mandatory with fscache */
			mnt->options |= NFS_OPTION_FSCACHE;
			mnt->flags &= ~NFS_MOUNT_UNSHARED;
			break;
		case Opt_nofscache:
			mnt->options &= ~NFS_OPTION_FSCACHE;
			break;

Hmmm...  Actually, I'm not sure this is sufficient.

David
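The quoted cases keep two bits in step: `nosharecache` switches fscache off, and `fscache` forces sharing back on. The model below replays those transitions and asserts, after every option, that fscache is never enabled on an unshared mount, whatever order the options arrive in. It only exercises per-mount option parsing and says nothing about whether that is sufficient overall, which is exactly what David goes on to question. The bit values and `model_*` names are hypothetical stand-ins for `NFS_MOUNT_UNSHARED` and `NFS_OPTION_FSCACHE`.

```c
#include <assert.h>

/* Hypothetical bit values; the real flags live in the NFS headers. */
#define MODEL_MOUNT_UNSHARED 0x1u   /* stands in for NFS_MOUNT_UNSHARED */
#define MODEL_OPTION_FSCACHE 0x1u   /* stands in for NFS_OPTION_FSCACHE */

enum model_opt { OPT_SHARECACHE, OPT_NOSHARECACHE, OPT_FSCACHE, OPT_NOFSCACHE };

struct model_mnt {
	unsigned int flags;    /* holds MODEL_MOUNT_UNSHARED */
	unsigned int options;  /* holds MODEL_OPTION_FSCACHE */
};

/* Returns 1 unless fscache is enabled on an unshared mount. */
int model_consistent(const struct model_mnt *mnt)
{
	return !((mnt->options & MODEL_OPTION_FSCACHE) &&
		 (mnt->flags & MODEL_MOUNT_UNSHARED));
}

/* Same transitions as the quoted fs/nfs/super.c cases. */
void model_apply_opt(struct model_mnt *mnt, enum model_opt opt)
{
	switch (opt) {
	case OPT_SHARECACHE:
		mnt->flags &= ~MODEL_MOUNT_UNSHARED;
		break;
	case OPT_NOSHARECACHE:
		mnt->flags |= MODEL_MOUNT_UNSHARED;
		mnt->options &= ~MODEL_OPTION_FSCACHE;
		break;
	case OPT_FSCACHE:
		/* sharing is mandatory with fscache */
		mnt->options |= MODEL_OPTION_FSCACHE;
		mnt->flags &= ~MODEL_MOUNT_UNSHARED;
		break;
	case OPT_NOFSCACHE:
		mnt->options &= ~MODEL_OPTION_FSCACHE;
		break;
	}
	/* Each case clears whichever bit would conflict with the one it sets,
	 * so the forbidden combination cannot appear in any option order. */
	assert(model_consistent(mnt));
}
```

For example, "nosharecache,fscache" ends with sharing on and fscache on, while "fscache,nosharecache" ends with sharing off and fscache off.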
[PATCH 3/3] taskstats: fix stats->ac_exitcode to work on threads and use group_exit_code
Threads also have an exit code on their own, so report it in
TASKSTATS_CMD_ATTR_PID. For TASKSTATS_CMD_ATTR_TGID, instead of relying
only on the exit code of the leader, we use task->signal->group_exit_code
if not null, as suggested by Oleg Nesterov.

Also, document that as of this patch, fill_threadgroup() must be called
after add_tsk() as it may overwrite some stats.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
Cc: Balbir Singh <[EMAIL PROTECTED]>
Cc: Jay Lan <[EMAIL PROTECTED]>
Cc: Jonathan Lim <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
---

 kernel/taskstats.c |    3 +++
 kernel/tsacct.c    |   12 +++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 42d3110..24d7f62 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -181,6 +181,9 @@ static void send_cpu_listeners(struct sk_buff *skb,
  *   memory usage), so are taken from the group leader.
  * XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
  * the second.
+ *
+ * fill_threadgroup() may overwrite stats from add_tsk(), so it must be called
+ * after add_tsk().
  */
 static void fill_threadgroup(struct taskstats *stats, struct task_struct *task,
 				bool tg_stats)
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 24056aa..526b134 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -44,6 +44,8 @@ static void fill_wall_times(struct taskstats *stats, struct task_struct *tsk)
 void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task,
 				bool tg_stats)
 {
+	int group_exit_code;
+
 	BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
 
 	rcu_read_lock();
@@ -53,11 +55,11 @@ void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task,
 
 	fill_wall_times(stats, task);
 
-	if (thread_group_leader(task)) {
-		stats->ac_exitcode = task->exit_code;
-		if (task->flags & PF_FORKNOEXEC)
-			stats->ac_flag |= AFORK;
-	}
+	if (thread_group_leader(task) && (task->flags & PF_FORKNOEXEC))
+		stats->ac_flag |= AFORK;
+
+	group_exit_code = tg_stats ? task->signal->group_exit_code : 0;
+	stats->ac_exitcode = group_exit_code ? : task->exit_code;
 
 	stats->ac_nice = task_nice(task);
 	stats->ac_sched = task->policy;
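The new exit-code rule reads as follows: for a TGID reply, prefer the group exit code when it is non-zero; otherwise, and always for a PID reply, fall back to the thread's own exit code. The sketch below spells this out in portable C, since `x ? : y` in the patch is the GNU C "omitted middle operand" extension meaning `x ? x : y`. The `model_*` structures are hypothetical stand-ins for `task_struct` and its `signal` pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two fields the patch reads. */
struct model_signal {
	int group_exit_code;          /* shared by the whole thread group */
};

struct model_task {
	int exit_code;                /* this thread's own exit code */
	struct model_signal *signal;
};

/*
 * Portable spelling of the patch's
 *   group_exit_code = tg_stats ? task->signal->group_exit_code : 0;
 *   stats->ac_exitcode = group_exit_code ? : task->exit_code;
 */
int model_exitcode(const struct model_task *task, bool tg_stats)
{
	int group_exit_code;

	assert(!tg_stats || task->signal != NULL);
	group_exit_code = tg_stats ? task->signal->group_exit_code : 0;
	return group_exit_code ? group_exit_code : task->exit_code;
}
```

With a group exit code of 9 and a thread exit code of 3, a TGID reply reports 9 and a PID reply reports 3; once the group code is 0, both report 3.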
[PATCH 2/3] taskstats: tell fill_thread_group() whether it replies with PID or TGID stats
fill_thread_group() may want to know if it is filling
TASKSTATS_CMD_ATTR_TGID or TASKSTATS_CMD_ATTR_PID stats, so give it this
information in the tg_stats boolean.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
Cc: Balbir Singh <[EMAIL PROTECTED]>
Cc: Jay Lan <[EMAIL PROTECTED]>
Cc: Jonathan Lim <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
---

 include/linux/tsacct_kern.h |    4 ++--
 kernel/taskstats.c          |   12 +++-
 kernel/tsacct.c             |    3 ++-
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/tsacct_kern.h b/include/linux/tsacct_kern.h
index 93dffc2..5652ae0 100644
--- a/include/linux/tsacct_kern.h
+++ b/include/linux/tsacct_kern.h
@@ -10,10 +10,10 @@
 #include
 
 #ifdef CONFIG_TASKSTATS
-void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task, bool tg_stats);
 void bacct_add_tsk(struct taskstats *stats, struct task_struct *task);
 #else
-static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task, bool tg_stats)
 {}
 static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *task)
 {}
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index ce43fae..42d3110 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -172,6 +172,7 @@ static void send_cpu_listeners(struct sk_buff *skb,
  * fill_threadgroup - initialize some common stats for the thread group
  * @stats: the taskstats to write into
  * @task: the thread representing the whole group
+ * @tg_stats: whether in the end thread group stats are requested
  *
  * There are two types of taskstats fields when considering a thread group:
  * - those that can be aggregated from each thread in the group (like CPU
@@ -181,7 +182,8 @@ static void send_cpu_listeners(struct sk_buff *skb,
  * XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
  * the second.
  */
-static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+static void fill_threadgroup(struct taskstats *stats, struct task_struct *task,
+				bool tg_stats)
 {
 	/*
 	 * Each accounting subsystem adds calls to its functions to initialize
@@ -193,7 +195,7 @@ static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
 	stats->version = TASKSTATS_VERSION;
 
 	/* fill in basic acct fields */
-	bacct_fill_threadgroup(stats, task);
+	bacct_fill_threadgroup(stats, task, tg_stats);
 
 	/* fill in extended acct fields */
 	xacct_fill_threadgroup(stats, task);
@@ -248,7 +250,7 @@ static int fill_pid(pid_t pid, struct task_struct *tsk,
 	memset(stats, 0, sizeof(*stats));
 
 	add_tsk(stats, tsk);
-	fill_threadgroup(stats, tsk);
+	fill_threadgroup(stats, tsk, false);
 
 	/* Define err: label here if needed */
 	put_task_struct(tsk);
@@ -289,7 +291,7 @@ static int fill_tgid(pid_t tgid, struct task_struct *first,
 		add_tsk(stats, tsk);
 	while_each_thread(first, tsk);
 
-	fill_threadgroup(stats, first->group_leader);
+	fill_threadgroup(stats, first->group_leader, true);
 	unlock_task_sighand(first, &flags);
 	rc = 0;
 out:
@@ -545,7 +547,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
 	 */
 	memcpy(stats, tsk->signal->stats, sizeof(*stats));
 
-	fill_threadgroup(stats, tsk->group_leader);
+	fill_threadgroup(stats, tsk->group_leader, true);
 
 send:
 	send_cpu_listeners(rep_skb, listeners);
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 9541a1a..24056aa 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -41,7 +41,8 @@ static void fill_wall_times(struct taskstats *stats, struct task_struct *tsk)
  * fill in basic accounting fields
  */
 
-void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task,
+				bool tg_stats)
 {
 	BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
 
[PATCH] JBD2/ext4 naming cleanup
JBD2 naming cleanup

From: Mingming Cao <[EMAIL PROTECTED]>

Change macro names from JBD_XXX to JBD2_XXX in JBD2/ext4.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext4/extents.c         |    2 +-
 fs/ext4/super.c           |    2 +-
 fs/jbd2/commit.c          |    2 +-
 fs/jbd2/journal.c         |    8
 fs/jbd2/recovery.c        |    2 +-
 fs/jbd2/revoke.c          |    4 ++--
 include/linux/ext4_jbd2.h |    6 +++---
 include/linux/jbd2.h      |   30 +++---
 8 files changed, 28 insertions(+), 28 deletions(-)

Index: linux-2.6.23-rc6/fs/ext4/super.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/ext4/super.c	2007-09-21 16:27:31.0 -0700
+++ linux-2.6.23-rc6/fs/ext4/super.c	2007-09-21 16:27:46.0 -0700
@@ -966,7 +966,7 @@ static int parse_options (char *options,
 			if (option < 0)
 				return 0;
 			if (option == 0)
-				option = JBD_DEFAULT_MAX_COMMIT_AGE;
+				option = JBD2_DEFAULT_MAX_COMMIT_AGE;
 			sbi->s_commit_interval = HZ * option;
 			break;
 		case Opt_data_journal:
Index: linux-2.6.23-rc6/include/linux/ext4_jbd2.h
===================================================================
--- linux-2.6.23-rc6.orig/include/linux/ext4_jbd2.h	2007-09-10 19:50:29.0 -0700
+++ linux-2.6.23-rc6/include/linux/ext4_jbd2.h	2007-09-21 16:27:46.0 -0700
@@ -12,8 +12,8 @@
  * Ext4-specific journaling extensions.
  */
 
-#ifndef _LINUX_EXT4_JBD_H
-#define _LINUX_EXT4_JBD_H
+#ifndef _LINUX_EXT4_JBD2_H
+#define _LINUX_EXT4_JBD2_H
 
 #include
 #include
@@ -228,4 +228,4 @@ static inline int ext4_should_writeback_
 	return 0;
 }
 
-#endif	/* _LINUX_EXT4_JBD_H */
+#endif	/* _LINUX_EXT4_JBD2_H */
Index: linux-2.6.23-rc6/include/linux/jbd2.h
===================================================================
--- linux-2.6.23-rc6.orig/include/linux/jbd2.h	2007-09-21 09:07:09.0 -0700
+++ linux-2.6.23-rc6/include/linux/jbd2.h	2007-09-21 16:27:46.0 -0700
@@ -13,8 +13,8 @@
  * filesystem journaling support.
  */
 
-#ifndef _LINUX_JBD_H
-#define _LINUX_JBD_H
+#ifndef _LINUX_JBD2_H
+#define _LINUX_JBD2_H
 
 /* Allow this file to be included directly into e2fsprogs */
 #ifndef __KERNEL__
@@ -37,26 +37,26 @@
 #define journal_oom_retry 1
 
 /*
- * Define JBD_PARANIOD_IOFAIL to cause a kernel BUG() if ext3 finds
+ * Define JBD2_PARANIOD_IOFAIL to cause a kernel BUG() if ext4 finds
  * certain classes of error which can occur due to failed IOs.  Under
- * normal use we want ext3 to continue after such errors, because
+ * normal use we want ext4 to continue after such errors, because
  * hardware _can_ fail, but for debugging purposes when running tests on
  * known-good hardware we may want to trap these errors.
  */
-#undef JBD_PARANOID_IOFAIL
+#undef JBD2_PARANOID_IOFAIL
 
 /*
  * The default maximum commit age, in seconds.
  */
-#define JBD_DEFAULT_MAX_COMMIT_AGE 5
+#define JBD2_DEFAULT_MAX_COMMIT_AGE 5
 
 #ifdef CONFIG_JBD2_DEBUG
 /*
- * Define JBD_EXPENSIVE_CHECKING to enable more expensive internal
+ * Define JBD2_EXPENSIVE_CHECKING to enable more expensive internal
  * consistency checks.  By default we don't do this unless
  * CONFIG_JBD2_DEBUG is on.
  */
-#define JBD_EXPENSIVE_CHECKING
+#define JBD2_EXPENSIVE_CHECKING
 extern u8 jbd2_journal_enable_debug;
 
 #define jbd_debug(n, f, a...)						\
@@ -163,8 +163,8 @@ typedef struct journal_block_tag_s
 	__be32		t_blocknr_high; /* most-significant high 32bits. */
 } journal_block_tag_t;
 
-#define JBD_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
-#define JBD_TAG_SIZE64 (sizeof(journal_block_tag_t))
+#define JBD2_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
+#define JBD2_TAG_SIZE64 (sizeof(journal_block_tag_t))
 
 /*
  * The revoke descriptor: used on disk to describe a series of blocks to
@@ -256,8 +256,8 @@ typedef struct journal_superblock_s
 #include
 #include
 
-#define JBD_ASSERTIONS
-#ifdef JBD_ASSERTIONS
+#define JBD2_ASSERTIONS
+#ifdef JBD2_ASSERTIONS
 #define J_ASSERT(assert)						\
 do {									\
 	if (!(assert)) {						\
@@ -284,9 +284,9 @@ void buffer_assertion_failure(struct buf
 
 #else
 #define J_ASSERT(assert)	do { } while (0)
-#endif		/* JBD_ASSERTIONS */
+#endif		/* JBD2_ASSERTIONS */
 
-#if defined(JBD_PARANOID_IOFAIL)
+#if defined(JBD2_PARANOID_IOFAIL)
 #define J_EXPECT(expr, why...)		J_ASSERT(expr)
 #define J_EXPECT_BH(bh, expr, why...)	J_ASSERT_BH(bh, expr)
 #define J_EXPECT_JH(jh, expr, why...)	J_ASSERT_JH(jh, expr)
@@ -1104,4 +1104,4 @@ extern int jbd_blocks_per_p
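The jbd2.h comment explains the intent behind JBD2_PARANOID_IOFAIL: normally the filesystem should survive I/O failures, but on known-good test hardware they should trap. The hunk shows only the paranoid branch, where `J_EXPECT()` becomes `J_ASSERT()`; the non-paranoid definition is not in this excerpt. The user-space sketch below therefore assumes a non-paranoid variant that logs the failure and returns the truth value so the caller can handle it. `MODEL_EXPECT`, `model_expect()` and `model_check_read()` are hypothetical names, not the jbd2 macros.

```c
#include <assert.h>
#include <stdio.h>

/* Define to turn expectation failures into hard assertions, the way
 * JBD2_PARANOID_IOFAIL maps J_EXPECT() onto J_ASSERT(). */
/* #define MODEL_PARANOID_IOFAIL */

#ifdef MODEL_PARANOID_IOFAIL
#define MODEL_EXPECT(expr, why) (assert(expr), 1)
#else
/* Assumed non-paranoid behaviour: log the failure and return the result
 * so the caller can handle the error and carry on. */
static int model_expect(int val, const char *expr, const char *why)
{
	if (!val)
		fprintf(stderr, "unexpected failure: %s; %s\n", expr, why);
	return val;
}
#define MODEL_EXPECT(expr, why) model_expect(!!(expr), #expr, (why))
#endif

/* Hypothetical caller: a short read is survivable unless paranoid. */
int model_check_read(int bytes_read, int expected)
{
	if (!MODEL_EXPECT(bytes_read == expected, "short read from journal"))
		return -1;   /* report an I/O error instead of crashing */
	return 0;
}
```

Built as shown, `model_check_read(3, 10)` logs a message and returns -1; with `MODEL_PARANOID_IOFAIL` defined, the same call aborts, which is the "trap these errors" behaviour the header comment describes.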
[PATCH 1/3] taskstats: separate PID/TGID stats producers to complete the TGID ones
TASKSTATS_CMD_ATTR_TGID used to return only the delay accounting stats,
not the basic and extended accounting. With this patch,
TASKSTATS_CMD_ATTR_TGID also aggregates the accounting info for all
threads of a thread group.

TASKSTATS_CMD_ATTR_PID output should be unchanged.
TASKSTATS_CMD_ATTR_TGID output should have all fields set, unlike before
the patch, where most of the fields were set to 0.

To this end, two functions were introduced: fill_threadgroup() and
add_tsk(). These functions are responsible for aggregating the
subsystem-specific accounting information. Taskstats requesters
(fill_pid(), fill_tgid() and fill_tgid_exit()) should only call add_tsk()
and fill_threadgroup() to get the stats.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
Cc: Balbir Singh <[EMAIL PROTECTED]>
Cc: Jay Lan <[EMAIL PROTECTED]>
Cc: Jonathan Lim <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
---

 Documentation/accounting/getdelays.c |    2 -
 include/linux/tsacct_kern.h          |   12 ++-
 kernel/taskstats.c                   |  131 ++
 kernel/tsacct.c                      |  106
 4 files changed, 155 insertions(+), 96 deletions(-)

diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index cbee3a2..78773c0 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -76,7 +76,7 @@ static void usage(void)
 	fprintf(stderr, "getdelays [-dilv] [-w logfile] [-r bufsize] "
 			"[-m cpumask] [-t tgid] [-p pid]\n");
 	fprintf(stderr, " -d: print delayacct stats\n");
-	fprintf(stderr, " -i: print IO accounting (works only with -p)\n");
+	fprintf(stderr, " -i: print IO accounting\n");
 	fprintf(stderr, " -l: listen forever\n");
 	fprintf(stderr, " -v: debug on\n");
 }
diff --git a/include/linux/tsacct_kern.h b/include/linux/tsacct_kern.h
index 7e50ac7..93dffc2 100644
--- a/include/linux/tsacct_kern.h
+++ b/include/linux/tsacct_kern.h
@@ -10,17 +10,23 @@
 #include
 
 #ifdef CONFIG_TASKSTATS
-extern void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk);
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void bacct_add_tsk(struct taskstats *stats, struct task_struct *task);
 #else
-static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
+static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{}
+static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *task)
 {}
 #endif /* CONFIG_TASKSTATS */
 
 #ifdef CONFIG_TASK_XACCT
-extern void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
+void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
 extern void acct_update_integrals(struct task_struct *tsk);
 extern void acct_clear_integrals(struct task_struct *tsk);
 #else
+static inline void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{}
 static inline void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 {}
 static inline void acct_update_integrals(struct task_struct *tsk)
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 059431e..ce43fae 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -168,6 +168,68 @@ static void send_cpu_listeners(struct sk_buff *skb,
 	up_write(&listeners->sem);
 }
 
+/**
+ * fill_threadgroup - initialize some common stats for the thread group
+ * @stats: the taskstats to write into
+ * @task: the thread representing the whole group
+ *
+ * There are two types of taskstats fields when considering a thread group:
+ * - those that can be aggregated from each thread in the group (like CPU
+ *   times),
+ * - those that cannot be aggregated (like UID) or are identical (like
+ *   memory usage), so are taken from the group leader.
+ * XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
+ * the second.
+ */
+static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{
+	/*
+	 * Each accounting subsystem adds calls to its functions to initialize
+	 * relevant parts of struct taskstsats for a single tgid as follows:
+	 *
+	 *	per-task-foo-fill_threadgroup(stats, task);
+	 */
+
+	stats->version = TASKSTATS_VERSION;
+
+	/* fill in basic acct fields */
+	bacct_fill_threadgroup(stats, task);
+
+	/* fill in extended acct fields */
+	xacct_fill_threadgroup(stats, task);
+}
+
+/**
+ * add_tsk - combine some thread specific stats in a taskstats
+ * @stats: the taskstats to write into
+ * @task: the thread to combine
+ *
+ * Stats specific to each thread in the thread group. Stats of @task should be
+ * combined with those already present in @stats. add_tsk() works in
+ * conjunction with f
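Going by the function descriptions in the patch, add_tsk() "combine[s] some thread specific stats" into the reply, while fill_threadgroup() initializes "common stats for the thread group" from one representative thread. A toy model of a TGID request might look like the sketch below: per-thread counters are summed, a group-wide field is copied from the leader, and the group-level fill runs last, matching the order fill_pid() and fill_tgid() use. The struct fields and `model_*` functions are hypothetical and much smaller than the real struct taskstats.

```c
#include <assert.h>
#include <stddef.h>

/* A few hypothetical taskstats-like fields. */
struct model_stats {
	unsigned long long cpu_ns;   /* per-thread: summed by add_tsk */
	unsigned int nr_threads;     /* per-thread: one per add_tsk call */
	unsigned int uid;            /* group-wide: taken from the leader */
};

struct model_thread {
	unsigned long long cpu_ns;
	unsigned int uid;
};

/* add_tsk()-style: fold one thread's counters into the reply. */
void model_add_tsk(struct model_stats *stats, const struct model_thread *t)
{
	stats->cpu_ns += t->cpu_ns;
	stats->nr_threads++;
}

/* fill_threadgroup()-style: fields describing the group as a whole. */
void model_fill_threadgroup(struct model_stats *stats,
			    const struct model_thread *leader)
{
	stats->uid = leader->uid;
}

/* TGID request: zero, add every thread, then fill group-wide fields. */
void model_fill_tgid(struct model_stats *stats,
		     const struct model_thread *threads, size_t n)
{
	assert(n > 0);   /* threads[0] plays the group leader */
	*stats = (struct model_stats){0};
	for (size_t i = 0; i < n; i++)
		model_add_tsk(stats, &threads[i]);
	model_fill_threadgroup(stats, &threads[0]);
}
```

A PID request in this model would be the same sequence with a single thread, which is why the PID output can stay unchanged while the TGID output gains the aggregated fields.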
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
--
On Fri, 21 Sep 2007, Paul E. McKenney wrote:
>
> If you do a synchronize_rcu() it might well have to wait through the
> following sequence of states:
>
> Stage 0: (might have to wait through part of this to get out of "next" queue)
>	rcu_try_flip_idle_state,	/* "I" */
>	rcu_try_flip_waitack_state,	/* "A" */
>	rcu_try_flip_waitzero_state,	/* "Z" */
>	rcu_try_flip_waitmb_state	/* "M" */
> Stage 1:
>	rcu_try_flip_idle_state,	/* "I" */
>	rcu_try_flip_waitack_state,	/* "A" */
>	rcu_try_flip_waitzero_state,	/* "Z" */
>	rcu_try_flip_waitmb_state	/* "M" */
> Stage 2:
>	rcu_try_flip_idle_state,	/* "I" */
>	rcu_try_flip_waitack_state,	/* "A" */
>	rcu_try_flip_waitzero_state,	/* "Z" */
>	rcu_try_flip_waitmb_state	/* "M" */
> Stage 3:
>	rcu_try_flip_idle_state,	/* "I" */
>	rcu_try_flip_waitack_state,	/* "A" */
>	rcu_try_flip_waitzero_state,	/* "Z" */
>	rcu_try_flip_waitmb_state	/* "M" */
> Stage 4:
>	rcu_try_flip_idle_state,	/* "I" */
>	rcu_try_flip_waitack_state,	/* "A" */
>	rcu_try_flip_waitzero_state,	/* "Z" */
>	rcu_try_flip_waitmb_state	/* "M" */
>
> So yes, grace periods do indeed have some latency.

Yes they do. I'm now at the point that I'm just "trusting" you that you
understand that each of these stages is needed. My IQ level only lets me
understand next -> wait -> done, but not the extra 3 shifts in wait.

;-)

> >
> > True, but the "me" confused me. Since that task struct is not me ;-)
>
> Well, who is it, then?  ;-)

It's the app I watch sitting there waiting its turn for its callback to
run.

> > > >
> > > > Isn't the GP detection done via a tasklet/softirq. So wouldn't a
> > > > local_bh_disable be sufficient here? You already cover NMIs, which would
> > > > also handle normal interrupts.
> > >
> > > This is also my understanding, but I think this disable is an
> > > 'optimization' in that it avoids the regular IRQs from jumping through
> > > these hoops outlined below.
> >
> > But isn't disabling irqs slower than doing a local_bh_disable? So the
> > majority of times (where irqs will not happen) we have this overhead.
>
> The current code absolutely must exclude the scheduling-clock hardirq
> handler.

ACKed,
The reasoning you gave in Peter's reply most certainly makes sense.

> > > > > + *
> > > > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation
> > > >
> > > > Oh, come now! It's not "nuts" to use this ;-)
> > > >
> > > > > + * on a large SMP, they might want to use a hierarchical organization of
> > > > > + * the per-CPU-counter pairs.
> > > > > + */
> > >
> > > Its the large SMP case that's nuts, and on that I have to agree with
> > > Paul, its not really large SMP friendly.
> >
> > Hmm, that could be true. But on large SMP systems, you usually have a
> > large amounts of memory, so hopefully a really long synchronize_rcu
> > would not be a problem.
>
> Somewhere in the range from 64 to a few hundred CPUs, the global lock
> protecting the try_flip state machine would start sucking air pretty
> badly.  But the real problem is synchronize_sched(), which loops through
> all the CPUs -- this would likely cause problems at a few tens of
> CPUs, perhaps as early as 10-20.

hehe, from someone whose largest box is 4 CPUs, to me 16 CPUs is large.
But I can see hundreds, let alone thousands of CPUs would bring things
like synchronize_sched to a grinding halt. God, imagine if all CPUs did
that at approximately the same time. The system would show a huge
jitter.

-- Steve
Re: [PATCH 14/22] NFS: Use local caching
Peter Staubach <[EMAIL PROTECTED]> wrote: > Did I miss the section where the modified semantics about which > mounted file systems can use the cache and which ones can not > was implemented? Yes. David
Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump
On Sep 21, 2007, at 17:16:59, Jeremy Maitin-Shepard wrote: "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes: The ACPI platform firmware is allowed to preserve information across the hibernation-resume cycle, so this need not be the same. All of my comments related to the case where S4 is not being used (instead the system is just powered off normally), and a boot kernel that does not initialize ACPI is used. In that case, the ACPI platform firmware should not be able to distinguish a normal boot from a resume from hibernation. I think that in order for this to work, there would need to be some ABI whereby the resume-ing kernel can pass its entire ACPI state and a bunch of other ACPI-related device details to the resume-ed kernel, which I believe it does not do at the moment. I believe the problem is that the ACPI state data the kernel stores is *different* between identical sequential boots, especially when you add/remove/replace batteries, AC, etc. Since we currently throw away most of that in-kernel ACPI interpreter state data when we load the to-be-resumed image and replace it with the state from the previous boot, it looks to the ACPI code and firmware like our system's hardware magically changed behind its back. The result is that the ACPI and firmware code is justifiably confused (although probably it should be more idempotent to begin with). There are two potential solutions: 1) Formalize and copy a *lot* of ACPI state from the resume-ing kernel to the resume-ed kernel. 2) Properly call the ACPI S4 methods in the proper order. Neither one is particularly easy or particularly pleasant, especially given all the vendor bugs in this general area. Theoretically we should be able to do both, since one will be more reliable than the other on different systems depending on what kinds of firmware bugs they have.
Cheers, Kyle Moffett
Re: [PATCH 00/22] Introduce credential record
Casey Schaufler <[EMAIL PROTECTED]> wrote: > They are nonetheless in effect and (heaven forbid) should they be > abused you don't want to hide the facts from concerned observers. Because, I suspect, what an observer looking through /proc should see is what the process thinks it is doing, not what is going on transparently behind the scenes. David
Re: [PATCH 11/22] CacheFiles: Permit the page lock state to be monitored
Trond Myklebust <[EMAIL PROTECTED]> wrote: > > This is used by CacheFiles to detect read completion on a page in the > > backing filesystem so that it can then copy the data to the waiting netfs > > page. > > Won't it in any case want to lock the page too? No. Why would it? All it wants to do is to read the page (copying it to the netfs's page), assuming it becomes PG_uptodate. > That would be the only way to ensure that the page is still mapped into the > address space when you're writing it out... I don't understand what you're getting at. Write the page out where? We've just read it in from the cache, so why would we be writing it back out? David
[PATCH] JBD/ext34 cleanups: convert to kzalloc
Convert kmalloc to kzalloc() and get rid of the memset(). Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- fs/ext3/xattr.c |3 +-- fs/ext4/xattr.c |3 +-- fs/jbd/journal.c |3 +-- fs/jbd/transaction.c |2 +- fs/jbd2/journal.c |3 +-- fs/jbd2/transaction.c |2 +- 6 files changed, 6 insertions(+), 10 deletions(-) Index: linux-2.6.23-rc6/fs/jbd/journal.c === --- linux-2.6.23-rc6.orig/fs/jbd/journal.c 2007-09-21 09:08:02.0 -0700 +++ linux-2.6.23-rc6/fs/jbd/journal.c 2007-09-21 09:10:37.0 -0700 @@ -653,10 +653,9 @@ static journal_t * journal_init_common ( journal_t *journal; int err; - journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL); + journal = kzalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL); if (!journal) goto fail; - memset(journal, 0, sizeof(*journal)); init_waitqueue_head(&journal->j_wait_transaction_locked); init_waitqueue_head(&journal->j_wait_logspace); Index: linux-2.6.23-rc6/fs/jbd/transaction.c === --- linux-2.6.23-rc6.orig/fs/jbd/transaction.c 2007-09-21 09:13:11.0 -0700 +++ linux-2.6.23-rc6/fs/jbd/transaction.c 2007-09-21 09:13:24.0 -0700 @@ -96,7 +96,7 @@ static int start_this_handle(journal_t * alloc_transaction: if (!journal->j_running_transaction) { - new_transaction = kmalloc(sizeof(*new_transaction), + new_transaction = kzalloc(sizeof(*new_transaction), GFP_NOFS|__GFP_NOFAIL); if (!new_transaction) { ret = -ENOMEM; Index: linux-2.6.23-rc6/fs/jbd2/journal.c === --- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-21 09:10:53.0 -0700 +++ linux-2.6.23-rc6/fs/jbd2/journal.c 2007-09-21 09:11:13.0 -0700 @@ -654,10 +654,9 @@ static journal_t * journal_init_common ( journal_t *journal; int err; - journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL); + journal = kzalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL); if (!journal) goto fail; - memset(journal, 0, sizeof(*journal)); init_waitqueue_head(&journal->j_wait_transaction_locked); init_waitqueue_head(&journal->j_wait_logspace); Index: linux-2.6.23-rc6/fs/jbd2/transaction.c === --- 
linux-2.6.23-rc6.orig/fs/jbd2/transaction.c 2007-09-21 09:12:46.0 -0700 +++ linux-2.6.23-rc6/fs/jbd2/transaction.c 2007-09-21 09:12:59.0 -0700 @@ -96,7 +96,7 @@ static int start_this_handle(journal_t * alloc_transaction: if (!journal->j_running_transaction) { - new_transaction = kmalloc(sizeof(*new_transaction), + new_transaction = kzalloc(sizeof(*new_transaction), GFP_NOFS|__GFP_NOFAIL); if (!new_transaction) { ret = -ENOMEM; Index: linux-2.6.23-rc6/fs/ext3/xattr.c === --- linux-2.6.23-rc6.orig/fs/ext3/xattr.c 2007-09-21 10:22:24.0 -0700 +++ linux-2.6.23-rc6/fs/ext3/xattr.c2007-09-21 10:24:19.0 -0700 @@ -741,12 +741,11 @@ ext3_xattr_block_set(handle_t *handle, s } } else { /* Allocate a buffer where we construct the new block. */ - s->base = kmalloc(sb->s_blocksize, GFP_KERNEL); + s->base = kzalloc(sb->s_blocksize, GFP_KERNEL); /* assert(header == s->base) */ error = -ENOMEM; if (s->base == NULL) goto cleanup; - memset(s->base, 0, sb->s_blocksize); header(s->base)->h_magic = cpu_to_le32(EXT3_XATTR_MAGIC); header(s->base)->h_blocks = cpu_to_le32(1); header(s->base)->h_refcount = cpu_to_le32(1); Index: linux-2.6.23-rc6/fs/ext4/xattr.c === --- linux-2.6.23-rc6.orig/fs/ext4/xattr.c 2007-09-21 10:20:21.0 -0700 +++ linux-2.6.23-rc6/fs/ext4/xattr.c2007-09-21 10:21:00.0 -0700 @@ -750,12 +750,11 @@ ext4_xattr_block_set(handle_t *handle, s } } else { /* Allocate a buffer where we construct the new block. */ - s->base = kmalloc(sb->s_blocksize, GFP_KERNEL); + s->base = kzalloc(sb->s_blocksize, GFP_KERNEL); /* assert(header == s->base) */ error = -ENOMEM; if (s->base == NULL) goto cleanup; - memset(s->base, 0, sb->s_blocksize);
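The pattern in this patch can be illustrated in userspace, where calloc() plays the role of kzalloc(). This is a minimal sketch. struct toy_journal and both helpers are hypothetical, and kernel allocation flags such as GFP_NOFS have no equivalent here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a kernel struct such as journal_t. */
struct toy_journal {
	unsigned long flags;
	int err;
	char name[16];
};

/* Old pattern: allocate, then clear by hand (kmalloc() + memset()). */
static struct toy_journal *alloc_then_clear(void)
{
	struct toy_journal *j = malloc(sizeof(*j));

	if (!j)
		return NULL;
	memset(j, 0, sizeof(*j));
	return j;
}

/* New pattern: one call that returns zeroed memory (the kzalloc() idea). */
static struct toy_journal *alloc_zeroed(void)
{
	return calloc(1, sizeof(struct toy_journal));
}
```

Both helpers return memory in which every byte is zero. The second simply folds the clearing into the allocator, which is all a kmalloc()+memset() to kzalloc() conversion does.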
Re: [PATCH 10/22] CacheFiles: Add a hook to write a single page of data to an inode
Trond Myklebust <[EMAIL PROTECTED]> wrote: > So why do you need a new address space operation? AFAICS the generic > implementation will work for pretty much everyone who supports the > existing prepare_write()/commit_write(). Because Christoph decreed that I wasn't allowed to call prepare_write() and commit_write() directly. It's possible that the method should be in the inode_operations rather than on the address space. > Furthermore, you don't appear to supply any alternative "optimised" > implementations... Optimised in what fashion? David
Re: [Announce] Linux-tiny project revival
On Fri, 2007-09-21 at 18:05 -0500, Rob Landley wrote: > > from printks and defining something that modifies pr_. > pr_level doesn't exist in mainline. pr_info and pr_debug do. pr_alert, pr_emerg, pr_crit, pr_err, and pr_warn could be added. > > #define pr_info(fmt, arg) printk(KERN_INFO PR_FMT fmt PR_ARG, ##arg) > Do we really need another layer of indirection? It'd make file/function/line cost-free for embedded use.
Re: [Announce] Linux-tiny project revival
On Friday 21 September 2007 12:45:27 pm Joe Perches wrote: > On Fri, 2007-09-21 at 13:16 -0400, [EMAIL PROTECTED] wrote: > > What about something *really* hardcore ugly like: > > #ifdef __FILE__ > > #undef __FILE__ > > #define __FILE__ "" > > #endif > > (or similar preprocessor blecherousness) if you want to *really* shrink > > that binary down? > > I prefer removing all __FILE__, __FUNCTION__, __LINE__ uses > from printks and defining something that modifies pr_. pr_level doesn't exist in mainline. > #define pr_info(fmt, arg) printk(KERN_INFO PR_FMT fmt PR_ARG, ##arg) Do we really need another layer of indirection? Rob -- "One of my most productive days was throwing away 1000 lines of code." - Ken Thompson.
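Joe's PR_FMT/PR_ARG idea can be tried outside the kernel. This is a minimal sketch that assumes GCC or Clang, because it relies on the GNU ##__VA_ARGS__ extension to drop the comma when no arguments follow the format. toy_printk(), toy_log, toy_probe() and the TOY_TINY switch are hypothetical, and "<6>" stands in for KERN_INFO.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical build knob: define TOY_TINY to drop the function/line prefix. */
#ifdef TOY_TINY
#define PR_FMT ""
#define PR_ARG
#else
#define PR_FMT "%s:%d: "
#define PR_ARG , __func__, __LINE__
#endif

static char toy_log[128];

/* Stand-in for printk(): format into a buffer so the result can be checked. */
#define toy_printk(fmt, ...) \
	snprintf(toy_log, sizeof(toy_log), fmt, ##__VA_ARGS__)

/* ##__VA_ARGS__ removes the preceding comma when no extra arguments are given. */
#define pr_info(fmt, ...) toy_printk("<6>" PR_FMT fmt PR_ARG, ##__VA_ARGS__)

static void toy_probe(void)
{
	pr_info("probe ok, irq=%d\n", 5);
}
```

Building with -DTOY_TINY turns PR_FMT into an empty string and PR_ARG into nothing. The call then no longer references the function name or line number, which is the cost-free embedded case Joe describes. Whether that justifies the extra layer of indirection Rob asks about is a separate question.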
RE: Megaraid driver not detecting RAID volumes in kernel 2.6.22?
> > https://bugzilla.redhat.com/show_bug.cgi?id=288421 > > When running Fedora on a Dell 2950 w/ integrated LSI Perc5i (megaraid), the > system will not boot after upgrading to 2.6.22. The boot message indicates the > system is somehow seeing through RAID, cannot access logical volume. This > causes the root device to be unavailable and the kernel to panic. > > Version-Release number of selected component (if applicable): > I experience this problem with kernel 2.6.22 and higher. I do not believe it is > isolated to FC6, as I downloaded the stock 2.6.22 kernel from kernel.org and was > able to reproduce. > > How reproducible: > Every time. > > Steps to Reproduce: > 1. Configure RAID10 (I've also tried RAID5) on a Perc5i in this system. > > 2. Load Fedora Core. The installer works fine since the kernel version it uses > has a working LSI driver. > > 3. Upgrade to 2.6.22 kernel image (in yum) or download kernel.org sources, > compile, and install. > > 4. Reboot system. It comes up unable to boot. The kernel panics. > > Actual results: > As the system boots, it cannot mount the root device. Also in the output we see > all 6 disks separately, when they should be showing up as one logical volume. Could a standard MPT driver (non-RAID) be loading on this controller? During the reboot, can you see the megaraid driver loading at all? Or do you see the mpt_scsi driver? Before upgrading, can you blacklist this controller in pci hotplug? I see shpchp in your screenshot. Sreenivas
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 11:20:48AM -0400, Steven Rostedt wrote: > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote: > > + > > +/* > > + * PREEMPT_RCU data structures. > > + */ > > + > > +#define GP_STAGES 4 > > +struct rcu_data { > > + spinlock_t lock; /* Protect rcu_data fields. */ > > + longcompleted; /* Number of last completed batch. */ > > + int waitlistcount; > > + struct tasklet_struct rcu_tasklet; > > + struct rcu_head *nextlist; > > + struct rcu_head **nexttail; > > + struct rcu_head *waitlist[GP_STAGES]; > > + struct rcu_head **waittail[GP_STAGES]; > > + struct rcu_head *donelist; > > + struct rcu_head **donetail; > > +#ifdef CONFIG_RCU_TRACE > > + struct rcupreempt_trace trace; > > +#endif /* #ifdef CONFIG_RCU_TRACE */ > > +}; > > +struct rcu_ctrlblk { > > + spinlock_t fliplock; /* Protect state-machine transitions. */ > > + longcompleted; /* Number of last completed batch. */ > > +}; > > +static DEFINE_PER_CPU(struct rcu_data, rcu_data); > > +static struct rcu_ctrlblk rcu_ctrlblk = { > > + .fliplock = SPIN_LOCK_UNLOCKED, > > + .completed = 0, > > +}; > > +static DEFINE_PER_CPU(int [2], rcu_flipctr) = { 0, 0 }; > > + > > +/* > > + * States for rcu_try_flip() and friends. > > + */ > > + > > +enum rcu_try_flip_states { > > + rcu_try_flip_idle_state,/* "I" */ > > + rcu_try_flip_waitack_state, /* "A" */ > > + rcu_try_flip_waitzero_state,/* "Z" */ > > + rcu_try_flip_waitmb_state /* "M" */ > > +}; > > +static enum rcu_try_flip_states rcu_try_flip_state = > > rcu_try_flip_idle_state; > > +#ifdef CONFIG_RCU_TRACE > > +static char *rcu_try_flip_state_names[] = > > + { "idle", "waitack", "waitzero", "waitmb" }; > > +#endif /* #ifdef CONFIG_RCU_TRACE */ > > [snip] > > > +/* > > + * If a global counter flip has occurred since the last time that we > > + * advanced callbacks, advance them. Hardware interrupts must be > > + * disabled when calling this function. 
> > + */ > > +static void __rcu_advance_callbacks(struct rcu_data *rdp) > > +{ > > + int cpu; > > + int i; > > + int wlc = 0; > > + > > + if (rdp->completed != rcu_ctrlblk.completed) { > > + if (rdp->waitlist[GP_STAGES - 1] != NULL) { > > + *rdp->donetail = rdp->waitlist[GP_STAGES - 1]; > > + rdp->donetail = rdp->waittail[GP_STAGES - 1]; > > + RCU_TRACE_RDP(rcupreempt_trace_move2done, rdp); > > + } > > + for (i = GP_STAGES - 2; i >= 0; i--) { > > + if (rdp->waitlist[i] != NULL) { > > + rdp->waitlist[i + 1] = rdp->waitlist[i]; > > + rdp->waittail[i + 1] = rdp->waittail[i]; > > + wlc++; > > + } else { > > + rdp->waitlist[i + 1] = NULL; > > + rdp->waittail[i + 1] = > > + &rdp->waitlist[i + 1]; > > + } > > + } > > + if (rdp->nextlist != NULL) { > > + rdp->waitlist[0] = rdp->nextlist; > > + rdp->waittail[0] = rdp->nexttail; > > + wlc++; > > + rdp->nextlist = NULL; > > + rdp->nexttail = &rdp->nextlist; > > + RCU_TRACE_RDP(rcupreempt_trace_move2wait, rdp); > > + } else { > > + rdp->waitlist[0] = NULL; > > + rdp->waittail[0] = &rdp->waitlist[0]; > > + } > > + rdp->waitlistcount = wlc; > > + rdp->completed = rcu_ctrlblk.completed; > > + } > > + > > + /* > > +* Check to see if this CPU needs to report that it has seen > > +* the most recent counter flip, thereby declaring that all > > +* subsequent rcu_read_lock() invocations will respect this flip. > > +*/ > > + > > + cpu = raw_smp_processor_id(); > > + if (per_cpu(rcu_flip_flag, cpu) == rcu_flipped) { > > + smp_mb(); /* Subsequent counter accesses must see new value */ > > + per_cpu(rcu_flip_flag, cpu) = rcu_flip_seen; > > + smp_mb(); /* Subsequent RCU read-side critical sections */ > > + /* seen -after- acknowledgement. */ > > + } > > +} > > [snip] > > > +/* > > + * Attempt a single flip of the counters. Remember, a single flip does > > + * -not- constitute a grace period. Instead, the interval between > > + * at least three consecutive flips is a grace period. 
> > + * > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation > > + * on a large SMP, they might want to use a hierarchical organization of > > + * the per-CPU-counter pairs. > > + */ > > +static void rcu_try_flip(void) > > +{ > > + unsigned long oldirq; > > + > > + RCU_TRACE_ME(rcupreempt_trace_try_flip_1); > > + if (unlikely(!spin_trylock_irqsave(&rcu_ctrlblk.fliplock, oldirq))) {
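The list shuffling in __rcu_advance_callbacks() can be exercised outside the kernel. This is a minimal sketch that assumes a single CPU and leaves out locking, tracing, waitlistcount and flip acknowledgement. The field names mirror the patch's struct rcu_data; the toy_* types and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define GP_STAGES 4

struct toy_head {
	struct toy_head *next;
	int id;
};

struct toy_rcu_data {
	struct toy_head *nextlist, **nexttail;
	struct toy_head *waitlist[GP_STAGES], **waittail[GP_STAGES];
	struct toy_head *donelist, **donetail;
};

/* Every list starts empty, with its tail pointing at its own head. */
static void toy_init(struct toy_rcu_data *rdp)
{
	int i;

	rdp->nextlist = NULL;
	rdp->nexttail = &rdp->nextlist;
	for (i = 0; i < GP_STAGES; i++) {
		rdp->waitlist[i] = NULL;
		rdp->waittail[i] = &rdp->waitlist[i];
	}
	rdp->donelist = NULL;
	rdp->donetail = &rdp->donelist;
}

/* O(1) append to the "next" list through its tail pointer. */
static void toy_enqueue(struct toy_rcu_data *rdp, struct toy_head *h)
{
	h->next = NULL;
	*rdp->nexttail = h;
	rdp->nexttail = &h->next;
}

/* One counter flip: oldest wait stage -> done, shift stages, next -> wait[0]. */
static void toy_advance(struct toy_rcu_data *rdp)
{
	int i;

	if (rdp->waitlist[GP_STAGES - 1] != NULL) {
		*rdp->donetail = rdp->waitlist[GP_STAGES - 1];
		rdp->donetail = rdp->waittail[GP_STAGES - 1];
	}
	for (i = GP_STAGES - 2; i >= 0; i--) {
		if (rdp->waitlist[i] != NULL) {
			rdp->waitlist[i + 1] = rdp->waitlist[i];
			rdp->waittail[i + 1] = rdp->waittail[i];
		} else {
			/* Empty stage: reset the tail to the new slot's own head. */
			rdp->waitlist[i + 1] = NULL;
			rdp->waittail[i + 1] = &rdp->waitlist[i + 1];
		}
	}
	if (rdp->nextlist != NULL) {
		rdp->waitlist[0] = rdp->nextlist;
		rdp->waittail[0] = rdp->nexttail;
		rdp->nextlist = NULL;
		rdp->nexttail = &rdp->nextlist;
	} else {
		rdp->waitlist[0] = NULL;
		rdp->waittail[0] = &rdp->waitlist[0];
	}
}
```

A tail pointer is only copied from a non-empty list, so it always points at a node's next field rather than at another slot's head. This is why the patch resets empty slots explicitly instead of copying their tails.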
Re: [PATCH 07/25] r/o bind mounts: elevate write count for some ioctls
On Thu, 20 Sep 2007 12:52:57 -0700 Dave Hansen <[EMAIL PROTECTED]> wrote: > Some ioctl()s can cause writes to the filesystem. Take > these, and make them use mnt_want/drop_write() instead. > > We need to pass the filp one layer deeper in XFS, but > somebody _just_ pulled it out in February because nobody > was using it, so I don't feel guilty for adding it back. Note that -mm's ext2-reservations.patch adds EXT2_IOC_SETRSVSZ, and it doesn't do mnt_want_write().
Re: Linux 2.6.22.7
diff --git a/Makefile b/Makefile index 3067f6a..12edea0 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 2 PATCHLEVEL = 6 SUBLEVEL = 22 -EXTRAVERSION = .6 +EXTRAVERSION = .7 NAME = Holy Dancing Manatees, Batman! # *DOCUMENTATION* diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S index 47565c3..0bc623a 100644 --- a/arch/x86_64/ia32/ia32entry.S +++ b/arch/x86_64/ia32/ia32entry.S @@ -38,6 +38,18 @@ movq%rax,R8(%rsp) .endm + .macro LOAD_ARGS32 offset + movl \offset(%rsp),%r11d + movl \offset+8(%rsp),%r10d + movl \offset+16(%rsp),%r9d + movl \offset+24(%rsp),%r8d + movl \offset+40(%rsp),%ecx + movl \offset+48(%rsp),%edx + movl \offset+56(%rsp),%esi + movl \offset+64(%rsp),%edi + movl \offset+72(%rsp),%eax + .endm + .macro CFI_STARTPROC32 simple CFI_STARTPROC \simple CFI_UNDEFINED r8 @@ -152,7 +164,7 @@ sysenter_tracesys: movq$-ENOSYS,RAX(%rsp) /* really needed? */ movq%rsp,%rdi/* &pt_regs -> arg1 */ callsyscall_trace_enter - LOAD_ARGS ARGOFFSET /* reload args from stack in case ptrace changed it */ + LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */ RESTORE_REST movl%ebp, %ebp /* no need to do an access_ok check here because rbp has been @@ -255,7 +267,7 @@ cstar_tracesys: movq $-ENOSYS,RAX(%rsp) /* really needed? */ movq %rsp,%rdi/* &pt_regs -> arg1 */ call syscall_trace_enter - LOAD_ARGS ARGOFFSET /* reload args from stack in case ptrace changed it */ + LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */ RESTORE_REST movl RSP-ARGOFFSET(%rsp), %r8d /* no need to do an access_ok check here because r8 has been @@ -333,7 +345,7 @@ ia32_tracesys: movq $-ENOSYS,RAX(%rsp) /* really needed? 
*/ movq %rsp,%rdi/* &pt_regs -> arg1 */ call syscall_trace_enter - LOAD_ARGS ARGOFFSET /* reload args from stack in case ptrace changed it */ + LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */ RESTORE_REST jmp ia32_do_syscall END(ia32_syscall) diff --git a/arch/x86_64/kernel/ptrace.c b/arch/x86_64/kernel/ptrace.c index 9409117..8d89d8c 100644 --- a/arch/x86_64/kernel/ptrace.c +++ b/arch/x86_64/kernel/ptrace.c @@ -223,10 +223,6 @@ static int putreg(struct task_struct *child, { unsigned long tmp; - /* Some code in the 64bit emulation may not be 64bit clean. - Don't take any chances. */ - if (test_tsk_thread_flag(child, TIF_IA32)) - value &= 0x; switch (regno) { case offsetof(struct user_regs_struct,fs): if (value && (value & 3) != 3)
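Why zero extension matters can be sketched with a toy dispatcher. The CVE-2007-4573 description refers to an out-of-bounds system call table access through %rax after ptrace. This model assumes a range check on the low 32 bits and a table index taken from the full 64-bit register; TOY_NR_SYSCALLS and both helpers are hypothetical. The patch above drops the mask in putreg() and instead reloads the saved arguments with movl in LOAD_ARGS32. On x86-64 a 32-bit register write clears the upper 32 bits, which toy_reload32() mimics.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_SYSCALLS 300u   /* hypothetical table size */

/* Toy dispatcher: the range check looks only at the low 32 bits, but the
 * table index uses the full 64-bit value. Returns 1 if the lookup stays
 * inside the table or is rejected, 0 if it would run past the end. */
static int toy_lookup_in_bounds(uint64_t rax)
{
	if ((uint32_t)rax >= TOY_NR_SYSCALLS)
		return 1;                    /* rejected: no table access */
	return rax < TOY_NR_SYSCALLS;    /* the 64-bit index actually used */
}

/* A 32-bit move into a 64-bit register (movl) clears the upper half. */
static uint64_t toy_reload32(uint64_t saved)
{
	return (uint32_t)saved;
}
```

A value such as 2^32 + 5, planted by a ptrace caller, passes the 32-bit check but indexes far past the table. Once the value has been reloaded through a 32-bit move, it is just 5.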
Re: [Announce] Linux-tiny project revival
On Sep 21, 2007, at 18:05:34, Joe Perches wrote: On Fri, 2007-09-21 at 17:34 -0400, Kyle Moffett wrote: With a bit more glue that would cause GCC to notice that for a given qprintk_kmalloc the "qpk->type" is always zero because the level is too high, and therefore it would optimize out *ALL* of the _qprintk_kmalloc(), _qprintk(), and _qprintk_finish() calls. A negative is that lockup conditions swallow partial messages. But typically you don't care if a "partial line" gets swallowed regardless. The only reason people really use partial lines is when they're accumulating a variable number of things into a single line and so a single printk() won't do, and in that case it's really not a problem to "lose" the first half of the line in the event of a crash. And hell, if it matters that much you could just make the qprintk_{kmalloc,percpu,irq} functions chain the qpk variables on a little linked list and stuff an smp_wmb() in the _gprint() function after writing the text and before writing the size. That way any panic could very carefully look at the messages being queued during the crash and attempt to write out partial buffers. It's a technique which in combination with looking at the first 3 characters of the arguments to printk() would let you elide 99% of the non-critical printks pretty easily while only needing to change the much smaller proportion of the printk()s which are partial lines. Furthermore it's pretty easy to grep for the partial-line printk()s and you can even have it emit warnings when you hit a partial-line printk() (it doesn't start with "<"[0-9]">") in -mm to help fix up the last few users and keep people from adding new ones. Cheers, Kyle Moffett
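Kyle's ordering rule is: write the text, then smp_wmb(), then write the size. It can be sketched with C11 atomics. A release store of the size, paired with an acquire load on the reader side, gives at least the ordering this pattern needs; a kernel reader would also need a read barrier such as smp_rmb() between loading the size and reading the text. All names here are hypothetical (struct toy_qpk, toy_publish(), toy_salvage()). The sketch assumes each buffer is published once and does not model the linked list of in-flight buffers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical queued-printk buffer: the text is written first and its
 * length is published last. */
struct toy_qpk {
	char text[128];
	atomic_size_t size;   /* 0 until the text has been published */
};

static void toy_qpk_init(struct toy_qpk *q)
{
	memset(q->text, 0, sizeof(q->text));
	atomic_init(&q->size, 0);
}

/* Writer: copy the text, then publish its length with release ordering,
 * so any reader that sees the new size also sees the text. */
static void toy_publish(struct toy_qpk *q, const char *s)
{
	size_t n = strlen(s);

	if (n >= sizeof(q->text))
		n = sizeof(q->text) - 1;   /* truncate rather than overflow */
	memcpy(q->text, s, n);
	q->text[n] = '\0';
	atomic_store_explicit(&q->size, n, memory_order_release);
}

/* Crash-time reader: load the length with acquire ordering, then copy only
 * that many bytes. A size of 0 means nothing has been published yet. */
static size_t toy_salvage(struct toy_qpk *q, char *out, size_t outlen)
{
	size_t n = atomic_load_explicit(&q->size, memory_order_acquire);

	assert(outlen > 0);
	if (n >= outlen)
		n = outlen - 1;
	memcpy(out, q->text, n);
	out[n] = '\0';
	return n;
}
```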
Re: [PATCH] [50/50] x86_64: Remove fpu io port resource
Andi Kleen wrote: Not needed on modern systems without external FPU TBD on i386 it is only needed for true 386s. Could remove it there TBD for >= 486 Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> --- arch/x86_64/kernel/setup.c |2 -- 1 file changed, 2 deletions(-) Index: linux/arch/x86_64/kernel/setup.c === --- linux.orig/arch/x86_64/kernel/setup.c +++ linux/arch/x86_64/kernel/setup.c @@ -121,8 +121,6 @@ struct resource standard_io_resources[] .flags = IORESOURCE_BUSY | IORESOURCE_IO }, { .name = "dma2", .start = 0xc0, .end = 0xdf, .flags = IORESOURCE_BUSY | IORESOURCE_IO }, - { .name = "fpu", .start = 0xf0, .end = 0xff, - .flags = IORESOURCE_BUSY | IORESOURCE_IO } Since we are merging x86 and x86-64, I think it would be nice at least to CC Thomas on patches that increase 32/64-bit differences... because won't this patch have to be partially undone when we merge i386 and x86-64? Jeff
Linux 2.6.22.7
We (the -stable team) are announcing the release of the 2.6.22.7 kernel. It contains a single security bugfix for the x86_64 architecture. There is potential for local privilege escalation, so all x86_64 users are certainly encouraged to upgrade. CVE-2007-4573: x86_64: Zero extend all registers after ptrace in 32bit entry path. I'll also be replying to this message with a copy of the patch between 2.6.22.6 and 2.6.22.7. The updated 2.6.22.y git tree can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.22.y.git and can be browsed at the normal kernel.org git web browser: http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.22.y.git;a=summary thanks, -chris Makefile |2 +- arch/x86_64/ia32/ia32entry.S | 18 +++--- arch/x86_64/kernel/ptrace.c |4 3 files changed, 16 insertions(+), 8 deletions(-) Summary of changes from v2.6.22.6 to v2.6.22.7 == Andi Kleen (1): x86_64: Zero extend all registers after ptrace in 32bit entry path. Chris Wright (1): Linux 2.6.22.7
Re: [PATCH] [20/45] x86_64: Use 8 byte stack alignment when possible
On Sat, Sep 22, 2007 at 12:34:31AM +0200, Andi Kleen wrote: > On Friday 21 September 2007 23:13, Dave Jones wrote: > > On Fri, Sep 21, 2007 at 10:45:02PM +0200, Andi Kleen wrote: > > > Kernel doesn't use SSE2, so it doesn't need 16 byte alignment. Also > > > the stack can be already unaligned so letting the compiler align > > > is useless. This may make some stack frames smaller. > > > Only works with very recent gcc 4.3 > > > > My gcc 4.1.2 from Fedora 7 (with who knows what backported) > > references this in its manpage. How was it broken before 4.3 ? > > Try it. It is rejected by the compiler in 64bit mode. Ah yes, it fails if not between 4 & 12, but the cc-option call catches that. Looks fine to me. Dave -- http://www.codemonkey.org.uk
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
Denys Vlasenko wrote: On Friday 21 September 2007 20:33, Krzysztof Oledzki wrote: On Fri, 21 Sep 2007, Denys Vlasenko wrote: On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote: On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said: I plan to use gzip compression on following drivers' firmware, if patches will be accepted: text data bss dec hex filename 17653 109968 240 127861 1f375 drivers/net/acenic.o 6628 120448 4 127080 1f068 drivers/net/dgrs.o ^^ Should this be redone to use the existing firmware loading framework to load the firmware instead? Not in every case. For example, bnx2 maintainer says that driver and firmware are closely tied for his driver. IOW: you upgrade kernel and your NIC is not working anymore. Firmware may come with a kernel. We have a "install modules", we can also add "install firmware". Install where? I boot my machine over NFS, and it has no hard drive. Special cases already fail when using distro-linked targets like "make install." Jeff
Re: [PATCH 1/1] x86: Convert cpuinfo_x86 array to a per_cpu array v2
On Thu, 20 Sep 2007 14:30:05 -0700 [EMAIL PROTECTED] wrote: > cpu_data is currently an array defined using NR_CPUS. This means that > we overallocate since we will rarely really use maximum configured cpus. > When NR_CPU count is raised to 4096 the size of cpu_data becomes > 3,145,728 bytes. This has at least three quite obvious and careless compilation errors. Please at least compile the code after you've altered it.
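The quoted figure implies about 768 bytes per cpuinfo_x86 entry (3,145,728 / 4096 = 768). Below is a hypothetical sketch of the sizing argument. Real per_cpu data lives in per-CPU areas set up at boot rather than in a calloc()'d array, and the 768-byte size is only inferred from the numbers in the mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS 4096       /* configured maximum quoted in the mail */
#define TOY_CPUINFO_SIZE 768   /* inferred: 3,145,728 / 4096 */

/* Stand-in for struct cpuinfo_x86; only its size matters here. */
struct toy_cpuinfo {
	unsigned char bytes[TOY_CPUINFO_SIZE];
};

/* NR_CPUS-sized static array: pays for every configurable CPU. */
static size_t toy_static_bytes(void)
{
	return (size_t)TOY_NR_CPUS * sizeof(struct toy_cpuinfo);
}

/* Allocate only for CPUs that can actually exist on this machine. */
static struct toy_cpuinfo *toy_alloc_possible(unsigned int nr_possible,
					      size_t *bytes)
{
	assert(nr_possible > 0 && nr_possible <= TOY_NR_CPUS);
	*bytes = (size_t)nr_possible * sizeof(struct toy_cpuinfo);
	return calloc(nr_possible, sizeof(struct toy_cpuinfo));
}
```

On a 4-CPU machine, the per-possible-CPU layout needs 3,072 bytes instead of the full 3 MiB.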
Re: [PATCH RFC 3/9] RCU: Preemptible RCU
On Fri, Sep 21, 2007 at 06:31:12PM -0400, Steven Rostedt wrote: > On Fri, Sep 21, 2007 at 05:46:53PM +0200, Peter Zijlstra wrote: > > On Fri, 21 Sep 2007 10:40:03 -0400 Steven Rostedt <[EMAIL PROTECTED]> > > wrote: > > > > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote: > > > > > > > Can you have a pointer somewhere that explains these states. And not a > > > "it's in this paper or directory". Either have a short discription here, > > > or specify where exactly to find the information (perhaps a > > > Documentation/RCU/preemptible_states.txt?). > > > > > > Trying to understand these states has caused me the most agony in > > > reviewing these patches. > > > > > > > + */ > > > > + > > > > +enum rcu_try_flip_states { > > > > + rcu_try_flip_idle_state,/* "I" */ > > > > + rcu_try_flip_waitack_state, /* "A" */ > > > > + rcu_try_flip_waitzero_state,/* "Z" */ > > > > + rcu_try_flip_waitmb_state /* "M" */ > > > > +}; > > > > I thought the 4 flip states corresponded to the 4 GP stages, but now > > you confused me. It seems to indeed progress one stage for every 4 flip > > states. 
> > I'm still confused ;-) If you do a synchronize_rcu() it might well have to wait through the following sequence of states: Stage 0: (might have to wait through part of this to get out of "next" queue) rcu_try_flip_idle_state,/* "I" */ rcu_try_flip_waitack_state, /* "A" */ rcu_try_flip_waitzero_state,/* "Z" */ rcu_try_flip_waitmb_state /* "M" */ Stage 1: rcu_try_flip_idle_state,/* "I" */ rcu_try_flip_waitack_state, /* "A" */ rcu_try_flip_waitzero_state,/* "Z" */ rcu_try_flip_waitmb_state /* "M" */ Stage 2: rcu_try_flip_idle_state,/* "I" */ rcu_try_flip_waitack_state, /* "A" */ rcu_try_flip_waitzero_state,/* "Z" */ rcu_try_flip_waitmb_state /* "M" */ Stage 3: rcu_try_flip_idle_state,/* "I" */ rcu_try_flip_waitack_state, /* "A" */ rcu_try_flip_waitzero_state,/* "Z" */ rcu_try_flip_waitmb_state /* "M" */ Stage 4: rcu_try_flip_idle_state,/* "I" */ rcu_try_flip_waitack_state, /* "A" */ rcu_try_flip_waitzero_state,/* "Z" */ rcu_try_flip_waitmb_state /* "M" */ So yes, grace periods do indeed have some latency. > > Hmm, now I have to puzzle how these 4 stages are required by the lock > > and unlock magic. > > > > > > +/* > > > > + * Return the number of RCU batches processed thus far. Useful for > > > > debug > > > > + * and statistics. The _bh variant is identical to straight RCU. > > > > + */ > > > > > > If they are identical, then why the separation? > > > > I guess a smaller RCU domain makes for quicker grace periods. > > No, I mean that both the rcu_batches_completed and > rcu_batches_completed_bh are identical. Perhaps we can just put in a > > #define rcu_batches_completed_bh rcu_batches_completed > > in rcupreempt.h. In rcuclassic, they are different. But no need to have > two identical functions in the preempt version. A macro should do. Ah!!! Good point, #define does make sense here. 
> > > > +void __rcu_read_lock(void) > > > > +{ > > > > + int idx; > > > > + struct task_struct *me = current; > > > > > > Nitpick, but other places in the kernel usually use "t" or "p" as a > > > variable to assign current to. It's just that "me" thows me off a > > > little while reviewing this. But this is just a nitpick, so do as you > > > will. > > > > struct task_struct *curr = current; > > > > is also not uncommon. > > True, but the "me" confused me. Since that task struct is not me ;-) Well, who is it, then? ;-) > > > > + int nesting; > > > > + > > > > + nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting); > > > > + if (nesting != 0) { > > > > + > > > > + /* An earlier rcu_read_lock() covers us, just count it. > > > > */ > > > > + > > > > + me->rcu_read_lock_nesting = nesting + 1; > > > > + > > > > + } else { > > > > + unsigned long oldirq; > > > > > > > + > > > > + /* > > > > +* Disable local interrupts to prevent the grace-period > > > > +* detection state machine from seeing us half-done. > > > > +* NMIs can still occur, of course, and might themselves > > > > +* contain rcu_read_lock(). > > > > +*/ > > > > + > > > > + local_irq_save(oldirq); > > > > > > Isn't the GP detection done via a tasklet/softirq. So wouldn't a > > > local_bh_disable be sufficient here? You already cover NMIs, which would > > > also handle normal interrupts. > > > > This is also my understanding, but I think this disable is an > > 'optimization' in that it avoids the regular IRQs from jumping through > > these hoops outlined
Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu
On Sat, Sep 22, 2007 at 12:32:02AM +0200, Andi Kleen wrote:
> + Select this for:
> +Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
> +-Willamette
> +-Northwood
> +-Mobile Pentium 4
> +-Mobile Pentium 4 M
> +-Extreme Edition (Gallatin)
> +-Prescott
> +-Prescott 2M
> +-Cedar Mill
> +-Presler
> +-Smithfiled
> +Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
> +-Foster
> +-Prestonia
> +-Gallatin
> +-Nocona
> +-Irwindale
> +-Cranford
> +-Potomac
> +-Paxville
> +-Dempsey

This seems like yet another list that will need to be perpetually kept up to date, and given 99% of users don't know the codename of their core, just the marketing name, I question its value.

> + more info: http://balusc.xs4all.nl/srv/har-cpu.html

This URL is dead already.

> config MPSC
> bool "Intel P4 / older Netburst based Xeon"
> help

Side note: I always wondered what 'PSC' stood for?

Dave

-- 
http://www.codemonkey.org.uk
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
> According to an earlier thread, dgrs was never really maintained,
> written for hardware that was never really distributed widely, and very
> likely hasn't had users in years... if ever.
>
> If that picture is accurate (it's a story I was told), then I am
> definitely queueing up a deletion patch.

I think that's sensible. If someone whines, it can be put back, but I really don't think anyone will.
Re: [PATCH] [6/50] i386: clean up oops/bug reports
On 09/21/2007 06:32 PM, Andi Kleen wrote:
> From: Pavel Emelyanov <[EMAIL PROTECTED]>
>
> Typically the oops first lines look like this:
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address
> printing eip:
> c049dfbd
> *pde =
> Oops: 0002 [#1]
> PREEMPT SMP
> ...
>
> Such output is gained with some ugly if (!nl) printk("\n"); code and
> besides being a waste of lines, this is also annoying to read. The
> following output looks better (and it is how it looks on x86_64):
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address
> printing eip: c049dfbd *pde =
> Oops: 0002 [#1] PREEMPT SMP
> ...
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Reviewed-by: Chuck Ebbert <[EMAIL PROTECTED]>

>
> ---
>
>  arch/i386/kernel/traps.c |   16
>  arch/i386/mm/fault.c     |   13 +++--
>  2 files changed, 11 insertions(+), 18 deletions(-)
>
> Index: linux/arch/i386/kernel/traps.c
> ===
> --- linux.orig/arch/i386/kernel/traps.c
> +++ linux/arch/i386/kernel/traps.c
> @@ -444,31 +444,23 @@ void die(const char * str, struct pt_reg
>  	local_save_flags(flags);
>  
>  	if (++die.lock_owner_depth < 3) {
> -		int nl = 0;
>  		unsigned long esp;
>  		unsigned short ss;
>  
>  		report_bug(regs->eip, regs);
>  
> -		printk(KERN_EMERG "%s: %04lx [#%d]\n", str, err & 0x, ++die_counter);
> +		printk(KERN_EMERG "%s: %04lx [#%d] ", str, err & 0x, ++die_counter);
>  #ifdef CONFIG_PREEMPT
> -		printk(KERN_EMERG "PREEMPT ");
> -		nl = 1;
> +		printk("PREEMPT ");
>  #endif
>  #ifdef CONFIG_SMP
> -		if (!nl)
> -			printk(KERN_EMERG);
>  		printk("SMP ");
> -		nl = 1;
>  #endif
>  #ifdef CONFIG_DEBUG_PAGEALLOC
> -		if (!nl)
> -			printk(KERN_EMERG);
>  		printk("DEBUG_PAGEALLOC");
> -		nl = 1;
>  #endif
> -		if (nl)
> -			printk("\n");
> +		printk("\n");
> +
>  		if (notify_die(DIE_OOPS, str, regs, err,
>  					current->thread.trap_no, SIGSEGV) !=
>  				NOTIFY_STOP) {
> Index: linux/arch/i386/mm/fault.c
> ===
> --- linux.orig/arch/i386/mm/fault.c
> +++ linux/arch/i386/mm/fault.c
> @@ -544,23 +544,22 @@ no_context:
>  		printk(KERN_ALERT "BUG: unable to handle kernel paging"
>  					" request");
>  	printk(" at virtual address %08lx\n",address);
> -	printk(KERN_ALERT " printing eip:\n");
> -	printk("%08lx\n", regs->eip);
> +	printk(KERN_ALERT "printing eip: %08lx ", regs->eip);
>  
>  	page = read_cr3();
>  	page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
>  #ifdef CONFIG_X86_PAE
> -	printk(KERN_ALERT "*pdpt = %016Lx\n", page);
> +	printk("*pdpt = %016Lx ", page);
>  	if ((page >> PAGE_SHIFT) < max_low_pfn
>  	    && page & _PAGE_PRESENT) {
>  		page &= PAGE_MASK;
>  		page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
>  		                                         & (PTRS_PER_PMD - 1)];
> -		printk(KERN_ALERT "*pde = %016Lx\n", page);
> +		printk(KERN_ALERT "*pde = %016Lx ", page);
>  		page &= ~_PAGE_NX;
>  	}
>  #else
> -	printk(KERN_ALERT "*pde = %08lx\n", page);
> +	printk("*pde = %08lx ", page);
>  #endif
>  
>  	/*
> @@ -574,8 +573,10 @@ no_context:
>  		page &= PAGE_MASK;
>  		page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
>  		                                         & (PTRS_PER_PTE - 1)];
> -		printk(KERN_ALERT "*pte = %0*Lx\n", sizeof(page)*2, (u64)page);
> +		printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
>  	}
> +
> +	printk("\n");
>  }
>  
>  	tsk->thread.cr2 = address;
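To see what the traps.c change buys, here is a user-space sketch (not the kernel code) that assembles the same header into a buffer. The int flags are hypothetical stand-ins for the CONFIG_* #ifdefs, and the error code is masked to 16 bits to match the `%04lx` format.

```c
#include <stdio.h>
#include <string.h>

/* Appends s to buf without overflowing a buffer of total size len. */
static void append(char *buf, size_t len, const char *s)
{
	size_t used = strlen(buf);

	if (used + 1 < len)
		strncat(buf, s, len - used - 1);
}

/* Formats the oops header the way the patched die() emits it: the code,
 * the counter and every enabled option share one line, each piece ends
 * in a space, and a single newline closes the line, so no "nl"
 * bookkeeping is needed. preempt/smp/debug_pagealloc are hypothetical
 * stand-ins for CONFIG_PREEMPT, CONFIG_SMP and CONFIG_DEBUG_PAGEALLOC. */
static void format_oops_header(char *buf, size_t len, const char *str,
			       unsigned long err, int counter,
			       int preempt, int smp, int debug_pagealloc)
{
	if (len == 0)
		return;
	snprintf(buf, len, "%s: %04lx [#%d] ", str, err & 0xffffUL, counter);
	if (preempt)
		append(buf, len, "PREEMPT ");
	if (smp)
		append(buf, len, "SMP ");
	if (debug_pagealloc)
		append(buf, len, "DEBUG_PAGEALLOC");
	append(buf, len, "\n");
}
```

With PREEMPT and SMP set, it produces `Oops: 0002 [#1] PREEMPT SMP ` followed by a single newline, the one-line form shown in the changelog.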
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Friday 21 September 2007 20:33, Krzysztof Oledzki wrote:
>
> On Fri, 21 Sep 2007, Denys Vlasenko wrote:
>
> > On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:
> >> On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:
> >>
> >>> I plan to use gzip compression on following drivers' firmware,
> >>> if patches will be accepted:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>>   17653  109968     240  127861   1f375 drivers/net/acenic.o
> >>>    6628  120448       4  127080   1f068 drivers/net/dgrs.o
> >>>          ^^
> >>
> >> Should this be redone to use the existing firmware loading framework to
> >> load the firmware instead?
> >
> > Not in every case.
> >
> > For example, bnx2 maintainer says that driver and
> > firmware are closely tied for his driver. IOW: you upgrade kernel
> > and your NIC is not working anymore.
>
> Firmware may come with a kernel. We have a "install modules", we can also
> add "install firmware".

Install where? I boot my machine over NFS, and it has no hard drive.

> > Another argument is to make kernel be able to bring up NICs
> > without needing firmware images in initramfs/initrd/hard drive.
>
> It is not possible to bring up things like FC or WiFi without firmware,
> what special is in classic NICs?

Nothing. It is just not (yet?) decreed from The Very Top that all and every firmware image should be loaded using request_firmware().

Also, people may want to gzip something other than firmware.
--
vda
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Friday 21 September 2007 21:13, Andi Kleen wrote:
> Denys Vlasenko <[EMAIL PROTECTED]> writes:
> >
> > I plan to use gzip compression on following drivers' firmware,
> > if patches will be accepted:
> >
> >    text    data     bss     dec     hex filename
> >   17653  109968     240  127861   1f375 drivers/net/acenic.o
> >    6628  120448       4  127080   1f068 drivers/net/dgrs.o
> >          ^^
>
> Just change the makefiles to always install gzip'ed modules
> modutils knows how to unzip them on the fly.

But I compile net/* into bzImage. I like netbooting :)
--
vda
RE: Message codes (Re: [Announce] Linux-tiny project revival)
>-----Original Message-----
>From: Joe Perches [mailto:[EMAIL PROTECTED]
>Sent: Friday, September 21, 2007 3:33 PM
>To: Gross, Mark
>Cc: Rob Landley; Oleg Verych; Alexey Dobriyan; Michael Opdenacker; linux-[EMAIL PROTECTED]; CE Linux Developers List; linux kernel
>Subject: RE: Message codes (Re: [Announce] Linux-tiny project revival)
>
>On Fri, 2007-09-21 at 15:12 -0700, Gross, Mark wrote:
>> Use compiler tricks to remove ALL the static printk string from
>> the kernel and replace the printk with something that outputs a
>> decimal index followed by tuples, of zero to N, hex-strings on
>
>> I proposed a mechanism for keeping all the printk data and saving space
>> by doing some table based compressions that has the side effect of
>> making the syslog not human readable. You proposed a mechanism for
>> no-oping out complete log-levels.
>
>How about compiler tricks to compress the static printk strings?
>These could be expanded at runtime to use as the format.

You would have to hold the text table (compressed) in memory to do this at run time. That would still be a pretty large hunk of memory.

>
>Timothy Miller suggested something similar awhile ago.
RE: Message codes (Re: [Announce] Linux-tiny project revival)
On Fri, 2007-09-21 at 15:12 -0700, Gross, Mark wrote:
> Use compiler tricks to remove ALL the static printk string from
> the kernel and replace the printk with something that outputs a
> decimal index followed by tuples, of zero to N, hex-strings on

> I proposed a mechanism for keeping all the printk data and saving space
> by doing some table based compressions that has the side effect of
> making the syslog not human readable. You proposed a mechanism for
> no-oping out complete log-levels.

How about compiler tricks to compress the static printk strings?
These could be expanded at runtime to use as the format.

Timothy Miller suggested something similar awhile ago.
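The message-code idea described above (a decimal index followed by hex arguments) can be sketched as an encoder plus a host-side decoder. Everything here is hypothetical: the table, the fixed pair of arguments (real printk calls take zero to N) and the function names are illustrations only.

```c
#include <stdio.h>

/* Hypothetical message table. Under the message-code idea only a
 * host-side decoder would carry these strings; the kernel image would
 * emit just the index and the arguments. */
static const char *const msg_formats[] = {
	"eth%d: link up, %d Mbps\n",
	"eth%d: link down\n",
};
#define NR_MSG_FORMATS (sizeof(msg_formats) / sizeof(msg_formats[0]))

/* Device side: a decimal index followed by hex-string arguments.
 * Two arguments are used only to keep the sketch short. */
static int emit_coded(char *buf, size_t len, unsigned idx,
		      unsigned a, unsigned b)
{
	return snprintf(buf, len, "%u %x %x", idx, a, b);
}

/* Host side: parse the coded record and expand it with the real format.
 * Surplus arguments are ignored by formats that use fewer. */
static int decode_coded(const char *coded, char *out, size_t len)
{
	unsigned idx, a, b;

	if (sscanf(coded, "%u %x %x", &idx, &a, &b) != 3 ||
	    idx >= NR_MSG_FORMATS)
		return -1;
	return snprintf(out, len, msg_formats[idx], (int)a, (int)b);
}
```

The sketch also makes the run-time objection concrete: `decode_coded()` needs `msg_formats` in memory, which is exactly the space the scheme tries to save, so the table would have to live on whatever machine reads the log.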
Re: [PATCH] [20/45] x86_64: Use 8 byte stack alignment when possible
On Friday 21 September 2007 23:13, Dave Jones wrote:
> On Fri, Sep 21, 2007 at 10:45:02PM +0200, Andi Kleen wrote:
> > Kernel doesn't use SSE2, so it doesn't need 16 byte alignment. Also
> > the stack can be already unaligned so letting the compiler align
> > is useless. This may make some stack frames smaller.
> > Only works with very recent gcc 4.3
>
> My gcc 4.1.2 from Fedora 7 (with who knows what backported)
> references this in its manpage. How was it broken before 4.3 ?

Try it. It is rejected by the compiler in 64bit mode.

-Andi
[PATCH] [50/50] x86_64: Remove fpu io port resource
Not needed on modern systems without external FPU

TBD on i386 it is only needed for true 386s. Could remove it there
TBD for >= 486

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/setup.c |    2 --
 1 file changed, 2 deletions(-)

Index: linux/arch/x86_64/kernel/setup.c
===
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -121,8 +121,6 @@ struct resource standard_io_resources[]
 		.flags = IORESOURCE_BUSY | IORESOURCE_IO },
 	{ .name = "dma2", .start = 0xc0, .end = 0xdf,
 		.flags = IORESOURCE_BUSY | IORESOURCE_IO },
-	{ .name = "fpu", .start = 0xf0, .end = 0xff,
-		.flags = IORESOURCE_BUSY | IORESOURCE_IO }
 };
 
 #define IORESOURCE_RAM (IORESOURCE_BUSY | IORESOURCE_MEM)
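A rough model of what this patch changes, using a hypothetical, pared-down `io_res` struct and `io_owner()` helper rather than the kernel's `struct resource` API. Only the dma2 and fpu ranges are taken from the diff.

```c
#include <stddef.h>

/* Hypothetical, pared-down mirror of struct resource: name and range only. */
struct io_res {
	const char *name;
	unsigned long start, end;
};

/* Tail of the legacy reservation table after the patch. */
static const struct io_res standard_io[] = {
	{ "dma2", 0xc0, 0xdf },
	/* { "fpu", 0xf0, 0xff },  removed: no external FPU on x86_64 */
};

/* Returns the name of the reserved range that covers port, or NULL. */
static const char *io_owner(unsigned long port)
{
	size_t i;

	for (i = 0; i < sizeof(standard_io) / sizeof(standard_io[0]); i++)
		if (port >= standard_io[i].start && port <= standard_io[i].end)
			return standard_io[i].name;
	return NULL;
}
```

With the entry gone, nothing claims 0xf0-0xff, so those ports are no longer reported as a busy "fpu" range.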
Re: [PATCH v2] pcmcia: Convert io_req_t to use kio_addr_t
On Fri, 21 Sep 2007 17:15:16 -0500
Olof Johansson <[EMAIL PROTECTED]> wrote:

> Convert the io_req_t members to kio_addr_t, to allow use on machines with
> more than 16 bits worth of IO ports (i.e. secondary busses on ppc64, etc).

What about the formatting and field widths?

ulong would probably be a lot saner than kio_addr_t and yet more type obfuscation.

Otherwise looks sensible to me
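Why the width matters can be shown in a few lines. This assumes, as the patch description implies, that the old io_req_t port fields could hold only 16 bits; `uint16_t` and `unsigned long` below are stand-ins for the old type and for kio_addr_t (or the plain ulong suggested above), not the real pcmcia typedefs.

```c
#include <stdint.h>

/* Stand-ins: a 16-bit port field versus a wide one. */
typedef uint16_t narrow_port_t;
typedef unsigned long wide_port_t;

/* Storing a port number above 0xffff into a 16-bit field silently drops
 * the high bits, so the driver would end up poking the wrong port. */
static narrow_port_t store_narrow(unsigned long port)
{
	return (narrow_port_t)port;
}

/* A wide field keeps the full port number. */
static wide_port_t store_wide(unsigned long port)
{
	return port;
}
```

Legacy ports such as 0x3f8 survive either way; only ports beyond 16 bits, like those on secondary busses, are corrupted by the narrow type. The formatting question presumably points at the same spot: printk formats that assumed a narrow port value may need a second look once the type is wider.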
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
Alan Cox wrote: For example, bnx2 maintainer says that driver and firmware are closely tied for his driver. IOW: you upgrade kernel and your NIC is not working anymore. Another argument is to make kernel be able to bring up NICs without needing firmware images in initramfs/initrd/hard drive. dgrs should be using the request_firmware interface. Actually dgrs is probably a good candidate for /dev/null According to an earlier thread, dgrs was never really maintained, written for hardware that was never really distributed widely, and very likely hasn't had users in years... if ever. If that picture is accurate (it's a story I was told), then I am definitely queueing up a deletion patch. Jeff
[PATCH] [48/50] x86_64: return correct error code from child_rip in x86_64 entry.S
From: Andrey Mirkin <[EMAIL PROTECTED]>

Right now register edi is just cleared before calling do_exit.
That is wrong because the correct return value is ignored.
The value from rax should be copied to rdi instead of clearing edi.

AK: changed to 32bit move because it's strictly an int

Signed-off-by: Andrey Mirkin <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/entry.S |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86_64/kernel/entry.S
===
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -989,7 +989,7 @@ child_rip:
 	movq %rsi, %rdi
 	call *%rax
 	# exit
-	xorl %edi, %edi
+	mov %eax, %edi
 	call do_exit
 	CFI_ENDPROC
 ENDPROC(child_rip)
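The entry.S fix has a direct C reading. Under the x86-64 calling convention an int return value comes back in %eax and the first argument is passed in %edi, so `mov %eax, %edi` forwards the thread function's result to do_exit(). The sketch below is a hypothetical user-space analogue; `child_trampoline()` and `fake_do_exit()` are not kernel functions.

```c
/* Records the code the exit path was given, for inspection. */
static int last_exit_code = -1;

static void fake_do_exit(int code)
{
	last_exit_code = code;
}

/* Analogue of child_rip: run fn(arg) and pass its result to the exit path. */
static void child_trampoline(int (*fn)(void *), void *arg)
{
	int ret = fn(arg);	/* "call *%rax": result comes back in %eax */

	fake_do_exit(ret);	/* "mov %eax, %edi; call do_exit" */
	/* Before the fix this was fake_do_exit(0), i.e. "xorl %edi, %edi". */
}
```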
[PATCH] [49/50] x86_64: Initialize 64bit registers for a.out executables
Previously, r8-r15 kept whatever values they held before the exec. Zero them instead.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/ia32/ia32_aout.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux/arch/x86_64/ia32/ia32_aout.c
===
--- linux.orig/arch/x86_64/ia32/ia32_aout.c
+++ linux/arch/x86_64/ia32/ia32_aout.c
@@ -422,6 +422,8 @@ beyond_if:
 	(regs)->eflags = 0x200;
 	(regs)->cs = __USER32_CS;
 	(regs)->ss = __USER32_DS;
+	regs->r8 = regs->r9 = regs->r10 = regs->r11 =
+	regs->r12 = regs->r13 = regs->r14 = regs->r15 = 0;
 	set_fs(USER_DS);
 	if (unlikely(current->ptrace & PT_PTRACED)) {
 		if (current->ptrace & PT_TRACE_EXEC)
[PATCH] [46/50] x86: also show non-zero IRQ counts for vectors that currently don't have a handler
From: "Jan Beulich" <[EMAIL PROTECTED]>

It doesn't seem to make sense to hide these, even if their counts can't
change at the point in time they're being displayed.

Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

 arch/i386/kernel/irq.c   |   18 ++
 arch/x86_64/kernel/irq.c |   18 ++
 2 files changed, 28 insertions(+), 8 deletions(-)

Index: linux/arch/i386/kernel/irq.c
===
--- linux.orig/arch/i386/kernel/irq.c
+++ linux/arch/i386/kernel/irq.c
@@ -259,9 +259,17 @@ int show_interrupts(struct seq_file *p,
 	}
 
 	if (i < NR_IRQS) {
+		unsigned any_count = 0;
+
 		spin_lock_irqsave(&irq_desc[i].lock, flags);
+#ifndef CONFIG_SMP
+		any_count = kstat_irqs(i);
+#else
+		for_each_online_cpu(j)
+			any_count |= kstat_cpu(j).irqs[i];
+#endif
 		action = irq_desc[i].action;
-		if (!action)
+		if (!action && !any_count)
 			goto skip;
 		seq_printf(p, "%3d: ",i);
 #ifndef CONFIG_SMP
@@ -272,10 +280,12 @@ int show_interrupts(struct seq_file *p,
 #endif
 		seq_printf(p, " %8s", irq_desc[i].chip->name);
 		seq_printf(p, "-%-8s", irq_desc[i].name);
-		seq_printf(p, " %s", action->name);
 
-		for (action=action->next; action; action = action->next)
-			seq_printf(p, ", %s", action->name);
+		if (action) {
+			seq_printf(p, " %s", action->name);
+			while ((action = action->next) != NULL)
+				seq_printf(p, ", %s", action->name);
+		}
 
 		seq_putc(p, '\n');
 skip:
Index: linux/arch/x86_64/kernel/irq.c
===
--- linux.orig/arch/x86_64/kernel/irq.c
+++ linux/arch/x86_64/kernel/irq.c
@@ -64,9 +64,17 @@ int show_interrupts(struct seq_file *p,
 	}
 
 	if (i < NR_IRQS) {
+		unsigned any_count = 0;
+
 		spin_lock_irqsave(&irq_desc[i].lock, flags);
+#ifndef CONFIG_SMP
+		any_count = kstat_irqs(i);
+#else
+		for_each_online_cpu(j)
+			any_count |= kstat_cpu(j).irqs[i];
+#endif
 		action = irq_desc[i].action;
-		if (!action)
+		if (!action && !any_count)
 			goto skip;
 		seq_printf(p, "%3d: ",i);
 #ifndef CONFIG_SMP
@@ -78,9 +86,11 @@ int show_interrupts(struct seq_file *p,
 		seq_printf(p, " %8s", irq_desc[i].chip->name);
 		seq_printf(p, "-%-8s", irq_desc[i].name);
 
-		seq_printf(p, " %s", action->name);
-		for (action=action->next; action; action = action->next)
-			seq_printf(p, ", %s", action->name);
+		if (action) {
+			seq_printf(p, " %s", action->name);
+			while ((action = action->next) != NULL)
+				seq_printf(p, ", %s", action->name);
+		}
 		seq_putc(p, '\n');
 skip:
 		spin_unlock_irqrestore(&irq_desc[i].lock, flags);
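The visibility rule this patch introduces, reduced to a hypothetical helper: `counts[]` stands in for `kstat_cpu(j).irqs[i]` over the online CPUs, and `has_action` for a non-NULL `irq_desc[i].action`. It is a model of the decision only, not kernel code.

```c
#include <stdbool.h>

/* Returns true if show_interrupts() would print a line for this IRQ. */
static bool irq_line_visible(const unsigned counts[], int ncpus, bool has_action)
{
	unsigned any_count = 0;
	int j;

	for (j = 0; j < ncpus; j++)
		any_count |= counts[j];

	/* Old rule: skip when there is no handler.
	 * New rule: skip only when there is no handler AND no interrupts. */
	return has_action || any_count != 0;
}
```

OR-ing the per-CPU counts rather than summing them is enough, because only "non-zero on some CPU" matters, and it cannot overflow.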
[PATCH] [47/50] i386: avoid temporarily inconsistent pte-s
From: "Jan Beulich" <[EMAIL PROTECTED]>

One more of these issues (which were considered fixed a few releases
back): unlike x86-64, i386 allows set_fixmap() to replace already present
mappings. Consequently, on PAE, care must be taken to not update the high
half of a pte while the low half is still holding the old value.

Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

 arch/i386/mm/pgtable.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux/arch/i386/mm/pgtable.c
===
--- linux.orig/arch/i386/mm/pgtable.c
+++ linux/arch/i386/mm/pgtable.c
@@ -97,8 +97,7 @@ static void set_pte_pfn(unsigned long va
 	}
 	pte = pte_offset_kernel(pmd, vaddr);
 	if (pgprot_val(flags))
-		/* stored as-is, to permit clearing entries */
-		set_pte(pte, pfn_pte(pfn, flags));
+		set_pte_present(&init_mm, vaddr, pte, pfn_pte(pfn, flags));
 	else
 		pte_clear(&init_mm, vaddr, pte);
 
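The ordering problem can be modelled in user space. The sketch assumes a PAE entry stored as two 32-bit halves with the present bit in bit 0 of the low half. Writing the high half while the old low half is still present opens the window the changelog warns about, so the low half is cleared first and rewritten last. `atomic_thread_fence()` stands in for the kernel's write barriers, and `sketch_set_pte_present()` is a hypothetical name, not the kernel helper's actual body.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Simplified PAE page-table entry: 64 bits stored as two 32-bit halves,
 * which i386 cannot update with one plain store. Bit 0 of the low half
 * is the present bit. */
struct pae_pte {
	volatile uint32_t low;
	volatile uint32_t high;
};

/* Replace an entry that may already be present without ever exposing a
 * present entry that mixes the old low half with the new high half. */
static void sketch_set_pte_present(struct pae_pte *p, uint64_t val)
{
	p->low = 0;				/* 1: entry becomes not-present */
	atomic_thread_fence(memory_order_release);
	p->high = (uint32_t)(val >> 32);	/* 2: ignored while not present */
	atomic_thread_fence(memory_order_release);
	p->low = (uint32_t)val;			/* 3: publish; present bit set last */
}
```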