date:20070921

Re: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS

2007-09-21 Thread Sam Ravnborg

On Sat, Sep 22, 2007 at 12:32:18AM +0200, Andi Kleen wrote:
> 
> Also allow to set svm lock.
> 
> TBD double check, documentation, i386 support
> 
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Could we have this patch tagged with x86 instead of "Experimental" in the subject?

Sam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

2007-09-21 Thread Yinghai Lu

On 9/21/07, Yinghai Lu <[EMAIL PROTECTED]> wrote:
> On 9/21/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > From: Robert Hancock <[EMAIL PROTECTED]>
> >
> > This patch adds validation of the MMCONFIG table against the ACPI reserved
> > motherboard resources.  If the MMCONFIG table is found to be reserved in
> > ACPI, we don't bother checking the E820 table.  The PCI Express firmware
> > spec apparently tells BIOS developers that reservation in ACPI is required
> > and E820 reservation is optional, so checking against ACPI first makes
> > sense.  Many BIOSes don't reserve the MMCONFIG region in E820 even though
> > it is perfectly functional; the existing check needlessly disables MMCONFIG
> > in these cases.
> >
> > In order to do this, MMCONFIG setup has been split into two phases.  If PCI
> > configuration type 1 is not available then MMCONFIG is enabled early as
> > before.  Otherwise, it is enabled later after the ACPI interpreter is
> > enabled, since we need to be able to execute control methods in order to
> > check the ACPI reserved resources.  Presently this is just triggered off
> > the end of ACPI interpreter initialization.
> >
> > There are a few other behavioral changes here:
> >
> > - Validate all MMCONFIG configurations provided, not just the first one.
> >
> > - Validate that the entire required length of each configuration, according
> >   to the provided ending bus number, is reserved, not just the minimum
> >   required allocation.
> >
> > - Validate that the area is reserved even if we read it from the chipset
> >   directly and not from the MCFG table.  This catches the case where the
> >   BIOS didn't set the location properly in the chipset and has mapped it
> >   over other things it shouldn't have.
> >
> > This also cleans up the MMCONFIG initialization functions so that they
> > simply do nothing if MMCONFIG is not compiled in.
> >
> > Based on an original patch by Rajesh Shah from Intel.
> >
> > [EMAIL PROTECTED]: many fixes and cleanups]
> > Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
> > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
> > Cc: Rajesh Shah <[EMAIL PROTECTED]>
> > Cc: Jesse Barnes <[EMAIL PROTECTED]>
> > Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
> > Cc: Andi Kleen <[EMAIL PROTECTED]>
> > Cc: Greg KH <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

Also the title is misleading: it should be x86 instead of i386, because it
will affect x86_64 too.

YH

Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

2007-09-21 Thread Sam Ravnborg

On Fri, Sep 21, 2007 at 06:45:39PM -0400, Dave Jones wrote:
> On Sat, Sep 22, 2007 at 12:32:02AM +0200, Andi Kleen wrote:
> 
> 
>  > +Select this for:
>  > +  Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
>  > +  -Willamette
>  > +  -Northwood
>  > +  -Mobile Pentium 4
>  > +  -Mobile Pentium 4 M
>  > +  -Extreme Edition (Gallatin)
>  > +  -Prescott
>  > +  -Prescott 2M
>  > +  -Cedar Mill
>  > +  -Presler
>  > +  -Smithfield
>  > +  Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
>  > +  -Foster
>  > +  -Prestonia
>  > +  -Gallatin
>  > +  -Nocona
>  > +  -Irwindale
>  > +  -Cranford
>  > +  -Potomac
>  > +  -Paxville
>  > +  -Dempsey
> 
> This seems like yet another list that will need to be perpetually
> kept up to date, and given 99% of users don't know the codename
> of their core, just the marketing name, I question its value.

As a bare minimum, the list presented here should use the same
names as used in /proc/cpuinfo.

On this box I read:

vendor_id   : GenuineIntel
model name  : Pentium III (Coppermine)

This info must be present in the Kconfig help text too.
I have always had trouble selecting the right CPU, so I welcome this patch
that gives me more info - maybe even a bit too much.

Sam
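
To compare a Kconfig help entry against what a machine reports, the "model name" value has to be pulled out of a /proc/cpuinfo line. A minimal sketch, assuming the "key : value" layout shown above; `cpuinfo_value()` is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If `line` is a "key : value" line whose key is `key`, return a pointer
 * to the value inside `line`; otherwise return NULL.
 */
const char *cpuinfo_value(const char *line, const char *key)
{
	size_t klen;
	const char *p;

	assert(line != NULL && key != NULL);
	klen = strlen(key);
	if (strncmp(line, key, klen) != 0)
		return NULL;
	p = line + klen;
	while (*p == ' ' || *p == '\t')	/* padding before the colon */
		p++;
	if (*p != ':')
		return NULL;
	p++;
	while (*p == ' ' || *p == '\t')	/* padding after the colon */
		p++;
	return p;
}
```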

Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig

2007-09-21 Thread Satyam Sharma



On Thu, 20 Sep 2007, Satyam Sharma wrote:
> 
> BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ...
> IIRC I got build failures in:

> drivers/net/spider_net.c


[PATCH -mm] spider_net: Misc build fixes after recent netdev stats changes

Unbreak the following:

drivers/net/spider_net.c: In function 'spider_net_release_tx_chain':
drivers/net/spider_net.c:818: error: 'dev' undeclared (first use in this function)
drivers/net/spider_net.c:818: error: (Each undeclared identifier is reported only once
drivers/net/spider_net.c:818: error: for each function it appears in.)
drivers/net/spider_net.c: In function 'spider_net_xmit':
drivers/net/spider_net.c:922: error: 'dev' undeclared (first use in this function)
drivers/net/spider_net.c: In function 'spider_net_pass_skb_up':
drivers/net/spider_net.c:1018: error: 'dev' undeclared (first use in this function)
drivers/net/spider_net.c: In function 'spider_net_decode_one_descr':
drivers/net/spider_net.c:1215: error: 'dev' undeclared (first use in this function)
make[2]: *** [drivers/net/spider_net.o] Error 1

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/spider_net.c |   24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff -ruNp a/drivers/net/spider_net.c b/drivers/net/spider_net.c
--- a/drivers/net/spider_net.c  2007-09-22 06:26:39.0 +0530
+++ b/drivers/net/spider_net.c  2007-09-22 12:12:23.0 +0530
@@ -795,6 +795,7 @@ spider_net_set_low_watermark(struct spid
 static int
 spider_net_release_tx_chain(struct spider_net_card *card, int brutal)
 {
+   struct net_device *dev = card->netdev;
struct spider_net_descr_chain *chain = &card->tx_chain;
struct spider_net_descr *descr;
struct spider_net_hw_descr *hwdescr;
@@ -919,7 +920,7 @@ spider_net_xmit(struct sk_buff *skb, str
spider_net_release_tx_chain(card, 0);
 
if (spider_net_prepare_tx_descr(card, skb) != 0) {
-   dev->stats.tx_dropped++;
+   netdev->stats.tx_dropped++;
netif_stop_queue(netdev);
return NETDEV_TX_BUSY;
}
@@ -979,16 +980,12 @@ static void
 spider_net_pass_skb_up(struct spider_net_descr *descr,
   struct spider_net_card *card)
 {
-   struct spider_net_hw_descr *hwdescr= descr->hwdescr;
-   struct sk_buff *skb;
-   struct net_device *netdev;
-   u32 data_status, data_error;
-
-   data_status = hwdescr->data_status;
-   data_error = hwdescr->data_error;
-   netdev = card->netdev;
+   struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+   struct sk_buff *skb = descr->skb;
+   struct net_device *netdev = card->netdev;
+   u32 data_status = hwdescr->data_status;
+   u32 data_error = hwdescr->data_error;
 
-   skb = descr->skb;
skb_put(skb, hwdescr->valid_size);
 
/* the card seems to add 2 bytes of junk in front
@@ -1015,8 +1012,8 @@ spider_net_pass_skb_up(struct spider_net
}
 
/* update netdevice statistics */
-   dev->stats.rx_packets++;
-   dev->stats.rx_bytes += skb->len;
+   netdev->stats.rx_packets++;
+   netdev->stats.rx_bytes += skb->len;
 
/* pass skb up to stack */
netif_receive_skb(skb);
@@ -1184,6 +1181,7 @@ static int spider_net_resync_tail_ptr(st
 static int
 spider_net_decode_one_descr(struct spider_net_card *card)
 {
+   struct net_device *dev = card->netdev;
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *descr = chain->tail;
struct spider_net_hw_descr *hwdescr = descr->hwdescr;
@@ -1210,7 +1208,7 @@ spider_net_decode_one_descr(struct spide
 (status == SPIDER_NET_DESCR_PROTECTION_ERROR) ||
 (status == SPIDER_NET_DESCR_FORCE_END) ) {
if (netif_msg_rx_err(card))
-   dev_err(&card->netdev->dev,
+   dev_err(&dev->dev,
   "dropping RX descriptor with state %d\n", 
status);
dev->stats.rx_dropped++;
goto bad_desc;

Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

2007-09-21 Thread Yinghai Lu

On 9/21/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> From: Robert Hancock <[EMAIL PROTECTED]>
>
> This patch adds validation of the MMCONFIG table against the ACPI reserved
> motherboard resources.  If the MMCONFIG table is found to be reserved in
> ACPI, we don't bother checking the E820 table.  The PCI Express firmware
> spec apparently tells BIOS developers that reservation in ACPI is required
> and E820 reservation is optional, so checking against ACPI first makes
> sense.  Many BIOSes don't reserve the MMCONFIG region in E820 even though
> it is perfectly functional; the existing check needlessly disables MMCONFIG
> in these cases.
>
> In order to do this, MMCONFIG setup has been split into two phases.  If PCI
> configuration type 1 is not available then MMCONFIG is enabled early as
> before.  Otherwise, it is enabled later after the ACPI interpreter is
> enabled, since we need to be able to execute control methods in order to
> check the ACPI reserved resources.  Presently this is just triggered off
> the end of ACPI interpreter initialization.
>
> There are a few other behavioral changes here:
>
> - Validate all MMCONFIG configurations provided, not just the first one.
>
> - Validate that the entire required length of each configuration, according
>   to the provided ending bus number, is reserved, not just the minimum
>   required allocation.
>
> - Validate that the area is reserved even if we read it from the chipset
>   directly and not from the MCFG table.  This catches the case where the
>   BIOS didn't set the location properly in the chipset and has mapped it
>   over other things it shouldn't have.
>
> This also cleans up the MMCONFIG initialization functions so that they
> simply do nothing if MMCONFIG is not compiled in.
>
> Based on an original patch by Rajesh Shah from Intel.
>
> [EMAIL PROTECTED]: many fixes and cleanups]
> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
> Cc: Rajesh Shah <[EMAIL PROTECTED]>
> Cc: Jesse Barnes <[EMAIL PROTECTED]>
> Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Cc: Greg KH <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  arch/i386/pci/init.c|4 -
>  arch/i386/pci/mmconfig-shared.c |  151 
> +++-
>  arch/i386/pci/pci.h |1
>  drivers/acpi/bus.c  |2
>  include/linux/pci.h |8 ++
>  5 files changed, 144 insertions(+), 22 deletions(-)
>
> Index: linux/arch/i386/pci/init.c
> ===
> --- linux.orig/arch/i386/pci/init.c
> +++ linux/arch/i386/pci/init.c
> @@ -11,9 +11,7 @@ static __init int pci_access_init(void)
>  #ifdef CONFIG_PCI_DIRECT
> type = pci_direct_probe();
>  #endif
> -#ifdef CONFIG_PCI_MMCONFIG
> -   pci_mmcfg_init(type);
> -#endif
> +   pci_mmcfg_early_init(type);
> if (raw_pci_ops)
> return 0;
>  #ifdef CONFIG_PCI_BIOS
> Index: linux/arch/i386/pci/mmconfig-shared.c
> ===
> --- linux.orig/arch/i386/pci/mmconfig-shared.c
> +++ linux/arch/i386/pci/mmconfig-shared.c
> @@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso
> pci_mmcfg_resources_inserted = 1;
>  }
>
> -static void __init pci_mmcfg_reject_broken(int type)
> +static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
> + void *data)
> +{
> +   struct resource *mcfg_res = data;
> +   struct acpi_resource_address64 address;
> +   acpi_status status;
> +
> +   if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) {
> +   struct acpi_resource_fixed_memory32 *fixmem32 =
> +   &res->data.fixed_memory32;
> +   if (!fixmem32)
> +   return AE_OK;
> +   if ((mcfg_res->start >= fixmem32->address) &&
> +   (mcfg_res->end < (fixmem32->address +
> + fixmem32->address_length))) {
> +   mcfg_res->flags = 1;
> +   return AE_CTRL_TERMINATE;
> +   }
> +   }
> +   if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) &&
> +   (res->type != ACPI_RESOURCE_TYPE_ADDRESS64))
> +   return AE_OK;
> +
> +   status = acpi_resource_to_address64(res, &address);
> +   if (ACPI_FAILURE(status) ||
> +  (address.address_length <= 0) ||
> +  (address.resource_type != ACPI_MEMORY_RANGE))
> +   return AE_OK;
> +
> +   if ((mcfg_res->start >= address.minimum) &&
> +   (mcfg_res->end < (address.minimum + address.address_length))) {
> +   mcfg_res->flags = 1;
> +   return AE_CTRL_TERMINATE;
> +   }
> +   return AE_OK;
> +}
> +
> +static acpi_status __init find_mboard_resource(acpi_handle handle, u

Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig

2007-09-21 Thread Satyam Sharma

On Thu, 20 Sep 2007, Satyam Sharma wrote:
> 
> BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ...
> IIRC I got build failures in:

> drivers/md/raid6int8.c

This turned out to be a gcc bug -- I was using an old cross-compiler.

Re: Distributed storage. Security attributes and ducumentation update.

2007-09-21 Thread Pavel Machek

Hi!

> I'm pleased to announce third release of the distributed storage
> subsystem, which allows to form a storage on top of remote and local
> nodes, which in turn can be exported to another storage as a node to
> form tree-like storages.

How is this different from raid0/1 over nbd? Or raid0/1 over
ata-over-ethernet?

> +| DST storate ---|

storage?


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig

2007-09-21 Thread Satyam Sharma



On Thu, 20 Sep 2007, Satyam Sharma wrote:
> 
> BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ...
> IIRC I got build failures in:

> drivers/ata/pata_scc.c

http://lkml.org/lkml/2007/9/21/557

Re: [patch 7/7] Add documentation for extended crashkernel syntax

2007-09-21 Thread Pavel Machek

Hi!

> This adds the documentation for the extended crashkernel syntax into
> Documentation/kdump/kdump.txt.

Should you also update kernel-parameters.txt?

> +For example:
> +
> +crashkernel=512M-2G:64M,2G-:128M
> +
> +This would mean:
> +
> +1) if the RAM is smaller than 512M, then don't reserve anything
> +   (this is the "rescue" case)
> +2) if the RAM size is between 512M and 2G, then reserve 64M
> +3) if the RAM size is larger than 2G, then reserve 128M

Why is this useful? I mean... if 64M is enough to save a dump, why use
128M? ...or does the required size somehow scale with memory in
machine? (pagetables?)
Pavel
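
The range selection quoted above can be sketched as a small standalone parser. This is only an illustration of the documented syntax, not the kernel's parser: `parse_size()` and `crashkernel_size()` are hypothetical names, each range is assumed to be half-open (start inclusive, end exclusive), and an empty end is assumed to mean "no upper limit".

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "64M", "2G", "512K" or a plain byte count; advance *s past it. */
unsigned long long parse_size(const char **s)
{
	char *end;
	unsigned long long v = strtoull(*s, &end, 10);

	switch (*end) {
	case 'G': v <<= 10; /* fall through */
	case 'M': v <<= 10; /* fall through */
	case 'K': v <<= 10; end++; break;
	default: break;
	}
	*s = end;
	return v;
}

/*
 * Return the reservation for a machine with `ram` bytes given a spec such
 * as "512M-2G:64M,2G-:128M".  Returns 0 when no range matches or the spec
 * is malformed.
 */
unsigned long long crashkernel_size(const char *spec, unsigned long long ram)
{
	const char *p = spec;

	assert(spec != NULL);
	while (*p) {
		unsigned long long start, end = ~0ULL, size;

		start = parse_size(&p);
		if (*p++ != '-')
			return 0;
		if (*p != ':')		/* empty end: open-ended range */
			end = parse_size(&p);
		if (*p++ != ':')
			return 0;
		size = parse_size(&p);
		if (ram >= start && ram < end)
			return size;
		if (*p == ',')
			p++;
		else if (*p != '\0')
			return 0;
	}
	return 0;
}
```

For the example spec, a 256M machine gets no reservation, a 1G machine gets 64M, and a 4G machine gets 128M.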

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: clockevents: fix resume logic

2007-09-21 Thread Pavel Machek

Hi!

> > Ok, here we are. The bad one uses C2 which stops the local apic on the
> > VAIO. I suspect we end up in the suspend/resume with going into C2
> > without the broadcast active.
> > 
> > Can you try to get the output of SysRq-Q during the "it needs help from
> > keyboard" period ?
> > 
> 
> That's a bit tricky because hitting the keyboard is what unsticks things. 
> And the video is black after resume-from-RAM (has always been thus) and we

Ok, can we try to fix the video issue for you? That should make the
development easier... I assume you tried s2ram from suspend.sf.net,
and no combination of switches helped?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [NFS] NFS on loopback locks up entire system(2.6.23-rc6)?

2007-09-21 Thread Chakri n

On 9/21/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> No. The requirement for 'hard' mounts is not that the server be up all
> the time. The server can go up and down as it pleases: the client can
> happily recover from that.
>
> The requirement is rather that nobody remove it permanently before the
> application is done with it, and the partition is unmounted. That is
> hardly unreasonable (it is the only way I know of to ensure data
> integrity), and it is much less strict than the requirements for local
> disks.

Yes. I completely agree. This is required for data consistency.

But in my testing, if one of the NFS servers/mounts goes offline for
some period of time, the entire system slows down, especially I/O.

In my test program, I forked off 50 threads to do 4K writes on 50
different files in a NFS mounted directory.
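
A test program of the kind described could look roughly like the sketch below. It is not the program used for these measurements: it uses fork()ed worker processes rather than threads, and the file naming, `run_writers()` and `write_blocks()` are assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WRITE_SIZE 4096

/* Child body: write `nwrites` 4K blocks to <dir>/writer-<idx>.dat. */
static int write_blocks(const char *dir, int idx, int nwrites)
{
	char path[4096], block[WRITE_SIZE];
	int fd, i;

	snprintf(path, sizeof(path), "%s/writer-%d.dat", dir, idx);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return 1;
	memset(block, 'x', sizeof(block));
	for (i = 0; i < nwrites; i++) {
		if (write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)) {
			close(fd);
			return 1;
		}
	}
	return close(fd) == 0 ? 0 : 1;
}

/*
 * Fork `nworkers` children, each writing to its own file under `dir`.
 * Returns the number of children that completed all their writes.
 */
int run_writers(const char *dir, int nworkers, int nwrites)
{
	int i, status, ok = 0;

	assert(dir != NULL && nworkers > 0 && nwrites > 0);
	for (i = 0; i < nworkers; i++) {
		pid_t pid = fork();

		if (pid == 0)
			_exit(write_blocks(dir, i, nwrites));
		if (pid < 0)
			break;
	}
	while (wait(&status) > 0)
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	return ok;
}
```

Calling `run_writers("/mnt/nfs", 50, 1000)` would approximate the setup above, where `/mnt/nfs` stands in for the NFS-mounted directory.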

Now, I have turned off the NFS server and started another dd process
on local disk ("dd if=/dev/zero of=/tmp/x count=1000"), and this dd
process never progresses.
I see I/O wait of 100% in vmstat.
procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id  wa st
 0 21      0 2628416  15152 551024    0    0     0     0   28  344  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0    8  340  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0   26  343  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0    8  341  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0   26  357  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0    8  325  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0   26  343  0  0  0 100  0
 0 21      0 2628416  15152 551024    0    0     0     0    8  325  0  0  0 100  0

I have about 4Gig of RAM in the system and most of the memory is free.
I see only about 550MB in buffers; the rest is pretty much available.

[EMAIL PROTECTED] ~]# free
             total       used       free     shared    buffers     cached
Mem:       3238004     609340    2628664          0      15136     551024
-/+ buffers/cache:      43180    3194824
Swap:      4096532          0    4096532

Here is the stack trace for one of my test program threads and dd
process, both of them are stuck in congestion_wait.
--
PID: 3552   TASK: cb1fc610  CPU: 0   COMMAND: "dd"
 #0 [f5c04c38] schedule at c0624a34
 #1 [f5c04cac] schedule_timeout at c06250ee
 #2 [f5c04cf0] io_schedule_timeout at c0624c15
 #3 [f5c04d04] congestion_wait at c045eb7d
 #4 [f5c04d28] balance_dirty_pages_ratelimited_nr at c045ab91
 #5 [f5c04d7c] generic_file_buffered_write at c0457148
 #6 [f5c04e10] __generic_file_aio_write_nolock at c04576e5
 #7 [f5c04e84] generic_file_aio_write at c0457799
 #8 [f5c04eb4] ext3_file_write at ffd7
 #9 [f5c04ed0] do_sync_write at c0472e27
#10 [f5c04f7c] vfs_write at c0473689
#11 [f5c04f98] sys_write at c0473c95
#12 [f5c04fb4] sysenter_entry at c0404ddf
--
 #0 [f6050c10] schedule at c0624a34
 #1 [f6050c84] schedule_timeout at c06250ee
 #2 [f6050cc8] io_schedule_timeout at c0624c15
 #3 [f6050cdc] congestion_wait at c045eb7d
 #4 [f6050d00] balance_dirty_pages_ratelimited_nr at c045ab91
 #5 [f6050d54] generic_file_buffered_write at c0457148
 #6 [f6050de8] __generic_file_aio_write_nolock at c04576e5
 #7 [f6050e40] enqueue_entity at c042131f
 #8 [f6050e5c] generic_file_aio_write at c0457799
 #9 [f6050e8c] nfs_file_write at f8f90cee
#10 [f6050e9c] getnstimeofday at c043d3f7
#11 [f6050ed0] do_sync_write at c0472e27
#12 [f6050f7c] vfs_write at c0473689
#13 [f6050f98] sys_write at c0473c95
#14 [f6050fb4] sysenter_entry at c0404ddf
---

Can this be worked around? Since most of the RAM is available, the dd
process could in fact find more memory for its buffers rather than
waiting on the NFS requests. I believe this could be one reason why
file systems like VxFS use their own buffer cache, separate from the
system-wide buffer cache.

Thanks
--Chakri

Re: [PATCH 2/3] missing null termination in power supply uevent

2007-09-21 Thread Anton Vorontsov

On Thu, Sep 20, 2007 at 12:06:10PM -0700, Stephen Hemminger wrote:
> Need to null terminate environment. Found by inspection
> while looking for similar problems to platform uevent bug
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Many thanks, git-applymbox'ed to battery-2.6.git. I suppose this is
serious enough that it should hit 2.6.23.

I'll wait just a bit before asking for a pull, though.

Thanks,

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.net/bd2
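
The class of bug being fixed here is an environment array handed to a consumer without its terminating NULL entry. A minimal userspace sketch of keeping such an array terminated; `env_add()` and `env_count()` are hypothetical helpers, not the kernel's uevent API, and the variable strings are only illustrative.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Append `var` to a NULL-terminated array of strings that has room for
 * `cap` pointers, terminator included.  Returns 0 on success, -1 if full.
 */
int env_add(const char **envp, size_t cap, const char *var)
{
	size_t n = 0;

	assert(envp != NULL && var != NULL && cap > 0);
	while (n < cap && envp[n] != NULL)
		n++;
	if (n + 1 >= cap)
		return -1;	/* no room for var plus the terminator */
	envp[n] = var;
	envp[n + 1] = NULL;	/* keep the list terminated */
	return 0;
}

/* Count entries by walking to the terminator, as a consumer would. */
size_t env_count(const char *const *envp)
{
	size_t n = 0;

	assert(envp != NULL);
	while (envp[n] != NULL)
		n++;
	return n;
}
```

Without the final `envp[n + 1] = NULL`, a consumer like `env_count()` would walk past the end of the array.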

Re: [PATCH v2] pcmcia: Convert io_req_t to use kio_addr_t

2007-09-21 Thread Matthew Wilcox

On Fri, Sep 21, 2007 at 11:39:36PM +0100, Alan Cox wrote:
> On Fri, 21 Sep 2007 17:15:16 -0500
> Olof Johansson <[EMAIL PROTECTED]> wrote:
> 
> > Convert the io_req_t members to kio_addr_t, to allow use on machines with
> > more than 16 bits worth of IO ports (i.e. secondary busses on ppc64, etc).
> 
> What about the formatting and field widths ?
> 
> ulong would probably be a lot saner than kio_addr_t and yet more type
> obfuscation.

I don't think anyone uses ioports > 32bit.  Certainly i386 takes an int
port as parameter to {in,out}[bwl] (and it really only uses 16-bits).
parisc uses 24 bits.  I don't know what the various ppcs do, but pci
bars can only be 32-bit for ioports.  So my opinion is that ioports
should be uint, not ulong.
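
The widths mentioned above (16 bits on i386, 24 on parisc, 32 for PCI BARs) can be checked against a given port number with a one-line test. A sketch only; `port_fits()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* True if `port` is representable in `bits` bits (1 <= bits <= 32). */
bool port_fits(unsigned long port, unsigned bits)
{
	assert(bits >= 1 && bits <= 32);
	/* Widen first so that a shift by 32 is well defined. */
	return ((unsigned long long)port >> bits) == 0;
}
```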

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

Re: RTC wakealarm write-only, still has 644 permissions

2007-09-21 Thread David Brownell

On Thursday 20 September 2007, Pavel Machek wrote:
> Hi!
> 
> > ...should they be changed to 200? Or perhaps file should be readable?

No, mode 644 is fine.  No reason to prevent "other" people from
reading the alarm time (is there?) and if you write a legal value,
that will work.  So $SUBJECT is no problem at all.

> > 
> > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm 
> > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# echo 132719 > wakealarm 

At which point I'd expect

# echo $?

would indicate the write failed.  That's a LONG time in the
past (January 2, 1970), so that setting would be rejected.
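
The date can be double-checked by breaking the value down with the standard C library. A small sketch; `epoch_to_utc()` is a hypothetical wrapper around gmtime().

```c
#include <assert.h>
#include <time.h>

/* Break `secs` (seconds since the POSIX epoch) into UTC calendar fields. */
int epoch_to_utc(long long secs, struct tm *out)
{
	time_t t = (time_t)secs;
	struct tm *tm;

	assert(out != NULL);
	tm = gmtime(&t);
	if (tm == NULL)
		return -1;
	*out = *tm;
	return 0;
}
```

For 132719 this gives tm_year 70, tm_mon 0, tm_mday 2, that is 1970-01-02 12:51:59 UTC.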

> > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# ls -al wakealarm 
> > -rw-r--r-- 1 root root 0 Sep 20 12:30 wakealarm
> > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm 
> > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm 

The alarm isn't set; so no value gets displayed.

> > [EMAIL PROTECTED]:/sys/class/rtc/rtc0# 
> > 
> > 
> > ...standard PC with reasonably recent kernel...

Yeah, well a "standard PC" is chock full of fairly bizarrely
glitchy hardware.  Clocks and timers have more than their
fair share, or x86_64 NOHZ support would be merged by now!

> Hmm, something is definitely wrong in here. I sometimes _do_ get
> something back.
> 
> [EMAIL PROTECTED]:~# s2ram
> Switching from vt9 to vt1
> 
> 
> switching back to vt9
> [EMAIL PROTECTED]:~# 
> [EMAIL PROTECTED]:~# 
> [EMAIL PROTECTED]:~# cd /sys/class/rtc/rtc0
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# ls
> date  dev  device@  name  power/  since_epoch  subsystem@  time
> uevent  wakealarm
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm 
> 2051629528
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat power/wakeup 
> 
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm 
> 2051629528
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# date +%s
> 1190285030

OK, in that situation you've definitely got some buglike behavior.
My question is:  how to fix it?

The problem is that the RTC is reporting an alarm value with some
fields flagged as "wildcard" -- e.g. day/month/year "out of range"
so the hardware ignores those fields.  This is very common on PC
based RTCs, and much less common on embedded systems.  (Which for
some reason don't tend to cheap out on full date specs like PCs.)

And those cause date reports to look like garbage; /proc/driver/rtc
would show "**" in those fields, rather than trying to display the
canonical "seconds since POSIX epoch" value.  But the wakealarm code
just calls rtc_tm_to_time(), which doesn't validate its fields and
so will gladly spew the garbage you saw.  (On PCs especially.  This
code was originally tested on sane embedded hardware.)

Now, in the /dev/rtcX code there's some code working with a similar
problem:  ioctl(RTC_ALM_SET) morphs partial alarm dates into valid
form before passing them down.  This needs the same kind of fix,
but going in the other direction -- and not always kicking in.  That
could go into either the wakealarm display code, or rtc_read_alarm(),
or maybe someplace else.

I'm not sure which fix would be best; maybe Alessandro has an opinion.

I'd lean towards just fixing the wakealarm display code, except that
would force anyone using that other routine to know about this rude
"wildcard" convention, which is rather hardware-specific... and that's
not really aligned with the goal of an RTC framework that "just works"
without needing to know about such quirks.
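
One shape the "validate before converting" fix could take is sketched below in userspace C. It is not the RTC framework code: `alarm_fields_valid()`, `alarm_to_epoch()` and `days_from_civil()` are hypothetical helpers, wildcard fields are assumed to show up as out-of-range values as described above, and the day-of-month check is deliberately simplified (it does not know month lengths).

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Days since 1970-01-01 for a Gregorian date (month 1..12, day 1..31). */
long long days_from_civil(long long y, int m, int d)
{
	long long era, yoe, doy, doe;

	y -= m <= 2;			/* treat Jan/Feb as end of previous year */
	era = (y >= 0 ? y : y - 399) / 400;
	yoe = y - era * 400;
	doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
	doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
	return era * 146097 + doe - 719468;
}

/* Reject alarms whose fields are out of range (e.g. wildcards). */
bool alarm_fields_valid(const struct tm *tm)
{
	return tm->tm_sec  >= 0 && tm->tm_sec  <= 59 &&
	       tm->tm_min  >= 0 && tm->tm_min  <= 59 &&
	       tm->tm_hour >= 0 && tm->tm_hour <= 23 &&
	       tm->tm_mday >= 1 && tm->tm_mday <= 31 &&
	       tm->tm_mon  >= 0 && tm->tm_mon  <= 11 &&
	       tm->tm_year >= 70;
}

/* Convert to seconds since the epoch; -1 means "partial alarm, no value". */
int alarm_to_epoch(const struct tm *tm, long long *out)
{
	assert(tm != NULL && out != NULL);
	if (!alarm_fields_valid(tm))
		return -1;
	*out = days_from_civil(tm->tm_year + 1900LL, tm->tm_mon + 1,
			       tm->tm_mday) * 86400
	     + tm->tm_hour * 3600LL + tm->tm_min * 60LL + tm->tm_sec;
	return 0;
}
```

A display routine built this way would print nothing for a partial alarm instead of a value like 2051629528.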

> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# echo 1190285050 > wakealarm 

That is, 20 seconds from "now" modulo timezone offsets.

Better might be

echo $(( $(cat since_epoch) + 20 )) > wakealarm

which has no timezone offset issues.

> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# s2ram
> Switching from vt9 to vt1
> 
> 
> switching back to vt9
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# 
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# 
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm 
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm 

There's some weirdness related to ACPI, that crept in sometime
late in 2.6.21 (or thereabouts) ... where the RTC wake mechanism
got broken by redefining the pm_ops functions, for hibernation
at least.

That MIGHT be related to what you observe here ... unclear what
that was supposed to show.  If the RTC alarm woke that system
after 20 seconds, that's what you requested and all is fine.  If
not, and you had to wake it by hand, then you're seeing that issue
with the redefinition of hibernation ops having borked the RTC wake
mechanism interactions with ACPI.

In both cases, I'd expect that the result is that no alarm is
pending any more, so there's nothing to display.

> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# date +%s
> 1190285229

... which BTW should be what the "since_epoch" file shows,
other than the timezone offsets on some system RTCs.

> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# cat wakealarm 
> [EMAIL PROTECTED]:/sys/class/rtc/rtc0# 
> 
> Also, is there some documentation for wakealarm?

"git show 3925a5ce44

Re: [patches] [PATCH] [12/50] x86_64: Untable __init references between IO data

2007-09-21 Thread Yinghai Lu

On 9/21/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> Earlier patch added IO APIC setup into local APIC setup. This caused
> modpost warnings. Fix them by untangling setup_local_APIC() and splitting
> it into smaller functions. The IO APIC initialization is only called
> for the BP init.
>
> Also removed some outdated debugging code and minor cleanup.
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
>
> ---
>  arch/x86_64/kernel/apic.c|   46 
> ---
>  arch/x86_64/kernel/smpboot.c |8 +++
>  include/asm-x86_64/apic.h|1
>  3 files changed, 31 insertions(+), 24 deletions(-)
>
> Index: linux/arch/x86_64/kernel/apic.c
> ===
> --- linux.orig/arch/x86_64/kernel/apic.c
> +++ linux/arch/x86_64/kernel/apic.c
> @@ -323,7 +323,7 @@ void __init init_bsp_APIC(void)
>
>  void __cpuinit setup_local_APIC (void)
>  {
> -   unsigned int value, maxlvt;
> +   unsigned int value;
> int i, j;
>
> value = apic_read(APIC_LVR);
> @@ -417,33 +417,22 @@ void __cpuinit setup_local_APIC (void)
> else
> value = APIC_DM_NMI | APIC_LVT_MASKED;
> apic_write(APIC_LVT1, value);
> +}
>
> +void __cpuinit lapic_setup_esr(void)

static ?

> +{
> +   unsigned maxlvt = get_maxlvt();
> +   apic_write(APIC_LVTERR, ERROR_APIC_VECTOR);
> /*
> -* Now enable IO-APICs, actually call clear_IO_APIC
> -* We need clear_IO_APIC before enabling vector on BP
> +* spec says clear errors after enabling vector.
>  */
> -   if (!smp_processor_id())
> -   if (!skip_ioapic_setup && nr_ioapics)
> -   enable_IO_APIC();
> -
> -   {
> -   unsigned oldvalue;
> -   maxlvt = get_maxlvt();
> -   oldvalue = apic_read(APIC_ESR);
> -   value = ERROR_APIC_VECTOR;  // enables sending errors
> -   apic_write(APIC_LVTERR, value);
> -   /*
> -* spec says clear errors after enabling vector.
> -*/
> -   if (maxlvt > 3)
> -   apic_write(APIC_ESR, 0);
> -   value = apic_read(APIC_ESR);
> -   if (value != oldvalue)
> -   apic_printk(APIC_VERBOSE,
> -   "ESR value after enabling vector: %08x, after %08x\n",
> -   oldvalue, value);
> -   }
> +   if (maxlvt > 3)
> +   apic_write(APIC_ESR, 0);
> +}
>
> +void __cpuinit end_local_APIC_setup(void)
> +{
> +   lapic_setup_esr();
> nmi_watchdog_default();
> setup_apic_nmi_watchdog(NULL);
> apic_pm_activate();
> @@ -1178,6 +1167,15 @@ int __init APIC_init_uniprocessor (void)
>
> setup_local_APIC();
>
> +   /*
> +* Now enable IO-APICs, actually call clear_IO_APIC
> +* We need clear_IO_APIC before enabling vector on BP

here it is uniprocessor...
so
+* We need clear_IO_APIC before enabling error vector

> +*/
> +   if (!skip_ioapic_setup && nr_ioapics)
> +   enable_IO_APIC();

could it cause modpost warning too?

> +
> +   end_local_APIC_setup();
> +
> if (smp_found_config && !skip_ioapic_setup && nr_ioapics)
> setup_IO_APIC();
> else
> Index: linux/arch/x86_64/kernel/smpboot.c
> ===
> --- linux.orig/arch/x86_64/kernel/smpboot.c
> +++ linux/arch/x86_64/kernel/smpboot.c
> @@ -211,6 +211,7 @@ void __cpuinit smp_callin(void)
>
> Dprintk("CALLIN, before setup_local_APIC().\n");
> setup_local_APIC();
> +   end_local_APIC_setup();
>
> /*
>  * Get our bogomips.
> @@ -870,6 +871,13 @@ void __init smp_prepare_cpus(unsigned in
>  */
> setup_local_APIC();
>
> +   /*
> +* Enable IO APIC before setting up error vector
> +*/
> +   if (!skip_ioapic_setup && nr_ioapics)
> +   enable_IO_APIC();
> +   end_local_APIC_setup();
> +
> if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) {
> panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
>   GET_APIC_ID(apic_read(APIC_ID)), boot_cpu_id);
> Index: linux/include/asm-x86_64/apic.h
> ===
> --- linux.orig/include/asm-x86_64/apic.h
> +++ linux/include/asm-x86_64/apic.h
> @@ -73,6 +73,7 @@ extern void cache_APIC_registers (void);
>  extern void sync_Arb_IDs (void);

sync_Arb_IDs is still left there?

YH

Re: [PATCH] [13/50] x86: Fix and reenable CLFLUSH support in change_page_attr()

2007-09-21 Thread Oleg Verych

* Sat, 22 Sep 2007 00:32:11 +0200 (CEST)
[]
> - flush_map(&l);
> + flush_map(&arg);

  + flush_map(&arg.l);

  CC  arch/x86_64/mm/pageattr.o
arch/x86_64/mm/pageattr.c: In function 'global_flush_tlb':
arch/x86_64/mm/pageattr.c:274: warning: passing argument 1 of 'flush_map' from 
incompatible pointer type

(the same seems to apply to i386)

[]  
> +#define PageFlush(p) test_bit(PG_owner_priv_1, &(p)->flags)
> +#define SetPageFlush(p) set_bit(PG_owner_priv_1, &(p)->flags)
> +#define TestClearPageFlush(p) test_and_clear_bit(PG_owner_priv_1, 
> &(p)->flags)

Is it worth introducing more of that Pascal style? Yes, page stuff is
all about it, but still.

[]
> +static struct page *flush_page(unsigned long address)
>  {
> - if (!test_and_set_bit(PG_arch_1, &kpte_page->flags))
> - list_add(&kpte_page->lru, &df_list);
> + struct page *p;
> + if (!(pfn_valid(__pa(address) >> PAGE_SHIFT)))
> + return NULL;
> + p = virt_to_page(address);
> + if ((PageFlush(p) || PageLRU(p)) && !test_bit(PG_arch_1, &p->flags))
> + return NULL;
> + return p;
>  }

Saves 16 bytes in non optimized compile (if tcc will ever do this :)

static struct page *flush_page(unsigned long address)
{
	struct page *p = NULL;

	if (pfn_valid(__pa(address) >> PAGE_SHIFT)) {
		p = virt_to_page(address);
		if (PageFlush(p) || PageLRU(p))
			if (!test_bit(PG_arch_1, &p->flags))
				p = NULL;
	}
	return p;
}

>  static int
> @@ -158,6 +185,18 @@ __change_page_attr(struct page *page, pg
>   kpte_page = virt_to_page(kpte);
>   BUG_ON(PageLRU(kpte_page));
>   BUG_ON(PageCompound(kpte_page));
> + BUG_ON(PageLRU(kpte_page));
> +
> + /* Do caching attributes change?
> +Note: this will need changes if the PAT bit is used (it isn't
> +currently) because that one varies between 2MB and 4K pages. */
> + if ((pte_val(*kpte)&_PAGE_CACHE) != (pgprot_val(prot)&_PAGE_CACHE)) {
> + struct page *p = flush_page(address);
> + if (!p)
> + full_flush = 1;
> + else
> + save_page(p, 1);
> + }
>  
>   if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
>   if (!pte_huge(*kpte)) {
> @@ -189,7 +228,7 @@ __change_page_attr(struct page *page, pg
>* replace it with a largepage.
>*/
>  
> - save_page(kpte_page);
> + save_page(kpte_page, 0);
>   if (!PageReserved(kpte_page)) {
>   if (cpu_has_pse && (page_private(kpte_page) == 0)) {
>   paravirt_release_pt(page_to_pfn(kpte_page));
> @@ -235,18 +274,22 @@ int change_page_attr(struct page *page, 
>  
>  void global_flush_tlb(void)
>  {
> - struct list_head l;
> + struct flush_arg arg;
>   struct page *pg, *next;
>  
>   BUG_ON(irqs_disabled());
>  
>   spin_lock_irq(&cpa_lock);
> - list_replace_init(&df_list, &l);
> + arg.full_flush = full_flush;
> + full_flush = 0;
> + list_replace_init(&df_list, &arg.l);
>   spin_unlock_irq(&cpa_lock);
> - flush_map(&l);
> - list_for_each_entry_safe(pg, next, &l, lru) {
> + flush_map(&arg);

i386 case here.


2.6.23-rc7 + radeonfb/s2ram

2007-09-21 Thread Mihai Donțu

Hi,

Today, out of curiosity, I pulled 2.6.23-rc7 (living on the edge on a quiet
weekend).
Anyway, it seems that radeonfb and my:
"01:05.0 VGA compatible controller: ATI Technologies Inc ATI Radeon XPRESS 200M 5955 (PCIE)"
don't get along anymore:
a) X somehow fails to initialize the card and everything moves really slowly
   (I can see surfaces being drawn pixel by pixel); furthermore, garbage
   appears on the screen;
b) after resuming from s2ram, the system freezes.

b) is not that bad, since s2ram never worked on my machine (kjournald and some
other kernel processes enter disk-sleep and, in a matter of seconds, everything
just... freezes. I can type a few commands at the normal console, but that is
all).

Following the advice in 'Documentation/power/s2ram.txt' helped. Using the
regular VGA console got X back on track (no more slowness).

Now that I got my hands "dirty", I'm in the mood to make my s2ram work (I've
been using Linux (exclusively) for three years now, it's about time I made a
small contribution). What kernel option must I enable to determine why some
processes enter (and stay in) disk-sleep? I'm on a laptop and I don't think it
will withstand too many reboots :)

I've also attached the output of lspci and dmesg. Maybe someone spots something.

Thanks,

-- 
Mihai Donțu
[0.00] Linux version 2.6.23-rc7 ([EMAIL PROTECTED]) (gcc version 4.1.2 
(Gentoo 4.1.2)) #3 PREEMPT Sat Sep 22 07:27:11 EEST 2007
[0.00] Command line: vga=791 nohz=on
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 37fd (usable)
[0.00]  BIOS-e820: 37fd - 37fefc00 (reserved)
[0.00]  BIOS-e820: 37fefc00 - 37ffb000 (ACPI NVS)
[0.00]  BIOS-e820: 37ffb000 - 4000 (reserved)
[0.00]  BIOS-e820: e000 - f000 (reserved)
[0.00]  BIOS-e820: fec0 - fec02000 (reserved)
[0.00]  BIOS-e820: ffb8 - ffc0 (reserved)
[0.00]  BIOS-e820: fff8 - 0001 (reserved)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 229328) 1 entries of 256 used
[0.00] end_pfn_map = 1048576
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000FE270, 0014 (r0 HP)
[0.00] ACPI: RSDT 37FEFC84, 0034 (r1 HP 0944 22110520 HP
  1)
[0.00] ACPI: FACP 37FEFC00, 0084 (r2 HP 09442 HP
  1)
[0.00] ACPI: DSDT 37FEFD50, 7489 (r1 HPSB4001 MSFT  
10E)
[0.00] ACPI: FACS 37FFAE80, 0040
[0.00] ACPI: APIC 37FEFCB8, 005A (r1 HP 09441 HP
  1)
[0.00] ACPI: MCFG 37FEFD14, 003C (r1 HP 09441 HP
  1)
[0.00] ACPI: SSDT 37FF71D9, 0382 (r1 HP   HPQPpc 1001 MSFT  
10E)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 229328) 1 entries of 256 used
[0.00] No mptable found.
[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   DMA324096 ->  1048576
[0.00]   Normal1048576 ->  1048576
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00] 0:0 ->  159
[0.00] 0:  256 ->   229328
[0.00] On node 0 totalpages: 229231
[0.00]   DMA zone: 56 pages used for memmap
[0.00]   DMA zone: 1960 pages reserved
[0.00]   DMA zone: 1983 pages, LIFO batch:0
[0.00]   DMA32 zone: 3079 pages used for memmap
[0.00]   DMA32 zone: 222153 pages, LIFO batch:31
[0.00]   Normal zone: 0 pages used for memmap
[0.00]   Movable zone: 0 pages used for memmap
[0.00] ATI board detected. Disabling timer routing over 8254.
[0.00] ACPI: PM-Timer IO Port: 0x8008
[0.00] ACPI: Local APIC address 0xfec01000
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[0.00] Processor #0 (Bootup-CPU)
[0.00] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
[0.00] ACPI: IRQ0 used by override.
[0.00] ACPI: IRQ2 used by override.
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00]

Re: [PATCH] [34/50] i386: Fix argument signedness warnings

2007-09-21 Thread Satyam Sharma

Hi,


On Sat, 22 Sep 2007, Andi Kleen wrote:
> 
> From: Satyam Sharma <[EMAIL PROTECTED]>
> 
> 
> These build warnings:
> 
> In file included from include/asm/thread_info.h:16,
> from include/linux/thread_info.h:21,
> from include/linux/preempt.h:9,
> from include/linux/spinlock.h:49,
> from include/linux/vmalloc.h:4,
> from arch/i386/boot/compressed/misc.c:14:
> include/asm/processor.h: In function $B!F(Jcpuid_count$B!G(J:
   ^^   ^^
> include/asm/processor.h:615: warning: pointer targets in passing
> argument 1 of $B!F(Jnative_cpuid$B!G(J differ in signedness

> include/asm/processor.h:615: warning: pointer targets in passing
> argument 2 of $B!F(Jnative_cpuid$B!G(J differ in signedness

> include/asm/processor.h:615: warning: pointer targets in passing
> argument 3 of $B!F(Jnative_cpuid$B!G(J differ in signedness

> include/asm/processor.h:615: warning: pointer targets in passing
> argument 4 of $B!F(Jnative_cpuid$B!G(J differ in signedness
^^^^

Yikes. My bad, I had faulty (default) alpine settings (and a sad
combination of LANG=en_US.UTF-8) when I made and sent out that patch.
Please ensure that this finally gets committed in a somewhat saner and
more readable state to the tree.

Thanks,

Satyam

Re: [patch 04/28] Add cmpxchg64 and cmpxchg64_local to powerpc

2007-09-21 Thread Paul Mackerras

Mathieu Desnoyers writes:

> Make sure that at least cmpxchg64_local is available on all architectures to 
> use
> for unsigned long long values.
> 
> Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

Acked-by: Paul Mackerras <[EMAIL PROTECTED]>

[PATCH] 2.6.22.6 user-mode linux: use address instead of value as argument in os_free_irq_by_cb

2007-09-21 Thread lepton

Hi,
  There is a bug in os_free_irq_by_cb: when the first element
  of the active_fds list is freed, the value of active_fds is not
  updated; only the copy on the stack is updated.

  The interesting thing is that, without this patch, a poweroff
  in a user-mode Linux guest will halt the host Linux system. It
  seems that after the tracing thread is dead, the sys_reboot
  syscall of the traced thread is executed by the host. I don't
  know if that is another bug.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pru 
linux-2.6.22.6/arch/um/include/os.h linux-2.6.22.6-lepton/arch/um/include/os.h
--- linux-2.6.22.6/arch/um/include/os.h 2007-09-14 17:41:10.0 +0800
+++ linux-2.6.22.6-lepton/arch/um/include/os.h  2007-09-22 12:15:28.0 
+0800
@@ -325,7 +325,7 @@ extern void reboot_skas(void);
 extern int os_waiting_for_events(struct irq_fd *active_fds);
 extern int os_create_pollfd(int fd, int events, void *tmp_pfd, int 
size_tmpfds);
 extern void os_free_irq_by_cb(int (*test)(struct irq_fd *, void *), void *arg,
-   struct irq_fd *active_fds, struct irq_fd ***last_irq_ptr2);
+   struct irq_fd **active_fds_ptr, struct irq_fd ***last_irq_ptr2);
 extern void os_free_irq_later(struct irq_fd *active_fds,
int irq, void *dev_id);
 extern int os_get_pollfd(int i);
diff -X linux-2.6.22.6/Documentation/dontdiff -pru 
linux-2.6.22.6/arch/um/kernel/irq.c linux-2.6.22.6-lepton/arch/um/kernel/irq.c
--- linux-2.6.22.6/arch/um/kernel/irq.c 2007-09-14 17:41:10.0 +0800
+++ linux-2.6.22.6-lepton/arch/um/kernel/irq.c  2007-09-22 12:15:05.0 
+0800
@@ -218,7 +218,7 @@ static void free_irq_by_cb(int (*test)(s
unsigned long flags;
 
spin_lock_irqsave(&irq_lock, flags);
-   os_free_irq_by_cb(test, arg, active_fds, &last_irq_ptr);
+   os_free_irq_by_cb(test, arg, &active_fds, &last_irq_ptr);
spin_unlock_irqrestore(&irq_lock, flags);
 }
 
diff -X linux-2.6.22.6/Documentation/dontdiff -pru 
linux-2.6.22.6/arch/um/os-Linux/irq.c 
linux-2.6.22.6-lepton/arch/um/os-Linux/irq.c
--- linux-2.6.22.6/arch/um/os-Linux/irq.c   2007-09-14 17:41:10.0 
+0800
+++ linux-2.6.22.6-lepton/arch/um/os-Linux/irq.c2007-09-22 
12:15:42.0 +0800
@@ -84,12 +84,12 @@ int os_create_pollfd(int fd, int events,
 }
 
 void os_free_irq_by_cb(int (*test)(struct irq_fd *, void *), void *arg,
-   struct irq_fd *active_fds, struct irq_fd ***last_irq_ptr2)
+   struct irq_fd **active_fds_ptr, struct irq_fd ***last_irq_ptr2)
 {
struct irq_fd **prev;
int i = 0;
 
-   prev = &active_fds;
+   prev = active_fds_ptr;
while (*prev != NULL) {
if ((*test)(*prev, arg)) {
struct irq_fd *old_fd = *prev;

Re: [PATCH 2/3] user.c: use kmem_cache_zalloc()

2007-09-21 Thread Satyam Sharma

On Fri, 21 Sep 2007, Andrew Morton wrote:
> 
> On Fri, 21 Sep 2007 13:39:06 +0400
> Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> 
> > Quite a few fields are zeroed during user_struct creation, so use
> > kmem_cache_zalloc() --  save a few lines and #ifdef. Also will help avoid
> >  #ifdef CONFIG_POSIX_MQUEUE in next patch.
> > 
> > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> > ---
> > 
> >  kernel/user.c |   13 +
> >  1 file changed, 1 insertion(+), 12 deletions(-)
> > 
> > --- a/kernel/user.c
> > +++ b/kernel/user.c
> > @@ -129,21 +129,11 @@ struct user_struct * alloc_uid(struct user_namespace 
> > *ns, uid_t uid)
> > if (!up) {
> > struct user_struct *new;
> >  
> > -   new = kmem_cache_alloc(uid_cachep, GFP_KERNEL);
> > +   new = kmem_cache_zalloc(uid_cachep, GFP_KERNEL);
> > if (!new)
> > return NULL;
> > new->uid = uid;
> > atomic_set(&new->__count, 1);
> > -   atomic_set(&new->processes, 0);
> > -   atomic_set(&new->files, 0);
> > -   atomic_set(&new->sigpending, 0);
> > -#ifdef CONFIG_INOTIFY_USER
> > -   atomic_set(&new->inotify_watches, 0);
> > -   atomic_set(&new->inotify_devs, 0);
> > -#endif
> > -
> > -   new->mq_bytes = 0;
> > -   new->locked_shm = 0;
> 
> 
> This assumes that setting an atomic_t to the all-zeroes pattern is
> equivalent to atomic_set(v, 0).
> 
> This happens to be true for all present architectures, afaik.  But an
> architecture which has crappy primitives could quite legitimately implement
> its atomic_t as:
> 
> typedef struct {
>   int counter;
>   spinlock_t lock;
> } atomic_t;
> 
> in which case your assumption breaks.

Agreed, and this (implementing atomic ops using spinlocks) is already true
for the CRIS platform.

However, cris' implementation explicitly takes care to ensure that
atomic_t contains just a solitary int member, and no spinlock_t's
inside the atomic_t itself. [ include/asm-cris/arch-v32/atomic.h ]

Of course, that "128" limits scalability, so no more than 128 CPUs can be
executing atomic ops at any given instant of time, but admittedly I'm
getting anal here myself ... (but probably that's often perfectly the
right attitude to have too)

> So it's all a bit theoretical and a bit anal, and I'm sure we're making the
> same mistake in other places, but it's not a change I particularly like..

Hmm, it's borderline.

Such changes make the text smaller (in terms of LOC as well as vmlinux size).

But they also hurt grepping. Often we (at least I) want to grep for where a
variable/struct member/etc. gets initialized or assigned to. Take this case,
for example -- I bet it's important (for overall logic) that those variables
get initialized to zero. But *zalloc() functions do that implicitly, so it
wastes precious seconds or minutes of developer time when grepping that code.

OTOH, we could make it standard practice to put a little comment on top
of such *zalloc() usages, explicitly enumerating the struct members that
the *zalloc() is assumed to initialize to zero.

Satyam

[Patch 1/2] Trace code and documentation (resend)

2007-09-21 Thread David J. Wilder

My last posting was mangled by my mailer.  I hope this one is better.
I have also addressed Randy's concerns.

Please see previous posting for more information:
http://lkml.org/lkml/2007/9/19/4 (PATCH 0/2)

Note: this patch requires that "[Patch 2/2] Relay reset consumed" be applied.

-
Trace - Provides tracing primitives

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: Martin Hunt <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/trace/src/Makefile |7 +
 Documentation/trace/src/README   |   18 +
 Documentation/trace/src/fork_trace.c |  119 +++
 Documentation/trace/trace.txt|  164 ++
 include/linux/trace.h|   99 ++
 lib/Kconfig  |9 +
 lib/Makefile |2 +
 lib/trace.c  |  563 ++
 8 files changed, 981 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace/src/Makefile b/Documentation/trace/src/Makefile
new file mode 100644
index 000..9ee4c72
--- /dev/null
+++ b/Documentation/trace/src/Makefile
@@ -0,0 +1,7 @@
+obj-m := fork_trace.o
+KDIR := /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+default:
+   $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+   rm -f *.mod.c *.ko *.o
diff --git a/Documentation/trace/src/README b/Documentation/trace/src/README
new file mode 100644
index 000..f538491
--- /dev/null
+++ b/Documentation/trace/src/README
@@ -0,0 +1,18 @@
+This small sample module creates a trace channel. It places a kprobe
+on the function do_fork(). The value of current->pid is written to
+the trace channel each time the kprobe is hit..
+
+How to run the example:
+$ mount -t debugfs /debug
+$ make
+$ insmod fork_trace.ko
+
+To view the data produced by the module:
+$ cat /debug/trace_example/do_fork/trace0
+
+Remove the module.
+$ rmmod fork_trace
+
+The function trace_cleanup() is called when the module
+is removed.  This will cause the TRACE channel to be destroyed and the
+corresponding files to disappear from the debug file system.
diff --git a/Documentation/trace/src/fork_trace.c 
b/Documentation/trace/src/fork_trace.c
new file mode 100644
index 000..8a7bc55
--- /dev/null
+++ b/Documentation/trace/src/fork_trace.c
@@ -0,0 +1,119 @@
+/*
+ * An example of using trace in a kprobes module
+ *
+ * Copyright (C) 2007 IBM Inc.
+ *
+ * David Wilder <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#define USE_GLOBAL_BUFFERS 1
+#define USE_FLIGHT 1
+
+#define PROBE_POINT "do_fork"
+
+static struct kprobe kp;
+static struct trace_info *kprobes_trace;
+
+#ifdef USE_GLOBAL_BUFFERS
+static DEFINE_SPINLOCK(trace_lock);
+#endif
+
+/*
+ * Send formatted trace data to trace channel.
+ * @note Preemption must be disabled to use this.
+ */
+static void trace_printf(struct trace_info *trace, const char *format, ...)
+{
+   va_list ap, aq;
+   char *record;
+   unsigned long flags;
+   int len;
+
+   if (!trace)
+   return;
+
+#ifdef USE_GLOBAL_BUFFERS
+   spin_lock_irqsave(&trace_lock, flags);
+#endif
+   if (trace_running(trace)) {
+   va_start(ap, format);
+   va_copy(aq, ap);
+   len = vsnprintf(NULL, 0, format, aq);
+   va_end(aq);
+   record = relay_reserve(trace->rchan, ++len);
+   if (record)
+   vsnprintf(record, len, format, ap);
+   va_end(ap);
+   }
+#ifdef USE_GLOBAL_BUFFERS
+   spin_unlock_irqrestore(&trace_lock, flags);
+#endif
+}
+
+static int handler_pre(struct kprobe *p, struct pt_regs *regs)
+{
+   rcu_read_lock();
+   trace_printf(kprobes_trace, "%d\n", current->pid);
+   rcu_read_unlock();
+   return 0;
+}
+
+int init_module(void)
+{
+   int ret;
+   u32 flags = 0;
+
+#ifdef USE_GLOBAL_BUFFERS
+   flags |= TRACE_GLOBAL_CHANNEL;
+#endif
+
+#ifdef USE_FLIGHT
+   flags |= TRACE_FLIGHT_CHANNEL;
+#endif
+
+   /* setup the trace */
+   kprobes_trace = trace_setup("trace_example", PROBE_POINT,
+1024, 8, flags);
+   if (IS_ERR(kprobes_trace))
+   return PTR_ERR(kprobes_trace);
+
+   trace_start(kprobes_trace);
+
+   /* setup the

[Patch 2/2] Relay reset consumed

2007-09-21 Thread David J. Wilder

This patch allows relay channels to be reset i.e. unconsumed.
Basically allows a 'rewind' function for flight-recorder tracing.

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/filesystems/relay.txt |   11 ++
 include/linux/relay.h   |1 +
 kernel/relay.c  |   58 ---
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/relay.txt 
b/Documentation/filesystems/relay.txt
index 18d23f9..d31113a 100644
--- a/Documentation/filesystems/relay.txt
+++ b/Documentation/filesystems/relay.txt
@@ -161,6 +161,7 @@ TBD(curr. line MT:/API/)
 relay_close(chan)
 relay_flush(chan)
 relay_reset(chan)
+relay_reset_consumed(chan)
 
   channel management typically called on instigation of userspace:
 
@@ -452,6 +453,16 @@ state without reallocating channel buffer memory or 
destroying
 existing mappings.  It should however only be called when it's safe to
 do so, i.e. when the channel isn't currently being written to.
 
+The read(2) implementation always 'consumes' the bytes read,
+i.e. those bytes won't be available again to subsequent reads.
+Certain applications may nonetheless wish to allow the 'consumed' data
+to be re-read; relay_reset_consumed() is provided for that purpose -
+it resets the internal consumed counters for all buffers in the
+channel.  For example, if a first set of reads 'drains' the channel,
+and then relay_reset_consumed() is called, a second set of reads will
+get the exact same data (assuming no new data was written between the
+first set of reads and the second).
+
 Finally, there are a couple of utility callbacks that can be used for
 different purposes.  buf_mapped() is called whenever a channel buffer
 is mmapped from user space and buf_unmapped() is called when it's
diff --git a/include/linux/relay.h b/include/linux/relay.h
index 6cd8c44..aca45fa 100644
--- a/include/linux/relay.h
+++ b/include/linux/relay.h
@@ -175,6 +175,7 @@ extern void relay_subbufs_consumed(struct rchan *chan,
   unsigned int cpu,
   size_t consumed);
 extern void relay_reset(struct rchan *chan);
+extern void relay_reset_consumed(struct rchan *chan);
 extern int relay_buf_full(struct rchan_buf *buf);
 
 extern size_t relay_switch_subbuf(struct rchan_buf *buf,
diff --git a/kernel/relay.c b/kernel/relay.c
index 61134eb..6b55eaa 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -383,6 +383,57 @@ void relay_reset(struct rchan *chan)
 }
 EXPORT_SYMBOL_GPL(relay_reset);
 
+/**
+ * __relay_reset_consumed - reset a channel buffer's consumed count
+ * @buf: the channel buffer
+ *
+ * See relay_reset_consumed for description of effect.
+ */
+static inline void __relay_reset_consumed(struct rchan_buf *buf)
+{
+   size_t n_subbufs = buf->chan->n_subbufs;
+   size_t produced = buf->subbufs_produced;
+   size_t consumed = buf->subbufs_consumed;
+
+   if (produced < n_subbufs)
+   buf->subbufs_consumed = 0;
+   else {
+   consumed = produced - n_subbufs;
+   if (buf->offset)
+   consumed++;
+   buf->subbufs_consumed = consumed;
+   }
+   buf->bytes_consumed = 0;
+}
+
+/**
+ * relay_reset_consumed - reset the channel's consumed counts
+ * @chan: the channel
+ *
+ * This has the effect of making all data previously read (and
+ * not overwritten by subsequent writes) from a channel available
+ * for reading again.
+ *
+ * NOTE: Care should be taken that the channel isn't actually
+ * being used by anything when this call is made.
+ */
+void relay_reset_consumed(struct rchan *chan)
+{
+   unsigned int i;
+   struct rchan_buf *prev = NULL;
+
+   if (!chan)
+   return;
+
+   for (i = 0; i < NR_CPUS; i++) {
+   if (!chan->buf[i] || chan->buf[i] == prev)
+   break;
+   __relay_reset_consumed(chan->buf[i]);
+   prev = chan->buf[i];
+   }
+}
+EXPORT_SYMBOL_GPL(relay_reset_consumed);
+
 /*
  * relay_open_buf - create a new relay channel buffer
  *
@@ -845,11 +896,8 @@ static int relay_file_read_avail(struct rchan_buf *buf, 
size_t read_pos)
return 1;
}
 
-   if (unlikely(produced - consumed >= n_subbufs)) {
-   consumed = produced - n_subbufs + 1;
-   buf->subbufs_consumed = consumed;
-   buf->bytes_consumed = 0;
-   }
+   if (unlikely(produced - consumed >= n_subbufs))
+   __relay_reset_consumed(buf);
 
produced = (produced % n_subbufs) * subbuf_size + buf->offset;
consumed = (consumed % n_subbufs) * subbuf_size + buf->bytes_consumed;


Re: [PATCH -mm] sb16: Shut up uninitialized var build warning

2007-09-21 Thread Rene Herman


On 09/20/2007 07:52 PM, Denys Vlasenko wrote:


On Sunday 02 September 2007 23:06, Rene Herman wrote:



Blah. Your message has:

Content-Type: TEXT/PLAIN; charset=iso-2022-jp

This apparently is caused by a combination of GCC using groovy UTF tickmarks 
in its error messages when in a UTF locale and alpine believing it to be a 
great idea to automatically try for the "simplest" character set it can 
encode the content in. No idea why that means that iso-2022-jp is picked, 
but it is.


While I could actually read the message this time you should see what 
iso-2022-jp does to my font. It's scary. Best solution as far as I'm 
concerned is slap a few GCC developers (not that it will help, but it'll 
certainly feel good) and then teach alpine to go for UTF-8 directly if 
US-ASCII won't do.


rotfl.

Kindly give me permission to convert your email into gcc bugreport
and/or to forward it to gcc mailing list.


Blessings be upon thou, oh courageous one...

Rene.


Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 10:56:56PM -0400, Steven Rostedt wrote:
> 
> [ sneaks away from the family for a bit to answer emails ]

[ same here, now that you mention it... ]

> --
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> 
> > On Fri, Sep 21, 2007 at 09:19:22PM -0400, Steven Rostedt wrote:
> > >
> > > --
> > > On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > > > >
> > > > > In any case, I will be looking at the scenarios more carefully.  If
> > > > > it turns out that GP_STAGES can indeed be cranked down a bit, well,
> > > > > that is an easy change!  I just fired off a POWER run with GP_STAGES
> > > > > set to 3, will let you know how it goes.
> > > >
> > > > The first attempt blew up during boot badly enough that ABAT was unable
> > > > to recover the machine (sorry, grahal!!!).  Just for grins, I am trying
> > > > it again on a machine that ABAT has had a better record of reviving...
> > >
> > > This still frightens the hell out of me. Going through 15 states and
> > > failing. Seems the CPU is holding off writes for a long long time. That
> > > means we flipped the counter 4 times, and that still wasn't good enough?
> >
> > Might be that the other machine has its 2.6.22 version of .config messed
> > up.  I will try booting it on a stock 2.6.22 kernel when it comes back
> > to life -- not sure I ever did that before.  Besides, the other similar
> > machine seems to have gone down for the count, but without me torturing
> > it...
> >
> > Also, keep in mind that various stages can "record" a memory misordering,
> > for example, by incrementing the wrong counter.
> >
> > > Maybe I'll boot up my powerbook to see if it has the same issues.
> > >
> > > Well, I'm still finishing up on moving into my new house, so I won't be
> > > available this weekend.
> >
> > The other machine not only booted, but has survived several minutes of
> > rcutorture thus far.  I am also trying a POWER5 machine, as the
> > one currently running is a POWER4, which is a bit less aggressive about
> > memory reordering than is the POWER5.
> >
> > Even if they pass, I refuse to reduce GP_STAGES until proven safe.
> > Trust me, you -don't- want to be unwittingly making use of a subtly
> > busted RCU implementation!!!
> 
> I totally agree. This is the same reason I want to understand -why- it
> fails with 3 stages. To make sure that adding a 4th stage really does fix
> it, and doesn't just make the chances for the bug smaller.

Or if it really does break, as opposed to my having happened upon a sick
or misconfigured machine.

> I just have that funny feeling that we are band-aiding this for POWER with
> extra stages and not really solving the bug.
> 
> I could be totally out in left field on this. But the more people have a
> good understanding of what is happening (this includes why things fail)
> the more people in general can trust this code.  Right now I'm thinking
> you may be the only one that understands this code enough to trust it. I'm
> just wanting you to help people like me to trust the code by understanding
> and not just having faith in others.

Agreed.  Trusting me is grossly insufficient.  For one thing, the Linux
kernel has a reasonable chance of outliving me.

> If you ever decide to give up jogging, we need to make sure that there are
> people here that can still fill those running shoes (-:

Well, I certainly don't jog as fast or as far as I used to!  ;-)

Thanx, Paul

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 11:15:42PM -0400, Steven Rostedt wrote:
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > On Fri, Sep 21, 2007 at 09:15:03PM -0400, Steven Rostedt wrote:
> > > On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > > > On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote:
> > > > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > Are we sure that adding all these grace-period stages is better than just
> > > biting the bullet and putting in a memory barrier?
> >
> > Good question.  I believe so, because the extra stages don't require
> > much additional processing, and because the ratio of rcu_read_lock()
> > calls to the number of grace periods is extremely high.  But, if I
> > can prove it is safe, I will certainly decrease GP_STAGES or otherwise
> > optimize the state machine.
> 
> But until others besides yourself understand that state machine (doesn't
> really need to be me) I would be worried about applying it without
> barriers.  The barriers may add a bit of overhead, but it adds some
> confidence in the code.  I'm arguing that we have barriers in there until
> there's a fine understanding of why we fail with 3 stages and not 4.
> Perhaps you don't have a box with enough cpus to fail at 4.
> 
> I don't know how the higher ups in the kernel command line feel, but I
> think that memory barriers on critical sections are justified. But if you
> can show a proof that adding extra stages is sufficient to deal with
> CPUS moving memory writes around, then so be it. But I'm still not
> convinced that these extra stages are really solving the bug instead of
> just making it much less likely to happen.
> 
> Ingo praised this code since it had several years of testing in the RT
> tree. But that version has barriers, so this new version without the
> barriers has not had that "run it through the grinder" feeling to it.

Fair point...  Though the -rt variant has its shortcomings as well,
such as being unusable from NMI/SMI handlers.

How about this:  I continue running the GP_STAGES=3 run on the pair of
POWER machines (which are both going strong), and I also get a document
together describing the new version (and of course apply the changes we
have discussed, and merge with recent CPU-hotplug changes -- Gautham
Shenoy is currently working on this), work out a good answer to "how
big exactly does GP_STAGES need to be", test whatever that number is,
assuming it is neither 3 nor 4, and figure out why the gekko-lp1 machine
choked on GP_STAGES=3.

Then we can work out the best path forward from wherever that ends up
being.

[ . . . ]

Thanx, Paul

Re: possible corrections in the docs (Re: [PATCH] [7/50] x86: expand /proc/interrupts to include missing vectors, v2)

2007-09-21 Thread Joe Korty

Looks good to me.
Joe

Acked-by: Joe Korty <[EMAIL PROTECTED]>

Call to Action: Don't Touch My Brother, Abolish the Death Laws

2007-09-21 Thread Deniz KURTUL

Call to Action: Don't Touch My Brother, Abolish the Death Laws
The killing of the Nigerian migrant Festus Okey has shown that none of us
should remain indifferent to this problem. To demand the abolition of the
death laws and to stand in solidarity with our migrant brothers and sisters,
we are meeting on Sunday, 23 September, at 14:00 at the Taksim Tram Stop.
Everyone is invited to the press statement that will be made.
http://www.anarsistkomunizm.org/karakizilforum/viewtopic.php?t=736



Re: [git] CFS-devel, group scheduler, fixes

2007-09-21 Thread Tong Li


Mike,

Could you try this patch to see if it solves the latency problem?

  tong

Changes:

1. Modified vruntime adjustment logic in set_task_cpu(). See comments in 
code. This fixed the negative vruntime problem.


2. This code in update_curr() seems to be wrong:

if (unlikely(!curr))
return sync_vruntime(cfs_rq);

cfs_rq->curr can be NULL even if cfs_rq->nr_running is non-zero (e.g., 
when an RT task is running). We only want to call sync_vruntime when 
cfs_rq->nr_running is 0. This fixed the large latency problem (at least in 
my tests).
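
The ordering of the two checks described in change 2 can be sketched in
isolation. This is only an illustration with stand-in types, not the
scheduler code itself: struct cfs_rq_sketch, enum update_action and
update_curr_action() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct cfs_rq; illustration only. */
struct cfs_rq_sketch {
	unsigned long nr_running;  /* CFS tasks queued on this runqueue */
	const void *curr;          /* currently running CFS entity, or NULL */
};

enum update_action {
	ACTION_SYNC_VRUNTIME,  /* queue is empty: resynchronize vruntime */
	ACTION_NOTHING,        /* tasks queued, none running (e.g. an RT task has the CPU) */
	ACTION_ACCOUNT         /* normal case: charge runtime to curr */
};

/* Decide what update_curr() should do, testing nr_running before curr. */
enum update_action update_curr_action(const struct cfs_rq_sketch *cfs_rq)
{
	assert(cfs_rq != NULL);

	if (cfs_rq->nr_running == 0)
		return ACTION_SYNC_VRUNTIME;
	if (cfs_rq->curr == NULL)
		return ACTION_NOTHING;
	return ACTION_ACCOUNT;
}
```

With the old single test on curr, the middle case (tasks queued but curr
NULL) would have fallen into the sync path, which is the behaviour the
change is meant to avoid.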


Signed-off-by: Tong Li <[EMAIL PROTECTED]>
---
diff -uprN linux-2.6-sched-devel-orig/kernel/sched.c linux-2.6-sched-devel/kernel/sched.c
--- linux-2.6-sched-devel-orig/kernel/sched.c   2007-09-20 12:15:41.0 -0700
+++ linux-2.6-sched-devel/kernel/sched.c        2007-09-21 19:40:08.0 -0700
@@ -1033,9 +1033,20 @@ void set_task_cpu(struct task_struct *p,
if (p->se.block_start)
p->se.block_start -= clock_offset;
 #endif
-   if (likely(new_rq->cfs.min_vruntime))
-   p->se.vruntime -= old_rq->cfs.min_vruntime -
-   new_rq->cfs.min_vruntime;
+	/*
+	 * Reset p's vruntime if it moves to new_cpu whose min_vruntime is
+	 * 100,000,000 (equivalent to 100ms for nice-0 tasks) larger or
+	 * smaller than p's vruntime. This improves interactivity when
+	 * pinned and unpinned tasks co-exist. For example, pinning a few
+	 * tasks to a CPU can cause its min_vruntime much smaller than the
+	 * other CPUs. If a task moves to this CPU, its vruntime can be so
+	 * large it won't be scheduled until the locally pinned tasks'
+	 * vruntimes catch up, causing large delays.
+	 */
+	if (unlikely(old_cpu != new_cpu && p->se.vruntime &&
+		(p->se.vruntime > new_rq->cfs.min_vruntime + 1 ||
+		 p->se.vruntime + 1 < new_rq->cfs.min_vruntime)))
+		p->se.vruntime = new_rq->cfs.min_vruntime;

__set_task_cpu(p, new_cpu);
 }
@@ -1599,6 +1610,7 @@ static void __sched_fork(struct task_str
p->se.exec_start = 0;
p->se.sum_exec_runtime   = 0;
p->se.prev_sum_exec_runtime  = 0;
+   p->se.vruntime   = 0;

 #ifdef CONFIG_SCHEDSTATS
p->se.wait_start = 0;
diff -uprN linux-2.6-sched-devel-orig/kernel/sched_fair.c linux-2.6-sched-devel/kernel/sched_fair.c
--- linux-2.6-sched-devel-orig/kernel/sched_fair.c  2007-09-20 12:15:41.0 -0700
+++ linux-2.6-sched-devel/kernel/sched_fair.c       2007-09-21 17:23:09.0 -0700
@@ -306,8 +306,10 @@ static void update_curr(struct cfs_rq *c
u64 now = rq_of(cfs_rq)->clock;
unsigned long delta_exec;

-   if (unlikely(!curr))
+   if (unlikely(!cfs_rq->nr_running))
return sync_vruntime(cfs_rq);
+   if (unlikely(!curr))
+   return;

/*
 * Get the amount of time the current task was running

possible corrections in the docs (Re: [PATCH] [7/50] x86: expand /proc/interrupts to include missing vectors, v2)

2007-09-21 Thread Oleg Verych

* Sat, 22 Sep 2007 00:32:05 +0200 (CEST)

[]
> Index: linux/Documentation/filesystems/proc.txt
>===
> --- linux.orig/Documentation/filesystems/proc.txt
> +++ linux/Documentation/filesystems/proc.txt
> @@ -347,7 +347,40 @@ connects the CPUs in a SMP system. This 
>  the IO-APIC automatically retry the transmission, so it should not be a big
>  problem, but you should read the SMP-FAQ.
>  
> -In this context it could be interesting to note the new irq directory in 2.4.
> +In 2.6.2* /proc/interrupts was expanded again.  This time the goal was for
> +/proc/interrupts to display every IRQ vector in use by the system, not
> +just those considered 'most important'.  The new vectors are:
> +
> +  THR -- a threshold interrupt occurs when ECC memory correction is occurring
> +  at too high a frequency.  Threshold interrupt machinery is often put
> +  into the ECC logic, as occasional ECC memory corrections are part of
> +  normal operation (due to random alpha particles), but sequences of
> +  ECC corrections or outright failures over some short interval usually
> +  indicate a memory chip that is about to fail.  Note that not every
> +  platform has ECC threshold logic, and those that do generally require
> +  it to be explicitly turned on.

 +  THR -- a threshold interrupt occurs when the frequency of ECC memory
 +  corrections is too high. Threshold interrupt machinery is often built
 +  into the ECC hardware and, if so, must be explicitly enabled. Occasional
 +  ECC memory corrections are part of normal operation (background ionizing
 +  radiation). Sequences of ECC corrections or outright failures over some
 +  short interval usually indicate a memory chip that is about to fail
 +  completely.

(that "random alpha particles" bs, must be killed anyway)

> +  TRM -- a thermal event interrupt occurs when a temperature threshold
> +  has been exceeded for some CPU chip.  This interrupt may also be generated
> +  when the temperature drops back to normal.
> +
> +  SPU -- a spurious interrupt is some interrupt that was raised then lowered
> +  by some IO device before it could be fully processed by the APIC.  Hence
> +  the APIC sees the interrupt but does not know what device it came from.
> +  For this case the APIC will generate the interrupt with a IRQ vector
> +  of 0xff.

 +  SPU -- a spurious interrupt. This is an interrupt that was raised and then
 +  lowered so quickly that it was not fully processed by the APIC.  Hence,
 +  its origin is unknown.
 +  In this case, an interrupt with an IRQ vector of 0xff will be generated.

> +  RES, CAL, TLB -- rescheduling, call and tlb flush interrupts are
> +  sent from one CPU to another per the needs of the OS.  Typically,
> +  their statistics are used by kernel developers and interested users to
> +  determine the occurrence of interrupt floods of the given type.

 +  RES, CAL, TLB -- rescheduling, call and tlb flush interrupts,
 +  produced by normal OS operation.  Typically,
 +  this information is used by kernel developers and interested users to
 +  determine the occurrence of interrupt floods of the given type.


> +The above IRQ vectors are displayed only when relevant.  For example,
 available?
> +the threshold vector does not exist on x86_64 platforms.  Others are
> +suppressed when the system is a uniprocessor.  As of this writing, only
> +i386 and x86_64 platforms support the new IRQ vector displays.
> +
> +Of some interest is the introduction of the /proc/irq directory to 2.4.
>  It could be used to set IRQ to CPU affinity, this means that you can "hook" an
>  IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
>  irq subdir is one subdir for each IRQ, and one file; prof_cpu_mask

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Steven Rostedt

[took off Ingo, because he has my ISP blacklisted, and I'm tired of
getting those return mail messages. He can read LKML or you can re-CC
him. Sad since this is a topic he should be reading. ]

--
On Fri, 21 Sep 2007, Paul E. McKenney wrote:

> On Fri, Sep 21, 2007 at 09:15:03PM -0400, Steven Rostedt wrote:
> > On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > > On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote:
> > > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:
>
> [ . . . ]
>
> > > > > + /*
> > > > > +  * Take the next transition(s) through the RCU grace-period
> > > > > +  * flip-counter state machine.
> > > > > +  */
> > > > > +
> > > > > + switch (rcu_try_flip_state) {
> > > > > + case rcu_try_flip_idle_state:
> > > > > + if (rcu_try_flip_idle())
> > > > > + rcu_try_flip_state = rcu_try_flip_waitack_state;
> > > >
> > > > Just trying to understand all this. Here at flip_idle, only a CPU with
> > > > no pending RCU calls will flip it. Then all the cpus flags will be set
> > > > to rcu_flipped, and the ctrl.completed counter is incremented.
> > >
> > > s/no pending RCU calls/at least one pending RCU call/, but otherwise
> > > spot on.
> > >
> > > So if the RCU grace-period machinery is idle, the first CPU to take
> > > a scheduling-clock interrupt after having posted an RCU callback will
> > > get things going.
> >
> > I said 'no' because of this:
> >
> > +rcu_try_flip_idle(void)
> > +{
> > +   int cpu;
> > +
> > +   RCU_TRACE_ME(rcupreempt_trace_try_flip_i1);
> > +   if (!rcu_pending(smp_processor_id())) {
> > +   RCU_TRACE_ME(rcupreempt_trace_try_flip_ie1);
> > +   return 0;
> > +   }
> >
> > But now I'm a bit more confused. :-/
> >
> > Looking at the caller in kernel/timer.c I see
> >
> > if (rcu_pending(cpu))
> > rcu_check_callbacks(cpu, user_tick);
> >
> > And rcu_check_callbacks is the caller of rcu_try_flip. The confusion is
> > that we call this when we have a pending rcu, but if we have a pending
> > rcu, we won't flip the counter ??
>
> We don't enter unless there is something for RCU to do (might be a
> pending callback, for example, but might also be needing to acknowledge
> a counter flip).  If, by the time we get to rcu_try_flip_idle(), there
> is no longer anything to do (!rcu_pending()), we bail.
>
> So a given CPU kicks the state machine out of idle only if it -still-
> has something to do once it gets to rcu_try_flip_idle(), right?
>

Now I can slap my forehead!  Duh, I wasn't seeing that ! in front of the
rcu_pending condition in the rcu_try_flip_idle.  We only flip if we do
indeed have something pending. I need some sleep. I also need to
re-evaluate some of my analysis of that code. But it doesn't change my
opinion of the stages.
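
For anyone else tripping over that '!', here is a stripped-down model of
the idle step as the thread describes it. It is a sketch only: the names
(struct rcu_ctrl_sketch, try_flip_step(), FLIP_IDLE, FLIP_WAITACK) are
hypothetical, and the per-CPU flag handling and the remaining states of
the real state machine are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified state names modeled on the quoted code. */
enum flip_state { FLIP_IDLE, FLIP_WAITACK };

struct rcu_ctrl_sketch {
	long completed;         /* counter that is incremented on a flip */
	enum flip_state state;  /* current state of the flip machinery */
};

/*
 * One pass through the idle state.  The caller only gets here because RCU
 * had something to do when the scheduling-clock interrupt was taken;
 * still_pending says whether that is still true once we reach the idle
 * handler.
 */
void try_flip_step(struct rcu_ctrl_sketch *ctrl, bool still_pending)
{
	assert(ctrl != NULL);

	if (ctrl->state != FLIP_IDLE)
		return;                 /* other states are not modeled here */
	if (!still_pending)
		return;                 /* nothing left to do: bail, stay idle */

	ctrl->completed++;              /* do the flip */
	ctrl->state = FLIP_WAITACK;     /* and wait for acknowledgements */
}
```

So the machine leaves idle only when the CPU that got this far still has
something pending, which is the reading confirmed above.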

> >
> > Are we sure that adding all these grace periods stages is better than just
> > biting the bullet and put in a memory barrier?
>
> Good question.  I believe so, because the extra stages don't require
> much additional processing, and because the ratio of rcu_read_lock()
> calls to the number of grace periods is extremely high.  But, if I
> can prove it is safe, I will certainly decrease GP_STAGES or otherwise
> optimize the state machine.

But until others besides yourself understand that state machine (doesn't
really need to be me) I would be worried about applying it without
barriers.  The barriers may add a bit of overhead, but it adds some
confidence in the code.  I'm arguing that we have barriers in there until
there's a fine understanding of why we fail with 3 stages and not 4.
Perhaps you don't have a box with enough cpus to fail at 4.

I don't know how the higher ups in the kernel command line feel, but I
think that memory barriers on critical sections are justified. But if you
can show a proof that adding extra stages is sufficient to deal with
CPUS moving memory writes around, then so be it. But I'm still not
convinced that these extra stages are really solving the bug instead of
just making it much less likely to happen.

Ingo praised this code since it had several years of testing in the RT
tree. But that version has barriers, so this new version without the
barriers has not had that "run it through the grinder" feeling to it.
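
To make "just put in a memory barrier" concrete, here is a user-space
sketch of a read-side primitive with explicit full fences. It is not
taken from the patch or from the -rt tree: C11 atomic_thread_fence()
stands in for the kernel's smp_mb(), and every name (reader_count,
shared_value, the sketch_* functions) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; not the data structures from the patch. */
static atomic_int reader_count;   /* readers currently in a critical section */
static atomic_int shared_value;   /* data the readers look at */

/* Enter the critical section; the fence orders the increment before the reads. */
void sketch_read_lock(void)
{
	atomic_fetch_add_explicit(&reader_count, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);  /* plays the role of smp_mb() */
}

/* Leave the critical section; the fence orders the reads before the decrement. */
void sketch_read_unlock(void)
{
	assert(atomic_load(&reader_count) > 0);     /* unlock without lock is a bug */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_sub_explicit(&reader_count, 1, memory_order_relaxed);
}

/* A reader: load the shared value inside the critical section. */
int sketch_read_value(void)
{
	int v;

	sketch_read_lock();
	v = atomic_load_explicit(&shared_value, memory_order_relaxed);
	sketch_read_unlock();
	return v;
}

/* An updater publishing a new value. */
void sketch_publish(int v)
{
	atomic_store_explicit(&shared_value, v, memory_order_release);
}

/* Number of readers currently inside a critical section. */
int sketch_readers(void)
{
	return atomic_load(&reader_count);
}
```

The cost being argued about is the pair of fences on every reader entry
and exit; the extra grace-period stages are an attempt to get the same
ordering guarantee without paying that on the read side.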

>
> [ . . . ]
>
> > > > OK, that's all I have on this patch (will take a bit of a break before
> > > > reviewing your other patches).  But I will say that RCU has grown quite
> > > > a bit, and is looking very good.
> > >
> > > Glad you like it, and thank you again for the careful and thorough review.
> >
> > I'm scared to do the preempt portion %^O
>
> Ummm...  This -was- the preempt portion.  ;-)

hehe, I do need sleep. I meant the boosting portion.

>
> > > > Basically, what I'm saying is "Great work, Paul!".  This is looking
> > > > good. Seems that we just need a little bit better explanation for those
> > > > that are not up at the IQ level of you.  I can write som

Re: [PATCH] [11/45] x86_64: Remove rogue default m in drivers/video/Kconfig

2007-09-21 Thread Len Brown

Acked-by: Len Brown <[EMAIL PROTECTED]>

Sorry, I thought we fixed this earlier.

thanks,
-Len

On Friday 21 September 2007 16:44, Andi Kleen wrote:
> 
> default m is near always wrong, like here. For some reason ACPI
> likes to reintroduce these and I like to immediately squash them again 
> before they pollute too many .configs.
> 
> Cc: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> 
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
> 
> ---
>  drivers/video/Kconfig |1 -
>  1 file changed, 1 deletion(-)
> 
> Index: linux/drivers/video/Kconfig
> ===
> --- linux.orig/drivers/video/Kconfig
> +++ linux/drivers/video/Kconfig
> @@ -14,7 +14,6 @@ config VGASTATE
>  
>  config VIDEO_OUTPUT_CONTROL
>   tristate "Lowlevel video output switch controls"
> - default m
>   help
> This framework adds support for low-level control of the video 
> output switch.

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Steven Rostedt


[ sneaks away from the family for a bit to answer emails ]

--
On Fri, 21 Sep 2007, Paul E. McKenney wrote:

> On Fri, Sep 21, 2007 at 09:19:22PM -0400, Steven Rostedt wrote:
> >
> > --
> > On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > > >
> > > > In any case, I will be looking at the scenarios more carefully.  If
> > > > it turns out that GP_STAGES can indeed be cranked down a bit, well,
> > > > that is an easy change!  I just fired off a POWER run with GP_STAGES
> > > > set to 3, will let you know how it goes.
> > >
> > > The first attempt blew up during boot badly enough that ABAT was unable
> > > to recover the machine (sorry, grahal!!!).  Just for grins, I am trying
> > > it again on a machine that ABAT has had a better record of reviving...
> >
> > This still frightens the hell out of me. Going through 15 states and
> > failing. Seems the CPU is holding off writes for a long long time. That
> > means we flipped the counter 4 times, and that still wasn't good enough?
>
> Might be that the other machine has its 2.6.22 version of .config messed
> up.  I will try booting it on a stock 2.6.22 kernel when it comes back
> to life -- not sure I ever did that before.  Besides, the other similar
> machine seems to have gone down for the count, but without me torturing
> it...
>
> Also, keep in mind that various stages can "record" a memory misordering,
> for example, by incrementing the wrong counter.
>
> > Maybe I'll boot up my powerbook to see if it has the same issues.
> >
> > Well, I'm still finishing up on moving into my new house, so I won't be
> > available this weekend.
>
> The other machine not only booted, but has survived several minutes of
> rcutorture thus far.  I am also trying POWER5 machine as well, as the
> one currently running is a POWER4, which is a bit less aggressive about
> memory reordering than is the POWER5.
>
> Even if they pass, I refuse to reduce GP_STAGES until proven safe.
> Trust me, you -don't- want to be unwittingly making use of a subtly
> busted RCU implementation!!!

I totally agree. This is the same reason I want to understand -why- it
fails with 3 stages. To make sure that adding a 4th stage really does fix
it, and doesn't just make the chances for the bug smaller.

I just have that funny feeling that we are band-aiding this for POWER with
extra stages and not really solving the bug.

I could be totally out in left field on this. But the more people have a
good understanding of what is happening (this includes why things fail)
the more people in general can trust this code.  Right now I'm thinking
you may be the only one that understands this code enough to trust it. I'm
just wanting you to help people like me to trust the code by understanding
and not just having faith in others.

If you ever decide to give up jogging, we need to make sure that there are
people here that can still fill those running shoes (-:


 -- Steve


Re: 2.6.23-rc6-mm1: Build failure on ppc64 drivers/ata/pata_scc.c

2007-09-21 Thread Satyam Sharma

Hi,


On Thu, 20 Sep 2007, Alan Cox wrote:
> 
> On Thu, 20 Sep 2007 14:13:15 +0100
> [EMAIL PROTECTED] (Mel Gorman) wrote:
> 
> > PPC64 building allmodconfig fails to compile drivers/ata/pata_scc.c . It
> > doesn't show up on other arches because this driver is specific to the
> > architecture.
> > 
> > drivers/ata/pata_scc.c: In function `scc_bmdma_status'
> 
> It's not been updated to match the libata core changes. Try something like
> this. Whoever is maintaining it should also remove the prereset cable handling
> code and use the proper cable detect method.

It appears you forgot to fix scc_std_softreset() and one of its callsites
in scc_bmdma_stop(). A complete patch is attempted below -- please review.


[PATCH -mm] pata_scc: Keep up with libata core API changes

Little fixlets, that the build started erroring / warning about:

drivers/ata/pata_scc.c: In function 'scc_bmdma_status':
drivers/ata/pata_scc.c:734: error: structure has no member named 'active_tag'
drivers/ata/pata_scc.c: In function 'scc_pata_prereset':
drivers/ata/pata_scc.c:866: warning: passing arg 1 of 'ata_std_prereset' from incompatible pointer type
drivers/ata/pata_scc.c: In function 'scc_error_handler':
drivers/ata/pata_scc.c:908: warning: passing arg 2 of 'ata_bmdma_drive_eh' from incompatible pointer type
drivers/ata/pata_scc.c:908: warning: passing arg 3 of 'ata_bmdma_drive_eh' from incompatible pointer type
drivers/ata/pata_scc.c:908: warning: passing arg 5 of 'ata_bmdma_drive_eh' from incompatible pointer type
make[2]: *** [drivers/ata/pata_scc.o] Error 1

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
Cc: Mel Gorman <[EMAIL PROTECTED]>

---

Andrew, this includes (supersedes) the previous two ones from Mel / Alan.

 drivers/ata/pata_scc.c |   21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff -ruNp a/drivers/ata/pata_scc.c b/drivers/ata/pata_scc.c
--- a/drivers/ata/pata_scc.c2007-09-22 06:26:37.0 +0530
+++ b/drivers/ata/pata_scc.c2007-09-22 08:04:42.0 +0530
@@ -594,16 +594,17 @@ static unsigned int scc_bus_softreset(st
  * Note: Original code is ata_std_softreset().
  */
 
-static int scc_std_softreset (struct ata_port *ap, unsigned int *classes,
-  unsigned long deadline)
+static int scc_std_softreset(struct ata_link *link, unsigned int *classes,
+ unsigned long deadline)
 {
+   struct ata_port *ap = link->ap;
unsigned int slave_possible = ap->flags & ATA_FLAG_SLAVE_POSS;
unsigned int devmask = 0, err_mask;
u8 err;
 
DPRINTK("ENTER\n");
 
-   if (ata_link_offline(&ap->link)) {
+   if (ata_link_offline(link)) {
classes[0] = ATA_DEV_NONE;
goto out;
}
@@ -692,7 +693,7 @@ static void scc_bmdma_stop (struct ata_q
printk(KERN_WARNING "%s: Internal Bus Error\n", 
DRV_NAME);
out_be32(bmid_base + SCC_DMA_INTST, INTSTS_BMSINT);
/* TBD: SW reset */
-   scc_std_softreset(ap, &classes, deadline);
+   scc_std_softreset(&ap->link, &classes, deadline);
continue;
}
 
@@ -731,7 +732,7 @@ static u8 scc_bmdma_status (struct ata_p
void __iomem *mmio = ap->ioaddr.bmdma_addr;
u8 host_stat = in_be32(mmio + SCC_DMA_STATUS);
u32 int_status = in_be32(mmio + SCC_DMA_INTST);
-   struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->active_tag);
+   struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->link.active_tag);
static int retry = 0;
 
/* return if IOS_SS is cleared */
@@ -860,10 +861,10 @@ static void scc_bmdma_freeze (struct ata
  * @deadline: deadline jiffies for the operation
  */
 
-static int scc_pata_prereset(struct ata_port *ap, unsigned long deadline)
+static int scc_pata_prereset(struct ata_link *link, unsigned long deadline)
 {
-   ap->cbl = ATA_CBL_PATA80;
-   return ata_std_prereset(ap, deadline);
+   link->ap->cbl = ATA_CBL_PATA80;
+   return ata_std_prereset(link, deadline);
 }
 
 /**
@@ -874,8 +875,10 @@ static int scc_pata_prereset(struct ata_
  * Note: Original code is ata_std_postreset().
  */
 
-static void scc_std_postreset (struct ata_port *ap, unsigned int *classes)
+static void scc_std_postreset(struct ata_link *link, unsigned int *classes)
 {
+   struct ata_port *ap = link->ap;
+
DPRINTK("ENTER\n");
 
/* is double-select really necessary? */
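
The shape of the conversion in the patch above (callbacks now receive the
link and recover the port from it) can be sketched with mock structures.
These are deliberately minimal stand-ins, not libata's real definitions:
only the members visible in the patch (link->ap, ap->link,
link.active_tag, ap->cbl) are modeled, and the *_sketch names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct ata_port_sketch;

/* Mock link: back-pointer to its port plus the tag that moved here. */
struct ata_link_sketch {
	struct ata_port_sketch *ap;
	unsigned int active_tag;
};

/* Mock port: embeds its host link, as in ap->link in the patch. */
struct ata_port_sketch {
	struct ata_link_sketch link;
	unsigned int cbl;
};

/* Wire up the embedded link so that link->ap points back at the port. */
void port_init_sketch(struct ata_port_sketch *ap)
{
	assert(ap != NULL);
	ap->link.ap = ap;
	ap->link.active_tag = 0;
	ap->cbl = 0;
}

/* New-style callback: takes the link and derives the port on entry. */
int prereset_sketch(struct ata_link_sketch *link, unsigned int cable)
{
	struct ata_port_sketch *ap;

	assert(link != NULL && link->ap != NULL);
	ap = link->ap;
	ap->cbl = cable;
	return 0;
}
```

A caller that still holds only the port passes &ap->link, which is what
the scc_bmdma_stop() hunk does.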

Killing printk calls for size (Re: [PATCH] [6/50] i386: clean up oops/bug reports)

2007-09-21 Thread Oleg Verych

* Sat, 22 Sep 2007 00:32:04 +0200 (CEST)
[]
>  arch/i386/kernel/traps.c |   16 
>  arch/i386/mm/fault.c |   13 +++--
>  2 files changed, 11 insertions(+), 18 deletions(-)

It seems the size can be reduced even more now:

[]
>   report_bug(regs->eip, regs);
>  
> - printk(KERN_EMERG "%s: %04lx [#%d]\n", str, err & 0x, ++die_counter);
> + printk(KERN_EMERG "%s: %04lx [#%d] ", str, err & 0x, ++die_counter);

  + printk(KERN_EMERG "%s: %04lx [#%d] %s", str, err &0x, ++die_counter,

>  #ifdef CONFIG_PREEMPT
> - printk(KERN_EMERG "PREEMPT ");
> - nl = 1;
> + printk("PREEMPT ");

  + "PREEMPT "\

>  #endif
>  #ifdef CONFIG_SMP
> - if (!nl)
> - printk(KERN_EMERG);
>   printk("SMP ");

"SMP "\

> - nl = 1;
>  #endif
>  #ifdef CONFIG_DEBUG_PAGEALLOC
> - if (!nl)
> - printk(KERN_EMERG);
>   printk("DEBUG_PAGEALLOC");

"DEBUG_PAGEALLOC"\

> - nl = 1;
>  #endif
> - if (nl)
> - printk("\n");
> + printk("\n");

  + "\n");

Just hand waving.

FWIW, with more flexible kconfig, ifdiffery can be removed also...
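
One way to read the hand waving above is to let the preprocessor build a
single format string through adjacent string-literal concatenation, so
that only one print call remains. The following is a user-space sketch of
that idea, not a tested kernel patch: snprintf() stands in for printk(),
the OOPS_* macros and format_oops_banner() are made-up names, CONFIG_SMP
is defined by hand for the demo, and the 0xffff mask is an assumption
because the mask is truncated in the quoted text.

```c
#include <assert.h>
#include <stdio.h>

#define CONFIG_SMP 1   /* pretend this option is enabled for the demo */

/* Each option contributes either its tag or an empty string literal. */
#ifdef CONFIG_PREEMPT
#define OOPS_PREEMPT "PREEMPT "
#else
#define OOPS_PREEMPT ""
#endif

#ifdef CONFIG_SMP
#define OOPS_SMP "SMP "
#else
#define OOPS_SMP ""
#endif

#ifdef CONFIG_DEBUG_PAGEALLOC
#define OOPS_PAGEALLOC "DEBUG_PAGEALLOC"
#else
#define OOPS_PAGEALLOC ""
#endif

/* Format the whole banner in one call; returns what snprintf() returns. */
int format_oops_banner(char *buf, size_t len, const char *str,
		       unsigned long err, int die_counter)
{
	assert(buf != NULL && len > 0);

	/* Adjacent literals are merged at compile time into one format. */
	return snprintf(buf, len,
			"%s: %04lx [#%d] " OOPS_PREEMPT OOPS_SMP OOPS_PAGEALLOC "\n",
			str, err & 0xffff /* assumed mask */, die_counter);
}
```

The #ifdef blocks move out of the function body and the nl bookkeeping
disappears entirely, at the price of a few extra macros.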


Re: linux 2.6.23-rc7 - 14 compile warnings

2007-09-21 Thread WANG Cong

On Fri, Sep 21, 2007 at 10:33:56AM +0100, Dave Haywood wrote:
>Contents:
>   ver_linux output
>   Summary of compile warnings
>   Full compile log
>   .config
>
>
>
>Linux s1 2.6.23-rc7-g335fb8fc #9 SMP Fri Sep 21 09:31:01 BST 2007 i686 Pentium III (Coppermine) GenuineIntel GNU/Linux
>
>Gnu C                  4.2.0
>Gnu make               3.81
>binutils               2.17
>util-linux             2.12r
>mount                  2.12r
>module-init-tools      3.2.2
>e2fsprogs              1.40.2
>PPP                    2.4.4
>Linux C Library        2.5
>Dynamic linker (ldd)   2.5
>Procps                 3.2.7
>Net-tools              1.60
>Kbd                    1.12
>Sh-utils               6.9
>udev                   114
>
>
>
>Summary:
>  CC  mm/slub.o
>mm/slub.c: In function 'kfree':
>mm/slub.c:2491: warning: passing argument 3 of 'slab_free' discards
>qualifiers from pointer target type
>
>  CC  fs/autofs4/symlink.o
>fs/autofs4/symlink.c: In function 'autofs4_follow_link':
>fs/autofs4/symlink.c:18: warning: passing argument 2 of 'nd_set_link'
>discards qualifiers from pointer target type

These two warnings are suspicious. Explicit casts are already there, so how
do they come about? Or is this a gcc bug?

{snip}


[-rc7 Patch] fs/isofs/namei.c: mark variables as uninitialized_var

2007-09-21 Thread WANG Cong


Fix may-be-used-uninitialized warnings.

Signed-off-by: WANG Cong <[EMAIL PROTECTED]>

---
 fs/isofs/namei.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc7/fs/isofs/namei.c
===
--- linux-2.6.23-rc7.orig/fs/isofs/namei.c
+++ linux-2.6.23-rc7/fs/isofs/namei.c
@@ -158,7 +158,7 @@ isofs_find_entry(struct inode *dir, stru
 struct dentry *isofs_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd)
 {
int found;
-   unsigned long block, offset;
+   unsigned long uninitialized_var(block), uninitialized_var(offset);
struct inode *inode;
struct page *page;
 

[GIT PATCH] ACPI patches for 2.6.23-rc7

2007-09-21 Thread Len Brown

Hi Linus,

Before 2.6.23, please pull from: 

git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release

This restores some dmesg to what folks had in 2.6.22,
and prevents a possible system hang on video switching.

This will update the files shown below.

thanks!

-Len

ps. individual patches are available on [EMAIL PROTECTED]
and a consolidated plain patch is available here:
ftp://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/2.6.23/acpi-release-20070126-2.6.23-rc7.diff.gz

 drivers/acpi/sleep/Makefile   |2 
 drivers/acpi/sleep/main.c |   57 --
 drivers/acpi/sleep/poweroff.c |   75 --
 drivers/acpi/video.c  |3 -
 4 files changed, 53 insertions(+), 84 deletions(-)

through these commits:

Alexey Starikovskiy (1):
  ACPI: suspend: consolidate handling of Sx states.

Frans Pop (1):
  ACPI: suspend: consolidate handling of Sx states addendum

Maik Broemme (1):
  ACPI: video: remove dmesg spam

Zhang Rui (1):
  ACPI: video: _DOS=0 by default to prevent hotkey hang

with this log:

commit e5c86b5d4a517d10db89456426590ecba1597f1f
Merge: 19adc6b... 5a50fe7...
Author: Len Brown <[EMAIL PROTECTED]>
Date:   Fri Sep 21 21:55:34 2007 -0400

Pull suspend.now into release branch

commit 19adc6ba6c6a23e07617fe791db40c1b0668d123
Merge: 335fb8f... 7f10cc4...
Author: Len Brown <[EMAIL PROTECTED]>
Date:   Fri Sep 21 21:55:29 2007 -0400

Pull now into release branch

commit 5a50fe709d527f31169263e36601dd83446d5744
Author: Frans Pop <[EMAIL PROTECTED]>
Date:   Thu Sep 20 22:27:44 2007 +0200

ACPI: suspend: consolidate handling of Sx states addendum

Make the S0 state be always reported as supported

Signed-off: Frans Pop <[EMAIL PROTECTED]>
Acked-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Signed-off-by: Len Brown <[EMAIL PROTECTED]>

commit f216cc3748a3a22c2b99390fddcdafa0583791a2
Author: Alexey Starikovskiy <[EMAIL PROTECTED]>
Date:   Thu Sep 20 21:32:35 2007 +0400

ACPI: suspend: consolidate handling of Sx states.

Recent changes to sleep initialization in ACPI dropped reporting of
supported Sx states above S3. Fix that and also move S5 init into same
file as other Sx.
The only functional change is adding printk() for S4 and S5 cases.

Signed-off-by: Alexey Starikovskiy <[EMAIL PROTECTED]>
Acked-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Signed-off-by: Len Brown <[EMAIL PROTECTED]>

commit 7f10cc4e838c2b2d7272031954c56c407569d497
Author: Maik Broemme <[EMAIL PROTECTED]>
Date:   Fri Sep 14 22:12:34 2007 +0200

ACPI: video: remove dmesg spam

I am actually heavily using the ACPI video extension for my Thinkpad X61
Tablet. I have bound the input events triggered by the brightness
up/down keys to a simple

echo  > /sys/class/backlight/acpi_video1/brightness

but every time the event is triggered and acpi_video_device_lcd_set_level()
is called I get a notification in my kernel log like:

set_level status: 0
set_level status: 0
set_level status: 0
set_level status: 0
...

Signed-off-by: Maik Broemme <[EMAIL PROTECTED]>
Signed-off-by: Len Brown <[EMAIL PROTECTED]>

commit a21101c46ca5b4320e31408853cdcbf7cb1ce4ed
Author: Zhang Rui <[EMAIL PROTECTED]>
Date:   Fri Sep 14 11:46:22 2007 +0800

ACPI: video: _DOS=0 by default to prevent hotkey hang

In the past, the Linux/ACPI video driver invoked _DOS
(Display Output Switch) with the parameter 1
to tell the BIOS to switch the video output display for us.

But this conflicts with Linux native graphics drivers,
and can cause all sorts of issues, including hanging the system.

http://bugzilla.kernel.org/show_bug.cgi?id=6001

Here we change the Linux default to evaluate _DOS=0,
which tells the BIOS to simply send us a hotkey event
and not touch the graphics hardware.

The acpi video driver sends the display switch hotkey
event up through the input layer, and X can interpret
that and use its native graphics driver to switch the display.

For the case where Linux has no native graphics driver running,
or the graphics driver doesn't know how to switch video and
the BIOS (safely) does, the previous behaviour can be restored with:

# echo 1 > /proc/acpi/video/*/DOS

Signed-off-by: Zhang Rui <[EMAIL PROTECTED]>
Signed-off-by: Len Brown <[EMAIL PROTECTED]>

Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'

2007-09-21 Thread Fengguang Wu

On Thu, Sep 20, 2007 at 12:31:39PM +0100, Hugh Dickins wrote:
> On Wed, 19 Sep 2007, Peter Zijlstra wrote:
> > On Wed, 19 Sep 2007 21:03:19 +0100 (BST) Hugh Dickins
> > <[EMAIL PROTECTED]> wrote:
> > 
> > > On Wed, 19 Sep 2007, Andy Whitcroft wrote:
> > > > Seems I have a case of a largish i386 NUMA (NUMA-Q) which has a mkfs
> > > > stuck in a 'D' wait:
> > > > 
> > > >  ===
> > > > mkfs.ext2 D c10220f4 0  6233   6222
> > > >  [] io_schedule_timeout+0x1e/0x28
> > > >  [] congestion_wait+0x62/0x7a
> > > >  [] get_dirty_limits+0x16a/0x172
> > > >  [] balance_dirty_pages+0x154/0x1be
> > > >  [] generic_perform_write+0x168/0x18a
> > > >  [] generic_file_buffered_write+0x73/0x107
> > > >  [] __generic_file_aio_write_nolock+0x47a/0x4a5
> > > >  [] generic_file_aio_write_nolock+0x48/0x9b
> > > >  [] do_sync_write+0xbf/0xfc
> > > >  [] vfs_write+0x8d/0x108
> > > >  [] sys_write+0x41/0x67
> > > >  [] syscall_call+0x7/0xb
> > > >  ===
> > > 
> > > [edited out some bogus lines from stale stack]
> > > 
> > > > This machine and others have run numerous test runs on this kernel and
> > > > this is the first time I've seen a hang like this.
> > > 
> > > I've been seeing something like that on 4-way PPC64: in my case I've
> > > shells hanging in D state trying to append to kernel build log on ext3
> > > (the builds themselves going on elsewhere, in tmpfs): one of the shells
> > > holding i_mutex and stuck doing congestion_waits from balance_dirty_pages.
> > > 
> > > > I wonder if this is the ultimate cause of the couple of mainline hangs
> > > > which were seen, but not diagnosed.
> > > 
> > > My *guess* is that this is peculiar to 2.6.23-rc6-mm1, and from Peter's
> > > mm-per-device-dirty-threshold.patch.  printks showed bdi_nr_reclaimable
> > > 0, bdi_nr_writeback 24, bdi_thresh 1 in balance_dirty_pages (though I've
> > > not done enough to check if those really correlate with the hangs),
> > > and I'm wondering if the bdi_stat_sum business is needed on the
> > > !nr_reclaimable path.
> > 
> > FWIW my tired brain seems to think the !nr_reclaimable path needs it
> > just the same. So this change seems to make sense for now :-)
> 
> Thanks.
> 
> > > So I'm running now with the patch below, good so far, but can't judge
> > > until tomorrow whether it has actually addressed the problem seen.
> 
> Last night's run went well: that patch does indeed seem to have fixed it.
> Looking at the timings (some variance but _very_ much less than the night
> before), there does appear to be some other occasional slight slowdown -
> but I've no reason to suspect your patch for it, nor to suppose it's
> something new: it may just be an artifact of my heavy swap thrashing.
> 
> 
> [PATCH mm] mm per-device dirty threshold fix
> 
> Fix occasional hang when a task couldn't get out of balance_dirty_pages:
> mm-per-device-dirty-threshold.patch needs to reevaluate bdi_nr_writeback
> across all cpus when bdi_thresh is low, even in the case when there was
> no bdi_nr_reclaimable.
> 
> Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>

Thank you Hugh. I ran into similar problems with many dd(large file)
operations.  This patch seems to fix it.
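
For readers following along, here is a minimal user-space sketch of why a cheap per-CPU counter read can keep a task stuck while an exact sum would let it out. It assumes the counters behave like batched per-CPU counters; every toy_ name, NCPUS and BATCH are hypothetical and are not taken from the patch. The numbers mirror Hugh's printk output (reclaimable 0, writeback 24, thresh 1).

```c
#include <assert.h>

#define NCPUS 4   /* hypothetical CPU count for the model */
#define BATCH 8   /* hypothetical per-CPU batch size */

struct toy_counter {
	long global;        /* folded-in total, cheap to read */
	long local[NCPUS];  /* per-CPU deltas not yet folded in */
};

/* Add delta on one CPU; fold into the global count once the local
 * delta reaches the batch size in either direction. */
static void toy_add(struct toy_counter *c, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NCPUS);
	c->local[cpu] += delta;
	if (c->local[cpu] >= BATCH || c->local[cpu] <= -BATCH) {
		c->global += c->local[cpu];
		c->local[cpu] = 0;
	}
}

/* Cheap read: ignores the per-CPU deltas, so it can be stale. */
static long toy_read(const struct toy_counter *c)
{
	return c->global;
}

/* Exact read: walks every CPU's delta. */
static long toy_sum(const struct toy_counter *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}

/* Same shape as the loop-exit test quoted in the debug patch below. */
static int toy_under_thresh(long reclaimable, long writeback, long thresh)
{
	return reclaimable + writeback <= thresh;
}
```

If 24 pages go under writeback on one CPU and the completions are then spread six per CPU, the cheap read still reports 24 while the exact sum is 0, so only the exact sum satisfies the exit test against a threshold of 1.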

But now my desktop was locked up again when writing a lot of small
files. The problem is repeatable with the command
 $ ketchup 2.6.23-rc6-mm1

I wrote up two debug patches:

---
 mm/page-writeback.c |9 +
 1 file changed, 9 insertions(+)

--- linux-2.6.22.orig/mm/page-writeback.c
+++ linux-2.6.22/mm/page-writeback.c
@@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a
bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
}
 
+   printk(KERN_DEBUG "balance_dirty_pages written %lu %lu congested %d limits %lu %lu %lu %lu %lu %ld\n",
+   pages_written,
+   write_chunk - wbc.nr_to_write,
+   bdi_write_congested(bdi),
+   background_thresh, dirty_thresh,
+   bdi_thresh, bdi_nr_reclaimable, bdi_nr_writeback,
+   bdi_thresh - bdi_nr_reclaimable - bdi_nr_writeback);
+
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
break;
if (pages_written >= write_chunk)

---
 mm/page-writeback.c |5 +
 1 file changed, 5 insertions(+)

--- linux-2.6.22.orig/mm/page-writeback.c
+++ linux-2.6.22/mm/page-writeback.c
@@ -373,6 +373,7 @@ static void balance_dirty_pages(struct a
long bdi_thresh;
unsigned long pages_written = 0;
unsigned long write_chunk = sync_writeback_pages();
+   int i = 0;
 
struct backing_dev_info *bdi = mapping->backing_dev_info;
 
@@ -434,6 +435,10 @@ static void balance_dirty_pages(struct a
bdi_thresh, bdi_nr_reclaimable, 
bdi_nr_writeback,
bdi_thresh - bdi

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 09:15:03PM -0400, Steven Rostedt wrote:
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote:
> > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > > +   /*
> > > > +* Take the next transition(s) through the RCU grace-period
> > > > +* flip-counter state machine.
> > > > +*/
> > > > +
> > > > +   switch (rcu_try_flip_state) {
> > > > +   case rcu_try_flip_idle_state:
> > > > +   if (rcu_try_flip_idle())
> > > > +   rcu_try_flip_state = rcu_try_flip_waitack_state;
> > >
> > > Just trying to understand all this. Here at flip_idle, only a CPU with
> > > no pending RCU calls will flip it. Then all the cpus flags will be set
> > > to rcu_flipped, and the ctrl.completed counter is incremented.
> >
> > s/no pending RCU calls/at least one pending RCU call/, but otherwise
> > spot on.
> >
> > So if the RCU grace-period machinery is idle, the first CPU to take
> > a scheduling-clock interrupt after having posted an RCU callback will
> > get things going.
> 
> I said 'no' because of this:
> 
> +rcu_try_flip_idle(void)
> +{
> +   int cpu;
> +
> +   RCU_TRACE_ME(rcupreempt_trace_try_flip_i1);
> +   if (!rcu_pending(smp_processor_id())) {
> +   RCU_TRACE_ME(rcupreempt_trace_try_flip_ie1);
> +   return 0;
> +   }
> 
> But now I'm a bit more confused. :-/
> 
> Looking at the caller in kernel/timer.c I see
> 
>   if (rcu_pending(cpu))
>   rcu_check_callbacks(cpu, user_tick);
> 
> And rcu_check_callbacks is the caller of rcu_try_flip. The confusion is
> that we call this when we have a pending rcu, but if we have a pending
> rcu, we won't flip the counter ??

We don't enter unless there is something for RCU to do (might be a
pending callback, for example, but might also be needing to acknowledge
a counter flip).  If, by the time we get to rcu_try_flip_idle(), there
is no longer anything to do (!rcu_pending()), we bail.

So a given CPU kicks the state machine out of idle only if it -still-
has something to do once it gets to rcu_try_flip_idle(), right?
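
The double check described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the toy_ names are hypothetical, only the idle-to-waitack transition is modelled, and the real rcu_pending() test is reduced to a single flag.

```c
#include <assert.h>

enum toy_state { TOY_IDLE, TOY_WAITACK };

struct toy_cpu {
	int pending;	/* nonzero: this CPU has something for RCU to do */
};

static enum toy_state toy_flip_state = TOY_IDLE;
static long toy_completed;	/* stands in for the completed counter */

/* Like the quoted rcu_try_flip_idle(): bail if nothing is pending any more. */
static int toy_try_flip_idle(const struct toy_cpu *cpu)
{
	assert(cpu);
	if (!cpu->pending)
		return 0;
	toy_completed++;
	return 1;
}

/* One step of the (heavily truncated) state machine. */
static void toy_try_flip(const struct toy_cpu *cpu)
{
	switch (toy_flip_state) {
	case TOY_IDLE:
		if (toy_try_flip_idle(cpu))
			toy_flip_state = TOY_WAITACK;
		break;
	case TOY_WAITACK:
		break;	/* later stages are not modelled here */
	}
}

/* Like the quoted caller in kernel/timer.c: enter only if something is pending. */
static void toy_tick(const struct toy_cpu *cpu)
{
	if (cpu->pending)
		toy_try_flip(cpu);
}
```

Calling toy_try_flip() directly with pending cleared models the case where the work disappeared between the two checks: the state stays idle and the counter is untouched.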

[ . . . ]

> > > Is there a chance that overflow of a counter (although probably very
> > > very unlikely) would cause any problems?
> >
> > The only way it could cause a problem would be if there was ever
> > more than 4,294,967,296 outstanding rcu_read_lock() calls.  I believe
> > that lockdep screams if it sees more than 30 nested locks within a
> > single task, so for systems that support no more than 100M tasks, we
> > should be OK.  It might sometime be necessary to make this be a long
> > rather than an int.  Should we just do that now and be done with it?
> 
> Sure, why not. More and more and more overkill!!!
> 
> (rostedt hears in his head the Monty Python "Spam" song).

;-)  OK!
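
The back-of-envelope bound quoted above (30 nested locks per task, 100M tasks, a 32-bit counter) can be checked mechanically. A minimal sketch; the function names are hypothetical and the figures are the ones from the quoted text, not measured limits.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case number of outstanding rcu_read_lock() calls in this model. */
static uint64_t toy_max_outstanding(uint64_t tasks, uint64_t max_nesting)
{
	/* Guard the multiplication itself against 64-bit overflow. */
	assert(max_nesting == 0 || tasks <= UINT64_MAX / max_nesting);
	return tasks * max_nesting;
}

/* Nonzero if a 32-bit counter cannot wrap under that bound. */
static int toy_fits_32bit(uint64_t outstanding)
{
	return outstanding < (UINT64_C(1) << 32);
}
```

With 100,000,000 tasks and 30 levels of nesting the bound is 3,000,000,000, which is below 4,294,967,296; at 150,000,000 tasks it would no longer fit, which is the argument for widening the counter to a long.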

> > > Also, all the CPUs have their "check_mb" set.
> > >
> > > > +   rcu_try_flip_state = rcu_try_flip_waitmb_state;
> > > > +   break;
> > > > +   case rcu_try_flip_waitmb_state:
> > > > +   if (rcu_try_flip_waitmb())
> > >
> > > I have to admit that this seems a bit of an overkill, but I guess you
> > > know what you are doing.  After going through three states, we still
> > > need to do a memory barrier on each CPU?
> >
> > Yep.  Because there are no memory barriers in rcu_read_unlock(), the
> > CPU is free to reorder the contents of the RCU read-side critical section
> > to follow the counter decrement.  This means that this CPU would still
> > be referencing RCU-protected data after it had told the world that it
> > was no longer doing so.  Forcing a memory barrier on each CPU guarantees
> > that if we see the memory-barrier acknowledge, we also see any prior
> > RCU read-side critical section.
> 
> And this seem reasonable to me that this would be enough to satisfy a
> grace period. But the CPU moving around the rcu_read_(un)lock's around.
> 
> Are we sure that adding all these grace periods stages is better than just
> biting the bullet and put in a memory barrier?

Good question.  I believe so, because the extra stages don't require
much additional processing, and because the ratio of rcu_read_lock()
calls to the number of grace periods is extremely high.  But, if I
can prove it is safe, I will certainly decrease GP_STAGES or otherwise
optimize the state machine.

[ . . . ]

> > > OK, that's all I have on this patch (will take a bit of a break before
> > > reviewing your other patches).  But I will say that RCU has grown quite
> > > a bit, and is looking very good.
> >
> > Glad you like it, and thank you again for the careful and thorough review.
> 
> I'm scared to do the preempt portion %^O

Ummm...  This -was- the preempt portion.  ;-)

> > > Basically, what I'm saying is "Great work, Paul!".  This is looking
> > > good. Seems that we just need a little bit better explanation for those
> > >

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Michael Chan

On Fri, 2007-09-21 at 10:49 -0700, David Miller wrote:
> From: Denys Vlasenko <[EMAIL PROTECTED]>
> Date: Fri, 21 Sep 2007 18:03:55 +0100
> 
> > Do patches look ok to you?
> 
> I'm travelling so I haven't looked closely yet :-)
> 
> Michael can take a look and I'll try to do so as well
> tonight.
> 

I've already reviewed the earlier versions of the patch and have made
some suggestions.  This latest one looks ok to me and tested ok.

I'll follow up later with another patch to remove all the zeros in other
firmware sections, and to remove the gzip headers completely.

Acked-by: Michael Chan <[EMAIL PROTECTED]>


Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 09:19:22PM -0400, Steven Rostedt wrote:
> 
> --
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > >
> > > In any case, I will be looking at the scenarios more carefully.  If
> > > it turns out that GP_STAGES can indeed be cranked down a bit, well,
> > > that is an easy change!  I just fired off a POWER run with GP_STAGES
> > > set to 3, will let you know how it goes.
> >
> > The first attempt blew up during boot badly enough that ABAT was unable
> > to recover the machine (sorry, grahal!!!).  Just for grins, I am trying
> > it again on a machine that ABAT has had a better record of reviving...
> 
> This still frightens the hell out of me. Going through 15 states and
> failing. Seems the CPU is holding off writes for a long long time. That
> means we flipped the counter 4 times, and that still wasn't good enough?

Might be that the other machine has its 2.6.22 version of .config messed
up.  I will try booting it on a stock 2.6.22 kernel when it comes back
to life -- not sure I ever did that before.  Besides, the other similar
machine seems to have gone down for the count, but without me torturing
it...

Also, keep in mind that various stages can "record" a memory misordering,
for example, by incrementing the wrong counter.

> Maybe I'll boot up my powerbook to see if it has the same issues.
> 
> Well, I'm still finishing up on moving into my new house, so I won't be
> available this weekend.

The other machine not only booted, but has survived several minutes of
rcutorture thus far.  I am also trying a POWER5 machine, as the
one currently running is a POWER4, which is a bit less aggressive about
memory reordering than is the POWER5.

Even if they pass, I refuse to reduce GP_STAGES until proven safe.
Trust me, you -don't- want to be unwittingly making use of a subtly
busted RCU implementation!!!

Thanx, Paul

Re: Message codes (Re: [Announce] Linux-tiny project revival)

2007-09-21 Thread Oleg Verych

On Fri, Sep 21, 2007 at 04:15:39PM -0500, Rob Landley wrote:
[]
> > >Not all, but critical info, that must exist in human-readable form of
> > >course.
> >
> > I disagree.  For a production product you want minimal information
> > to reduce the communication bandwidth required between the remote
> > customer and the support organization.
> >
> > In fact there is a good argument that you don't want the remote customer
> > to know enough to start guessing.
> 
> Don't use Linux then.  Open source is a horrible fit for the way you think.
> 
> I'm sympathetic to "shrink the binary size" arguments.  I'm not really 
> sympathetic to "keep the customer in the dark intentionally" arguments, whether 
> the justification is "because they're stupid", "to increase dependency on our 
> support staff", or any other reason.

{1} 

> > >Seriously. When in the Windows there are only messages like:
> > >
> > >"Error (Code:0x2012)".
> >
> > Now it's been ~8 years since I did any serious windows work, but if I
> > recall correctly ALL THE FRICKING TIME!!! When was the last time you've
> > seen a bug check on windows?  This is about all you get.
> 
> I believe he was holding it up as a bad example, and definitely not something 
> we want to emulate.

I tried to show that keeping users in a complete information vacuum is a
bad thing. Even without sources, _configuration_ is another area of
malfunctions and bugs, usually addressed by reinstalling.

That may be a bad example, because the talk here is about developers and
testers, who are not just ordinary users. And by applying Torvalds's Law,
all users are such to some degree. That's why {1} in your reply, Rob,
makes perfect sense.

If Mark has had a bad experience with lusers only, then I can just say: what
a pity! AFAIK nobody can read somebody's plain-bin OOP output.

Anyway, anything must be opted by config options, even schedulers. But
maintenance and flame wars rule otherwise :).

What I can propose is a form of binary-only "printk", where all info
(diagnostic, error, bug and statistics messages, in a non-debugging
environment, of course) is just fed right to the output buffer (see,
pa, no kmallocs). The info itself must have structured content that makes it
easy to extract and locate a human-readable representation of both message
and data.

This doesn't address loglevels, though.

Implementation seems as easy as feeding output to `od` to get an
unambiguous form of various troublesome bytes, like "0x00" and "0x0A".

The structuring, i.e. who is printing (arch code, fs, driver, whatever),
must be agreed:

*  Profiles[0]: originator's ID of a message is a byte (or word, or double word)
   0x01 - arch, 0x02 - fs, 0x03 - net, 0x04 - hw drivers, etc.

*  Data itself can be sent in form of [0]

[0] Banana -- extendable protocol for sending and receiving s-expressions

http://twistedmatrix.com/projects/core/documentation/specifications/banana.html

and having shell script with functions, that have names that correspond
to actual structured content:
_*_
[EMAIL PROTECTED]:/tmp$ sh banana.sh < banana.c >bb
[EMAIL PROTECTED]:/tmp$ sh -c '. ./bb ; _07080'
start
[EMAIL PROTECTED]:/tmp$ sh -c '. ./bb ; _07081'
ti_startup - product , num configurations 0, configuration value 0
[EMAIL PROTECTED]:/tmp$ sh -c '. ./bb ; _07082'
not reached
[EMAIL PROTECTED]:/tmp$
[EMAIL PROTECTED]:/tmp$ sh -c '. ./bb ; _07081 777 7 8'
ti_startup - product 0x0309, num configurations 7, configuration value 8
[EMAIL PROTECTED]:/tmp$

_(banana.c and banana.sh can be found in the ftp /upload on my server)_
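
For comparison, here is a C-side view of the same idea, as a minimal sketch only. The record layout (one originator byte, a 16-bit message ID, three 16-bit arguments, little-endian) and all toy_ names are assumptions made for illustration; they are not the Banana wire format and not anything from banana.c. The originator IDs are the ones listed in the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Originator IDs as listed in the proposal. */
enum { TOY_ARCH = 0x01, TOY_FS = 0x02, TOY_NET = 0x03, TOY_HWDRV = 0x04 };

/* Hypothetical message ID for the ti_startup debug line. */
enum { TOY_MSG_TI_STARTUP = 1 };

struct toy_msg {
	uint8_t  originator;
	uint16_t id;
	uint16_t arg[3];
};

#define TOY_MSG_BYTES 9	/* 1 + 2 + 3 * 2 bytes on the wire */

/* Producer side: no format string, just bytes into the output buffer. */
static size_t toy_encode(const struct toy_msg *m, uint8_t *buf, size_t len)
{
	assert(m && buf);
	if (len < TOY_MSG_BYTES)
		return 0;
	buf[0] = m->originator;
	buf[1] = (uint8_t)(m->id & 0xff);
	buf[2] = (uint8_t)(m->id >> 8);
	for (int i = 0; i < 3; i++) {
		buf[3 + 2 * i] = (uint8_t)(m->arg[i] & 0xff);
		buf[4 + 2 * i] = (uint8_t)(m->arg[i] >> 8);
	}
	return TOY_MSG_BYTES;
}

/* Consumer side: rebuild the record from the raw bytes. */
static int toy_decode(const uint8_t *buf, size_t len, struct toy_msg *m)
{
	assert(buf && m);
	if (len < TOY_MSG_BYTES)
		return -1;
	memset(m, 0, sizeof(*m));
	m->originator = buf[0];
	m->id = (uint16_t)(buf[1] | (buf[2] << 8));
	for (int i = 0; i < 3; i++)
		m->arg[i] = (uint16_t)(buf[3 + 2 * i] | (buf[4 + 2 * i] << 8));
	return 0;
}

/* Consumer side: the human-readable text lives here, outside the producer. */
static int toy_render(const struct toy_msg *m, char *out, size_t outlen)
{
	switch (m->id) {
	case TOY_MSG_TI_STARTUP:
		return snprintf(out, outlen,
				"ti_startup - product %#.4x, num configurations %d, "
				"configuration value %d",
				(unsigned)m->arg[0], (int)m->arg[1], (int)m->arg[2]);
	default:
		return -1;	/* unknown message ID */
	}
}
```

Encoding the arguments 777, 7 and 8 and rendering them on the consumer side gives the same text as the `_07081 777 7 8` shell call above, while the producer only ever writes nine bytes.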

From file linux/drivers/usb/serial/ti_usb*c with

[...]
dbg("%s - product 0x%4X, num configurations %d, configuration value %d",
__FUNCTION__, le16_to_cpu(dev->descriptor.idProduct),
dev->descriptor.bNumConfigurations,
dev->actconfig->desc.bConfigurationValue);
[...]


Let's take one particular function (transformed a little bit):
_*_

#include <stdio.h>

#define dbg printf
#define ti_startup(foo) main (int argc, char **argv)
#define dev_descriptor_idProduct                3
#define dev_descriptor_bNumConfigurations       4
#define dev_actconfig_desc_bConfigurationValue  5

/* declaration */
int ti_startup(void);

/* implementation */
int ti_startup(void)
{
dbg("start\n");

return dbg("%s - product %#.4x, num configurations %d, "
"configuration value %d\n",
__FUNCTION__, dev_descriptor_idProduct,
dev_descriptor_bNumConfigurations,
dev_actconfig_desc_bConfigurationValue);

/* bla bla */
dbg("not reached\n");
}
_*_

* Process this file with this script: *

_*_

# just as an example
USB_SERIAL_ID=07
TI_USB_ID=08
__FILE__="ti_usb_3410_5052.c" # possible
i=0

sed -n '
# finding function body
/^[[:alpha:]]/{
# found, print it for __FUNCTION__ keyword
s_[^ ]* *\([^ (]*\).*[^;]$_\1_p;
t_func ; b ;
# walking inside of a function
:_func;
# load ne

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Steven Rostedt

--
On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> >
> > In any case, I will be looking at the scenarios more carefully.  If
> > it turns out that GP_STAGES can indeed be cranked down a bit, well,
> > that is an easy change!  I just fired off a POWER run with GP_STAGES
> > set to 3, will let you know how it goes.
>
> The first attempt blew up during boot badly enough that ABAT was unable
> to recover the machine (sorry, grahal!!!).  Just for grins, I am trying
> it again on a machine that ABAT has had a better record of reviving...

This still frightens the hell out of me. Going through 15 states and
failing. Seems the CPU is holding off writes for a long long time. That
means we flipped the counter 4 times, and that still wasn't good enough?

Maybe I'll boot up my powerbook to see if it has the same issues.

Well, I'm still finishing up on moving into my new house, so I won't be
available this weekend.

Thanks,

-- Steve


Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Steven Rostedt


On Fri, 21 Sep 2007, Paul E. McKenney wrote:

> On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote:
> > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:
>
> Covering the pieces that weren't in Peter's reply.  ;-)
>
> And thank you -very- much for the careful and thorough review!!!
>
> > >  #endif /* __KERNEL__ */
> > >  #endif /* __LINUX_RCUCLASSIC_H */
> > > diff -urpNa -X dontdiff linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h
> > > --- linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h   2007-07-19 14:02:36.0 -0700
> > > +++ linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h    2007-08-22 15:21:06.0 -0700
> > > @@ -52,7 +52,11 @@ struct rcu_head {
> > >   void (*func)(struct rcu_head *head);
> > >  };
> > >
> > > +#ifdef CONFIG_CLASSIC_RCU
> > >  #include <linux/rcuclassic.h>
> > > +#else /* #ifdef CONFIG_CLASSIC_RCU */
> > > +#include <linux/rcupreempt.h>
> > > +#endif /* #else #ifdef CONFIG_CLASSIC_RCU */
> >
> > A bit extreme on the comments here.
>
> My fingers do this without any help from the rest of me, but I suppose
> it is a bit of overkill in this case.

Heck, why stop the overkill here, the whole patch is overkill ;-)

> > > +
> > > +#define GP_STAGES 4
> >
> > I take it that GP stand for "grace period". Might want to state that
> > here. /* Grace period stages */  When I was looking at this code at 1am,
> > I kept asking myself "what's this GP?" (General Protection??). But
> > that's what happens when looking at code like this after midnight ;-)
>
> Good point, will add a comment.  You did get it right, "grace period".

Thanks, so many places in the kernel have acronyms that are just supposed
to be "obvious". I hate them, because they make me feel so stupid when I
don't know what they are. After I find out, I usually slap my forehead and
say "duh!". My mind is set on reading code, not deciphering TLAs.


> >
> > Can you have a pointer somewhere that explains these states. And not a
> > "it's in this paper or directory". Either have a short description here,
> > or specify where exactly to find the information (perhaps a
> > Documentation/RCU/preemptible_states.txt?).
> >
> > Trying to understand these states has caused me the most agony in
> > reviewing these patches.
>
> Good point, perhaps a comment block above the enum giving a short
> description of the purpose of each state.  Maybe more detail in
> Documentation/RCU as well, as you suggest above.

That would be great.

> > > +
> > > +/*
> > > + * Return the number of RCU batches processed thus far.  Useful for debug
> > > + * and statistics.  The _bh variant is identical to straight RCU.
> > > + */
> >
> > If they are identical, then why the separation?
>
> I apologize for the repetition in this email.
>
> I apologize for the repetition in this email.
>
> I apologize for the repetition in this email.
>
> Yep, will fix with either #define or static inline, as you suggested
> in a later email.

you're starting to sound like me ;-)

> > > + struct task_struct *me = current;
> >
> > Nitpick, but other places in the kernel usually use "t" or "p" as a
> > variable to assign current to.  It's just that "me" throws me off a
> > little while reviewing this.  But this is just a nitpick, so do as you
> > will.
>
> Fair enough, as discussed earlier.

Who's on first, What's on second, and I-dont-know is on third.

> > > + unsigned long oldirq;
> >
> > Nitpick, "flags" is usually used for saving irq state.
>
> A later patch in the series fixes these -- I believe I got all of them.
> (The priority-boost patch, IIRC.)

OK

>
> > > +
> > > + /*
> > > +  * Disable local interrupts to prevent the grace-period
> > > +  * detection state machine from seeing us half-done.
> > > +  * NMIs can still occur, of course, and might themselves
> > > +  * contain rcu_read_lock().
> > > +  */
> > > +
> > > + local_irq_save(oldirq);
> >
> > Isn't the GP detection done via a tasklet/softirq. So wouldn't a
> > local_bh_disable be sufficient here? You already cover NMIs, which would
> > also handle normal interrupts.
>
> We beat this into the ground in other email.

Nothing like kicking a dead horse on LKML ;-)

> > > +
> > > + /*
> > > +  * It is now safe to decrement this task's nesting count.
> > > +  * NMIs that occur after this statement will route their
> > > +  * rcu_read_lock() calls through this "else" clause, and
> > > +  * will thus start incrementing the per-CPU coutner on
> >
> > s/coutner/counter/
>
> wlli fxi!!!

snousd oogd

> > > +
> > > +/*
> > > + * Attempt a single flip of the counters.  Remember, a single flip does
> > > + * -not- constitute a grace period.  Instead, the interval between
> > > + * at least three consecutive flips is a grace period.
> > > + *
> > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation
> >
> > Oh, come now! It

Re: MTRR initialization

2007-09-21 Thread Howard Chu


Siddha, Suresh B wrote:
> On Fri, Sep 14, 2007 at 09:33:30AM -0700, Howard Chu wrote:
> > So now I have this, which is pretty much what I wanted:
> >
> > reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> > reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> > reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
> > reg03: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
> > reg04: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
> > reg05: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
>
> BTW, having overlapping WC, UC regions make the end result UC. So in this
> case, you may not be getting the desired performance.

Thanks, I noticed that later. I simply deleted the UC mapping since it was no
longer needed.

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP     http://www.openldap.org/project/

RE: [PATCH 2.6.23-rc7 0/3] async_tx and md-accel fixes for 2.6.23

2007-09-21 Thread Williams, Dan J

> From: Neil Brown [mailto:[EMAIL PROTECTED]
> On Friday September 21, [EMAIL PROTECTED] wrote:
> > On Thu, 20 Sep 2007 18:27:35 -0700
> > Dan Williams <[EMAIL PROTECTED]> wrote:
> >
> > > Fix a couple bugs and provide documentation for the async_tx api.
> > >
> > > Neil, please 'ack' patch #3.
> > >
> > > git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus
> >
> > Well it looks like Neil is on vacation or is hiding from us or something.
> 
> Neil is just not coping well with jet-lag
> 
> Patch #3 looks good and necessary
>   Acked-By: NeilBrown <[EMAIL PROTECTED]>
> 
> I know that should probably be a "reviewed-by"  I was a bit
I went ahead and added reviewed-by.

> surprised that the "handle_completed_read_requests" call was so early
> in handle_stripe5 - I don't think the code was originally that early.
It is slightly earlier than 2.6.22 (outside the '/* now count some
things */' loop) to make sure the R5_Wantfill flags from the last
request have been cleared before starting a new one:

/* maybe we can request a biofill operation
 *
 * new wantfill requests are only permitted while
 * STRIPE_OP_BIOFILL is clear
 */
if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread &&
!test_bit(STRIPE_OP_BIOFILL, &sh->ops.pending))
set_bit(R5_Wantfill, &dev->flags);

> But it is probably right.   Hopefully my brain will have cleared by
> Monday and I'll review it again then.
>

Ok, the tree is updated with 'Reviewed-by' tags and the proposed
documentation updates from Randy and Shannon.

git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus

Dan Williams (3):
  async_tx: usage documentation and developer notes (v2)
  async_tx: fix dma_wait_for_async_tx
  raid5: fix ops_complete_biofill

> NeilBrown

--
Dan

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 04:03:43PM -0700, Paul E. McKenney wrote:
> On Fri, Sep 21, 2007 at 11:20:48AM -0400, Steven Rostedt wrote:
> > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:

[ . . . ]

> > Paul,
> > 
> > Looking further into this, I still think this is a bit of overkill. We
> > go through 20 states from call_rcu to list->func().
> > 
> > On call_rcu we put our stuff on the next list. Before we move stuff from
> > next to wait, we need to go through 4 states. So we have
> > 
> > next -> 4 states -> wait[0] -> 4 states -> wait[1] -> 4 states ->
> > wait[2] -> 4 states -> wait[3] -> 4 states -> done.
> > 
> > That's 20 states that we go through from the time we add our function to
> > the list to the time it actually gets called. Do we really need the 4
> > wait lists?
> > 
> > Seems a bit overkill to me.
> > 
> > What am I missing?
> 
> "Nothing kills like overkill!!!"  ;-)
> 
> Seriously, I do expect to be able to squeeze this down over time, but
> feel the need to be a bit on the cowardly side at the moment.
> 
> In any case, I will be looking at the scenarios more carefully.  If
> it turns out that GP_STAGES can indeed be cranked down a bit, well,
> that is an easy change!  I just fired off a POWER run with GP_STAGES
> set to 3, will let you know how it goes.

The first attempt blew up during boot badly enough that ABAT was unable
to recover the machine (sorry, grahal!!!).  Just for grins, I am trying
it again on a machine that ABAT has had a better record of reviving...
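
For reference, Steven's count of 20 states quoted above can be reproduced with a toy model of the list pipeline. This is a sketch under stated assumptions: GP_STAGES is the value from the patch, the figure of four states per list advance is taken from Steven's description rather than from the code, and the toy_ names are hypothetical.

```c
#include <assert.h>

#define GP_STAGES 4		/* as in the quoted patch */
#define TOY_STATES_PER_ADVANCE 4	/* per Steven's description */

struct toy_lists {
	int next;		/* callbacks queued by call_rcu() */
	int wait[GP_STAGES];	/* callbacks ageing through the stages */
	int done;		/* callbacks ready to be invoked */
};

/* One full pass of the state machine: every list moves one hop along. */
static void toy_advance(struct toy_lists *l)
{
	assert(l);
	l->done += l->wait[GP_STAGES - 1];
	for (int i = GP_STAGES - 1; i > 0; i--)
		l->wait[i] = l->wait[i - 1];
	l->wait[0] = l->next;
	l->next = 0;
}

/* Count state transitions until one freshly queued callback can run. */
static int toy_states_until_done(void)
{
	struct toy_lists l = { .next = 1 };
	int states = 0;

	while (!l.done) {
		toy_advance(&l);
		states += TOY_STATES_PER_ADVANCE;
	}
	return states;
}
```

The callback needs five hops (next, wait[0] through wait[3], done), so the model reports 5 * 4 = 20 states, matching the diagram in the quoted mail.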

Thanx, Paul

Re: MTRR initialization

2007-09-21 Thread Siddha, Suresh B

On Fri, Sep 14, 2007 at 09:33:30AM -0700, Howard Chu wrote:
> Hi, was wondering if anyone else has been tripped up by this... I've got
> 4GB of RAM in my Asus A8V Deluxe and memory hole mapping enabled in the BIOS.
> By default, my system boots up with these MTRR settings:
> 
> reg00: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
> reg01: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
> reg02: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
> reg03: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
> 
> The X server and various other programs try to add a mapping for my video
> card's buffer, at 0xd0000000, size=256MB, type=write-combining, and this
> always fails with a type mismatch error (old type is write-back). Apparently
> it's conflicting with mapping register 0. I can't just disable the existing
> settings and re-add them; the system hangs soon after disabling reg01.
> 
> I guess the kernel must be getting the initial setup from the BIOS. I've
> hacked around this in mtrr/generic.c by explicitly changing the MTRR state
> in get_mtrr_state to split the first mapping into two; one at base 0 size
> 2048M and one at base 2048M size 1024M. So now I have this, which is pretty
> much what I wanted:
> 
> reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
> reg03: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
> reg04: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
> reg05: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1

BTW, having overlapping WC, UC regions make the end result UC. So in this
case, you may not be getting the desired performance.
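
The overlap rule can be illustrated with a small lookup over the six registers listed above. This is a sketch, not the kernel's MTRR code: it models only the rule that an uncachable range wins over anything it overlaps, it works in whole megabytes, and the toy_ names are hypothetical. Other mixed-type overlaps are not modelled; the first matching register is simply returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_mtype { TOY_UC, TOY_WC, TOY_WB, TOY_NONE };

struct toy_mtrr {
	uint64_t base_mb;
	uint64_t size_mb;
	enum toy_mtype type;
};

/* The modified layout from the quoted mail, in MB. */
static const struct toy_mtrr toy_regs[] = {
	{    0, 2048, TOY_WB },
	{ 2048, 1024, TOY_WB },
	{ 4096, 1024, TOY_WB },
	{ 3072, 1024, TOY_UC },
	{ 3072,  256, TOY_WC },
	{ 3328,  256, TOY_WC },
};

/* Effective type at addr_mb: any matching UC range overrides the rest. */
static enum toy_mtype toy_effective(const struct toy_mtrr *r, size_t n,
				    uint64_t addr_mb)
{
	enum toy_mtype result = TOY_NONE;

	assert(r);
	for (size_t i = 0; i < n; i++) {
		if (addr_mb < r[i].base_mb ||
		    addr_mb >= r[i].base_mb + r[i].size_mb)
			continue;
		if (r[i].type == TOY_UC)
			return TOY_UC;
		if (result == TOY_NONE)
			result = r[i].type;
	}
	return result;	/* TOY_NONE: no variable range covers the address */
}
```

With this table, an address inside the video buffer at 3328MB resolves to uncachable even though a write-combining register covers it; dropping the uncachable entry makes the same lookup return write-combining.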

> 
> So the question is - was there an easier/correct way to do this?
> 
> It might have been nice if the MTRR ioctls allowed the register number to
> be specified on the Set commands, though I'm not sure that would have helped
> in this case.

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote:
> On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:

Covering the pieces that weren't in Peter's reply.  ;-)

And thank you -very- much for the careful and thorough review!!!

> >  #endif /* __KERNEL__ */
> >  #endif /* __LINUX_RCUCLASSIC_H */
> > diff -urpNa -X dontdiff linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h 
> > linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h
> > --- linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h 2007-07-19 
> > 14:02:36.0 -0700
> > +++ linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h  2007-08-22 
> > 15:21:06.0 -0700
> > @@ -52,7 +52,11 @@ struct rcu_head {
> > void (*func)(struct rcu_head *head);
> >  };
> >  
> > +#ifdef CONFIG_CLASSIC_RCU
> >  #include 
> > +#else /* #ifdef CONFIG_CLASSIC_RCU */
> > +#include 
> > +#endif /* #else #ifdef CONFIG_CLASSIC_RCU */
> 
> A bit extreme on the comments here.

My fingers do this without any help from the rest of me, but I suppose
it is a bit of overkill in this case.

> >  #define RCU_HEAD_INIT  { .next = NULL, .func = NULL }
> >  #define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
> > @@ -218,10 +222,13 @@ extern void FASTCALL(call_rcu_bh(struct 
> >  /* Exported common interfaces */
> >  extern void synchronize_rcu(void);
> >  extern void rcu_barrier(void);
> > +extern long rcu_batches_completed(void);
> > +extern long rcu_batches_completed_bh(void);
> >  
> >  /* Internal to kernel */
> >  extern void rcu_init(void);
> >  extern void rcu_check_callbacks(int cpu, int user);
> > +extern int rcu_needs_cpu(int cpu);
> >  
> >  #endif /* __KERNEL__ */
> >  #endif /* __LINUX_RCUPDATE_H */
> > diff -urpNa -X dontdiff 
> > linux-2.6.22-b-fixbarriers/include/linux/rcupreempt.h 
> > linux-2.6.22-c-preemptrcu/include/linux/rcupreempt.h
> > --- linux-2.6.22-b-fixbarriers/include/linux/rcupreempt.h   1969-12-31 
> > 16:00:00.0 -0800
> > +++ linux-2.6.22-c-preemptrcu/include/linux/rcupreempt.h2007-08-22 
> > 15:21:06.0 -0700
> > @@ -0,0 +1,78 @@
> > +/*
> > + * Read-Copy Update mechanism for mutual exclusion (RT implementation)
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
> > USA.
> > + *
> > + * Copyright (C) IBM Corporation, 2006
> > + *
> > + * Author:  Paul McKenney <[EMAIL PROTECTED]>
> > + *
> > + * Based on the original work by Paul McKenney <[EMAIL PROTECTED]>
> > + * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
> > + * Papers:
> > + * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
> > + * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf 
> > (OLS2001)
> > + *
> > + * For detailed explanation of Read-Copy Update mechanism see -
> > + * Documentation/RCU
> > + *
> > + */
> > +
> > +#ifndef __LINUX_RCUPREEMPT_H
> > +#define __LINUX_RCUPREEMPT_H
> > +
> > +#ifdef __KERNEL__
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define rcu_qsctr_inc(cpu)
> > +#define rcu_bh_qsctr_inc(cpu)
> > +#define call_rcu_bh(head, rcu) call_rcu(head, rcu)
> > +
> > +extern void __rcu_read_lock(void);
> > +extern void __rcu_read_unlock(void);
> > +extern int rcu_pending(int cpu);
> > +extern int rcu_needs_cpu(int cpu);
> > +
> > +#define __rcu_read_lock_bh()   { rcu_read_lock(); local_bh_disable(); }
> > +#define __rcu_read_unlock_bh() { local_bh_enable(); rcu_read_unlock(); 
> > }
> > +
> > +#define __rcu_read_lock_nesting()  (current->rcu_read_lock_nesting)
> > +
> > +extern void __synchronize_sched(void);
> > +
> > +extern void __rcu_init(void);
> > +extern void rcu_check_callbacks(int cpu, int user);
> > +extern void rcu_restart_cpu(int cpu);
> > +
> > +#ifdef CONFIG_RCU_TRACE
> > +struct rcupreempt_trace;
> > +extern int *rcupreempt_flipctr(int cpu);
> > +extern long rcupreempt_data_completed(void);
> > +extern int rcupreempt_flip_flag(int cpu);
> > +extern int rcupreempt_mb_flag(int cpu);
> > +extern char *rcupreempt_try_flip_state_name(void);
> > +extern struct rcupreempt_trace *rcupreempt_trace_cpu(int cpu);
> > +#endif
> > +
> > +struct softirq_action;
> > +
> > +#endif /* __KERNEL__ */
> > +#endif /* __LINUX_RCUPREEMPT_H */
> > diff -urpNa -X don

Re: [PATCH 2.6.23-rc7 0/3] async_tx and md-accel fixes for 2.6.23

2007-09-21 Thread Neil Brown

On Friday September 21, [EMAIL PROTECTED] wrote:
> On Thu, 20 Sep 2007 18:27:35 -0700
> Dan Williams <[EMAIL PROTECTED]> wrote:
> 
> > Fix a couple bugs and provide documentation for the async_tx api.
> > 
> > Neil, please 'ack' patch #3.
> > 
> > git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus
> 
> Well it looks like Neil is on vacation or is hiding from us or something.

Neil is just not coping well with jet-lag

Patch #3 looks good and necessary
  Acked-By: NeilBrown <[EMAIL PROTECTED]>

I know that should probably be a "reviewed-by".  I was a bit
surprised that the "handle_completed_read_requests" call was so early
in handle_stripe5 - I don't think the code was originally that early.
But it is probably right.   Hopefully my brain will have cleared by
Monday and I'll review it again then.

NeilBrown

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread maximilian attems

On Fri, Sep 21, 2007 at 11:48:05PM +0100, Alan Cox wrote:
> > According to an earlier thread, dgrs was never really maintained, 
> > written for hardware that was never really distributed widely, and very 
> > likely hasn't had users in years... if ever.
> > 
> > If that picture is accurate (it's a story I was told), then I am 
> > definitely queueing up a deletion patch.
> 
> I think that's sensible. If someone whines it can be put back but I really
> don't think anyone will

nobody did yet, please yell if you need a rebased patch.

-- 
maks

[no subject]

2007-09-21 Thread Alexandr Andrijcuk





A new mobile phone for the price of a used one.

Dear Sirs,
We would like to offer you "refurbished" mobile phones.
This means that the mobile phone has:
a used but fully functional motherboard
a new original housing
new accessories (charger, battery, handsfree)
new original packaging

Externally the phone is completely new and comes with a 12-month warranty.

Sample prices for reference:

Model                 Price, USD
Motorola V3            93.96
Motorola V3x          160.08
Motorola K1           195.39
Nokia 6111            110.16
Nokia 6131            160.20
Nokia 6280            179.06
Nokia 7360            126.12
Nokia 7373            202.96
Samsung E530          120.84
Samsung E900          155.14
Sony Ericsson K800i   266.23
Sony Ericsson W300i   117.36
Sony Ericsson W800i   172.56
Sony Ericsson Z550i   135.12
Sony Ericsson W810i   214.94

The full price list contains about 180 models of mobile phones from Motorola,
Nokia, Panasonic, Samsung, and Sony Ericsson.
Each of the phones offered comes with the following package:
Phone + charger + 2 batteries + HandsFree + manual + packaging

I will send the full price list on request.

With best wishes
AVK Plus spol. s.r.o.
Nam Svobody 1626
27201 Kladno
Czech Republic








edac_mc: sleeping function called from invalid context

2007-09-21 Thread Chuck Ebbert

Kernel 2.6.22.6:

# echo "1" >/sys/devices/system/edac/pci/check_pci_parity
# dmesg | tail -14
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [] down_read+0x15/0x24
 [] pci_get_subsys+0x81/0x113
 [] schedule_timeout+0x85/0xad
 [] :edac_mc:edac_kernel_thread+0x9e/0x104
 [] :edac_mc:edac_kernel_thread+0x0/0x104
 [] kthread+0x47/0x73
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x73
 [] child_rip+0x0/0x12

Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Nigel Cunningham

Hi.

On Saturday 22 September 2007 09:19:18 Kyle Moffett wrote:
> I think that in order for this to work, there would need to be some  
> ABI whereby the resume-ing kernel can pass its entire ACPI state and  
> a bunch of other ACPI-related device details to the resume-ed kernel,  
> which I believe it does not do at the moment.  I believe that what  
> causes problems is the ACPI state data that the kernel stores is  
> *different* between identical sequential boots, especially when you  
> add/remove/replace batteries, AC, etc.

That's certainly possible. We already pass a very small amount of data between 
the boot and resuming kernels at the moment, and it's done quite simply - by 
putting the variables we want to 'transfer' in a nosave page/section. I could 
conceive of a scheme wherein this was extended for driver data. Since the 
memory needed would depend on the drivers loaded, it would probably require 
that the space be allocated when hibernating, and the locations of structures 
be stored in the image header and then drivers notified of the locations to 
use when preparing to resume, but it could work...

> Since we currently throw away most of that in-kernel ACPI interpreter  
> state data when we load the to-be-resumed image and replace it with  
> the state from the previous boot it looks to the ACPI code and  
> firmware like our system's hardware magically changed behind its  
> back.  The result is that the ACPI and firmware code is justifiably  
> confused (although probably it should be more idempotent to begin  
> with).  There's 2 potential solutions:
>1) Formalize and copy a *lot* of ACPI state from the resume-ing  
> kernel to the resume-ed kernel.
>2) Properly call the ACPI S4 methods in the proper order

... that said, I don't think the above should be necessary in most cases. I 
believe we're already calling the ACPI S4 methods in the proper order. If I 
understood correctly, Rafael put a lot of effort into learning what that was, 
and into ensuring it does get done.

> Neither one is particularly easy or particularly pleasant, especially  
> given all the vendor bugs in this general area.  Theoretically we  
> should be able to do both, since one will be more reliable than the  
> other on different systems depending on what kinds of firmware bugs  
> they have.

Regards,

Nigel

Re: [PATCH 07/25] r/o bind mounts: elevate write count for some ioctls

2007-09-21 Thread Andrew Morton

On Fri, 21 Sep 2007 16:39:40 -0700
Dave Hansen <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-09-21 at 16:03 -0700, Andrew Morton wrote:
> > Dave Hansen <[EMAIL PROTECTED]> wrote:
> > 
> > > Some ioctl()s can cause writes to the filesystem.  Take
> > > these, and make them use mnt_want/drop_write() instead.
> > > 
> > > We need to pass the filp one layer deeper in XFS, but
> > > somebody _just_ pulled it out in February because nobody
> > > was using it, so I don't feel guilty for adding it back.
> > 
> > Note that -mm's ext2-reservations.patch adds EXT2_IOC_SETRSVSZ,
> > and it doesn't do mnt_want_write().
> 
> That doesn't quite apply to mainline (at least after the patches I just
> sent).  I'll wait and send you one on top of the next -mm so that I can
> get a coherent view of what's going on if that's all right.
> 

Sure, that's OK.

But I only noticed it because I happened to have my nose in there fixing a
reject.


Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 07:23:09PM -0400, Steven Rostedt wrote:
> --
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> 
> > If you do a synchronize_rcu() it might well have to wait through the
> > following sequence of states:
> >
> > Stage 0: (might have to wait through part of this to get out of "next" 
> > queue)
> > rcu_try_flip_idle_state,/* "I" */
> > rcu_try_flip_waitack_state, /* "A" */
> > rcu_try_flip_waitzero_state,/* "Z" */
> > rcu_try_flip_waitmb_state   /* "M" */
> > Stage 1:
> > rcu_try_flip_idle_state,/* "I" */
> > rcu_try_flip_waitack_state, /* "A" */
> > rcu_try_flip_waitzero_state,/* "Z" */
> > rcu_try_flip_waitmb_state   /* "M" */
> > Stage 2:
> > rcu_try_flip_idle_state,/* "I" */
> > rcu_try_flip_waitack_state, /* "A" */
> > rcu_try_flip_waitzero_state,/* "Z" */
> > rcu_try_flip_waitmb_state   /* "M" */
> > Stage 3:
> > rcu_try_flip_idle_state,/* "I" */
> > rcu_try_flip_waitack_state, /* "A" */
> > rcu_try_flip_waitzero_state,/* "Z" */
> > rcu_try_flip_waitmb_state   /* "M" */
> > Stage 4:
> > rcu_try_flip_idle_state,/* "I" */
> > rcu_try_flip_waitack_state, /* "A" */
> > rcu_try_flip_waitzero_state,/* "Z" */
> > rcu_try_flip_waitmb_state   /* "M" */
> >
> > So yes, grace periods do indeed have some latency.
> 
> Yes they do. I'm now at the point that I'm just "trusting" you that you
> understand that each of these stages are needed. My IQ level only lets me
> understand next -> wait -> done, but not the extra 3 shifts in wait.
> 
> ;-)

In the spirit of full disclosure, I am not -absolutely- certain that
they are needed, only that they are sufficient.  Just color me paranoid.
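
For anyone counting along, the sequence quoted above can be sketched as a
small userspace model.  The four state names are taken from the quoted list;
the two helper functions are invented for illustration and are not part of
the patch.  The model only counts state visits, it does not do any of the
real work of the flip.

```c
/* State names as they appear in the quoted sequence. */
enum rcu_try_flip_states {
    rcu_try_flip_idle_state,        /* "I" */
    rcu_try_flip_waitack_state,     /* "A" */
    rcu_try_flip_waitzero_state,    /* "Z" */
    rcu_try_flip_waitmb_state       /* "M" */
};

/* Hypothetical helper: successor of s in the I -> A -> Z -> M cycle. */
static enum rcu_try_flip_states next_flip_state(enum rcu_try_flip_states s)
{
    if (s == rcu_try_flip_waitmb_state)
        return rcu_try_flip_idle_state;
    return (enum rcu_try_flip_states)(s + 1);
}

/*
 * Hypothetical helper: number of states visited while waiting through
 * `stages` complete passes of the cycle, starting from the idle state.
 */
static int flip_states_visited(int stages)
{
    enum rcu_try_flip_states s = rcu_try_flip_idle_state;
    int visited = 0;
    int done = 0;

    while (done < stages) {
        visited++;
        if (s == rcu_try_flip_waitmb_state)
            done++;             /* one full I/A/Z/M pass completed */
        s = next_flip_state(s);
    }
    return visited;
}
```

With stages 0 through 4 as listed, flip_states_visited(5) gives the worst
case of 20 state visits; the real number is a bit lower whenever the wait
starts partway through stage 0.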

> > > True, but the "me" confused me. Since that task struct is not me ;-)
> >
> > Well, who is it, then?  ;-)
> 
> It's the app I watch sitting there waiting its turn for its callback to
> run.

:-)

> > > > > Isn't the GP detection done via a tasklet/softirq. So wouldn't a
> > > > > local_bh_disable be sufficient here? You already cover NMIs, which 
> > > > > would
> > > > > also handle normal interrupts.
> > > >
> > > > This is also my understanding, but I think this disable is an
> > > > 'optimization' in that it avoids the regular IRQs from jumping through
> > > > these hoops outlined below.
> > >
> > > But isn't disabling irqs slower than doing a local_bh_disable? So the
> > > majority of times (where irqs will not happen) we have this overhead.
> >
> > The current code absolutely must exclude the scheduling-clock hardirq
> > handler.
> 
> ACKed,
> The reasoning you gave in Peter's reply most certainly makes sense.
> 
> > > > > > + *
> > > > > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU 
> > > > > > implementation
> > > > >
> > > > > Oh, come now! It's not "nuts" to use this ;-)
> > > > >
> > > > > > + * on a large SMP, they might want to use a hierarchical 
> > > > > > organization of
> > > > > > + * the per-CPU-counter pairs.
> > > > > > + */
> > > >
> > > > It's the large SMP case that's nuts, and on that I have to agree with
> > > > Paul, it's not really large SMP friendly.
> > >
> > > Hmm, that could be true. But on large SMP systems, you usually have
> > > large amounts of memory, so hopefully a really long synchronize_rcu
> > > would not be a problem.
> >
> > Somewhere in the range from 64 to a few hundred CPUs, the global lock
> > protecting the try_flip state machine would start sucking air pretty
> > badly.  But the real problem is synchronize_sched(), which loops through
> > all the CPUs --  this would likely cause problems at a few tens of
> > CPUs, perhaps as early as 10-20.
> 
> hehe, From someone whose largest box is 4 CPUs, to me 16 CPUs is large.
> But I can see hundreds, let alone thousands of CPUs would make a huge
> grinding halt on things like synchronize_sched. God, imagine if all CPUs
> did that approximately at the same time. The system would show a huge
> jitter.

Well, the first time the SGI guys tried to boot a 1024-CPU Altix, I got
an email complaining about RCU overheads.  ;-)  Manfred Spraul fixed
things up for them, though.

Thanx, Paul

Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

2007-09-21 Thread Alan Cox

>  >  config MPSC
>  > bool "Intel P4 / older Netburst based Xeon"
>  > help
> 
> sidenote: I always wondered what 'PSC' stood for ?

Produces Smoke and Cooks ?

[PATCH Kbuild] Call only one make with all targets for O=

2007-09-21 Thread Milton Miller

Change the invocations of make in the output directory Makefile and the
main Makefile for separate object trees to pass all goals to one $(MAKE)
via a new phony target "sub-make" and the existing target _all.

When compiling with separate object directories, a separate make is called
in the context of another directory (from the output directory the main
Makefile is called, the Makefile is then restarted with current directory
set to the object tree).  Before this patch, when multiple make command
goals are specified, each target results in a separate make invocation.
With make -j, these invocations may run in parallel, resulting in multiple
commands running in the same directory clobbering each other's results.

I did not try to address make -j for mixed dot-config and no-dot-config
targets.  Because the order does matter, a solution was not obvious.
Perhaps a simple check for MAKEFLAGS having -j and refusing to run would
be appropriate.

Signed-off-by: Milton Miller <[EMAIL PROTECTED]>
---
I chose @: as the phony command after the sub-make target does all the
work; is there a better alternative?  It looks like :; is used for
Makefile.

Index: kernel/Makefile
===
--- kernel.orig/Makefile2007-09-19 01:55:45.0 -0500
+++ kernel/Makefile 2007-09-19 02:01:16.0 -0500
@@ -116,12 +116,16 @@ KBUILD_OUTPUT := $(shell cd $(KBUILD_OUT
 $(if $(KBUILD_OUTPUT),, \
  $(error output directory "$(saved-output)" does not exist))
 
-PHONY += $(MAKECMDGOALS)
+PHONY += $(MAKECMDGOALS) sub-make
 
-$(filter-out _all,$(MAKECMDGOALS)) _all:
+$(filter-out _all sub-make,$(MAKECMDGOALS)) _all: sub-make
+   @:
+
+sub-make: FORCE
$(if $(KBUILD_VERBOSE:1=),@)$(MAKE) -C $(KBUILD_OUTPUT) \
KBUILD_SRC=$(CURDIR) \
-   KBUILD_EXTMOD="$(KBUILD_EXTMOD)" -f $(CURDIR)/Makefile $@
+   KBUILD_EXTMOD="$(KBUILD_EXTMOD)" -f $(CURDIR)/Makefile \
+   $(filter-out _all sub-make,$(MAKECMDGOALS))
 
 # Leave processing to above invocation of make
 skip-makefile := 1
Index: kernel/scripts/mkmakefile
===
--- kernel.orig/scripts/mkmakefile  2007-09-19 01:55:45.0 -0500
+++ kernel/scripts/mkmakefile   2007-09-19 02:01:53.0 -0500
@@ -26,11 +26,13 @@ MAKEFLAGS += --no-print-directory
 
 .PHONY: all \$(MAKECMDGOALS)
 
+all:= \$(filter-out all Makefile,\$(MAKECMDGOALS))
+
 all:
-   \$(MAKE) -C \$(KERNELSRC) O=\$(KERNELOUTPUT)
+   \$(MAKE) -C \$(KERNELSRC) O=\$(KERNELOUTPUT) \$(all)
 
 Makefile:;
 
-\$(filter-out all Makefile,\$(MAKECMDGOALS)) %/:
-   \$(MAKE) -C \$(KERNELSRC) O=\$(KERNELOUTPUT) \$@
+\$(all) %/: all
+   @:
 EOF

Re: [PATCH 07/25] r/o bind mounts: elevate write count for some ioctls

2007-09-21 Thread Dave Hansen

On Fri, 2007-09-21 at 16:03 -0700, Andrew Morton wrote:
> Dave Hansen <[EMAIL PROTECTED]> wrote:
> 
> > Some ioctl()s can cause writes to the filesystem.  Take
> > these, and make them use mnt_want/drop_write() instead.
> > 
> > We need to pass the filp one layer deeper in XFS, but
> > somebody _just_ pulled it out in February because nobody
> > was using it, so I don't feel guilty for adding it back.
> 
> Note that -mm's ext2-reservations.patch adds EXT2_IOC_SETRSVSZ,
> and it doesn't do mnt_want_write().

That doesn't quite apply to mainline (at least after the patches I just
sent).  I'll wait and send you one on top of the next -mm so that I can
get a coherent view of what's going on if that's all right.

-- Dave


Re: [PATCH 14/22] NFS: Use local caching

2007-09-21 Thread David Howells

David Howells <[EMAIL PROTECTED]> wrote:

> Peter Staubach <[EMAIL PROTECTED]> wrote:
> 
> > Did I miss the section where the modified semantics about which
> > mounted file systems can use the cache and which ones can not
> > was implemented?
> 
> Yes.

fs/nfs/super.c:

case Opt_sharecache:
mnt->flags &= ~NFS_MOUNT_UNSHARED;
break;
case Opt_nosharecache:
mnt->flags |= NFS_MOUNT_UNSHARED;
mnt->options &= ~NFS_OPTION_FSCACHE;
break;
case Opt_fscache:
/* sharing is mandatory with fscache */
mnt->options |= NFS_OPTION_FSCACHE;
mnt->flags &= ~NFS_MOUNT_UNSHARED;
break;
case Opt_nofscache:
mnt->options &= ~NFS_OPTION_FSCACHE;
break;

Hmmm...  Actually, I'm not sure this is sufficient.
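
To experiment with the switch outside the kernel, a standalone model along
these lines may help.  It is only a sketch: the bit values and the
nfs_mount_sketch structure are placeholders invented here, and apply_opt() is
a hypothetical wrapper around the four cases quoted above.

```c
/* Placeholder bit values, assumed for illustration only. */
#define NFS_MOUNT_UNSHARED  0x1u
#define NFS_OPTION_FSCACHE  0x1u

enum nfs_opt {
    Opt_sharecache,
    Opt_nosharecache,
    Opt_fscache,
    Opt_nofscache
};

/* Hypothetical stand-in for the parsed mount data. */
struct nfs_mount_sketch {
    unsigned int flags;
    unsigned int options;
};

/* Apply one mount option, mirroring the four cases quoted above. */
static void apply_opt(struct nfs_mount_sketch *mnt, enum nfs_opt opt)
{
    switch (opt) {
    case Opt_sharecache:
        mnt->flags &= ~NFS_MOUNT_UNSHARED;
        break;
    case Opt_nosharecache:
        mnt->flags |= NFS_MOUNT_UNSHARED;
        mnt->options &= ~NFS_OPTION_FSCACHE;
        break;
    case Opt_fscache:
        /* sharing is mandatory with fscache */
        mnt->options |= NFS_OPTION_FSCACHE;
        mnt->flags &= ~NFS_MOUNT_UNSHARED;
        break;
    case Opt_nofscache:
        mnt->options &= ~NFS_OPTION_FSCACHE;
        break;
    }
}
```

In this model the outcome depends on option order: Opt_fscache followed by
Opt_nosharecache ends up unshared with fscache off, while the reverse order
ends up shared with fscache on.  The two settings can never both be in force
at once, which is the invariant the cases are trying to keep.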

David

[PATCH 3/3] taskstats: fix stats->ac_exitcode to work on threads and use group_exit_code

2007-09-21 Thread Guillaume Chazarain

Threads also have an exit code of their own, so report it in
TASKSTATS_CMD_ATTR_PID.

For TASKSTATS_CMD_ATTR_TGID, instead of relying only on the exit code of the
leader, we use task->signal->group_exit_code if it is nonzero, as suggested by
Oleg Nesterov.

Also, document that as of this patch, fill_threadgroup() must be called after
add_tsk() as it may overwrite some stats.
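
The selection rule for ac_exitcode can be sketched in plain C as below.  This
is an illustration rather than the patch itself: pick_exitcode() is a
hypothetical helper, and it spells out the GNU "a ? : b" shorthand used in
the diff as a standard conditional expression.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the patch logic:
 *  - for per-thread (PID) stats, always report the task's own exit code;
 *  - for thread-group (TGID) stats, prefer the group exit code when it is
 *    nonzero and fall back to the task's exit code otherwise.
 */
static int pick_exitcode(bool tg_stats, int group_exit_code,
                         int task_exit_code)
{
    int code = tg_stats ? group_exit_code : 0;

    return code ? code : task_exit_code;
}
```

For example, pick_exitcode(true, 9, 0) returns 9, while
pick_exitcode(false, 9, 3) and pick_exitcode(true, 0, 3) both return 3.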

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
Cc: Balbir Singh <[EMAIL PROTECTED]>
Cc: Jay Lan <[EMAIL PROTECTED]>
Cc: Jonathan Lim <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
---

 kernel/taskstats.c |3 +++
 kernel/tsacct.c|   12 +++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 42d3110..24d7f62 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -181,6 +181,9 @@ static void send_cpu_listeners(struct sk_buff *skb,
  * memory usage), so are taken from the group leader.
  * XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
  * the second.
+ *
+ * fill_threadgroup() may overwrite stats from add_tsk(), so it must be called
+ * after add_tsk().
  */
 static void fill_threadgroup(struct taskstats *stats, struct task_struct *task,
 bool tg_stats)
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 24056aa..526b134 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -44,6 +44,8 @@ static void fill_wall_times(struct taskstats *stats, struct 
task_struct *tsk)
 void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task,
bool tg_stats)
 {
+   int group_exit_code;
+
BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
 
rcu_read_lock();
@@ -53,11 +55,11 @@ void bacct_fill_threadgroup(struct taskstats *stats, struct 
task_struct *task,
 
fill_wall_times(stats, task);
 
-   if (thread_group_leader(task)) {
-   stats->ac_exitcode = task->exit_code;
-   if (task->flags & PF_FORKNOEXEC)
-   stats->ac_flag |= AFORK;
-   }
+   if (thread_group_leader(task) && (task->flags & PF_FORKNOEXEC))
+   stats->ac_flag |= AFORK;
+
+   group_exit_code = tg_stats ? task->signal->group_exit_code : 0;
+   stats->ac_exitcode  = group_exit_code ? : task->exit_code;
 
stats->ac_nice  = task_nice(task);
stats->ac_sched = task->policy;


[PATCH 2/3] taskstats: tell fill_thread_group() whether it replies with PID or TGID stats

2007-09-21 Thread Guillaume Chazarain

fill_thread_group() may want to know if it is filling TASKSTATS_CMD_ATTR_TGID
or TASKSTATS_CMD_ATTR_PID stats, so give it this information in the tg_stats
boolean.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
Cc: Balbir Singh <[EMAIL PROTECTED]>
Cc: Jay Lan <[EMAIL PROTECTED]>
Cc: Jonathan Lim <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
---

 include/linux/tsacct_kern.h |4 ++--
 kernel/taskstats.c  |   12 +++-
 kernel/tsacct.c |3 ++-
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/tsacct_kern.h b/include/linux/tsacct_kern.h
index 93dffc2..5652ae0 100644
--- a/include/linux/tsacct_kern.h
+++ b/include/linux/tsacct_kern.h
@@ -10,10 +10,10 @@
 #include 
 
 #ifdef CONFIG_TASKSTATS
-void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task, 
bool tg_stats);
 void bacct_add_tsk(struct taskstats *stats, struct task_struct *task);
 #else
-static inline void bacct_fill_threadgroup(struct taskstats *stats, struct 
task_struct *task)
+static inline void bacct_fill_threadgroup(struct taskstats *stats, struct 
task_struct *task, bool tg_stats)
 {}
 static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct 
*task)
 {}
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index ce43fae..42d3110 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -172,6 +172,7 @@ static void send_cpu_listeners(struct sk_buff *skb,
  * fill_threadgroup - initialize some common stats for the thread group
  * @stats: the taskstats to write into
  * @task: the thread representing the whole group
+ * @tg_stats: whether in the end thread group stats are requested
  *
  * There are two types of taskstats fields when considering a thread group:
  * - those that can be aggregated from each thread in the group (like CPU
@@ -181,7 +182,8 @@ static void send_cpu_listeners(struct sk_buff *skb,
  * XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
  * the second.
  */
-static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+static void fill_threadgroup(struct taskstats *stats, struct task_struct *task,
+bool tg_stats)
 {
/*
 * Each accounting subsystem adds calls to its functions to initialize
@@ -193,7 +195,7 @@ static void fill_threadgroup(struct taskstats *stats, 
struct task_struct *task)
stats->version = TASKSTATS_VERSION;
 
/* fill in basic acct fields */
-   bacct_fill_threadgroup(stats, task);
+   bacct_fill_threadgroup(stats, task, tg_stats);
 
/* fill in extended acct fields */
xacct_fill_threadgroup(stats, task);
@@ -248,7 +250,7 @@ static int fill_pid(pid_t pid, struct task_struct *tsk,
 
memset(stats, 0, sizeof(*stats));
add_tsk(stats, tsk);
-   fill_threadgroup(stats, tsk);
+   fill_threadgroup(stats, tsk, false);
 
/* Define err: label here if needed */
put_task_struct(tsk);
@@ -289,7 +291,7 @@ static int fill_tgid(pid_t tgid, struct task_struct *first,
add_tsk(stats, tsk);
while_each_thread(first, tsk);
 
-   fill_threadgroup(stats, first->group_leader);
+   fill_threadgroup(stats, first->group_leader, true);
unlock_task_sighand(first, &flags);
rc = 0;
 out:
@@ -545,7 +547,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
 */
 
memcpy(stats, tsk->signal->stats, sizeof(*stats));
-   fill_threadgroup(stats, tsk->group_leader);
+   fill_threadgroup(stats, tsk->group_leader, true);
 
 send:
send_cpu_listeners(rep_skb, listeners);
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 9541a1a..24056aa 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -41,7 +41,8 @@ static void fill_wall_times(struct taskstats *stats, struct 
task_struct *tsk)
  * fill in basic accounting fields
  */
 
-void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task,
+   bool tg_stats)
 {
BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
 


[PATCH] JBD2/ext4 naming cleanup

2007-09-21 Thread Mingming Cao

JBD2 naming cleanup

From: Mingming Cao <[EMAIL PROTECTED]>

Change macro names from JBD_XXX to JBD2_XXX in JBD2/ext4.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c |2 +-
 fs/ext4/super.c   |2 +-
 fs/jbd2/commit.c  |2 +-
 fs/jbd2/journal.c |8 
 fs/jbd2/recovery.c|2 +-
 fs/jbd2/revoke.c  |4 ++--
 include/linux/ext4_jbd2.h |6 +++---
 include/linux/jbd2.h  |   30 +++---
 8 files changed, 28 insertions(+), 28 deletions(-)

Index: linux-2.6.23-rc6/fs/ext4/super.c
===
--- linux-2.6.23-rc6.orig/fs/ext4/super.c   2007-09-21 16:27:31.0 
-0700
+++ linux-2.6.23-rc6/fs/ext4/super.c2007-09-21 16:27:46.0 -0700
@@ -966,7 +966,7 @@ static int parse_options (char *options,
if (option < 0)
return 0;
if (option == 0)
-   option = JBD_DEFAULT_MAX_COMMIT_AGE;
+   option = JBD2_DEFAULT_MAX_COMMIT_AGE;
sbi->s_commit_interval = HZ * option;
break;
case Opt_data_journal:
Index: linux-2.6.23-rc6/include/linux/ext4_jbd2.h
===
--- linux-2.6.23-rc6.orig/include/linux/ext4_jbd2.h 2007-09-10 
19:50:29.0 -0700
+++ linux-2.6.23-rc6/include/linux/ext4_jbd2.h  2007-09-21 16:27:46.0 
-0700
@@ -12,8 +12,8 @@
  * Ext4-specific journaling extensions.
  */
 
-#ifndef _LINUX_EXT4_JBD_H
-#define _LINUX_EXT4_JBD_H
+#ifndef _LINUX_EXT4_JBD2_H
+#define _LINUX_EXT4_JBD2_H
 
 #include 
 #include 
@@ -228,4 +228,4 @@ static inline int ext4_should_writeback_
return 0;
 }
 
-#endif /* _LINUX_EXT4_JBD_H */
+#endif /* _LINUX_EXT4_JBD2_H */
Index: linux-2.6.23-rc6/include/linux/jbd2.h
===
--- linux-2.6.23-rc6.orig/include/linux/jbd2.h  2007-09-21 09:07:09.0 
-0700
+++ linux-2.6.23-rc6/include/linux/jbd2.h   2007-09-21 16:27:46.0 
-0700
@@ -13,8 +13,8 @@
  * filesystem journaling support.
  */
 
-#ifndef _LINUX_JBD_H
-#define _LINUX_JBD_H
+#ifndef _LINUX_JBD2_H
+#define _LINUX_JBD2_H
 
 /* Allow this file to be included directly into e2fsprogs */
 #ifndef __KERNEL__
@@ -37,26 +37,26 @@
 #define journal_oom_retry 1
 
 /*
- * Define JBD_PARANIOD_IOFAIL to cause a kernel BUG() if ext3 finds
+ * Define JBD2_PARANIOD_IOFAIL to cause a kernel BUG() if ext4 finds
  * certain classes of error which can occur due to failed IOs.  Under
- * normal use we want ext3 to continue after such errors, because
+ * normal use we want ext4 to continue after such errors, because
  * hardware _can_ fail, but for debugging purposes when running tests on
  * known-good hardware we may want to trap these errors.
  */
-#undef JBD_PARANOID_IOFAIL
+#undef JBD2_PARANOID_IOFAIL
 
 /*
  * The default maximum commit age, in seconds.
  */
-#define JBD_DEFAULT_MAX_COMMIT_AGE 5
+#define JBD2_DEFAULT_MAX_COMMIT_AGE 5
 
 #ifdef CONFIG_JBD2_DEBUG
 /*
- * Define JBD_EXPENSIVE_CHECKING to enable more expensive internal
+ * Define JBD2_EXPENSIVE_CHECKING to enable more expensive internal
  * consistency checks.  By default we don't do this unless
  * CONFIG_JBD2_DEBUG is on.
  */
-#define JBD_EXPENSIVE_CHECKING
+#define JBD2_EXPENSIVE_CHECKING
 extern u8 jbd2_journal_enable_debug;
 
 #define jbd_debug(n, f, a...)  \
@@ -163,8 +163,8 @@ typedef struct journal_block_tag_s
__be32  t_blocknr_high; /* most-significant high 32bits. */
 } journal_block_tag_t;
 
-#define JBD_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
-#define JBD_TAG_SIZE64 (sizeof(journal_block_tag_t))
+#define JBD2_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
+#define JBD2_TAG_SIZE64 (sizeof(journal_block_tag_t))
 
 /*
  * The revoke descriptor: used on disk to describe a series of blocks to
@@ -256,8 +256,8 @@ typedef struct journal_superblock_s
 #include 
 #include 
 
-#define JBD_ASSERTIONS
-#ifdef JBD_ASSERTIONS
+#define JBD2_ASSERTIONS
+#ifdef JBD2_ASSERTIONS
 #define J_ASSERT(assert)   \
 do {   \
if (!(assert)) {\
@@ -284,9 +284,9 @@ void buffer_assertion_failure(struct buf
 
 #else
 #define J_ASSERT(assert)   do { } while (0)
-#endif /* JBD_ASSERTIONS */
+#endif /* JBD2_ASSERTIONS */
 
-#if defined(JBD_PARANOID_IOFAIL)
+#if defined(JBD2_PARANOID_IOFAIL)
 #define J_EXPECT(expr, why...) J_ASSERT(expr)
 #define J_EXPECT_BH(bh, expr, why...)  J_ASSERT_BH(bh, expr)
 #define J_EXPECT_JH(jh, expr, why...)  J_ASSERT_JH(jh, expr)
@@ -1104,4 +1104,4 @@ extern int jbd_blocks_per_p

[PATCH 1/3] taskstats: separate PID/TGID stats producers to complete the TGID ones

2007-09-21 Thread Guillaume Chazarain

TASKSTATS_CMD_ATTR_TGID used to return only the delay accounting stats, not
the basic and extended accounting.  With this patch,
TASKSTATS_CMD_ATTR_TGID also aggregates the accounting info for all threads
of a thread group.

TASKSTATS_CMD_ATTR_PID output should be unchanged.
TASKSTATS_CMD_ATTR_TGID output should have all fields set, unlike before the
patch, where most of the fields were set to 0.

To this end, two functions were introduced: fill_threadgroup() and add_tsk().
These functions are responsible for aggregating the subsystem specific
accounting information. Taskstats requesters (fill_pid(), fill_tgid() and
fill_tgid_exit()) should only call add_tsk() and fill_threadgroup() to get the
stats.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
Cc: Balbir Singh <[EMAIL PROTECTED]>
Cc: Jay Lan <[EMAIL PROTECTED]>
Cc: Jonathan Lim <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
---
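
For readers of the archive, the split described above can be sketched in
user space roughly as follows.  This is only an illustration under stated
assumptions: the sketch_* names and the three-field structs are hypothetical
stand-ins, not the kernel's struct taskstats or the functions in this patch.
Fields that cannot be summed (here the UID) are copied once from the group
leader, and fields that can be summed (here CPU times) are accumulated per
thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for task_struct and struct taskstats. */
struct sketch_thread {
	uint32_t uid;
	uint64_t utime;
	uint64_t stime;
};

struct sketch_stats {
	uint32_t uid;
	uint64_t utime;
	uint64_t stime;
};

/* Non-aggregatable fields: taken once from the group leader. */
static void sketch_fill_threadgroup(struct sketch_stats *stats,
				    const struct sketch_thread *leader)
{
	stats->uid = leader->uid;
}

/* Aggregatable fields: combined with what is already in @stats. */
static void sketch_add_tsk(struct sketch_stats *stats,
			   const struct sketch_thread *t)
{
	stats->utime += t->utime;
	stats->stime += t->stime;
}

/* A requester only calls the two helpers above; threads[0] is the leader. */
static void sketch_fill_tgid(struct sketch_stats *stats,
			     const struct sketch_thread *threads, size_t n)
{
	size_t i;

	memset(stats, 0, sizeof(*stats));
	if (n == 0)
		return;
	sketch_fill_threadgroup(stats, &threads[0]);
	for (i = 0; i < n; i++)
		sketch_add_tsk(stats, &threads[i]);
}
```

The real code also has to deal with locking and with threads that have
already exited, which this sketch leaves out entirely.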

 Documentation/accounting/getdelays.c |2 -
 include/linux/tsacct_kern.h  |   12 ++-
 kernel/taskstats.c   |  131 ++
 kernel/tsacct.c  |  106 
 4 files changed, 155 insertions(+), 96 deletions(-)

diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index cbee3a2..78773c0 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -76,7 +76,7 @@ static void usage(void)
fprintf(stderr, "getdelays [-dilv] [-w logfile] [-r bufsize] "
"[-m cpumask] [-t tgid] [-p pid]\n");
fprintf(stderr, "  -d: print delayacct stats\n");
-   fprintf(stderr, "  -i: print IO accounting (works only with -p)\n");
+   fprintf(stderr, "  -i: print IO accounting\n");
fprintf(stderr, "  -l: listen forever\n");
fprintf(stderr, "  -v: debug on\n");
 }
diff --git a/include/linux/tsacct_kern.h b/include/linux/tsacct_kern.h
index 7e50ac7..93dffc2 100644
--- a/include/linux/tsacct_kern.h
+++ b/include/linux/tsacct_kern.h
@@ -10,17 +10,23 @@
 #include 
 
 #ifdef CONFIG_TASKSTATS
-extern void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk);
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void bacct_add_tsk(struct taskstats *stats, struct task_struct *task);
 #else
-static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
+static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{}
+static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *task)
 {}
 #endif /* CONFIG_TASKSTATS */
 
 #ifdef CONFIG_TASK_XACCT
-extern void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
+void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
 extern void acct_update_integrals(struct task_struct *tsk);
 extern void acct_clear_integrals(struct task_struct *tsk);
 #else
+static inline void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{}
 static inline void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 {}
 static inline void acct_update_integrals(struct task_struct *tsk)
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 059431e..ce43fae 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -168,6 +168,68 @@ static void send_cpu_listeners(struct sk_buff *skb,
up_write(&listeners->sem);
 }
 
+/**
+ * fill_threadgroup - initialize some common stats for the thread group
+ * @stats: the taskstats to write into
+ * @task: the thread representing the whole group
+ *
+ * There are two types of taskstats fields when considering a thread group:
+ * - those that can be aggregated from each thread in the group (like CPU
+ * times),
+ * - those that cannot be aggregated (like UID) or are identical (like
+ * memory usage), so are taken from the group leader.
+ * XXX_add_tsk() methods deal with the first type while XXX_fill_threadgroup()
+ * methods deal with the second.
+ */
+static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{
+   /*
+* Each accounting subsystem adds calls to its functions to initialize
+* relevant parts of struct taskstats for a single tgid as follows:
+*
+*  per-task-foo-fill_threadgroup(stats, task);
+*/
+
+   stats->version = TASKSTATS_VERSION;
+
+   /* fill in basic acct fields */
+   bacct_fill_threadgroup(stats, task);
+
+   /* fill in extended acct fields */
+   xacct_fill_threadgroup(stats, task);
+}
+
+/**
+ * add_tsk - combine some thread specific stats in a taskstats
+ * @stats: the taskstats to write into
+ * @task: the thread to combine
+ *
+ * Stats specific to each thread in the thread group. Stats of @task should be
+ * combined with those already present in @stats. add_tsk() works in
+ * conjunction with f

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Steven Rostedt



--
On Fri, 21 Sep 2007, Paul E. McKenney wrote:

>
> If you do a synchronize_rcu() it might well have to wait through the
> following sequence of states:
>
> Stage 0: (might have to wait through part of this to get out of "next" queue)
>   rcu_try_flip_idle_state,/* "I" */
>   rcu_try_flip_waitack_state, /* "A" */
>   rcu_try_flip_waitzero_state,/* "Z" */
>   rcu_try_flip_waitmb_state   /* "M" */
> Stage 1:
>   rcu_try_flip_idle_state,/* "I" */
>   rcu_try_flip_waitack_state, /* "A" */
>   rcu_try_flip_waitzero_state,/* "Z" */
>   rcu_try_flip_waitmb_state   /* "M" */
> Stage 2:
>   rcu_try_flip_idle_state,/* "I" */
>   rcu_try_flip_waitack_state, /* "A" */
>   rcu_try_flip_waitzero_state,/* "Z" */
>   rcu_try_flip_waitmb_state   /* "M" */
> Stage 3:
>   rcu_try_flip_idle_state,/* "I" */
>   rcu_try_flip_waitack_state, /* "A" */
>   rcu_try_flip_waitzero_state,/* "Z" */
>   rcu_try_flip_waitmb_state   /* "M" */
> Stage 4:
>   rcu_try_flip_idle_state,/* "I" */
>   rcu_try_flip_waitack_state, /* "A" */
>   rcu_try_flip_waitzero_state,/* "Z" */
>   rcu_try_flip_waitmb_state   /* "M" */
>
> So yes, grace periods do indeed have some latency.

Yes they do. I'm now at the point that I'm just "trusting" you that you
understand that each of these stages is needed. My IQ level only lets me
understand next -> wait -> done, but not the extra 3 shifts in wait.

;-)
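
For anyone else following along, the next -> wait -> done movement can be
modelled in user space roughly as below.  This is a simplified sketch, not
the patch's code: the list handling mirrors the quoted
__rcu_advance_callbacks(), but the sketch_* names are hypothetical and the
locking, tracing, per-CPU data, and the "has a flip happened" check are all
left out.  Every call to sketch_advance() stands for one observed counter
flip.

```c
#include <stddef.h>

#define GP_STAGES 4

/* Hypothetical stand-in for struct rcu_head. */
struct sketch_cb {
	struct sketch_cb *next;
};

struct sketch_lists {
	struct sketch_cb *nextlist;
	struct sketch_cb **nexttail;
	struct sketch_cb *waitlist[GP_STAGES];
	struct sketch_cb **waittail[GP_STAGES];
	struct sketch_cb *donelist;
	struct sketch_cb **donetail;
};

static void sketch_init(struct sketch_lists *l)
{
	int i;

	l->nextlist = NULL;
	l->nexttail = &l->nextlist;
	for (i = 0; i < GP_STAGES; i++) {
		l->waitlist[i] = NULL;
		l->waittail[i] = &l->waitlist[i];
	}
	l->donelist = NULL;
	l->donetail = &l->donelist;
}

/* New callbacks always go on the tail of the "next" list. */
static void sketch_enqueue(struct sketch_lists *l, struct sketch_cb *cb)
{
	cb->next = NULL;
	*l->nexttail = cb;
	l->nexttail = &cb->next;
}

/* One counter flip: shift every list one step towards "done". */
static void sketch_advance(struct sketch_lists *l)
{
	int i;

	/* The oldest wait list has waited long enough: append to done. */
	if (l->waitlist[GP_STAGES - 1] != NULL) {
		*l->donetail = l->waitlist[GP_STAGES - 1];
		l->donetail = l->waittail[GP_STAGES - 1];
	}
	/* Shift the remaining wait lists up by one stage. */
	for (i = GP_STAGES - 2; i >= 0; i--) {
		if (l->waitlist[i] != NULL) {
			l->waitlist[i + 1] = l->waitlist[i];
			l->waittail[i + 1] = l->waittail[i];
		} else {
			l->waitlist[i + 1] = NULL;
			l->waittail[i + 1] = &l->waitlist[i + 1];
		}
	}
	/* Whatever was queued since the last flip enters stage 0. */
	if (l->nextlist != NULL) {
		l->waitlist[0] = l->nextlist;
		l->waittail[0] = l->nexttail;
		l->nextlist = NULL;
		l->nexttail = &l->nextlist;
	} else {
		l->waitlist[0] = NULL;
		l->waittail[0] = &l->waitlist[0];
	}
}

/* Count how many flips a freshly queued callback needs to reach "done". */
static int sketch_advances_until_done(void)
{
	struct sketch_lists l;
	struct sketch_cb cb;
	int n = 0;

	sketch_init(&l);
	sketch_enqueue(&l, &cb);
	while (l.donelist == NULL) {
		sketch_advance(&l);
		n++;
	}
	return n;
}
```

In this model a callback needs one advance to leave "next" and GP_STAGES
more to walk through the wait array, which seems to line up with Stage 0
through Stage 4 in the listing quoted above.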

> >
> > True, but the "me" confused me. Since that task struct is not me ;-)
>
> Well, who is it, then?  ;-)

It's the app I watch sitting there waiting its turn for its callback to
run.

> > > >
> > > > Isn't the GP detection done via a tasklet/softirq. So wouldn't a
> > > > local_bh_disable be sufficient here? You already cover NMIs, which would
> > > > also handle normal interrupts.
> > >
> > > This is also my understanding, but I think this disable is an
> > > 'optimization' in that it avoids the regular IRQs from jumping through
> > > these hoops outlined below.
> >
> > But isn't disabling irqs slower than doing a local_bh_disable? So the
> > majority of times (where irqs will not happen) we have this overhead.
>
> The current code absolutely must exclude the scheduling-clock hardirq
> handler.

ACKed,
The reasoning you gave in Peter's reply most certainly makes sense.


> > > > > + *
> > > > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation
> > > >
> > > > Oh, come now! It's not "nuts" to use this ;-)
> > > >
> > > > > + * on a large SMP, they might want to use a hierarchical organization of
> > > > > + * the per-CPU-counter pairs.
> > > > > + */
> > >
> > > Its the large SMP case that's nuts, and on that I have to agree with
> > > Paul, its not really large SMP friendly.
> >
> > Hmm, that could be true. But on large SMP systems, you usually have a
> > large amounts of memory, so hopefully a really long synchronize_rcu
> > would not be a problem.
>
> Somewhere in the range from 64 to a few hundred CPUs, the global lock
> protecting the try_flip state machine would start sucking air pretty
> badly.  But the real problem is synchronize_sched(), which loops through
> all the CPUs --  this would likely cause problems at a few tens of
> CPUs, perhaps as early as 10-20.

hehe, from someone whose largest box is 4 CPUs, to me 16 CPUs is large.
But I can see that hundreds, let alone thousands, of CPUs would bring
things like synchronize_sched to a huge grinding halt. God, imagine if all
CPUs did that at approximately the same time. The system would show a huge
jitter.

-- Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 14/22] NFS: Use local caching

2007-09-21 Thread David Howells

Peter Staubach <[EMAIL PROTECTED]> wrote:

> Did I miss the section where the modified semantics about which
> mounted file systems can use the cache and which ones can not
> was implemented?

Yes.

David

Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Kyle Moffett


On Sep 21, 2007, at 17:16:59, Jeremy Maitin-Shepard wrote:

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
The ACPI platform firmware is allowed to preserve information
across the hibernation-resume cycle, so this need not be the same.


All of my comments related to the case where S4 is not being used  
(instead the system is just powered off normally), and a boot  
kernel that does not initialize ACPI is used.  In that case, the  
ACPI platform firmware should not be able to distinguish a normal  
boot from a resume from hibernation.


I think that in order for this to work, there would need to be some
ABI whereby the resume-ing kernel can pass its entire ACPI state and
a bunch of other ACPI-related device details to the resume-ed kernel,
which I believe it does not do at the moment.  I believe that what
causes problems is that the ACPI state data the kernel stores is
*different* between identical sequential boots, especially when you
add/remove/replace batteries, AC, etc.


Since we currently throw away most of that in-kernel ACPI interpreter
state data when we load the to-be-resumed image and replace it with
the state from the previous boot, it looks to the ACPI code and
firmware like our system's hardware magically changed behind its
back.  The result is that the ACPI and firmware code is justifiably
confused (although probably it should be more idempotent to begin
with).  There are two potential solutions:
  1) Formalize and copy a *lot* of ACPI state from the resume-ing  
kernel to the resume-ed kernel.

  2) Properly call the ACPI S4 methods in the proper order

Neither one is particularly easy or particularly pleasant, especially  
given all the vendor bugs in this general area.  Theoretically we  
should be able to do both, since one will be more reliable than the  
other on different systems depending on what kinds of firmware bugs  
they have.


Cheers,
Kyle Moffett


Re: [PATCH 00/22] Introduce credential record

2007-09-21 Thread David Howells

Casey Schaufler <[EMAIL PROTECTED]> wrote:

> They are nonetheless in effect and (heaven forbid) should they be
> abused you don't want to hide the facts from concerned observers.

Because, I suspect, what the observer through /proc should see is what the
process thinks it is doing, not what is transparently going on behind the
scenes.

David

Re: [PATCH 11/22] CacheFiles: Permit the page lock state to be monitored

2007-09-21 Thread David Howells

Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > This is used by CacheFiles to detect read completion on a page in the
> > backing filesystem so that it can then copy the data to the waiting netfs
> > page.
> 
> Won't it in any case want to lock the page too?

No.  Why would it?  All it wants to do is to read the page (copying it to the
netfs's page), assuming it becomes PG_uptodate.

> That would be the only way to ensure that the page is still mapped into the
> address space when you're writing it out...

I don't understand what you're getting at.  Write the page out where?  We've
just read it in from the cache, so why would we be writing it back out?

David

[PATCH] JBD/ext34 cleanups: convert to kzalloc

2007-09-21 Thread Mingming Cao

Convert kmalloc to kzalloc() and get rid of the memset().

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
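
As a note for archive readers, the shape of the conversion can be shown with
a user-space analogue.  This is a sketch under assumptions: malloc(),
memset() and calloc() stand in for kmalloc(), memset() and kzalloc(), and
the sketch_* helpers are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Old pattern: allocate, check, then clear in a separate step. */
static void *sketch_alloc_then_clear(size_t size)
{
	void *p = malloc(size);

	if (!p)
		return NULL;
	memset(p, 0, size);
	return p;
}

/* New pattern: one call that returns already-zeroed memory. */
static void *sketch_alloc_zeroed(size_t size)
{
	return calloc(1, size);
}

/* Helper for checking the result: returns 1 if every byte is zero. */
static int sketch_all_zero(const void *p, size_t size)
{
	const unsigned char *b = p;
	size_t i;

	for (i = 0; i < size; i++)
		if (b[i] != 0)
			return 0;
	return 1;
}
```

Both helpers hand back the same zeroed buffer; the second one just cannot
forget the memset() or pass it a mismatched size.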
 fs/ext3/xattr.c   |3 +--
 fs/ext4/xattr.c   |3 +--
 fs/jbd/journal.c  |3 +--
 fs/jbd/transaction.c  |2 +-
 fs/jbd2/journal.c |3 +--
 fs/jbd2/transaction.c |2 +-
 6 files changed, 6 insertions(+), 10 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-21 09:08:02.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-21 09:10:37.0 -0700
@@ -653,10 +653,9 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kzalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
-   memset(journal, 0, sizeof(*journal));
 
init_waitqueue_head(&journal->j_wait_transaction_locked);
init_waitqueue_head(&journal->j_wait_logspace);
Index: linux-2.6.23-rc6/fs/jbd/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/transaction.c  2007-09-21 09:13:11.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/transaction.c   2007-09-21 09:13:24.0 -0700
@@ -96,7 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
+   new_transaction = kzalloc(sizeof(*new_transaction),
GFP_NOFS|__GFP_NOFAIL);
if (!new_transaction) {
ret = -ENOMEM;
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-21 09:10:53.0 -0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-21 09:11:13.0 -0700
@@ -654,10 +654,9 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kzalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
-   memset(journal, 0, sizeof(*journal));
 
init_waitqueue_head(&journal->j_wait_transaction_locked);
init_waitqueue_head(&journal->j_wait_logspace);
Index: linux-2.6.23-rc6/fs/jbd2/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/transaction.c 2007-09-21 09:12:46.0 -0700
+++ linux-2.6.23-rc6/fs/jbd2/transaction.c  2007-09-21 09:12:59.0 -0700
@@ -96,7 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
+   new_transaction = kzalloc(sizeof(*new_transaction),
GFP_NOFS|__GFP_NOFAIL);
if (!new_transaction) {
ret = -ENOMEM;
Index: linux-2.6.23-rc6/fs/ext3/xattr.c
===
--- linux-2.6.23-rc6.orig/fs/ext3/xattr.c   2007-09-21 10:22:24.0 -0700
+++ linux-2.6.23-rc6/fs/ext3/xattr.c2007-09-21 10:24:19.0 -0700
@@ -741,12 +741,11 @@ ext3_xattr_block_set(handle_t *handle, s
}
} else {
/* Allocate a buffer where we construct the new block. */
-   s->base = kmalloc(sb->s_blocksize, GFP_KERNEL);
+   s->base = kzalloc(sb->s_blocksize, GFP_KERNEL);
/* assert(header == s->base) */
error = -ENOMEM;
if (s->base == NULL)
goto cleanup;
-   memset(s->base, 0, sb->s_blocksize);
header(s->base)->h_magic = cpu_to_le32(EXT3_XATTR_MAGIC);
header(s->base)->h_blocks = cpu_to_le32(1);
header(s->base)->h_refcount = cpu_to_le32(1);
Index: linux-2.6.23-rc6/fs/ext4/xattr.c
===
--- linux-2.6.23-rc6.orig/fs/ext4/xattr.c   2007-09-21 10:20:21.0 -0700
+++ linux-2.6.23-rc6/fs/ext4/xattr.c2007-09-21 10:21:00.0 -0700
@@ -750,12 +750,11 @@ ext4_xattr_block_set(handle_t *handle, s
}
} else {
/* Allocate a buffer where we construct the new block. */
-   s->base = kmalloc(sb->s_blocksize, GFP_KERNEL);
+   s->base = kzalloc(sb->s_blocksize, GFP_KERNEL);
/* assert(header == s->base) */
error = -ENOMEM;
if (s->base == NULL)
goto cleanup;
-   memset(s->base, 0, sb->s_blocksize);

Re: [PATCH 10/22] CacheFiles: Add a hook to write a single page of data to an inode

2007-09-21 Thread David Howells

Trond Myklebust <[EMAIL PROTECTED]> wrote:

> So why do you need a new address space operation? AFAICS the generic
> implementation will work for pretty much everyone who supports the
> existing prepare_write()/commit_write().

Because Christoph decreed that I wasn't allowed to call prepare_write() and
commit_write() directly.  It's possible that the method should be in the
inode_operations rather than on the address space.

> Furthermore, you don't appear to supply any alternative "optimised"
> implementations...

Optimised in what fashion?

David

Re: [Announce] Linux-tiny project revival

2007-09-21 Thread Joe Perches

On Fri, 2007-09-21 at 18:05 -0500, Rob Landley wrote:
> > from printks and defining something that modifies pr_.
> pr_level doesn't exist in mainline.

pr_info and pr_debug do.

pr_alert, pr_emerg, pr_crit, pr_err, and pr_warn could be added.

> > #define pr_info(fmt, arg) printk(KERN_INFO PR_FMT fmt PR_ARG, ##arg)
> Do we really need another layer of indirection?

It'd make file/function/line cost free for embedded use.
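
To make the idea concrete, here is a user-space sketch of such a wrapper.
It is only an illustration under assumptions: snprintf() stands in for
printk(), SKETCH_KERN_INFO and SKETCH_PR_VERBOSE are hypothetical names, and
the ", ##__VA_ARGS__" form relies on a GNU extension (accepted by gcc and
clang).  When SKETCH_PR_VERBOSE is not defined, PR_FMT and PR_ARG expand to
nothing, so no file name or line number ends up in the binary.

```c
#include <stdio.h>

/* Stand-in for the kernel's KERN_INFO level prefix. */
#define SKETCH_KERN_INFO "<6>"

#ifdef SKETCH_PR_VERBOSE
/* Verbose build: prepend file and line to every message. */
#define PR_FMT "%s:%d: "
#define PR_ARG , __FILE__, __LINE__
#else
/* Tiny build: both pieces vanish at preprocessing time. */
#define PR_FMT
#define PR_ARG
#endif

/* Formats into @buf so the result can be inspected. */
#define pr_info_sketch(buf, size, fmt, ...) \
	snprintf(buf, size, SKETCH_KERN_INFO PR_FMT fmt PR_ARG, ##__VA_ARGS__)

/* Small demonstration: one call with an argument. */
static int sketch_pr_demo(char *buf, size_t size)
{
	return pr_info_sketch(buf, size, "value=%d\n", 42);
}
```

The call sites stay the same in both configurations; only the two helper
macros change.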


Re: [Announce] Linux-tiny project revival

2007-09-21 Thread Rob Landley

On Friday 21 September 2007 12:45:27 pm Joe Perches wrote:
> On Fri, 2007-09-21 at 13:16 -0400, [EMAIL PROTECTED] wrote:
> > What about something *really* hardcore ugly like:
> > #ifdef __FILE__
> > #undef __FILE__
> > #define __FILE__ ""
> > #endif
> > (or similar preprocessor blecherousness) if you want to *really* shrink
> > that binary down?
>
> I prefer removing all __FILE__, __FUNCTION__, __LINE__ uses
> from printks and defining something that modifies pr_.

pr_level doesn't exist in mainline.

> #define pr_info(fmt, arg) printk(KERN_INFO PR_FMT fmt PR_ARG, ##arg)

Do we really need another layer of indirection?

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.

RE: Megaraid driver not detecting RAID volumes in kernel 2.6.22?

2007-09-21 Thread Bagalkote, Sreenivas

> 
> https://bugzilla.redhat.com/show_bug.cgi?id=288421
> 
> When running Fedora on a Dell 2950 w/ integrated LSI Perc5i (megaraid), the
> system will not boot after upgrading to 2.6.22.  The boot message indicates
> the system is somehow seeing through RAID, cannot access logical volume.
> This causes the root device to be unavailable and the kernel to panic.
> 
> Version-Release number of selected component (if applicable):
> I experience this problem with kernel 2.6.22 and higher.  I do not believe
> it is isolated to FC6, as I downloaded the stock 2.6.22 kernel from
> kernel.org and was able to reproduce.
> 
> How reproducible:
> Every time.
> 
> Steps to Reproduce:
> 1. Configure RAID10 (I've also tried RAID5) on a Perc5i in this system.
> 
> 2. Load Fedora Core.  The installer works fine since the kernel version it
> uses has a working LSI driver.
> 
> 3. Upgrade to 2.6.22 kernel image (in yum) or download kernel.org sources,
> compile, and install.
> 
> 4. Reboot system.  It comes up unable to boot.  The kernel panics.
> 
> Actual results:
> As the system boots, it cannot mount the root device.  Also in the output
> we see all 6 disks separately, when they should be showing up as one
> logical volume.

Could a standard MPT driver (non-RAID) be loading on this controller?
During the reboot, can you see megaraid driver loading at all? Or do you
see mpt_scsi driver?

Before upgrading, can you blacklist this controller in pci hotplug? I
see shpchp on your screenshot.

Sreenivas

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 11:20:48AM -0400, Steven Rostedt wrote:
> On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:
> > +
> > +/*
> > + * PREEMPT_RCU data structures.
> > + */
> > +
> > +#define GP_STAGES 4
> > +struct rcu_data {
> > +   spinlock_t  lock;   /* Protect rcu_data fields. */
> > +   longcompleted;  /* Number of last completed batch. */
> > +   int waitlistcount;
> > +   struct tasklet_struct rcu_tasklet;
> > +   struct rcu_head *nextlist;
> > +   struct rcu_head **nexttail;
> > +   struct rcu_head *waitlist[GP_STAGES];
> > +   struct rcu_head **waittail[GP_STAGES];
> > +   struct rcu_head *donelist;
> > +   struct rcu_head **donetail;
> > +#ifdef CONFIG_RCU_TRACE
> > +   struct rcupreempt_trace trace;
> > +#endif /* #ifdef CONFIG_RCU_TRACE */
> > +};
> > +struct rcu_ctrlblk {
> > +   spinlock_t  fliplock;   /* Protect state-machine transitions. */
> > +   longcompleted;  /* Number of last completed batch. */
> > +};
> > +static DEFINE_PER_CPU(struct rcu_data, rcu_data);
> > +static struct rcu_ctrlblk rcu_ctrlblk = {
> > +   .fliplock = SPIN_LOCK_UNLOCKED,
> > +   .completed = 0,
> > +};
> > +static DEFINE_PER_CPU(int [2], rcu_flipctr) = { 0, 0 };
> > +
> > +/*
> > + * States for rcu_try_flip() and friends.
> > + */
> > +
> > +enum rcu_try_flip_states {
> > +   rcu_try_flip_idle_state,/* "I" */
> > +   rcu_try_flip_waitack_state, /* "A" */
> > +   rcu_try_flip_waitzero_state,/* "Z" */
> > +   rcu_try_flip_waitmb_state   /* "M" */
> > +};
> > +static enum rcu_try_flip_states rcu_try_flip_state = rcu_try_flip_idle_state;
> > +#ifdef CONFIG_RCU_TRACE
> > +static char *rcu_try_flip_state_names[] =
> > +   { "idle", "waitack", "waitzero", "waitmb" };
> > +#endif /* #ifdef CONFIG_RCU_TRACE */
> 
> [snip]
> 
> > +/*
> > + * If a global counter flip has occurred since the last time that we
> > + * advanced callbacks, advance them.  Hardware interrupts must be
> > + * disabled when calling this function.
> > + */
> > +static void __rcu_advance_callbacks(struct rcu_data *rdp)
> > +{
> > +   int cpu;
> > +   int i;
> > +   int wlc = 0;
> > +
> > +   if (rdp->completed != rcu_ctrlblk.completed) {
> > +   if (rdp->waitlist[GP_STAGES - 1] != NULL) {
> > +   *rdp->donetail = rdp->waitlist[GP_STAGES - 1];
> > +   rdp->donetail = rdp->waittail[GP_STAGES - 1];
> > +   RCU_TRACE_RDP(rcupreempt_trace_move2done, rdp);
> > +   }
> > +   for (i = GP_STAGES - 2; i >= 0; i--) {
> > +   if (rdp->waitlist[i] != NULL) {
> > +   rdp->waitlist[i + 1] = rdp->waitlist[i];
> > +   rdp->waittail[i + 1] = rdp->waittail[i];
> > +   wlc++;
> > +   } else {
> > +   rdp->waitlist[i + 1] = NULL;
> > +   rdp->waittail[i + 1] =
> > +   &rdp->waitlist[i + 1];
> > +   }
> > +   }
> > +   if (rdp->nextlist != NULL) {
> > +   rdp->waitlist[0] = rdp->nextlist;
> > +   rdp->waittail[0] = rdp->nexttail;
> > +   wlc++;
> > +   rdp->nextlist = NULL;
> > +   rdp->nexttail = &rdp->nextlist;
> > +   RCU_TRACE_RDP(rcupreempt_trace_move2wait, rdp);
> > +   } else {
> > +   rdp->waitlist[0] = NULL;
> > +   rdp->waittail[0] = &rdp->waitlist[0];
> > +   }
> > +   rdp->waitlistcount = wlc;
> > +   rdp->completed = rcu_ctrlblk.completed;
> > +   }
> > +
> > +   /*
> > +* Check to see if this CPU needs to report that it has seen
> > +* the most recent counter flip, thereby declaring that all
> > +* subsequent rcu_read_lock() invocations will respect this flip.
> > +*/
> > +
> > +   cpu = raw_smp_processor_id();
> > +   if (per_cpu(rcu_flip_flag, cpu) == rcu_flipped) {
> > +   smp_mb();  /* Subsequent counter accesses must see new value */
> > +   per_cpu(rcu_flip_flag, cpu) = rcu_flip_seen;
> > +   smp_mb();  /* Subsequent RCU read-side critical sections */
> > +  /*  seen -after- acknowledgement. */
> > +   }
> > +}
> 
> [snip]
> 
> > +/*
> > + * Attempt a single flip of the counters.  Remember, a single flip does
> > + * -not- constitute a grace period.  Instead, the interval between
> > + * at least three consecutive flips is a grace period.
> > + *
> > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation
> > + * on a large SMP, they might want to use a hierarchical organization of
> > + * the per-CPU-counter pairs.
> > + */
> > +static void rcu_try_flip(void)
> > +{
> > +   unsigned long oldirq;
> > +
> > +   RCU_TRACE_ME(rcupreempt_trace_try_flip_1);
> > +   if (unlikely(!spin_trylock_irqsave(&rcu_ctrlblk.fliplock, oldirq))) {

Re: [PATCH 07/25] r/o bind mounts: elevate write count for some ioctls

2007-09-21 Thread Andrew Morton

On Thu, 20 Sep 2007 12:52:57 -0700
Dave Hansen <[EMAIL PROTECTED]> wrote:

> Some ioctl()s can cause writes to the filesystem.  Take
> these, and make them use mnt_want/drop_write() instead.
> 
> We need to pass the filp one layer deeper in XFS, but
> somebody _just_ pulled it out in February because nobody
> was using it, so I don't feel guilty for adding it back.

Note that -mm's ext2-reservations.patch adds EXT2_IOC_SETRSVSZ,
and it doesn't do mnt_want_write().

Re: Linux 2.6.22.7

2007-09-21 Thread Chris Wright

diff --git a/Makefile b/Makefile
index 3067f6a..12edea0 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 22
-EXTRAVERSION = .6
+EXTRAVERSION = .7
 NAME = Holy Dancing Manatees, Batman!
 
 # *DOCUMENTATION*
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index 47565c3..0bc623a 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -38,6 +38,18 @@
movq%rax,R8(%rsp)
.endm
 
+   .macro LOAD_ARGS32 offset
+   movl \offset(%rsp),%r11d
+   movl \offset+8(%rsp),%r10d
+   movl \offset+16(%rsp),%r9d
+   movl \offset+24(%rsp),%r8d
+   movl \offset+40(%rsp),%ecx
+   movl \offset+48(%rsp),%edx
+   movl \offset+56(%rsp),%esi
+   movl \offset+64(%rsp),%edi
+   movl \offset+72(%rsp),%eax
+   .endm
+   
.macro CFI_STARTPROC32 simple
CFI_STARTPROC   \simple
CFI_UNDEFINED   r8
@@ -152,7 +164,7 @@ sysenter_tracesys:
movq$-ENOSYS,RAX(%rsp)  /* really needed? */
movq%rsp,%rdi/* &pt_regs -> arg1 */
callsyscall_trace_enter
-   LOAD_ARGS ARGOFFSET  /* reload args from stack in case ptrace changed it */
+   LOAD_ARGS32 ARGOFFSET  /* reload args from stack in case ptrace changed it */
RESTORE_REST
movl%ebp, %ebp
/* no need to do an access_ok check here because rbp has been
@@ -255,7 +267,7 @@ cstar_tracesys:
movq $-ENOSYS,RAX(%rsp) /* really needed? */
movq %rsp,%rdi/* &pt_regs -> arg1 */
call syscall_trace_enter
-   LOAD_ARGS ARGOFFSET  /* reload args from stack in case ptrace changed it */
+   LOAD_ARGS32 ARGOFFSET  /* reload args from stack in case ptrace changed it */
RESTORE_REST
movl RSP-ARGOFFSET(%rsp), %r8d
/* no need to do an access_ok check here because r8 has been
@@ -333,7 +345,7 @@ ia32_tracesys:
movq $-ENOSYS,RAX(%rsp) /* really needed? */
movq %rsp,%rdi/* &pt_regs -> arg1 */
call syscall_trace_enter
-   LOAD_ARGS ARGOFFSET  /* reload args from stack in case ptrace changed it */
+   LOAD_ARGS32 ARGOFFSET  /* reload args from stack in case ptrace changed it */
RESTORE_REST
jmp ia32_do_syscall
 END(ia32_syscall)
diff --git a/arch/x86_64/kernel/ptrace.c b/arch/x86_64/kernel/ptrace.c
index 9409117..8d89d8c 100644
--- a/arch/x86_64/kernel/ptrace.c
+++ b/arch/x86_64/kernel/ptrace.c
@@ -223,10 +223,6 @@ static int putreg(struct task_struct *child,
 {
unsigned long tmp; 

-   /* Some code in the 64bit emulation may not be 64bit clean.
-  Don't take any chances. */
-   if (test_tsk_thread_flag(child, TIF_IA32))
-   value &= 0x;
switch (regno) {
case offsetof(struct user_regs_struct,fs):
if (value && (value & 3) != 3)

Re: [Announce] Linux-tiny project revival

2007-09-21 Thread Kyle Moffett


On Sep 21, 2007, at 18:05:34, Joe Perches wrote:

On Fri, 2007-09-21 at 17:34 -0400, Kyle Moffett wrote:
With a bit more glue that would cause GCC to notice that for a  
given qprintk_kmalloc the "qpk->type" is always zero because the  
level is too high, and therefore it would optimize out *ALL* of  
the _qprintk_kmalloc(), _qprintk(), and _qprintk_finish() calls.


A negative is that lockup conditions swallow partial messages.


But typically you don't care if a "partial line" gets swallowed  
regardless.  The only reason people really use partial lines is when  
they're accumulating a variable number of things into a single line  
and so a single printk() won't do, and in that case it's really not a  
problem to "lose" the first half of the line in event of a crash.   
And hell, if it matters that much you could just make the qprintk_ 
{kmalloc,percpu,irq} functions chain the qpk variables on a little  
linked list and stuff an smp_wmb() in the _qprintk() function after  
writing the text and before writing the size.  That way any panic  
could very carefully look at the messages being queued during the  
crash and attempt to write out partial buffers.
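
The "write the text, then the barrier, then the size" ordering described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct qpk_buf`, `qpk_append()` and `qpk_snapshot()` are hypothetical names, and a C11 release store stands in for the kernel's smp_wmb().

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define QPK_BUF_SIZE 128

/* Hypothetical queue entry; the field names are illustrative only. */
struct qpk_buf {
	char text[QPK_BUF_SIZE];
	atomic_size_t size;	/* number of valid bytes in text[] */
};

/* Append a fragment: write the text first, then publish the new size. */
size_t qpk_append(struct qpk_buf *q, const char *s)
{
	size_t old = atomic_load_explicit(&q->size, memory_order_relaxed);
	size_t len = strlen(s);

	if (len > QPK_BUF_SIZE - old)
		len = QPK_BUF_SIZE - old;	/* truncate rather than overflow */
	memcpy(q->text + old, s, len);
	/* The release store plays the role of smp_wmb() followed by the size write. */
	atomic_store_explicit(&q->size, old + len, memory_order_release);
	return len;
}

/* What a panic path might do: copy out whatever has been published so far. */
size_t qpk_snapshot(struct qpk_buf *q, char *out, size_t outlen)
{
	size_t n = atomic_load_explicit(&q->size, memory_order_acquire);

	if (outlen == 0)
		return 0;
	if (n > outlen - 1)
		n = outlen - 1;
	memcpy(out, q->text, n);
	out[n] = '\0';
	return n;
}
```

Because the size is only bumped after the bytes are in place, a reader that honors the published size never sees text that has not been fully written; a single writer per buffer is assumed.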


It's a technique which in combination with looking at the first 3  
characters of the arguments to printk() would let you elide 99% of  
the non-critical printks pretty easily while only needing to change  
the much smaller proportion of the printk()s which are partial  
lines.  Furthermore it's pretty easy to grep for the partial-line  
printk()s and you can even have it emit warnings when you hit a  
partial-line printk() (it doesn't start with "<"[0-9]">") in -mm to  
help fix up the last few users and keep people from adding new ones.
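
The prefix test described above (does the format start with "<", one digit, ">"?) might look roughly like this. The helper name `has_loglevel_prefix()` is made up for illustration and is not an existing kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when a printk() format string starts
 * with a "<N>" log-level prefix, N being a single digit. A format without
 * such a prefix would be treated as a partial-line continuation.
 */
bool has_loglevel_prefix(const char *fmt)
{
	if (fmt == NULL)
		return false;
	/* Short-circuit evaluation stops at the terminating NUL, so short
	 * strings are never read past their end. */
	return fmt[0] == '<' &&
	       fmt[1] >= '0' && fmt[1] <= '9' &&
	       fmt[2] == '>';
}
```

A debug build could call this on each format string and warn when it returns false, which is one way to find the remaining partial-line users.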


Cheers,
Kyle Moffett


Re: [PATCH] [50/50] x86_64: Remove fpu io port resource

2007-09-21 Thread Jeff Garzik


Andi Kleen wrote:
> Not needed on modern systems without external FPU
>
> TBD on i386 it is only needed for true 386s. Could remove it there
> TBD for >= 486
>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
>
> ---
>  arch/x86_64/kernel/setup.c |2 --
>  1 file changed, 2 deletions(-)
>
> Index: linux/arch/x86_64/kernel/setup.c
> ===
> --- linux.orig/arch/x86_64/kernel/setup.c
> +++ linux/arch/x86_64/kernel/setup.c
> @@ -121,8 +121,6 @@ struct resource standard_io_resources[] 
>  		.flags = IORESOURCE_BUSY | IORESOURCE_IO },
>  	{ .name = "dma2", .start = 0xc0, .end = 0xdf,
>  		.flags = IORESOURCE_BUSY | IORESOURCE_IO },
> -	{ .name = "fpu", .start = 0xf0, .end = 0xff,
> -		.flags = IORESOURCE_BUSY | IORESOURCE_IO }


Since we are merging x86 and x86-64, I think it would be nice at least 
to CC Thomas on patches that increase 32/64-bit differences...  because 
won't this patch have to be partially undone when we merge i386 and x86-64?


Jeff



Linux 2.6.22.7

2007-09-21 Thread Chris Wright

We (the -stable team) are announcing the release of the 2.6.22.7 kernel.
It contains a single security bugfix for the x86_64 architecture.
There is potential for local privilege escalation, so all x86_64 users
are certainly encouraged to upgrade.

CVE-2007-4573: x86_64: Zero extend all registers after ptrace in 32bit entry path.
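
As a rough illustration of why zero extension matters here: if a range check looks only at the low 32 bits of a register while the full 64-bit value is later used as an index, stale high bits can slip past the check. The sketch below is a userspace model, not the kernel code; `NR_SYSCALLS` and the function names are assumptions chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_SYSCALLS 320u	/* illustrative table size, not the real value */

/* Zero-extend: keep the low 32 bits, clear the high 32 bits. */
uint64_t zero_extend32(uint64_t reg)
{
	return (uint64_t)(uint32_t)reg;
}

/* A range check that only looks at the low half of the register. */
bool check_low_half(uint64_t reg)
{
	return (uint32_t)reg < NR_SYSCALLS;
}

/* The same check applied to the full 64-bit value used as an index. */
bool check_full(uint64_t reg)
{
	return reg < NR_SYSCALLS;
}
```

A value such as 0x100000005 passes `check_low_half()` but fails `check_full()`; after `zero_extend32()` the two checks agree again.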

I'll also be replying to this message with a copy of the patch between
2.6.22.6 and 2.6.22.7

The updated 2.6.22.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.22.y.git
and can be browsed at the normal kernel.org git web browser:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.22.y.git;a=summary

thanks,
-chris



 Makefile |2 +-
 arch/x86_64/ia32/ia32entry.S |   18 +++---
 arch/x86_64/kernel/ptrace.c  |4 
 3 files changed, 16 insertions(+), 8 deletions(-)

Summary of changes from v2.6.22.6 to v2.6.22.7
==

Andi Kleen (1):
  x86_64: Zero extend all registers after ptrace in 32bit entry path.

Chris Wright (1):
  Linux 2.6.22.7


Re: [PATCH] [20/45] x86_64: Use 8 byte stack alignment when possible

2007-09-21 Thread Dave Jones

On Sat, Sep 22, 2007 at 12:34:31AM +0200, Andi Kleen wrote:
 > On Friday 21 September 2007 23:13, Dave Jones wrote:
 > > On Fri, Sep 21, 2007 at 10:45:02PM +0200, Andi Kleen wrote:
 > >  > Kernel doesn't use SSE2, so it doesn't need 16 byte alignment. Also
 > >  > the stack can be already unaligned so letting the compiler align
 > >  > is useless. This may make some stack frames smaller.
 > >  > Only works with very recent gcc 4.3
 > >
 > > My gcc 4.1.2 from Fedora 7 (with who knows what backported)
 > > references this in its manpage. How was it broken before 4.3 ?
 > 
 > Try it. It is rejected by the compiler in 64bit mode.

Ah yes, it fails if not between 4 & 12, but the call cc-option
catches that.  Looks fine to me.

Dave

-- 
http://www.codemonkey.org.uk

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Jeff Garzik


Denys Vlasenko wrote:
> On Friday 21 September 2007 20:33, Krzysztof Oledzki wrote:
>> On Fri, 21 Sep 2007, Denys Vlasenko wrote:
>>> On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:
>>>> On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:
>>>>> I plan to use gzip compression on following drivers' firmware,
>>>>> if patches will be accepted:
>>>>>
>>>>>    text    data     bss     dec     hex filename
>>>>>   17653  109968     240  127861   1f375 drivers/net/acenic.o
>>>>>    6628  120448       4  127080   1f068 drivers/net/dgrs.o
>>>>>          ^^
>>>>
>>>> Should this be redone to use the existing firmware loading framework to
>>>> load the firmware instead?
>>>
>>> Not in every case.
>>>
>>> For example, bnx2 maintainer says that driver and
>>> firmware are closely tied for his driver. IOW: you upgrade kernel
>>> and your NIC is not working anymore.
>>
>> Firmware may come with a kernel. We have a "install modules", we can also
>> add "install firmware".
>
> Install where? I boot my machine over NFS, and it has no hard drive.

Special cases already fail when using distro-linked targets like "make
install."


Jeff




Re: [PATCH 1/1] x86: Convert cpuinfo_x86 array to a per_cpu array v2

2007-09-21 Thread Andrew Morton

On Thu, 20 Sep 2007 14:30:05 -0700
[EMAIL PROTECTED] wrote:

> cpu_data is currently an array defined using NR_CPUS. This means that
> we overallocate since we will rarely really use maximum configured cpus.
> When NR_CPU count is raised to 4096 the size of cpu_data becomes
> 3,145,728 bytes.

This has at least three quite obvious and careless compilation errors.

Please at least compile the code after you've altered it.

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-21 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 06:31:12PM -0400, Steven Rostedt wrote:
> On Fri, Sep 21, 2007 at 05:46:53PM +0200, Peter Zijlstra wrote:
> > On Fri, 21 Sep 2007 10:40:03 -0400 Steven Rostedt <[EMAIL PROTECTED]>
> > wrote:
> > 
> > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:
> > 
> > 
> > > Can you have a pointer somewhere that explains these states. And not a
> > > > "it's in this paper or directory". Either have a short description here,
> > > or specify where exactly to find the information (perhaps a
> > > Documentation/RCU/preemptible_states.txt?).
> > > 
> > > Trying to understand these states has caused me the most agony in
> > > reviewing these patches.
> > > 
> > > > + */
> > > > +
> > > > +enum rcu_try_flip_states {
> > > > +   rcu_try_flip_idle_state,/* "I" */
> > > > +   rcu_try_flip_waitack_state, /* "A" */
> > > > +   rcu_try_flip_waitzero_state,/* "Z" */
> > > > +   rcu_try_flip_waitmb_state   /* "M" */
> > > > +};
> > 
> > I thought the 4 flip states corresponded to the 4 GP stages, but now
> > you confused me. It seems to indeed progress one stage for every 4 flip
> > states.
> 
> I'm still confused ;-)

If you do a synchronize_rcu() it might well have to wait through the
following sequence of states:

Stage 0: (might have to wait through part of this to get out of "next" queue)
rcu_try_flip_idle_state,/* "I" */
rcu_try_flip_waitack_state, /* "A" */
rcu_try_flip_waitzero_state,/* "Z" */
rcu_try_flip_waitmb_state   /* "M" */
Stage 1:
rcu_try_flip_idle_state,/* "I" */
rcu_try_flip_waitack_state, /* "A" */
rcu_try_flip_waitzero_state,/* "Z" */
rcu_try_flip_waitmb_state   /* "M" */
Stage 2:
rcu_try_flip_idle_state,/* "I" */
rcu_try_flip_waitack_state, /* "A" */
rcu_try_flip_waitzero_state,/* "Z" */
rcu_try_flip_waitmb_state   /* "M" */
Stage 3:
rcu_try_flip_idle_state,/* "I" */
rcu_try_flip_waitack_state, /* "A" */
rcu_try_flip_waitzero_state,/* "Z" */
rcu_try_flip_waitmb_state   /* "M" */
Stage 4:
rcu_try_flip_idle_state,/* "I" */
rcu_try_flip_waitack_state, /* "A" */
rcu_try_flip_waitzero_state,/* "Z" */
rcu_try_flip_waitmb_state   /* "M" */

So yes, grace periods do indeed have some latency.
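
To make the latency concrete, here is a toy model that simply walks the I -> A -> Z -> M cycle once per stage and counts the transitions. The enum is copied from the quoted patch; `next_flip_state()` and `flip_steps_for_stages()` are hypothetical helpers, and the real state machine advances only when acknowledgements, counters and memory barriers allow it.

```c
#include <stddef.h>

enum rcu_try_flip_states {
	rcu_try_flip_idle_state,	/* "I" */
	rcu_try_flip_waitack_state,	/* "A" */
	rcu_try_flip_waitzero_state,	/* "Z" */
	rcu_try_flip_waitmb_state	/* "M" */
};

#define FLIP_STATES 4

/* Toy model: next state in the I -> A -> Z -> M -> I cycle. */
enum rcu_try_flip_states next_flip_state(enum rcu_try_flip_states s)
{
	return (enum rcu_try_flip_states)((s + 1) % FLIP_STATES);
}

/* Count the transitions needed to pass through `stages` full cycles,
 * starting from the idle state. */
unsigned int flip_steps_for_stages(unsigned int stages)
{
	enum rcu_try_flip_states s = rcu_try_flip_idle_state;
	unsigned int steps = 0, done = 0;

	while (done < stages) {
		s = next_flip_state(s);
		steps++;
		if (s == rcu_try_flip_idle_state)
			done++;	/* wrapped around: one stage finished */
	}
	return steps;
}
```

For the five stages listed above (0 through 4) this counts 20 transitions in the worst case, each of which may itself have to wait.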

> > Hmm, now I have to puzzle how these 4 stages are required by the lock
> > and unlock magic.
> > 
> > > > +/*
> > > > + * Return the number of RCU batches processed thus far.  Useful for 
> > > > debug
> > > > + * and statistics.  The _bh variant is identical to straight RCU.
> > > > + */
> > > 
> > > If they are identical, then why the separation?
> > 
> > I guess a smaller RCU domain makes for quicker grace periods.
> 
> No, I mean that both the rcu_batches_completed and
> rcu_batches_completed_bh are identical. Perhaps we can just put in a
> 
> #define rcu_batches_completed_bh rcu_batches_completed
> 
> in rcupreempt.h.  In rcuclassic, they are different. But no need to have
> two identical functions in the preempt version. A macro should do.

Ah!!!  Good point, #define does make sense here.
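
A minimal sketch of that suggestion, with a stand-in counter in place of the real RCU bookkeeping (`rcu_completed_count` is a placeholder, not a kernel variable):

```c
/* Placeholder for whatever the preemptible implementation counts. */
static long rcu_completed_count;

/* The one real function. */
long rcu_batches_completed(void)
{
	return rcu_completed_count;
}

/* In the preemptible variant the _bh flavor is the same thing, so an
 * alias avoids carrying a second identical function. */
#define rcu_batches_completed_bh rcu_batches_completed
```

Callers of rcu_batches_completed_bh() keep compiling unchanged, and taking the address of either name yields the same function.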

> > > > +void __rcu_read_lock(void)
> > > > +{
> > > > +   int idx;
> > > > +   struct task_struct *me = current;
> > > 
> > > Nitpick, but other places in the kernel usually use "t" or "p" as a
> > > > variable to assign current to.  It's just that "me" throws me off a
> > > little while reviewing this.  But this is just a nitpick, so do as you
> > > will.
> > 
> > struct task_struct *curr = current;
> > 
> > is also not uncommon.
> 
> True, but the "me" confused me. Since that task struct is not me ;-)

Well, who is it, then?  ;-)

> > > > +   int nesting;
> > > > +
> > > > +   nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting);
> > > > +   if (nesting != 0) {
> > > > +
> > > > +   /* An earlier rcu_read_lock() covers us, just count it. 
> > > > */
> > > > +
> > > > +   me->rcu_read_lock_nesting = nesting + 1;
> > > > +
> > > > +   } else {
> > > > +   unsigned long oldirq;
> > > 
> > > > +
> > > > +   /*
> > > > +* Disable local interrupts to prevent the grace-period
> > > > +* detection state machine from seeing us half-done.
> > > > +* NMIs can still occur, of course, and might themselves
> > > > +* contain rcu_read_lock().
> > > > +*/
> > > > +
> > > > +   local_irq_save(oldirq);
> > > 
> > > Isn't the GP detection done via a tasklet/softirq. So wouldn't a
> > > local_bh_disable be sufficient here? You already cover NMIs, which would
> > > also handle normal interrupts.
> > 
> > This is also my understanding, but I think this disable is an
> > 'optimization' in that it avoids the regular IRQs from jumping through
> > these hoops outlined

Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

2007-09-21 Thread Dave Jones

On Sat, Sep 22, 2007 at 12:32:02AM +0200, Andi Kleen wrote:


 > +  Select this for:
 > +Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
 > +-Willamette
 > +-Northwood
 > +-Mobile Pentium 4
 > +-Mobile Pentium 4 M
 > +-Extreme Edition (Gallatin)
 > +-Prescott
 > +-Prescott 2M
 > +-Cedar Mill
 > +-Presler
 > +-Smithfield
 > +Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
 > +-Foster
 > +-Prestonia
 > +-Gallatin
 > +-Nocona
 > +-Irwindale
 > +-Cranford
 > +-Potomac
 > +-Paxville
 > +-Dempsey

This seems like yet another list that will need to be perpetually
kept up to date, and given 99% of users don't know the codename
of their core, just the marketing name, I question its value.

 > +  more info: http://balusc.xs4all.nl/srv/har-cpu.html
 
This URL is dead already.

 >  config MPSC
 > bool "Intel P4 / older Netburst based Xeon"
 > help

sidenote: I always wondered what 'PSC' stood for ?

Dave

-- 
http://www.codemonkey.org.uk

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Alan Cox

> According to an earlier thread, dgrs was never really maintained, 
> written for hardware that was never really distributed widely, and very 
> likely hasn't had users in years... if ever.
> 
> If that picture is accurate (it's a story I was told), then I am 
> definitely queueing up a deletion patch.

I think that's sensible. If someone whines it can be put back but I really
don't think anyone will.

Re: [PATCH] [6/50] i386: clean up oops/bug reports

2007-09-21 Thread Chuck Ebbert

On 09/21/2007 06:32 PM, Andi Kleen wrote:
> From: Pavel Emelyanov <[EMAIL PROTECTED]>
> 
> Typically the oops first lines look like this:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 
>  printing eip:
> c049dfbd
> *pde = 
> Oops: 0002 [#1]
> PREEMPT SMP
> ...
> 
> Such output is gained with some ugly if (!nl) printk("\n"); code and
> besides being a waste of lines, this is also annoying to read. The
> following output looks better (and it is how it looks on x86_64):
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 
> printing eip: c049dfbd *pde = 
> Oops: 0002 [#1] PREEMPT SMP
> ...
> 
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Reviewed-by: Chuck Ebbert <[EMAIL PROTECTED]>

> 
> ---
> 
>  arch/i386/kernel/traps.c |   16 
>  arch/i386/mm/fault.c |   13 +++--
>  2 files changed, 11 insertions(+), 18 deletions(-)
> 
> Index: linux/arch/i386/kernel/traps.c
> ===
> --- linux.orig/arch/i386/kernel/traps.c
> +++ linux/arch/i386/kernel/traps.c
> @@ -444,31 +444,23 @@ void die(const char * str, struct pt_reg
>   local_save_flags(flags);
>  
>   if (++die.lock_owner_depth < 3) {
> - int nl = 0;
>   unsigned long esp;
>   unsigned short ss;
>  
>   report_bug(regs->eip, regs);
>  
> - printk(KERN_EMERG "%s: %04lx [#%d]\n", str, err & 0x, 
> ++die_counter);
> + printk(KERN_EMERG "%s: %04lx [#%d] ", str, err & 0x, 
> ++die_counter);
>  #ifdef CONFIG_PREEMPT
> - printk(KERN_EMERG "PREEMPT ");
> - nl = 1;
> + printk("PREEMPT ");
>  #endif
>  #ifdef CONFIG_SMP
> - if (!nl)
> - printk(KERN_EMERG);
>   printk("SMP ");
> - nl = 1;
>  #endif
>  #ifdef CONFIG_DEBUG_PAGEALLOC
> - if (!nl)
> - printk(KERN_EMERG);
>   printk("DEBUG_PAGEALLOC");
> - nl = 1;
>  #endif
> - if (nl)
> - printk("\n");
> + printk("\n");
> +
>   if (notify_die(DIE_OOPS, str, regs, err,
>   current->thread.trap_no, SIGSEGV) !=
>   NOTIFY_STOP) {
> Index: linux/arch/i386/mm/fault.c
> ===
> --- linux.orig/arch/i386/mm/fault.c
> +++ linux/arch/i386/mm/fault.c
> @@ -544,23 +544,22 @@ no_context:
>   printk(KERN_ALERT "BUG: unable to handle kernel paging"
>   " request");
>   printk(" at virtual address %08lx\n",address);
> - printk(KERN_ALERT " printing eip:\n");
> - printk("%08lx\n", regs->eip);
> + printk(KERN_ALERT "printing eip: %08lx ", regs->eip);
>  
>   page = read_cr3();
>   page = ((__typeof__(page) *) __va(page))[address >> 
> PGDIR_SHIFT];
>  #ifdef CONFIG_X86_PAE
> - printk(KERN_ALERT "*pdpt = %016Lx\n", page);
> + printk("*pdpt = %016Lx ", page);
>   if ((page >> PAGE_SHIFT) < max_low_pfn
>   && page & _PAGE_PRESENT) {
>   page &= PAGE_MASK;
>   page = ((__typeof__(page) *) __va(page))[(address >> 
> PMD_SHIFT)
>& 
> (PTRS_PER_PMD - 1)];
> - printk(KERN_ALERT "*pde = %016Lx\n", page);
> + printk(KERN_ALERT "*pde = %016Lx ", page);
>   page &= ~_PAGE_NX;
>   }
>  #else
> - printk(KERN_ALERT "*pde = %08lx\n", page);
> + printk("*pde = %08lx ", page);
>  #endif
>  
>   /*
> @@ -574,8 +573,10 @@ no_context:
>   page &= PAGE_MASK;
>   page = ((__typeof__(page) *) __va(page))[(address >> 
> PAGE_SHIFT)
>& 
> (PTRS_PER_PTE - 1)];
> - printk(KERN_ALERT "*pte = %0*Lx\n", sizeof(page)*2, 
> (u64)page);
> + printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
>   }
> +
> + printk("\n");
>   }
>  
>   tsk->thread.cr2 = address;

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko

On Friday 21 September 2007 20:33, Krzysztof Oledzki wrote:
> 
> On Fri, 21 Sep 2007, Denys Vlasenko wrote:
> 
> > On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:
> >> On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:
> >>
> >>> I plan to use gzip compression on following drivers' firmware,
> >>> if patches will be accepted:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>>   17653  109968     240  127861   1f375 drivers/net/acenic.o
> >>>    6628  120448       4  127080   1f068 drivers/net/dgrs.o
> >>>          ^^
> >>
> >> Should this be redone to use the existing firmware loading framework to
> >> load the firmware instead?
> >
> > Not in every case.
> >
> > For example, bnx2 maintainer says that driver and
> > firmware are closely tied for his driver. IOW: you upgrade kernel
> > and your NIC is not working anymore.
>
> Firmware may come with a kernel. We have a "install modules", we can also 
> add "install firmware".

Install where? I boot my machine over NFS, and it has no hard drive.

> > Another argument is to make kernel be able to bring up NICs
> > without needing firmware images in initramfs/initrd/hard drive.
> 
> It is not possible to bring up things like FC or WiFi without firmware, 
> what special is in classic NICs?

Nothing.

It is just not (yet?) decreed from The Very Top that all and every
firmware image should be loaded using request_firmware().

Also people may want to gzip something else than firmware.
--
vda

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko

On Friday 21 September 2007 21:13, Andi Kleen wrote:
> Denys Vlasenko <[EMAIL PROTECTED]> writes:
> > 
> > I plan to use gzip compression on following drivers' firmware,
> > if patches will be accepted:
> > 
> >    text    data     bss     dec     hex filename
> >   17653  109968     240  127861   1f375 drivers/net/acenic.o
> >    6628  120448       4  127080   1f068 drivers/net/dgrs.o
> >          ^^
> 
> Just change the makefiles to always install gzip'ed modules
> modutils knows how to unzip them on the fly.

But I compile net/* into bzImage. I like netbooting :)
--
vda

RE: Message codes (Re: [Announce] Linux-tiny project revival)

2007-09-21 Thread Gross, Mark



>-Original Message-
>From: Joe Perches [mailto:[EMAIL PROTECTED]
>Sent: Friday, September 21, 2007 3:33 PM
>To: Gross, Mark
>Cc: Rob Landley; Oleg Verych; Alexey Dobriyan; Michael Opdenacker;
linux-
>[EMAIL PROTECTED]; CE Linux Developers List; linux kernel
>Subject: RE: Message codes (Re: [Announce] Linux-tiny project revival)
>
>On Fri, 2007-09-21 at 15:12 -0700, Gross, Mark wrote:
>> Use compiler tricks to remove ALL the static printk string from
>> the kernel and replace the printk with something that outputs a
>> decimal index followed by tuples, of zero to N, hex-strings on
>
>> I proposed a mechanism for keeping all the printk data and saving
space
>> by doing some table based compressions that has the side effect of
>> making the syslog not human readable.  You proposed a mechanism for
>> no-oping out complete log-levels.
>
>How about compiler tricks to compress the static printk strings?
>These could be expanded at runtime to use as the format.

You would have to hold the text table (compressed) in memory to do this
at run time.  That would still be pretty large hunk of memory.  

>
>Timothy Miller suggested something similar awhile ago.

RE: Message codes (Re: [Announce] Linux-tiny project revival)

2007-09-21 Thread Joe Perches

On Fri, 2007-09-21 at 15:12 -0700, Gross, Mark wrote:
> Use compiler tricks to remove ALL the static printk string from
> the kernel and replace the printk with something that outputs a
> decimal index followed by tuples, of zero to N, hex-strings on

> I proposed a mechanism for keeping all the printk data and saving space
> by doing some table based compressions that has the side effect of
> making the syslog not human readable.  You proposed a mechanism for
> no-oping out complete log-levels.   

How about compiler tricks to compress the static printk strings?
These could be expanded at runtime to use as the format.

Timothy Miller suggested something similar awhile ago.
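
For reference, the "decimal index followed by hex strings" output quoted above could be produced by something like the following. This is a hedged sketch only: `encode_msg()` and its buffer-based interface are invented for the example, and the lookup table that maps an index back to its format string would live outside the kernel image.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical encoder: emits "<index> <hex> <hex> ..." instead of a
 * formatted string. Returns the number of characters written, or -1 if
 * the buffer is too small.
 */
int encode_msg(char *buf, size_t len, unsigned int index,
	       const unsigned long *args, size_t nargs)
{
	size_t used;
	int n = snprintf(buf, len, "%u", index);

	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;
	for (size_t i = 0; i < nargs; i++) {
		n = snprintf(buf + used, len - used, " %lx", args[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

A userspace tool holding the index-to-format table could then expand such a line back into readable text, which is the trade-off discussed in this thread: smaller kernel, unreadable raw log.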


Re: [PATCH] [20/45] x86_64: Use 8 byte stack alignment when possible

2007-09-21 Thread Andi Kleen

On Friday 21 September 2007 23:13, Dave Jones wrote:
> On Fri, Sep 21, 2007 at 10:45:02PM +0200, Andi Kleen wrote:
>  > Kernel doesn't use SSE2, so it doesn't need 16 byte alignment. Also
>  > the stack can be already unaligned so letting the compiler align
>  > is useless. This may make some stack frames smaller.
>  > Only works with very recent gcc 4.3
>
> My gcc 4.1.2 from Fedora 7 (with who knows what backported)
> references this in its manpage. How was it broken before 4.3 ?

Try it. It is rejected by the compiler in 64bit mode.

-Andi

[PATCH] [50/50] x86_64: Remove fpu io port resource

2007-09-21 Thread Andi Kleen


Not needed on modern systems without external FPU

TBD on i386 it is only needed for true 386s. Could remove it there
TBD for >= 486

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/setup.c |2 --
 1 file changed, 2 deletions(-)

Index: linux/arch/x86_64/kernel/setup.c
===
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -121,8 +121,6 @@ struct resource standard_io_resources[] 
.flags = IORESOURCE_BUSY | IORESOURCE_IO },
{ .name = "dma2", .start = 0xc0, .end = 0xdf,
.flags = IORESOURCE_BUSY | IORESOURCE_IO },
-   { .name = "fpu", .start = 0xf0, .end = 0xff,
-   .flags = IORESOURCE_BUSY | IORESOURCE_IO }
 };
 
 #define IORESOURCE_RAM (IORESOURCE_BUSY | IORESOURCE_MEM)

Re: [PATCH v2] pcmcia: Convert io_req_t to use kio_addr_t

2007-09-21 Thread Alan Cox

On Fri, 21 Sep 2007 17:15:16 -0500
Olof Johansson <[EMAIL PROTECTED]> wrote:

> Convert the io_req_t members to kio_addr_t, to allow use on machines with
> more than 16 bits worth of IO ports (i.e. secondary busses on ppc64, etc).

What about the formatting and field widths ?

ulong would probably be a lot saner than kio_addr_t and yet more type
obfuscation.

Otherwise looks sensible to me

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Jeff Garzik


Alan Cox wrote:
>> For example, bnx2 maintainer says that driver and
>> firmware are closely tied for his driver. IOW: you upgrade kernel
>> and your NIC is not working anymore.
>>
>> Another argument is to make kernel be able to bring up NICs
>> without needing firmware images in initramfs/initrd/hard drive.
>
> dgrs should be using the request_firmware interface. Actually dgrs is
> probably a good candidate for /dev/null


According to an earlier thread, dgrs was never really maintained, 
written for hardware that was never really distributed widely, and very 
likely hasn't had users in years... if ever.


If that picture is accurate (it's a story I was told), then I am 
definitely queueing up a deletion patch.


Jeff




[PATCH] [48/50] x86_64: return correct error code from child_rip in x86_64 entry.S

2007-09-21 Thread Andi Kleen


From: Andrey Mirkin <[EMAIL PROTECTED]>

Right now register edi is just cleared before calling do_exit.
That is wrong because correct return value will be ignored.
Value from rax should be copied to rdi instead of clearing edi.

AK: changed to 32bit move because it's strictly an int

Signed-off-by: Andrey Mirkin <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

-

---
 arch/x86_64/kernel/entry.S |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86_64/kernel/entry.S
===
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -989,7 +989,7 @@ child_rip:
movq %rsi, %rdi
call *%rax
# exit
-   xorl %edi, %edi
+   mov %eax, %edi
call do_exit
CFI_ENDPROC
 ENDPROC(child_rip)
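
The behavioral difference of the one-line change can be modeled in plain C. This is a userspace analogy only; `child_exit_code_old()`, `child_exit_code_new()` and `failing_thread()` are hypothetical names used to mirror the `xorl %edi, %edi` and `mov %eax, %edi` variants.

```c
#include <stddef.h>

typedef int (*thread_fn)(void *);

/* Old behavior: the exit code is always 0 (xorl %edi, %edi). */
int child_exit_code_old(thread_fn fn, void *arg)
{
	(void)fn(arg);		/* return value is dropped */
	return 0;
}

/* New behavior: forward the thread function's int return value
 * (mov %eax, %edi) so that it reaches do_exit(). */
int child_exit_code_new(thread_fn fn, void *arg)
{
	return fn(arg);
}

/* Example thread function that reports a failure. */
int failing_thread(void *unused)
{
	(void)unused;
	return -5;
}
```

With the old variant a failing thread function is indistinguishable from a successful one; with the new variant the -5 is passed on.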

[PATCH] [49/50] x86_64: Initialize 64bit registers for a.out executables

2007-09-21 Thread Andi Kleen


Previously the data from before the exec was kept in there. Zero
them instead.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/ia32/ia32_aout.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux/arch/x86_64/ia32/ia32_aout.c
===
--- linux.orig/arch/x86_64/ia32/ia32_aout.c
+++ linux/arch/x86_64/ia32/ia32_aout.c
@@ -422,6 +422,8 @@ beyond_if:
(regs)->eflags = 0x200;
(regs)->cs = __USER32_CS;
(regs)->ss = __USER32_DS;
+   regs->r8 = regs->r9 = regs->r10 = regs->r11 =
+   regs->r12 = regs->r13 = regs->r14 = regs->r15 = 0;
set_fs(USER_DS);
if (unlikely(current->ptrace & PT_PTRACED)) {
if (current->ptrace & PT_TRACE_EXEC)

[PATCH] [46/50] x86: also show non-zero IRQ counts for vectors that currently don't have a handler

2007-09-21 Thread Andi Kleen


From: "Jan Beulich" <[EMAIL PROTECTED]>
It doesn't seem to make sense to hide these, even if their counts
can't change at the point in time they're being displayed.

Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

 arch/i386/kernel/irq.c   |   18 ++
 arch/x86_64/kernel/irq.c |   18 ++
 2 files changed, 28 insertions(+), 8 deletions(-)

Index: linux/arch/i386/kernel/irq.c
===
--- linux.orig/arch/i386/kernel/irq.c
+++ linux/arch/i386/kernel/irq.c
@@ -259,9 +259,17 @@ int show_interrupts(struct seq_file *p, 
}
 
if (i < NR_IRQS) {
+   unsigned any_count = 0;
+
spin_lock_irqsave(&irq_desc[i].lock, flags);
+#ifndef CONFIG_SMP
+   any_count = kstat_irqs(i);
+#else
+   for_each_online_cpu(j)
+   any_count |= kstat_cpu(j).irqs[i];
+#endif
action = irq_desc[i].action;
-   if (!action)
+   if (!action && !any_count)
goto skip;
seq_printf(p, "%3d: ",i);
 #ifndef CONFIG_SMP
@@ -272,10 +280,12 @@ int show_interrupts(struct seq_file *p, 
 #endif
seq_printf(p, " %8s", irq_desc[i].chip->name);
seq_printf(p, "-%-8s", irq_desc[i].name);
-   seq_printf(p, "  %s", action->name);
 
-   for (action=action->next; action; action = action->next)
-   seq_printf(p, ", %s", action->name);
+   if (action) {
+   seq_printf(p, "  %s", action->name);
+   while ((action = action->next) != NULL)
+   seq_printf(p, ", %s", action->name);
+   }
 
seq_putc(p, '\n');
 skip:
Index: linux/arch/x86_64/kernel/irq.c
===
--- linux.orig/arch/x86_64/kernel/irq.c
+++ linux/arch/x86_64/kernel/irq.c
@@ -64,9 +64,17 @@ int show_interrupts(struct seq_file *p, 
}
 
if (i < NR_IRQS) {
+   unsigned any_count = 0;
+
spin_lock_irqsave(&irq_desc[i].lock, flags);
+#ifndef CONFIG_SMP
+   any_count = kstat_irqs(i);
+#else
+   for_each_online_cpu(j)
+   any_count |= kstat_cpu(j).irqs[i];
+#endif
action = irq_desc[i].action;
-   if (!action) 
+   if (!action && !any_count)
goto skip;
seq_printf(p, "%3d: ",i);
 #ifndef CONFIG_SMP
@@ -78,9 +86,11 @@ int show_interrupts(struct seq_file *p, 
seq_printf(p, " %8s", irq_desc[i].chip->name);
seq_printf(p, "-%-8s", irq_desc[i].name);
 
-   seq_printf(p, "  %s", action->name);
-   for (action=action->next; action; action = action->next)
-   seq_printf(p, ", %s", action->name);
+   if (action) {
+   seq_printf(p, "  %s", action->name);
+   while ((action = action->next) != NULL)
+   seq_printf(p, ", %s", action->name);
+   }
seq_putc(p, '\n');
 skip:
spin_unlock_irqrestore(&irq_desc[i].lock, flags);
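
The display rule this patch introduces can be restated as a small standalone function. It is a simplified model, assuming a flat array of per-CPU counters; `irq_line_shown()` is a made-up name, not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the rule: `counts` holds one counter per online
 * CPU for a single IRQ, `has_action` says whether a handler is currently
 * registered.
 */
bool irq_line_shown(bool has_action, const unsigned int *counts, size_t ncpus)
{
	unsigned int any_count = 0;

	/* OR-ing is enough: only zero versus non-zero matters. */
	for (size_t i = 0; i < ncpus; i++)
		any_count |= counts[i];

	return has_action || any_count != 0;
}
```

An IRQ with no handler and all-zero counts is still skipped; one with no handler but a non-zero count on any CPU is now listed.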

[PATCH] [47/50] i386: avoid temporarily inconsistent pte-s

2007-09-21 Thread Andi Kleen


From: "Jan Beulich" <[EMAIL PROTECTED]>
One more of these issues (which were considered fixed a few releases
back): Other than on x86-64, i386 allows set_fixmap() to replace
already present mappings. Consequently, on PAE, care must be taken to
not update the high half of a pte while the low half is still holding
the old value.

Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

 arch/i386/mm/pgtable.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux/arch/i386/mm/pgtable.c
===
--- linux.orig/arch/i386/mm/pgtable.c
+++ linux/arch/i386/mm/pgtable.c
@@ -97,8 +97,7 @@ static void set_pte_pfn(unsigned long va
}
pte = pte_offset_kernel(pmd, vaddr);
if (pgprot_val(flags))
-   /*  stored as-is, to permit clearing entries */
-   set_pte(pte, pfn_pte(pfn, flags));
+   set_pte_present(&init_mm, vaddr, pte, pfn_pte(pfn, flags));
else
pte_clear(&init_mm, vaddr, pte);
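
One way to avoid the inconsistent window described in the changelog is to make the entry non-present first, then write the high half, then the low half. The sketch below models that ordering with C11 atomics in userspace; `struct pae_pte` and both functions are hypothetical, and the ordering is an assumption about what such a helper needs to do rather than a copy of the kernel's set_pte_present().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical two-word PTE; bit 0 of `low` is the "present" bit. */
struct pae_pte {
	_Atomic uint32_t low;
	_Atomic uint32_t high;
};

/*
 * Replace a possibly-present entry without ever exposing a mix of the
 * old low half and the new high half: clear low (not present), write
 * high, then write the new low half last.
 */
void pae_set_pte_present(struct pae_pte *p, uint32_t new_low, uint32_t new_high)
{
	atomic_store_explicit(&p->low, 0, memory_order_release);
	atomic_store_explicit(&p->high, new_high, memory_order_release);
	atomic_store_explicit(&p->low, new_low, memory_order_release);
}

/* Read both halves back as one 64-bit value (single-threaded use only). */
uint64_t pae_read(struct pae_pte *p)
{
	uint64_t lo = atomic_load_explicit(&p->low, memory_order_acquire);
	uint64_t hi = atomic_load_explicit(&p->high, memory_order_acquire);

	return (hi << 32) | lo;
}
```

At every intermediate point the entry is either the old value, not present, or the new value, never a present entry with mismatched halves.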
 
