Re: [PATCH 1/13] timestamp fixes

2005-02-23 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> 1/13
> 

ugh, has this been tested? It needs the patch below.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -2704,11 +2704,11 @@ need_resched_nonpreemptible:
 
 	schedstat_inc(rq, sched_cnt);
 	now = sched_clock();
-	if (likely((long long)now - prev->timestamp < NS_MAX_SLEEP_AVG))
+	if (likely((long long)now - prev->timestamp < NS_MAX_SLEEP_AVG)) {
 		run_time = now - prev->timestamp;
 		if (unlikely((long long)now - prev->timestamp < 0))
 			run_time = 0;
-	else
+	} else
 		run_time = NS_MAX_SLEEP_AVG;
 
/*
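Without the braces, the `else` pairs with the nearest `if` (the negative-delta check), so any non-negative delta ends up overwritten with NS_MAX_SLEEP_AVG instead of the real run time. A minimal user-space sketch of the two forms, assuming signed 64-bit times and a made-up NS_MAX_SLEEP_AVG of 1000 purely for illustration (the function names are hypothetical):

```c
/* Made-up cap for illustration; not the kernel's NS_MAX_SLEEP_AVG value. */
#define NS_MAX_SLEEP_AVG 1000LL

/*
 * How the compiler reads the code before the fix: the outer "if" body
 * is a single statement and the "else" attaches to the nearest "if"
 * (the negative-delta check). Indented here to match the real parse.
 */
long long run_time_unbraced(long long now, long long timestamp)
{
	long long run_time = 0;

	if (now - timestamp < NS_MAX_SLEEP_AVG)
		run_time = now - timestamp;
	if (now - timestamp < 0)
		run_time = 0;
	else
		run_time = NS_MAX_SLEEP_AVG;	/* overwrites any normal delta */
	return run_time;
}

/* After the fix: the braces bind the "else" to the outer range check. */
long long run_time_braced(long long now, long long timestamp)
{
	long long run_time;

	if (now - timestamp < NS_MAX_SLEEP_AVG) {
		run_time = now - timestamp;
		if (now - timestamp < 0)
			run_time = 0;	/* clock went backwards */
	} else
		run_time = NS_MAX_SLEEP_AVG;
	return run_time;
}
```

With now = 1250 and timestamp = 1000, the unbraced form returns 1000 while the braced form returns the expected 250.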
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Non-DMA mode for floppy on PowerPC, new version

2005-02-23 Thread Pavel Fedin
Here is a cleaned-up version of my 2.6.8 kernel patch.

This patch allows the floppy drive to be used in non-DMA mode on PegasosPPC and
AmigaOne machines. To use it:
1. Do not build the floppy driver as a module; link it statically. Passing
parameters to it from insmod is still problematic; at least it doesn't work
properly on my system. Maybe I'll clean it up in the future.
2. Specify floppy=nodma in the kernel arguments. You'll also need to specify
your drive type there using floppy=<drive>,<type>,cmos. For
example, floppy=0,4,cmos specifies type 4 (1.44 MB 3.5") for drive 0 on my
system. The default drive type is 2.88 MB.
This patch does not affect operation of the driver in DMA mode, so it's safe to
use on any platform.

-- 
Best regards,
Pavel Fedin,
mailto:[EMAIL PROTECTED]
--- linux-2.6.8.1-10mdk/include/asm-ppc/floppy.h.orig	2004-08-14 06:55:10.0 -0400
+++ linux-2.6.8.1-10mdk/include/asm-ppc/floppy.h	2005-02-24 09:41:54.594830800 -0500
@@ -11,30 +11,163 @@
 #ifndef __ASM_PPC_FLOPPY_H
 #define __ASM_PPC_FLOPPY_H
 
+#include 
+
+#define CSW fd_routine[can_use_virtual_dma & 1]
+
 #define fd_inb(port)   inb_p(port)
 #define fd_outb(value,port)   outb_p(value,port)
 
-#define fd_enable_dma()   enable_dma(FLOPPY_DMA)
-#define fd_disable_dma()   disable_dma(FLOPPY_DMA)
-#define fd_request_dma()   request_dma(FLOPPY_DMA,"floppy")
-#define fd_free_dma()   free_dma(FLOPPY_DMA)
-#define fd_clear_dma_ff()   clear_dma_ff(FLOPPY_DMA)
-#define fd_set_dma_mode(mode)   set_dma_mode(FLOPPY_DMA,mode)
-#define fd_set_dma_addr(addr)   set_dma_addr(FLOPPY_DMA,(unsigned int)virt_to_bus(addr))
-#define fd_set_dma_count(count)   set_dma_count(FLOPPY_DMA,count)
+#define fd_disable_dma()   CSW._disable_dma(FLOPPY_DMA)
+#define fd_request_dma()   CSW._request_dma(FLOPPY_DMA,"floppy")
+#define fd_free_dma()   CSW._free_dma(FLOPPY_DMA)
+#define fd_get_dma_residue()   CSW._get_dma_residue(FLOPPY_DMA)
+#define fd_dma_mem_alloc(size)   CSW._dma_mem_alloc(size)
+#define fd_dma_setup(addr, size, mode, io) CSW._dma_setup(addr, size, mode, io)
 #define fd_enable_irq()   enable_irq(FLOPPY_IRQ)
 #define fd_disable_irq()   disable_irq(FLOPPY_IRQ)
-#define fd_cacheflush(addr,size) /* nothing */
-#define fd_request_irq()request_irq(FLOPPY_IRQ, floppy_interrupt, \
-   SA_INTERRUPT|SA_SAMPLE_RANDOM, \
-   "floppy", NULL)
 #define fd_free_irq()   free_irq(FLOPPY_IRQ, NULL);
 
-__inline__ void virtual_dma_init(void)
+static int virtual_dma_count;
+static int virtual_dma_residue;
+static char *virtual_dma_addr;
+static int virtual_dma_mode;
+static int doing_pdma;
+
+static irqreturn_t floppy_hardint(int irq, void *dev_id, struct pt_regs * regs)
+{
+   unsigned char st;
+
+   if (!doing_pdma)
+   return floppy_interrupt(irq, dev_id, regs);
+
+   {
+   int lcount;
+   char *lptr;
+
+   st = 1;
+   for (lcount=virtual_dma_count, lptr=virtual_dma_addr; 
+   lcount; lcount--, lptr++) {
+   st=inb(virtual_dma_port+4) & 0xa0 ;
+   if (st != 0xa0) 
+   break;
+   if (virtual_dma_mode)
+   outb_p(*lptr, virtual_dma_port+5);
+   else
+   *lptr = inb_p(virtual_dma_port+5);
+   }
+   virtual_dma_count = lcount;
+   virtual_dma_addr = lptr;
+   st = inb(virtual_dma_port+4);
+   }
+
+   if (st == 0x20)
+   return IRQ_HANDLED;
+   if (!(st & 0x20)) {
+   virtual_dma_residue += virtual_dma_count;
+   virtual_dma_count=0;
+   doing_pdma = 0;
+   floppy_interrupt(irq, dev_id, regs);
+   return IRQ_HANDLED;
+   }
+   return IRQ_HANDLED;
+}
+
+static void vdma_disable_dma(unsigned int dummy)
 {
-   /* Nothing to do on PowerPC */
+   doing_pdma = 0;
+   virtual_dma_residue += virtual_dma_count;
+   virtual_dma_count=0;
 }
 
+static int vdma_request_dma(unsigned int dmanr, const char * device_id)
+{
+   return 0;
+}
+
+static void vdma_nop(unsigned int dummy)
+{
+}
+
+
+static int vdma_get_dma_residue(unsigned int dummy)
+{
+   return virtual_dma_count + virtual_dma_residue;
+}
+
+
+static int fd_request_irq(void)
+{
+   if (can_use_virtual_dma)
+   return request_irq(FLOPPY_IRQ, floppy_hardint,SA_INTERRUPT,
+  "floppy", NULL);
+   else
+   return request_irq(FLOPPY_IRQ, floppy_interrupt,
+  SA_INTERRUPT|SA_SAMPLE_RANDOM,
+  "floppy", NULL);
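In non-DMA mode the patch replaces the DMA engine with a polling loop in floppy_hardint(): while the controller's Main Status Register (base port + 4) shows both RQM (0x80, ready for a byte) and NDMA (0x20, non-DMA execution phase), it moves one byte through the data register (base port + 5) and counts down virtual_dma_count. Below is a minimal user-space sketch of the read direction of that loop. The fake_fdc structure, fake_read_msr() and pdma_read() are hypothetical stand-ins for real port I/O, so this is an illustration under those assumptions, not driver code.

```c
#include <stddef.h>

/* FDC Main Status Register bits tested by floppy_hardint(). */
#define MSR_RQM  0x80	/* data register ready for a transfer */
#define MSR_NDMA 0x20	/* execution phase in non-DMA mode */

/* Hypothetical simulated controller: scripted status reads and data. */
struct fake_fdc {
	const unsigned char *status;	/* successive MSR values */
	size_t nstatus;
	size_t next_status;
	const unsigned char *data;	/* bytes returned by the data register */
	size_t next_data;
};

static unsigned char fake_read_msr(struct fake_fdc *fdc)
{
	/* Past the end of the script, report "not ready". */
	if (fdc->next_status >= fdc->nstatus)
		return 0;
	return fdc->status[fdc->next_status++];
}

/*
 * Read direction of the pseudo-DMA loop: copy one byte per poll while
 * RQM and NDMA are both set, and return how many bytes are still owed,
 * as floppy_hardint() does with virtual_dma_count.
 */
int pdma_read(struct fake_fdc *fdc, char *buf, int count)
{
	while (count) {
		unsigned char st = fake_read_msr(fdc) & (MSR_RQM | MSR_NDMA);

		if (st != (MSR_RQM | MSR_NDMA))
			break;	/* controller left the data phase */
		*buf++ = (char)fdc->data[fdc->next_data++];
		count--;
	}
	return count;
}
```

With scripted status reads of 0xe0, 0xe0, 0xd0 and a 4-byte request, the loop copies two bytes and returns 2. The value 0xe0 also has DIO (0x40) set, which the 0xa0 mask ignores, just as in the patch.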

netdev-2.6, wireless-2.6 queues updated

2005-02-23 Thread Jeff Garzik
See attached changelog.  I'm too slack to post a patch tonight.
Please do a

bk pull bk://gkernel.bkbits.net/netdev-2.6

This will update the following files:

 drivers/net/bagetlance.c| 1368 -
 include/linux/dp83840.h |   41 
 Documentation/networking/bonding.txt| 2101 +
 Documentation/networking/e100.txt   |3 
 Documentation/networking/ixgb.txt   |9 
 MAINTAINERS |7 
 arch/arm/mach-pxa/lubbock.c |2 
 arch/arm/mach-sa1100/neponset.c |2 
 drivers/net/3c503.c |   67 
 drivers/net/3c509.c |4 
 drivers/net/3c515.c |   32 
 drivers/net/3c527.c |2 
 drivers/net/3c59x.c |2 
 drivers/net/8139cp.c|  100 
 drivers/net/8139too.c   |  293 -
 drivers/net/Kconfig |   59 
 drivers/net/Makefile|2 
 drivers/net/Space.c |   11 
 drivers/net/amd8111e.c  |6 
 drivers/net/arcnet/arc-rawmode.c|4 
 drivers/net/arcnet/arc-rimi.c   |   14 
 drivers/net/arcnet/arcnet.c |   30 
 drivers/net/arcnet/com20020.c   |6 
 drivers/net/arcnet/com90io.c|4 
 drivers/net/arcnet/com90xx.c|8 
 drivers/net/arcnet/rfc1051.c|8 
 drivers/net/arcnet/rfc1201.c|   12 
 drivers/net/au1000_eth.c| 1361 -
 drivers/net/au1000_eth.h|   55 
 drivers/net/b44.c   |2 
 drivers/net/b44.h   |   14 
 drivers/net/bonding/bond_3ad.c  |2 
 drivers/net/bonding/bond_3ad.h  |1 
 drivers/net/bonding/bond_alb.c  |   12 
 drivers/net/bonding/bond_main.c |   35 
 drivers/net/cs89x0.c|4 
 drivers/net/depca.c |4 
 drivers/net/dgrs.c  |6 
 drivers/net/e100.c  |4 
 drivers/net/e1000/e1000.h   |3 
 drivers/net/e1000/e1000_ethtool.c   |   11 
 drivers/net/e1000/e1000_hw.c|   86 
 drivers/net/e1000/e1000_hw.h|   11 
 drivers/net/e1000/e1000_main.c  |  249 -
 drivers/net/eepro100.c  |   17 
 drivers/net/epic100.c   |2 
 drivers/net/es3210.c|   32 
 drivers/net/ethertap.c  |4 
 drivers/net/ewrk3.c |   87 
 drivers/net/fealnx.c|  275 -
 drivers/net/hamradio/baycom_epp.c   |   53 
 drivers/net/hamradio/baycom_par.c   |8 
 drivers/net/hamradio/baycom_ser_fdx.c   |7 
 drivers/net/hamradio/baycom_ser_hdx.c   |7 
 drivers/net/hamradio/bpqether.c |   17 
 drivers/net/hamradio/dmascc.c   | 2073 
 drivers/net/hamradio/hdlcdrv.c  |   48 
 drivers/net/hamradio/mkiss.c|   12 
 drivers/net/hamradio/yam.c  |   38 
 drivers/net/ibm_emac/ibm_emac.h |4 
 drivers/net/ibmlana.c   |   99 
 drivers/net/ibmlana.h   |1 
 drivers/net/ioc3-eth.c  |   83 
 drivers/net/irda/act200l-sir.c  |3 
 drivers/net/irda/donauboe.c |2 
 drivers/net/irda/irtty-sir.c|4 
 drivers/net/irda/ma600-sir.c|   12 
 drivers/net/irda/sir_dev.c  |4 
 drivers/net/irda/tekram-sir.c   |3 
 drivers/net/ixgb/ixgb.h |3 
 drivers/net/ixgb/ixgb_ee.c  |   16 
 drivers/net/ixgb/ixgb_ee.h  |3 
 drivers/net/ixgb/ixgb_ethtool.c |5 
 drivers/net/ixgb/ixgb_hw.c  |2 
 drivers/net/ixgb/ixgb_hw.h  |2 
 drivers/net/ixgb/ixgb_ids.h |2 
 drivers/net/ixgb/ixgb_main.c|   73 
 drivers/net/ixgb/ixgb_osdep.h   |2 
 drivers/net/ixgb/ixgb_param.c   |2 
 drivers/net/jazzsonic.c |  217 
 drivers/net/loopback.c  |2 
 drivers/net/lp486e.c|8 
 drivers/net/meth.c  |  275 -
 drivers/net/meth.h  |2 
 

Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-23 Thread Guillaume Thouvenin
On Wed, 2005-02-23 at 11:11 -0800, Jay Lan wrote:
> Guillaume Thouvenin wrote:
> > It's what I'm proposing. The problem is to be alerted when a new process
> > is created in order to add it in the correct group of processes if the
> > parent belongs to one (or several) groups. The notification can be done
> > with the fork connector patch. 
> 
> I am not quite comfortable with ELSA requesting a fork hook this way.
> How many hooks in the stock kernel are related to accounting? Can
> anyone answer this question? I know of 'acct_process()' in exit.c used
> by the BSD accounting and ELSA is requesting a hook in fork. If people
> raise the same question again a few years later, how many people will
> still remember this ELSA hook?

  The fork connector is not related to accounting. It's a connector that
allows information to be sent to a user-space application when a fork
occurs in the kernel.

  This information is used by ELSA, but I think that this hook will be
used by other user-space applications too and, IMHO, it's not
incompatible with a specific hook for accounting tools if needed.

Guillaume
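The grouping step described above (a new process joins every group its parent already belongs to) can be sketched in user space, assuming the notifications arrive as (parent, child) PID pairs from something like the fork connector. The pgroup structure, its fixed capacity and on_fork_event() are hypothetical illustrations, not ELSA's actual code.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_MEMBERS 64	/* arbitrary capacity for the sketch */

/* Hypothetical process group as an accounting tool might track it. */
struct pgroup {
	pid_t members[MAX_MEMBERS];
	size_t nmembers;
};

static int group_contains(const struct pgroup *g, pid_t pid)
{
	size_t i;

	for (i = 0; i < g->nmembers; i++)
		if (g->members[i] == pid)
			return 1;
	return 0;
}

/*
 * Handle one fork notification (parent, child): the child joins every
 * group that already contains its parent. Returns how many groups the
 * child was added to; full groups are skipped.
 */
int on_fork_event(struct pgroup *groups, size_t ngroups,
		  pid_t parent, pid_t child)
{
	size_t i;
	int added = 0;

	for (i = 0; i < ngroups; i++) {
		struct pgroup *g = &groups[i];

		if (!group_contains(g, parent) || g->nmembers == MAX_MEMBERS)
			continue;
		g->members[g->nmembers++] = child;
		added++;
	}
	return added;
}
```

Keeping the membership logic in user space leaves the kernel side as a plain fork notification, which is not tied to accounting.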



[PATCH 13/13] basic tuning

2005-02-23 Thread Nick Piggin
13/13

Do some basic initial tuning.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/asm-x86_64/topology.h
===
--- linux-2.6.orig/include/asm-x86_64/topology.h	2005-02-24 17:39:07.615911131 +1100
+++ linux-2.6/include/asm-x86_64/topology.h	2005-02-24 17:39:07.990864853 +1100
@@ -52,12 +52,11 @@
 	.cache_nice_tries	= 2,			\
 	.busy_idx		= 3,			\
 	.idle_idx		= 2,			\
-	.newidle_idx		= 1, 			\
+	.newidle_idx		= 0, 			\
 	.wake_idx		= 1,			\
 	.forkexec_idx		= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
-| SD_BALANCE_NEWIDLE	\
 | SD_BALANCE_FORK	\
 | SD_BALANCE_EXEC	\
 | SD_WAKE_BALANCE,	\
Index: linux-2.6/include/linux/topology.h
===
--- linux-2.6.orig/include/linux/topology.h	2005-02-24 17:39:07.616911007 +1100
+++ linux-2.6/include/linux/topology.h	2005-02-24 17:39:07.991864730 +1100
@@ -118,15 +118,14 @@
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.busy_idx		= 2,			\
-	.idle_idx		= 0,			\
-	.newidle_idx		= 1,			\
+	.idle_idx		= 1,			\
+	.newidle_idx		= 2,			\
 	.wake_idx		= 1,			\
-	.forkexec_idx		= 0,			\
+	.forkexec_idx		= 1,			\
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_NEWIDLE	\
 | SD_BALANCE_EXEC	\
-| SD_WAKE_AFFINE	\
-| SD_WAKE_BALANCE,	\
+| SD_WAKE_AFFINE,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 	.nr_balance_failed	= 0,			\


Re: [8/14] Orinoco driver updates - PCMCIA initialization cleanups

2005-02-23 Thread Jeff Garzik
Dominik Brodowski wrote:
@@ -184,6 +186,7 @@
dev_list = link;
client_reg.dev_info = _info;
+   client_reg.Attributes = INFO_IO_CLIENT | INFO_CARD_SHARE;

That's not needed any longer for 2.6.
So who wants to send the incremental update patch?  :)
Jeff



[PATCH 11/13] sched-domains aware balance-on-fork

2005-02-23 Thread Nick Piggin
11/13

Reimplement the balance on exec balancing to be sched-domains aware. Use
this to also do balance on fork balancing. Make x86_64 do balance on fork
over the NUMA domain.

The problem with the non-sched-domains-aware balancing became apparent on
dual-core, multi-socket Opterons. What we want is for new tasks to be
sent to a different socket, but more often than not we would first load
up our sibling core, or fill two cores of a single remote socket, before
selecting a new one.

This gives large improvements to STREAM on such systems.
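The group choice made by find_idlest_group() in the patch below can be sketched in user space. Assumptions: each group's load has already been summed (the patch sums source_load() for the local group and target_load() for remote ones, biasing toward staying local), SCHED_LOAD_SCALE is taken as 128 for illustration, imbalance_pct is at least 100 and cpu_power is non-zero. group_load and pick_idlest_group() are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

#define SCHED_LOAD_SCALE 128UL	/* assumed value, for illustration */

/* Hypothetical flattened view of one sched_group. */
struct group_load {
	unsigned long load_sum;		/* summed per-CPU load estimates */
	unsigned long cpu_power;	/* relative CPU power of the group */
	int is_local;			/* group contains this_cpu */
};

/*
 * Return the index of the least-loaded remote group, or -1 to stay in
 * the local group when it is within half of imbalance_pct of that
 * group, mirroring the NULL return in find_idlest_group().
 */
int pick_idlest_group(const struct group_load *g, size_t n,
		      unsigned int imbalance_pct)
{
	unsigned long min_load = ULONG_MAX, this_load = 0;
	unsigned int imbalance = 100 + (imbalance_pct - 100) / 2;
	int idlest = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Normalise by relative CPU power, as the patch does. */
		unsigned long avg = g[i].load_sum * SCHED_LOAD_SCALE / g[i].cpu_power;

		if (g[i].is_local) {
			this_load = avg;
		} else if (avg < min_load) {
			min_load = avg;
			idlest = (int)i;
		}
	}
	/* Stay local unless the local group is noticeably busier. */
	if (idlest < 0 || 100 * this_load < imbalance * min_load)
		return -1;
	return idlest;
}
```

With imbalance_pct = 125 the tolerance is 112%, so a local group at 256 moves work to an idle remote group, while two groups both at 128 leave the task where it is.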

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>


Index: linux-2.6/include/asm-x86_64/topology.h
===
--- linux-2.6.orig/include/asm-x86_64/topology.h	2005-02-24 17:39:07.320947536 +1100
+++ linux-2.6/include/asm-x86_64/topology.h	2005-02-24 17:43:37.077660523 +1100
@@ -54,9 +54,11 @@
 	.idle_idx		= 2,			\
 	.newidle_idx		= 1, 			\
 	.wake_idx		= 1,			\
+	.forkexec_idx		= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_NEWIDLE	\
+| SD_BALANCE_FORK	\
 | SD_BALANCE_EXEC	\
 | SD_WAKE_BALANCE,	\
 	.last_balance		= jiffies,		\
Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h	2005-02-24 17:39:06.806011090 +1100
+++ linux-2.6/include/linux/sched.h	2005-02-24 17:43:37.274636222 +1100
@@ -423,10 +423,11 @@
 #define SD_LOAD_BALANCE		1	/* Do load balancing on this domain. */
 #define SD_BALANCE_NEWIDLE	2	/* Balance when about to become idle */
 #define SD_BALANCE_EXEC		4	/* Balance on exec */
-#define SD_WAKE_IDLE		8	/* Wake to idle CPU on task wakeup */
-#define SD_WAKE_AFFINE		16	/* Wake task to waking CPU */
-#define SD_WAKE_BALANCE		32	/* Perform balancing at task wakeup */
-#define SD_SHARE_CPUPOWER	64	/* Domain members share cpu power */
+#define SD_BALANCE_FORK		8	/* Balance on fork, clone */
+#define SD_WAKE_IDLE		16	/* Wake to idle CPU on task wakeup */
+#define SD_WAKE_AFFINE		32	/* Wake task to waking CPU */
+#define SD_WAKE_BALANCE		64	/* Perform balancing at task wakeup */
+#define SD_SHARE_CPUPOWER	128	/* Domain members share cpu power */
 
 struct sched_group {
 	struct sched_group *next;	/* Must be a circular list */
@@ -455,6 +456,7 @@
 	unsigned int idle_idx;
 	unsigned int newidle_idx;
 	unsigned int wake_idx;
+	unsigned int forkexec_idx;
 	int flags;			/* See SD_* */
 
 	/* Runtime fields. */
Index: linux-2.6/include/linux/topology.h
===
--- linux-2.6.orig/include/linux/topology.h	2005-02-24 17:39:07.320947536 +1100
+++ linux-2.6/include/linux/topology.h	2005-02-24 17:43:37.078660399 +1100
@@ -90,6 +90,7 @@
 	.idle_idx		= 0,			\
 	.newidle_idx		= 0,			\
 	.wake_idx		= 0,			\
+	.forkexec_idx		= 0,			\
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_NEWIDLE	\
 | SD_BALANCE_EXEC	\
@@ -120,6 +121,7 @@
 	.idle_idx		= 0,			\
 	.newidle_idx		= 1,			\
 	.wake_idx		= 1,			\
+	.forkexec_idx		= 0,			\
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_NEWIDLE	\
 | SD_BALANCE_EXEC	\
Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:39:07.322947289 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:37.274636222 +1100
@@ -891,6 +891,79 @@
 	return max(rq->cpu_load[type-1], load_now);
 }
 
+/*
+ * find_idlest_group finds and returns the least busy CPU group within the
+ * domain.
+ */
+static struct sched_group *
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
+{
+	struct sched_group *idlest = NULL, *this = NULL, *group = sd->groups;
+	unsigned long min_load = ULONG_MAX, this_load = 0;
+	int load_idx = sd->forkexec_idx;
+	int imbalance = 100 + (sd->imbalance_pct-100)/2;
+
+	do {
+		unsigned long load, avg_load;
+		int local_group;
+		int i;
+
+		local_group = cpu_isset(this_cpu, group->cpumask);
+		/* XXX: put a cpus allowed check */
+
+		/* Tally up the load of all CPUs in the group */
+		avg_load = 0;
+
+		for_each_cpu_mask(i, group->cpumask) {
+			/* Bias balancing toward cpus of our domain */
+			if (local_group)
+				load = source_load(i, load_idx);
+			else
+				load = target_load(i, load_idx);
+
+			avg_load += load;
+		}
+
+		/* Adjust by relative CPU power of the group */
+		avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power;
+
+		if (local_group) {
+			this_load = avg_load;
+			this = group;
+		} else if (avg_load < min_load) {
+			min_load = avg_load;
+			idlest = group;
+		}
+		group = group->next;
+	} while (group != sd->groups);
+
+	if (!idlest || 100*this_load < imbalance*min_load)
+		return NULL;
+	return idlest;
+}
+
+/*
+ * find_idlest_cpu - find the idlest cpu among the cpus in group.
+ */
+static int find_idlest_cpu(struct sched_group *group, int this_cpu)
+{
+	unsigned long load, min_load = ULONG_MAX;
+	int idlest = -1;
+	int i;
+
+	

[PATCH 10/13] remove aggressive idle balancing

2005-02-23 Thread Nick Piggin
10/13
Remove the very aggressive idle stuff that recently went into
2.6; it goes against the direction we are trying to take. Hopefully
we can regain the performance through other methods.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/asm-i386/topology.h
===
--- linux-2.6.orig/include/asm-i386/topology.h	2005-02-24 17:39:06.805011214 +1100
+++ linux-2.6/include/asm-i386/topology.h	2005-02-24 17:39:07.320947536 +1100
@@ -85,7 +85,6 @@
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_EXEC	\
 | SD_BALANCE_NEWIDLE	\
-| SD_WAKE_IDLE		\
 | SD_WAKE_BALANCE,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
Index: linux-2.6/include/asm-x86_64/topology.h
===
--- linux-2.6.orig/include/asm-x86_64/topology.h	2005-02-24 17:39:06.805011214 +1100
+++ linux-2.6/include/asm-x86_64/topology.h	2005-02-24 17:43:37.503607973 +1100
@@ -58,7 +58,6 @@
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_NEWIDLE	\
 | SD_BALANCE_EXEC	\
-| SD_WAKE_IDLE		\
 | SD_WAKE_BALANCE,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
Index: linux-2.6/include/linux/topology.h
===
--- linux-2.6.orig/include/linux/topology.h	2005-02-24 17:39:06.806011090 +1100
+++ linux-2.6/include/linux/topology.h	2005-02-24 17:43:37.503607973 +1100
@@ -124,7 +124,6 @@
 | SD_BALANCE_NEWIDLE	\
 | SD_BALANCE_EXEC	\
 | SD_WAKE_AFFINE	\
-| SD_WAKE_IDLE		\
 | SD_WAKE_BALANCE,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:39:07.057979992 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:37.504607850 +1100
@@ -412,22 +412,6 @@
 	return rq;
 }
 
-#ifdef CONFIG_SCHED_SMT
-static int cpu_and_siblings_are_idle(int cpu)
-{
-	int sib;
-	for_each_cpu_mask(sib, cpu_sibling_map[cpu]) {
-		if (idle_cpu(sib))
-			continue;
-		return 0;
-	}
-
-	return 1;
-}
-#else
-#define cpu_and_siblings_are_idle(A) idle_cpu(A)
-#endif
-
 #ifdef CONFIG_SCHEDSTATS
 /*
  * Called when a process is dequeued from the active array and given
@@ -1650,16 +1634,15 @@
 
 	/*
 	 * Aggressive migration if:
-	 * 1) the [whole] cpu is idle, or
+	 * 1) task is cache cold, or
 	 * 2) too many balance attempts have failed.
 	 */
 
-	if (cpu_and_siblings_are_idle(this_cpu) || \
-			sd->nr_balance_failed > sd->cache_nice_tries)
+	if (sd->nr_balance_failed > sd->cache_nice_tries)
 		return 1;
 
 	if (task_hot(p, rq->timestamp_last_tick, sd))
-			return 0;
+		return 0;
 	return 1;
 }
 
@@ -2131,7 +2114,7 @@
 				if (cpu_isset(cpu, visited_cpus))
 					continue;
 				cpu_set(cpu, visited_cpus);
-				if (!cpu_and_siblings_are_idle(cpu) || cpu == busiest_cpu)
+				if (cpu == busiest_cpu)
 					continue;
 
 				target_rq = cpu_rq(cpu);
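The "Aggressive migration if" hunk above changes the tail of the migration check (can_migrate_task() in sched.c). A minimal sketch of that tail before and after, assuming the earlier checks (task running, cpus_allowed) have already passed; migrate_ctx and both function names are hypothetical, and task_hot stands in for the result of the kernel's task_hot() test.

```c
#include <stdbool.h>

/* Hypothetical inputs: only the fields the decision reads. */
struct migrate_ctx {
	bool core_idle;			/* this CPU and all SMT siblings idle */
	bool task_hot;			/* result of the task_hot() cache test */
	unsigned int nr_balance_failed;
	unsigned int cache_nice_tries;
};

/* Before 10/13: an idle core could pull even cache-hot tasks. */
bool can_migrate_before(const struct migrate_ctx *c)
{
	if (c->core_idle || c->nr_balance_failed > c->cache_nice_tries)
		return true;
	return !c->task_hot;
}

/* After 10/13: only repeated balance failures override cache hotness. */
bool can_migrate_after(const struct migrate_ctx *c)
{
	if (c->nr_balance_failed > c->cache_nice_tries)
		return true;
	return !c->task_hot;
}
```

For a cache-hot task and an idle core with no failed attempts, the old check migrates and the new one does not.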


[PATCH 12/13] schedstats additions for sched-balance-fork

2005-02-23 Thread Nick Piggin
12/13

Add SCHEDSTAT statistics for sched-balance-fork.
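The fork-balancing bookkeeping added by the sched.c changes in this patch can be sketched as follows. Every attempt bumps sbf_cnt; "balanced" means no better CPU was found; "pushed" means the child was placed elsewhere. When the chosen CPU is not in the child's cpus_allowed, neither counter moves, so sbf_cnt can exceed sbf_balanced + sbf_pushed (the exec path has no such check). fork_stats and account_fork_balance() are hypothetical names.

```c
/* Hypothetical counters mirroring the new sbf_* fields. */
struct fork_stats {
	unsigned long cnt, balanced, pushed;
};

/*
 * One fork-balance attempt. found_group and new_cpu stand for the
 * results of find_idlest_group() and find_idlest_cpu(). Returns the
 * CPU the child ends up on.
 */
int account_fork_balance(struct fork_stats *st, int cpu, int found_group,
			 int new_cpu, int new_cpu_allowed)
{
	st->cnt++;
	if (!found_group || new_cpu == -1 || new_cpu == cpu) {
		st->balanced++;
		return cpu;
	}
	if (new_cpu_allowed) {
		st->pushed++;
		return new_cpu;
	}
	/* Disallowed target: neither counter moves and the child stays. */
	return cpu;
}
```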

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>


Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h	2005-02-24 17:39:07.616911007 +1100
+++ linux-2.6/include/linux/sched.h	2005-02-24 17:39:07.819885956 +1100
@@ -480,10 +480,16 @@
 	unsigned long alb_failed;
 	unsigned long alb_pushed;
 
-	/* sched_balance_exec() stats */
-	unsigned long sbe_attempts;
+	/* SD_BALANCE_EXEC stats */
+	unsigned long sbe_cnt;
+	unsigned long sbe_balanced;
 	unsigned long sbe_pushed;
 
+	/* SD_BALANCE_FORK stats */
+	unsigned long sbf_cnt;
+	unsigned long sbf_balanced;
+	unsigned long sbf_pushed;
+
 	/* try_to_wake_up() stats */
 	unsigned long ttwu_wake_remote;
 	unsigned long ttwu_move_affine;
Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:39:07.618910761 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:36.887683960 +1100
@@ -307,7 +307,7 @@
  * bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 11
+#define SCHEDSTAT_VERSION 12
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -354,9 +354,10 @@
 sd->lb_nobusyq[itype],
 sd->lb_nobusyg[itype]);
 			}
-			seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu\n",
+			seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu\n",
 			sd->alb_cnt, sd->alb_failed, sd->alb_pushed,
-			sd->sbe_pushed, sd->sbe_attempts,
+			sd->sbe_cnt, sd->sbe_balanced, sd->sbe_pushed,
+			sd->sbf_cnt, sd->sbf_balanced, sd->sbf_pushed,
 			sd->ttwu_wake_remote, sd->ttwu_move_affine, sd->ttwu_move_balance);
 		}
 #endif
@@ -1262,24 +1263,34 @@
 			sd = tmp;
 
 	if (sd) {
+		int new_cpu;
 		struct sched_group *group;
 
+		schedstat_inc(sd, sbf_cnt);
 		cpu = task_cpu(p);
 		group = find_idlest_group(sd, p, cpu);
-		if (group) {
-			int new_cpu;
-			new_cpu = find_idlest_cpu(group, cpu);
-			if (new_cpu != -1 && new_cpu != cpu &&
-					cpu_isset(new_cpu, p->cpus_allowed)) {
-				set_task_cpu(p, new_cpu);
-				task_rq_unlock(rq, );
-				rq = task_rq_lock(p, );
-				cpu = task_cpu(p);
-			}
+		if (!group) {
+			schedstat_inc(sd, sbf_balanced);
+			goto no_forkbalance;
+		}
+
+		new_cpu = find_idlest_cpu(group, cpu);
+		if (new_cpu == -1 || new_cpu == cpu) {
+			schedstat_inc(sd, sbf_balanced);
+			goto no_forkbalance;
+		}
+
+		if (cpu_isset(new_cpu, p->cpus_allowed)) {
+			schedstat_inc(sd, sbf_pushed);
+			set_task_cpu(p, new_cpu);
+			task_rq_unlock(rq, );
+			rq = task_rq_lock(p, );
+			cpu = task_cpu(p);
 		}
 	}
-#endif
 
+no_forkbalance:
+#endif
 	/*
 	 * We decrease the sleep average of forking parents
 	 * and children as well, to keep max-interactive tasks
@@ -1616,30 +1627,28 @@
 	struct sched_domain *tmp, *sd = NULL;
 	int new_cpu, this_cpu = get_cpu();
 
-	/* Prefer the current CPU if there's only this task running */
-	if (this_rq()->nr_running <= 1)
-		goto out;
-
 	for_each_domain(this_cpu, tmp)
 		if (tmp->flags & SD_BALANCE_EXEC)
 			sd = tmp;
 
 	if (sd) {
 		struct sched_group *group;
-		schedstat_inc(sd, sbe_attempts);
+		schedstat_inc(sd, sbe_cnt);
 		group = find_idlest_group(sd, current, this_cpu);
-		if (!group)
+		if (!group) {
+			schedstat_inc(sd, sbe_balanced);
 			goto out;
+		}
 		new_cpu = find_idlest_cpu(group, this_cpu);
-		if (new_cpu == -1)
+		if (new_cpu == -1 || new_cpu == this_cpu) {
+			schedstat_inc(sd, sbe_balanced);
 			goto out;
-
-		if (new_cpu != this_cpu) {
-			schedstat_inc(sd, sbe_pushed);
-			put_cpu();
-			sched_migrate_task(current, new_cpu);
-			return;
 		}
+
+		schedstat_inc(sd, sbe_pushed);
+		put_cpu();
+		sched_migrate_task(current, new_cpu);
+		return;
 	}
 out:
 	put_cpu();


[PATCH 8/13] generalised CPU load averaging

2005-02-23 Thread Nick Piggin
8/13

Do CPU load averaging over a number of different intervals. Allow
each interval to be chosen by passing a parameter to source_load
and target_load. 0 is instantaneous; idx > 0 returns a decaying average
with the most recent sample weighted at 2^(idx-1). The maximum idx is 3
(this could easily be increased).

So generally a higher number will result in more conservative balancing.
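A user-space sketch of how source_load() and target_load() in the patch below use the averages. The per-tick update that fills cpu_load[] is not in this excerpt, so update_cpu_load() here is an assumption: it gives the newest sample weight 1/2^(idx-1), which is one reading of the description above. fake_rq and SCHED_LOAD_SCALE = 128 are also illustrative assumptions.

```c
#define SCHED_LOAD_SCALE 128UL	/* assumed value, for illustration */

/* Hypothetical stand-in for the runqueue fields this patch touches. */
struct fake_rq {
	unsigned long nr_running;
	unsigned long cpu_load[3];	/* averages for idx 1..3 */
};

/*
 * Assumed per-tick update: cpu_load[i] gives the newest sample weight
 * 1/2^i, so idx 1 is the last tick's load, idx 2 a 1:1 blend and
 * idx 3 keeps 3/4 of the history.
 */
void update_cpu_load(struct fake_rq *rq)
{
	unsigned long this_load = rq->nr_running * SCHED_LOAD_SCALE;
	int i;

	for (i = 0; i < 3; i++) {
		unsigned long scale = 1UL << i;

		rq->cpu_load[i] = (rq->cpu_load[i] * (scale - 1) + this_load) / scale;
	}
}

/* Low guess for a migration source: min of history and now. */
unsigned long source_load(const struct fake_rq *rq, int type)
{
	unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE;

	if (type == 0)
		return load_now;
	return rq->cpu_load[type - 1] < load_now ? rq->cpu_load[type - 1] : load_now;
}

/* High guess for a migration target: max of history and now. */
unsigned long target_load(const struct fake_rq *rq, int type)
{
	unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE;

	if (type == 0)
		return load_now;
	return rq->cpu_load[type - 1] > load_now ? rq->cpu_load[type - 1] : load_now;
}
```

A runqueue that just went from empty to two tasks reports 256 at idx 0 or 1 but only 64 as a source at idx 3, while still looking like 256 as a target. That gap is why higher indices balance more conservatively.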

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/asm-i386/topology.h
===
--- linux-2.6.orig/include/asm-i386/topology.h	2005-02-24 17:31:22.664322588 +1100
+++ linux-2.6/include/asm-i386/topology.h	2005-02-24 17:43:37.733579601 +1100
@@ -77,6 +77,10 @@
 	.imbalance_pct		= 125,			\
 	.cache_hot_time		= (10*100),		\
 	.cache_nice_tries	= 1,			\
+	.busy_idx		= 3,			\
+	.idle_idx		= 1,			\
+	.newidle_idx		= 2,			\
+	.wake_idx		= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_EXEC	\
Index: linux-2.6/include/asm-x86_64/topology.h
===
--- linux-2.6.orig/include/asm-x86_64/topology.h	2005-02-24 17:31:22.664322588 +1100
+++ linux-2.6/include/asm-x86_64/topology.h	2005-02-24 17:43:37.733579601 +1100
@@ -49,7 +49,11 @@
 	.busy_factor		= 32,			\
 	.imbalance_pct		= 125,			\
 	.cache_hot_time		= (10*100),		\
-	.cache_nice_tries	= 1,			\
+	.cache_nice_tries	= 2,			\
+	.busy_idx		= 3,			\
+	.idle_idx		= 2,			\
+	.newidle_idx		= 1, 			\
+	.wake_idx		= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_NEWIDLE	\
Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h	2005-02-24 17:31:28.428610071 +1100
+++ linux-2.6/include/linux/sched.h	2005-02-24 17:43:37.503607973 +1100
@@ -451,6 +451,10 @@
 	unsigned long long cache_hot_time; /* Task considered cache hot (ns) */
 	unsigned int cache_nice_tries;	/* Leave cache hot tasks for # tries */
 	unsigned int per_cpu_gain;	/* CPU % gained by adding domain cpus */
+	unsigned int busy_idx;
+	unsigned int idle_idx;
+	unsigned int newidle_idx;
+	unsigned int wake_idx;
 	int flags;			/* See SD_* */
 
 	/* Runtime fields. */
Index: linux-2.6/include/linux/topology.h
===
--- linux-2.6.orig/include/linux/topology.h	2005-02-24 17:31:22.665322464 +1100
+++ linux-2.6/include/linux/topology.h	2005-02-24 17:43:37.733579601 +1100
@@ -86,6 +86,10 @@
 	.cache_hot_time		= 0,			\
 	.cache_nice_tries	= 0,			\
 	.per_cpu_gain		= 25,			\
+	.busy_idx		= 0,			\
+	.idle_idx		= 0,			\
+	.newidle_idx		= 0,			\
+	.wake_idx		= 0,			\
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_NEWIDLE	\
 | SD_BALANCE_EXEC	\
@@ -112,6 +116,10 @@
 	.cache_hot_time		= (5*100/2),	\
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
+	.busy_idx		= 2,			\
+	.idle_idx		= 0,			\
+	.newidle_idx		= 1,			\
+	.wake_idx		= 1,			\
 	.flags			= SD_LOAD_BALANCE	\
 | SD_BALANCE_NEWIDLE	\
 | SD_BALANCE_EXEC	\
Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:39:06.530045151 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:37.913557397 +1100
@@ -204,7 +204,7 @@
 	 */
 	unsigned long nr_running;
 #ifdef CONFIG_SMP
-	unsigned long cpu_load;
+	unsigned long cpu_load[3];
 #endif
 	unsigned long long nr_switches;
 
@@ -884,23 +884,27 @@
  * We want to under-estimate the load of migration sources, to
  * balance conservatively.
  */
-static inline unsigned long source_load(int cpu)
+static inline unsigned long source_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
 	unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE;
+	if (type == 0)
+		return load_now;
 
-	return min(rq->cpu_load, load_now);
+	return min(rq->cpu_load[type-1], load_now);
 }
 
 /*
  * Return a high guess at the load of a migration-target cpu
  */
-static inline unsigned long target_load(int cpu)
+static inline unsigned long target_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
 	unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE;
+	if (type == 0)
+		return load_now;
 
-	return max(rq->cpu_load, load_now);
+	return max(rq->cpu_load[type-1], load_now);
 }
 
 #endif
@@ -965,7 +969,7 @@
 	runqueue_t *rq;
 #ifdef CONFIG_SMP
 	unsigned long load, this_load;
-	struct sched_domain *sd;
+	struct sched_domain *sd, *this_sd = NULL;
 	int new_cpu;
 #endif
 
@@ -984,72 +988,64 @@
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
 
-#ifdef CONFIG_SCHEDSTATS
+	new_cpu = cpu;
+
 	schedstat_inc(rq, ttwu_cnt);
 	if (cpu == this_cpu) {
 		schedstat_inc(rq, ttwu_local);
-	} else {
-		for_each_domain(this_cpu, sd) {
-			if (cpu_isset(cpu, sd->span)) {
-				schedstat_inc(sd, ttwu_wake_remote);
-				break;
-			}
+		goto out_set_cpu;
+	}
+
+	for_each_domain(this_cpu, sd) {
+		if 

[PATCH 9/13] less affine wakeups

2005-02-23 Thread Nick Piggin
9/13

Do fewer affine wakeups. We're trying to reduce dbt2-pgsql idle time
regressions here... make sure we don't move tasks the wrong way
in an imbalance condition. Also, remove the cache coldness requirement
from the calculation - this seems to induce sharp cutoff points where
behaviour will suddenly change on some workloads if the load creeps
slightly over or under some point. The requirement is good for periodic
balancing because in that case we otherwise have no context to determine
which task to move.

But also make a minor tweak to "wake balancing" - the imbalance
tolerance is now set at half the domain's imbalance, so we get the
opportunity to do wake balancing before the more random periodic
rebalancing gets performed.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:39:06.808010844 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:37.734579478 +1100
@@ -1014,38 +1014,45 @@
 		int idx = this_sd->wake_idx;
 		unsigned int imbalance;
 
+		imbalance = 100 + (this_sd->imbalance_pct - 100) / 2;
+
 		load = source_load(cpu, idx);
 		this_load = target_load(this_cpu, idx);
 
-		/*
-		 * If sync wakeup then subtract the (maximum possible) effect of
-		 * the currently running task from the load of the current CPU:
-		 */
-		if (sync)
-			this_load -= SCHED_LOAD_SCALE;
-
-		 /* Don't pull the task off an idle CPU to a busy one */
-		if (load < SCHED_LOAD_SCALE/2 && this_load > SCHED_LOAD_SCALE/2)
-			goto out_set_cpu;
-
 		new_cpu = this_cpu; /* Wake to this CPU if we can */
 
-		if ((this_sd->flags & SD_WAKE_AFFINE) &&
-			!task_hot(p, rq->timestamp_last_tick, this_sd)) {
-			/*
-			 * This domain has SD_WAKE_AFFINE and p is cache cold
-			 * in this domain.
-			 */
-			schedstat_inc(this_sd, ttwu_move_affine);
-			goto out_set_cpu;
-		} else if ((this_sd->flags & SD_WAKE_BALANCE) &&
-				imbalance*this_load <= 100*load) {
+		if (this_sd->flags & SD_WAKE_AFFINE) {
+			unsigned long tl = this_load;
 			/*
-			 * This domain has SD_WAKE_BALANCE and there is
-			 * an imbalance.
+			 * If sync wakeup then subtract the (maximum possible)
+			 * effect of the currently running task from the load
+			 * of the current CPU:
 			 */
-			schedstat_inc(this_sd, ttwu_move_balance);
-			goto out_set_cpu;
+			if (sync)
+				tl -= SCHED_LOAD_SCALE;
+
+			if ((tl <= load &&
+				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
+				100*(tl + SCHED_LOAD_SCALE) <= imbalance*load) {
+				/*
+				 * This domain has SD_WAKE_AFFINE and
+				 * p is cache cold in this domain, and
+				 * there is no bad imbalance.
+				 */
+				schedstat_inc(this_sd, ttwu_move_affine);
+				goto out_set_cpu;
+			}
+		}
+
+		/*
+		 * Start passive balancing when half the imbalance_pct
+		 * limit is reached.
+		 */
+		if (this_sd->flags & SD_WAKE_BALANCE) {
+			if (imbalance*this_load <= 100*load) {
+				schedstat_inc(this_sd, ttwu_move_balance);
+				goto out_set_cpu;
+			}
 		}
 	}
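The new affine-wakeup and wake-balancing tests in the patch above can be read as two predicates. A user-space sketch, assuming the loads are passed in already computed (in the patch they come from source_load()/target_load() with this_sd->wake_idx) and an illustrative SCHED_LOAD_SCALE of 128; wake_affine_ok() and wake_balance_ok() are hypothetical names.

```c
#define SCHED_LOAD_SCALE 128UL	/* assumed value, for illustration */

/*
 * Affine test: this_load is this CPU's target_load(), load and target
 * are the wakee CPU's source_load() and target_load(). Pull the task
 * if this CPU is no busier and both CPUs together hold at most one
 * task's worth of load, or if this CPU stays within half the
 * imbalance_pct tolerance after taking the task.
 */
int wake_affine_ok(unsigned long this_load, unsigned long load,
		   unsigned long target, unsigned int imbalance_pct, int sync)
{
	unsigned int imbalance = 100 + (imbalance_pct - 100) / 2;
	unsigned long tl = this_load;

	/* Sync wakeup: discount the waker. Guarded in this sketch to avoid
	 * unsigned wrap-around; the patch subtracts unconditionally. */
	if (sync && tl >= SCHED_LOAD_SCALE)
		tl -= SCHED_LOAD_SCALE;

	return (tl <= load && tl + target <= SCHED_LOAD_SCALE) ||
	       100 * (tl + SCHED_LOAD_SCALE) <= imbalance * load;
}

/* Passive wake balancing starts at half the imbalance_pct limit. */
int wake_balance_ok(unsigned long this_load, unsigned long load,
		    unsigned int imbalance_pct)
{
	unsigned int imbalance = 100 + (imbalance_pct - 100) / 2;

	return imbalance * this_load <= 100 * load;
}
```

With imbalance_pct = 125 the halved tolerance is 112, so wake balancing fires at this_load = 100, load = 128 (11200 <= 12800) but not at this_load = 128.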
 


Re: [RFC] PCI bridge driver rewrite

2005-02-23 Thread Jon Smirl
When you start writing the PCI root bridge driver, you'll run into the
AGP drivers that are already attached to the bridge. I was surprised
by this since I expected AGP to be attached to the AGP bridge, but I
have now learned that it is a root bridge function.

An ISA LPC bridge driver would be nice too. It would let you turn off
serial ports, etc., and let other systems know how many ports there are.
No real need for this, just a nice toy.

Does this work to cause a probe based on PCI class?
static struct pci_device_id p2p_id_tbl[] = {
   { PCI_DEVICE_CLASS(PCI_CLASS_BRIDGE_PCI << 8, 0x00) },
   { 0 },
};
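As far as I understand the PCI core, the class part of an ID match is a masked comparison, `!((id->class ^ dev->class) & id->class_mask)` in pci_match_one_device(), and PCI_DEVICE_CLASS(class, mask) fills in .class and .class_mask with wildcard vendor and device IDs. If that is right, a mask of 0x00 as in the table above matches every device, not just PCI-to-PCI bridges. A minimal user-space sketch of that comparison; pci_class_matches() is a hypothetical helper, and the class constants are the values from pci_ids.h:

```c
#include <stdint.h>

/* Class codes as defined in the kernel's pci_ids.h. */
#define PCI_CLASS_BRIDGE_PCI	0x0604
#define PCI_CLASS_DISPLAY_VGA	0x0300

/*
 * Class test of a PCI ID match: the device matches when every bit
 * selected by class_mask agrees. dev_class is the 24-bit
 * class/subclass/prog-if value.
 */
int pci_class_matches(uint32_t id_class, uint32_t id_class_mask,
		      uint32_t dev_class)
{
	return ((id_class ^ dev_class) & id_class_mask) == 0;
}
```

A mask such as 0xffff00 compares class and subclass but ignores the programming interface, so a subtractive-decode bridge (0x060401) still matches while a VGA device (0x030000) does not.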

I would like to install a driver that gets called whenever new
CLASS_VGA hardware shows up via hotplug. It won't attach to the
device; it will just add some sysfs attributes. The framebuffer
drivers need to attach to the device. If I add attributes this way, how
can I remove them?

-- 
Jon Smirl
[EMAIL PROTECTED]


[PATCH 7/13] better active balancing heuristic

2005-02-23 Thread Nick Piggin
7/13

Fix up active load balancing a bit so it doesn't get called when it shouldn't.
Reset the nr_balance_failed counter at more points where we have found
conditions to be balanced. This reduces the overly aggressive active
balancing seen on some workloads.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>


Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:39:05.851128944 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:38.162526682 +1100
@@ -2009,6 +2009,7 @@
 
 	schedstat_inc(sd, lb_balanced[idle]);
 
+	sd->nr_balance_failed = 0;
 	/* tune up the balancing interval */
 	if (sd->balance_interval < sd->max_interval)
 		sd->balance_interval *= 2;
@@ -2034,16 +2035,14 @@
 	schedstat_inc(sd, lb_cnt[NEWLY_IDLE]);
 	group = find_busiest_group(sd, this_cpu, &imbalance, NEWLY_IDLE);
 	if (!group) {
-		schedstat_inc(sd, lb_balanced[NEWLY_IDLE]);
 		schedstat_inc(sd, lb_nobusyg[NEWLY_IDLE]);
-		goto out;
+		goto out_balanced;
 	}
 
 	busiest = find_busiest_queue(group);
 	if (!busiest || busiest == this_rq) {
-		schedstat_inc(sd, lb_balanced[NEWLY_IDLE]);
 		schedstat_inc(sd, lb_nobusyq[NEWLY_IDLE]);
-		goto out;
+		goto out_balanced;
 	}
 
 	/* Attempt to move tasks */
@@ -2054,11 +2053,16 @@
 			imbalance, sd, NEWLY_IDLE, &all_pinned);
 	if (!nr_moved)
 		schedstat_inc(sd, lb_failed[NEWLY_IDLE]);
+	else
+		sd->nr_balance_failed = 0;
 
 	spin_unlock(&busiest->lock);
-
-out:
 	return nr_moved;
+
+out_balanced:
+	schedstat_inc(sd, lb_balanced[NEWLY_IDLE]);
+	sd->nr_balance_failed = 0;
+	return 0;
 }
 
 /*


[PATCH 5/13] find_busiest_group cleanup

2005-02-23 Thread Nick Piggin
5/13
Clean up find_busiest_group a bit. With the new sched-domains code
we can't have groups without a CPU.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>


Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:31:29.298502546 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:38.629469074 +1100
@@ -1771,7 +1771,7 @@
 	do {
 		unsigned long load;
 		int local_group;
-		int i, nr_cpus = 0;
+		int i;
 
 		local_group = cpu_isset(this_cpu, group->cpumask);
 
@@ -1785,13 +1785,9 @@
 			else
 				load = source_load(i);
 
-			nr_cpus++;
 			avg_load += load;
 		}
 
-		if (!nr_cpus)
-			goto nextgroup;
-
 		total_load += avg_load;
 		total_pwr += group->cpu_power;
 


[PATCH 4/13] find_busiest_group fixlets

2005-02-23 Thread Nick Piggin
4/13

Fix up a few small warts in the periodic multiprocessor rebalancing
code.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:31:28.431609701 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:38.806447240 +1100
@@ -1830,13 +1830,12 @@
 	 * by pulling tasks to us.  Be careful of negative numbers as they'll
 	 * appear as very large values with unsigned longs.
 	 */
-	*imbalance = min(max_load - avg_load, avg_load - this_load);
-
 	/* How much load to actually move to equalise the imbalance */
-	*imbalance = (*imbalance * min(busiest->cpu_power, this->cpu_power))
-				/ SCHED_LOAD_SCALE;
+	*imbalance = min((max_load - avg_load) * busiest->cpu_power,
+				(avg_load - this_load) * this->cpu_power)
+			/ SCHED_LOAD_SCALE;
 
-	if (*imbalance < SCHED_LOAD_SCALE - 1) {
+	if (*imbalance < SCHED_LOAD_SCALE) {
 		unsigned long pwr_now = 0, pwr_move = 0;
 		unsigned long tmp;
 
@@ -1862,14 +1861,16 @@
 			max_load - tmp);
 
 		/* Amount of load we'd add */
-		tmp = SCHED_LOAD_SCALE*SCHED_LOAD_SCALE/this->cpu_power;
-		if (max_load < tmp)
-			tmp = max_load;
+		if (max_load*busiest->cpu_power <
+				SCHED_LOAD_SCALE*SCHED_LOAD_SCALE)
+			tmp = max_load*busiest->cpu_power/this->cpu_power;
+		else
+			tmp = SCHED_LOAD_SCALE*SCHED_LOAD_SCALE/this->cpu_power;
 		pwr_move += this->cpu_power*min(SCHED_LOAD_SCALE, this_load + tmp);
 		pwr_move /= SCHED_LOAD_SCALE;
 
-		/* Move if we gain another 8th of a CPU worth of throughput */
-		if (pwr_move < pwr_now + SCHED_LOAD_SCALE / 8)
+		/* Move if we gain throughput */
+		if (pwr_move <= pwr_now)
 			goto out_balanced;
 
 		*imbalance = 1;
@@ -1877,7 +1878,7 @@
 	}
 
 	/* Get rid of the scaling factor, rounding down as we divide */
-	*imbalance = (*imbalance + 1) / SCHED_LOAD_SCALE;
+	*imbalance = *imbalance / SCHED_LOAD_SCALE;
 
 	return busiest;
 

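To see what the reworked imbalance formula changes, the sketch below evaluates the old and new expressions side by side. It is a standalone model, not kernel code: `SCHED_LOAD_SCALE` is assumed to be 128, loads are in the per-cpu_power units `find_busiest_group()` works with, the small-imbalance path (`*imbalance < SCHED_LOAD_SCALE`) is left out, and the function names are hypothetical. The old code took the smaller raw gap and scaled it by the smaller cpu_power; the new code weights each gap by its own group's cpu_power before taking the minimum, and no longer rounds up when removing the scale factor.

```c
#include <assert.h>

#define SCHED_LOAD_SCALE 128UL	/* assumed value */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Old formula: min of the raw gaps, scaled by the smaller cpu_power. */
static unsigned long imbalance_old(unsigned long max_load,
				   unsigned long avg_load,
				   unsigned long this_load,
				   unsigned long busiest_pwr,
				   unsigned long this_pwr)
{
	unsigned long imb = min_ul(max_load - avg_load, avg_load - this_load);

	imb = imb * min_ul(busiest_pwr, this_pwr) / SCHED_LOAD_SCALE;
	return (imb + 1) / SCHED_LOAD_SCALE;
}

/* New formula: weight each gap by its own group's cpu_power first. */
static unsigned long imbalance_new(unsigned long max_load,
				   unsigned long avg_load,
				   unsigned long this_load,
				   unsigned long busiest_pwr,
				   unsigned long this_pwr)
{
	unsigned long imb;

	assert(this_load <= avg_load && avg_load <= max_load);
	imb = min_ul((max_load - avg_load) * busiest_pwr,
		     (avg_load - this_load) * this_pwr) / SCHED_LOAD_SCALE;
	return imb / SCHED_LOAD_SCALE;
}
```

For `max_load = 384, avg_load = 256, this_load = 0` with a busiest group of twice this group's cpu_power (256 against 128), the busiest group's excess works out to two tasks' worth and this group has room for two; the new formula reports 2, while the old one, scaling by the smaller power, reports 1.
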

[PATCH 3/13] rework schedstats

2005-02-23 Thread Nick Piggin
3/13

I have an updated userspace parser for this thing, if you
are still keeping it on your website.

Move balancing fields into struct sched_domain, so we can get more
useful results on systems with multiple domains (e.g. SMT+SMP, CMP+NUMA,
SMP+NUMA, etc.).

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h	2005-02-24 17:31:24.598083557 +1100
+++ linux-2.6/include/linux/sched.h	2005-02-24 17:43:38.161526805 +1100
@@ -462,17 +462,26 @@
 	/* load_balance() stats */
 	unsigned long lb_cnt[MAX_IDLE_TYPES];
 	unsigned long lb_failed[MAX_IDLE_TYPES];
+	unsigned long lb_balanced[MAX_IDLE_TYPES];
 	unsigned long lb_imbalance[MAX_IDLE_TYPES];
+	unsigned long lb_gained[MAX_IDLE_TYPES];
+	unsigned long lb_hot_gained[MAX_IDLE_TYPES];
 	unsigned long lb_nobusyg[MAX_IDLE_TYPES];
 	unsigned long lb_nobusyq[MAX_IDLE_TYPES];
 
+	/* Active load balancing */
+	unsigned long alb_cnt;
+	unsigned long alb_failed;
+	unsigned long alb_pushed;
+
 	/* sched_balance_exec() stats */
 	unsigned long sbe_attempts;
 	unsigned long sbe_pushed;
 
 	/* try_to_wake_up() stats */
-	unsigned long ttwu_wake_affine;
-	unsigned long ttwu_wake_balance;
+	unsigned long ttwu_wake_remote;
+	unsigned long ttwu_move_affine;
+	unsigned long ttwu_move_balance;
 #endif
 };
 
Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:31:27.503724395 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:38.983425407 +1100
@@ -246,35 +246,13 @@
 	unsigned long yld_cnt;
 
 	/* schedule() stats */
-	unsigned long sched_noswitch;
 	unsigned long sched_switch;
 	unsigned long sched_cnt;
 	unsigned long sched_goidle;
 
-	/* pull_task() stats */
-	unsigned long pt_gained[MAX_IDLE_TYPES];
-	unsigned long pt_lost[MAX_IDLE_TYPES];
-
-	/* active_load_balance() stats */
-	unsigned long alb_cnt;
-	unsigned long alb_lost;
-	unsigned long alb_gained;
-	unsigned long alb_failed;
-
 	/* try_to_wake_up() stats */
 	unsigned long ttwu_cnt;
-	unsigned long ttwu_attempts;
-	unsigned long ttwu_moved;
-
-	/* wake_up_new_task() stats */
-	unsigned long wunt_cnt;
-	unsigned long wunt_moved;
-
-	/* sched_migrate_task() stats */
-	unsigned long smt_cnt;
-
-	/* sched_balance_exec() stats */
-	unsigned long sbe_cnt;
+	unsigned long ttwu_local;
 #endif
 };
 
@@ -329,7 +307,7 @@
  * bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 10
+#define SCHEDSTAT_VERSION 11
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -347,22 +325,14 @@
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		"cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu "
-		"%lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
+		"cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
 		cpu, rq->yld_both_empty,
-		rq->yld_act_empty, rq->yld_exp_empty,
-		rq->yld_cnt, rq->sched_noswitch,
+		rq->yld_act_empty, rq->yld_exp_empty, rq->yld_cnt,
 		rq->sched_switch, rq->sched_cnt, rq->sched_goidle,
-		rq->alb_cnt, rq->alb_gained, rq->alb_lost,
-		rq->alb_failed,
-		rq->ttwu_cnt, rq->ttwu_moved, rq->ttwu_attempts,
-		rq->wunt_cnt, rq->wunt_moved,
-		rq->smt_cnt, rq->sbe_cnt, rq->rq_sched_info.cpu_time,
+		rq->ttwu_cnt, rq->ttwu_local,
+		rq->rq_sched_info.cpu_time,
 		rq->rq_sched_info.run_delay, rq->rq_sched_info.pcnt);
 
-		for (itype = SCHED_IDLE; itype < MAX_IDLE_TYPES; itype++)
-			seq_printf(seq, " %lu %lu", rq->pt_gained[itype],
-		rq->pt_lost[itype]);
 		seq_printf(seq, "\n");
 
 #ifdef CONFIG_SMP
@@ -373,17 +343,21 @@
 			cpumask_scnprintf(mask_str, NR_CPUS, sd->span);
 			seq_printf(seq, "domain%d %s", dcnt++, mask_str);
 			for (itype = SCHED_IDLE; itype < MAX_IDLE_TYPES;
-		itype++) {
-				seq_printf(seq, " %lu %lu %lu %lu %lu",
+	itype++) {
+				seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu",
 				    sd->lb_cnt[itype],
+				    sd->lb_balanced[itype],
 				    sd->lb_failed[itype],
 				    sd->lb_imbalance[itype],
+				    sd->lb_gained[itype],
+				    sd->lb_hot_gained[itype],
 				    sd->lb_nobusyq[itype],
 				    sd->lb_nobusyg[itype]);
 			}
-			seq_printf(seq, " %lu %lu %lu %lu\n",
+			seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu\n",
+			sd->alb_cnt, sd->alb_failed, sd->alb_pushed,
 			sd->sbe_pushed, sd->sbe_attempts,
-			sd->ttwu_wake_affine, sd->ttwu_wake_balance);
+			sd->ttwu_wake_remote, sd->ttwu_move_affine, sd->ttwu_move_balance);
 		}
 #endif
 	}
@@ -996,7 +970,6 @@
 #endif
 
 	rq = task_rq_lock(p, &flags);
-	schedstat_inc(rq, ttwu_cnt);
 	old_state = p->state;
 	if (!(old_state & state))
 		goto out;
@@ -1011,8 +984,21 @@
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
 
-	new_cpu = cpu;
+#ifdef CONFIG_SCHEDSTATS
+	schedstat_inc(rq, ttwu_cnt);

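A consumer of the version 11 format has to know the field order that the `seq_printf()` calls in the rework schedstats patch above produce for each `domainN` line: per idle type `lb_cnt, lb_balanced, lb_failed, lb_imbalance, lb_gained, lb_hot_gained, lb_nobusyq, lb_nobusyg`, then `alb_cnt, alb_failed, alb_pushed, sbe_pushed, sbe_attempts, ttwu_wake_remote, ttwu_move_affine, ttwu_move_balance`. The sketch below is a hypothetical userspace parser for such a line, not Nick's parser; it assumes `MAX_IDLE_TYPES` is 3 and that the truncated remainder of the patch leaves the domain line as shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IDLE_TYPES	3	/* assumed: SCHED_IDLE, NOT_IDLE, NEWLY_IDLE */
#define LB_FIELDS	8	/* per idle type, in the order printed above */
#define TAIL_FIELDS	8	/* alb_*, sbe_* and ttwu_* counters */
#define DOMAIN_FIELDS	(MAX_IDLE_TYPES * LB_FIELDS + TAIL_FIELDS)

struct domain_stats {
	int domain;
	unsigned long v[DOMAIN_FIELDS];
};

/*
 * Parse one "domainN <cpumask> ..." line.  Returns 0 on success, -1 if
 * the line does not have exactly the expected number of counters.
 */
static int parse_domain_line(const char *line, struct domain_stats *ds)
{
	const char *p;
	char *end;
	int i;

	assert(line && ds);
	if (strncmp(line, "domain", 6) != 0)
		return -1;
	ds->domain = (int)strtol(line + 6, &end, 10);
	if (end == line + 6 || *end != ' ')
		return -1;

	/* skip the cpumask, which is printed as a single hex token */
	p = strchr(end + 1, ' ');
	if (!p)
		return -1;

	for (i = 0; i < DOMAIN_FIELDS; i++) {
		unsigned long val = strtoul(p, &end, 10);

		if (end == p)
			return -1;	/* ran out of numbers */
		ds->v[i] = val;
		p = end;
	}
	/* anything but trailing whitespace means a format mismatch */
	while (*p == ' ' || *p == '\n')
		p++;
	return *p == '\0' ? 0 : -1;
}
```

Rejecting lines with too few or too many counters is one way for a tool to "adapt (or abort)", as the SCHEDSTAT_VERSION comment puts it, instead of silently mis-assigning columns.
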
[PATCH 2/13] improve pinned task handling

2005-02-23 Thread Nick Piggin
2/13

John Hawkes explained the problem best:
	A large number of processes that are pinned to a single CPU results
	in every other CPU's load_balance() seeing this overloaded CPU as
	"busiest", yet move_tasks() never finds a task to pull-migrate.  This
	condition occurs during module unload, but can also occur as a
	denial-of-service using sys_sched_setaffinity().  Several hundred
	CPUs performing this fruitless load_balance() will livelock on the
	busiest CPU's runqueue lock.  A smaller number of CPUs will livelock
	if the pinned task count gets high.
	
Expanding slightly on John's patch, this one attempts to work out
whether the balancing failure has been due to too many tasks pinned
on the runqueue. This allows it to be basically invisible to the
regular balancing paths (i.e. when there are no pinned tasks). We can
use this extra knowledge to shut down the balancing faster, and ensure
the migration threads don't start running, which is another problem
observed in the wild.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>


Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:31:27.042781371 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:39.180401105 +1100
@@ -1650,7 +1650,7 @@
  */
 static inline
 int can_migrate_task(task_t *p, runqueue_t *rq, int this_cpu,
-		 struct sched_domain *sd, enum idle_type idle)
+		 struct sched_domain *sd, enum idle_type idle, int *pinned)
 {
 	/*
 	 * We do not migrate tasks that are:
@@ -1660,8 +1660,10 @@
 	 */
 	if (task_running(rq, p))
 		return 0;
-	if (!cpu_isset(this_cpu, p->cpus_allowed))
+	if (!cpu_isset(this_cpu, p->cpus_allowed)) {
+		(*pinned)++;
 		return 0;
+	}
 
 	/*
 	 * Aggressive migration if:
@@ -1687,11 +1689,11 @@
  */
 static int move_tasks(runqueue_t *this_rq, int this_cpu, runqueue_t *busiest,
 		  unsigned long max_nr_move, struct sched_domain *sd,
-		  enum idle_type idle)
+		  enum idle_type idle, int *all_pinned)
 {
 	prio_array_t *array, *dst_array;
 	struct list_head *head, *curr;
-	int idx, pulled = 0;
+	int idx, pulled = 0, pinned = 0;
 	task_t *tmp;
 
 	if (max_nr_move <= 0 || busiest->nr_running <= 1)
@@ -1735,7 +1737,7 @@
 
 	curr = curr->prev;
 
-	if (!can_migrate_task(tmp, busiest, this_cpu, sd, idle)) {
+	if (!can_migrate_task(tmp, busiest, this_cpu, sd, idle, &pinned)) {
 		if (curr != head)
 			goto skip_queue;
 		idx++;
@@ -1761,6 +1763,9 @@
 		goto skip_bitmap;
 	}
 out:
+	*all_pinned = 0;
+	if (unlikely(pinned >= max_nr_move) && pulled == 0)
+		*all_pinned = 1;
 	return pulled;
 }
 
@@ -1935,7 +1940,7 @@
 	struct sched_group *group;
 	runqueue_t *busiest;
 	unsigned long imbalance;
-	int nr_moved;
+	int nr_moved, all_pinned = 0;
 
 	spin_lock(&this_rq->lock);
 	schedstat_inc(sd, lb_cnt[idle]);
@@ -1974,9 +1979,14 @@
 		 */
 		double_lock_balance(this_rq, busiest);
 		nr_moved = move_tasks(this_rq, this_cpu, busiest,
-		imbalance, sd, idle);
+		imbalance, sd, idle,
+		&all_pinned);
 		spin_unlock(&busiest->lock);
 	}
+	/* All tasks on this runqueue were pinned by CPU affinity */
+	if (unlikely(all_pinned))
+		goto out_balanced;
+
 	spin_unlock(&this_rq->lock);
 
 	if (!nr_moved) {
@@ -2041,7 +2051,7 @@
 	struct sched_group *group;
 	runqueue_t *busiest = NULL;
 	unsigned long imbalance;
-	int nr_moved = 0;
+	int nr_moved = 0, all_pinned;
 
 	schedstat_inc(sd, lb_cnt[NEWLY_IDLE]);
 	group = find_busiest_group(sd, this_cpu, &imbalance, NEWLY_IDLE);
@@ -2061,7 +2071,7 @@
 
 	schedstat_add(sd, lb_imbalance[NEWLY_IDLE], imbalance);
 	nr_moved = move_tasks(this_rq, this_cpu, busiest,
-	imbalance, sd, NEWLY_IDLE);
+			imbalance, sd, NEWLY_IDLE, &all_pinned);
 	if (!nr_moved)
 		schedstat_inc(sd, lb_failed[NEWLY_IDLE]);
 
@@ -2119,6 +2129,7 @@
 		cpu_group = sd->groups;
 		do {
 			for_each_cpu_mask(cpu, cpu_group->cpumask) {
+				int all_pinned;
 				if (busiest_rq->nr_running <= 1)
 					/* no more tasks left to move */
 					return;
@@ -2139,7 +2150,7 @@
 				/* move a task from busiest_rq to target_rq */
 				double_lock_balance(busiest_rq, target_rq);
 				if (move_tasks(target_rq, cpu, busiest_rq,
-						1, sd, SCHED_IDLE)) {
+						1, sd, SCHED_IDLE, &all_pinned)) {
 					schedstat_inc(busiest_rq, alb_lost);
 					schedstat_inc(target_rq, alb_gained);
 				} else {

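The pinned-task bookkeeping in the improve pinned task handling patch can be modelled outside the kernel. This is a simplified sketch: tasks live in a plain array instead of priority arrays, the cache-hot check is omitted, and `model_task`, `can_migrate` and `model_move_tasks` are hypothetical names. The counter has to be bumped with `(*pinned)++`; `*pinned++` parses as `*(pinned++)` and advances the pointer instead, because postfix `++` binds tighter than unary `*`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task: allowed CPUs as a bitmask. */
struct model_task {
	unsigned long cpus_allowed;
	bool running;
};

static bool can_migrate(const struct model_task *p, int this_cpu, int *pinned)
{
	if (p->running)
		return false;
	if (!(p->cpus_allowed & (1UL << this_cpu))) {
		(*pinned)++;	/* count the task, not the pointer */
		return false;
	}
	return true;
}

/*
 * Try to pull up to max_nr_move tasks to this_cpu; *all_pinned reports
 * whether nothing moved because the candidates were affinity-bound.
 */
static int model_move_tasks(const struct model_task *tasks, int nr,
			    int this_cpu, int max_nr_move, int *all_pinned)
{
	int i, pulled = 0, pinned = 0;

	assert(max_nr_move > 0);
	for (i = 0; i < nr && pulled < max_nr_move; i++)
		if (can_migrate(&tasks[i], this_cpu, &pinned))
			pulled++;

	*all_pinned = (pinned >= max_nr_move && pulled == 0);
	return pulled;
}
```

When every task on the busy queue is bound to CPU 0, a pull from CPU 1 moves nothing and sets `all_pinned`, which is the signal `load_balance()` uses to treat the domain as balanced rather than escalating.
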

[PATCH 1/13] timestamp fixes

2005-02-23 Thread Nick Piggin
1/13

Some fixes for unsynchronised TSCs. A task's timestamp may have been set
by another CPU. Although we try to adjust this correctly with the
timestamp_last_tick field, there is no guarantee this will be exactly right.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:31:25.384986289 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:39.356379395 +1100
@@ -648,6 +648,7 @@
 
 static void recalc_task_prio(task_t *p, unsigned long long now)
 {
+	/* Caller must always ensure 'now >= p->timestamp' */
 	unsigned long long __sleep_time = now - p->timestamp;
 	unsigned long sleep_time;
 
@@ -2703,8 +2704,10 @@
 
 	schedstat_inc(rq, sched_cnt);
 	now = sched_clock();
-	if (likely(now - prev->timestamp < NS_MAX_SLEEP_AVG))
+	if (likely((long long)now - prev->timestamp < NS_MAX_SLEEP_AVG))
 		run_time = now - prev->timestamp;
+		if (unlikely((long long)now - prev->timestamp < 0))
+			run_time = 0;
 	else
 		run_time = NS_MAX_SLEEP_AVG;
 
@@ -2782,6 +2785,8 @@
 
 	if (!rt_task(next) && next->activated > 0) {
 		unsigned long long delta = now - next->timestamp;
+		if (unlikely((long long)now - next->timestamp < 0))
+			delta = 0;
 
 		if (next->activated == 1)
 			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;

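The casts in the timestamp fixes are there to notice a timestamp that, thanks to an unsynchronised TSC, lies slightly in the future. A standalone sketch of that clamping follows; `clamped_delta` and its `max_ns` cap (playing the role of `NS_MAX_SLEEP_AVG`) are hypothetical. One C detail matters: assuming `timestamp` is an `unsigned long long` like `now`, the cast has to cover the whole difference, `(long long)(now - stamp)`. Casting only `now` is not enough, because the other operand is still `unsigned long long`, so the subtraction and the `< 0` comparison are carried out as unsigned.

```c
#include <assert.h>

/*
 * Elapsed time between two sched_clock()-style readings that may come
 * from CPUs whose TSCs disagree.  A negative interval is treated as 0,
 * and long intervals are capped at max_ns.
 */
static unsigned long long clamped_delta(unsigned long long now,
					unsigned long long stamp,
					unsigned long long max_ns)
{
	/* relies on two's-complement conversion, as kernel code does */
	long long diff = (long long)(now - stamp);

	if (diff < 0)
		return 0;	/* clock appears to have gone backwards */
	if ((unsigned long long)diff > max_ns)
		return max_ns;
	return (unsigned long long)diff;
}
```
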

[PATCH 0/13] Multiprocessor CPU scheduler patches

2005-02-23 Thread Nick Piggin
Hi,

I hope that you can include the following set of CPU scheduler
patches in -mm soon, if you have no other significant performance
work going on.

There are some fairly significant changes, with a few basic aims:
* Improve SMT behaviour
* Improve CMP behaviour, CMP/NUMA scheduling (i.e. Opteron)
* Reduce task movement, especially across NUMA nodes.

They are not going to be very well tuned for most workloads at the
moment (unfortunately dbt2/3-pgsql on OSDL, which is a good test,
isn't working), so hopefully I can address regressions as they
come up.

There are a few problems with the scheduler currently:

Problem #1:
It has _very_ aggressive idle CPU pulling. Not only does it not
really obey imbalances, it is also wrong for e.g. an SMT CPU
whose sibling is not idle. This was done to bring down idle time
on some workloads (dbt2-pgsql, other database stuff).

So I address this in the following ways: reduce special-casing
for idle balancing, and revert some of the recent moves toward even
more aggressive balancing.

Then provide a range of averaging levels for CPU "load averages",
and choose which one to use in each situation on a per-sched-domain
basis. This allows idle balancing to use a more instantaneous value
for calculating load, so idle CPUs need not wait many timer ticks
for the load averages to catch up. This can hopefully solve our
idle time problems.

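As a rough illustration of a "range of averaging levels", the sketch below keeps several exponentially decaying load averages per CPU, level `i` weighting the newest sample by `1/2^i`. `update_cpu_load`, `NR_LOAD_LEVELS` and the `1/2^i` weights are assumptions for illustration, not necessarily what these patches implement.

```c
#include <assert.h>

#define NR_LOAD_LEVELS 3	/* assumed number of averaging levels */

/*
 * One tick of a multi-level CPU load average.  Level 0 is the
 * instantaneous load; each higher level reacts more slowly.  A balancer
 * could read a fast level when idle and a slow one for periodic balancing.
 */
static void update_cpu_load(unsigned long cpu_load[NR_LOAD_LEVELS],
			    unsigned long this_load)
{
	int i;

	assert(cpu_load);
	for (i = 0; i < NR_LOAD_LEVELS; i++) {
		unsigned long scale = 1UL << i;

		cpu_load[i] = (cpu_load[i] * (scale - 1) + this_load) / scale;
	}
}
```

If an idle CPU suddenly carries one task (128 units, assuming `SCHED_LOAD_SCALE` is 128), level 0 reads 128 after the first tick while levels 1 and 2 read only 64 and 32, which is the lag idle balancing should not have to wait out. Because the division rounds down, the slower levels also settle slightly below a constant load (level 2 stops at 125 for a load of 128).
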
Also, further moderate "affine wakeups", which on some workloads can
move most tasks onto one CPU and cause idle time problems.

Problem #2:
The second problem is that balance-on-exec is not sched-domains
aware. This means it will tend to (for example) fill up two cores
of a CPU on one socket, then fill up two cores on the next socket,
etc. What we want is to try to spread load evenly across memory
controllers.

So make it sched-domains aware, following the same pattern as
find_busiest_group / find_busiest_queue.

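Mirroring find_busiest_group / find_busiest_queue for exec balancing means first picking the least loaded group at a domain level, then the least loaded CPU inside it. The sketch below shows that two-step search on a flattened model; `model_group`, `pick_idlest_group`, `pick_idlest_cpu` and the 128-per-CPU `cpu_power` convention are assumptions, not the patch's code.

```c
#include <assert.h>

#define SCHED_LOAD_SCALE 128UL	/* assumed: cpu_power of one CPU */

/* Hypothetical flattened view of one group in a sched-domain level. */
struct model_group {
	const unsigned long *cpu_load;	/* load of each CPU in the group */
	int nr_cpus;
	unsigned long cpu_power;
};

/* Pick the group with the lowest load per unit of cpu_power. */
static int pick_idlest_group(const struct model_group *groups, int nr_groups)
{
	unsigned long min_load = ~0UL;
	int g, idlest = -1;

	for (g = 0; g < nr_groups; g++) {
		unsigned long load = 0;
		int i;

		assert(groups[g].cpu_power > 0);
		for (i = 0; i < groups[g].nr_cpus; i++)
			load += groups[g].cpu_load[i];
		/* normalise so groups of different sizes compare fairly */
		load = load * SCHED_LOAD_SCALE / groups[g].cpu_power;
		if (load < min_load) {
			min_load = load;
			idlest = g;
		}
	}
	return idlest;
}

/* Within the chosen group, pick the least loaded CPU (index in group). */
static int pick_idlest_cpu(const struct model_group *group)
{
	int i, idlest = 0;

	for (i = 1; i < group->nr_cpus; i++)
		if (group->cpu_load[i] < group->cpu_load[idlest])
			idlest = i;
	return idlest;
}
```

With one task already on socket 0 of a two-socket, two-core machine, this search places an exec'd task on socket 1 rather than on socket 0's idle second core, spreading load across memory controllers as described.
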
Problem #3:
Lastly, implement balance-on-fork/clone again. I have come to the
realisation that for NUMA, this is probably the best solution.
Run-cloned-child-last has run out of steam on CMP systems. What
it was supposed to do was provide a period where the child could
be pulled to another CPU before it starts running and allocating
memory. Unfortunately, on CMP systems the child tends to be pulled
only to the other sibling.

Also, having such a difference between thread and process creation
was not really ideal, so we balance on all types of fork/clone.
This really helps some things (like STREAM) on CMP Opterons, but
also hurts others, so naturally it is settable per-domain.

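One way a per-domain setting like this could be consulted is to walk up from a CPU's lowest domain and balance within the widest domain that has the relevant flag set. The sketch below shows that walk; `model_domain`, `widest_domain_with` and the `MODEL_SD_BALANCE_*` flag values are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SD_BALANCE_FORK	0x1	/* hypothetical flag bit */
#define MODEL_SD_BALANCE_EXEC	0x2	/* hypothetical flag bit */

struct model_domain {
	unsigned int flags;
	const struct model_domain *parent;	/* NULL at the top level */
};

/*
 * Return the widest domain, starting from sd and walking up through its
 * parents, that has the given balancing flag set; NULL if none has it.
 */
static const struct model_domain *
widest_domain_with(const struct model_domain *sd, unsigned int flag)
{
	const struct model_domain *found = NULL;

	assert(flag != 0);
	for (; sd; sd = sd->parent)
		if (sd->flags & flag)
			found = sd;
	return found;
}
```

Clearing the fork flag on a NUMA-level domain, for example, would keep fork balancing within a node while exec balancing still spans nodes.
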
Problem #4:
Sched domains isn't very useful to me in its current form. Bring
it up to date with what I've been using. I don't think anyone other
than myself uses it so that should be OK.

Nick






Re: [RFC] PCI bridge driver rewrite

2005-02-23 Thread Adam Belay
On Thu, 2005-02-24 at 01:45 -0500, Jon Smirl wrote:
> On Thu, 24 Feb 2005 01:22:01 -0500, Adam Belay <[EMAIL PROTECTED]> wrote:
> > For the past couple weeks I have been reorganizing the PCI subsystem to
> > better utilize the driver model.  Specifically, the bus detection code
> > is now using a standard PCI driver.  It turns out to be a major
> 
> What about VGA routing? Most PCI buses do it with the normal VGA bit
> but big hardware supports multiple legacy IO spaces via the bridge
> chips.
> 
> Are you going to make sysfs entries for the bridges? If so I'd like a
> VGA attribute that directly reads the VGA bit from the hardware and
> display it instead of using the shadow copy.

Yeah, actually I've been thinking about this issue a lot.  I think it
would make a lot of sense to export this sort of thing under the
"pci_bus" class in sysfs.  The ISA enable bit should probably also be
exported.  Furthermore, we should be verifying the BIOS's configuration
of VGA and ISA.  I'll try to integrate this in my future releases.  I
appreciate the code.

I also have a number of resource management plans for the VGA enable bit
that I'll get into in my next set of patches.

> 
> Jesse can comment on the specific support needed for multiple legacy IO 
> spaces.
> 

That would be great.  Most of my experience has been with only a couple
legacy IO port ranges passing through the bridge.

Thanks,
Adam




Re: [8/14] Orinoco driver updates - PCMCIA initialization cleanups

2005-02-23 Thread Dominik Brodowski
> @@ -184,6 +186,7 @@
>   dev_list = link;
>  
>   client_reg.dev_info = &dev_info;
> + client_reg.Attributes = INFO_IO_CLIENT | INFO_CARD_SHARE;

That's not needed any longer for 2.6.

Dominik


Re: Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Andrew Morton
"Chad N. Tindel" <[EMAIL PROTECTED]> wrote:
>
> > `xterm' is waiting for the other CPU to schedule a kernel thread (which is
> > bound to that CPU).  Once that kernel thread has done a little bit of work,
> > `xterm' can terminate.
> > 
> > But kernel threads don't run with realtime policy, so your userspace app
> > has permanently starved that kernel thread.
> > 
> > It's potentially quite a problem, really.  For example it could prevent
> > various tty operations from completing, it will prevent kjournald from ever
> > writing back anything (on uniprocessor, etc).  I've been waiting for
> > someone to complain ;)
> > 
> > But the other side of the coin is that a SCHED_FIFO userspace task
> > presumably has extreme latency requirements, so it doesn't *want* to be
> > preempted by some routine kernel operation.  People would get irritated if
> > we were to do that.
> > 
> > So what to do?
> 
> It shouldn't need to preempt the kernel operation.  Why is the design such that
> the necessary kernel thread can't run on the other CPU?
> 

This particular kernel function is implemented via a kernel thread per CPU,
with each thread bound to its own CPU.  The xterm-does-exit cleanup code is
waiting for the thread which is bound to the busy CPU to do something.

No other CPU can, or is allowed to, do that thread's work.  If it were to
do so, the implicit locking which we get from the per-cpuness would be
violated.

I don't know if any clients of the workqueue code rely upon the
pinned-to-cpu feature.



Re: [RFC] PCI bridge driver rewrite

2005-02-23 Thread Jon Smirl
On Thu, 24 Feb 2005 01:22:01 -0500, Adam Belay <[EMAIL PROTECTED]> wrote:
> For the past couple weeks I have been reorganizing the PCI subsystem to
> better utilize the driver model.  Specifically, the bus detection code
> is now using a standard PCI driver.  It turns out to be a major

What about VGA routing? Most PCI buses do it with the normal VGA bit
but big hardware supports multiple legacy IO spaces via the bridge
chips.

Are you going to make sysfs entries for the bridges? If so I'd like a
VGA attribute that directly reads the VGA bit from the hardware and
display it instead of using the shadow copy.

/* sysfs show for VGA routing bridge */
static ssize_t vga_bridge_show(struct device *dev, char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	u16 l;

	/* don't trust the shadow PCI_BRIDGE_CTL_VGA in pdev */
	/* user space (X) may change hardware without telling the kernel */
	pci_read_config_word(pdev, PCI_BRIDGE_CONTROL, &l);
	return sprintf(buf, "%d\n", (l & PCI_BRIDGE_CTL_VGA) != 0);
}

I also use these functions to control VGA routing; maybe they should
be part of bridge support.

static void bridge_yes(struct pci_dev *pdev)
{
	struct pci_dev *bridge;
	struct pci_bus *bus;

	/* Make sure the bridges route to us */
	bus = pdev->bus;
	while (bus) {
		bridge = bus->self;
		if (bridge) {
			bus->bridge_ctl |= PCI_BRIDGE_CTL_VGA;
			pci_write_config_word(bridge, PCI_BRIDGE_CONTROL,
					      bus->bridge_ctl);
		}
		bus = bus->parent;
	}
}

static void bridge_no(struct pci_dev *pdev)
{
	struct pci_dev *bridge;
	struct pci_bus *bus;

	/* Make sure the bridges don't route to us */
	bus = pdev->bus;
	while (bus) {
		bridge = bus->self;
		if (bridge) {
			bus->bridge_ctl &= ~PCI_BRIDGE_CTL_VGA;
			pci_write_config_word(bridge, PCI_BRIDGE_CONTROL,
					      bus->bridge_ctl);
		}
		bus = bus->parent;
	}
}

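bridge_yes() and bridge_no() differ only in whether they set or clear PCI_BRIDGE_CTL_VGA on each bridge between the device and the root, so one walker taking a flag could serve both. The sketch below models that walk in plain C so it can be exercised outside the kernel; `model_bus` and `set_vga_routing` are hypothetical, the real code would still write each value back with pci_write_config_word(), and 0x08 is the VGA enable bit of the bridge control register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PCI_BRIDGE_CTL_VGA 0x08	/* VGA enable bit in bridge control */

/* Hypothetical stand-in for struct pci_bus and its upstream bridge. */
struct model_bus {
	struct model_bus *parent;	/* NULL for a root bus */
	bool has_bridge;		/* false where bus->self would be NULL */
	unsigned short bridge_ctl;	/* cached bridge control register */
};

/* Set or clear VGA routing on every bridge from this bus up to the root. */
static void set_vga_routing(struct model_bus *bus, bool enable)
{
	for (; bus; bus = bus->parent) {
		if (!bus->has_bridge)
			continue;
		if (enable)
			bus->bridge_ctl |= PCI_BRIDGE_CTL_VGA;
		else
			bus->bridge_ctl &= (unsigned short)~PCI_BRIDGE_CTL_VGA;
	}
}
```

Other bits in each bridge's control register are left untouched, which matters because the same register also carries the ISA enable bit discussed in this thread.
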
Jesse can comment on the specific support needed for multiple legacy IO spaces.

-- 
Jon Smirl
[EMAIL PROTECTED]


Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

2005-02-23 Thread Guillaume Thouvenin
On Wed, 2005-02-23 at 14:41 +0300, Evgeniy Polyakov wrote:
> > Please assume that  > originally written for> will always be listening.
> >
> > > > What happened to the idea of sending an on/off message down the netlink
> > > > socket?
> > ...
> > Arrange for the userspace daemon to send a message to the fork_connector
> > subsystem turning it on or off.  So we can bypass all this code in the
> > common case where  is listening, but your daemon is
> > not.
> 
> Ok, now I see(I'm not a fork connector author, so I did not receive them).
> That will require to add real fork connector with callback routing.
> Guillaume?

Yes, the connector's callback is a good solution. I will add a fork
enable/disable callback in drivers/connector/connector.c that will
switch a global variable when called from user space. It will be
something like:

void cn_fork_callback(void)
{
	if (cn_already_initialized)
		cn_fork_enable = cn_fork_enable ? 0 : 1;
}

cn_fork_enable will be set to 0 by default. In do_fork(), I will replace
the statement "if (cn_already_initialized)" with "if (cn_fork_enable)".

> > Without a lock you can have two messages with the same sequence number. 
> > Even if the daemon which you're planning on implementing can handle that,
> > we shouldn't allow it.
> 
> Yes, they can have the same number, but does it cost atomic/lock overhead?
> Anyway, simple spin_lock() should be enough in do_fork() context.
> Guillaume?

I will protect the increment with spin_lock(_cn_lock).

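Andrew's concern about duplicate sequence numbers comes down to making the read-and-increment of the counter one indivisible step. The sketch below is a userspace analogy using C11 atomics, not the kernel code under discussion (where a spin_lock is proposed); `fork_enable`, `fork_seq`, `fork_callback`, `fork_reporting_enabled` and `next_fork_seq` are hypothetical names. For a lone counter, an atomic fetch-and-add gives the same no-duplicates guarantee as a lock; a lock is still needed if other state must change together with the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the proposed globals (names hypothetical). */
static atomic_int fork_enable;		/* zero-initialised: off by default */
static atomic_uint fork_seq;

/* Toggle reporting, mirroring the proposed callback. */
static void fork_callback(void)
{
	atomic_fetch_xor(&fork_enable, 1);
}

static bool fork_reporting_enabled(void)
{
	return atomic_load(&fork_enable) != 0;
}

/* fetch-and-add returns the old value, so no two callers get the same one */
static unsigned int next_fork_seq(void)
{
	return atomic_fetch_add(&fork_seq, 1u);
}
```
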
Guillaume



Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Lee Revell
On Thu, 2005-02-24 at 04:56 +, Hugh Dickins wrote:
> On Wed, 23 Feb 2005, Lee Revell wrote:
> > On Wed, 2005-02-23 at 20:53 +, Hugh Dickins wrote:
> > > On Wed, 23 Feb 2005, Hugh Dickins wrote:
> > > > Please replace by new patch below, which I'm now running through 
> > > > lmbench.
> > > 
> > > That second patch seems fine, and I see no lmbench regression from it.
> > 
> > Should go into 2.6.11, right?
> 
> That's up to Andrew (and Linus).
> 
> I was thinking that way when I rushed you the patch.  But given that
> you have remaining unresolved latency issues nearby (zap_pte_range,
> clear_page_range), and given the warning shot that I screwed up my
> first attempt, I'd be inclined to say hold off.
> 
> It's a pity: for a while we were thinking 2.6.11 would be a big step
> forward for mainline latency; but it now looks to me like these tests
> have come too late in the cycle to be dealt with safely.
> 
> In other mail, you do expect people still to be using Ingo's patches,
> so probably this patch should stick there (and in -mm) for now.

Well, all of these were fixed in the past, so it may not be unreasonable
to fix them for 2.6.11.

Lee



Re: 2.6.11-rc5

2005-02-23 Thread Michael Neuffer
Quoting Linus Torvalds ([EMAIL PROTECTED]):
> 
> 
> Hey, I hoped -rc4 was the last one, but we had some laptop resource
> conflicts, various ppc TLB flush issues, some possible stack overflows in
> networking and a number of other details warranting a quick -rc5 before
> the final 2.6.11.
> 
> This time it's really supposed to be a quickie, so people who can, please 
> check it out, and we'll make the real 2.6.11 asap.
> 
> Mostly pretty small changes (the largest is a new SATA driver that crept
> in, our bad). But worth another quick round.


Are you sure you uploaded the correct patch file?



-rw-rw-r--  1 536  536     50907 Feb 24 04:13 ChangeLog-2.6.11-rc5
-rw-rw-r--  1 536  536         0 Feb 24 04:13 LATEST-IS-2.6.11-rc5
-rw-rw-r--  1 536  536  46586159 Feb 24 04:20 linux-2.6.11-rc5.tar.gz
-rw-rw-r--  1 536  536       248 Feb 24 04:20 linux-2.6.11-rc5.tar.gz.sign
-rw-rw-r--  1 536  536        37 Feb 24 04:20 patch-2.6.11-rc5.gz
-rw-rw-r--  1 536  536  37080033 Feb 24 04:20 linux-2.6.11-rc5.tar.bz2
-rw-rw-r--  1 536  536       248 Feb 24 04:20 linux-2.6.11-rc5.tar.bz2.sign
-rw-rw-r--  1 536  536       248 Feb 24 04:20 linux-2.6.11-rc5.tar.sign
-rw-rw-r--  1 536  536        14 Feb 24 04:20 patch-2.6.11-rc5.bz2
-rw-rw-r--  1 536  536       248 Feb 24 04:20 patch-2.6.11-rc5.bz2.sign
-rw-rw-r--  1 536  536       248 Feb 24 04:20 patch-2.6.11-rc5.gz.sign
-rw-rw-r--  1 536  536       248 Feb 24 04:20 patch-2.6.11-rc5.sign
drwxrwsr-x  2 536  536      8192 Feb 24 04:57 incr
drwxrwsr-x  4 536  536     16384 Feb 24 05:00 .
lftp ftp.kernel.org:/pub/linux/kernel/v2.6/testing>


Cheers
   Mike


Re: 2.6.11-rc5

2005-02-23 Thread Matt Mackall
On Wed, Feb 23, 2005 at 08:18:08PM -0800, Linus Torvalds wrote:
> 
> 
> Hey, I hoped -rc4 was the last one, but we had some laptop resource
> conflicts, various ppc TLB flush issues, some possible stack overflows in
> networking and a number of other details warranting a quick -rc5 before
> the final 2.6.11.
> 
> This time it's really supposed to be a quickie, so people who can, please 
> check it out, and we'll make the real 2.6.11 asap.
> 
> Mostly pretty small changes (the largest is a new SATA driver that crept
> in, our bad). But worth another quick round.

Very small.

[   ] patch-2.6.11-rc5.bz2       23-Feb-2005 20:20   14
[   ] patch-2.6.11-rc5.bz2.sign  23-Feb-2005 20:20  248
[   ] patch-2.6.11-rc5.gz        23-Feb-2005 20:20   37
[   ] patch-2.6.11-rc5.gz.sign   23-Feb-2005 20:20  248
[   ] patch-2.6.11-rc5.sign      23-Feb-2005 20:20  248

Seems to have passed the gpg signature test on my end.

-- 
Mathematics is the supreme nostalgia of our time.


[RFC] PCI bridge driver rewrite

2005-02-23 Thread Adam Belay
Hi all,

For the past couple weeks I have been reorganizing the PCI subsystem to
better utilize the driver model.  Specifically, the bus detection code
is now using a standard PCI driver.  It turns out to be a major
undertaking, as the PCI probing code is closely tied to a lot of other
PCI components, and is spread throughout various architecture-specific
areas.  I'm hoping that these changes will allow for a much cleaner and
more functional PCI implementation.

The basic flow of the new code is as follows:
1.) A standard "driver core" driver binds to a bridge device.
2.) When "*probe" is called it sets up the hardware and allocates a
"struct pci_bus".
3.) The "struct pci_bus" is filled with information about the detected
bridge.
4.) The driver then registers the "struct pci_bus" with the PCI Bus
Class.
5.) The PCI Bus Class makes the bridge available to sysfs.
6.) It then detects hardware attached to the bridge.
7.) Each new PCI bridge device is registered with the driver model.
8.) All remaining PCI devices are registered with the driver model.

Steps 7 and 8 allow for better resource management.


I've attached an early version of my code.  It has most of the new PCI
bus class registration code in place, and an early implementation of the
PCI-to-PCI bridge driver.  The following remains to be done:

1.) refine and cleanup the new PCI Bus API
2.) export the new API in "linux/pci.h", and cleanup any users of the
old code.
3.) fix every PCI hotplug driver.
4.) write a bridge driver for the PCI root bridge
5.) write a bridge driver for Cardbus hardware
6.) refine device registration order
7.) redesign PCI bus number assignment and support bus renumbering
8.) redesign PCI resource management to be compatible with the new code
9.) testing on various architectures
10.) Write "*suspend" and "*resume" routines for PCI bridges.  Any ideas
on what needs to be done?
11.) fix "PCI_LEGACY" (I may have broken it, but it should be trivial)

I look forward to any comments or suggestions.

Thanks,
Adam


diffstat:
 Makefile      |    9
 bus-class.c   |  225 +++
 bus/Makefile  |    6
 bus/bus-p2p.c |  133 ++
 device.c      |  142 +++
 pci.h         |    4
 probe.c       |  546 --
 remove.c      |  126 -
 9 files changed, 598 insertions(+), 593 deletions(-)

Patch is against 2.6.11-RC3.


diff -urN linux/drivers/pci/bus/bus-p2p.c linux-pci/drivers/pci/bus/bus-p2p.c
--- linux/drivers/pci/bus/bus-p2p.c 1969-12-31 19:00:00.0 -0500
+++ linux-pci/drivers/pci/bus/bus-p2p.c 2005-02-24 00:19:05.0 -0500
@@ -0,0 +1,133 @@
+/*
+ * bus-p2p.c - a generic PCI bus driver for PCI<->PCI bridges
+ *
+ */
+
+#include 
+#include 
+#include 
+
+static struct pci_device_id p2p_id_tbl[] = {
+   { PCI_DEVICE_CLASS(PCI_CLASS_BRIDGE_PCI << 8, 0x00) },
+   { 0 },
+};
+MODULE_DEVICE_TABLE(pci, p2p_id_tbl);
+
+static void p2p_setup_bus_numbers(struct pci_dev *dev, struct pci_bus *bus)
+{
+   u32 buses;
+
+   pci_read_config_dword(dev, PCI_PRIMARY_BUS, &buses);
+
+   bus->primary = buses & 0xFF;
+   bus->secondary = (buses >> 8) & 0xFF;
+   bus->subordinate = (buses >> 16) & 0xFF;
+}
+
+static void pci_enable_crs(struct pci_dev *dev)
+{
+   u16 cap, rpctl;
+   int rpcap = pci_find_capability(dev, PCI_CAP_ID_EXP);
+   if (!rpcap)
+   return;
+
+   pci_read_config_word(dev, rpcap + PCI_CAP_FLAGS, &cap);
+   if (((cap & PCI_EXP_FLAGS_TYPE) >> 4) != PCI_EXP_TYPE_ROOT_PORT)
+   return;
+
+   pci_read_config_word(dev, rpcap + PCI_EXP_RTCTL, &rpctl);
+   rpctl |= PCI_EXP_RTCTL_CRSSVE;
+   pci_write_config_word(dev, rpcap + PCI_EXP_RTCTL, rpctl);
+}
+
+static void p2p_prepare_hardware(struct pci_dev *dev, struct pci_bus *bus)
+{
+   u16 bctl;
+
+   /* Disable MasterAbortMode during probing to avoid reporting
+  of bus errors (in some architectures) */ 
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &bctl);
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
+ bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
+
+   bus->bridge_ctl = bctl;
+
+   pci_enable_crs(dev);
+}
+
+/* FIXME: these need to be defined in linux/pci.h */
+extern struct pci_bus * pci_alloc_bus(void);
+extern int pci_add_bus(struct pci_bus *bus);
+extern struct pci_bus * pci_derive_parent(struct device *);
+
+static int p2p_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+   int err, i;
+   struct pci_bus *bus;
+
+   if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
+   return -ENODEV;
+
+   bus = pci_alloc_bus();
+
+   if (!bus)
+   return -ENOMEM;
+
+   bus->bridge = &dev->dev;
+   bus->parent = pci_derive_parent(>self->dev);
+   if (!bus->parent) {
+   err = -ENODEV;
+   goto out;
+   }
+
+   bus->ops = bus->parent->ops;
+   bus->sysdata = 

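The register handling in p2p_setup_bus_numbers(), p2p_prepare_hardware() and pci_enable_crs() above is plain bit manipulation, so it can be exercised outside the kernel. A user-space sketch: the constants carry the values of the kernel's PCI definitions named in the comments, but under local names, and decode_bus_numbers(), mask_master_abort(), is_root_port() and set_crs_visibility() are hypothetical helpers, not kernel functions. It does not attempt to complete the truncated p2p_probe().

```c
#include <stdbool.h>
#include <stdint.h>

/* Register values as defined in the kernel's PCI headers, renamed locally. */
#define BRIDGE_CTL_MASTER_ABORT 0x0020	/* PCI_BRIDGE_CTL_MASTER_ABORT */
#define EXP_FLAGS_TYPE          0x00f0	/* PCI_EXP_FLAGS_TYPE */
#define EXP_TYPE_ROOT_PORT      0x4	/* PCI_EXP_TYPE_ROOT_PORT */
#define EXP_RTCTL_CRSSVE        0x0010	/* PCI_EXP_RTCTL_CRSSVE */

struct bus_numbers {
	uint8_t primary;
	uint8_t secondary;
	uint8_t subordinate;
};

/* Split the dword read at PCI_PRIMARY_BUS; the top byte (secondary
 * latency timer) is ignored, as in p2p_setup_bus_numbers(). */
static struct bus_numbers decode_bus_numbers(uint32_t buses)
{
	struct bus_numbers b;

	b.primary = buses & 0xFF;
	b.secondary = (buses >> 8) & 0xFF;
	b.subordinate = (buses >> 16) & 0xFF;
	return b;
}

/* Bridge control value written while probing: master-abort reporting off. */
static uint16_t mask_master_abort(uint16_t bctl)
{
	return bctl & ~BRIDGE_CTL_MASTER_ABORT;
}

/* Same device/port type test as pci_enable_crs(). */
static bool is_root_port(uint16_t exp_flags)
{
	return ((exp_flags & EXP_FLAGS_TYPE) >> 4) == EXP_TYPE_ROOT_PORT;
}

/* Root control value with CRS software visibility enabled. */
static uint16_t set_crs_visibility(uint16_t rtctl)
{
	return rtctl | EXP_RTCTL_CRSSVE;
}
```

For example, a bus-number dword of 0x40050201 decodes to primary bus 1, secondary bus 2 and subordinate bus 5.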
Re: 2.6.11-rc5

2005-02-23 Thread Voluspa

>This time it's really supposed to be a quickie, so people who can,
>please check it out, and we'll make the real 2.6.11 asap.

Out of diskspace on kernel.org?

http://www.kernel.org/pub/linux/kernel/v2.6/testing/
[...]
 patch-2.6.11-rc5.bz2   23-Feb-2005 20:20   14
 patch-2.6.11-rc5.bz2.sign  23-Feb-2005 20:20  248
 patch-2.6.11-rc5.gz    23-Feb-2005 20:20   37
 patch-2.6.11-rc5.gz.sign   23-Feb-2005 20:20  248
 patch-2.6.11-rc5.sign  23-Feb-2005 20:20  248

Mvh
Mats Johannesson
--


[PATCH 5/5] I8K - convert to platform device (sysfs)

2005-02-23 Thread Dmitry Torokhov


 i8k.c |  117 ++
 1 files changed, 117 insertions(+)

Index: dtor/drivers/char/i8k.c
===
--- dtor.orig/drivers/char/i8k.c
+++ dtor/drivers/char/i8k.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -87,6 +88,13 @@ static struct file_operations i8k_fops =
.ioctl  = i8k_ioctl,
 };
 
+static struct device_driver i8k_driver = {
+   .name   = "i8k",
+   .bus    = &platform_bus_type,
+};
+
+static struct platform_device *i8k_device;
+ 
 struct smm_regs {
unsigned int eax;
unsigned int ebx __attribute__ ((packed));
@@ -406,6 +414,89 @@ static int i8k_open_fs(struct inode *ino
return single_open(file, i8k_proc_show, NULL);
 }
 
+static ssize_t i8k_sysfs_cpu_temp_show(struct device *dev, char *buf)
+{
+   int temp = i8k_get_cpu_temp();
+
+   return temp < 0 ? -EIO : sprintf(buf, "%d\n", temp);
+}
+
+static ssize_t i8k_sysfs_fan1_show(struct device *dev, char *buf)
+{
+   int status = i8k_get_fan_status(0);
+   return status < 0 ? -EIO : sprintf(buf, "%d\n", status);
+}
+
+static ssize_t i8k_sysfs_fan1_set(struct device *dev, const char *buf, size_t count)
+{
+   unsigned long state;
+   char *rest;
+
+   if (restricted && !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   state = simple_strtoul(buf, &rest, 10);
+   if (*rest || state > I8K_FAN_MAX)
+   return -EINVAL;
+
+   if (i8k_set_fan(0, state) < 0)
+   return -EIO;
+
+   return count;
+}
+
+static ssize_t i8k_sysfs_fan2_show(struct device *dev, char *buf)
+{
+   int status = i8k_get_fan_status(1);
+   return status < 0 ? -EIO : sprintf(buf, "%d\n", status);
+}
+
+static ssize_t i8k_sysfs_fan2_set(struct device *dev, const char *buf, size_t count)
+{
+   unsigned long state;
+   char *rest;
+
+   if (restricted && !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   state = simple_strtoul(buf, &rest, 10);
+   if (*rest || state > I8K_FAN_MAX)
+   return -EINVAL;
+
+   if (i8k_set_fan(1, state) < 0)
+   return -EIO;
+
+   return count;
+}
+
+static ssize_t i8k_sysfs_fan1_speed_show(struct device *dev, char *buf)
+{
+   int speed = i8k_get_fan_speed(0);
+   return speed < 0 ? -EIO : sprintf(buf, "%d\n", speed);
+}
+
+static ssize_t i8k_sysfs_fan2_speed_show(struct device *dev, char *buf)
+{
+   int speed = i8k_get_fan_speed(1);
+   return speed < 0 ? -EIO : sprintf(buf, "%d\n", speed);
+}
+
+static ssize_t i8k_sysfs_power_status_show(struct device *dev, char *buf)
+{
+   int status = power_status ? i8k_get_power_status() : -1;
+   return status < 0 ? -EIO : sprintf(buf, "%d\n", status);
+}
+
+static struct device_attribute i8k_device_attrs[] = {
+   __ATTR(cpu_temp, 0444, i8k_sysfs_cpu_temp_show, NULL),
+   __ATTR(fan1_state, 0644, i8k_sysfs_fan1_show, i8k_sysfs_fan1_set),
+   __ATTR(fan2_state, 0644, i8k_sysfs_fan2_show, i8k_sysfs_fan2_set),
+   __ATTR(fan1_speed, 0444, i8k_sysfs_fan1_speed_show, NULL),
+   __ATTR(fan2_speed, 0444, i8k_sysfs_fan2_speed_show, NULL),
+   __ATTR(power_status, 0444, i8k_sysfs_power_status_show, NULL),
+   __ATTR_NULL
+};
+
 static struct dmi_system_id __initdata i8k_dmi_table[] = {
{
.ident = "Dell Inspiron",
@@ -490,6 +581,7 @@ static int __init i8k_probe(void)
 static int __init i8k_init(void)
 {
struct proc_dir_entry *proc_i8k;
+   int err, i;
 
/* Are we running on an supported laptop? */
if (i8k_probe())
@@ -503,15 +595,40 @@ static int __init i8k_init(void)
proc_i8k->proc_fops = &i8k_fops;
proc_i8k->owner = THIS_MODULE;
 
+   err = driver_register(&i8k_driver);
+   if (err)
+   goto fail1;
+
+   i8k_device = platform_device_register_simple("i8k", -1, NULL, 0);
+   if (IS_ERR(i8k_device)) {
+   err = PTR_ERR(i8k_device);
+   goto fail2;
+   }
+
+   for (i = 0; attr_name(i8k_device_attrs[i]); i++) {
+   err = device_create_file(&i8k_device->dev, &i8k_device_attrs[i]);
+   if (err)
+   goto fail3;
+   }
+
printk(KERN_INFO
   "Dell laptop SMM driver v%s Massimo Dal Zotto ([EMAIL PROTECTED])\n",
   I8K_VERSION);
 
return 0;
+
+fail3: while (--i >= 0)
+   device_remove_file(&i8k_device->dev, &i8k_device_attrs[i]);
+   platform_device_unregister(i8k_device);
+fail2: driver_unregister(&i8k_driver);
+fail1: remove_proc_entry("i8k", NULL);
+   return err;
 }
 
 static void __exit i8k_exit(void)
 {
+   platform_device_unregister(i8k_device);
+   driver_unregister(&i8k_driver);
remove_proc_entry("i8k", NULL);
 }
 

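The fan*_set handlers in patch 5/5 accept a value only if simple_strtoul() consumed the whole buffer. Because `echo 1 > fan1_state` writes "1\n", that check as posted appears to reject a plain echo and to require `echo -n`. A user-space sketch of a parser that tolerates one trailing newline; parse_fan_state() is a hypothetical helper, strtoul() stands in for simple_strtoul(), and FAN_MAX = 2 is an assumption standing in for I8K_FAN_MAX.

```c
#include <errno.h>
#include <stdlib.h>

#define FAN_MAX 2	/* assumed stand-in for I8K_FAN_MAX */

/*
 * Parse a fan state written to a sysfs attribute. Accepts digits
 * optionally followed by a single '\n'. Returns the state (0..FAN_MAX)
 * or -EINVAL for empty, trailing-garbage or out-of-range input.
 */
static long parse_fan_state(const char *buf)
{
	char *rest;
	unsigned long state = strtoul(buf, &rest, 10);

	if (rest == buf)		/* no digits at all */
		return -EINVAL;
	if (*rest == '\n')		/* what echo appends */
		rest++;
	if (*rest != '\0' || state > FAN_MAX)
		return -EINVAL;
	return (long)state;
}
```

Rejecting empty input explicitly is a deliberate choice here: simple_strtoul() on "" returns 0 with nothing left over, which the posted check would accept as "fan off".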
[PATCH 4/5] I8K - switch to module_{init|exit}

2005-02-23 Thread Dmitry Torokhov
===

I8K: use module_{init|exit} instead of old-style #ifdef MODULE
 code, some formatting changes.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>


 i8k.c  |  149 -
 misc.c |4 -
 2 files changed, 47 insertions(+), 106 deletions(-)

Index: dtor/drivers/char/misc.c
===
--- dtor.orig/drivers/char/misc.c
+++ dtor/drivers/char/misc.c
@@ -67,7 +67,6 @@ extern int rtc_DP8570A_init(void);
 extern int rtc_MK48T08_init(void);
 extern int pmu_device_init(void);
 extern int tosh_init(void);
-extern int i8k_init(void);
 
 #ifdef CONFIG_PROC_FS
 static void *misc_seq_start(struct seq_file *seq, loff_t *pos)
@@ -317,9 +316,6 @@ static int __init misc_init(void)
 #ifdef CONFIG_TOSHIBA
tosh_init();
 #endif
-#ifdef CONFIG_I8K
-   i8k_init();
-#endif
if (register_chrdev(MISC_MAJOR,"misc",&misc_fops)) {
printk("unable to get major %d for misc devices\n",
   MISC_MAJOR);
Index: dtor/drivers/char/i8k.c
===
--- dtor.orig/drivers/char/i8k.c
+++ dtor/drivers/char/i8k.c
@@ -87,14 +87,14 @@ static struct file_operations i8k_fops =
.ioctl  = i8k_ioctl,
 };
 
-typedef struct {
+struct smm_regs {
unsigned int eax;
unsigned int ebx __attribute__ ((packed));
unsigned int ecx __attribute__ ((packed));
unsigned int edx __attribute__ ((packed));
unsigned int esi __attribute__ ((packed));
unsigned int edi __attribute__ ((packed));
-} SMMRegisters;
+};
 
 static inline char *i8k_get_dmi_data(int field)
 {
@@ -104,7 +104,7 @@ static inline char *i8k_get_dmi_data(int
 /*
  * Call the System Management Mode BIOS. Code provided by Jonathan Buzzard.
  */
-static int i8k_smm(SMMRegisters * regs)
+static int i8k_smm(struct smm_regs *regs)
 {
int rc;
int eax = regs->eax;
@@ -134,9 +134,8 @@ static int i8k_smm(SMMRegisters * regs)
:"a"(regs)
:"%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
 
-   if ((rc != 0) || ((regs->eax & 0xffff) == 0xffff) || (regs->eax == eax)) {
+   if (rc != 0 || (regs->eax & 0xffff) == 0xffff || regs->eax == eax)
return -EINVAL;
-   }
 
return 0;
 }
@@ -147,15 +146,9 @@ static int i8k_smm(SMMRegisters * regs)
  */
 static int i8k_get_bios_version(void)
 {
-   SMMRegisters regs = { 0, 0, 0, 0, 0, 0 };
-   int rc;
-
-   regs.eax = I8K_SMM_BIOS_VERSION;
-   if ((rc = i8k_smm(&regs)) < 0) {
-   return rc;
-   }
+   struct smm_regs regs = { .eax = I8K_SMM_BIOS_VERSION, };
 
-   return regs.eax;
+   return i8k_smm(&regs) ? : regs.eax;
 }
 
 /*
@@ -163,13 +156,11 @@ static int i8k_get_bios_version(void)
  */
 static int i8k_get_fn_status(void)
 {
-   SMMRegisters regs = { 0, 0, 0, 0, 0, 0 };
+   struct smm_regs regs = { .eax = I8K_SMM_FN_STATUS, };
int rc;
 
-   regs.eax = I8K_SMM_FN_STATUS;
-   if ((rc = i8k_smm(&regs)) < 0) {
+   if ((rc = i8k_smm(&regs)) < 0)
return rc;
-   }
 
switch ((regs.eax >> I8K_FN_SHIFT) & I8K_FN_MASK) {
case I8K_FN_UP:
@@ -188,20 +179,13 @@ static int i8k_get_fn_status(void)
  */
 static int i8k_get_power_status(void)
 {
-   SMMRegisters regs = { 0, 0, 0, 0, 0, 0 };
+   struct smm_regs regs = { .eax = I8K_SMM_POWER_STATUS, };
int rc;
 
-   regs.eax = I8K_SMM_POWER_STATUS;
-   if ((rc = i8k_smm(&regs)) < 0) {
+   if ((rc = i8k_smm(&regs)) < 0)
return rc;
-   }
 
-   switch (regs.eax & 0xff) {
-   case I8K_POWER_AC:
-   return I8K_AC;
-   default:
-   return I8K_BATTERY;
-   }
+   return (regs.eax & 0xff) == I8K_POWER_AC ? I8K_AC : I8K_BATTERY;
 }
 
 /*
@@ -209,16 +193,10 @@ static int i8k_get_power_status(void)
  */
 static int i8k_get_fan_status(int fan)
 {
-   SMMRegisters regs = { 0, 0, 0, 0, 0, 0 };
-   int rc;
+   struct smm_regs regs = { .eax = I8K_SMM_GET_FAN, };
 
-   regs.eax = I8K_SMM_GET_FAN;
regs.ebx = fan & 0xff;
-   if ((rc = i8k_smm(&regs)) < 0) {
-   return rc;
-   }
-
-   return (regs.eax & 0xff);
+   return i8k_smm(&regs) ? : regs.eax & 0xff;
 }
 
 /*
@@ -226,16 +204,10 @@ static int i8k_get_fan_status(int fan)
  */
 static int i8k_get_fan_speed(int fan)
 {
-   SMMRegisters regs = { 0, 0, 0, 0, 0, 0 };
-   int rc;
+   struct smm_regs regs = { .eax = I8K_SMM_GET_SPEED, };
 
-   regs.eax = I8K_SMM_GET_SPEED;
regs.ebx = fan & 0xff;
-   if ((rc = i8k_smm(&regs)) < 0) {
-   return rc;
-   }
-
-   return (regs.eax & 0xffff) * I8K_FAN_MULT;
+   return i8k_smm(&regs) ? : (regs.eax & 0xffff) * I8K_FAN_MULT;
 }
 
 /*
@@ -243,18 +215,12 @@ static int i8k_get_fan_speed(int 

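The converted getters in patch 4/5 return through GCC's `a ? : b` extension, which yields a when it is nonzero and b otherwise. Since i8k_smm() returns 0 or -EINVAL, the left operand has to be the call itself for the error code to propagate; with a comparison such as `i8k_smm(&regs) < 0` as the left operand, a failure evaluates to 1, which callers cannot tell apart from a valid reading. A portable C sketch of both spellings; fake_smm(), rc_or_value() and cmp_or_value() are hypothetical names.

```c
#include <errno.h>

/* Hypothetical stand-in for i8k_smm(): 0 on success, -EINVAL on failure. */
static int fake_smm(int fail)
{
	return fail ? -EINVAL : 0;
}

/* Portable spelling of GCC's "rc ? : value": rc if nonzero, else value. */
static int rc_or_value(int rc, int value)
{
	return rc ? rc : value;
}

/* Portable spelling of "(rc < 0) ? : value": yields the comparison, 1. */
static int cmp_or_value(int rc, int value)
{
	int cond = rc < 0;

	return cond ? cond : value;
}
```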
[PATCH 2/5] I8K - use standard DMI functions

2005-02-23 Thread Dmitry Torokhov
===

I8K: Change to use stock dmi infrastructure instead of homegrown
 parsing code. The driver now requires the box's DMI data to match
 the list of supported models, so the driver can be safely compiled in
 by default without fear of it poking into random SMM BIOS
 code. DMI checks can be ignored with the i8k.ignore_dmi option.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>

 Documentation/kernel-parameters.txt |3 
 arch/i386/kernel/dmi_scan.c |1 
 drivers/char/i8k.c  |  304 ++--
 include/linux/dmi.h |1 
 4 files changed, 60 insertions(+), 249 deletions(-)

Index: dtor/arch/i386/kernel/dmi_scan.c
===
--- dtor.orig/arch/i386/kernel/dmi_scan.c
+++ dtor/arch/i386/kernel/dmi_scan.c
@@ -416,6 +416,7 @@ static void __init dmi_decode(struct dmi
dmi_save_ident(dm, DMI_PRODUCT_VERSION, 6);
dmi_printk(("Serial Number: %s\n",
dmi_string(dm, data[7])));
+   dmi_save_ident(dm, DMI_PRODUCT_SERIAL, 7);
break;
case 2:
dmi_printk(("Board Vendor: %s\n",
Index: dtor/include/linux/dmi.h
===
--- dtor.orig/include/linux/dmi.h
+++ dtor/include/linux/dmi.h
@@ -9,6 +9,7 @@ enum dmi_field {
DMI_SYS_VENDOR,
DMI_PRODUCT_NAME,
DMI_PRODUCT_VERSION,
+   DMI_PRODUCT_SERIAL,
DMI_BOARD_VENDOR,
DMI_BOARD_NAME,
DMI_BOARD_VERSION,
Index: dtor/drivers/char/i8k.c
===
--- dtor.orig/drivers/char/i8k.c
+++ dtor/drivers/char/i8k.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -52,18 +52,7 @@
 
 #define I8K_TEMPERATURE_BUG 1
 
-#define DELL_SIGNATURE "Dell Computer"
-
-static char *supported_models[] = {
-   "Inspiron",
-   "Latitude",
-   NULL
-};
-
-static char system_vendor[48] = "?";
-static char product_name[48] = "?";
-static char bios_version[4] = "?";
-static char serial_number[16] = "?";
+static char bios_version[4];
 
 MODULE_AUTHOR("Massimo Dal Zotto ([EMAIL PROTECTED])");
 MODULE_DESCRIPTION("Driver for accessing SMM BIOS on Dell laptops");
@@ -73,6 +62,10 @@ static int force;
 module_param(force, bool, 0);
 MODULE_PARM_DESC(force, "Force loading without checking for supported models");
 
+static int ignore_dmi;
+module_param(ignore_dmi, bool, 0);
+MODULE_PARM_DESC(ignore_dmi, "Continue probing hardware even if DMI data does not match");
+
 static int restricted;
 module_param(restricted, bool, 0);
 MODULE_PARM_DESC(restricted, "Allow fan control if SYS_ADMIN capability set");
@@ -99,11 +92,10 @@ typedef struct {
unsigned int edi __attribute__ ((packed));
 } SMMRegisters;
 
-typedef struct {
-   u8 type;
-   u8 length;
-   u16 handle;
-} DMIHeader;
+static inline char *i8k_get_dmi_data(int field)
+{
+   return dmi_get_system_info(field) ? : "N/A";
+}
 
 /*
  * Call the System Management Mode BIOS. Code provided by Jonathan Buzzard.
@@ -163,15 +155,6 @@ static int i8k_get_bios_version(void)
 }
 
 /*
- * Read the machine id.
- */
-static int i8k_get_serial_number(unsigned char *buff)
-{
-   strlcpy(buff, serial_number, sizeof(serial_number));
-   return 0;
-}
-
-/*
  * Read the Fn key status.
  */
 static int i8k_get_fn_status(void)
@@ -328,7 +311,7 @@ static int i8k_get_dell_signature(void)
 static int i8k_ioctl(struct inode *ip, struct file *fp, unsigned int cmd,
 unsigned long arg)
 {
-   int val;
+   int val = 0;
int speed;
unsigned char buff[16];
int __user *argp = (int __user *)arg;
@@ -343,7 +326,7 @@ static int i8k_ioctl(struct inode *ip, s
 
case I8K_MACHINE_ID:
memset(buff, 0, 16);
-   val = i8k_get_serial_number(buff);
+   strlcpy(buff, i8k_get_dmi_data(DMI_PRODUCT_SERIAL), sizeof(buff));
break;
 
case I8K_FN_STATUS:
@@ -451,10 +434,10 @@ static int i8k_get_info(char *buffer, ch
n = sprintf(buffer, "%s %s %s %d %d %d %d %d %d %d\n",
I8K_PROC_FMT,
bios_version,
-   serial_number,
+   dmi_get_system_info(DMI_PRODUCT_SERIAL) ? : "N/A",
cpu_temp,
-   left_fan,
-   right_fan, left_speed, right_speed, ac_power, fn_key);
+   left_fan, right_fan, left_speed, right_speed,
+   ac_power, fn_key);
 
return n;
 }
@@ -486,201 +469,23 @@ static ssize_t i8k_read(struct file *f, 
return len;
 }
 
-static char *__init string_trim(char *s, int size)
-{
-   int len;
-   char *p;
-
- 

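Patch 2/5 replaces the removed string comparisons against DELL_SIGNATURE and supported_models[] with the kernel's DMI match tables, which (in dmi_check_system() of this era, as far as can be told) match each listed field by substring with strstr(). A user-space model under that assumption: struct dmi_rule, rules[], dmi_match() and or_na() are hypothetical, the table is built from the removed "Dell Computer"/"Inspiron"/"Latitude" strings rather than copied from the real i8k_dmi_table, and or_na() mirrors the `? : "N/A"` fallback in i8k_get_dmi_data().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one DMI match-table entry. */
struct dmi_rule {
	const char *ident;
	const char *vendor_substr;
	const char *product_substr;
};

static const struct dmi_rule rules[] = {
	{ "Dell Inspiron", "Dell Computer", "Inspiron" },
	{ "Dell Latitude", "Dell Computer", "Latitude" },
};

/* Fallback for a missing DMI string, like i8k_get_dmi_data(). */
static const char *or_na(const char *s)
{
	return s ? s : "N/A";
}

/* Returns the ident of the first rule whose substrings all occur in the
 * given fields, or NULL. A missing field never matches. */
static const char *dmi_match(const char *vendor, const char *product)
{
	size_t i;

	if (!vendor || !product)
		return NULL;
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (strstr(vendor, rules[i].vendor_substr) &&
		    strstr(product, rules[i].product_substr))
			return rules[i].ident;
	return NULL;
}
```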
[PATCH 3/5] I8K - switch to seq_file

2005-02-23 Thread Dmitry Torokhov
===

I8K: Change proc code to use seq_file.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>


 i8k.c |   64 ++--
 1 files changed, 22 insertions(+), 42 deletions(-)

Index: dtor/drivers/char/i8k.c
===
--- dtor.orig/drivers/char/i8k.c
+++ dtor/drivers/char/i8k.c
@@ -20,13 +20,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include 
 
-#define I8K_VERSION "1.13 14/05/2002"
+#define I8K_VERSION "1.14 21/02/2005"
 
 #define I8K_SMM_FN_STATUS  0x0025
 #define I8K_SMM_POWER_STATUS   0x0069
@@ -74,13 +75,16 @@ static int power_status;
 module_param(power_status, bool, 0600);
 MODULE_PARM_DESC(power_status, "Report power status in /proc/i8k");
 
-static ssize_t i8k_read(struct file *, char __user *, size_t, loff_t *);
+static int i8k_open_fs(struct inode *inode, struct file *file);
 static int i8k_ioctl(struct inode *, struct file *, unsigned int,
 unsigned long);
 
 static struct file_operations i8k_fops = {
-   .read = i8k_read,
-   .ioctl = i8k_ioctl,
+   .open   = i8k_open_fs,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release = single_release,
+   .ioctl  = i8k_ioctl,
 };
 
 typedef struct {
@@ -400,9 +404,9 @@ static int i8k_ioctl(struct inode *ip, s
 /*
  * Print the information for /proc/i8k.
  */
-static int i8k_get_info(char *buffer, char **start, off_t fpos, int length)
+static int i8k_proc_show(struct seq_file *seq, void *offset)
 {
-   int n, fn_key, cpu_temp, ac_power;
+   int fn_key, cpu_temp, ac_power;
int left_fan, right_fan, left_speed, right_speed;
 
cpu_temp = i8k_get_cpu_temp();   /* 11100 µs */
@@ -431,42 +435,18 @@ static int i8k_get_info(char *buffer, ch
 * 9)  AC power
 * 10) Fn Key status
 */
-   n = sprintf(buffer, "%s %s %s %d %d %d %d %d %d %d\n",
-   I8K_PROC_FMT,
-   bios_version,
-   dmi_get_system_info(DMI_PRODUCT_SERIAL) ? : "N/A",
-   cpu_temp,
-   left_fan, right_fan, left_speed, right_speed,
-   ac_power, fn_key);
-
-   return n;
+   return seq_printf(seq, "%s %s %s %d %d %d %d %d %d %d\n",
+ I8K_PROC_FMT,
+ bios_version,
+ dmi_get_system_info(DMI_PRODUCT_SERIAL) ? : "N/A",
+ cpu_temp,
+ left_fan, right_fan, left_speed, right_speed,
+ ac_power, fn_key);
 }
 
-static ssize_t i8k_read(struct file *f, char __user * buffer, size_t len,
-   loff_t * fpos)
+static int i8k_open_fs(struct inode *inode, struct file *file)
 {
-   int n;
-   char info[128];
-
-   n = i8k_get_info(info, NULL, 0, 128);
-   if (n <= 0) {
-   return n;
-   }
-
-   if (*fpos >= n) {
-   return 0;
-   }
-
-   if ((*fpos + len) >= n) {
-   len = n - *fpos;
-   }
-
-   if (copy_to_user(buffer, info, len) != 0) {
-   return -EFAULT;
-   }
-
-   *fpos += len;
-   return len;
+   return single_open(file, i8k_proc_show, NULL);
 }
 
 static struct dmi_system_id __initdata i8k_dmi_table[] = {
@@ -562,10 +542,10 @@ int __init i8k_init(void)
return -ENODEV;
 
/* Register the proc entry */
-   proc_i8k = create_proc_info_entry("i8k", 0, NULL, i8k_get_info);
-   if (!proc_i8k) {
+   proc_i8k = create_proc_entry("i8k", 0, NULL);
+   if (!proc_i8k)
return -ENOENT;
-   }
+
proc_i8k->proc_fops = &i8k_fops;
proc_i8k->owner = THIS_MODULE;
 

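The removed i8k_read() clamps len against *fpos but then passes info, not info + *fpos, to copy_to_user(), so a reader that needs more than one read() call would appear to get the start of the line again. seq_read() with single_open() takes over roughly this bookkeeping. A user-space sketch of offset-aware copying; read_at() is a hypothetical helper and memcpy() stands in for copy_to_user().

```c
#include <string.h>

/*
 * Copy up to len bytes of an already rendered text buffer, starting at
 * *pos, and advance *pos. Returns the number of bytes copied, 0 at the
 * end of the data.
 */
static size_t read_at(const char *text, size_t n, char *out, size_t len,
		      size_t *pos)
{
	if (*pos >= n)
		return 0;
	if (len > n - *pos)
		len = n - *pos;
	memcpy(out, text + *pos, len);	/* copy from the current offset */
	*pos += len;
	return len;
}
```

Reading a 10-byte buffer with a 4-byte destination yields 4, 4, 2 and then 0 bytes, each chunk continuing where the previous one stopped.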

[PATCH 0/5] I8K driver facelift

2005-02-23 Thread Dmitry Torokhov
Hi,

here are some changes that freshen the I8K driver (Dell Inspiron/Latitude
platform driver). The patches have been tested on an Inspiron 8100.

i8k-lindent.patch
- pass the driver through Lindent to comply with CodingStyle requirements
  (TAB instead of 4-space indentation)

i8k-use-dmi.patch
- use standard DMI handling functions instead of homemade ones. The driver
  now requires DMI data to match the list of supported models; this way the
  driver can be safely enabled without fear of it poking into SMM code on the
  wrong box. DMI checks can be ignored with the i8k.ignore_dmi option.

i8k-seqfile.patch
- switch proc handling code to seq_file instead of having a custom read
  function splitting output to fit into the user's buffer.

i8k-cleanup.patch
- use module_{init|exit} instead of old-style module initialization code,
  some formatting changes.

i8k-sysfs.patch
- make i8k a platform device and export the temperature and both fan states
  as sysfs attributes. Writing into the fan1_state and fan2_state attributes
  allows switching fans on and off without the need for a special utility.
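Switching a fan through such an attribute needs nothing beyond open(2) and write(2). A sketch assuming the device shows up as /sys/devices/platform/i8k (inferred from the platform_device_register_simple("i8k", -1, NULL, 0) call in patch 5/5, not stated here); write_fan_state() is a hypothetical helper. It writes the value without a trailing newline, because the store handlers in patch 5/5 reject any characters left after the number.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a fan state to an attribute file such as
 * /sys/devices/platform/i8k/fan1_state (assumed path).
 * Returns 0 if the whole value was written, -1 otherwise.
 */
static int write_fan_state(const char *path, int state)
{
	char buf[16];
	int len = snprintf(buf, sizeof(buf), "%d", state);	/* no '\n' */
	ssize_t written;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	written = write(fd, buf, (size_t)len);
	close(fd);
	return written == len ? 0 : -1;
}
```

Usage would be `write_fan_state("/sys/devices/platform/i8k/fan1_state", 1)`, subject to the `restricted` module parameter requiring CAP_SYS_ADMIN.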

Please consider for inclusion.

Thanks!

-- 
Dmitry


[PATCH 1/5] I8K - pass through Lindent

2005-02-23 Thread Dmitry Torokhov
===

I8K: pass through Lindent to change 4-space indentation to TABs

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>


 i8k.c |  954 +-
 1 files changed, 477 insertions(+), 477 deletions(-)

Index: dtor/drivers/char/i8k.c
===
--- dtor.orig/drivers/char/i8k.c
+++ dtor/drivers/char/i8k.c
@@ -55,14 +55,14 @@
 #define DELL_SIGNATURE "Dell Computer"
 
 static char *supported_models[] = {
-"Inspiron",
-"Latitude",
-NULL
+   "Inspiron",
+   "Latitude",
+   NULL
 };
 
 static char system_vendor[48] = "?";
-static char product_name [48] = "?";
-static char bios_version [4]  = "?";
+static char product_name[48] = "?";
+static char bios_version[4] = "?";
 static char serial_number[16] = "?";
 
 MODULE_AUTHOR("Massimo Dal Zotto ([EMAIL PROTECTED])");
@@ -86,64 +86,63 @@ static int i8k_ioctl(struct inode *, str
 unsigned long);
 
 static struct file_operations i8k_fops = {
-.read  = i8k_read,
-.ioctl = i8k_ioctl,
+   .read = i8k_read,
+   .ioctl = i8k_ioctl,
 };
 
 typedef struct {
-unsigned int eax;
-unsigned int ebx __attribute__ ((packed));
-unsigned int ecx __attribute__ ((packed));
-unsigned int edx __attribute__ ((packed));
-unsigned int esi __attribute__ ((packed));
-unsigned int edi __attribute__ ((packed));
+   unsigned int eax;
+   unsigned int ebx __attribute__ ((packed));
+   unsigned int ecx __attribute__ ((packed));
+   unsigned int edx __attribute__ ((packed));
+   unsigned int esi __attribute__ ((packed));
+   unsigned int edi __attribute__ ((packed));
 } SMMRegisters;
 
 typedef struct {
-u8 type;
-u8 length;
-u16 handle;
+   u8 type;
+   u8 length;
+   u16 handle;
 } DMIHeader;
 
 /*
  * Call the System Management Mode BIOS. Code provided by Jonathan Buzzard.
  */
-static int i8k_smm(SMMRegisters *regs)
+static int i8k_smm(SMMRegisters * regs)
 {
-int rc;
-int eax = regs->eax;
+   int rc;
+   int eax = regs->eax;
 
-asm("pushl %%eax\n\t" \
-   "movl 0(%%eax),%%edx\n\t" \
-   "push %%edx\n\t" \
-   "movl 4(%%eax),%%ebx\n\t" \
-   "movl 8(%%eax),%%ecx\n\t" \
-   "movl 12(%%eax),%%edx\n\t" \
-   "movl 16(%%eax),%%esi\n\t" \
-   "movl 20(%%eax),%%edi\n\t" \
-   "popl %%eax\n\t" \
-   "out %%al,$0xb2\n\t" \
-   "out %%al,$0x84\n\t" \
-   "xchgl %%eax,(%%esp)\n\t"
-   "movl %%ebx,4(%%eax)\n\t" \
-   "movl %%ecx,8(%%eax)\n\t" \
-   "movl %%edx,12(%%eax)\n\t" \
-   "movl %%esi,16(%%eax)\n\t" \
-   "movl %%edi,20(%%eax)\n\t" \
-   "popl %%edx\n\t" \
-   "movl %%edx,0(%%eax)\n\t" \
-   "lahf\n\t" \
-   "shrl $8,%%eax\n\t" \
-   "andl $1,%%eax\n" \
-   : "=a" (rc)
-   : "a" (regs)
-   : "%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
-
-if ((rc != 0) || ((regs->eax & 0xffff) == 0xffff) || (regs->eax == eax)) {
-   return -EINVAL;
-}
+   asm("pushl %%eax\n\t"
+   "movl 0(%%eax),%%edx\n\t"
+   "push %%edx\n\t"
+   "movl 4(%%eax),%%ebx\n\t"
+   "movl 8(%%eax),%%ecx\n\t"
+   "movl 12(%%eax),%%edx\n\t"
+   "movl 16(%%eax),%%esi\n\t"
+   "movl 20(%%eax),%%edi\n\t"
+   "popl %%eax\n\t"
+   "out %%al,$0xb2\n\t"
+   "out %%al,$0x84\n\t"
+   "xchgl %%eax,(%%esp)\n\t"
+   "movl %%ebx,4(%%eax)\n\t"
+   "movl %%ecx,8(%%eax)\n\t"
+   "movl %%edx,12(%%eax)\n\t"
+   "movl %%esi,16(%%eax)\n\t"
+   "movl %%edi,20(%%eax)\n\t"
+   "popl %%edx\n\t"
+   "movl %%edx,0(%%eax)\n\t"
+   "lahf\n\t"
+   "shrl $8,%%eax\n\t"
+   "andl $1,%%eax\n":"=a"(rc)
+   :"a"(regs)
+   :"%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
 
-return 0;
+   if ((rc != 0) || ((regs->eax & 0xffff) == 0xffff) || (regs->eax == eax)) {
+   return -EINVAL;
+   }
+
+   return 0;
 }
 
 /*
@@ -152,15 +151,15 @@ static int i8k_smm(SMMRegisters *regs)
  */
 static int i8k_get_bios_version(void)
 {
-SMMRegisters regs = { 0, 0, 0, 0, 0, 0 };
-int rc;
+   SMMRegisters regs = { 0, 0, 0, 0, 0, 0 };
+   int rc;
 
-regs.eax = I8K_SMM_BIOS_VERSION;
-if ((rc=i8k_smm(&regs)) < 0) {
-   return rc;
-}
+   regs.eax = I8K_SMM_BIOS_VERSION;
+   if ((rc = i8k_smm(&regs)) < 0) {
+   return rc;
+   }
 
-return regs.eax;
+   return regs.eax;
 }
 
 /*
@@ -168,8 +167,8 @@ static int i8k_get_bios_version(void)
  */
 static int i8k_get_serial_number(unsigned char *buff)
 {
-strlcpy(buff, serial_number, sizeof(serial_number));
-return 0;
+   strlcpy(buff, serial_number, sizeof(serial_number));
+   return 0;
 }
 
 /*
@@ -177,24 

A Proposal for an MMU abstraction layer

2005-02-23 Thread Christoph Lameter
1. Rationale


Currently the Linux kernel implements a hierarchical page table utilizing 4
layers. On architectures that have fewer layers, the kernel can avoid
generating code for the missing layers. However, there are other ways for an
MMU to describe page tables to the system. For example, the Itanium (and other
CPUs) support hashed page table structures or linear page tables. IA64 has
to simulate the hierarchical layers through its linear page tables and
implements the higher layers in software.

Moreover, different architectures have different means of implementing
huge page table entries. On IA32 this is realized by omitting the lower
layer entries and providing a single PMD entry that replaces 512/1024 PTE
entries. On IA64 a PTE entry is used for that purpose. Other architectures
realize huge page table entries through groups of PTE entries. There are
hooks for each of these methods in the kernel. Moreover, huge pages are not
handled like other pages; they are managed through a file system, and only
one huge page size is supported. It would be much better if huge pages were
handled more like regular pages and if multiple page sizes were supported
(which may then lead to support for variable page sizes in the VM).
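The 512/1024 figures follow from IA32 page-table geometry with 4 KiB base pages: a page of PTEs holds 1024 four-byte entries without PAE and 512 eight-byte entries with PAE, so a single PMD-level entry that stands in for a full PTE page maps:

```latex
\begin{aligned}
\text{IA32 without PAE:}\quad & 1024 \times 4\,\text{KiB} = 4\,\text{MiB} \\
\text{IA32 with PAE:}\quad & 512 \times 4\,\text{KiB} = 2\,\text{MiB}
\end{aligned}
```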

It would be best to hide these implementation differences in an MMU
abstraction layer. Various architectures could then implement their own
way of representing page table entries. We would provide legacy 4-layer,
3-layer and 2-layer implementations that would take care of the existing
implementations. These generic implementations could then be taken by an
architecture and amended to provide huge page table entries in a way
fitting for that architecture. For IA64 and other platforms that allow
alternate ways of maintaining translations, we could avoid maintaining a
hierarchical table.

There are a couple of additional features for page tables that then could
also be worked into that abstraction layer:

A. Global translation entries.
B. Variable page size.
C. Use a transactional scheme to allow a variety of synchronization
schemes.

Early idea for an mmu abstraction layer API
===

Three new opaque types:

mmu_entry_t
mmu_translation_set_t
mmu_transaction_t

*mmu_entry_t* replaces the existing pte_t and has roughly the same features.
However, mmu_entry_t describes a translation of a logical address to a
physical address in general. This means that the mmu_entry_t must be able
to represent all possible mappings including mappings for huge pages and
pages of various sizes if these features are supported by the method of
handling page tables. If statistics need to be kept about entries then this
entry will also contain a number to indicate what counter to update when
inserting or deleting this type of entry [spare bits may be used for this
purpose]

*mmu_translation_set_t* represents a virtual address space for a process and is
essentially a set of mmu_entry_t's plus any additional management information
needed to manage an address space.

*mmu_transaction_t* allows transactions to be performed on translation entries
and maintains the state of a transaction. The state information allows changes
to be undone, or to be committed in a way that must appear atomic to any other
access in the system.

Operations on mmu_translation_set_t
---

void mmu_new_translation_set(struct mmu_translation_set_t *t);
Generates an empty translation set

void mmu_dup_translation_set(struct mmu_translation_set_t *dest, struct mmu_translation_set *src);
Generates a duplicate of a translation set

void mmu_remove_translation_set(struct mmu_translation_set *t);
Removes a translation set

void mmu_clear_range(struct mmu_translation_set_t *t, unsigned long start, unsigned long end);
Wipe out a range of addresses in the translation set

void mmu_copy_range(struct mmu_translation_set *dest, struct mmu_translation_set_t *src,
unsigned long dest_start, unsigned long src_start, unsigned long length);
Copy a range of translations from src to dest

These functions are not implemented for the period in which old and new
schemes are coexisting since this would require a major change to mm_struct.

Transactional operations


void mmu_transaction(struct mmu_transaction_t *ta, struct mmu_translation_set_t *tr);
Begin a transaction

For the coexistence period this is implemented as

mmu_transaction(struct mmu_transaction_t *ta, struct mm_struct *mm,
struct vm_area_struct *);

void mmu_commit(struct mmu_transaction_t);
Commit changes done

void mmu_forget(struct mmu_transaction_t);
Undo the changes made

struct mmu_entry_t mmu_find(struct mmu_transaction_t *ta, unsigned long address);
Find mmu entry and make this the current entry

void mmu_update(struct mmu_transaction_t *ta, mmu_entry_t entry);
Update the current entry

void mmu_add(struct mmu_transaction_t *ta, mmu_entry_t 

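To make the transaction idea concrete, a toy user-space model under loud assumptions: the translation set is a flat array indexed by virtual page number, updates are applied in place and recorded in an undo log, and forget replays the log backwards. Every toy_* name is hypothetical and only loosely mirrors mmu_transaction/mmu_find/mmu_update/mmu_commit/mmu_forget; the sketch makes no attempt at the atomic visibility to other accessors that the proposal requires.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES   16
#define TOY_MAX_LOG 8

typedef unsigned long toy_entry_t;	/* 0 means "not mapped" */

struct toy_set {
	toy_entry_t entry[TOY_PAGES];
};

struct toy_txn {
	struct toy_set *set;
	size_t current;			/* index chosen by toy_find() */
	size_t nlog;
	struct { size_t idx; toy_entry_t old; } undo[TOY_MAX_LOG];
};

static void toy_begin(struct toy_txn *ta, struct toy_set *set)
{
	ta->set = set;
	ta->current = 0;
	ta->nlog = 0;
}

/* Look up an entry and make it current; out-of-range returns 0 and
 * leaves the current entry unchanged. */
static toy_entry_t toy_find(struct toy_txn *ta, size_t vpn)
{
	if (vpn >= TOY_PAGES)
		return 0;
	ta->current = vpn;
	return ta->set->entry[vpn];
}

/* Update the current entry, remembering its previous value. */
static bool toy_update(struct toy_txn *ta, toy_entry_t e)
{
	if (ta->nlog == TOY_MAX_LOG)
		return false;
	ta->undo[ta->nlog].idx = ta->current;
	ta->undo[ta->nlog].old = ta->set->entry[ta->current];
	ta->nlog++;
	ta->set->entry[ta->current] = e;
	return true;
}

/* Keep the changes: simply drop the undo log. */
static void toy_commit(struct toy_txn *ta)
{
	ta->nlog = 0;
}

/* Undo in reverse order so repeated updates to one entry unwind correctly. */
static void toy_forget(struct toy_txn *ta)
{
	while (ta->nlog > 0) {
		ta->nlog--;
		ta->set->entry[ta->undo[ta->nlog].idx] = ta->undo[ta->nlog].old;
	}
}
```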
Re: [PATCH 2/2] page table iterators

2005-02-23 Thread Nick Piggin
On Thu, 2005-02-24 at 05:12 +, Hugh Dickins wrote:
> On Thu, 24 Feb 2005, Nick Piggin wrote:

> > OK after sleeping on it, I'm warming to your way.
> > 
> > I don't think it makes something like David's modifications any
> > easier, but mine didn't go a long way to that end either. And
> > being a more incremental approach gives us more room to move in
> > future (for example, maybe toward something that really *will*
> > accommodate the bitmap walking code nicely).
> 
> I'll take a quick look at David's today.
> Just so long as we don't make them harder.
> 

No, I think we may want to move to something better abstracted:
it makes things sufficiently complex that you wouldn't want to
have it open coded everywhere.

But no, you're not making it harder than the present situation.

> > So I'd be pretty happy for you to queue this up with Andrew for
> > 2.6.12. Anyone else?
> 
> Oh, okay, thanks.  You weren't very happy with p??_limit(addr, end),
> and good naming is important to me.  I didn't care for your tentative
> p??_span or p??_span_end.  Would p??_end be better?  p??_enda would
> be fun for one of them...
> 

pud_addr_end?





Re: [PATCH 2/2] page table iterators

2005-02-23 Thread Hugh Dickins
On Thu, 24 Feb 2005, Nick Piggin wrote:
> Hugh Dickins wrote:
> 
> > I'm inlining pmd and pud levels, but not pte and pgd levels.
> 
> OK - that's probably sufficient for debugging. There is only so
> much that can go wrong in the middle levels... 

Yes, that was my thinking.

> how does it look
> performance wise? (I can give it a test when it gets split out)

Yesterday shattered in various directions, I hope to try today.

> > One point worth making, I do believe throughout that whatever the
> > address layout, "end" cannot be 0 - BUG_ON(addr >= end) assures.

Of course, that does allow some simplifications in your for_each
macros; but it still looked like my p??_limits were better for
shortest codepath, and close to yours for codesize.

> OK after sleeping on it, I'm warming to your way.
> 
> I don't think it makes something like David's modifications any
> easier, but mine didn't go a long way to that end either. And
> being a more incremental approach gives us more room to move in
> future (for example, maybe toward something that really *will*
> accommodate the bitmap walking code nicely).

I'll take a quick look at David's today.
Just so long as we don't make them harder.

> So I'd be pretty happy for you to queue this up with Andrew for
> 2.6.12. Anyone else?

Oh, okay, thanks.  You weren't very happy with p??_limit(addr, end),
and good naming is important to me.  I didn't care for your tentative
p??_span or p??_span_end.  Would p??_end be better?  p??_enda would
be fun for one of them...

Hugh


Re: kernel BUG at mm/rmap.c:483!

2005-02-23 Thread Hugh Dickins
On Wed, 23 Feb 2005, Ammar T. Al-Sayegh wrote:
> - Original Message - From: "Hugh Dickins" <[EMAIL PROTECTED]>
> > though quite possibly you cannot afford
> > such experiments on this server, and will revert to 2.4 for now.
> 
> The problem is that my server is already in production
> mode. I'm running a great portion of my business on it,
> where there is very little tolerance for downtime.

I feared as much.

> Because the server is located in a remote datacenter,
> every time it goes down it takes several hours to have
> someone sent up there to manually reboot it for a hefty
> emergency fee. So this bug has already cost me a lot of
> money, and I'm worried that it will cost me a lot of my
> clients as well if it persists.

I'm very sorry for that.

> Remote hands are rather expensive, so it will cost me
> $100/hr to have someone run memtest86 on my server,
> since I can't run it remotely. I'll do it, though,
> since that's your recommendation for the time being.
> I hope it will not take more than an hour to run the
> test, and I hope it turns out to be bad memory modules as
> you expect, because I'd hate to downgrade after all the
> time and money I spent on the upgrade.

One hour is worth a shot, and will be enough if it does find a problem
in that time; but if it finds nothing, one hour is not enough to give
confidence in the memory: 12 hours would be better.  I actually wonder
whether rmap.c:483 is the best memory tester (serious answer
would be, in some cases yes, but not in all).

Do let me know.  If I can find time to rejig the debug patch
against your kernel, it would itself keep your server running,
replacing the BUG_ON by printks and safety.  But without knowing
what it will report, I can't judge how satisfactory that would
be (and it's unlikely to lead us to the final answer in one go).

Hugh


Re: Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Chad N. Tindel
> But the other side of the coin is that a SCHED_FIFO userspace task
> presumably has extreme latency requirements, so it doesn't *want* to be
> preempted by some routine kernel operation.  People would get irritated if
> we were to do that.

Just to follow up a bit.  People writing apps that run at SCHED_FIFO know
that they aren't getting hard real-time, and they are OK with that.  If they
wanted something more they'd run on RTLinux.  Why would it be wrong to preempt
the SCHED_FIFO process in this case, assuming that it is too hard to fix a broken
design that doesn't allow the necessary kernel threads to run on any CPU?

Chad


Re: Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Chad N. Tindel
> `xterm' is waiting for the other CPU to schedule a kernel thread (which is
> bound to that CPU).  Once that kernel thread has done a little bit of work,
> `xterm' can terminate.
> 
> But kernel threads don't run with realtime policy, so your userspace app
> has permanently starved that kernel thread.
> 
> It's potentially quite a problem, really.  For example it could prevent
> various tty operations from completing, it will prevent kjournald from ever
> writing back anything (on uniprocessor, etc).  I've been waiting for
> someone to complain ;)
> 
> But the other side of the coin is that a SCHED_FIFO userspace task
> presumably has extreme latency requirements, so it doesn't *want* to be
> preempted by some routine kernel operation.  People would get irritated if
> we were to do that.
> 
> So what to do?

It shouldn't need to preempt the kernel operation.  Why is the design such that
the necessary kernel thread can't run on the other CPU?
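
To reproduce the starvation described above on a test machine, a task can pin itself to one CPU and switch to SCHED_FIFO; if it then spins without sleeping, an ordinary-priority kernel thread bound to that CPU never gets to run. A minimal sketch using the standard sched_getaffinity(), sched_setaffinity() and sched_setscheduler() calls; pin_and_make_fifo() is a hypothetical helper name, and it deliberately stops short of the busy loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Pin the caller to the first CPU it may use and make it SCHED_FIFO at the
 * lowest FIFO priority.  Returns the CPU used, or -errno (typically -EPERM
 * without the privilege to use realtime policies). */
static int pin_and_make_fifo(void)
{
	cpu_set_t allowed, one;
	struct sched_param sp;
	int cpu;

	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -errno;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &allowed))
			break;
	assert(cpu < CPU_SETSIZE);	/* we are running, so some CPU is allowed */

	CPU_ZERO(&one);
	CPU_SET(cpu, &one);
	if (sched_setaffinity(0, sizeof(one), &one) != 0)
		return -errno;

	sp.sched_priority = sched_get_priority_min(SCHED_FIFO);
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		return -errno;
	return cpu;	/* a busy loop from here on would starve that CPU's kernel threads */
}
```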

Chad


Re: [PATCH 2.6.11+ sata_qstor] libata: sata_qstor cosmetic fixes

2005-02-23 Thread Jeff Garzik
Mark Lord wrote:
> Minor patch for new 2.6.xx sata_qstor driver attached,
> as per Alexey's fine-toothed comb!  :)
> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>

I had to apply this manually, since your mailer "corrupts" the patch by 
encoding text/plain as base64.  Please fix your mailer...

The ideal is an inline patch, rather than an attachment, anyway.  e.g.

    To: ...
    From: ...
    Subject: ...

    Patch description
    Patch

cat'd to 'sendmail -t'.  Sendmail (or another MTA which provides a 
/usr/sbin/sendmail wrapper) will automatically fill in other headers 
like Message-ID and Date.
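
The message layout above (a few headers, a blank line, the description, then the patch as plain text) can be produced with a small helper. This is a sketch only; format_patch_mail() is a hypothetical name, and it refuses a subject containing a newline so the text cannot inject extra headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an inline-patch mail into buf.  Returns snprintf()'s result
 * (>= len means the buffer was too small), or -1 for a multi-line subject. */
static int format_patch_mail(char *buf, size_t len, const char *to,
			     const char *from, const char *subject,
			     const char *description, const char *patch)
{
	assert(buf && to && from && subject && description && patch);
	if (strchr(subject, '\n') != NULL)
		return -1;
	return snprintf(buf, len,
			"To: %s\n"
			"From: %s\n"
			"Subject: %s\n"
			"\n"		/* end of headers */
			"%s\n"
			"\n"
			"%s",		/* the patch itself, inline, not base64 */
			to, from, subject, description, patch);
}
```

The result can be written to a file and cat'd to 'sendmail -t' as described, or written to a pipe opened with popen("/usr/sbin/sendmail -t", "w"); sendmail -t takes the recipients from the To: header.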

Jeff



module insert question

2005-02-23 Thread Anil Kumar
Hi,

Can you please let me know which files the OS looks at when loading
modules?
I see the following messages during boot (or rather, during installation):
==
Finished bus probing
modules to insert tg3 aic79xx
==

Which files does the OS look at to load tg3 and aic79xx after it
finishes bus probing? I guess modprobe.conf, modules.alias and
modules.pcimap.
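
For reference, the depmod-generated index files mentioned (modules.dep, modules.alias, and modules.pcimap on versions that still generate it) live per kernel under /lib/modules/<release>/, where <release> is what uname -r prints; modprobe's own configuration such as /etc/modprobe.conf is separate. An installer may well use its own module tables, so this sketch (module_index_path() is a hypothetical helper) only shows where the standard files are.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Build /lib/modules/<running kernel release>/<name>.
 * Returns snprintf()'s result, or -1 on error. */
static int module_index_path(char *buf, size_t len, const char *name)
{
	struct utsname u;

	assert(buf != NULL && name != NULL);
	if (strchr(name, '/') != NULL)	/* a bare file name only */
		return -1;
	if (uname(&u) != 0)
		return -1;
	return snprintf(buf, len, "/lib/modules/%s/%s", u.release, name);
}
```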

with regards,
Anil


about printk

2005-02-23 Thread mike
Dear all,
I am new here, so please correct me if I am wrong.
Before console_init(), printk() just fills up the printk buffer.
After console_init(), will messages be printed out immediately?
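
The buffering being asked about can be illustrated with a toy userspace model (not the real kernel/printk.c): every message goes into a log buffer; when a console is registered, the backlog is replayed to it, and later messages are passed straight through. All names here are hypothetical, and in the real kernel output can still be deferred (for example while the console lock is held) or filtered by log level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_BUF_LEN 256

static char log_buf[LOG_BUF_LEN];
static size_t log_end;		/* bytes stored in log_buf */
static size_t con_start;	/* bytes already sent to the console */
static void (*console_write)(const char *s, size_t n);

static void flush_to_console(void)
{
	if (console_write && con_start < log_end) {
		console_write(log_buf + con_start, log_end - con_start);
		con_start = log_end;
	}
}

static void toy_printk(const char *msg)
{
	size_t n = strlen(msg);

	if (n > LOG_BUF_LEN - log_end)	/* toy: drop what does not fit */
		n = LOG_BUF_LEN - log_end;
	memcpy(log_buf + log_end, msg, n);
	log_end += n;
	flush_to_console();		/* no-op until a console exists */
}

static void toy_register_console(void (*write)(const char *, size_t))
{
	assert(write != NULL);
	console_write = write;
	flush_to_console();		/* replay messages logged earlier */
}

/* Example console: appends everything it receives to a capture buffer. */
static char captured[LOG_BUF_LEN + 1];
static size_t captured_len;

static void capture_console(const char *s, size_t n)
{
	memcpy(captured + captured_len, s, n);
	captured_len += n;
	captured[captured_len] = '\0';
}
```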

Best regards
Mike,Lee


Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Hugh Dickins
On Wed, 23 Feb 2005, Lee Revell wrote:
> On Wed, 2005-02-23 at 20:53 +, Hugh Dickins wrote:
> > On Wed, 23 Feb 2005, Hugh Dickins wrote:
> > > Please replace by new patch below, which I'm now running through lmbench.
> > 
> > That second patch seems fine, and I see no lmbench regression from it.
> 
> Should go into 2.6.11, right?

That's up to Andrew (and Linus).

I was thinking that way when I rushed you the patch.  But given that
you have remaining unresolved latency issues nearby (zap_pte_range,
clear_page_range), and given the warning shot that I screwed up my
first attempt, I'd be inclined to say hold off.

It's a pity: for a while we were thinking 2.6.11 would be a big step
forward for mainline latency; but it now looks to me like these tests
have come too late in the cycle to be dealt with safely.

In other mail, you do expect people still to be using Ingo's patches,
so probably this patch should stick there (and in -mm) for now.

Hugh


[2/14] Orinoco driver updates - update printk()s

2005-02-23 Thread David Gibson
Reformats printk()s, comments, labels and other cosmetic strings in
the orinoco driver.  Also moves, removes, and adds ratelimiting in
some places.  Behavioural changes are trivial/cosmetic only.  This
reduces the cosmetic/trivial differences between the current kernel
version, and the CVS version of the driver; one small step towards
full merge.
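
The net_ratelimit() calls added in hermes.c keep a removed or misbehaving card from flooding the log. A sketch of a token-bucket limiter of that general kind: each message costs "cost" units of credit, credit accrues with elapsed time up to "burst" messages' worth, and messages arriving with too little credit are counted and dropped. The struct, the function names, and the explicit "now" argument (standing in for jiffies) are hypothetical.

```c
#include <assert.h>

struct ratelimit {
	unsigned long cost;	/* credit one message costs (time units) */
	unsigned long burst;	/* messages allowed back to back */
	unsigned long toks;	/* credit currently available */
	unsigned long last;	/* time of the previous call */
	unsigned long missed;	/* messages suppressed so far */
};

static void ratelimit_init(struct ratelimit *rs, unsigned long cost,
			   unsigned long burst, unsigned long now)
{
	assert(cost > 0 && burst > 0);
	rs->cost = cost;
	rs->burst = burst;
	rs->toks = cost * burst;	/* start with a full burst */
	rs->last = now;
	rs->missed = 0;
}

/* Returns 1 if the caller may print now, 0 if the message should be dropped. */
static int ratelimit_ok(struct ratelimit *rs, unsigned long now)
{
	rs->toks += now - rs->last;	/* credit earned since the last call */
	rs->last = now;
	if (rs->toks > rs->cost * rs->burst)
		rs->toks = rs->cost * rs->burst;
	if (rs->toks >= rs->cost) {
		rs->toks -= rs->cost;
		return 1;
	}
	rs->missed++;
	return 0;
}
```

A caller wraps each message as `if (ratelimit_ok(&rs, now)) printf(...);`, which mirrors the `if (net_ratelimit()) printk(...)` pattern in the patch.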

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/hermes.c
===
--- working-2.6.orig/drivers/net/wireless/hermes.c  2005-02-10 14:47:39.572667480 +1100
+++ working-2.6/drivers/net/wireless/hermes.c   2005-02-10 14:47:41.293405888 +1100
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "hermes.h"
@@ -232,13 +233,16 @@
err = hermes_issue_cmd(hw, cmd, parm0);
if (err) {
if (! hermes_present(hw)) {
-   printk(KERN_WARNING "hermes @ %p: "
-  "Card removed while issuing command.\n",
-  hw->iobase);
+   if (net_ratelimit())
+   printk(KERN_WARNING "hermes @ %p: "
+  "Card removed while issuing command "
+  "0x%04x.\n", hw->iobase, cmd);
err = -ENODEV;
} else 
-   printk(KERN_ERR "hermes @ %p: Error %d issuing command.\n",
-  hw->iobase, err);
+   if (net_ratelimit())
+   printk(KERN_ERR "hermes @ %p: "
+  "Error %d issuing command 0x%04x.\n",
+  hw->iobase, err, cmd);
goto out;
}
 
@@ -251,17 +255,16 @@
}
 
if (! hermes_present(hw)) {
-   printk(KERN_WARNING "hermes @ %p: "
-  "Card removed while waiting for command completion.\n",
-  hw->iobase);
+   printk(KERN_WARNING "hermes @ %p: Card removed "
+  "while waiting for command 0x%04x completion.\n",
+  hw->iobase, cmd);
err = -ENODEV;
goto out;
}

if (! (reg & HERMES_EV_CMD)) {
-   printk(KERN_ERR "hermes @ %p: "
-  "Timeout waiting for command completion.\n",
-  hw->iobase);
+   printk(KERN_ERR "hermes @ %p: Timeout waiting for "
+  "command 0x%04x completion.\n", hw->iobase, cmd);
err = -ETIMEDOUT;
goto out;
}
@@ -481,14 +484,13 @@
*length = rlength;
 
if (rtype != rid)
-   printk(KERN_WARNING "hermes @ %p: "
-  "hermes_read_ltv(): rid  (0x%04x) does not match type (0x%04x)\n",
-  hw->iobase, rid, rtype);
+   printk(KERN_WARNING "hermes @ %p: %s(): "
+  "rid (0x%04x) does not match type (0x%04x)\n",
+  hw->iobase, __FUNCTION__, rid, rtype);
if (HERMES_RECLEN_TO_BYTES(rlength) > bufsize)
printk(KERN_WARNING "hermes @ %p: "
   "Truncating LTV record from %d to %d bytes. "
-  "(rid=0x%04x, len=0x%04x)\n",
-  hw->iobase,
+  "(rid=0x%04x, len=0x%04x)\n", hw->iobase,
   HERMES_RECLEN_TO_BYTES(rlength), bufsize, rid, rlength);
 
nwords = min((unsigned)rlength - 1, bufsize / 2);
Index: working-2.6/drivers/net/wireless/orinoco_pci.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_pci.c 2005-02-10 14:47:39.573667328 +1100
+++ working-2.6/drivers/net/wireless/orinoco_pci.c  2005-02-10 14:47:41.294405736 +1100
@@ -151,24 +151,18 @@
 
/* Assert the reset until the card notice */
hermes_write_regn(hw, PCI_COR, HERMES_PCI_COR_MASK);
-   printk(KERN_NOTICE "Reset done");
timeout = jiffies + (HERMES_PCI_COR_ONT * HZ / 1000);
while(time_before(jiffies, timeout)) {
-   printk(".");
mdelay(1);
}
-   printk(";\n");
//mdelay(HERMES_PCI_COR_ONT);
 
/* Give time for the card to recover from this hard effort */
hermes_write_regn(hw, PCI_COR, 0x);
-   printk(KERN_NOTICE "Clear Reset");
timeout = jiffies + (HERMES_PCI_COR_OFFT * HZ / 1000);
while(time_before(jiffies, timeout)) {
-   printk(".");
mdelay(1);
}
-   printk(";\n");
//mdelay(HERMES_PCI_COR_OFFT);
 
/* The card is ready when it's no longer busy */
@@ -183,7 +177,6 @@
printk(KERN_ERR PFX "Busy timeout\n");
return -ETIMEDOUT;
}
-   

[7/14] Orinoco driver updates - use modern module_parm()

2005-02-23 Thread David Gibson
Add descriptions to module parameters in the orinoco driver, and also
add permissions to allow them to be exported in sysfs.
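
With a non-zero permission such as 0644, a module_param() value shows up as /sys/module/<module>/parameters/<name>; with permission 0 it is not exported there. A small userspace reader, as a sketch (read_module_param() is a hypothetical helper; bool parameters read back as "Y" or "N", which it maps to 1 or 0):

```c
#include <assert.h>
#include <stdio.h>

/* Read an int or bool module parameter from sysfs.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int read_module_param(const char *module, const char *param, int *value)
{
	char path[256], text[32];
	FILE *f;
	int ok;

	assert(module && param && value);
	if (snprintf(path, sizeof(path), "/sys/module/%s/parameters/%s",
		     module, param) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;	/* module not loaded, or parameter not exported */
	ok = fgets(text, sizeof(text), f) != NULL;
	fclose(f);
	if (!ok)
		return -1;
	if (text[0] == 'Y' || text[0] == 'N') {	/* bool parameter */
		*value = (text[0] == 'Y');
		return 0;
	}
	return sscanf(text, "%d", value) == 1 ? 0 : -1;
}
```

For example, read_module_param("orinoco", "suppress_linkstatus", &v) would read the bool parameter from this patch once the module is loaded; with 0644, root can also change it by writing to the same file.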

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-10 13:19:14.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-10 13:24:03.0 +1100
@@ -461,12 +461,14 @@
 /* Level of debugging. Used in the macros in orinoco.h */
 #ifdef ORINOCO_DEBUG
 int orinoco_debug = ORINOCO_DEBUG;
-module_param(orinoco_debug, int, 0);
+module_param(orinoco_debug, int, 0644);
+MODULE_PARM_DESC(orinoco_debug, "Debug level");
 EXPORT_SYMBOL(orinoco_debug);
 #endif
 
 static int suppress_linkstatus; /* = 0 */
-module_param(suppress_linkstatus, bool, 0);
+module_param(suppress_linkstatus, bool, 0644);
+MODULE_PARM_DESC(suppress_linkstatus, "Don't log link status changes");
 
 //
 /* Compile time configuration and compatibility stuff   */


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
| _around_!
http://www.ozlabs.org/people/dgibson


[5/14] Orinoco driver updates - cleanup low-level code

2005-02-23 Thread David Gibson
Apply some cleanups to the low-level orinoco handling code in
hermes.[ch].  This cleans up some error handling code, corrects an
error code to something more accurate, and also increases a timeout
value.  This last can (when the hardware plays up) cause long delays
with spinlocks held, which is bad, but is rather less prone to
prematurely giving up, which has the unfortunate habit of fatally
confusing the hardware in other ways :-/.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/hermes.c
===
--- working-2.6.orig/drivers/net/wireless/hermes.c  2005-01-12 15:22:34.263633584 +1100
+++ working-2.6/drivers/net/wireless/hermes.c   2004-11-05 13:59:07.0 +1100
@@ -383,12 +383,17 @@
reg = hermes_read_reg(hw, oreg);
}
 
-   if (reg & HERMES_OFFSET_BUSY) {
-   return -ETIMEDOUT;
-   }
+   if (reg != offset) {
+   printk(KERN_ERR "hermes @ %p: BAP%d offset %s: "
+  "reg=0x%x id=0x%x offset=0x%x\n", hw->iobase, bap,
+  (reg & HERMES_OFFSET_BUSY) ? "timeout" : "error",
+  reg, id, offset);
+
+   if (reg & HERMES_OFFSET_BUSY) {
+   return -ETIMEDOUT;
+   }
 
-   if (reg & HERMES_OFFSET_ERR) {
-   return -EIO;
+   return -EIO;/* error or wrong offset */
}
 
return 0;
@@ -476,7 +481,7 @@
rlength = hermes_read_reg(hw, dreg);
 
if (! rlength)
-   return -ENOENT;
+   return -ENODATA;
 
rtype = hermes_read_reg(hw, dreg);
 
Index: working-2.6/drivers/net/wireless/hermes.h
===
--- working-2.6.orig/drivers/net/wireless/hermes.h  2005-01-12 11:13:41.0 +1100
+++ working-2.6/drivers/net/wireless/hermes.h   2004-11-05 13:53:55.0 +1100
@@ -340,7 +340,7 @@
 #ifdef __KERNEL__
 
 /* Timeouts */
-#define HERMES_BAP_BUSY_TIMEOUT (500) /* In iterations of ~1us */
+#define HERMES_BAP_BUSY_TIMEOUT (1) /* In iterations of ~1us */
 
 /* Basic control structure */
 typedef struct hermes {



Re: [14/14] Orinoco driver updates - update version and changelog

2005-02-23 Thread Jeff Garzik
applied patches 1-14 to netdev-2.6.  We'll let it sit there for a bit, 
for testing and such.  (netdev-2.6 gets auto-propagated to -mm)

Thanks for your patience and perseverance.
Jeff



Re: ext2/3 files per directory limits

2005-02-23 Thread Andrew Morton
Ron Peterson <[EMAIL PROTECTED]> wrote:
>
> I would like to better understand ext2/3's performance characteristics.
> 
> I'm specifically interested in how ext2/3 will handle a /var/spool/mail
> directory w/ ~6000 mbox format inboxes, handling approx 1GB delivered as
> 75,000 messages daily.  Virtually all access is via imap, w/ approx
> ~1000 imapd processes running during peak load.  Local delivery is via
> procmail, which by default uses both kernel-supported locking calls and
> .lock files.
> 
> I understand that various tuning parameters will have an impact,
> e.g. putting the journal on a separate device, setting the noatime mount
> option, etc.  I also understand that there are other mailbox formats and
> other strategies for locating mail spools (e.g. in user's home
> directories).
> 
> I'm interested in people's thoughts on these issues, but I'm mostly
> interested in whether or not the scenario I described falls within
> ext2/3's designed capabilities.
> 

noatime will help.

increasing the journal size _may_ help.

With 6k files per directory you'll benefit from indexed directories
(htree).  Use `tune2fs -O dir_index'.  dir_index isn't available for ext2.


2.6.11-rc5

2005-02-23 Thread Linus Torvalds


Hey, I hoped -rc4 was the last one, but we had some laptop resource
conflicts, various ppc TLB flush issues, some possible stack overflows in
networking and a number of other details warranting a quick -rc5 before
the final 2.6.11.

This time it's really supposed to be a quickie, so people who can, please 
check it out, and we'll make the real 2.6.11 asap.

Mostly pretty small changes (the largest is a new SATA driver that crept
in, our bad). But worth another quick round.

Linus



Summary of changes from v2.6.11-rc4 to v2.6.11-rc5


:
  o ppc32: Wrong vaddr in flush_hash_one_pte()

:
  o [libata] add ->bmdma_{stop,status} hooks

Alan Stern:
  o USB Hub driver: Add reset recovery-time delay

Andrew Morton:
  o mca resource layout fix
  o end_buffer_async_read printk ratelimiting
  o strip.c build fix
  o alpha: struct resource fix
  o ppc32: resource layout fixes
  o sparc64 rusage build fix
  o sparc64 usb build fix
  o x86_64: resource layout fix

Anton Blanchard:
  o ppc64: Fix 32bit largepage issue

Antonino Daplas:
  o fbdev: Fix gcc 4.0 compile failure

Arjan van de Ven:
  o Allow heap to be marked executable too

Arnaldo Carvalho de Melo:
  o [TCP]: Fix excessive stack usage resulting in OOPS with 4KSTACKS

Art Haas:
  o [SPARC]:Check prom_getproperty() return value in prom_nodematch()

Bartlomiej Zolnierkiewicz:
  o [ide] fix ide_get_error_location() for LBA28

Ben Dooks:
  o [ARM PATCH] 2480/1: IXP4XX - cleanup resource for i2c controller
  o [ARM PATCH] 2481/1: IXP2000 - replace sti/cli with local_irq{save,restore}
  o [ide] Kconfig for VR1000 machine driver selection

Benjamin Herrenschmidt:
  o radeonfb: typos fixes
  o radeonfb: Fix hang on boot with some laptops
  o Fix possible race with 4level-fixup.h
  o Check for wraps in copy_page_range
  o Fix buf in zeromap_pud_range() losing virtual address
  o radeonfb: Workaround memory corruption accel problem
  o ppc32: fix ptep_test_and_clear_young
  o ppc32: kernel mapping breakage

Bjorn Helgaas:
  o de214x.c uses uninitialized pci_dev->irq

Bob Breuer:
  o [SPARC]: Check prom_getproperty return value

Brian Murphy:
  o USB: ehci requeue revisit

Christoph Hellwig:
  o block new writers on frozen filesystems

Corey Minyard:
  o IPMI: Fix LAN bridging

Daniel Ritz:
  o PCI: support PCI_PM_CAP version 1

David Brownell:
  o USB: ehci patch for NF4 port miscounting

David S. Miller:
  o [COMPAT]: TUNSETIFF needs to copy back data after ioctl
  o [SPARC]: Fix cg3 fb blanking
  o [SPARC]: Fix video mode probing in atyfb driver
  o [TG3]: Always check tg3_readphy() return value
  o [TG3]: Update driver version and reldate
  o [SPARC64]: auxio_register is pointer not integer
  o [SPARC64]: Put PROM trampolines into asm file
  o [SPARC64]: Fix access_ok() and friends warnings
  o [SPARC64]: Fix access_ok() args in sys_sparc32.c:get_tv32()
  o [SPARC64]: Use common sys_ipc() compat code
  o [SPARC64]: BUG on rediculious memcpy lengths

Dmitry Torokhov:
  o ALPS: do not activate on unsupported models

François Romieu:
  o dscc4: use of uncompletely initialized struct
  o dscc4: code factorisation
  o dscc4: error status checking and pci janitoring
  o dscc4: removal of unneeded casts
  o dscc4: removal of unneeded variable
  o r8169: endianness fixes
  o r8169: merge of Realtek's code
  o r8169: typo in debugging code
  o r8169: screaming irq when the device is closed
  o r8169: synchronization and balancing when the device is closed
  o r8169: fix rx skb allocation error logging
  o r8169: skb alignment nitpicking
  o r8169: removal of unused #define
  o r8169: uniformize comments
  o r8169: IRQ races during change of mtu
  o r8169: factor out some code

Gary N. Spiess:
  o natsemi long cable fix

Herbert Xu:
  o ISDN locking fix
  o [IPSEC]: Move dst->child loop from dst_ifdown to xfrm_dst_ifdown
  o [NET]: Add netdev argument to dst ifdown

Hideaki Yoshifuji:
  o [IPV6]: Fix IPV6_PKTINFO et al. handling in udpv6_recvmsg()

Hirokazu Takata:
  o m32r: build fix for SMP kernel
  o m32r: fix sys_clone()
  o m32r: defconfig updates
  o m32r: warning fix

Jeff Garzik:
  o [libata sata_via] minor cleanups
  o [libata sata_via] add support for VT6421 SATA
  o [libata] do not call pci_disable_device() for certain errors
  o libata kfree fix
  o [libata] Add missing hooks, to avoid oops in advanced SATA drivers

Joe Korty:
  o memset argument order misuses

John W. Linville:
  o libata: fix command queue leak when xlat_func fails

Krzysztof Helt:
  o [SPARC32]: Need to clear PSR_EF in psr of childregs on fork() on SMP

Len Brown:
  o [ACPI] ACPICA 20050211 from Bob Moore

Lennert Buytenhek:
  o [ARM PATCH] 2485/1: fix enp2611 coexistence with other machine types
  o [ARM PATCH] 2486/1: fix incorrect comment in arch/arm/kernel/debug.S
  o [ARM PATCH] 2487/1: minor IRQ routing tweaks for ENP-2611
  o [ARM PATCH] 2493/1: put IXP2000 slowport in 8-bit mode after boot
  o [ARM PATCH] 

Re: [6/14] Orinoco driver updates - cleanup PCI initialization

2005-02-23 Thread Jeff Garzik
FYI, pci_set_drvdata() needs to be one of the last functions called 
during PCI ->probe().

Jeff



Re: [7/14] Orinoco driver updates - use modern module_parm()

2005-02-23 Thread Jeff Garzik
David Gibson wrote:
> Add descriptions to module parameters in the orinoco driver, and also
> add permissions to allow them to be exported in sysfs.
> Signed-off-by: David Gibson <[EMAIL PROTECTED]>
> Index: working-2.6/drivers/net/wireless/orinoco.c
> ===
> --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-10 13:19:14.0 +1100
> +++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-10 13:24:03.0 +1100
> @@ -461,12 +461,14 @@
>  /* Level of debugging. Used in the macros in orinoco.h */
>  #ifdef ORINOCO_DEBUG
>  int orinoco_debug = ORINOCO_DEBUG;
> -module_param(orinoco_debug, int, 0);
> +module_param(orinoco_debug, int, 0644);
> +MODULE_PARM_DESC(orinoco_debug, "Debug level");
>  EXPORT_SYMBOL(orinoco_debug);
>  #endif

eventually it would be nice to support netif_msg_*
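
Supporting netif_msg_* means replacing a single numeric debug level with a bitmap in which each bit enables one class of messages. A sketch of the mapping: a legacy level N turns on the lowest N classes, out-of-range values fall back to a default, and msg_init() follows the shape of the kernel's netif_msg_init() helper. The MSG_* names are local to this example and are not the kernel's NETIF_MSG_* list.

```c
#include <assert.h>
#include <stdint.h>

enum {
	MSG_DRV   = 1u << 0,	/* hypothetical message classes */
	MSG_PROBE = 1u << 1,
	MSG_LINK  = 1u << 2,
};

/* Map a legacy debug level to a message-enable bitmap. */
static uint32_t msg_init(int debug_value, uint32_t default_bits)
{
	if (debug_value < 0 || debug_value >= 32)
		return default_bits;			/* out of range: use default */
	if (debug_value == 0)
		return 0;				/* no output */
	return (UINT32_C(1) << debug_value) - 1;	/* set the low N bits */
}

/* Test whether one class of messages is enabled. */
static int msg_enabled(uint32_t enable, uint32_t cls)
{
	assert(cls != 0 && (cls & (cls - 1)) == 0);	/* one class at a time */
	return (enable & cls) != 0;
}
```

Under this scheme the existing orinoco_debug module parameter could keep working as the legacy level, fed through msg_init() at probe time.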
Jeff



[8/14] Orinoco driver updates - PCMCIA initialization cleanups

2005-02-23 Thread David Gibson
Clean up the various bits of initialization code for PCMCIA / PC-Card
orinoco devices.  This includes one important bugfix where we could
fail to take the lock in some circumstances.
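
The locking fix in the CS_EVENT_CARD_REMOVAL hunk can be illustrated with a single-threaded toy model that uses a plain int in place of the spinlock. It assumes orinoco_lock() behaves like model_lock() below: it takes the lock, but drops it again and returns -EBUSY when hw_unavailable is already set. Ignoring that return value means the critical section runs unlocked and then "unlocks" a lock that is not held. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

struct model_priv {
	int locked;
	int hw_unavailable;
};

/* Assumed behaviour of orinoco_lock(): refuse while hardware is unavailable. */
static int model_lock(struct model_priv *p)
{
	assert(!p->locked);
	p->locked = 1;
	if (p->hw_unavailable) {
		p->locked = 0;
		return -EBUSY;
	}
	return 0;
}

/* Old shape: return value ignored.  Returns -1 if the section ran unlocked. */
static int removal_old(struct model_priv *p)
{
	int ran_unlocked;

	model_lock(p);
	ran_unlocked = !p->locked;
	p->hw_unavailable++;
	p->locked = 0;		/* "unlock" even if the lock was never held */
	return ran_unlocked ? -1 : 0;
}

/* New shape: take the lock unconditionally, as spin_lock_irqsave() does. */
static int removal_new(struct model_priv *p)
{
	assert(!p->locked);
	p->locked = 1;
	p->hw_unavailable++;	/* netif_device_detach() would also go here */
	p->locked = 0;
	return 0;
}
```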

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco_cs.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_cs.c  2005-02-18 12:04:03.157157240 +1100
+++ working-2.6/drivers/net/wireless/orinoco_cs.c   2005-02-18 12:11:49.0 +1100
@@ -57,8 +57,8 @@
 /* Some D-Link cards have buggy CIS. They do work at 5v properly, but
  * don't have any CIS entry for it. This workaround it... */
 static int ignore_cis_vcc; /* = 0 */
-
 module_param(ignore_cis_vcc, int, 0);
+MODULE_PARM_DESC(ignore_cis_vcc, "Allow voltage mismatch between card and socket");
 
 //
 /* Magic constants */
@@ -128,6 +128,7 @@
if (err)
return err;
 
+   msleep(100);
clear_bit(0, >hard_reset_in_progress);
 
return 0;
@@ -166,9 +167,10 @@
link->priv = dev;
 
/* Interrupt setup */
-   link->irq.Attributes = IRQ_TYPE_EXCLUSIVE;
+   link->irq.Attributes = IRQ_TYPE_EXCLUSIVE | IRQ_HANDLE_PRESENT;
link->irq.IRQInfo1 = IRQ_LEVEL_ID;
-   link->irq.Handler = NULL;
+   link->irq.Handler = orinoco_interrupt;
+   link->irq.Instance = dev; 
 
/* General socket configuration defaults can go here.  In this
 * client, we assume very little, and rely on the CIS for
@@ -184,6 +186,7 @@
dev_list = link;
 
client_reg.dev_info = _info;
+   client_reg.Attributes = INFO_IO_CLIENT | INFO_CARD_SHARE;
client_reg.EventMask =
CS_EVENT_CARD_INSERTION | CS_EVENT_CARD_REMOVAL |
CS_EVENT_RESET_PHYSICAL | CS_EVENT_CARD_RESET |
@@ -309,8 +312,8 @@
cistpl_cftable_entry_t *cfg = &(parse.cftable_entry);
cistpl_cftable_entry_t dflt = { .index = 0 };
 
-   if (pcmcia_get_tuple_data(handle, ) != 0 ||
-   pcmcia_parse_tuple(handle, , ) != 0)
+   if ( (pcmcia_get_tuple_data(handle, ) != 0)
+   || (pcmcia_parse_tuple(handle, , ) != 0))
goto next_entry;
 
if (cfg->flags & CISTPL_CFTABLE_DEFAULT)
@@ -349,8 +352,7 @@
dflt.vpp1.param[CISTPL_POWER_VNOM] / 1;

/* Do we need to allocate an interrupt? */
-   if (cfg->irq.IRQInfo1 || dflt.irq.IRQInfo1)
-   link->conf.Attributes |= CONF_ENABLE_IRQ;
+   link->conf.Attributes |= CONF_ENABLE_IRQ;
 
/* IO window settings */
link->io.NumPorts1 = link->io.NumPorts2 = 0;
@@ -402,14 +404,7 @@
 * a handler to the interrupt, unless the 'Handler' member of
 * the irq structure is initialized.
 */
-   if (link->conf.Attributes & CONF_ENABLE_IRQ) {
-   link->irq.Attributes = IRQ_TYPE_EXCLUSIVE | IRQ_HANDLE_PRESENT;
-   link->irq.IRQInfo1 = IRQ_LEVEL_ID;
-   link->irq.Handler = orinoco_interrupt; 
-   link->irq.Instance = dev; 
-   
-   CS_CHECK(RequestIRQ, pcmcia_request_irq(link->handle, >irq));
-   }
+   CS_CHECK(RequestIRQ, pcmcia_request_irq(link->handle, >irq));
 
/* We initialize the hermes structure before completing PCMCIA
 * configuration just in case the interrupt handler gets
@@ -434,8 +429,6 @@
SET_MODULE_OWNER(dev);
card->node.major = card->node.minor = 0;
 
-   /* register_netdev will give us an ethX name */
-   dev->name[0] = '\0';
SET_NETDEV_DEV(dev, _to_dev(handle));
/* Tell the stack we exist */
if (register_netdev(dev) != 0) {
@@ -458,8 +451,7 @@
if (link->conf.Vpp1)
printk(", Vpp %d.%d", link->conf.Vpp1 / 10,
   link->conf.Vpp1 % 10);
-   if (link->conf.Attributes & CONF_ENABLE_IRQ)
-   printk(", irq %d", link->irq.AssignedIRQ);
+   printk(", irq %d", link->irq.AssignedIRQ);
if (link->io.NumPorts1)
printk(", io 0x%04x-0x%04x", link->io.BasePort1,
   link->io.BasePort1 + link->io.NumPorts1 - 1);
@@ -525,12 +517,12 @@
case CS_EVENT_CARD_REMOVAL:
link->state &= ~DEV_PRESENT;
if (link->state & DEV_CONFIG) {
-   orinoco_lock(priv, );
+   unsigned long flags;
 
+   spin_lock_irqsave(>lock, flags);
netif_device_detach(dev);
priv->hw_unavailable++;
-
-   orinoco_unlock(priv, );
+   spin_unlock_irqrestore(>lock, flags);
}

Re: ext2/3 files per directory limits

2005-02-23 Thread Lee Revell
On Wed, 2005-02-23 at 22:11 -0500, Ron Peterson wrote:
> I would like to better understand ext2/3's performance characteristics.
> 
> I'm specifically interested in how ext2/3 will handle a /var/spool/mail
> directory w/ ~6000 mbox format inboxes, handling approx 1GB delivered as
> 75,000 messages daily.  Virtually all access is via imap, w/ approx
> ~1000 imapd processes running during peak load.  Local delivery is via
> procmail, which by default uses both kernel-supported locking calls and
> .lock files.
> 
> I understand that various tuning parameters will have an impact,
> e.g. putting the journal on a separate device, setting the noatime mount
> option, etc.  I also understand that there are other mailbox formats and
> other strategies for locating mail spools (e.g. in user's home
> directories).
> 
> I'm interested in people's thoughts on these issues, but I'm mostly
> interested in whether or not the scenario I described falls within
> ext2/3's designed capabilities.

Yes, ext2 and ext3 can handle that load easily.  You should not have to
do any special tuning.

The real question is why in the world you would want to use mbox format
for this.  It simply does not scale.  Use maildir.
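
Maildir avoids mbox's scaling problem because each message is its own file: delivery writes the message under tmp/ with a unique name, syncs it, then rename()s it into new/, so readers never see a partial message and no large shared file is rewritten or locked. A minimal delivery sketch; maildir_create() and maildir_deliver() are hypothetical helpers, and the time.pid_counter.host name is a simplified form of the usual maildir naming convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Create <path> and its tmp/, new/, cur/ subdirectories. */
static int maildir_create(const char *path)
{
	static const char *const sub[] = { "", "/tmp", "/new", "/cur" };
	char p[512];
	size_t i;

	for (i = 0; i < sizeof(sub) / sizeof(sub[0]); i++) {
		snprintf(p, sizeof(p), "%s%s", path, sub[i]);
		if (mkdir(p, 0700) != 0 && errno != EEXIST)
			return -1;
	}
	return 0;
}

/* Deliver one message; on success its unique name is left in "name". */
static int maildir_deliver(const char *maildir, const char *msg,
			   char *name, size_t name_len)
{
	static unsigned long counter;
	char host[64] = "localhost";
	char tmp_path[512], new_path[512];
	size_t len = strlen(msg);
	int fd;

	assert(maildir != NULL && name != NULL);
	gethostname(host, sizeof(host) - 1);	/* last byte stays NUL */
	snprintf(name, name_len, "%ld.%ld_%lu.%s", (long)time(NULL),
		 (long)getpid(), counter++, host);
	snprintf(tmp_path, sizeof(tmp_path), "%s/tmp/%s", maildir, name);
	snprintf(new_path, sizeof(new_path), "%s/new/%s", maildir, name);

	fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, msg, len) != (ssize_t)len || fsync(fd) != 0) {
		close(fd);
		unlink(tmp_path);
		return -1;
	}
	if (close(fd) != 0 || rename(tmp_path, new_path) != 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;	/* the message appears in new/ atomically */
}
```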

Lee



[12/14] Orinoco driver updates - WEP updates

2005-02-23 Thread David Gibson
Updates to the WEP configuration code.  This adds support for shared
key authentication on Agere firmwares.  It also adds support (in some
cases) for changing the WEP keys without disabling the MAC port (thus
triggering a reassociation by the firmware).  This is needed by 802.1x
implementations, although it's not clear if the code so far is
sufficient to allow working 802.1x.
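
The Intersil/Symbol branch of the new __orinoco_hw_setup_wepkeys() takes the length of the current transmit key, rejects it before touching the hardware if it is oversized, and then writes all four keys with that single length. A userspace model of that logic: the out[] array stands in for the hermes_write_ltv() calls, LARGE_KEY is assumed to be 13 bytes (a 104-bit WEP key), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_KEYS  4	/* ORINOCO_MAX_KEYS in the driver */
#define LARGE_KEY 13	/* assumed: 104-bit WEP key, in bytes */

struct wep_key {
	unsigned len;
	unsigned char data[LARGE_KEY];
};

/* Write all keys using the transmit key's length; -E2BIG if it is oversized. */
static int write_all_keys(const struct wep_key keys[MAX_KEYS], unsigned tx_key,
			  unsigned char out[MAX_KEYS][LARGE_KEY], unsigned *out_len)
{
	unsigned keylen, i;

	assert(tx_key < MAX_KEYS && out_len != NULL);
	keylen = keys[tx_key].len;
	if (keylen > LARGE_KEY)
		return -E2BIG;		/* checked once, up front */

	for (i = 0; i < MAX_KEYS; i++) {
		memset(out[i], 0, LARGE_KEY);
		memcpy(out[i], keys[i].data, keylen);	/* uniform length */
	}
	*out_len = keylen;
	return 0;
}
```

Checking once before the loop, as the patch does, means an oversized key is reported before any key has been written, instead of partway through the loop.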

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:55.904651256 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-24 14:50:57.300439064 +1100
@@ -1437,55 +1437,46 @@
return err;
 }
 
-static int __orinoco_hw_setup_wep(struct orinoco_private *priv)
+/* Change the WEP keys and/or the current keys.  Can be called
+ * either from __orinoco_hw_setup_wep() or directly from
+ * orinoco_ioctl_setiwencode().  In the latter case the association
+ * with the AP is not broken (if the firmware can handle it),
+ * which is needed for 802.1x implementations. */
+static int __orinoco_hw_setup_wepkeys(struct orinoco_private *priv)
 {
hermes_t *hw = &priv->hw;
int err = 0;
-   int master_wep_flag;
-   int auth_flag;
 
switch (priv->firmware_type) {
-   case FIRMWARE_TYPE_AGERE: /* Agere style WEP */
-   if (priv->wep_on) {
-   err = hermes_write_wordrec(hw, USER_BAP,
-  HERMES_RID_CNFTXKEY_AGERE,
-  priv->tx_key);
-   if (err)
-   return err;
-   
-   err = HERMES_WRITE_RECORD(hw, USER_BAP,
- HERMES_RID_CNFWEPKEYS_AGERE,
- &priv->keys);
-   if (err)
-   return err;
-   }
+   case FIRMWARE_TYPE_AGERE:
+   err = HERMES_WRITE_RECORD(hw, USER_BAP,
+ HERMES_RID_CNFWEPKEYS_AGERE,
+ &priv->keys);
+   if (err)
+   return err;
err = hermes_write_wordrec(hw, USER_BAP,
-  HERMES_RID_CNFWEPENABLED_AGERE,
-  priv->wep_on);
+  HERMES_RID_CNFTXKEY_AGERE,
+  priv->tx_key);
if (err)
return err;
break;
-
-   case FIRMWARE_TYPE_INTERSIL: /* Intersil style WEP */
-   case FIRMWARE_TYPE_SYMBOL: /* Symbol style WEP */
-   master_wep_flag = 0;/* Off */
-   if (priv->wep_on) {
+   case FIRMWARE_TYPE_INTERSIL:
+   case FIRMWARE_TYPE_SYMBOL:
+   {
int keylen;
int i;
 
-   /* Fudge around firmware weirdness */
+   /* Force uniform key length to work around firmware bugs */
keylen = le16_to_cpu(priv->keys[priv->tx_key].len);

+   if (keylen > LARGE_KEY_SIZE) {
+   printk(KERN_ERR "%s: BUG: Key %d has oversize length %d.\n",
+  priv->ndev->name, priv->tx_key, keylen);
+   return -E2BIG;
+   }
+
/* Write all 4 keys */
for(i = 0; i < ORINOCO_MAX_KEYS; i++) {
-/* int keylen = le16_to_cpu(priv->keys[i].len); */
-   
-   if (keylen > LARGE_KEY_SIZE) {
-   printk(KERN_ERR "%s: BUG: Key %d has oversize length %d.\n",
-  priv->ndev->name, i, keylen);
-   return -E2BIG;
-   }
-
err = hermes_write_ltv(hw, USER_BAP,
   HERMES_RID_CNFDEFAULTKEY0 + i,
   HERMES_BYTES_TO_RECLEN(keylen),
@@ -1500,27 +1491,63 @@
   priv->tx_key);
if (err)
return err;
-   
-   if (priv->wep_restrict) {
-   auth_flag = 2;
-   master_wep_flag = 3;
-   } else {
-   /* Authentication is where Intersil and Symbol
-* firmware differ... */
- 

Re: PPC RT Patch..

2005-02-23 Thread Frank Rowand
john cooper wrote:
> Ingo,
> We've had a PPC port of your RT work underway with
> a focus on trace instrumentation.  This is based upon
> realtime-preempt-2.6.11-rc2-V0.7.37-02.  A diff is
> attached.
> To the extent possible the tracing facilities are the
> same as your x86 work.  In the process a few PPC/gcc
> issues needed to be resolved.  There is also a bug fix
> contained for tlb_gather_mmu() which was causing debug
> assertions to be generated in a path which attempted to
> sleep with a non-zero preempt count.

Manish Lachwani mentioned to me that he faced the same issue
with the MIPS RT support, and that when he discussed it with
Ingo, the solution was for include/asm-ppc/tlb.h to include
include/asm-generic/tlb-simple.h when PREEMPT_RT is turned on.
The patch does this for the #ifdef CONFIG_PPC_STD_MMU case,
but not for the #else case.  I don't know which case is used
for the Ampro board.

> This does build and function when SMP is configured,
> though we have not yet verified it on other than a
> uniprocessor.  As a simplifying assumption, testing has
> thus far concentrated on the following modes:
> PREEMPT_NONE
> - verify baseline regression
> PREEMPT_RT && !PREEMPT_SMP
> - typical for an embedded RT PPC application
> PREEMPT_RT && PREEMPT_SMP
> - kicks in live locking code which otherwise receives no
> coverage.  This is functionally equivalent to the above
> config on a single CPU target thus no MP dynamic testing
> is achieved.  Still quite useful IMHO.
> The target used for development/testing is an Ampro EnCore PP1
> which sports a 300 MHz MPC8245.  For testing this boots with NFS
> as root.  An mp3 decode at nice --20 is launched which requires
> just under 20% of the CPU to maintain an uninterrupted audio
> decode and output.  To this a series of "du -s /" are launched
> to soak up excess CPU bandwidth.  Perhaps not rigorous but a
> fair sanity check and load for the purpose at hand.
> Under these conditions maximum scheduling latencies are seen in
> the 120-150us range.  Note no attempt has yet been made to
> optimize arch specific paths and full trace instrumentation has
> been enabled.
> I've written some logging code to help find problems such as
> the tlb issue above.  As it has not been made general I've
> removed it from this patch.  At some point I'll likely revisit
> this.
> Comments/suggestions welcome.
>
> -john

I am glad to see the instrumentation and measurement related code
in your patch.  (My patch of last week ("Frank's patch") is lacking
that code.)

Other differences between the two patches are:

arch/ppc/syslib/i8259.c
   Frank neglected to convert i8259_lock to a raw spinlock.
arch/ppc/kernel/signal.c
   John added an enable of irqs in do_signal(), under #ifdef CONFIG_PREEMPT_RT.
arch/ppc/kernel/traps.c
   John added an enable of irqs and preempt_check_resched() in _exception().
various files
   Frank added the intrusive variable tb_to_us for use by cycles_to_usec()
   and added an ugly #ifdef in cycles_to_usec().
   John hard-coded cpu_khz for one specific board so that no change would
   be needed in cycles_to_usec().
various files
   John has the mmu_gather fix that is described above.

John's patch and Frank's patch are otherwise mostly the same, except for
the differences that result from being based on different kernel
versions.  I am glad to see that because it means that two sets of
eyes have agreed.

Frank's patch may have missed some EXPORT_SYMBOL()s in arch/ppc/lib/locks.c.
I'll check those over again tomorrow.

-Frank
--
Frank Rowand <[EMAIL PROTECTED]>
MontaVista Software, Inc


[13/14] Orinoco driver updates - update firmware detection

2005-02-23 Thread David Gibson
Update firmware detection code.  This will now reliably detect
Intersil firmwares past version 1.x, which the previous code could
not do (a serious flaw).  It also cleans up the code, and reduces the
size of the private structure by using single bits for the various
firmware feature flags.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>
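The new determine_firmware_type() decides the firmware family from the NIC identity record rather than from the station firmware version. determine_firmware() now also refuses to continue when the primary or tertiary firmware is active. Below is a userspace C sketch of both checks, transcribed from the patch. The enum values mirror FIRMWARE_TYPE_* in orinoco.h. The function names classify_nic() and check_station_id() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same layout as struct comp_id in the patch; fields are assumed to be
 * already converted to CPU byte order. */
struct comp_id { uint16_t id, variant, major, minor; };

enum fwtype { FW_AGERE = 1, FW_INTERSIL = 2, FW_SYMBOL = 3 };

/* Firmware family from the hardware (NIC) identity. */
static enum fwtype classify_nic(const struct comp_id *nic)
{
    if (nic->id < 0x8000)
        return FW_AGERE;
    if (nic->id == 0x8000 && nic->major == 0)
        return FW_SYMBOL;
    return FW_INTERSIL;
}

/* 0 if the station firmware looks usable, -ENODEV if the card is
 * running its primary or tertiary firmware instead. */
static int check_station_id(uint16_t sta_id)
{
    switch (sta_id) {
    case 0x15:          /* primary firmware is active */
    case 0x14b:         /* tertiary firmware is active */
        return -ENODEV;
    case 0x1f:          /* Intersil, Agere, Symbol Spectrum24 */
    case 0x21:          /* Symbol Spectrum24 Trilogy */
        return 0;
    default:            /* the driver logs "please report" and continues */
        return 0;
    }
}
```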

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:57.300439064 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-24 14:50:59.879047056 +1100
@@ -2047,39 +2047,54 @@
 /* Initialization   */
 //
 
-struct sta_id {
+struct comp_id {
u16 id, variant, major, minor;
 } __attribute__ ((packed));
 
-static int determine_firmware_type(struct net_device *dev, struct sta_id *sta_id)
+static inline fwtype_t determine_firmware_type(struct comp_id *nic_id)
 {
-   /* FIXME: this is fundamentally broken */
-   unsigned int firmver = ((u32)sta_id->major << 16) | sta_id->minor;
-   
-   if (sta_id->variant == 1)
+   if (nic_id->id < 0x8000)
return FIRMWARE_TYPE_AGERE;
-   else if ((sta_id->variant == 2) &&
-  ((firmver == 0x10001) || (firmver == 0x20001)))
+   else if (nic_id->id == 0x8000 && nic_id->major == 0)
return FIRMWARE_TYPE_SYMBOL;
else
return FIRMWARE_TYPE_INTERSIL;
 }
 
-static void determine_firmware(struct net_device *dev)
+/* Set priv->firmware type, determine firmware properties */
+static int determine_firmware(struct net_device *dev)
 {
struct orinoco_private *priv = netdev_priv(dev);
hermes_t *hw = &priv->hw;
int err;
-   struct sta_id sta_id;
+   struct comp_id nic_id, sta_id;
unsigned int firmver;
char tmp[SYMBOL_MAX_VER_LEN+1];
 
+   /* Get the hardware version */
+   err = HERMES_READ_RECORD(hw, USER_BAP, HERMES_RID_NICID, &nic_id);
+   if (err) {
+   printk(KERN_ERR "%s: Cannot read hardware identity: error %d\n",
+  dev->name, err);
+   return err;
+   }
+
+   le16_to_cpus(&nic_id.id);
+   le16_to_cpus(&nic_id.variant);
+   le16_to_cpus(&nic_id.major);
+   le16_to_cpus(&nic_id.minor);
+   printk(KERN_DEBUG "%s: Hardware identity %04x:%04x:%04x:%04x\n",
+  dev->name, nic_id.id, nic_id.variant,
+  nic_id.major, nic_id.minor);
+
+   priv->firmware_type = determine_firmware_type(&nic_id);
+
/* Get the firmware version */
err = HERMES_READ_RECORD(hw, USER_BAP, HERMES_RID_STAID, &sta_id);
if (err) {
-   printk(KERN_WARNING "%s: Error %d reading firmware info. Wildly guessing capabilities...\n",
+   printk(KERN_ERR "%s: Cannot read station identity: error %d\n",
   dev->name, err);
-   memset(&sta_id, 0, sizeof(sta_id));
+   return err;
}
 
le16_to_cpus(&sta_id.id);
@@ -2090,8 +2105,23 @@
   dev->name, sta_id.id, sta_id.variant,
   sta_id.major, sta_id.minor);
 
-   if (! priv->firmware_type)
-   priv->firmware_type = determine_firmware_type(dev, &sta_id);
+   switch (sta_id.id) {
+   case 0x15:
+   printk(KERN_ERR "%s: Primary firmware is active\n",
+  dev->name);
+   return -ENODEV;
+   case 0x14b:
+   printk(KERN_ERR "%s: Tertiary firmware is active\n",
+  dev->name);
+   return -ENODEV;
+   case 0x1f:  /* Intersil, Agere, Symbol Spectrum24 */
+   case 0x21:  /* Symbol Spectrum24 Trilogy */
+   break;
+   default:
+   printk(KERN_NOTICE "%s: Unknown station ID, please report\n",
+  dev->name);
+   break;
+   }
 
/* Default capabilities */
priv->has_sensitivity = 1;
@@ -2107,9 +2137,8 @@
case FIRMWARE_TYPE_AGERE:
/* Lucent Wavelan IEEE, Lucent Orinoco, Cabletron RoamAbout,
   ELSA, Melco, HP, IBM, Dell 1150, Compaq 110/210 */
-   printk(KERN_DEBUG "%s: Looks like a Lucent/Agere firmware "
-  "version %d.%02d\n", dev->name,
-  sta_id.major, sta_id.minor);
+   snprintf(priv->fw_name, sizeof(priv->fw_name) - 1,
+"Lucent/Agere %d.%02d", sta_id.major, sta_id.minor);
 
firmver = ((unsigned long)sta_id.major << 16) | sta_id.minor;
 
@@ -2152,14 +2181,15 @@
tmp[SYMBOL_MAX_VER_LEN] = '\0';
}
 
-   printk(KERN_DEBUG "%s: Looks like a Symbol firmware "
-  "version [%s] (parsing to %X)\n", dev->name,
-  tmp, firmver);
+   

[11/14] Orinoco driver updates - delay Tx wake

2005-02-23 Thread David Gibson
Delay netif_wake_queue() until the packet has actually been
transmitted, rather than just when the firmware has copied it into its
internal buffers.  This seems to prevent problems on some Intersil
firmware versions (I suspect the problems were caused by the
firmware's buffers filling up).

Signed-off-by: David Gibson <[EMAIL PROTECTED]>
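The effect of moving netif_wake_queue() can be modelled as a small state machine. The sketch below is a simplified, hypothetical userspace model, not driver code. It assumes the queue is stopped after each transmit. It shows that, after this patch, only the Tx and Tx-exception completion paths wake the queue, while the ALLOC event no longer does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the driver's Tx queue state. */
struct txq {
    bool stopped;
    unsigned long tx_packets, tx_errors;
};

/* Assumed: the queue is stopped while a packet is in flight. */
static void xmit(struct txq *q)
{
    q->stopped = true;
}

/* Firmware has copied the frame into its buffer: no longer wakes the queue. */
static void on_alloc_event(struct txq *q)
{
    (void)q;
}

/* Frame actually transmitted: count it, then wake the queue. */
static void on_tx_complete(struct txq *q)
{
    q->tx_packets++;
    q->stopped = false;
}

/* Transmission failed: count the error, then wake the queue. */
static void on_tx_exception(struct txq *q)
{
    q->tx_errors++;
    q->stopped = false;
}
```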

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-18 12:48:30.523655896 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-18 12:58:09.407652152 +1100
@@ -901,8 +901,6 @@
printk(KERN_WARNING "%s: Allocate event on unexpected fid (%04X)\n",
   dev->name, fid);
return;
-   } else {
-   netif_wake_queue(dev);
}
 
hermes_write_regn(hw, ALLOCFID, DUMMY_FID);
@@ -915,6 +913,8 @@
 
stats->tx_packets++;
 
+   netif_wake_queue(dev);
+
hermes_write_regn(hw, TXCOMPLFID, DUMMY_FID);
 }
 
@@ -941,6 +941,7 @@

stats->tx_errors++;
 
+   netif_wake_queue(dev);
hermes_write_regn(hw, TXCOMPLFID, DUMMY_FID);
 }
 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
| _around_!
http://www.ozlabs.org/people/dgibson


[6/14] Orinoco driver updates - cleanup PCI initialization

2005-02-23 Thread David Gibson
Update the initialization code in the various PCI incarnations of the
orinoco driver.  This applies similar initialization and shutdown
cleanups to the orinoco_pci, orinoco_plx and orinoco_tmd drivers.  It
also adds COR reset support to the orinoco_plx and orinoco_tmd
drivers, improves PCI power management support in the orinoco_pci
driver and adds a couple of extra supported cards to the ID tables.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>
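The reworked orinoco_pci probe error path follows the usual kernel pattern. Resources are acquired in order. On failure, the code jumps to a label that releases only what has already been acquired, then falls through the remaining labels in reverse order. Below is a userspace sketch of that ladder, with flags standing in for the real resources. The struct, the fail_at parameter and the error codes chosen are hypothetical. The label names follow the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical resources; each flag records whether it is currently held. */
struct probe_state { int enabled, regions, mapped, alloced, irq; };

/* Acquire resources in order.  fail_at = N (1..6) forces step N to fail,
 * to exercise the unwind path; fail_at = 0 means every step succeeds. */
static int probe(struct probe_state *s, int fail_at)
{
    int err;

    if (fail_at == 1)
        return -EIO;                       /* nothing to undo yet */
    s->enabled = 1;                        /* pci_enable_device() */

    if (fail_at == 2) { err = -EBUSY; goto fail_resources; }
    s->regions = 1;                        /* pci_request_regions() */

    if (fail_at == 3) { err = -EIO; goto fail_map; }
    s->mapped = 1;                         /* ioremap() */

    if (fail_at == 4) { err = -ENOMEM; goto fail_alloc; }
    s->alloced = 1;                        /* alloc_orinocodev() */

    if (fail_at == 5) { err = -EBUSY; goto fail_irq; }
    s->irq = 1;                            /* request_irq() */

    if (fail_at == 6) { err = -ETIMEDOUT; goto fail; }  /* COR reset */
    return 0;

 fail:
    s->irq = 0;                            /* free_irq() */
 fail_irq:
    s->alloced = 0;                        /* free_orinocodev() */
 fail_alloc:
    s->mapped = 0;                         /* iounmap() */
 fail_map:
    s->regions = 0;                        /* pci_release_regions() */
 fail_resources:
    s->enabled = 0;                        /* pci_disable_device() */
    return err;
}
```

Each label undoes exactly one acquisition, so adding a resource means adding one step and one label, with no conditional "was this allocated?" tests in the cleanup code.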

Index: working-2.6/drivers/net/wireless/orinoco_pci.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_pci.c 2005-01-12 15:47:48.215477920 +1100
+++ working-2.6/drivers/net/wireless/orinoco_pci.c  2005-01-12 16:10:57.324301280 +1100
@@ -129,6 +129,11 @@
 #define HERMES_PCI_COR_OFFT(500)   /* ms */
 #define HERMES_PCI_COR_BUSYT   (500)   /* ms */
 
+/* Orinoco PCI specific data */
+struct orinoco_pci_card {
+   void __iomem *pci_ioaddr;
+};
+
 /*
  * Do a soft reset of the PCI card using the Configuration Option Register
  * We need this to get going...
@@ -164,8 +169,9 @@
mdelay(1);
reg = hermes_read_regn(hw, CMD);
}
-   /* Did we timeout ? */
-   if(time_after_eq(jiffies, timeout)) {
+
+   /* Still busy? */
+   if (reg & HERMES_CMD_BUSY) {
printk(KERN_ERR PFX "Busy timeout\n");
return -ETIMEDOUT;
}
@@ -184,6 +190,7 @@
u16 __iomem *pci_ioaddr = NULL;
unsigned long pci_iolen;
struct orinoco_private *priv = NULL;
+   struct orinoco_pci_card *card;
struct net_device *dev = NULL;
 
err = pci_enable_device(pdev);
@@ -192,24 +199,31 @@
return err;
}
 
+   err = pci_request_regions(pdev, DRIVER_NAME);
+   if (err != 0) {
+   printk(KERN_ERR PFX "Cannot obtain PCI resources\n");
+   goto fail_resources;
+   }
+
/* Resource 0 is mapped to the hermes registers */
pci_iorange = pci_resource_start(pdev, 0);
pci_iolen = pci_resource_len(pdev, 0);
pci_ioaddr = ioremap(pci_iorange, pci_iolen);
-   if (! pci_iorange) {
+   if (!pci_iorange) {
printk(KERN_ERR PFX "Cannot remap hardware registers\n");
-   goto fail;
+   goto fail_map;
}
 
/* Allocate network device */
-   dev = alloc_orinocodev(0, NULL);
+   dev = alloc_orinocodev(sizeof(*card), orinoco_pci_cor_reset);
if (! dev) {
err = -ENOMEM;
-   goto fail;
+   goto fail_alloc;
}
 
priv = netdev_priv(dev);
-   dev->base_addr = (unsigned long) pci_ioaddr;
+   card = priv->card;
+   card->pci_ioaddr = pci_ioaddr;
dev->mem_start = pci_iorange;
dev->mem_end = pci_iorange + pci_iolen - 1;
SET_MODULE_OWNER(dev);
@@ -226,14 +240,14 @@
if (err) {
printk(KERN_ERR PFX "Cannot allocate IRQ %d\n", pdev->irq);
err = -EBUSY;
-   goto fail;
+   goto fail_irq;
}
dev->irq = pdev->irq;
 
/* Perform a COR reset to start the card */
-   if(orinoco_pci_cor_reset(priv) != 0) {
+   err = orinoco_pci_cor_reset(priv);
+   if (err) {
printk(KERN_ERR PFX "Initial reset failed\n");
-   err = -ETIMEDOUT;
goto fail;
}
 
@@ -250,16 +264,19 @@
return 0;
 
  fail:
-   if (dev) {
-   if (dev->irq)
-   free_irq(dev->irq, dev);
+   free_irq(pdev->irq, dev);
 
-   free_orinocodev(dev);
-   }
+ fail_irq:
+   pci_set_drvdata(pdev, NULL);
+   free_orinocodev(dev);
+
+ fail_alloc:
+   iounmap(pci_ioaddr);
 
-   if (pci_ioaddr)
-   iounmap(pci_ioaddr);
+ fail_map:
+   pci_release_regions(pdev);
 
+ fail_resources:
pci_disable_device(pdev);
 
return err;
@@ -269,18 +286,14 @@
 {
struct net_device *dev = pci_get_drvdata(pdev);
struct orinoco_private *priv = netdev_priv(dev);
+   struct orinoco_pci_card *card = priv->card;
 
unregister_netdev(dev);
-
-   if (dev->irq)
-   free_irq(dev->irq, dev);
-
-   if (priv->hw.iobase)
-   iounmap(priv->hw.iobase);
-
+   free_irq(dev->irq, dev);
pci_set_drvdata(pdev, NULL);
free_orinocodev(dev);
-
+   iounmap(card->pci_ioaddr);
+   pci_release_regions(pdev);
pci_disable_device(pdev);
 }
 
@@ -312,6 +325,9 @@

orinoco_unlock(priv, );
 
+   pci_save_state(pdev);
+   pci_set_power_state(pdev, 3);
+
return 0;
 }
 
@@ -324,6 +340,9 @@
 
printk(KERN_DEBUG "%s: Orinoco-PCI waking up\n", dev->name);
 
+   pci_set_power_state(pdev, 0);
+   pci_restore_state(pdev);
+
err = orinoco_reinit_firmware(dev);
if (err) {

[14/14] Orinoco driver updates - update version and changelog

2005-02-23 Thread David Gibson
Previous patches have brought the in-kernel orinoco driver roughly to
parity with version 0.14alpha2 from out-of-tree.  Update the version
number and changelog accordingly.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:59.879047056 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-24 14:51:02.388665536 +1100
@@ -393,6 +393,29 @@
  *   in the rx_dropped statistics.
  * o Provided a module parameter to suppress linkstatus messages.
  *
+ * v0.13e -> v0.14alpha1 - 30 Sep 2003 - David Gibson
+ * o Replaced priv->connected logic with netif_carrier_on/off()
+ *   calls.
+ * o Remove has_ibss_any and never set the CREATEIBSS RID when
+ *   the ESSID is empty.  Too many firmwares break if we do.
+ * o 2.6 merges: Replace pdev->slot_name with pci_name(), remove
+ *   __devinitdata from PCI ID tables, use free_netdev().
+ * o Enabled shared-key authentication for Agere firmware (from
+ *   Robert J. Moore 
+ * o Move netif_wake_queue() (back) to the Tx completion from the
+ *   ALLOC event.  This seems to prevent/mitigate the rolling
+ *   error -110 problems at least on some Intersil firmwares.
+ *   Theoretically reduces performance, but I can't measure it.
+ *   Patch from Andrew Tridgell 
+ *
+ * v0.14alpha1 -> v0.14alpha2 - 20 Oct 2003 - David Gibson
+ * o Correctly turn off shared-key authentication when requested
+ *   (bugfix from Robert J. Moore).
+ * o Correct airport sleep interfaces for current 2.6 kernels.
+ * o Add code for key change without disabling/enabling the MAC
+ *   port.  This is supposed to allow 802.1x to work sanely, but
+ *   doesn't seem to yet.
+ *
  * TODO
  * o New wireless extensions API (patch from Moustafa
  *   Youssef, updated by Jim Carter and Pavel Roskin).
Index: working-2.6/drivers/net/wireless/orinoco.h
===
--- working-2.6.orig/drivers/net/wireless/orinoco.h 2005-02-24 14:50:59.879047056 +1100
+++ working-2.6/drivers/net/wireless/orinoco.h  2005-02-24 14:51:02.389665384 +1100
@@ -7,7 +7,7 @@
 #ifndef _ORINOCO_H
 #define _ORINOCO_H
 
-#define DRIVER_VERSION "0.13e"
+#define DRIVER_VERSION "0.14alpha2"
 
 #include 
 #include 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
| _around_!
http://www.ozlabs.org/people/dgibson


[9/14] Orinoco driver updates - update is_ethersnap()

2005-02-23 Thread David Gibson
Make the is_ethersnap() function take a void * rather than a pointer
to the internal header structure.  This makes more logical sense and
reduces dependencies between different parts of the code.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>
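The rewritten is_ethersnap() takes a plain byte pointer. As its comment describes, the check requires two things. Bytes 0-4 must be aa aa 03 00 00 (DSAP, SSAP, CTRL and the first two OUI bytes). Byte 5, the last OUI byte, must be 00 (RFC 1042) or f8 (802.1H bridge tunnel). Below is a standalone sketch of that check. The names snap_prefix and is_ethersnap_sketch() are hypothetical, since the driver's own comparison constant is not visible in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 802.2 LLC + SNAP prefix: DSAP=0xaa, SSAP=0xaa, CTRL=0x03, then the
 * first two OUI bytes (0x00, 0x00). */
static const unsigned char snap_prefix[5] = { 0xaa, 0xaa, 0x03, 0x00, 0x00 };

/* Nonzero if the 6 bytes at _hdr are a SNAP header whose OUI is
 * 00:00:00 or 00:00:f8, i.e. a frame to de-encapsulate to Ethernet-II. */
static int is_ethersnap_sketch(const void *_hdr)
{
    const unsigned char *hdr = _hdr;

    return memcmp(hdr, snap_prefix, sizeof(snap_prefix)) == 0
        && (hdr[5] == 0x00 || hdr[5] == 0xf8);
}
```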

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:48.426788064 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-24 14:50:50.125529816 +1100
@@ -966,15 +966,17 @@
 
 /* Does the frame have a SNAP header indicating it should be
  * de-encapsulated to Ethernet-II? */
-static inline int is_ethersnap(struct header_struct *hdr)
+static inline int is_ethersnap(void *_hdr)
 {
+   u8 *hdr = _hdr;
+
/* We de-encapsulate all packets which, a) have SNAP headers
 * (i.e. SSAP=DSAP=0xaa and CTRL=0x3 in the 802.2 LLC header
 * and where b) the OUI of the SNAP header is 00:00:00 or
 * 00:00:f8 - we need both because different APs appear to use
 * different OUIs for some reason */
-   return (memcmp(>dsap, _hdr, 5) == 0)
-   && ( (hdr->oui[2] == 0x00) || (hdr->oui[2] == 0xf8) );
+   return (memcmp(hdr, _hdr, 5) == 0)
+   && ( (hdr[5] == 0x00) || (hdr[5] == 0xf8) );
 }
 
 static inline void orinoco_spy_gather(struct net_device *dev, u_char *mac,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
| _around_!
http://www.ozlabs.org/people/dgibson


[10/14] Orinoco driver updates - prohibit IBSS with no ESSID

2005-02-23 Thread David Gibson
Remove the has_ibss_any flag and never set the CREATEIBSS RID when the
ESSID is empty.  Too many firmwares break if we do.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>
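The CREATEIBSS decision in this patch reduces to a small pure function. Below is a sketch with the hypothetical name createibss_value(). It assumes desired_essid is a NUL-terminated string, as in the driver.

```c
#include <assert.h>
#include <string.h>

/* Value to write to the CREATEIBSS record: never ask the firmware to
 * create an IBSS when no ESSID has been set. */
static unsigned short createibss_value(const char *desired_essid,
                                       unsigned short createibss)
{
    if (strlen(desired_essid) == 0 && createibss)
        return 0;       /* the driver also logs a warning in this case */
    return createibss;
}
```

The RID is now always written, but with 0 whenever an IBSS would otherwise be created without an ESSID. The has_ibss_any firmware-version check therefore becomes unnecessary.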

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:50.125529816 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-24 14:50:53.166067584 +1100
@@ -1580,21 +1580,26 @@
}
 
if (priv->has_ibss) {
-   err = hermes_write_wordrec(hw, USER_BAP,
-  HERMES_RID_CNFCREATEIBSS,
-  priv->createibss);
-   if (err) {
-   printk(KERN_ERR "%s: Error %d setting CREATEIBSS\n", dev->name, err);
-   return err;
-   }
+   u16 createibss;
 
-   if ((strlen(priv->desired_essid) == 0) && (priv->createibss)
-  && (!priv->has_ibss_any)) {
+   if ((strlen(priv->desired_essid) == 0) && (priv->createibss)) {
printk(KERN_WARNING "%s: This firmware requires an "
   "ESSID in IBSS-Ad-Hoc mode.\n", dev->name);
/* With wvlan_cs, in this case, we would crash.
 * hopefully, this driver will behave better...
 * Jean II */
+   createibss = 0;
+   } else {
+   createibss = priv->createibss;
+   }
+   
+   err = hermes_write_wordrec(hw, USER_BAP,
+  HERMES_RID_CNFCREATEIBSS,
+  createibss);
+   if (err) {
+   printk(KERN_ERR "%s: Error %d setting CREATEIBSS\n",
+  dev->name, err);
+   return err;
}
}
 
@@ -2073,7 +2078,6 @@
priv->has_preamble = 0;
priv->has_port3 = 1;
priv->has_ibss = 1;
-   priv->has_ibss_any = 0;
priv->has_wep = 0;
priv->has_big_wep = 0;
 
@@ -2089,7 +2093,6 @@
firmver = ((unsigned long)sta_id.major << 16) | sta_id.minor;
 
priv->has_ibss = (firmver >= 0x60006);
-   priv->has_ibss_any = (firmver >= 0x60010);
priv->has_wep = (firmver >= 0x40020);
priv->has_big_wep = 1; /* FIXME: this is wrong - how do we tell
  Gold cards from the others? */
Index: working-2.6/drivers/net/wireless/orinoco.h
===
--- working-2.6.orig/drivers/net/wireless/orinoco.h 2005-02-24 14:50:46.549073520 +1100
+++ working-2.6/drivers/net/wireless/orinoco.h  2005-02-24 14:50:53.167067432 +1100
@@ -57,7 +57,7 @@
 #define FIRMWARE_TYPE_AGERE 1
 #define FIRMWARE_TYPE_INTERSIL 2
 #define FIRMWARE_TYPE_SYMBOL 3
-   int has_ibss, has_port3, has_ibss_any, ibss_port;
+   int has_ibss, has_port3, ibss_port;
int has_wep, has_big_wep;
int has_mwo;
int has_pm;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
| _around_!
http://www.ozlabs.org/people/dgibson


[4/14] Orinoco driver updates - add free_orinocodev()

2005-02-23 Thread David Gibson
Introduce a free_orinocodev() function into the orinoco driver, used
by the hardware type/initialization modules to free the device
structure in preference to directly calling free_netdev().  At the
moment free_orinocodev() just calls free_netdev().  Future merges will
make it clean up internal scanning state, so merging this now will
reduce the diff noise.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>
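Routing every teardown through one function means later cleanup can be added in a single place. Below is a userspace sketch of an allocate/free pair in that style. It assumes the bus-specific area is allocated directly after the private structure, as patch 6/14's use of alloc_orinocodev(sizeof(*card), ...) followed by priv->card suggests. fake_priv, alloc_dev() and free_dev() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the driver's private structure. */
struct fake_priv {
    void *card;   /* bus-specific area placed right after this struct */
};

/* Allocate the private structure plus sizeof_card bytes of zeroed,
 * bus-specific space in one block. */
static struct fake_priv *alloc_dev(size_t sizeof_card)
{
    struct fake_priv *p = calloc(1, sizeof(*p) + sizeof_card);

    if (!p)
        return NULL;
    p->card = sizeof_card ? (void *)(p + 1) : NULL;
    return p;
}

/* Single teardown point: callers use this instead of free() directly,
 * so extra cleanup can be added here later without touching them. */
static void free_dev(struct fake_priv *p)
{
    free(p);
}
```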

Index: working-2.6/drivers/net/wireless/orinoco.h
===
--- working-2.6.orig/drivers/net/wireless/orinoco.h 2005-02-18 12:04:03.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.h  2005-02-18 12:04:03.0 +1100
@@ -107,6 +107,7 @@
 
 extern struct net_device *alloc_orinocodev(int sizeof_card,
   int (*hard_reset)(struct orinoco_private *));
+extern void free_orinocodev(struct net_device *dev);
 extern int __orinoco_up(struct net_device *dev);
 extern int __orinoco_down(struct net_device *dev);
 extern int orinoco_stop(struct net_device *dev);
Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-18 12:04:03.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-18 13:03:51.846593520 +1100
@@ -2398,6 +2398,11 @@
 
 }
 
+void free_orinocodev(struct net_device *dev)
+{
+   free_netdev(dev);
+}
+
 //
 /* Wireless extensions  */
 //
@@ -4131,6 +4136,7 @@
 //
 
 EXPORT_SYMBOL(alloc_orinocodev);
+EXPORT_SYMBOL(free_orinocodev);
 
 EXPORT_SYMBOL(__orinoco_up);
 EXPORT_SYMBOL(__orinoco_down);
Index: working-2.6/drivers/net/wireless/orinoco_cs.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_cs.c  2005-02-18 12:04:03.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco_cs.c   2005-02-18 12:04:03.0 +1100
@@ -235,7 +235,7 @@
  dev);
unregister_netdev(dev);
}
-   free_netdev(dev);
+   free_orinocodev(dev);
 }  /* orinoco_cs_detach */
 
 /*
Index: working-2.6/drivers/net/wireless/orinoco_pci.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_pci.c 2005-02-18 12:04:03.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco_pci.c  2005-02-18 12:04:03.0 +1100
@@ -254,7 +254,7 @@
if (dev->irq)
free_irq(dev->irq, dev);
 
-   free_netdev(dev);
+   free_orinocodev(dev);
}
 
if (pci_ioaddr)
@@ -279,7 +279,7 @@
iounmap(priv->hw.iobase);
 
pci_set_drvdata(pdev, NULL);
-   free_netdev(dev);
+   free_orinocodev(dev);
 
pci_disable_device(pdev);
 }
Index: working-2.6/drivers/net/wireless/orinoco_plx.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_plx.c 2005-02-18 12:04:03.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco_plx.c  2005-02-18 12:04:03.0 +1100
@@ -279,7 +279,7 @@
  fail:
free_irq(dev->irq, dev);
  fail_irq:
-   free_netdev(dev);
+   free_orinocodev(dev);
  fail_alloc:
pci_iounmap(pdev, mem);
  fail_map:
@@ -304,7 +304,7 @@

pci_set_drvdata(pdev, NULL);
 
-   free_netdev(dev);
+   free_orinocodev(dev);
 
release_region(pci_resource_start(pdev, 3), pci_resource_len(pdev, 3));
 
Index: working-2.6/drivers/net/wireless/orinoco_tmd.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_tmd.c 2005-02-18 12:04:03.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco_tmd.c  2005-02-18 12:04:03.0 +1100
@@ -164,7 +164,7 @@
 out4:
pci_iounmap(pdev, mem);
 out3:
-   free_netdev(dev);
+   free_orinocodev(dev);
 out2:
release_region(pccard_ioaddr, pccard_iolen);
 out:
@@ -188,7 +188,7 @@
 
pci_set_drvdata(pdev, NULL);
 
-   free_netdev(dev);
+   free_orinocodev(dev);
 
release_region(pci_resource_start(pdev, 2), pci_resource_len(pdev, 2));
 
Index: working-2.6/drivers/net/wireless/airport.c
===
--- working-2.6.orig/drivers/net/wireless/airport.c 2005-02-18 12:04:03.0 +1100
+++ working-2.6/drivers/net/wireless/airport.c  2005-02-18 12:04:03.0 +1100
@@ -149,7 +149,7 @@
ssleep(1);
 
macio_set_drvdata(mdev, NULL);
-   free_netdev(dev);
+   

[3/14] Orinoco driver updates - use mdelay()/ssleep() more

2005-02-23 Thread David Gibson
Use mdelay() or ssleep() instead of various needlessly complicated
ways of delaying in the orinoco driver.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>
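After the fixed mdelay() calls, the COR reset still polls the card's busy bit against a jiffies deadline. Below is a userspace sketch of such a bounded poll, using a fake register that stays busy for a set number of reads. struct fake_card and wait_not_busy() are hypothetical. The sketch decides success by re-testing the busy state rather than the deadline, as patch 6/14 does with its "Still busy?" check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register model: the busy bit clears after `ready_after` reads. */
struct fake_card { int polls, ready_after; };

static bool card_busy(struct fake_card *c)
{
    return c->polls++ < c->ready_after;
}

/* Poll at most `max_polls` more times (a stand-in for a jiffies deadline
 * with mdelay(1) between reads).  The result depends on the last observed
 * busy state, not on whether the deadline has passed. */
static int wait_not_busy(struct fake_card *c, int max_polls)
{
    bool busy = card_busy(c);
    int n = 0;

    while (busy && n < max_polls) {
        /* mdelay(1) would go here in the driver */
        busy = card_busy(c);
        n++;
    }
    return busy ? -1 : 0;
}
```

When ready_after equals max_polls, the card becomes ready on the last permitted read. A test of the elapsed count at that point would report a spurious timeout, while re-testing the busy state correctly reports success.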

Index: working-2.6/drivers/net/wireless/orinoco_pci.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_pci.c 2005-01-12 15:13:18.819073992 +1100
+++ working-2.6/drivers/net/wireless/orinoco_pci.c  2005-01-12 15:15:33.137654464 +1100
@@ -151,19 +151,11 @@
 
/* Assert the reset until the card notice */
hermes_write_regn(hw, PCI_COR, HERMES_PCI_COR_MASK);
-   timeout = jiffies + (HERMES_PCI_COR_ONT * HZ / 1000);
-   while(time_before(jiffies, timeout)) {
-   mdelay(1);
-   }
-   //mdelay(HERMES_PCI_COR_ONT);
+   mdelay(HERMES_PCI_COR_ONT);
 
/* Give time for the card to recover from this hard effort */
hermes_write_regn(hw, PCI_COR, 0x);
-   timeout = jiffies + (HERMES_PCI_COR_OFFT * HZ / 1000);
-   while(time_before(jiffies, timeout)) {
-   mdelay(1);
-   }
-   //mdelay(HERMES_PCI_COR_OFFT);
+   mdelay(HERMES_PCI_COR_OFFT);
 
/* The card is ready when it's no longer busy */
timeout = jiffies + (HERMES_PCI_COR_BUSYT * HZ / 1000);
Index: working-2.6/drivers/net/wireless/orinoco_plx.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_plx.c 2005-01-12 15:13:18.821073688 +1100
+++ working-2.6/drivers/net/wireless/orinoco_plx.c  2005-01-12 15:15:33.138654312 +1100
@@ -356,8 +356,7 @@
 static void __exit orinoco_plx_exit(void)
 {
pci_unregister_driver(_plx_driver);
-   current->state = TASK_UNINTERRUPTIBLE;
-   schedule_timeout(HZ);
+   ssleep(1);
 }
 
 module_init(orinoco_plx_init);
Index: working-2.6/drivers/net/wireless/orinoco_tmd.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco_tmd.c 2005-01-12 15:13:18.820073840 +1100
+++ working-2.6/drivers/net/wireless/orinoco_tmd.c  2005-01-12 15:16:05.897674184 +1100
@@ -225,8 +225,7 @@
 static void __exit orinoco_tmd_exit(void)
 {
pci_unregister_driver(_tmd_driver);
-   current->state = TASK_UNINTERRUPTIBLE;
-   schedule_timeout(HZ);
+   ssleep(1);
 }
 
 module_init(orinoco_tmd_init);


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
| _around_!
http://www.ozlabs.org/people/dgibson


[0/14] Orinoco driver updates

2005-02-23 Thread David Gibson
Jeff, please apply:

Here's a big stack of patches that make a significant step forward on
the long overdue orinoco driver merge.  Still quite a long way to go,
but it's something.  This patch stack is against Linus' vanilla tree +
Viro's big iomap cleanup patch, as requested.

The first 9 patches make only trivial or cosmetic behavioural changes:
1/14    orinoco-carrier
        Use netif_carrier_*() functions instead of homegrown
        'connected' variable.

2/14    orinoco-printks
        Update various printk()s and other cosmetic strings.

3/14    orinoco-delays
        Use mdelay() and ssleep() instead of outdated ways of
        delaying.

4/14    orinoco-free-orinocodev
        Introduce free_orinocodev() function, to reduce noise
        in future diffs.

5/14    orinoco-cleanup-hermes
        Assorted cleanups to low-level hardware access code.

6/14    orinoco-pci-updates
        Cleanup to initialization code for the PCI based
        orinoco devices.

7/14    orinoco-modparm
        Use modern module_param() macros for orinoco module.

8/14    orinoco-pccard-cleanups
        Cleanup to PCMCIA initialization code.

9/14    orinoco-void-ethersnap
        Trivial change to is_ethersnap() function to reduce
        future diff noise.

The next 4 patches start to introduce real new functionality and
bug fixes:
10/14   orinoco-no-ibss-any
        Disallow IBSS mode if no ESSID is set (too many
        firmwares break otherwise).

11/14   orinoco-late-tx-wake
        Delay waking the Tx queue; fixes problems on a number
        of firmwares.

12/14   orinoco-wep-updates
        Various updates to WEP setup code.

13/14   orinoco-update-firmware-detection
        Updates and bugfixes to firmware detection logic.

The final one is another trivial one:
14/14   orinoco-is-now-0.14alpha2
        Update version and changelog to reflect the above
        patches.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
| _around_!
http://www.ozlabs.org/people/dgibson


[1/14] Orinoco driver updates - use netif_carrier_*()

2005-02-23 Thread David Gibson
Removes the orinoco driver's custom and dodgy "connected" variable,
used to track whether or not we're associated with an AP, and replaces
it with the standard netif_carrier_on()/netif_carrier_off() state,
queried via netif_carrier_ok().

Signed-off-by: David Gibson <[EMAIL PROTECTED]>
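The link-status handling in this patch maps firmware link states onto the carrier flag. Below is a sketch of that mapping. The enumerator names follow HERMES_LINKSTATUS_*, but their numeric values here are arbitrary, not the firmware's. carrier_for() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative link states; names follow HERMES_LINKSTATUS_*, values are
 * arbitrary and do not match the firmware encoding. */
enum linkstatus {
    LS_NOT_CONNECTED, LS_CONNECTED, LS_DISCONNECTED, LS_AP_CHANGE,
    LS_AP_OUT_OF_RANGE, LS_AP_IN_RANGE, LS_ASSOC_FAILED
};

/* Carrier is on only for the three "associated" states; every other
 * status, including unknown ones, turns it off. */
static bool carrier_for(enum linkstatus s)
{
    return s == LS_CONNECTED || s == LS_AP_CHANGE || s == LS_AP_IN_RANGE;
}
```

The patch also introduces one behavioural difference. The old code left priv->connected unchanged for statuses outside its two lists. The new code turns the carrier off for any status other than the three connected ones.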

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-01-13 09:48:55.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-10 14:22:32.179826024 +1100
@@ -784,7 +784,7 @@
return 1;
}
 
-   if (! priv->connected) {
+   if (! netif_carrier_ok(dev)) {
/* Oops, the firmware hasn't established a connection,
silently drop the packet (this seems to be the
safest approach). */
@@ -1269,6 +1269,7 @@
case HERMES_INQ_LINKSTATUS: {
struct hermes_linkstatus linkstatus;
u16 newstatus;
+   int connected;
 
if (len != sizeof(linkstatus)) {
printk(KERN_WARNING "%s: Unexpected size for linkstatus 
frame (%d bytes)\n",
@@ -1280,15 +1281,14 @@
  len / 2);
newstatus = le16_to_cpu(linkstatus.linkstatus);
 
-   if ( (newstatus == HERMES_LINKSTATUS_CONNECTED)
-|| (newstatus == HERMES_LINKSTATUS_AP_CHANGE)
-|| (newstatus == HERMES_LINKSTATUS_AP_IN_RANGE) )
-   priv->connected = 1;
-   else if ( (newstatus == HERMES_LINKSTATUS_NOT_CONNECTED)
- || (newstatus == HERMES_LINKSTATUS_DISCONNECTED)
- || (newstatus == HERMES_LINKSTATUS_AP_OUT_OF_RANGE)
- || (newstatus == HERMES_LINKSTATUS_ASSOC_FAILED) )
-   priv->connected = 0;
+   connected = (newstatus == HERMES_LINKSTATUS_CONNECTED)
+   || (newstatus == HERMES_LINKSTATUS_AP_CHANGE)
+   || (newstatus == HERMES_LINKSTATUS_AP_IN_RANGE);
+
+   if (connected)
+   netif_carrier_on(dev);
+   else
+   netif_carrier_off(dev);
 
if (newstatus != priv->last_linkstatus)
print_linkstatus(dev, newstatus);
@@ -1366,8 +1366,8 @@
}

/* firmware will have to reassociate */
+   netif_carrier_off(dev);
priv->last_linkstatus = 0x;
-   priv->connected = 0;
 
return 0;
 }
@@ -1878,7 +1878,7 @@
 
priv->hw_unavailable++;
priv->last_linkstatus = 0x; /* firmware will have to reassociate */
-   priv->connected = 0;
+   netif_carrier_off(dev);
 
orinoco_unlock(priv, );
 
@@ -2388,8 +2388,8 @@
   * hardware */
INIT_WORK(>reset_work, (void (*)(void *))orinoco_reset, dev);
 
+   netif_carrier_off(dev);
priv->last_linkstatus = 0x;
-   priv->connected = 0;
 
return dev;
 
Index: working-2.6/drivers/net/wireless/orinoco.h
===
--- working-2.6.orig/drivers/net/wireless/orinoco.h 2004-10-29 13:16:58.0 +1000
+++ working-2.6/drivers/net/wireless/orinoco.h  2005-02-10 14:22:32.179826024 +1100
@@ -42,7 +42,6 @@
/* driver state */
int open;
u16 last_linkstatus;
-   int connected;
 
/* Net device stuff */
struct net_device *ndev;
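The change in the link-status handler can be modelled in plain userspace C. This is a minimal sketch, not driver code: it assumes the HERMES_LINKSTATUS_* values below match those in the driver's hermes.h, and `link_is_connected()` and `old_update()` are hypothetical helpers. It also makes one behavioural difference visible: the old code left `priv->connected` untouched for a status code outside both of its lists, while the new code treats any unlisted code as carrier off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the HERMES_LINKSTATUS_* constants in hermes.h. */
enum {
	HERMES_LINKSTATUS_NOT_CONNECTED   = 0x0000,
	HERMES_LINKSTATUS_CONNECTED       = 0x0001,
	HERMES_LINKSTATUS_DISCONNECTED    = 0x0002,
	HERMES_LINKSTATUS_AP_CHANGE       = 0x0003,
	HERMES_LINKSTATUS_AP_OUT_OF_RANGE = 0x0004,
	HERMES_LINKSTATUS_AP_IN_RANGE     = 0x0005,
	HERMES_LINKSTATUS_ASSOC_FAILED    = 0x0006,
};

/* Hypothetical helper: the predicate the patched handler uses to pick
 * netif_carrier_on() or netif_carrier_off(). */
static bool link_is_connected(uint16_t status)
{
	return status == HERMES_LINKSTATUS_CONNECTED
	    || status == HERMES_LINKSTATUS_AP_CHANGE
	    || status == HERMES_LINKSTATUS_AP_IN_RANGE;
}

/* Hypothetical model of the old code: returns the new value of
 * priv->connected given its previous value and a status code. */
static int old_update(int connected, uint16_t status)
{
	switch (status) {
	case HERMES_LINKSTATUS_CONNECTED:
	case HERMES_LINKSTATUS_AP_CHANGE:
	case HERMES_LINKSTATUS_AP_IN_RANGE:
		return 1;
	case HERMES_LINKSTATUS_NOT_CONNECTED:
	case HERMES_LINKSTATUS_DISCONNECTED:
	case HERMES_LINKSTATUS_AP_OUT_OF_RANGE:
	case HERMES_LINKSTATUS_ASSOC_FAILED:
		return 0;
	default:
		return connected;	/* unrecognised code: left unchanged */
	}
}
```

With the assumed values, codes 0 to 6 are all covered by the old lists, so the two versions only disagree on codes the firmware is not expected to report.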


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
| _around_!
http://www.ozlabs.org/people/dgibson


Re: ext2/3 files per directory limits

2005-02-23 Thread Joel Jaeggli
On Wed, 23 Feb 2005, Ron Peterson wrote:
> I would like to better understand ext2/3's performance characteristics.
> I'm specifically interested in how ext2/3 will handle a /var/spool/mail
> directory w/ ~6000 mbox format inboxes, handling approx 1GB delivered as
> 75,000 messages daily.  Virtually all access is via imap, w/ approx
> ~1000 imapd processes running during peak load.  Local delivery is via
> procmail, which by default uses both kernel-supported locking calls and
> .lock files.

At some point it makes sense to subdivide your mail load, because
serialization of I/O on that one filesystem becomes a bigger issue than
the performance of your filesystem... We deliver into mbox-formatted
mailboxes inside users' home directories; some folks do a similar thing
with maildir. In the end, you can only make one filesystem so fast; beyond
that, you need more filesystems to achieve any kind of reasonable scaling...
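One way to subdivide the load is to spread users across several spool filesystems with a stable hash. A minimal sketch, assuming a hypothetical layout of /var/spool/mail0, /var/spool/mail1, and so on, one filesystem each; `fnv1a()` (32-bit FNV-1a) and `spool_path()` are illustrative names, not part of any mail software mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a: cheap and deterministic, so a given user always maps
 * to the same spool. */
static uint32_t fnv1a(const char *s)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/* Hypothetical layout: nspools filesystems mounted at /var/spool/mail0,
 * /var/spool/mail1, ...  Writes the user's mbox path into buf and
 * returns the index of the spool it chose. */
static unsigned spool_path(char *buf, size_t len, const char *user,
			   unsigned nspools)
{
	unsigned idx = fnv1a(user) % nspools;

	snprintf(buf, len, "/var/spool/mail%u/%s", idx, user);
	return idx;
}
```

The MTA, procmail and the IMAP server would all need to agree on the same mapping. Changing the number of spools later moves most users, which is one reason a site might prefer an explicit lookup table over a bare modulus.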

> I understand that various tuning parameters will have an impact,
> e.g. putting the journal on a separate device, setting the noatime mount
> option, etc.  I also understand that there are other mailbox formats and
> other strategies for locating mail spools (e.g. in user's home
> directories).
> I'm interested in people's thoughts on these issues, but I'm mostly
> interested in whether or not the scenario I described falls within
> ext2/3's designed capabilities.
> Best.

-- 
Joel Jaeggli  	   Unix Consulting 	   [EMAIL PROTECTED] 
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2



Re: mouse still losing sync and thus jumping around

2005-02-23 Thread Dmitry Torokhov
On Wednesday 23 February 2005 22:05, Anthony DiSante wrote:
> Dmitry Torokhov wrote:
> > Yes, it usually happens either under high load, when mouse interrupts are
> > significantly delayed. Or sometimes it happens when applications poll
> > battery status and on some boxes that takes a pretty long time. And because
> > it is usually the same chip that serves keyboard/mouse, it again delays
> > mouse interrupts.
> 
> I have this problem with recent 2.6.10 kernels too, but it has nothing to do 
> with load in my case; it happens whenever I switch my KVM to the linux box.
> 

Hi Anthony,

This is a somewhat different problem, and we are trying to find a reliable
solution for it.

> Long ago and far away, it used to be that switching out of X, then back in 
> (ctrl-alt-F1, then ctrl-alt-F7) would reset the mouse and stop the jumping. 
>   At some point in late 2.4/early 2.6 that stopped working, and the only fix 
> was to unplug the mouse from the KVM switch and re-plug it.
> 
> In Oct 2004 I posted to lkml with subject "KVM -> jumping mouse... still no 
> solution?"  Dmitry Torokhov (hi :) responded that this would work on 
> 2.6.9-rc3+:
> 
>   echo -n "reconnect" > /sys/bus/serio/devices/serioX/driver
> 
> That was GREAT and it worked for a while, but now my last few 2.6.10 kernels 
> don't seem to care when I do that, and again, unplugging the mouse is the 
> only thing that works.  I'm currently running 2.6.10-gentoo-r6.
> 

It should still work, but in a slightly different form:

echo -n "reconnect" > /sys/bus/serio/devices/serioX/drvctl

I.e. substitute "drvctl" for "driver": "driver" is now a symlink to
the currently bound driver, set up by the driver core.
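For tools that need this outside a shell, the same write can be done from C. A minimal sketch: `serio_drvctl_path()` and `serio_reconnect()` are hypothetical helpers, the port number is whatever the X in serioX is on your machine, and writing to sysfs normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /sys/bus/serio/devices/serio<port>/drvctl into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
static int serio_drvctl_path(char *buf, size_t len, unsigned port)
{
	int n = snprintf(buf, len, "/sys/bus/serio/devices/serio%u/drvctl",
			 port);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of: echo -n "reconnect" > /sys/bus/serio/devices/serioN/drvctl */
static int serio_reconnect(unsigned port)
{
	const char *cmd = "reconnect";
	size_t n = strlen(cmd);		/* no trailing newline, like echo -n */
	char path[64];
	FILE *f;

	if (serio_drvctl_path(path, sizeof path, port) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fwrite(cmd, 1, n, f) != n) {
		fclose(f);
		return -1;
	}
	/* sysfs reports errors from the store method at flush/close time */
	return fclose(f) == 0 ? 0 : -1;
}
```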

-- 
Dmitry


Re: [PATCH] override RLIMIT_SIGPENDING for non-RT signals

2005-02-23 Thread Chris Wright
* Roland McGrath ([EMAIL PROTECTED]) wrote:
> > * Roland McGrath ([EMAIL PROTECTED]) wrote:
> > > Indeed, I think your patch does not go far enough.  I can read POSIX to say
> > > that the siginfo_t data must be available when `kill' was used, as well.
> > 
> > How?  I only see reference to filling in SI_USER for rt signals?
> > Just curious...(I've only got SuSv3 and some crusty old POSIX rt docs).
> 
> There is stuff about a SA_SIGINFO signal handler's siginfo_t argument
> "shall contain" the various specified information like si_pid/si_uid values
> for a kill caller.

OK, I guess it's an odd corner case, since they aren't queued anyway.

> > Good point.  Although it's RLIMIT_SIGPENDING + (31 * user_nprocs).  So
> > that could be 31 * 8k, for example.
> 
> And a "good point" back to you, sir!  I think the right way to think about
> this in terms of resource consumption is that sizeof(struct sigqueue)*31 is
> part of the potential per-process overhead that makes up the consumption
> units one should have in mind when choosing how to set the RLIMIT_NPROC limit.

As in: make it dynamic, and have it work together with the patch you sent
to set the default sigpending limit based on nproc?

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net


ext2/3 files per directory limits

2005-02-23 Thread Ron Peterson
I would like to better understand ext2/3's performance characteristics.

I'm specifically interested in how ext2/3 will handle a /var/spool/mail
directory w/ ~6000 mbox format inboxes, handling approx 1GB delivered as
75,000 messages daily.  Virtually all access is via imap, w/ ~1000
imapd processes running during peak load.  Local delivery is via
procmail, which by default uses both kernel-supported locking calls and
.lock files.
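For scale, the stated load works out roughly as follows, taking 1 GB as 10^9 bytes and assuming delivery is spread evenly over the day (real mail traffic is burstier than this):

```latex
\begin{aligned}
\bar{s} &= \frac{10^{9}\ \text{bytes}}{75\,000\ \text{messages}} \approx 13\,300\ \text{bytes per message} \\
r &= \frac{75\,000\ \text{messages}}{86\,400\ \text{s}} \approx 0.87\ \text{messages/s} \\
m &= \frac{75\,000\ \text{messages/day}}{6\,000\ \text{inboxes}} = 12.5\ \text{messages per inbox per day}
\end{aligned}
```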

I understand that various tuning parameters will have an impact,
e.g. putting the journal on a separate device, setting the noatime mount
option, etc.  I also understand that there are other mailbox formats and
other strategies for locating mail spools (e.g. in user's home
directories).

I'm interested in people's thoughts on these issues, but I'm mostly
interested in whether or not the scenario I described falls within
ext2/3's designed capabilities.

Best.

-- 
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso


Re: [PATCH] set RLIMIT_SIGPENDING limit based on RLIMIT_NPROC

2005-02-23 Thread Chris Wright
* Roland McGrath ([EMAIL PROTECTED]) wrote:
> While looking into the issues Jeremy had with the RLIMIT_SIGPENDING limit,
> it occurred to me that the normal setting of this limit is bizarrely low.
> The initial hard limit setting (MAX_SIGPENDING) was taken from the old
> max_queued_signals parameter, which was for the entire system in aggregate.
> But even as a per-user limit, the 1024 value is incongruously low for this.

But the old default system-wide limit was 1024.  And you could have
spawned 8k processes then as well.  So I don't think this matters much.

> On my machine, RLIMIT_NPROC allows me 8192 processes, but only 1024 queued
> signals, i.e. fewer even than one pending signal in each process.  (To me,
> this really puts in doubt the sensibility of using a per-user limit for
> this rather than a per-process one, i.e. counted in sighand_struct or
> signal_struct, which could have a much smaller reasonable value.  I don't
> recall the rationale for making this new limit per-user in the first place.)

I don't either; the archives show per-user as the default choice
(I never saw a discussion otherwise).  Users can easily queue signals to
themselves (using multiple processes or not), and there was some concern
that somebody actually wanted to be able to queue up to 1024 (since it's
what was allowed in the past).

> This patch sets the default RLIMIT_SIGPENDING limit at boot time, using the
> calculation that decides the default RLIMIT_NPROC limit.  This uses the
> same value for those two limits, which I think is still pretty conservative
> on the RLIMIT_SIGPENDING value.
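The patch reuses the boot-time RLIMIT_NPROC calculation. As I recall 2.6-era fork_init() in kernel/fork.c, that calculation is max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE), with a floor of 20, and the default RLIMIT_NPROC is max_threads / 2; by that formula, kernel stacks for max_threads tasks would take one eighth of RAM. The sketch below restates the arithmetic in userspace; treat the formula and the floor as assumptions to check against your tree, and `default_nproc()` as a hypothetical name.

```c
#include <assert.h>

/* Sketch of the default RLIMIT_NPROC calculation, assumed to follow
 * 2.6-era fork_init(). */
static unsigned long default_nproc(unsigned long mempages,
				   unsigned long thread_size,
				   unsigned long page_size)
{
	unsigned long max_threads = mempages / (8 * thread_size / page_size);

	if (max_threads < 20)
		max_threads = 20;	/* assumed minimum needed to boot */
	return max_threads / 2;
}
```

With 1 GiB of RAM, 4 KiB pages and 8 KiB kernel stacks this gives 262144 / 16 / 2 = 8192, the same figure Roland quotes for his machine; under the patch, RLIMIT_SIGPENDING would then default to that value too.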

It's an rlimit, so it's easily settable in userspace at login session time.  I
think we could raise it if people start complaining it's too low (it hasn't
seemed to be a problem yet).
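Raising the limit for a login session can be done with the standard getrlimit()/setrlimit() calls. A minimal sketch, assuming a glibc that defines RLIMIT_SIGPENDING in <sys/resource.h>; `raise_sigpending_soft_limit()` is a hypothetical helper. An unprivileged process can only raise its soft limit as far as the hard limit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/resource.h>

/* Raise this process's soft RLIMIT_SIGPENDING to its hard limit, as a
 * login session setup step might.  Returns 0 on success, -1 on error. */
static int raise_sigpending_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_SIGPENDING, &rl) != 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;	/* never above the hard limit */
	return setrlimit(RLIMIT_SIGPENDING, &rl);
}
```

From bash, `ulimit -i` adjusts the same limit.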

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net


Re: mouse still losing sync and thus jumping around

2005-02-23 Thread Anthony DiSante
Dmitry Torokhov wrote:
> Yes, it usually happens either under high load, when mouse interrupts are
> significantly delayed. Or sometimes it happens when applications poll
> battery status and on some boxes that takes a pretty long time. And because
> it is usually the same chip that serves keyboard/mouse, it again delays
> mouse interrupts.
I have this problem with recent 2.6.10 kernels too, but it has nothing to do 
with load in my case; it happens whenever I switch my KVM to the linux box.

Long ago and far away, it used to be that switching out of X, then back in 
(ctrl-alt-F1, then ctrl-alt-F7) would reset the mouse and stop the jumping. 
 At some point in late 2.4/early 2.6 that stopped working, and the only fix 
was to unplug the mouse from the KVM switch and re-plug it.

In Oct 2004 I posted to lkml with subject "KVM -> jumping mouse... still no 
solution?"  Dmitry Torokhov (hi :) responded that this would work on 2.6.9-rc3+:

echo -n "reconnect" > /sys/bus/serio/devices/serioX/driver
That was GREAT and it worked for a while, but now my last few 2.6.10 kernels 
don't seem to care when I do that, and again, unplugging the mouse is the 
only thing that works.  I'm currently running 2.6.10-gentoo-r6.

-Anthony DiSante
http://nodivisions.com/


Re: [PATCH] show RLIMIT_SIGPENDING usage in /proc/PID/status

2005-02-23 Thread Chris Wright
* Roland McGrath ([EMAIL PROTECTED]) wrote:
> > Two questions: 1) This changes the interface for consumers of
> > /proc/[pid]/status data, do we care?  Adding a new line like this should be
> > safe enough.
> 
> As far as I can tell, no one fretted about the addition of Threads:,
> ShdPnd:, etc., which were not always there.

Sounds good ;-)

> > 2) Perhaps we should do /proc/[pid]/rlimit/ type dir for each value?
> >This has been asked for before.
> 
> Is the request to see the limit settings, or the current usage, or both?
> What kind of format are you suggesting?  I don't see a need for something
> with a million little files.  Also, for some of the limits the correct
> current usage count is not trivial to ascertain.  (And for others like
> RLIMIT_FSIZE and RLIMIT_CORE, it is of course not meaningful at all.)

Probably just one file per rlimit, with usage, cur and max.

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net


Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Lee Revell
On Thu, 2005-02-24 at 13:41 +1100, Nick Piggin wrote:
> Lee Revell wrote:
> > 
> > Agreed, it would be much better to optimize this away than just add a
> > scheduling point.  It seems like we could do this lazily.
> > 
> 
> Oh? What do you mean by lazy? IMO it is sort of implemented lazily now.
> That is, we are too lazy to refcount page table pages in fastpaths, so
> that pushes a lot of work to unmap time. Not necessarily a bad trade-off,
> mind you. Just something I'm looking into.
> 

I guess I was thinking we could be even more lazy, and somehow defer it
until after unmap time (in the absence of memory pressure, that is).  Actually
that's kind of what a lock break would do.
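The "lock break" idea can be illustrated with a userspace analogue: do the long operation under a lock, but periodically drop the lock and yield so other waiters and the scheduler get a turn, which bounds the hold time to one batch instead of the whole range. A minimal sketch; `struct lock_ops`, `clear_range_with_breaks()` and the counting stand-in lock are hypothetical, and in the kernel the corresponding idiom would be something like cond_resched_lock() rather than sched_yield().

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical lock interface, so the sketch does not depend on any
 * particular locking primitive. */
struct lock_ops {
	void (*lock)(void *arg);
	void (*unlock)(void *arg);
	void *arg;
};

/* Clear n entries under the lock, breaking the lock after every `batch`
 * entries (except after the last one).  Returns the number of breaks. */
static size_t clear_range_with_breaks(int *v, size_t n, size_t batch,
				      const struct lock_ops *ops)
{
	size_t i, breaks = 0;

	ops->lock(ops->arg);
	for (i = 0; i < n; i++) {
		v[i] = 0;
		if (batch && (i + 1) % batch == 0 && i + 1 < n) {
			ops->unlock(ops->arg);
			sched_yield();		/* let other runnable tasks in */
			ops->lock(ops->arg);
			breaks++;
		}
	}
	ops->unlock(ops->arg);
	return breaks;
}

/* Trivial stand-in "lock" for demonstration: counts acquisitions. */
static void count_lock(void *arg) { ++*(int *)arg; }
static void count_unlock(void *arg) { (void)arg; }
```

For example, clearing 10 entries with a batch of 4 breaks the lock twice (after entries 4 and 8), so the lock is taken three times in total.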

Lee



Re: intel8x0: no sound in 2.6.11 rc3 & 4 (fine with 2.6.10)

2005-02-23 Thread Nish Aravamudan
On Wed, 23 Feb 2005 14:31:20 -0500, Bill Davidsen <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > Hello
> >
> > I have read a post in lkml.org that states that the problem experienced in
> > rc3 has gone (1). That is not the case for me.
> >
> > My audio device is
> >
> > :00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM 
> > (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)



> > I have found that I had to Mute __both__ "Headphone Jack Sense" and
> > "Line Jack Sense" in order to hear the audio in rc4.
> 
> I keep seeing this advice, but what tool do you use to mute them? I
> don't see anything like that in alsamixer, aumix, or any other program I
> tried.

I have a T41p with: Multimedia audio controller: Intel Corp.
82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01),
which looks to be the same as Ulisses' device.

It works fine for me in 2.6.11-rc4 (and has worked fine for a while, as
well). I have both "Headphone Jack Sense" and "Line Jack Sense" mixer
entries in alsamixer (both set to off). I'm not sure why you're
not seeing these entries...

Thanks,
Nish


Re: [PATCH] Re: 2.6.11-rc4 libata-core (irq 30: nobody cared!)

2005-02-23 Thread Brian Kuschak

> Does this patch do anything useful?
> 
>   Jeff
> 

Not really.  It no longer prints the "nobody cared"
message, but it still hangs at boot.  I'd give you a
backtrace but my MAGIC_SYSRQ doesn't seem to be
working right now.

-Brian


Linux version 2.6.11-rc4 ([EMAIL PROTECTED]) (gcc version 3.3.2) #28 Wed Feb 23 18:52:22 PST 2005
Built 1 zonelists
Kernel command line: root=/dev/ram rw ramdisk=36000 console=ttyS0
PID hash table entries: 1024 (order: 10, 16384 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 120832k available (2136k kernel code, 916k data, 108k init, 0k highmem)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 5709k freed
NET: Registered protocol family 16
PCI: Probing PCI hardware
SCSI subsystem initialized
Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
Initializing Cryptographic API
Serial: 8250/16550 driver $Revision: 1.90 $ 6 ports, IRQ sharing disabled
ttyS0 at MMIO 0x0 (irq = 0) is a 16550A
ttyS1 at MMIO 0x0 (irq = 1) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 36000K size 1024 blocksize
loop: loaded (max 8 devices)
mal0: Initialized, 1 tx channels, 1 rx channels
emac: IBM EMAC Ethernet driver, version 2.0
Maintained by Benjamin Herrenschmidt <[EMAIL PROTECTED]>
eth0: IBM emac, MAC 08:00:3e:26:15:59
eth0: Found Generic MII PHY (0x06)
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ata1: SATA max UDMA/100 cmd 0xC9002E80 ctl 0xC9002E8A bmdma 0xC9002E00 irq 30
ata2: SATA max UDMA/100 cmd 0xC9002EC0 ctl 0xC9002ECA bmdma 0xC9002E08 irq 30
ata1: dev 0 ATA, max UDMA7, 234493056 sectors: lba48
eth0: Link is Up
eth0: Speed: 100, Full duplex.






Re: [PATCH] show RLIMIT_SIGPENDING usage in /proc/PID/status

2005-02-23 Thread Roland McGrath
> Two questions: 1) This changes the interface for consumers of
> /proc/[pid]/status data, do we care?  Adding a new line like this should be
> safe enough.

As far as I can tell, no one fretted about the addition of Threads:,
ShdPnd:, etc., which were not always there.

> 2) Perhaps we should do /proc/[pid]/rlimit/ type dir for each value?
>This has been asked for before.

Is the request to see the limit settings, or the current usage, or both?
What kind of format are you suggesting?  I don't see a need for something
with a million little files.  Also, for some of the limits the correct
current usage count is not trivial to ascertain.  (And for others like
RLIMIT_FSIZE and RLIMIT_CORE, it is of course not meaningful at all.)


Thanks,
Roland


Re: [PATCH] override RLIMIT_SIGPENDING for non-RT signals

2005-02-23 Thread Roland McGrath
> * Roland McGrath ([EMAIL PROTECTED]) wrote:
> > Indeed, I think your patch does not go far enough.  I can read POSIX to say
> > that the siginfo_t data must be available when `kill' was used, as well.
> 
> How?  I only see reference to filling in SI_USER for rt signals?
> Just curious...(I've only got SuSv3 and some crusty old POSIX rt docs).

There is stuff about a SA_SIGINFO signal handler's siginfo_t argument
"shall contain" the various specified information like si_pid/si_uid values
for a kill caller.

> Good point.  Although it's RLIMIT_SIGPENDING + (31 * user_nprocs).  So
> that could be 31 * 8k, for example.

And a "good point" back to you, sir!  I think the right way to think about
this in terms of resource consumption is that sizeof(struct sigqueue)*31 is
part of the potential per-process overhead that makes up the consumption
units one should have in mind when choosing how to set the RLIMIT_NPROC limit.


Thanks,
Roland


Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Nick Piggin
Lee Revell wrote:
> On Thu, 2005-02-24 at 12:29 +1100, Nick Piggin wrote:
> > Lee Revell wrote:
> > > IIRC last time I really tested this a few months ago, the worst case
> > > latency on that machine was about 150us.  Currently it's 422us from the
> > > same clear_page_range code path.
> > Well it should be pretty trivial to add a break in there.
> > I don't think it can get into 2.6.11 at this point though,
> > so we'll revisit this for 2.6.12 if the clear_page_range
> > optimisations don't get anywhere.
> 
> Agreed, it would be much better to optimize this away than just add a
> scheduling point.  It seems like we could do this lazily.

Oh? What do you mean by lazy? IMO it is sort of implemented lazily now.
That is, we are too lazy to refcount page table pages in fastpaths, so
that pushes a lot of work to unmap time. Not necessarily a bad trade-off,
mind you. Just something I'm looking into.


Re: Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Andrew Morton
"Chad N. Tindel" <[EMAIL PROTECTED]> wrote:
>
>  We have hit a defect where an exiting xterm process will hang.  This is running
>  on a 2-cpu IA-64 box.  We have a multithreaded application, where one thread
>  is SCHED_FIFO and is running with priority 98, and the other thread is just
>  a normal SCHED_OTHER thread.  The SCHED_FIFO thread is in a CPU bound tight
>  loop, but I wouldn't expect that to cause problems since there are 2 CPUs.
> 
>  However, it does seem to cause some problems.  For example, if you ssh into
>  the system and run an Xterm using X11 forwarding, when you type "exit" in
>  the xterm window, the window hangs and doesn't close.  Killing the CPU-bound
>  app causes the window to exit immediately.  The sysrq output shows the 
>  following:
> 
>  xterm D a001000bef60 0  2905   2876 (NOTLB)
> 
>  Call Trace:
>   [] schedule+0xca0/0x1300
>  sp=e00012257d20 bsp=e00012251080
>   [] flush_cpu_workqueue+0x1a0/0x4a0
>  sp=e00012257d30 bsp=e00012251020
>   [] flush_workqueue+0x100/0x160
>  sp=e00012257d90 bsp=e00012250fe8
>   [] flush_scheduled_work+0x20/0x40
>  sp=e00012257d90 bsp=e00012250fd0
>   [] release_dev+0x8e0/0x1100
>  sp=e00012257d90 bsp=e00012250f20
>   [] tty_release+0x30/0x60
>  sp=e00012257e30 bsp=e00012250ef8
>   [] __fput+0x330/0x340
>  sp=e00012257e30 bsp=e00012250ea8
>   [] fput+0x40/0x60
>  sp=e00012257e30 bsp=e00012250e88
>   [] filp_close+0xd0/0x160
>  sp=e00012257e30 bsp=e00012250e58
>   [] sys_close+0x140/0x1a0
>  sp=e00012257e30 bsp=e00012250dd8
>   [] ia64_ret_from_syscall+0x0/0x20
>  sp=e00012257e30 bsp=e00012250dd8
> 
>  So it would appear that xterm is hung in close() trying to shutdown a tty.
>  The comment says that it is calling flush_scheduled_work() to
>  "Wait for ->hangup_work and ->flip.work handlers to terminate".  Perhaps there
>  is some locking issue that is causing these to not run and complete?

`xterm' is waiting for the other CPU to schedule a kernel thread (which is
bound to that CPU).  Once that kernel thread has done a little bit of work,
`xterm' can terminate.

But kernel threads don't run with realtime policy, so your userspace app
has permanently starved that kernel thread.
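The reporter's setup can be reproduced with the standard POSIX scheduling calls. A minimal sketch: `fifo_prio_valid()` and `make_self_fifo()` are hypothetical helpers, priority 98 comes from the report, and actually switching to SCHED_FIFO needs root (or CAP_SYS_NICE). Once such a thread spins on a CPU, SCHED_OTHER work bound to that CPU, including the per-CPU kernel thread that flush_scheduled_work() waits for, cannot run there.

```c
#include <assert.h>
#include <sched.h>

/* Is prio a valid SCHED_FIFO priority on this system (1..99 on Linux)? */
static int fifo_prio_valid(int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	return lo != -1 && hi != -1 && prio >= lo && prio <= hi;
}

/* Put the calling thread into SCHED_FIFO at prio (pid 0 means the
 * caller; Linux applies this per thread).  A CPU-bound loop after this
 * call is never preempted by SCHED_OTHER tasks on its CPU. */
static int make_self_fifo(int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	if (!fifo_prio_valid(prio))
		return -1;
	return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```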

It's potentially quite a problem, really.  For example it could prevent
various tty operations from completing, it will prevent kjournald from ever
writing back anything (on uniprocessor, etc).  I've been waiting for
someone to complain ;)

But the other side of the coin is that a SCHED_FIFO userspace task
presumably has extreme latency requirements, so it doesn't *want* to be
preempted by some routine kernel operation.  People would get irritated if
we were to do that.

So what to do?


Re: [PATCH] Re: 2.6.11-rc4 libata-core (irq 30: nobody cared!)

2005-02-23 Thread Jeff Garzik
BTW, please CC your replies to linux-ide@vger.kernel.org as well.


[PATCH] Re: 2.6.11-rc4 libata-core (irq 30: nobody cared!)

2005-02-23 Thread Jeff Garzik
Does this patch do anything useful?
Jeff

= drivers/scsi/sata_sil.c 1.44 vs edited =
--- 1.44/drivers/scsi/sata_sil.c2005-02-17 19:43:51 -05:00
+++ edited/drivers/scsi/sata_sil.c  2005-02-23 21:27:18 -05:00
@@ -65,6 +65,7 @@
 static u32 sil_scr_read (struct ata_port *ap, unsigned int sc_reg);
 static void sil_scr_write (struct ata_port *ap, unsigned int sc_reg, u32 val);
 static void sil_post_set_mode (struct ata_port *ap);
+static void sil_tf_load(struct ata_port *ap, struct ata_taskfile *tf);
 
 static struct pci_device_id sil_pci_tbl[] = {
{ 0x1095, 0x3112, PCI_ANY_ID, PCI_ANY_ID, 0, 0, sil_3112 },
@@ -130,7 +131,7 @@
 static struct ata_port_operations sil_ops = {
.port_disable   = ata_port_disable,
.dev_config = sil_dev_config,
-   .tf_load= ata_tf_load,
+   .tf_load= sil_tf_load,
.tf_read= ata_tf_read,
.check_status   = ata_check_status,
.exec_command   = ata_exec_command,
@@ -197,6 +198,69 @@
 MODULE_LICENSE("GPL");
 MODULE_DEVICE_TABLE(pci, sil_pci_tbl);
 MODULE_VERSION(DRV_VERSION);
+
+static void sil_irq_enable(struct ata_port *ap, int disable)
+{
+   void __iomem *mmio = ap->host_set->mmio_base;
+   u32 tmp, new;
+   u32 bit = 1 << (22 + ap->port_no);
+
+   tmp = readl(mmio + SIL_SYSCFG);
+   if (disable)
+   new = tmp | bit;
+   else
+   new = tmp & ~bit;
+   if (new != tmp)
+   writel(new, mmio + SIL_SYSCFG);
+}
+
+static void sil_tf_load(struct ata_port *ap, struct ata_taskfile *tf)
+{
+   struct ata_ioports *ioaddr = >ioaddr;
+   unsigned int is_addr = tf->flags & ATA_TFLAG_ISADDR;
+
+   if (tf->ctl != ap->last_ctl) {
+   sil_irq_enable(ap, tf->ctl & ATA_NIEN);
+   writeb(tf->ctl, (void __iomem *) ap->ioaddr.ctl_addr);
+   ap->last_ctl = tf->ctl;
+   ata_wait_idle(ap);
+   }
+
+   if (is_addr && (tf->flags & ATA_TFLAG_LBA48)) {
+   writeb(tf->hob_feature, (void __iomem *) ioaddr->feature_addr);
+   writeb(tf->hob_nsect, (void __iomem *) ioaddr->nsect_addr);
+   writeb(tf->hob_lbal, (void __iomem *) ioaddr->lbal_addr);
+   writeb(tf->hob_lbam, (void __iomem *) ioaddr->lbam_addr);
+   writeb(tf->hob_lbah, (void __iomem *) ioaddr->lbah_addr);
+   VPRINTK("hob: feat 0x%X nsect 0x%X, lba 0x%X 0x%X 0x%X\n",
+   tf->hob_feature,
+   tf->hob_nsect,
+   tf->hob_lbal,
+   tf->hob_lbam,
+   tf->hob_lbah);
+   }
+
+   if (is_addr) {
+   writeb(tf->feature, (void __iomem *) ioaddr->feature_addr);
+   writeb(tf->nsect, (void __iomem *) ioaddr->nsect_addr);
+   writeb(tf->lbal, (void __iomem *) ioaddr->lbal_addr);
+   writeb(tf->lbam, (void __iomem *) ioaddr->lbam_addr);
+   writeb(tf->lbah, (void __iomem *) ioaddr->lbah_addr);
+   VPRINTK("feat 0x%X nsect 0x%X lba 0x%X 0x%X 0x%X\n",
+   tf->feature,
+   tf->nsect,
+   tf->lbal,
+   tf->lbam,
+   tf->lbah);
+   }
+
+   if (tf->flags & ATA_TFLAG_DEVICE) {
+   writeb(tf->device, (void __iomem *) ioaddr->device_addr);
+   VPRINTK("device 0x%X\n", tf->device);
+   }
+
+   ata_wait_idle(ap);
+}
 
 static void sil_post_set_mode (struct ata_port *ap)
 {
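The bit manipulation in sil_irq_enable() can be checked in isolation. A minimal sketch that models it as a pure function; `sil_syscfg_update()` is a hypothetical name, and the meaning of bit (22 + port_no), where set means that port's interrupt is masked, is inferred from the patch, not from Silicon Image documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Pure-function model of sil_irq_enable(): returns the new SIL_SYSCFG
 * value and sets *changed when a register write would be needed (the
 * patch skips the writel() when nothing changes). */
static uint32_t sil_syscfg_update(uint32_t old, unsigned port_no,
				  int disable, int *changed)
{
	uint32_t bit = UINT32_C(1) << (22 + port_no);
	uint32_t new_val = disable ? (old | bit) : (old & ~bit);

	*changed = (new_val != old);
	return new_val;
}
```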


Re: [PATCH] show RLIMIT_SIGPENDING usage in /proc/PID/status

2005-02-23 Thread Chris Wright
* Roland McGrath ([EMAIL PROTECTED]) wrote:
> Jeremy mentioned the aggravation of not being able to tell when your
> processes are using up signal queue entries and hitting the
> RLIMIT_SIGPENDING limit.  This patch adds a line to /proc/PID/status
> showing how many queue items are in use, and allowed, for your uid.
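A consumer of the new line might look like this. A minimal sketch, assuming the line has the form `SigQ:\t<used>/<limit>`; the field name and format are an assumption (later kernels do print a SigQ line in this form), and `parse_sigq()` and `read_sigq()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse a "SigQ:\t<used>/<limit>" line.  Returns 0 on success. */
static int parse_sigq(const char *line, unsigned long *used,
		      unsigned long *limit)
{
	/* whitespace in the format matches the tab after "SigQ:" */
	return sscanf(line, "SigQ: %lu/%lu", used, limit) == 2 ? 0 : -1;
}

/* Scan a status file such as /proc/self/status for the SigQ line. */
static int read_sigq(const char *path, unsigned long *used,
		     unsigned long *limit)
{
	char line[256];
	FILE *f = fopen(path, "r");
	int rc = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof line, f)) {
		if (strncmp(line, "SigQ:", 5) == 0) {
			rc = parse_sigq(line, used, limit);
			break;
		}
	}
	fclose(f);
	return rc;
}
```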

Two questions:  1) This changes the interface for consumers of
/proc/[pid]/status data, do we care?  Adding a new line like this should
be safe enough. 2) Perhaps we should do /proc/[pid]/rlimit/ type dir
for each value?  This has been asked for before.

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net


Re: [PATCH] override RLIMIT_SIGPENDING for non-RT signals

2005-02-23 Thread Chris Wright
* Roland McGrath ([EMAIL PROTECTED]) wrote:
> Indeed, I think your patch does not go far enough.  I can read POSIX to say
> that the siginfo_t data must be available when `kill' was used, as well.

How?  I only see reference to filling in SI_USER for rt signals?
Just curious...(I've only got SuSv3 and some crusty old POSIX rt docs).

> This patch makes it allocate the siginfo_t, even when that exceeds
> {RLIMIT_SIGPENDING}, for any non-RT signal (< SIGRTMIN) not sent by
> sigqueue (actually, any signal that couldn't have been faked by a sigqueue
> call).  Of course, in an extreme memory shortage situation, you are SOL and
> violate POSIX a little before you die horribly from being out of memory anyway.

> The LEGACY_QUEUE logic already ensures that, for non-RT signals, at most
> one is ever on the queue.  So there really is no risk at all of unbounded
> resource consumption; the usage can reach {RLIMIT_SIGPENDING} + 31, is all.

Good point.  Although it's RLIMIT_SIGPENDING + (31 * user_nprocs).  So
that could be 31 * 8k, for example.
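Both halves of the argument can be modelled in a few lines. A minimal sketch: the model of the LEGACY_QUEUE rule is simplified to one pending bitmask for signals 1 to 31, `legacy_queue()` and `worst_case_entries()` are hypothetical names, and the bound just restates Chris's RLIMIT_SIGPENDING + 31 * nprocs estimate.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LEGACY_SIGNALS 31	/* non-RT signals 1..31, below SIGRTMIN */

/* Hypothetical model of one pending set. */
struct proc_model {
	unsigned long pending;	/* bit (sig - 1) set => entry queued */
};

/* LEGACY_QUEUE rule: a non-RT signal that is already pending is merged
 * rather than queued again.  Returns true if a new queue entry would be
 * allocated. */
static bool legacy_queue(struct proc_model *p, int sig)
{
	unsigned long bit = 1UL << (sig - 1);

	if (p->pending & bit)
		return false;
	p->pending |= bit;
	return true;
}

/* Worst case per user if non-RT signals may exceed the limit: the limit
 * itself plus one entry per legacy signal per process. */
static unsigned long worst_case_entries(unsigned long sigpending_limit,
					unsigned long nprocs)
{
	return sigpending_limit + NUM_LEGACY_SIGNALS * nprocs;
}
```

With a limit of 1024 and 8192 processes this gives 1024 + 31 * 8192 = 254976 entries.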

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net


Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Lee Revell
On Thu, 2005-02-24 at 12:29 +1100, Nick Piggin wrote:
> Lee Revell wrote:
> > 
> > IIRC last time I really tested this a few months ago, the worst case
> > latency on that machine was about 150us.  Currently it's 422us from the
> > same clear_page_range code path.
> > 
> Well it should be pretty trivial to add a break in there.
> I don't think it can get into 2.6.11 at this point though,
> so we'll revisit this for 2.6.12 if the clear_page_range
> optimisations don't get anywhere.
> 

Agreed, it would be much better to optimize this away than just add a
scheduling point.  It seems like we could do this lazily.

IMHO it's not critical that these latency fixes be merged until the VP
feature gets merged; until then, people will be using Ingo's patches
anyway.

Lee



[PATCH] set RLIMIT_SIGPENDING limit based on RLIMIT_NPROC

2005-02-23 Thread Roland McGrath
While looking into the issues Jeremy had with the RLIMIT_SIGPENDING limit,
it occurred to me that the normal setting of this limit is bizarrely low.
The initial hard limit setting (MAX_SIGPENDING) was taken from the old
max_queued_signals parameter, which was for the entire system in aggregate.
But even as a per-user limit, the 1024 value is incongruously low for this.
On my machine, RLIMIT_NPROC allows me 8192 processes, but only 1024 queued
signals, i.e. fewer even than one pending signal in each process.  (To me,
this really puts in doubt the sensibility of using a per-user limit for
this rather than a per-process one, i.e. counted in sighand_struct or
signal_struct, which could have a much smaller reasonable value.  I don't
recall the rationale for making this new limit per-user in the first place.)

This patch sets the default RLIMIT_SIGPENDING limit at boot time, using the
calculation that decides the default RLIMIT_NPROC limit.  This uses the
same value for those two limits, which I think is still pretty conservative
on the RLIMIT_SIGPENDING value.


Thanks,
Roland


Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>

--- linux-2.6/include/asm-generic/resource.h
+++ linux-2.6/include/asm-generic/resource.h
@@ -51,7 +51,7 @@
[RLIMIT_MEMLOCK]= {   MLOCK_LIMIT,   MLOCK_LIMIT }, \
[RLIMIT_AS] = { RLIM_INFINITY, RLIM_INFINITY }, \
[RLIMIT_LOCKS]  = { RLIM_INFINITY, RLIM_INFINITY }, \
-   [RLIMIT_SIGPENDING] = { MAX_SIGPENDING, MAX_SIGPENDING },   \
+   [RLIMIT_SIGPENDING] = { 0, 0 }, \
[RLIMIT_MSGQUEUE]   = { MQ_BYTES_MAX, MQ_BYTES_MAX },   \
 }
 
--- linux-2.6/include/linux/signal.h
+++ linux-2.6/include/linux/signal.h
@@ -8,8 +8,6 @@
 
 #ifdef __KERNEL__
 
-#define MAX_SIGPENDING 1024
-
 /*
  * Real Time signals may be queued.
  */
--- linux-2.6/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -129,6 +129,8 @@ void __init fork_init(unsigned long memp
 
init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
+   init_task.signal->rlim[RLIMIT_SIGPENDING] =
+   init_task.signal->rlim[RLIMIT_NPROC];
 }
 
 static struct task_struct *dup_task_struct(struct task_struct *orig)


Re: memory management weirdness

2005-02-23 Thread Parag Warudkar
On Tuesday 22 February 2005 04:57 am, Martin MOKREJŠ wrote:
> The 3GB labeled file corresponds to fast case, 4GB is ugly slow.
> What can you gather from those files?
I did take a look and didn't analyze it further since Andi mentioned it is a
known BIOS bug.
Sorry about the trouble - I didn't imagine it might be BIOS related. Generally
speaking, it helps to have a profile available when things are going slow.

Parag


Re: 2.6.11-rc4-mm1 (VFS: Cannot open root device "301")

2005-02-23 Thread Matt Mackall
On Thu, Feb 24, 2005 at 03:03:33AM +0100, Benoit Boissinot wrote:
> On Wed, 23 Feb 2005 16:41:59 -0800, Matt Mackall <[EMAIL PROTECTED]> wrote:
> > On Wed, Feb 23, 2005 at 04:16:53PM -0800, Andrew Morton wrote:
> > > Steven Cole <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Yes, that worked.  2.6.11-rc4-mm1 now boots OK, but hdb1 seems to be
> > > > > missing.
> > >
> > > Looking at the IDE update in rc4-mm1:
> > >
> > > +void ide_init_disk(struct gendisk *disk, ide_drive_t *drive)
> > > +{
> > > + ide_hwif_t *hwif = drive->hwif;
> > > + unsigned int unit = drive->select.all & (1 << 4);
> > > +
> 
> If I grep the tree for select.all, it looks like from the initialization
> that you can not recover the unit from select.all (ide.c line 235 and 1882)
> since the function used is not invertible.

They're fine, if a bit ugly. Unit is either 0 or 1. So:

  (unit<<4) | 0xa0

is equivalent to unit * 16 as the mask won't mask off any bits.
 
> > >
> > > Could someone try this?
> > >
> > > - unsigned int unit = drive->select.all & (1 << 4);
> > > + unsigned int unit = (drive->select.all >> 4) & 1;
> > 
> > Apparently there's already an 'hdb' sitting in drive->name, perhaps we
> > ought to do disk->disk_name = drive->name for the non-devfs case.
> >
> init_hwif_default initialized it right.
> 
> Could something like this work ?

No, because they're arrays and not pointers. I've booted with the
obvious strcpy, works fine.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-23 Thread Paul Jackson
Jay wrote:
> I think the microbenchmarking your link provides is irrelevant.

In cases such as you describe, where it's just some sort of empty
function call, yes, I am willing to accept a wave of the hands and
a simple explanation of how it's not significant.  I've done the same
myself ;).

What about the case where accounting is enabled, and thus actually has
to do work?

How does that compare with just doing the traditional BSD accounting?

I presume in that case that the benchmarking is no longer irrelevant.
Though if you can make a decent case that it is, I'm willing to listen.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401


[PATCH] show RLIMIT_SIGPENDING usage in /proc/PID/status

2005-02-23 Thread Roland McGrath
Jeremy mentioned the aggravation of not being able to tell when your
processes are using up signal queue entries and hitting the
RLIMIT_SIGPENDING limit.  This patch adds a line to /proc/PID/status
showing how many queue items are in use, and allowed, for your uid.

I can certainly see the appeal of having a display of the number of queued
items specific to each process, and even the items within the process
broken down per signal number.  However, those are not things that are
directly counted, and ascertaining them requires iterating through the
queue.  This patch instead gives what can be readily determined in constant
time using the accounting already done.  I'm not sure something more
complex is warranted just to facilitate one particular debugging need.
With this, you can see quickly that this particular problem has come up.
Then examination of each process's SigPnd/ShdPnd lines ought to give you an
indication of which processes have any queued RT signals sitting around for
a long time, and you can then attack those programs directly, though there
is no way after the fact to determine how many queued signals with the same
number a given process has (short of killing it and seeing the usage drop).

Note you may still have a mystery if the leaking programs are not leaving
pending RT signals queued, but rather preallocating queue items via
timer_create.  That usage is not readily apparent in any /proc information.


Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>

--- linux-2.6/fs/proc/array.c
+++ linux-2.6/fs/proc/array.c
@@ -239,6 +239,7 @@ static inline char * task_sig(struct tas
 {
sigset_t pending, shpending, blocked, ignored, caught;
int num_threads = 0;
+   unsigned long qsize = 0, qlim = 0;
 
sigemptyset(&pending);
sigemptyset(&shpending);
@@ -255,11 +256,14 @@ static inline char * task_sig(struct tas
blocked = p->blocked;
collect_sigign_sigcatch(p, &ignored, &caught);
num_threads = atomic_read(&p->signal->count);
+   qsize = atomic_read(&p->user->sigpending);
+   qlim = p->signal->rlim[RLIMIT_SIGPENDING].rlim_cur;
spin_unlock_irq(&p->sighand->siglock);
}
read_unlock(&tasklist_lock);
 
buffer += sprintf(buffer, "Threads:\t%d\n", num_threads);
+   buffer += sprintf(buffer, "SigQ:\t%lu/%lu\n", qsize, qlim);
 
/* render them all */
buffer = render_sigset_t("SigPnd:\t", &pending, buffer);


Re: ppc32 weirdness with gcc-4.0 in 2.6.11-rc4

2005-02-23 Thread Benjamin Herrenschmidt
> -Memory: 255872k available (1788k kernel code, 976k data, 144k init, 0k highmem)
> +Memory: 255872k available (1776k kernel code, 0k data, 144k init, 0k highmem)

That is weird... (0k data)

> AGP special page: 0xc000
>  Calibrating delay loop... 830.66 BogoMIPS (lpj=4153344)
>  Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
> @@ -132,13 +132,7 @@
>  VFS: Mounted root (ext3 filesystem) readonly.
>  Freeing unused kernel memory: 144k init 4k chrp 8k prep
>  usb 3-2: new full speed USB device using ohci_hcd and address 2
> -hub 3-2:1.0: USB hub found
> -hub 3-2:1.0: 3 ports detected
> -usb 3-2.1: new low speed USB device using ohci_hcd and address 3
> -input: USB HID v1.10 Mouse [Logitech Apple Optical USB Mouse] on usb-0001:10:1b.0-2.1
> -usb 3-2.3: new full speed USB device using ohci_hcd and address 4
> -input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:1b.0-2.3
> -input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:1b.0-2.3
> +usb 3-2: can't connect bus-powered hub to this port
>  EXT3 FS on hda5, internal journal
>  Adding 1048568k swap on /dev/hda3.  Priority:-1 extents:1
>  SCSI subsystem initialized
> 
> Note: "Memory: ... 0k data ..." !? Surely that can't be correct.

Not sure what's up, but it's probably something being miscompiled. Can
you check if the udelay/mdelay loops are correct?

Ben.




Re: uninterruptible sleep lockups

2005-02-23 Thread Bodo Eggert
On Wed, 23 Feb 2005, linux-os wrote:
> On Wed, 23 Feb 2005, Bodo Eggert wrote:
> > linux-os <[EMAIL PROTECTED]> wrote:

> >> You don't seem to understand. A process that's stuck in 'D' state
> >> shows a SEVERE error, usually with a hardware driver.
> >
> > Or a network filesystem mount to a no longer existing server or share.
> 
> But that's a whole different problem. That's a systemic problem
> of "fail-over". Network file-systems really need to interface
> with an intermediate virtual device that can isolate failed
> systems and make them look "perfect" to individual machines.
> 
> If you don't do this, then as soon as somebody trips over a
> wire, your database is trashed. I'm surprised that NFS, PCNFS,
> SMB, etc., actually work as well as everybody seems to
> think they do. Until the architectural problem is resolved,
> there are still going to be hung processes, trashed databases,
> etc.

You don't run databases over a network filesystem unless you're begging
for trouble. For the other common purposes you'll usually get a more
stable behaviour, since the failure on the client won't prevent the server
from properly writing the metadata or flushing the cache.

> > How to clean up the stuck processes: (This requires a MMU)
> > Add an error path to each syscall (or create some generic error paths) and
> > keep the original stack frame. On errors, you can "longjump" (not exactly,
> > but similar) to the error path after copying the memory. The semaphore will
> > not be taken, and the code depending on the semaphore will not be executed.
> >
> 
> Again, you are attacking the symptom. The problem could be resolved
> by using a local disk (or a disk file) for the immediate I/O and
> the I/O to the file-servers could occur whenever they are available.

a) There are systems without local storage.

b) It won't help while stat()ing a non-cached object.

c) This would involve race conditions for e.g. two disconnected nodes on
   reconnect. As far as I can see, this race can be solved by:
 c1) The final transaction must be delayed until it's ACKed or 
 NACKed. This may delay the D-State for some seconds, but not enough.
 c2) The server will have to keep track of the clients and need to be told
 when a user left for a trip to the south pole without unmounting. 
 Very undesirable.
 c3) Ignoring. Very, very undesirable.
 c4) Requiring explicit transaction handling by the applications.  
 Interesting, but not in the near future.

d) This won't allow synchronous updates without falling back to classic 
   handling.

e) The users will update some files, get a positive reply and shut down
   their PCs before the changes can be committed to the server. If the
   server will not come back or the client is not rebooted within
   reasonable time, this will cause silent data loss.

f) This will require reliable identification of the network server.

g) I'm not only thinking of NFS/..., although I used it as _the_ example. 
   E.g. if you see your IDE drive failing, you'll want to declare it dead
   instead of waiting $num_of_sectors times five minutes until the kernel
   decides to give up.


I agree that most D-states are problems that need to be fixed instead of
being worked-around, but sometimes you can't fix the problem without
access to the crystal-ball-device. Therefore all devices that can block
will need a manual override (with different probability), and the
processes that were stuck will need a way to recover or be stuck forever.

Obviously the system is healthy enough to do some important and
uninterruptible work after those errors occurred, so having them stuck will
be OK for now. Instead, the next task might be freeing the file
descriptors preventing you from unmounting your removable media or network
share or allowing really-forced umount.

-- 
Top 100 things you don't want the sysadmin to say:
54. Uh huh.."nu -k $USER".. no problem..sure thing...


Re: 2.6.11-rc4-mm1 (VFS: Cannot open root device "301")

2005-02-23 Thread Benoit Boissinot
On Wed, 23 Feb 2005 16:41:59 -0800, Matt Mackall <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 23, 2005 at 04:16:53PM -0800, Andrew Morton wrote:
> > Steven Cole <[EMAIL PROTECTED]> wrote:
> > >
> > > > Yes, that worked.  2.6.11-rc4-mm1 now boots OK, but hdb1 seems to be
> > > > missing.
> >
> > Looking at the IDE update in rc4-mm1:
> >
> > +void ide_init_disk(struct gendisk *disk, ide_drive_t *drive)
> > +{
> > + ide_hwif_t *hwif = drive->hwif;
> > + unsigned int unit = drive->select.all & (1 << 4);
> > +

If I grep the tree for select.all, it looks like from the initialization
that you can not recover the unit from select.all (ide.c line 235 and 1882)
since the function used is not invertible.

> >
> > Could someone try this?
> >
> > - unsigned int unit = drive->select.all & (1 << 4);
> > + unsigned int unit = (drive->select.all >> 4) & 1;
> 
> Apparently there's already an 'hdb' sitting in drive->name, perhaps we
> ought to do disk->disk_name = drive->name for the non-devfs case.
>
init_hwif_default initialized it right.

Could something like this work ?

regards,

Benoit
--- linux/drivers/ide/ide-probe.c	2005-02-23 12:16:32.0 +0100
+++ linux-test/drivers/ide/ide-probe.c	2005-02-24 03:02:06.0 +0100
@@ -1269,11 +1269,11 @@ EXPORT_SYMBOL_GPL(ide_unregister_region)
 void ide_init_disk(struct gendisk *disk, ide_drive_t *drive)
 {
 	ide_hwif_t *hwif = drive->hwif;
-	unsigned int unit = drive->select.all & (1 << 4);
+	unsigned int unit = drive->name[2] - 'a' - hwif->index * MAX_DRIVES;
 
 	disk->major = hwif->major;
 	disk->first_minor = unit << PARTN_BITS;
-	sprintf(disk->disk_name, "hd%c", 'a' + hwif->index * MAX_DRIVES + unit);
+	disk->disk_name = drive->name;
 	disk->queue = drive->queue;
 }
 


Re: 2.6.11-rc4 libata-core (irq 30: nobody cared!)

2005-02-23 Thread Brian Kuschak
Retry... that patch got screwed up in the last email...
-Brian




--- libata-core.c.orig  2005-02-23 17:41:03.831836464 -0800
+++ libata-core.c   2005-02-23 17:54:51.287044152 -0800
@@ -3158,6 +3158,11 @@
if (qc && (!(qc->tf.ctl & ATA_NIEN))) {
handled |= ata_host_intr(ap, qc);
}
+   else {
+   /* bk - just ack spurious interrupt here - temp workaround */
+   ata_irq_ack(ap, 0);
+   printk(KERN_WARNING "ata%d: irq trap\n", ap->id);
+   }
}
}
 


2.6.11-rc4 libata-core (irq 30: nobody cared!)

2005-02-23 Thread Brian Kuschak
I see this problem with the sata_sil.c driver and
SII3112 card.  Others have reported seeing a similar
problem:  http://lkml.org/lkml/2005/2/6/41

There seems to be a pending interrupt from the drive,
but the code has already set the NIEN bit, so the
ATA_IRQ_TRAP macro doesn't help (the ata_interrupt
handler never calls ata_host_intr in this case).

I've implemented a quick workaround hack, but others
should investigate a better fix (maybe acking pending
interrupts before setting NIEN bit in ata_tf_load??)

Regards,
Brian

--- libata-core.c.orig  2005-02-23 17:41:03.831836464 -0800
+++ libata-core.c   2005-02-23 17:31:07.930427248 -0800
@@ -3158,6 +3158,11 @@
if (qc && (!(qc->tf.ctl & ATA_NIEN))) {
handled |= ata_host_intr(ap, qc);
}
+   else {
+   /* bk - just ack spurious interrupt here - temp workaround */
+   ata_irq_ack(ap, 0);
+   printk(KERN_WARNING "ata%d: irq trap\n", ap->id);
+   }
}
}


Linux version 2.6.11-rc4 ([EMAIL PROTECTED]) (gcc version 3.3.2) #27 Wed Feb 23 17:49:05 PST 2005
Built 1 zonelists
Kernel command line: root=/dev/ram rw ramdisk=36000 console=ttyS0
PID hash table entries: 1024 (order: 10, 16384 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 120832k available (2136k kernel code, 916k data, 108k init, 0k highmem)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 5709k freed
NET: Registered protocol family 16
PCI: Probing PCI hardware
SCSI subsystem initialized
Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
Initializing Cryptographic API
Serial: 8250/16550 driver $Revision: 1.90 $ 6 ports, IRQ sharing disabled
ttyS0 at MMIO 0x0 (irq = 0) is a 16550A
ttyS1 at MMIO 0x0 (irq = 1) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 36000K size 1024 blocksize
loop: loaded (max 8 devices)
mal0: Initialized, 1 tx channels, 1 rx channels
emac: IBM EMAC Ethernet driver, version 2.0
Maintained by Benjamin Herrenschmidt <[EMAIL PROTECTED]>
eth0: IBM emac, MAC 08:00:3e:26:15:59
eth0: Found Generic MII PHY (0x06)
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ata1: SATA max UDMA/100 cmd 0xC9002E80 ctl 0xC9002E8A bmdma 0xC9002E00 irq 30
ata2: SATA max UDMA/100 cmd 0xC9002EC0 ctl 0xC9002ECA bmdma 0xC9002E08 irq 30
irq 30: nobody cared!
Call trace:
 [c0005630] dump_stack+0x18/0x28
 [c003ae0c] __report_bad_irq+0x34/0xac
 [c003af38] note_interrupt+0x98/0xd4
 [c003a92c] __do_IRQ+0x15c/0x160
 [c0003e54] do_IRQ+0x50/0x98
 [c0002f64] ret_from_except+0x0/0x18
 [c0003ed4] default_idle+0x38/0x5c
 [c0003f20] cpu_idle+0x28/0x38
 [c00023a4] rest_init+0x24/0x34
 [c02dc614] start_kernel+0x170/0x1a8
 [c00022a4] start_here+0x44/0xb0
handlers:
[] (ata_interrupt+0x0/0x27c)
Disabling IRQ #30
ata1: dev 0 ATA, max UDMA7, 234493056 sectors: lba48
eth0: Link is Up
eth0: Speed: 100, Full duplex.








Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-23 Thread Jay Lan
Hi Paul,
I think the microbenchmarking your link provides is irrelevant.
Your link provides benchmarking of doing a fork.
However, we are talking about inserting a callback routine
in a fork and/or an exit. The overhead is a function
call and time spent in the routine. The callback routine
can be configured to "do {} while (0)" if a certain CONFIG
flag is not set.
Thanks,
 - jay
Paul Jackson wrote:
> > So, I think such a fork/execve/exit hooks is harmless now.
> 
> I don't recall seeing any microbenchmarking of the impact on fork/exit
> of such hooks.  You might find such a benchmark in lmbench, or at
> http://bulk.fefe.de/scalability/.


[PATCH] override RLIMIT_SIGPENDING for non-RT signals

2005-02-23 Thread Roland McGrath
Indeed, I think your patch does not go far enough.  I can read POSIX to say
that the siginfo_t data must be available when `kill' was used, as well.
This patch makes it allocate the siginfo_t, even when that exceeds
{RLIMIT_SIGPENDING}, for any non-RT signal (< SIGRTMIN) not sent by
sigqueue (actually, any signal that couldn't have been faked by a sigqueue
call).  Of course, in an extreme memory shortage situation, you are SOL and
violate POSIX a little before you die horribly from being out of memory anyway.

The LEGACY_QUEUE logic already ensures that, for non-RT signals, at most
one is ever on the queue.  So there really is no risk at all of unbounded
resource consumption; the usage can reach {RLIMIT_SIGPENDING} + 31, is all.

It's already the case that the limit can be exceeded by (in theory) up to
{RLIMIT_NPROC}-1 in race conditions because the bump and the limit check
are not atomic.  (Obviously you can only get anywhere near that many with
assloads of preemption, but exceeding it by a few is not too unlikely.)
This patch also fixes that accounting so that it should not be possible to
exceed {RLIMIT_SIGPENDING} + SIGRTMIN-1 queue items per user in races.


Thanks,
Roland


Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>

--- linux-2.6/kernel/signal.c
+++ linux-2.6/kernel/signal.c
@@ -260,19 +260,23 @@ next_signal(struct sigpending *pending, 
return sig;
 }
 
-static struct sigqueue *__sigqueue_alloc(struct task_struct *t, int flags)
+static struct sigqueue *__sigqueue_alloc(struct task_struct *t, int flags,
+int override_rlimit)
 {
struct sigqueue *q = NULL;
 
-   if (atomic_read(&t->user->sigpending) <
+   atomic_inc(&t->user->sigpending);
+   if (override_rlimit ||
+   atomic_read(&t->user->sigpending) <=
t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
q = kmem_cache_alloc(sigqueue_cachep, flags);
-   if (q) {
+   if (unlikely(q == NULL)) {
+   atomic_dec(&t->user->sigpending);
+   } else {
INIT_LIST_HEAD(&q->list);
q->flags = 0;
q->lock = NULL;
q->user = get_uid(t->user);
-   atomic_inc(>user->sigpending);
}
return(q);
 }
@@ -793,7 +797,9 @@ static int send_signal(int sig, struct s
   make sure at least one signal gets delivered and don't
   pass on the info struct.  */
 
-   q = __sigqueue_alloc(t, GFP_ATOMIC);
+   q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
+((unsigned long) info < 2 ||
+ info->si_code >= 0)));
if (q) {
list_add_tail(>list, >list);
switch ((unsigned long) info) {
@@ -1316,7 +1322,7 @@ struct sigqueue *sigqueue_alloc(void)
 {
struct sigqueue *q;
 
-   if ((q = __sigqueue_alloc(current, GFP_KERNEL)))
+   if ((q = __sigqueue_alloc(current, GFP_KERNEL, 0)))
q->flags |= SIGQUEUE_PREALLOC;
return(q);
 }


Oops: Unable to handle kernel paging request

2005-02-23 Thread yves geunes
I run an installed and updated Debian sarge. I downloaded 2.6.10 from kernel.org.
When I use apt-get under 2.4.27, everything is OK.
When I use apt-get under 2.6.10, I get a segfault.
I run the same combination (although with a different configuration) on a Pentium III (Coppermine) and
on an Intel(R) Pentium(R) M processor 1.50GHz, and there it works fine.

The faulty system is configured for 686.
I hope this is useful for you.
0
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:3 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Built 1 zonelists
Kernel command line: root=/dev/hda1 ro
mapped APIC to d000 (fee0)
mapped IOAPIC to c000 (fec0)
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2993.709 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1030900k/1047744k available (1703k kernel code, 16028k reserved, 768k data, 288k init, 130168k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 5914.62 BogoMIPS (lpj=2957312)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: bfebfbff   
CPU: After vendor identify, caps:  bfebfbff   
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps:bfebfbff   0080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 04
per-CPU timeslice cutoff: 2925.86 usecs.
task migration cache decay timeout: 3 msecs.
Booting processor 1/1 eip 3000
Initializing CPU#1
Calibrating delay loop... 5980.16 BogoMIPS (lpj=2990080)
CPU: After generic identify, caps: bfebfbff   
CPU: After vendor identify, caps:  bfebfbff   
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps:bfebfbff   0080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 04
Total of 2 processors activated (11894.78 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
CPU0:
domain 0: span 03
 groups: 01 02
CPU1:
domain 0: span 03
 groups: 02 01
checking if image is initramfs...it isn't (bad gzip magic numbers); looks like an initrd
Freeing initrd memory: 3684k freed
NET: Registered protocol family 16
EISA bus registered
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=3
PCI: Using configuration type 1
ACPI: Subsystem revision 20041105
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller :00:1f.1
PCI: Transparent bridge - :00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P3._PRT]
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [URP2] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
** PCI interrupts are no longer routed 

Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Nick Piggin
Lee Revell wrote:
> On Thu, 2005-02-24 at 10:27 +1100, Nick Piggin wrote: 
> 
>> If you are using i386 with 2-level page tables (no highmem), then
>> the behaviour should be more or less identical. Odd.
> 
> IIRC last time I really tested this a few months ago, the worst case
> latency on that machine was about 150us.  Currently it's 422us from the
> same clear_page_range code path.
> 
> On my Athlon XP the clear_page_range latency is not showing up at all,
> and the worst delay so far is only 35us, most of which is the timer
> interrupt; IOW, that machine is showing the best achievable latency (with
> PREEMPT_DESKTOP).  The machine seeing 422us latencies in
> clear_page_range is a 600MHz C3, which is known to be an FSB-limited
> architecture.

Well it should be pretty trivial to add a break in there.
I don't think it can get into 2.6.11 at this point though,
so we'll revisit this for 2.6.12 if the clear_page_range
optimisations don't get anywhere.
Nick