Fw: AIM7 fails with 2.6.18-rc5-mm1
We think this is a net bug.

Begin forwarded message:

Date: Mon, 4 Sep 2006 17:02:22 -0700 (PDT)
From: Christoph Lameter [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: linux-kernel@vger.kernel.org
Subject: AIM7 fails with 2.6.18-rc5-mm1

On an 8p Altix. 6 GB RAM

AIM Multiuser Benchmark - Suite VII Run Beginning

Tasks   jobs/min  jti  jobs/min/task  real    cpu
    1    2435.06  100      2435.0649  2.46   0.02  Mon Sep 4 10:17:44 2006
  100  178784.27   94      1787.8427  3.36   7.08  Mon Sep 4 10:17:58 2006
  200  280636.11   95      1403.1805  4.28  14.46  Mon Sep 4 10:18:15 2006
  300  340973.67   91      1136.5789  5.28  22.35  Mon Sep 4 10:18:37 2006
  400  382897.26   82       957.2431  6.27  30.44  Mon Sep 4 10:19:03 2006
  500  413793.10   86       827.5862  7.25  38.14  Mon Sep 4 10:19:33 2006
  600  434940.20   89       724.9003  8.28  46.43  Mon Sep 4 10:20:07 2006
  700
Fatal error 98 at line 284 of file pipe_test.c: bind on write -- Address already in use
Child #489: : Address already in use
Failed to execute udp_test 100
Fatal error 98 at line 264 of file pipe_test.c: bind on write -- Address already in use
Child #286: : Address already in use
Failed to execute udp_test 100
etc etc

Is this a known issue?

- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
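Error 98 is EADDRINUSE on Linux: bind() refused the address because another socket already holds it. The sketch below is a minimal, hypothetical reproduction of that errno outside AIM7 (two UDP sockets bound to the same loopback port without SO_REUSEADDR). It only illustrates what the error means and makes no claim about the cause of the regression; the function name is made up for the example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno from the second bind() (0 if it unexpectedly
 * succeeded), or -1 if the setup itself failed. */
static int second_udp_bind_errno(void)
{
	struct sockaddr_in addr = {0};
	socklen_t len = sizeof(addr);
	int a, b, ret = -1;

	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;		/* let the kernel pick a free port */

	a = socket(AF_INET, SOCK_DGRAM, 0);
	if (a < 0)
		return -1;
	/* Bind the first socket, then learn which port it was given. */
	if (bind(a, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(a, (struct sockaddr *)&addr, &len) < 0)
		goto out_close_a;

	/* A second socket asking for the very same address and port. */
	b = socket(AF_INET, SOCK_DGRAM, 0);
	if (b < 0)
		goto out_close_a;
	ret = bind(b, (struct sockaddr *)&addr, sizeof(addr)) < 0 ? errno : 0;
	close(b);
out_close_a:
	close(a);
	return ret;
}
```

On Linux, calling second_udp_bind_errno() is expected to return EADDRINUSE, the same error the AIM7 children report.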
Re: [take15 1/4] kevent: Core files.
On Monday 04 September 2006 12:14, Evgeniy Polyakov wrote:
+asmlinkage long sys_kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr, __u64 timeout, void __user *buf, unsigned flags)
+asmlinkage long sys_kevent_ctl(int fd, unsigned int cmd, unsigned int num, void __user *arg)

'void __user *arg' in both of these always points to a struct ukevent, according to your documentation. Shouldn't it be a 'struct ukevent __user *arg' then?

Arnd
Re: [RFC] network namespaces
Hi all,

This complete separation of namespaces is very useful for at least two purposes:
 - allowing users to create and manage, on their own, various tunnels and VPNs, and
 - enabling easier and more straightforward live migration of groups of processes with their environment.

I conceptually prefer this approach, but I seem to recall there were actual problems in using this for checkpoint/restart of lightweight (application) containers. Performance aside, are there any reasons why this approach would be problematic for c/r?

I agree with this approach too; separate namespaces are the best way to identify the network resources for a specific container.

I'm afraid Daniel may be on vacation, and don't know who else other than Eric might have thoughts on this.

Yes, I was on vacation, but I am back :)

2. People expressed concerns that complete separation of namespaces may introduce an undesired overhead in certain usage scenarios. The overhead comes from packets traversing the input path, then the output path, then the input path again in the destination namespace if the root namespace acts as a router.

Yes, performance is probably one issue. My concern was about layer 2 / layer 3 virtualization. I agree that layer 2 isolation/virtualization is the best for the system container. But there is another family of containers, called application containers: it is not a whole system that runs inside the container, only the application. If you want to run an Oracle database inside a container, you can run it inside an application container without launching init and all the services. This family of containers is also used for HPC (high performance computing) and for distributed checkpoint/restart. The cluster runs hundreds of jobs, spawning them on different hosts inside application containers. Usually the jobs communicate with broadcast and multicast. Application containers do not care about having different MAC addresses and rely on a layer 3 approach.

Are application containers comfortable with layer 2 virtualization? I don't think so, because several jobs running inside the same host communicate via broadcast/multicast between themselves and with other jobs running on different hosts. IP consumption is a problem too: 1 container == 2 IPs (one for the root namespace, one for the container), multiplied by the number of jobs. Furthermore, lots of jobs == lots of virtual devices.

However, after a discussion with Kirill at OLS, it appears we can merge the layer 2 and layer 3 approaches if the level of network virtualization is tunable and we can choose layer 2 or layer 3 when doing the unshare. The namespace for incoming traffic can be determined with a specific iptables module as a first step. While looking at the network namespace patches, it appears that the TCP/UDP part is **very** similar to what is needed for a layer 3 approach.

Any thoughts?

Daniel
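Determining the namespace for incoming traffic at layer 3, as proposed above, boils down to a lookup keyed on each received packet's destination address. A minimal userspace sketch, assuming exactly one IPv4 address per container and a fallback to the root namespace; the table layout, ROOT_NS and ns_for_incoming() are hypothetical and say nothing about how the proposed iptables module would actually work.

```c
#include <stddef.h>
#include <stdint.h>

#define ROOT_NS 0	/* hypothetical id of the root namespace */

/* Hypothetical: one entry per container address. */
struct ns_map_entry {
	uint32_t addr;	/* IPv4 address, host byte order */
	int ns_id;	/* namespace that owns this address */
};

/* Classify an incoming packet by its destination address. */
static int ns_for_incoming(const struct ns_map_entry *map, size_t n,
			   uint32_t daddr)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].addr == daddr)
			return map[i].ns_id;
	return ROOT_NS;	/* unknown address: leave it to the root namespace */
}
```

A real implementation would use a hash rather than a linear scan; the linear scan just keeps the per-packet decision visible.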
Re: [RFC] network namespaces
Daniel Lezcano [EMAIL PROTECTED] writes:

2. People expressed concerns that complete separation of namespaces may introduce an undesired overhead in certain usage scenarios. The overhead comes from packets traversing the input path, then the output path, then the input path again in the destination namespace if the root namespace acts as a router.

Yes, performance is probably one issue. My concern was about layer 2 / layer 3 virtualization. I agree that layer 2 isolation/virtualization is the best for the system container. But there is another family of containers, called application containers: it is not a whole system that runs inside the container, only the application. If you want to run an Oracle database inside a container, you can run it inside an application container without launching init and all the services. This family of containers is also used for HPC (high performance computing) and for distributed checkpoint/restart. The cluster runs hundreds of jobs, spawning them on different hosts inside application containers. Usually the jobs communicate with broadcast and multicast. Application containers do not care about having different MAC addresses and rely on a layer 3 approach.

Are application containers comfortable with layer 2 virtualization? I don't think so, because several jobs running inside the same host communicate via broadcast/multicast between themselves and with other jobs running on different hosts. IP consumption is a problem too: 1 container == 2 IPs (one for the root namespace, one for the container), multiplied by the number of jobs. Furthermore, lots of jobs == lots of virtual devices.

However, after a discussion with Kirill at OLS, it appears we can merge the layer 2 and layer 3 approaches if the level of network virtualization is tunable and we can choose layer 2 or layer 3 when doing the unshare. The namespace for incoming traffic can be determined with a specific iptables module as a first step. While looking at the network namespace patches, it appears that the TCP/UDP part is **very** similar to what is needed for a layer 3 approach.

Any thoughts?

For HPC, if you are interested in migration you need a separate IP per container. If you can take your IP address with you, migration of networking state is simple. If you can't take your IP address with you, a network container is nearly pointless from a migration perspective.

Beyond that, from everything I have seen, layer 2 is just much cleaner than any layer 3 approach short of Serge's bind filtering.

Beyond that, I have yet to see clean semantics for anything resembling your layer 2 / layer 3 hybrid approach. If we can't have clear semantics, it is by definition impossible to implement correctly, because no one understands what it is supposed to do.

Note: a true layer 3 approach has no impact on TCP/UDP filtering because it filters at bind time, not at packet reception time. Once you start inspecting packets, I don't see what the gain is from not going all of the way to layer 2.

Eric
Re: sky2: hw checksum failures
Benjamin Herrenschmidt wrote:
On Mon, 2006-09-04 at 20:56 -0700, Stephen Hemminger wrote:
On Tue, 05 Sep 2006 13:42:38 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
On Mon, 2006-09-04 at 20:34 -0700, Stephen Hemminger wrote:

Unneeded byte swap was occurring.

--- linux-2.6.orig/drivers/net/sky2.c
+++ linux-2.6/drivers/net/sky2.c
@@ -2001,7 +2001,7 @@ static int sky2_status_intr(struct sky2_
 		case OP_RXCHKS:
 			skb = sky2->rx_ring[sky2->rx_next].skb;
 			skb->ip_summed = CHECKSUM_HW;
-			skb->csum = le16_to_cpu(status);
+			skb->csum = status;
 			break;
 
 		case OP_TXINDEXLE:

I've removed it in my patches (have you seen the other patches I sent for this driver?), though I'm pre-swapping status and length now before the switch/case, so there might still be an issue there. I'll have a look.

The other tack would be to leave the reverse-in-hw flag on and take out all the existing swap calls, but then you have to add an ifdef to re-order all the structures for tx_le, rx_le, status_le. That is what the vendor (GPL) version of sk98lin does.

I prefer keeping the HW swap out of the way for now... that way, I know the card will react exactly like on an x86, and I avoid those ugly ifdefs. At least on powerpc, there is no cost in doing the swap in software (well, pretty much no cost). Which means that if it worked on x86 with le16_to_cpu, it should work on powerpc... The main difference here, however, is that you called le16_to_cpu (which is basically a nop) on a 32-bit field, while I called le32_to_cpu() on it. But both should lead to the same ... (x86 will do a swapped 16-bit load of the first 2 bytes, while ppc will do a load of 4 bytes and swap that, thus ending up with the first 2 bytes swapped in the low order of the result). I'll dump the values and have a look to be sure.

Another possibility would be a problem with the bits telling the chip where to calculate the checksum. Hardware only computes a 16-bit checksum.
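The claim that a little-endian 16-bit read of the first two bytes and the low half of a little-endian 32-bit read of the same word give the same value can be checked with byte-level decoders that behave identically on x86 and powerpc. A minimal sketch; get_le16() and get_le32() are hypothetical helpers standing in for le16_to_cpu()/le32_to_cpu() applied to the status word, not the sky2 code.

```c
#include <stdint.h>

/* Decode a little-endian 16-bit value from raw descriptor bytes,
 * independent of the host's byte order. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same for a 32-bit value: byte 0 is the least significant. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For the bytes 34 12 cd ab, get_le16() yields 0x1234 and get_le32() yields 0xabcd1234, whose low 16 bits are again 0x1234. So if the checksum only occupies the first two bytes, either decode feeds the same value into skb->csum once the upper bits are ignored.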
[REVISED] [PATCH] ethtool v4: add autoneg advertise feature
Adds the ability to change the advertised speed and duplex for a network interface. Previously, a network interface was only able to advertise all supported speeds and duplexes, or one individual speed and duplex. The feature allows the user to choose which supported speeds and duplexes to advertise by using a hex value.

Signed-off-by: Jeff Kirsher [EMAIL PROTECTED]
Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 ethtool.8 | 24
 ethtool.c | 12 +++-
 2 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/ethtool.8 b/ethtool.8
index 888a7d8..679f6bc 100644
--- a/ethtool.8
+++ b/ethtool.8
@@ -176,6 +176,8 @@ ethtool \- Display or change ethernet ca
 .B2 duplex half full
 .B4 port tp aui bnc mii fibre
 .B2 autoneg on off
+.RB [ advertise
+.IR N ]
 .RB [ phyad
 .IR N ]
 .B2 xcvr internal external
@@ -327,6 +329,28 @@ Select device port.
 Specify if autonegotiation is enabled. In the usual case it is, but might cause some problems with some network devices, so you can turn it off.
 .TP
+.BI advertise \ N
+Set the speed and duplex advertised by autonegotiation. The argument is
+a hexidecimal value using one or a combination of the following values:
+.RS
+.PD 0
+.TP 3
+.BR 0x01 10 Half
+.TP 3
+.BR 0x02 10 Full
+.TP 3
+.BR 0x04 100 Half
+.TP 3
+.BR 0x08 100 Full
+.TP 3
+.BR 0x10 1000 Half (not supported by IEEE standards)
+.TP 3
+.BR 0x20 1000 Full
+.TP 3
+.BR 0x3F Auto
+.PD
+.RE
+.TP
 .BI phyad \ N
 PHY address.
 .TP
diff --git a/ethtool.c b/ethtool.c
index 87e22ab..b7f189a 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -99,6 +99,7 @@ static struct option {
 		"[ duplex half|full ]\n"
 		"[ port tp|aui|bnc|mii|fibre ]\n"
 		"[ autoneg on|off ]\n"
+		"[ advertise %%x ]\n"
 		"[ phyad %%d ]\n"
 		"[ xcvr internal|external ]\n"
 		"[ wol p|u|m|b|a|g|s|d... ]\n"
@@ -549,6 +550,15 @@ static void parse_cmdline(int argc, char
 				show_usage(1);
 			}
 			break;
+		} else if (!strcmp(argp[i], "advertise")) {
+			gset_changed = 1;
+			i += 1;
+			if (i >= argc)
+				show_usage(1);
+			advertising_wanted = strtol(argp[i], NULL, 16);
+			if (advertising_wanted < 0)
+				show_usage(1);
+			break;
 		} else if (!strcmp(argp[i], "phyad")) {
 			gset_changed = 1;
 			i += 1;
@@ -601,7 +611,7 @@ static void parse_cmdline(int argc, char
 		}
 	}
 
-	if (autoneg_wanted == AUTONEG_ENABLE){
+	if ((autoneg_wanted == AUTONEG_ENABLE) && (advertising_wanted < 0)) {
 		if (speed_wanted == SPEED_10 && duplex_wanted == DUPLEX_HALF)
 			advertising_wanted = ADVERTISED_10baseT_Half;
 		else if (speed_wanted == SPEED_10
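The advertise argument is a bit mask built from the values in the man page hunk (0x01 for 10 Half through 0x20 for 1000 Full, 0x3F for all of them). The patch parses it with strtol(argp[i], NULL, 16) and rejects only negative results. A minimal sketch of a stricter parser, assuming you also want to reject trailing garbage and bits outside 0x3F; parse_advertise() and the ADV_* names are hypothetical, not ethtool's.

```c
#include <errno.h>
#include <stdlib.h>

/* Bit values from the proposed ethtool.8 text. */
#define ADV_10_HALF   0x01
#define ADV_10_FULL   0x02
#define ADV_100_HALF  0x04
#define ADV_100_FULL  0x08
#define ADV_1000_HALF 0x10
#define ADV_1000_FULL 0x20
#define ADV_ALL       0x3F

/* Parse the "advertise N" argument as hex (strtol with base 16 also
 * accepts a leading "0x"). Returns the mask, or -1 if the string is
 * not entirely a hex number or sets bits outside ADV_ALL. */
static long parse_advertise(const char *arg)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 16);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (val < 0 || (val & ~(long)ADV_ALL) != 0)
		return -1;
	return val;
}
```

With this sketch, "advertise 0x28" selects 100 Full and 1000 Full, while "zz" or "0x40" are rejected instead of silently becoming 0 or an unsupported bit.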
Re: [take15 4/4] kevent: Timer notifications.
On Monday 04 September 2006 12:14, Evgeniy Polyakov wrote:
Timer notifications can be used for fine grained per-process time management, since interval timers are very inconvenient to use, and they are limited.

I guess this must have been discussed before, but why is this not using high-resolution timers? Are you planning to change this? Maybe at least mention it in the description.

Arnd
Re: [2.6.19 PATCH 1/7] ehea: interface to network stack
Hi Francois,

thanks for your review and your comments. See below for our answers.

Regards
Thomas

Francois Romieu wrote:
+	cb2 = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
+	if (!cb2) {
+		ehea_error("no mem for cb2");
+		goto kzalloc_failed;

It's better when the label tells what it does than where it comes from. If it's numbered too, one can check them without going back and forth.

+	stats->tx_packets = cb2->txucp + cb2->txmcp + cb2->txbcp;
+	stats->multicast = cb2->rxmcp;
+	stats->rx_errors = cb2->rxuerr;
+	stats->rx_bytes = cb2->rxo;
+	stats->tx_bytes = cb2->txo;
+	stats->rx_packets = rx_packets;
+
+hcall_failed:
+	kfree(cb2);

Tab was turned into spaces.

Fixed.

+static inline int ehea_refill_rq1(struct ehea_port_res *pr, int index,

Avoid inline ?

Inline declaration was removed from this one and several other functions.

+	for (i = 0; i < nr_of_wqes; i++) {
+		if (!skb_arr_rq1[index]) {
+			skb_arr_rq1[index] = dev_alloc_skb(EHEA_LL_PKT_SIZE);

netdev_alloc_skb ?

Agreed, done.

+
+			if (!skb_arr_rq1[index]) {
+				ehea_error("no mem for skb/%d wqes filled", i);
+				ret = -ENOMEM;

The caller does not check the returned value.

Agreed. fn returns void now.

+		if (!skb_arr_rq1[i]) {
+			ehea_error("no mem for skb/%d skbs filled.", i);
+			ret = -ENOMEM;
+			goto exit0;

s/exit0/out/

Goto target naming was reworked throughout the whole driver and basically uses the style used by Dave M. and Jeff G. in the Tigon3 driver.

+static inline int ehea_check_cqe(struct ehea_cqe *cqe, int *rq_num)
+{
+	*rq_num = (cqe->type & EHEA_CQE_TYPE_RQ) >> 5;
+	if ((cqe->status & EHEA_CQE_STAT_ERR_MASK) == 0)
+		return 0;
+	if (((cqe->status & EHEA_CQE_STAT_ERR_TCP) != 0)
+	    && (cqe->header_length == 0))

&& on the previous line please.

Changed at all occurrences.

+static inline struct sk_buff *get_skb_by_index(struct sk_buff **skb_array,
+					       int arr_len,
+					       struct ehea_cqe *cqe)
+{
+	int skb_index = EHEA_BMASK_GET(EHEA_WR_ID_INDEX, cqe->wr_id);
+	struct sk_buff *skb;
+	void *pref;
+	int x;
+
+	x = skb_index + 1;
+	x &= (arr_len - 1);
+
+	pref = (void*)skb_array[x];

Useless cast.

Agreed - removed.

+	if (unlikely(!skb)) {
+		if (netif_msg_rx_err(port))
+			ehea_error("LL rq1: skb=NULL");
+		skb = dev_alloc_skb(EHEA_LL_PKT_SIZE);

Tab/space

Fixed.

+irqreturn_t ehea_qp_aff_irq_handler(int irq, void *param, struct pt_regs * regs)

static ?

Agreed.

+int ehea_sense_port_attr(struct ehea_port *port)

static ?

No - used in ehea_ethtool.c

+	} else {
+		if (hret == H_AUTHORITY)
+		{

Misplaced curly brace.

Fixed.

+			ehea_info("Hypervisor denied setting port speed. Either "
+				  "this partition is not authorized to set "
+				  "port speed or another partition has modified "
+				  "port speed first.");
+			ret = -EPERM;
+		} else
+		{

Misplaced curly brace.

Fixed.

+			ret = -EIO;
+			ehea_error("Failed setting port speed");
+		}
+	}
+	netif_carrier_on(port->netdev);
+exit0:
+	kfree(cb4);

cb4 is NULL. Not wrong per se but I'd rather move the label one line down.

Agreed.

+void ehea_neq_tasklet(unsigned long data)

static ?

Agreed.

+irqreturn_t ehea_interrupt_neq(int irq, void *param, struct pt_regs *regs)

static ?

Agreed.

+{
+	struct ehea_adapter *adapter = (struct ehea_adapter*)param;

Useless cast.

Fixed.

+static int ehea_fill_port_res(struct ehea_port_res *pr)
+{
+	int ret;
+	struct ehea_qp_init_attr *init_attr = &pr->qp->init_attr;
+
+	/* RQ 1 */
+	ret = ehea_init_fill_rq1(pr, init_attr->act_nr_rwqes_rq1
+				 - init_attr->act_nr_rwqes_rq2
+				 - init_attr->act_nr_rwqes_rq3 - 1);
+	/* RQ 2 */

Useless comment.

Removed.

+		for (k = 0; k < i; k++) {
+			u32 ist = port->port_res[k].recv_eq->attr.ist1;
+			ibmebus_free_irq(NULL, ist, &port->port_res[k]);
+		}
+		goto failure;

Poor label (and bloaty release practice too: remove k, reuse i below and more importantly release the things in allocation-reversed order).

Somehow I don't get your point concerning the usage of 'k'. We need another iterator as the for loops using 'k' use 'i' as their terminating condition.

+	}
+	if
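One reading of Francois's last remark ("remove k, reuse i below ... release the things in allocation-reversed order"): when the i-th request fails, i already says how many requests succeeded, so counting it back down releases them newest-first without a second iterator. A minimal userspace sketch of that interpretation, with malloc()/free() standing in for the IRQ requests; acquire(), release(), acquire_all(), nr_released and the fail_at knob are hypothetical and not the ehea code.

```c
#include <errno.h>
#include <stdlib.h>

static int nr_released;	/* counts releases, for demonstration only */

/* Hypothetical stand-in for a request such as an IRQ registration;
 * fails when idx == fail_at so the unwind path can be exercised. */
static void *acquire(int idx, int fail_at)
{
	return idx == fail_at ? NULL : malloc(16);
}

static void release(void *p)
{
	free(p);
	nr_released++;
}

static int acquire_all(void **res, int n, int fail_at)
{
	int i;

	for (i = 0; i < n; i++) {
		res[i] = acquire(i, fail_at);
		if (!res[i])
			goto out_release_acquired;
	}
	return 0;

out_release_acquired:
	/* Reuse i: release res[i - 1] down to res[0], newest first. */
	while (--i >= 0)
		release(res[i]);
	return -ENOMEM;
}
```

The label also follows the first review comment: it says what it undoes rather than where the jump comes from.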
[PATCH] FRV: Fix {dis,en}able_irq_lockdep_irqrestore compile error
Fix the lack of certain non-LOCKDEP stub functions in linux/interrupt.h and also provide FRV with LOCKDEP variants.

This is to be applied to the -mm kernel since not all of the functions added exist in the main kernel.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---
warthog>diffstat -p1 frv-irq-lockdep-2618rc5mm1.diff
 include/asm-frv/irq.h     | 43 +++
 include/linux/interrupt.h |  2 ++
 2 files changed, 45 insertions(+)

diff -urp ../kernels/linux-2.6.18-rc5-mm1/include/asm-frv/irq.h linux-2.6.18-rc5-mm1-frv/include/asm-frv/irq.h
--- ../kernels/linux-2.6.18-rc5-mm1/include/asm-frv/irq.h	2006-09-04 18:02:48.0 +0100
+++ linux-2.6.18-rc5-mm1-frv/include/asm-frv/irq.h	2006-09-05 15:59:08.0 +0100
@@ -39,5 +39,48 @@ extern void disable_irq_nosync(unsigned
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+#ifdef CONFIG_LOCKDEP
+/*
+ * Special lockdep variants of irq disabling/enabling.
+ * These should be used for locking constructs that
+ * know that a particular irq context which is disabled,
+ * and which is the only irq-context user of a lock,
+ * that it's safe to take the lock in the irq-disabled
+ * section without disabling hardirqs.
+ *
+ * On !CONFIG_LOCKDEP they are equivalent to the normal
+ * irq disable/enable methods.
+ */
+static inline void disable_irq_nosync_lockdep(unsigned int irq)
+{
+	disable_irq_nosync(irq);
+	local_irq_disable();
+}
+
+static inline void disable_irq_nosync_lockdep_irqsave(unsigned int irq, unsigned long *flags)
+{
+	disable_irq_nosync(irq);
+	local_irq_save(*flags);
+}
+
+static inline void disable_irq_lockdep(unsigned int irq)
+{
+	disable_irq(irq);
+	local_irq_disable();
+}
+
+static inline void enable_irq_lockdep(unsigned int irq)
+{
+	local_irq_enable();
+	enable_irq(irq);
+}
+
+static inline void enable_irq_lockdep_irqrestore(unsigned int irq, unsigned long *flags)
+{
+	local_irq_restore(*flags);
+	enable_irq(irq);
+}
+#endif /* CONFIG_LOCKDEP */
+
 #endif /* _ASM_IRQ_H_ */
diff -urp ../kernels/linux-2.6.18-rc5-mm1/include/linux/interrupt.h linux-2.6.18-rc5-mm1-frv/include/linux/interrupt.h
--- ../kernels/linux-2.6.18-rc5-mm1/include/linux/interrupt.h	2006-09-04 18:03:31.0 +0100
+++ linux-2.6.18-rc5-mm1-frv/include/linux/interrupt.h	2006-09-05 15:58:53.0 +0100
@@ -178,6 +178,8 @@ static inline int disable_irq_wake(unsig
 # define disable_irq_nosync_lockdep(irq)	disable_irq_nosync(irq)
 # define disable_irq_lockdep(irq)		disable_irq(irq)
 # define enable_irq_lockdep(irq)		enable_irq(irq)
+# define disable_irq_nosync_lockdep_irqsave(irq, flags) disable_irq_nosync(irq)
+# define enable_irq_lockdep_irqrestore(irq, flags) enable_irq(irq)
 # endif
 #endif /* CONFIG_GENERIC_HARDIRQS */
[PATCH] FRV: do_gettimeofday() should no longer use tickadj
Stop do_gettimeofday() on FRV from using tickadj, and model it after ARM instead.

This patch also provides a placeholder macro for getting hardware timer data, to be filled in when such is available.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---
warthog>diffstat -p1 frv-tickadj-2618rc5mm1.diff
 arch/frv/kernel/time.c | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff -urp ../kernels/linux-2.6.18-rc5-mm1/arch/frv/kernel/time.c linux-2.6.18-rc5-mm1-frv/arch/frv/kernel/time.c
--- ../kernels/linux-2.6.18-rc5-mm1/arch/frv/kernel/time.c	2006-09-04 18:03:14.0 +0100
+++ linux-2.6.18-rc5-mm1-frv/arch/frv/kernel/time.c	2006-09-05 15:44:42.0 +0100
@@ -31,6 +31,9 @@
 
 #define TICK_SIZE (tick_nsec / 1000)
 
+/* H/W clock data if we can get it (in microseconds) */
+#define FRV_HW_CLOCK_DATA (0)
+
 unsigned long __nongprelbss __clkin_clock_speed_HZ;
 unsigned long __nongprelbss __ext_bus_clock_speed_HZ;
 unsigned long __nongprelbss __res_bus_clock_speed_HZ;
@@ -148,23 +151,10 @@ void do_gettimeofday(struct timeval *tv)
 {
 	unsigned long seq;
 	unsigned long usec, sec;
-	unsigned long max_ntp_tick;
 
 	do {
 		seq = read_seqbegin(&xtime_lock);
-
-		usec = 0;
-
-		/*
-		 * If time_adjust is negative then NTP is slowing the clock
-		 * so make sure not to go into next possible interval.
-		 * Better to lose some accuracy than have time go backwards..
-		 */
-		if (unlikely(time_adjust < 0)) {
-			max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj;
-			usec = min(usec, max_ntp_tick);
-		}
-
+		usec = FRV_HW_CLOCK_DATA;
 		sec = xtime.tv_sec;
 		usec += (xtime.tv_nsec / 1000);
 	} while (read_seqretry(&xtime_lock, seq));
@@ -195,7 +185,7 @@ int do_settimeofday(struct timespec *tv)
 	 * wall time. Discover what correction gettimeofday() would have
 	 * made, and then undo it!
 	 */
-	nsec -= 0 * NSEC_PER_USEC;
+	nsec -= FRV_HW_CLOCK_DATA * NSEC_PER_USEC;
 
 	wtm_sec = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
 	wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
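The read loop in do_gettimeofday() follows the seqlock reader pattern: sample the sequence count, read the time, and retry if a writer was active (odd count) or the count changed meanwhile. A minimal userspace sketch with C11 atomics, assuming a single writer (the kernel additionally serialises writers through the lock itself). struct clock_state, clock_write() and clock_gettimeofday() are hypothetical names, hw_usec plays the role of FRV_HW_CLOCK_DATA, and the final normalisation is included because adding a hardware offset can push usec past one second.

```c
#include <stdatomic.h>

#define USEC_PER_SEC 1000000L

/* Hypothetical stand-in for xtime guarded by xtime_lock. */
struct clock_state {
	atomic_uint seq;	/* odd while a writer is mid-update */
	atomic_long sec;
	atomic_long nsec;
};

/* Single writer: bump seq to odd, update, bump back to even. */
static void clock_write(struct clock_state *c, long sec, long nsec)
{
	unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);

	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader: retry until a consistent snapshot is seen, then normalise. */
static void clock_gettimeofday(struct clock_state *c, long hw_usec,
			       long *sec, long *usec)
{
	unsigned s1, s2;

	do {
		s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
		*usec = hw_usec;
		*sec = atomic_load_explicit(&c->sec, memory_order_relaxed);
		*usec += atomic_load_explicit(&c->nsec, memory_order_relaxed) / 1000;
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	while (*usec >= USEC_PER_SEC) {
		*usec -= USEC_PER_SEC;
		(*sec)++;
	}
}
```

With the time set to 100 s + 999999500 ns, a 1 us hardware offset rolls over to 101 s and 0 us, while an offset of 0 gives 100 s and 999999 us.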
[PATCH] NOMMU: Provide page_mkclean() for NOMMU
Provide a page_mkclean() implementation for NOMMU. This doesn't do anything except return successfully, as there are no PTEs for it to play with.

This is only relevant to the -mm kernels.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---
warthog>diffstat -p1 nommu-page_mkclean-2618rc5mm1.diff
 include/linux/rmap.h | 6 ++
 1 file changed, 6 insertions(+)

diff -urp ../kernels/linux-2.6.18-rc5-mm1/include/linux/rmap.h linux-2.6.18-rc5-mm1-frv/include/linux/rmap.h
--- ../kernels/linux-2.6.18-rc5-mm1/include/linux/rmap.h	2006-09-04 18:03:32.0 +0100
+++ linux-2.6.18-rc5-mm1-frv/include/linux/rmap.h	2006-09-05 15:34:35.0 +0100
@@ -120,6 +120,12 @@ int page_mkclean(struct page *);
 #define page_referenced(page,l) TestClearPageReferenced(page)
 #define try_to_unmap(page, refs) SWAP_FAIL
 
+static inline int page_mkclean(struct page *page)
+{
+	return 0;
+}
+
+
 #endif	/* CONFIG_MMU */
 
 /*
[PATCH] NOMMU: Make lib/ioremap.c conditional
Make lib/ioremap.c conditional on CONFIG_MMU, so that it is not built for NOMMU. It plays with PTEs, which don't exist under NOMMU conditions.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---
warthog>diffstat -p1 nommu-ioremap-2618rc5mm1.diff
 lib/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -urp ../kernels/linux-2.6.18-rc5-mm1/lib/Makefile linux-2.6.18-rc5-mm1-frv/lib/Makefile
--- ../kernels/linux-2.6.18-rc5-mm1/lib/Makefile	2006-09-04 18:03:32.0 +0100
+++ linux-2.6.18-rc5-mm1-frv/lib/Makefile	2006-09-05 16:01:38.0 +0100
@@ -5,8 +5,9 @@
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
 	 idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
-	 sha1.o ioremap.o
+	 sha1.o
 
+lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
 
 lib-y += kobject.o kref.o kobject_uevent.o klist.o
[PATCH] NOMMU: Move the fallback arch_vma_name() to a sensible place
Move the fallback arch_vma_name() to a sensible place (kernel/signal.c). Currently it's in fs/proc/task_mmu.c, a file that is only built when both CONFIG_PROC_FS and CONFIG_MMU are enabled, but it's used from kernel/signal.c, where it is called unconditionally.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---
warthog>diffstat -p1 nommu-arch_vma_name-2618rc5mm1.diff
 fs/proc/task_mmu.c | 5 -
 kernel/signal.c    | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff -urp ../kernels/linux-2.6.18-rc5-mm1/fs/proc/task_mmu.c linux-2.6.18-rc5-mm1-frv/fs/proc/task_mmu.c
--- ../kernels/linux-2.6.18-rc5-mm1/fs/proc/task_mmu.c	2006-09-04 18:02:43.0 +0100
+++ linux-2.6.18-rc5-mm1-frv/fs/proc/task_mmu.c	2006-09-05 15:49:18.0 +0100
@@ -122,11 +122,6 @@ struct mem_size_stats
 	unsigned long private_dirty;
 };
 
-__attribute__((weak)) const char *arch_vma_name(struct vm_area_struct *vma)
-{
-	return NULL;
-}
-
 static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats *mss)
 {
 	struct proc_maps_private *priv = m->private;
diff -urp ../kernels/linux-2.6.18-rc5-mm1/kernel/signal.c linux-2.6.18-rc5-mm1-frv/kernel/signal.c
--- ../kernels/linux-2.6.18-rc5-mm1/kernel/signal.c	2006-09-04 18:03:32.0 +0100
+++ linux-2.6.18-rc5-mm1-frv/kernel/signal.c	2006-09-05 15:49:19.0 +0100
@@ -773,6 +773,11 @@ static void pad_len_spaces(int len)
 	printk("%*c", len, ' ');
 }
 
+__attribute__((weak)) const char *arch_vma_name(struct vm_area_struct *vma)
+{
+	return NULL;
+}
+
 static int print_vma(struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = vma->vm_mm;
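The fallback works because it is a weak symbol: an architecture that supplies its own (strong) arch_vma_name() wins at link time, and every other build gets the generic version returning NULL. Since kernel/signal.c calls it unconditionally, the fallback has to be compiled whenever signal.c is, which is why it moves there. A minimal single-file sketch for GCC or Clang; vma_label() and the "[anon]" string are hypothetical and only show how a caller can treat NULL as "no arch-specific name".

```c
#include <stddef.h>

struct vm_area_struct;	/* opaque here; the real type lives in the kernel */

/* Generic fallback: overridden if another object file provides a
 * non-weak definition of the same symbol. */
__attribute__((weak)) const char *arch_vma_name(struct vm_area_struct *vma)
{
	(void)vma;
	return NULL;
}

/* Hypothetical caller: prefer the arch name, else a generic label. */
static const char *vma_label(struct vm_area_struct *vma)
{
	const char *name = arch_vma_name(vma);

	return name ? name : "[anon]";
}
```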
Re: [RFC] network namespaces
For HPC, if you are interested in migration you need a separate IP per container. If you can take your IP address with you, migration of networking state is simple. If you can't take your IP address with you, a network container is nearly pointless from a migration perspective.

Eric, please, I know... I showed you a migration demo at OLS ;)

Beyond that, from everything I have seen, layer 2 is just much cleaner than any layer 3 approach short of Serge's bind filtering.

Beyond that, I have yet to see clean semantics for anything resembling your layer 2 / layer 3 hybrid approach. If we can't have clear semantics, it is by definition impossible to implement correctly, because no one understands what it is supposed to do.

Note: a true layer 3 approach has no impact on TCP/UDP filtering because it filters at bind time, not at packet reception time. Once you start inspecting packets, I don't see what the gain is from not going all of the way to layer 2.

The bsdjail was just for information ...

Daniel
Re: [RFC] network namespaces
Yes, performance is probably one issue. My concern was about layer 2 / layer 3 virtualization. I agree that layer 2 isolation/virtualization is the best for the system container. But there is another family of containers, called application containers: it is not a whole system that runs inside the container, only the application. If you want to run an Oracle database inside a container, you can run it inside an application container without launching init and all the services. This family of containers is also used for HPC (high performance computing) and for distributed checkpoint/restart. The cluster runs hundreds of jobs, spawning them on different hosts inside application containers. Usually the jobs communicate with broadcast and multicast. Application containers do not care about having different MAC addresses and rely on a layer 3 approach.

Are application containers comfortable with layer 2 virtualization? I don't think so, because several jobs running inside the same host communicate via broadcast/multicast between themselves and with other jobs running on different hosts. IP consumption is a problem too: 1 container == 2 IPs (one for the root namespace, one for the container), multiplied by the number of jobs. Furthermore, lots of jobs == lots of virtual devices.

However, after a discussion with Kirill at OLS, it appears we can merge the layer 2 and layer 3 approaches if the level of network virtualization is tunable and we can choose layer 2 or layer 3 when doing the unshare. The namespace for incoming traffic can be determined with a specific iptables module as a first step. While looking at the network namespace patches, it appears that the TCP/UDP part is **very** similar to what is needed for a layer 3 approach.

Any thoughts?

My humble opinion is that your approach doesn't intersect with this one, so we can freely go with both *if needed*, and hear the comments from the network gurus on what to improve and how. So I suggest you at least send the patches, so we can discuss them.

Thanks,
Kirill
Re: e1000 Detected Tx Unit Hang
On 9/3/06, Paul Aviles [EMAIL PROTECTED] wrote:
Hey Jesse, thanks for your reply. Here is the stuff from /proc. The weird part is that I have several other identical systems and only one is affected. Today I moved the hard drive to another similar system and I am not seeing the problem, so I am wondering if something is wrong with the card's EEPROM? Is there a way to check that?

no problem,

I doubt it is an EEPROM problem. You can dump the EEPROMs with ethtool -e eth0 on both machines and compare them. Odd that only one system is having the problem. Could it be that the hardware on that box is having issues? Are you sure the machines are running the same BIOS version with the same settings? Any overclocking?

cat /proc/interrupts
           CPU0       CPU1
 16:      70540          0   IO-APIC-level  uhci_hcd:usb4, eth0

This could contribute to your problem. Were you able to test without NAPI?

Jesse
Re: sky2 problem on powerpc
On Tue, 05 Sep 2006 13:47:52 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

It may not need any swapping; it is hard to tell what the hardware will do without experimentation.

Yes... did you have a chance to test the vlan stuff on LE machines (x86)? Did it work with the BE swapping you were doing? I've purposely removed the hardware-side swapping of the descriptors in my patches, as I explained, thus making the hardware react the same on ppc and x86 (thus we need the exact same swapping macros on both platforms).

Last time I checked it worked. Private cable simulating VLAN from other Linux card.

I know pretty much nothing about vlan, so I'm not too keen on trying to check that right now :) Also, there is still the hw checksum issue; I need to verify what's up there. It might be a swapping problem as well... or not.

Can you send me your latest patch set so I can work from there?

Cheers,
Ben
Re: [RFC] network namespaces
On Tue, Sep 05, 2006 at 08:45:39AM -0600, Eric W. Biederman wrote: Daniel Lezcano [EMAIL PROTECTED] writes: 2. People expressed concerns that complete separation of namespaces may introduce an undesired overhead in certain usage scenarios. The overhead comes from packets traversing the input path, then the output path, then the input path again in the destination namespace if the root namespace acts as a router. Yes, performance is probably one issue. My concern was about layer 2 vs. layer 3 virtualization. I agree that layer 2 isolation/virtualization is best for the system container. But there is another family of containers, called application containers: it is not a whole system that runs inside the container, only the application. If you want to run an Oracle database inside a container, you can run it inside an application container without launching init and all the services. This family of containers is also used for HPC (high performance computing) and for distributed checkpoint/restart. The cluster runs hundreds of jobs, spawning them on different hosts inside application containers. Usually the jobs communicate via broadcast and multicast. Application containers do not care about having different MAC addresses and rely on a layer 3 approach. Are application containers comfortable with layer 2 virtualization? I don't think so, because several jobs running on the same host communicate via broadcast/multicast among themselves and with other jobs running on different hosts. IP consumption is a problem too: 1 container == 2 IPs (one for the root namespace, one for the container), multiplied by the number of jobs. Furthermore, lots of jobs == lots of virtual devices. However, after a discussion with Kirill at OLS, it appears we can merge the layer 2 and layer 3 approaches if the level of network virtualization is tunable and we can choose layer 2 or layer 3 when doing the unshare.
The namespace for incoming traffic can be determined with a specific iptables module as a first step. While looking at the network namespace patches, it appears that the TCP/UDP part is **very** similar to what is needed for a layer 3 approach. Any thoughts? For HPC, if you are interested in migration you need a separate IP per container. If you can take your IP address with you, migration of networking state is simple. If you can't take your IP address with you, a network container is nearly pointless from a migration perspective. Beyond that, from everything I have seen, layer 2 is just much cleaner than any layer 3 approach short of Serge's bind filtering. Well, the 'ip subset' approach that Linux-VServer and other jail solutions use is very clean; it just does not match your expectations of a virtual interface (as there is none), and it does not cope well with all kinds of per-context 'requirements', which IMHO do not really exist at the application layer (only at the whole-system layer). Beyond that, I have yet to see clean semantics for anything resembling your layer 2/layer 3 hybrid approach. If we can't have clear semantics, it is by definition impossible to implement correctly, because no one understands what it is supposed to do. IMHO that would be quite simple: have one 'namespace' for limiting port binds to a subset of the available IPs and another one which does complete network virtualization with all the bells and whistles. IMHO most of them are orthogonal and can easily be combined: full network virtualization, lightweight IP subset, or both. Note: a true layer 3 approach has no impact on TCP/UDP filtering because it filters at bind time, not at packet reception time. Once you start inspecting packets, I don't see what the gain is from not going all the way to layer 2. IMHO this requirement only arises from the full system virtualization approach; just look at the other jail solutions (Solaris, BSD, ...)
some of them do not even allow more than a single IP, but they work quite well when used properly ... best, Herbert Eric
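To make the 'ip subset' (layer 3) idea concrete: the check runs once, at bind() time, against the set of addresses the context may use, and nothing is inspected per packet. The sketch below is a hypothetical userspace model; the ip_ctx struct and ctx_may_bind function are invented for illustration and are not Linux-VServer's actual code. It assumes IPv4 addresses in host byte order and treats 0 (INADDR_ANY) as "all of this context's addresses".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_MAX_ADDRS 8

/* Hypothetical per-context state: the subset of host IPv4 addresses
 * (host byte order) that sockets in this context may bind to. */
struct ip_ctx {
    uint32_t addrs[CTX_MAX_ADDRS];
    size_t   naddrs;
};

/* Hypothetical bind-time check: true if a socket in 'ctx' may bind to
 * 'addr'. INADDR_ANY (0) is accepted if the context owns at least one
 * address; the stack would then narrow it to the context's subset. */
bool ctx_may_bind(const struct ip_ctx *ctx, uint32_t addr)
{
    if (addr == 0)
        return ctx->naddrs > 0;

    for (size_t i = 0; i < ctx->naddrs; i++)
        if (ctx->addrs[i] == addr)
            return true;
    return false;
}
```

Because the decision is made only when a socket is bound, the per-packet receive path is untouched, which is the performance argument made for the layer 3 approach in this thread.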
[PATCH] bridge: random extra bytes on STP TCN packet
We seem to send 3 extra bytes in a TCN BPDU, which will be whatever happens to be on the stack. Thanks to [EMAIL PROTECTED] for spotting this. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] diff -Naur a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c --- a/net/bridge/br_stp_bpdu.c 2006-09-03 23:40:08.0 +0530 +++ b/net/bridge/br_stp_bpdu.c 2006-09-03 23:40:33.0 +0530 @@ -121,7 +121,7 @@ buf[1] = 0; buf[2] = 0; buf[3] = BPDU_TYPE_TCN; - br_send_bpdu(p, buf, 7); + br_send_bpdu(p, buf, 4); } /*
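For reference, a Topology Change Notification BPDU carries only four bytes of BPDU payload: a two-byte protocol identifier (0), a one-byte protocol version (0) and a one-byte BPDU type (0x80 for TCN in IEEE 802.1D). That is why the length passed to br_send_bpdu() should be 4. A minimal userspace sketch of building that payload; build_tcn is a hypothetical helper, not the bridge code itself.

```c
#include <stddef.h>

#define BPDU_TYPE_TCN 0x80  /* IEEE 802.1D Topology Change Notification */
#define TCN_BPDU_LEN  4     /* protocol id (2) + version (1) + type (1) */

/* Hypothetical helper: fill 'buf' with a TCN BPDU payload and return the
 * number of bytes that should actually be transmitted. Bytes past
 * TCN_BPDU_LEN are never sent, so stale stack data cannot leak out. */
size_t build_tcn(unsigned char *buf)
{
    buf[0] = 0;              /* protocol identifier, high byte */
    buf[1] = 0;              /* protocol identifier, low byte  */
    buf[2] = 0;              /* protocol version               */
    buf[3] = BPDU_TYPE_TCN;  /* BPDU type                      */
    return TCN_BPDU_LEN;
}
```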
Re: [PATCH 2.6.18] WE-21 support (core API)
On Mon, Sep 04, 2006 at 10:35:09AM +0200, Johannes Berg wrote: Uh, please don't strip me from the CC list :) WE-netlink is optional. And WE-ioctl could be made optional (still on the todo list). You can also disable WE-event and WE-iwspy for further footprint reduction. The real question is: why does removing WE-event reduce footprint? I guess the answer is that there's a lot of non-generic code needed to pack/unpack all the data, which is not really something you want. Wrong answer. wireless.c has about 2.3k lines of code. But airo.c, for example, contains another 15 lines of code just for the trivial *parameter checking* in airo_set_essid. This is duplicated all over. Did it never occur to you that things like /* Check the size of the string */ if(dwrq->length > IW_ESSID_MAX_SIZE+1) { return -E2BIG ; } can be checked generically? Maybe you're actually checking this generically. But if I did it your way, I'd copy and paste this all over... It is actually checked generically; that's the whole point of the code in wireless.c. But driver authors don't trust generic checks. It was designed this way on purpose, because you get a low footprint and very good scalability. Wtf does scalability have to do with it? Footprint scalability. johannes Jean
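To illustrate what "checked generically" means here: the core keeps a per-request descriptor with the maximum payload size and rejects oversized requests once, before any driver handler runs, so a driver like airo.c would not need its own copy of the check. The sketch below is hypothetical; iw_req_desc, iw_descs and iw_check_len are invented names, loosely inspired by the descriptor-table idea rather than wireless.c's actual tables. IW_ESSID_MAX_SIZE is 32, as in wireless.h.

```c
#include <errno.h>
#include <stddef.h>

#define IW_ESSID_MAX_SIZE 32  /* as defined in <linux/wireless.h> */

/* Hypothetical descriptor: one entry per request type, giving the largest
 * payload (in bytes) the core will pass down to a driver. */
struct iw_req_desc {
    const char *name;
    size_t      max_len;
};

enum { REQ_SET_ESSID, REQ_COUNT };

static const struct iw_req_desc iw_descs[REQ_COUNT] = {
    [REQ_SET_ESSID] = { "SIOCSIWESSID", IW_ESSID_MAX_SIZE + 1 },
};

/* Hypothetical generic check done once in the core. Returns 0 or a
 * negative errno, mirroring the per-driver -E2BIG test quoted above. */
int iw_check_len(int req, size_t length)
{
    if (req < 0 || req >= REQ_COUNT)
        return -EINVAL;
    if (length > iw_descs[req].max_len)
        return -E2BIG;
    return 0;
}
```

Adding a new request type then costs one table entry instead of another copy of the size check in every driver, which is the footprint argument Jean is making.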
Re: [RFC] network namespaces
This family of containers is also used for HPC (high performance computing) and for distributed checkpoint/restart. The cluster runs hundreds of jobs, spawning them on different hosts inside application containers. Usually the jobs communicate via broadcast and multicast. Application containers do not care about having different MAC addresses and rely on a layer 3 approach. OK, I think to understand this we need some precise definitions. In the normal case it is an error for a job to communicate with a different job. The basic advantage of a different MAC is that you can find out who the intended recipient is sooner in the networking stack, and you have truly separate network devices, allowing for a cleaner implementation. Changing the MAC after migration is likely to be fine. Are application containers comfortable with layer 2 virtualization? I don't think so, because several jobs running on the same host communicate via broadcast/multicast among themselves and with other jobs running on different hosts. IP consumption is a problem too: 1 container == 2 IPs (one for the root namespace, one for the container), multiplied by the number of jobs. Furthermore, lots of jobs == lots of virtual devices. First, if you hook your network namespaces together with Ethernet bridging, you don't need any extra IPs. Second, I don't see the conflict you perceive between application containers and layer 2 containment. The bottom line is that you need at least one loopback interface per non-trivial network namespace. Once you have that, having a virtual device is no big deal. In addition, network devices consume no more memory than a process, so lots of network devices should not be a problem. Eric
Re: [PATCH][RFC] Re: high latency with TCP connections
Alexey Kuznetsov wrote: Hello! Some people reported that this program runs in 9.997 sec when run on FreeBSD. Try the enclosed patch. I have no idea why 9.997 sec is so magic, but I get exactly this number on my notebook. :-) Alexey This patch enables sending an ACK for every 2nd received segment. It affects neither MSS-sized connections (obviously) nor connections controlled by Nagle (because there is only one small segment in flight). The idea is to record the fact that a small segment arrived on a connection where one small segment has already been received and is still not ACKed. In this case an ACK is forced after tcp_recvmsg() drains the receive buffer. In other words, it is a soft every-2nd-segment ACK, which is enough to preserve the ACK clock even when ABC is enabled. Is this really necessary? I thought that the problems with ABC were in trying to apply byte-based heuristics from the RFC(s) to a packet-oriented cwnd in the stack? rick jones
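A rough model of the rule described in the patch: remember that one small (sub-MSS) segment is waiting un-ACKed; when a second small segment arrives while that flag is set, mark an ACK as owed, and send it once the reader has drained the receive buffer (the tcp_recvmsg() point in the patch). The rx_state struct and both function names are hypothetical; this is a userspace sketch of the idea, not the kernel's actual fields or code paths.

```c
#include <stdbool.h>

/* Hypothetical per-connection receive state. */
struct rx_state {
    unsigned mss;
    bool     small_pending;  /* one small segment received, not yet ACKed */
    bool     ack_forced;     /* a second small segment arrived: ACK owed  */
};

/* Hypothetical hook, called for every in-order data segment of 'len' bytes. */
void on_segment(struct rx_state *s, unsigned len)
{
    if (len >= s->mss)
        return;              /* full-sized segments follow the normal ACK rules */
    if (s->small_pending)
        s->ack_forced = true;
    else
        s->small_pending = true;
}

/* Hypothetical hook, called after the application drained the receive
 * buffer. Returns true if an ACK should be sent now. */
bool after_recvmsg_drain(struct rx_state *s)
{
    if (!s->ack_forced)
        return false;
    s->ack_forced = false;
    s->small_pending = false;  /* the ACK covers both small segments */
    return true;
}
```

In a fuller model, any ACK sent for other reasons would also clear small_pending; that detail is left out here.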
Re: [RFC] network namespaces
Herbert Poetzl [EMAIL PROTECTED] writes: On Tue, Sep 05, 2006 at 08:45:39AM -0600, Eric W. Biederman wrote: Daniel Lezcano [EMAIL PROTECTED] writes: For HPC, if you are interested in migration you need a separate IP per container. If you can take your IP address with you, migration of networking state is simple. If you can't take your IP address with you, a network container is nearly pointless from a migration perspective. Beyond that, from everything I have seen, layer 2 is just much cleaner than any layer 3 approach short of Serge's bind filtering. Well, the 'ip subset' approach that Linux-VServer and other jail solutions use is very clean; it just does not match your expectations of a virtual interface (as there is none), and it does not cope well with all kinds of per-context 'requirements', which IMHO do not really exist at the application layer (only at the whole-system layer). I probably expressed that wrong. There are currently three basic approaches under discussion. Layer 3: basically bind filtering, nothing at the packet level; the approach taken by Serge's version of bsdjails and by VServer. Layer 2.5: what Daniel proposed. Layer 2: trivially mapping each packet to a different interface, and then treating everything as multiple instances of the network stack; roughly what OpenVZ and I have implemented. You can get into some weird complications at layer 3, but because it doesn't touch each packet, the proof that it is fast is trivial. Beyond that, I have yet to see clean semantics for anything resembling your layer 2/layer 3 hybrid approach. If we can't have clear semantics, it is by definition impossible to implement correctly, because no one understands what it is supposed to do.
IMHO that would be quite simple: have one 'namespace' for limiting port binds to a subset of the available IPs and another one which does complete network virtualization with all the bells and whistles. IMHO most of them are orthogonal and can easily be combined: full network virtualization, lightweight IP subset, or both. Quite possibly. The LSM will stay for a while, so we do have a clean way to restrict port binds. Note: a true layer 3 approach has no impact on TCP/UDP filtering because it filters at bind time, not at packet reception time. Once you start inspecting packets, I don't see what the gain is from not going all the way to layer 2. IMHO this requirement only arises from the full system virtualization approach; just look at the other jail solutions (Solaris, BSD, ...): some of them do not even allow more than a single IP, but they work quite well when used properly ... Yes they do. Currently I am strongly opposed to Daniel's layer 2.5 approach, as I see no redeeming value in it. A good clean layer 3 approach I avoid only because I think we can do better. Eric
[patch] d80211: fix multiple device ap support
Another fix to the interpretation of the dev_alloc_name() return value: dev_alloc_name() returns the number of the unit assigned, or a negative errno code. Signed-off-by: David Kimdon [EMAIL PROTECTED] Index: linux-2.6.16/net/d80211/ieee80211_iface.c === --- linux-2.6.16.orig/net/d80211/ieee80211_iface.c +++ linux-2.6.16/net/d80211/ieee80211_iface.c @@ -122,7 +122,7 @@ int ieee80211_if_add_mgmt(struct net_dev if (!ndev) return -ENOMEM; ret = dev_alloc_name(ndev, "wmgmt%d"); - if (ret) + if (ret < 0) goto fail; ndev->ieee80211_ptr = local; --
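The point of the fix is that dev_alloc_name() returns the unit number it picked (0, 1, 2, ...) on success, so only a negative value means failure; testing if (ret) treats every unit after the first as an error, which breaks a second wmgmt device. A small userspace model of that convention, where fake_dev_alloc_name and add_iface are hypothetical stand-ins, not the kernel functions:

```c
#include <errno.h>
#include <stdio.h>

#define MAX_UNITS 4

static int used[MAX_UNITS];

/* Hypothetical stand-in for dev_alloc_name(): picks the lowest free unit,
 * formats the name into 'buf', and returns the unit number, or a negative
 * errno (here -ENFILE) when no unit is free. */
int fake_dev_alloc_name(char *buf, size_t len, const char *fmt)
{
    for (int unit = 0; unit < MAX_UNITS; unit++) {
        if (!used[unit]) {
            used[unit] = 1;
            snprintf(buf, len, fmt, unit);
            return unit;
        }
    }
    return -ENFILE;
}

/* Hypothetical caller using the corrected pattern: only ret < 0 is an error. */
int add_iface(char *buf, size_t len)
{
    int ret = fake_dev_alloc_name(buf, len, "wmgmt%d");
    if (ret < 0)
        return ret;
    return 0;
}
```

With the old if (ret) test, the second call (unit 1) would have taken the failure path.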
Re: [2.6.19 PATCH 1/7] ehea: interface to network stack
Thomas Klein [EMAIL PROTECTED] : [...] Somehow I don't get your point concerning the usage of 'k'. We need another iterator as the for loops using 'k' use 'i' as their terminating condition. Something like the code below perhaps (with more local variables maybe): static int ehea_reg_interrupts(struct net_device *dev) { struct ehea_port *port = netdev_priv(dev); struct ehea_port_res *pr; int i, ret; for (i = 0; i < port->num_def_qps; i++) { pr = &port->port_res[i]; snprintf(pr->int_recv_name, EHEA_IRQ_NAME_SIZE - 1, "%s-recv%d", dev->name, i); ret = ibmebus_request_irq(NULL, pr->recv_eq->attr.ist1, ehea_recv_irq_handler, SA_INTERRUPT, pr->int_recv_name, pr); if (ret) { ehea_error("failed registering irq for ehea_recv_int: port_res_nr:%d, ist=%X", i, pr->recv_eq->attr.ist1); goto err_free_irq_recv_eq_0; } if (netif_msg_ifup(port)) ehea_info("irq_handle 0x%X for funct ehea_recv_int %d registered", pr->recv_eq->attr.ist1, i); } snprintf(port->int_aff_name, EHEA_IRQ_NAME_SIZE - 1, "%s-aff", dev->name); ret = ibmebus_request_irq(NULL, port->qp_eq->attr.ist1, ehea_qp_aff_irq_handler, SA_INTERRUPT, port->int_aff_name, port); if (ret) { ehea_error("failed registering irq for qp_aff_irq_handler: ist=%X", port->qp_eq->attr.ist1); goto err_free_irq_recv_eq_0; } if (netif_msg_ifup(port)) ehea_info("irq_handle 0x%X for function qp_aff_irq_handler registered", port->qp_eq->attr.ist1); for (i = 0; i < port->num_def_qps + port->num_add_tx_qps; i++) { pr = &port->port_res[i]; snprintf(pr->int_send_name, EHEA_IRQ_NAME_SIZE - 1, "%s-send%d", dev->name, i); ret = ibmebus_request_irq(NULL, pr->send_eq->attr.ist1, ehea_send_irq_handler, SA_INTERRUPT, pr->int_send_name, pr); if (ret) { ehea_error("failed registering irq for ehea_send port_res_nr:%d, ist=%X", i, pr->send_eq->attr.ist1); goto err_free_irq_send_eq_1; } if (netif_msg_ifup(port)) ehea_info("irq_handle 0x%X for function ehea_send_int %d registered", pr->send_eq->attr.ist1, i); } out: return ret; err_free_irq_send_eq_1: // Post-dec works with unsigned int too.
while (i-- > 0) { u32 ist = port->port_res[i].send_eq->attr.ist1; ibmebus_free_irq(NULL, ist, &port->port_res[i]); } ibmebus_free_irq(NULL, port->qp_eq->attr.ist1, port); i = port->num_def_qps; err_free_irq_recv_eq_0: while (i-- > 0) { u32 ist = port->port_res[i].recv_eq->attr.ist1; ibmebus_free_irq(NULL, ist, &port->port_res[i]); } goto out; }
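The unwind idiom in ehea_reg_interrupts() frees exactly the IRQs that were successfully requested: on failure at index i, while (i-- > 0) visits i-1 down to 0 and then stops, even with an unsigned counter. A self-contained userspace model of the same pattern; request(), release() and the fail_at injection point are hypothetical stand-ins for ibmebus_request_irq()/ibmebus_free_irq().

```c
#include <stdbool.h>

#define NRES 4

static bool held[NRES];
static int fail_at = -1;    /* hypothetical: index whose request fails; -1 = never */

/* Hypothetical stand-in for ibmebus_request_irq(). */
static int request(int i)
{
    if (i == fail_at)
        return -1;
    held[i] = true;
    return 0;
}

/* Hypothetical stand-in for ibmebus_free_irq(). */
static void release(int i)
{
    held[i] = false;
}

/* Request all resources; on failure release only those already acquired. */
int request_all(void)
{
    unsigned int i;   /* unsigned on purpose: i-- > 0 still terminates */
    int ret = 0;

    for (i = 0; i < NRES; i++) {
        ret = request((int)i);
        if (ret)
            goto err_unwind;
    }
    return 0;

err_unwind:
    while (i-- > 0)   /* frees i-1 .. 0; the failed index is never freed */
        release((int)i);
    return ret;
}
```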
[PATCH 1/2] tcp-lp: bug fix for oops in 2.6.18-rc6
Sorry, the patch submitted yesterday still contained a small bug. This version has been tested for hours with BT connections, and the oops is now difficult to reproduce. Signed-off-by: Wong Hoi Sing Edison [EMAIL PROTECTED] --- diff -urpN linux-2.6.18-rc6/net/ipv4/tcp_lp.c linux/net/ipv4/tcp_lp.c --- linux-2.6.18-rc6/net/ipv4/tcp_lp.c 2006-09-06 04:12:00.0 +0800 +++ linux/net/ipv4/tcp_lp.c 2006-09-06 04:24:07.0 +0800 @@ -3,13 +3,8 @@ * * TCP Low Priority is a distributed algorithm whose goal is to utilize only * the excess network bandwidth as compared to the ``fair share`` of - * bandwidth as targeted by TCP. Available from: - * http://www.ece.rice.edu/~akuzma/Doc/akuzma/TCP-LP.pdf + * bandwidth as targeted by TCP. * - * Original Author: - * Aleksandar Kuzmanovic [EMAIL PROTECTED] - * - * See http://www-ece.rice.edu/networks/TCP-LP/ for their implementation. * As of 2.6.13, Linux supports pluggable congestion control algorithms. * Due to the limitation of the API, we take the following changes from * the original TCP-LP implementation: @@ -24,11 +19,20 @@ * o OWD is handled in relative format, where local time stamp will in * tcp_time_stamp format.
* - * Port from 2.4.19 to 2.6.16 as module by: - * Wong Hoi Sing Edison [EMAIL PROTECTED] - * Hung Hing Lun [EMAIL PROTECTED] + * Original Author: + * Aleksandar Kuzmanovic [EMAIL PROTECTED] + * Available from: + * http://www.ece.rice.edu/~akuzma/Doc/akuzma/TCP-LP.pdf + * Original implementation for 2.4.19: + * http://www-ece.rice.edu/networks/TCP-LP/ + * + * 2.6.x module Authors: + * Wong Hoi Sing, Edison [EMAIL PROTECTED] + * Hung Hing Lun, Mike [EMAIL PROTECTED] + * SourceForge project page: + * http://tcp-lp-mod.sourceforge.net/ * - * Version: $Id: tcp_lp.c,v 1.22 2006-05-02 18:18:19 hswong3i Exp $ + * Version: $Id: tcp_lp.c,v 1.24 2006/09/05 20:22:53 hswong3i Exp $ */ #include <linux/config.h> @@ -153,16 +157,19 @@ static u32 tcp_lp_remote_hz_estimator(st if (m < 0) m = -m; - if (rhz != 0) { + if (rhz > 0) { m -= rhz >> 6; /* m is now error in remote HZ est */ rhz += m; /* 63/64 old + 1/64 new */ } else rhz = m << 6; + out: /* record time for successful remote HZ calc */ - lp->flag |= LP_VALID_RHZ; + if (rhz > 0) + lp->flag |= LP_VALID_RHZ; + else + lp->flag &= ~LP_VALID_RHZ; - out: /* record reference time stamp */ lp->remote_ref_time = tp->rx_opt.rcv_tsval; lp->local_ref_time = tp->rx_opt.rcv_tsecr; @@ -333,6 +340,6 @@ static void __exit tcp_lp_unregister(voi module_init(tcp_lp_register); module_exit(tcp_lp_unregister); -MODULE_AUTHOR("Wong Hoi Sing Edison, Hung Hing Lun"); +MODULE_AUTHOR("Wong Hoi Sing Edison, Hung Hing Lun Mike"); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("TCP Low Priority");
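The estimator's fixed-point arithmetic can be checked in isolation. rhz holds the remote-HZ estimate scaled by 64; m -= rhz >> 6 turns the new sample into an error term, and rhz += m then gives 64*est + (sample - est) = 63*est + sample, i.e. the new estimate is 63/64 of the old one plus 1/64 of the sample, matching the comment. A minimal userspace sketch (rhz_update is our own name, not a tcp_lp.c function), assuming non-negative samples as guaranteed by the m = -m step:

```c
#include <stdint.h>

/* Update a remote-HZ estimate kept in fixed point (value << 6) with one
 * new non-negative sample. Returns the estimate in plain units. */
int64_t rhz_update(int64_t *rhz, int64_t sample)
{
    if (*rhz > 0) {
        int64_t m = sample - (*rhz >> 6);  /* error in current estimate */
        *rhz += m;                         /* 63/64 old + 1/64 new      */
    } else {
        *rhz = sample << 6;                /* first sample seeds it     */
    }
    return *rhz >> 6;
}
```

For example, after seeding with 1000 a sample of 1064 moves the estimate by only 64/64 = 1, to 1001. The patch's change to set LP_VALID_RHZ only when rhz is positive (and clear it otherwise) means a never-seeded estimate is no longer reported as valid.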
[PATCH 2/2] tcp-lp: update information to MAINTAINERS
Signed-off-by: Wong Hoi Sing Edison [EMAIL PROTECTED] --- diff -urpN linux-2.6.18-rc6/MAINTAINERS linux/MAINTAINERS --- linux-2.6.18-rc6/MAINTAINERS 2006-09-06 04:12:11.0 +0800 +++ linux/MAINTAINERS 2006-09-06 04:19:08.0 +0800 @@ -2818,6 +2818,14 @@ M: [EMAIL PROTECTED] L: netdev@vger.kernel.org S: Maintained +TCP LOW PRIORITY MODULE +P: Wong Hoi Sing, Edison +M: [EMAIL PROTECTED] +P: Hung Hing Lun, Mike +M: [EMAIL PROTECTED] +W: http://tcp-lp-mod.sourceforge.net/ +S: Maintained + TI OMAP RANDOM NUMBER GENERATOR SUPPORT P: Deepak Saxena M: [EMAIL PROTECTED]
Re: sky2 problem on powerpc
On Mon, 2006-09-04 at 21:15 -0700, Stephen Hemminger wrote: On Tue, 05 Sep 2006 13:47:52 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote: It may not need any swapping; it is hard to tell what the hardware will do without experimentation. Yes... did you have a chance to test the VLAN stuff on LE machines (x86)? Did it work with the BE swapping you were doing? In my patches I've purposely removed the hardware-side swapping of the descriptors, as I explained, thus making the hardware behave the same on ppc and x86 (so we need the exact same swapping macros on both platforms). Last time I checked it worked. Private cable simulating VLAN from other Linux card. OK, so we should probably switch the VLAN bits back to the BE swapping macros... However, we then have an inconsistency with this bit in sky2_status_intr(): #ifdef SKY2_VLAN_TAG_USED case OP_RXVLAN: sky2->rx_tag = length; break; case OP_RXCHKSVLAN: sky2->rx_tag = length; /* fall through */ #endif Here we read the length field directly without swapping (in the unpatched driver; in the patched driver, length will have gone through an LE swap). That is, from the standpoint of an LE machine, you read that value as a little-endian value, while elsewhere we manipulate sky2->rx_tag as a BE value... (this is even without my patch) Ben.
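To see why mixing conventions for one field matters: the same two descriptor bytes decode to different tags depending on whether they are read as little-endian or big-endian, so a tag that is LE-decoded in one place and treated as BE in another ends up byte-reversed on one of the two architectures. A portable sketch that decodes explicitly from bytes; get_le16/get_be16 are our own helper names (the driver would use le16_to_cpu()/be16_to_cpu()), and which order the hardware actually writes is exactly what Ben still needs to confirm.

```c
#include <stdint.h>

/* Decode a 16-bit field stored little-endian: byte 0 is the low byte. */
uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a 16-bit field stored big-endian: byte 0 is the high byte. */
uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the bytes 0x64 0x00, the LE decode gives VLAN ID 100 while the BE decode gives 0x6400 (VLAN ID 1024 in the low 12 bits). Because these helpers depend only on byte order in memory, they return the same result on x86 and ppc, which is the property the descriptor code needs.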
Re: sky2: hw checksum failures
Which means that if it worked on x86 with le16_to_cpu, it should work on powerpc... The main difference here, however, is that you called le16_to_cpu (which is basically a nop) on a 32-bit field, while I called le32_to_cpu() on it. But both should lead to the same result... (x86 will do a swapped 16-bit load of the first 2 bytes, while ppc will do a 4-byte load and swap it, thus ending up with the first 2 bytes swapped into the low-order half of the result). I'll dump the values and have a look to be sure. Another possibility would be a problem with the bits telling the chip where to calculate the checksum. Hardware only computes a 16-bit checksum. Oh, I know that, but calling 16-bit swapping macros on a 32-bit field is a bit dodgy... It might work in this case, I'll verify, but you may end up using the wrong half of the 32-bit word :) Ben.
Re: sky2: hw checksum failures
On Wed, 06 Sep 2006 07:12:43 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote: Which means that if it worked on x86 with le16_to_cpu, it should work on powerpc... The main difference here, however, is that you called le16_to_cpu (which is basically a nop) on a 32-bit field, while I called le32_to_cpu() on it. But both should lead to the same result... (x86 will do a swapped 16-bit load of the first 2 bytes, while ppc will do a 4-byte load and swap it, thus ending up with the first 2 bytes swapped into the low-order half of the result). I'll dump the values and have a look to be sure. Another possibility would be a problem with the bits telling the chip where to calculate the checksum. Hardware only computes a 16-bit checksum. Oh, I know that, but calling 16-bit swapping macros on a 32-bit field is a bit dodgy... It might work in this case, I'll verify, but you may end up using the wrong half of the 32-bit word :) Ben. Agreed. Actually the checksum value is the same in hi/lo because there are two checksum units and we ask for the same offset on both. -- Stephen Hemminger [EMAIL PROTECTED]
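The "wrong half" concern can be modeled byte by byte. The sketch below simulates, for either host byte order, a raw 32-bit load of a little-endian status word followed by either le16_to_cpu() on the truncated value or le32_to_cpu() on the whole word. All three functions are hypothetical models written for this illustration, not the kernel macros; the host order is a parameter so the result does not depend on the machine running it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model a raw 32-bit load of 'p' on a host of the given byte order. */
uint32_t raw_load32(const unsigned char *p, bool big_endian_host)
{
    if (big_endian_host)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | p[3];
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Model le16_to_cpu(): a byte swap on a BE host, a no-op on an LE host. */
uint16_t model_le16_to_cpu(uint16_t v, bool big_endian_host)
{
    return big_endian_host ? (uint16_t)((v >> 8) | (v << 8)) : v;
}

/* Model le32_to_cpu(): a full 32-bit byte swap on a BE host. */
uint32_t model_le32_to_cpu(uint32_t v, bool big_endian_host)
{
    if (!big_endian_host)
        return v;
    return (v >> 24) | ((v >> 8) & 0xff00u) |
           ((v << 8) & 0xff0000u) | (v << 24);
}
```

For the bytes 34 12 78 56, the le32 path yields 0x1234 in the low half on both hosts, while the le16-on-truncated-u32 path yields 0x1234 on LE but 0x5678 (bytes 2 and 3) on BE. When both checksum units are given the same offset, as Stephen notes, both halves hold the same value and the two paths agree anyway.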
Re: sky2 problem on powerpc
This is the reduced version of your patch, plus I got rid of the union in tx_le; it is a nuisance. --- sky2.orig/drivers/net/sky2.c 2006-09-05 13:39:34.0 -0700 +++ sky2/drivers/net/sky2.c 2006-09-05 13:57:44.0 -0700 @@ -809,7 +809,7 @@ struct sky2_rx_le *le; le = sky2_next_rx(sky2); - le->addr = (ETH_HLEN << 16) | ETH_HLEN; + le->addr = cpu_to_le32((ETH_HLEN << 16) | ETH_HLEN); le->ctrl = 0; le->opcode = OP_TCPSTART | HW_OWNER; @@ -1227,7 +1227,7 @@ /* Send high bits if changed or crosses boundary */ if (addr64 != sky2->tx_addr64 || high32(mapping + len) != sky2->tx_addr64) { le = get_tx_le(sky2); - le->tx.addr = cpu_to_le32(addr64); + le->addr = cpu_to_le32(addr64); le->ctrl = 0; le->opcode = OP_ADDR64 | HW_OWNER; sky2->tx_addr64 = high32(mapping + len); @@ -1242,8 +1242,7 @@ if (mss != sky2->tx_last_mss) { le = get_tx_le(sky2); - le->tx.tso.size = cpu_to_le16(mss); - le->tx.tso.rsvd = 0; + le->addr = cpu_to_le32(mss); le->opcode = OP_LRGLEN | HW_OWNER; le->ctrl = 0; sky2->tx_last_mss = mss; @@ -1256,7 +1255,7 @@ if (sky2->vlgrp && vlan_tx_tag_present(skb)) { if (!le) { le = get_tx_le(sky2); - le->tx.addr = 0; + le->addr = 0; le->opcode = OP_VLAN|HW_OWNER; le->ctrl = 0; } else @@ -1268,20 +1267,21 @@ /* Handle TCP checksum offload */ if (skb->ip_summed == CHECKSUM_HW) { - u16 hdr = skb->h.raw - skb->data; - u16 offset = hdr + skb->csum; + unsigned offset = skb->h.raw - skb->data; + u32 tcpsum; + + tcpsum = offset << 16; /* sum start */ + tcpsum |= offset + skb->csum; /* sum write */ ctrl = CALSUM | WR_SUM | INIT_SUM | LOCK_SUM; if (skb->nh.iph->protocol == IPPROTO_UDP) ctrl |= UDPTCP; - if (hdr != sky2->tx_csum_start || offset != sky2->tx_csum_offset) { - sky2->tx_csum_start = hdr; - sky2->tx_csum_offset = offset; + if (tcpsum != sky2->tx_tcpsum) { + sky2->tx_tcpsum = tcpsum; le = get_tx_le(sky2); - le->tx.csum.start = cpu_to_le16(hdr); - le->tx.csum.offset = cpu_to_le16(offset); + le->addr = cpu_to_le32(tcpsum); le->length = 0; /* initial checksum value */ le->ctrl = 1; /* one packet */ le->opcode = OP_TCPLISW |
HW_OWNER; @@ -1289,7 +1289,7 @@ } le = get_tx_le(sky2); - le->tx.addr = cpu_to_le32((u32) mapping); + le->addr = cpu_to_le32((u32) mapping); le->length = cpu_to_le16(len); le->ctrl = ctrl; le->opcode = mss ? (OP_LARGESEND | HW_OWNER) : (OP_PACKET | HW_OWNER); @@ -1307,14 +1307,14 @@ addr64 = high32(mapping); if (addr64 != sky2->tx_addr64) { le = get_tx_le(sky2); - le->tx.addr = cpu_to_le32(addr64); + le->addr = cpu_to_le32(addr64); le->ctrl = 0; le->opcode = OP_ADDR64 | HW_OWNER; sky2->tx_addr64 = addr64; } le = get_tx_le(sky2); - le->tx.addr = cpu_to_le32((u32) mapping); + le->addr = cpu_to_le32((u32) mapping); le->length = cpu_to_le16(frag->size); le->ctrl = ctrl; le->opcode = OP_BUFFER | HW_OWNER; @@ -1919,8 +1919,8 @@ dev = hw->dev[le->link]; sky2 = netdev_priv(dev); - length = le->length; - status = le->status; + length = le16_to_cpu(le->length); + status = le32_to_cpu(le->status); switch (le->opcode & ~HW_OWNER) { case OP_RXSTAT: @@ -1964,7 +1964,7 @@ case OP_RXCHKS: skb = sky2->rx_ring[sky2->rx_next].skb; skb->ip_summed = CHECKSUM_HW; - skb->csum = le16_to_cpu(status); + skb->csum = status & 0xffff; break; case OP_TXINDEXLE: @@ -3266,12 +3266,13 @@ hw->pm_cap = pm_cap; #ifdef __BIG_ENDIAN - /* byte swap descriptors in hardware */ + /* The sk98lin vendor driver uses hardware byte swapping but +* this driver uses software swapping. +*/ { u32 reg; - reg =
Re: sky2: hw checksum failures
Agreed. Actually the checksum value is the same in hi/lo because there are two checksum units and we ask for the same offset on both. OK, that explains the (HLEN << 16) | HLEN thing when configuring it... At this point, the best thing is for me to dig into the actual values and see what's up. I'll let you know (I don't have the HW at hand right now). Ben.
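The (ETH_HLEN << 16) | ETH_HLEN value packs two 16-bit offsets into the single 32-bit addr field of the descriptor, one per checksum unit; the reduced patch uses the same layout for tcpsum (sum start in the high half, sum write offset in the low half). A small sketch of packing and unpacking; the helper names are ours, not sky2's. Since the two halves share one 32-bit word, the whole word goes through cpu_to_le32() rather than being swapped as two separate 16-bit values.

```c
#include <stdint.h>

/* Pack two 16-bit offsets into one 32-bit descriptor word:
 * 'hi' goes in bits 31..16 and 'lo' in bits 15..0. */
uint32_t pack_offsets(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Recover each half again. */
uint16_t offset_hi(uint32_t w) { return (uint16_t)(w >> 16); }
uint16_t offset_lo(uint32_t w) { return (uint16_t)(w & 0xffff); }
```

For example, with ETH_HLEN = 14 both halves are 14 (0x000E000E). For a TCP packet behind 14 bytes of Ethernet and 20 bytes of IP header, the sum starts at 34 and the checksum field sits 16 bytes into the TCP header, so the write offset would be 50.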
Re: sky2 problem on powerpc
On Tue, 2006-09-05 at 14:36 -0700, Stephen Hemminger wrote: This is the reduced version of your patch, plus I got rid of the union in tx_le, it is a nuisance. Thanks. I'll give it a go later today. The remaining nit is the inconsistent swapping of the vlan tag, which is manipulated as BE at times and as LE at others (the latter happens in status_intr). Ben.
Re: [PATCH][RFC] Re: high latency with TCP connections
Hello! Is this really necessary? No, of course not. We lived for ages without this and would live for another age. I thought that the problems with ABC were in trying to apply byte-based heuristics from the RFC(s) to a packet-oriented cwnd in the stack? It was just the last drop. Even with ABC disabled, that test shows some gaps in latency summing up to ~300 msec. Almost invisible, but not good. Overly aggressive delack has many other issues. Even without ABC we have quadratically suppressed cwnd on TCP_NODELAY connections compared to BSD: on the sender side we suppress it by counting cwnd in packets, on the receiver side by ACKing by byte counter. Each time another victim sees artificial latencies introduced by aggressive delayed ACKs, even though he requested TCP_NODELAY, our best argument is "Stupid, you do it all wrong, how could you get decent performance?" :-). Probably we stand for a feature which is really not worth standing for and which causes nothing but a permanent pain in the ass. Alexey
[RFT 5/5] sky2: fix fiber support
Fix support for fiber-based devices. The driver needs to keep track of the PMD type to add a workaround in setup. Add support for gigabit half-duplex fiber.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/sky2.c |   81 -
 drivers/net/sky2.h |   15 +
 2 files changed, 63 insertions(+), 33 deletions(-)

--- sky2.orig/drivers/net/sky2.c	2006-09-05 13:57:44.0 -0700
+++ sky2/drivers/net/sky2.c	2006-09-05 14:00:04.0 -0700
@@ -308,7 +308,7 @@
 	}
 	ctrl = gm_phy_read(hw, port, PHY_MARV_PHY_CTRL);
-	if (hw->copper) {
+	if (sky2_is_copper(hw)) {
 		if (hw->chip_id == CHIP_ID_YUKON_FE) {
 			/* enable automatic crossover */
 			ctrl |= PHY_M_PC_MDI_XMODE(PHY_M_PC_ENA_AUTO) >> 1;
@@ -325,25 +325,37 @@
 				ctrl |= PHY_M_PC_DSC(2) | PHY_M_PC_DOWN_S_ENA;
 			}
 		}
-		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, ctrl);
 	} else {
 		/* workaround for deviation #4.88 (CRC errors) */
 		/* disable Automatic Crossover */
 		ctrl &= ~PHY_M_PC_MDIX_MSK;
-		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, ctrl);
+	}
-		if (hw->chip_id == CHIP_ID_YUKON_XL) {
-			/* Fiber: select 1000BASE-X only mode MAC Specific Ctrl Reg. */
-			gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 2);
-			ctrl = gm_phy_read(hw, port, PHY_MARV_PHY_CTRL);
-			ctrl &= ~PHY_M_MAC_MD_MSK;
-			ctrl |= PHY_M_MAC_MODE_SEL(PHY_M_MAC_MD_1000BX);
-			gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, ctrl);
+	gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, ctrl);
+
+	/* special setup for PHY 88E1112 Fiber */
+	if (hw->chip_id == CHIP_ID_YUKON_XL && !sky2_is_copper(hw)) {
+		pg = gm_phy_read(hw, port, PHY_MARV_EXT_ADR);
+		/* Fiber: select 1000BASE-X only mode MAC Specific Ctrl Reg. */
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 2);
+		ctrl = gm_phy_read(hw, port, PHY_MARV_PHY_CTRL);
+		ctrl &= ~PHY_M_MAC_MD_MSK;
+		ctrl |= PHY_M_MAC_MODE_SEL(PHY_M_MAC_MD_1000BX);
+		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, ctrl);
+
+		if (hw->pmd_type == 'P') {
 			/* select page 1 to access Fiber registers */
 			gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 1);
+
+			/* for SFP-module set SIGDET polarity to low */
+			ctrl = gm_phy_read(hw, port, PHY_MARV_PHY_CTRL);
+			ctrl |= PHY_M_FIB_SIGD_POL;
+			gm_phy_write(hw, port, PHY_MARV_CTRL, ctrl);
 		}
+
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
 	}
 	ctrl = gm_phy_read(hw, port, PHY_MARV_CTRL);
@@ -361,7 +373,7 @@
 	reg = 0;
 	if (sky2->autoneg == AUTONEG_ENABLE) {
-		if (hw->copper) {
+		if (sky2_is_copper(hw)) {
 			if (sky2->advertising & ADVERTISED_1000baseT_Full)
 				ct1000 |= PHY_M_1000C_AFD;
 			if (sky2->advertising & ADVERTISED_1000baseT_Half)
@@ -374,8 +386,12 @@
 				adv |= PHY_M_AN_10_FD;
 			if (sky2->advertising & ADVERTISED_10baseT_Half)
 				adv |= PHY_M_AN_10_HD;
-		} else	/* special defines for FIBER (88E1011S only) */
-			adv |= PHY_M_AN_1000X_AHD | PHY_M_AN_1000X_AFD;
+		} else {	/* special defines for FIBER (88E1040S only) */
+			if (sky2->advertising & ADVERTISED_1000baseT_Full)
+				adv |= PHY_M_AN_1000X_AFD;
+			if (sky2->advertising & ADVERTISED_1000baseT_Half)
+				adv |= PHY_M_AN_1000X_AHD;
+		}
 		/* Set Flow-control capabilities */
 		if (sky2->tx_pause && sky2->rx_pause)
@@ -1494,7 +1510,7 @@
 static u16 sky2_phy_speed(const struct sky2_hw *hw, u16 aux)
 {
-	if (!hw->copper)
+	if (!sky2_is_copper(hw))
 		return SPEED_1000;
 	if (hw->chip_id == CHIP_ID_YUKON_FE)
@@ -2266,7 +2282,7 @@
 static int sky2_reset(struct sky2_hw *hw)
 {
 	u16 status;
-	u8 t8, pmd_type;
+	u8 t8;
 	int i;
 	sky2_write8(hw, B0_CTST, CS_RST_CLR);
@@ -2312,9 +2328,7 @@
 	sky2_pci_write32(hw, PEX_UNC_ERR_STAT, 0xUL);
-	pmd_type = sky2_read8(hw, B2_PMD_TYP);
-	hw->copper = !(pmd_type == 'L' || pmd_type == 'S');
-
+	hw->pmd_type = sky2_read8(hw, B2_PMD_TYP);
 	hw->ports = 1;
 	t8 = sky2_read8(hw, B2_Y2_HW_RES);
 	if ((t8 & CFG_DUAL_MAC_MSK) == CFG_DUAL_MAC_MSK) {
@@
[RFT 3/5] sky2: handle forced settings
Handle cases where the pause parameters are forced. The GMAC needs to be programmed before the PHY is started, not after.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-09-05 12:10:18.0 -0700
+++ sky2/drivers/net/sky2.c	2006-09-05 13:32:59.0 -0700
@@ -289,7 +289,7 @@
 static void sky2_phy_init(struct sky2_hw *hw, unsigned port)
 {
 	struct sky2_port *sky2 = netdev_priv(hw->dev[port]);
-	u16 ctrl, ct1000, adv, pg, ledctrl, ledover;
+	u16 ctrl, ct1000, adv, pg, ledctrl, ledover, reg;
 	if (sky2->autoneg == AUTONEG_ENABLE &&
 	    !(hw->chip_id == CHIP_ID_YUKON_XL || hw->chip_id == CHIP_ID_YUKON_EC_U)) {
@@ -358,6 +358,7 @@
 	ctrl = 0;
 	ct1000 = 0;
 	adv = PHY_AN_CSMA;
+	reg = 0;
 	if (sky2->autoneg == AUTONEG_ENABLE) {
 		if (hw->copper) {
@@ -390,21 +391,44 @@
 		/* forced speed/duplex settings */
 		ct1000 = PHY_M_1000C_MSE;
-		if (sky2->duplex == DUPLEX_FULL)
-			ctrl |= PHY_CT_DUP_MD;
+		/* Disable auto update for duplex flow control and speed */
+		reg |= GM_GPCR_AU_ALL_DIS;
 		switch (sky2->speed) {
 		case SPEED_1000:
 			ctrl |= PHY_CT_SP1000;
+			reg |= GM_GPCR_SPEED_1000;
 			break;
 		case SPEED_100:
 			ctrl |= PHY_CT_SP100;
+			reg |= GM_GPCR_SPEED_100;
 			break;
 		}
+		if (sky2->duplex == DUPLEX_FULL) {
+			reg |= GM_GPCR_DUP_FULL;
+			ctrl |= PHY_CT_DUP_MD;
+		}
+
+		if (!sky2->rx_pause)
+			reg |= GM_GPCR_FC_RX_DIS;
+
+		if (!sky2->tx_pause)
+			reg |= GM_GPCR_FC_TX_DIS;
+
+		/* Forward pause packets to GMAC? */
+		if (!sky2->tx_pause ||
+		    (hw->chip_id != CHIP_ID_YUKON_EC_U &&
+		     sky2->duplex == DUPLEX_HALF && sky2->speed != SPEED_1000))
+			sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_OFF);
+		else
+			sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_ON);
+
 		ctrl |= PHY_CT_RESET;
 	}
+	gma_write16(hw, port, GM_GP_CTRL, reg);
+
 	if (hw->chip_id != CHIP_ID_YUKON_FE)
 		gm_phy_write(hw, port, PHY_MARV_1000T_CTRL, ct1000);
@@ -508,6 +532,7 @@
 		gm_phy_write(hw, port, PHY_MARV_LED_OVER, ledover);
 	}
+	/* Enable phy interrupt on auto-negotiation complete (or link up) */
 	if (sky2->autoneg == AUTONEG_ENABLE)
 		gm_phy_write(hw, port, PHY_MARV_INT_MASK, PHY_M_IS_AN_COMPL);
@@ -570,49 +595,11 @@
 		       gm_phy_read(hw, 1, PHY_MARV_INT_MASK) != 0);
 	}
-	if (sky2->autoneg == AUTONEG_DISABLE) {
-		reg = gma_read16(hw, port, GM_GP_CTRL);
-		reg |= GM_GPCR_AU_ALL_DIS;
-		gma_write16(hw, port, GM_GP_CTRL, reg);
-		gma_read16(hw, port, GM_GP_CTRL);
-
-		switch (sky2->speed) {
-		case SPEED_1000:
-			reg &= ~GM_GPCR_SPEED_100;
-			reg |= GM_GPCR_SPEED_1000;
-			break;
-		case SPEED_100:
-			reg &= ~GM_GPCR_SPEED_1000;
-			reg |= GM_GPCR_SPEED_100;
-			break;
-		case SPEED_10:
-			reg &= ~(GM_GPCR_SPEED_1000 | GM_GPCR_SPEED_100);
-			break;
-		}
-
-		if (sky2->duplex == DUPLEX_FULL)
-			reg |= GM_GPCR_DUP_FULL;
-
-		/* turn off pause in 10/100mbps half duplex */
-		else if (sky2->speed != SPEED_1000 &&
-			 hw->chip_id != CHIP_ID_YUKON_EC_U)
-			sky2->tx_pause = sky2->rx_pause = 0;
-	} else
-		reg = GM_GPCR_SPEED_1000 | GM_GPCR_SPEED_100 | GM_GPCR_DUP_FULL;
-
-	if (!sky2->tx_pause && !sky2->rx_pause) {
-		sky2_write32(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_OFF);
-		reg |=
-		    GM_GPCR_FC_TX_DIS | GM_GPCR_FC_RX_DIS | GM_GPCR_AU_FCT_DIS;
-	} else if (sky2->tx_pause && !sky2->rx_pause) {
-		/* disable Rx flow-control */
-		reg |= GM_GPCR_FC_RX_DIS | GM_GPCR_AU_FCT_DIS;
-	}
-
-	gma_write16(hw, port, GM_GP_CTRL, reg);
-
 	sky2_read16(hw, SK_REG(port, GMAC_IRQ_SRC));
+	/* Enable Transmit FIFO Underrun */
+	sky2_write8(hw, SK_REG(port, GMAC_IRQ_MSK), GMAC_DEF_MSK);
+
 	spin_lock_bh(&sky2->phy_lock);
 	sky2_phy_init(hw, port);
 	spin_unlock_bh(&sky2->phy_lock);
@@ -1529,40 +1516,10 @@
 	unsigned port = sky2->port;
 	u16 reg;
-	/* Enable Transmit
[RFT 2/5] sky2: accept flow control
Don't program the GMAC to reject flow control packets. This may be the cause of some of the receive hangs.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.h	2006-09-05 15:17:38.0 -0700
+++ sky2/drivers/net/sky2.h	2006-09-05 15:18:00.0 -0700
@@ -1566,7 +1566,7 @@
 	GMR_FS_ANY_ERR	= GMR_FS_RX_FF_OV | GMR_FS_CRC_ERR |
 			  GMR_FS_FRAGMENT | GMR_FS_LONG_ERR |
-			  GMR_FS_MII_ERR | GMR_FS_BAD_FC | GMR_FS_GOOD_FC |
+			  GMR_FS_MII_ERR | GMR_FS_BAD_FC |
 			  GMR_FS_UN_SIZE | GMR_FS_JABBER,
 };

-- 
Stephen Hemminger [EMAIL PROTECTED]
[RFT 0/5] sky2 experimental patches
These patches (against 2.6.18-rc6) may solve some of the mystery hangs and other open problems. Still seeking confirmation. -- Stephen Hemminger [EMAIL PROTECTED]
[RFT 1/5] sky2: more device ids (resend)
Some more Marvell device IDs; these are from the latest SysKonnect driver version.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/sky2.c |    3 +++
 1 file changed, 3 insertions(+)

--- sky2.orig/drivers/net/sky2.c	2006-09-01 14:49:49.0 -0700
+++ sky2/drivers/net/sky2.c	2006-09-01 14:49:56.0 -0700
@@ -106,6 +106,7 @@
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_DLINK, 0x4b00) },	/* DGE-560T */
+	{ PCI_DEVICE(PCI_VENDOR_ID_DLINK, 0x4001) },	/* DGE-550SX */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4340) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4341) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4342) },
@@ -117,6 +118,7 @@
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4350) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4351) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4352) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4353) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4360) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4361) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4362) },
@@ -126,6 +128,7 @@
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4366) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4367) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4369) },
 	{ 0 }
 };

-- 
Stephen Hemminger [EMAIL PROTECTED]
[RFT 4/5] sky2: big endian fix
Revised version of Ben's patch to fix big endian support.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-09-05 13:39:34.0 -0700
+++ sky2/drivers/net/sky2.c	2006-09-05 13:57:44.0 -0700
@@ -809,7 +809,7 @@
 	struct sky2_rx_le *le;
 	le = sky2_next_rx(sky2);
-	le->addr = (ETH_HLEN << 16) | ETH_HLEN;
+	le->addr = cpu_to_le32((ETH_HLEN << 16) | ETH_HLEN);
 	le->ctrl = 0;
 	le->opcode = OP_TCPSTART | HW_OWNER;
@@ -1227,7 +1227,7 @@
 	/* Send high bits if changed or crosses boundary */
 	if (addr64 != sky2->tx_addr64 || high32(mapping + len) != sky2->tx_addr64) {
 		le = get_tx_le(sky2);
-		le->tx.addr = cpu_to_le32(addr64);
+		le->addr = cpu_to_le32(addr64);
 		le->ctrl = 0;
 		le->opcode = OP_ADDR64 | HW_OWNER;
 		sky2->tx_addr64 = high32(mapping + len);
@@ -1242,8 +1242,7 @@
 	if (mss != sky2->tx_last_mss) {
 		le = get_tx_le(sky2);
-		le->tx.tso.size = cpu_to_le16(mss);
-		le->tx.tso.rsvd = 0;
+		le->addr = cpu_to_le32(mss);
 		le->opcode = OP_LRGLEN | HW_OWNER;
 		le->ctrl = 0;
 		sky2->tx_last_mss = mss;
@@ -1256,7 +1255,7 @@
 	if (sky2->vlgrp && vlan_tx_tag_present(skb)) {
 		if (!le) {
 			le = get_tx_le(sky2);
-			le->tx.addr = 0;
+			le->addr = 0;
 			le->opcode = OP_VLAN|HW_OWNER;
 			le->ctrl = 0;
 		} else
@@ -1268,20 +1267,21 @@
 	/* Handle TCP checksum offload */
 	if (skb->ip_summed == CHECKSUM_HW) {
-		u16 hdr = skb->h.raw - skb->data;
-		u16 offset = hdr + skb->csum;
+		unsigned offset = skb->h.raw - skb->data;
+		u32 tcpsum;
+
+		tcpsum = offset << 16;		/* sum start */
+		tcpsum |= offset + skb->csum;	/* sum write */
 		ctrl = CALSUM | WR_SUM | INIT_SUM | LOCK_SUM;
 		if (skb->nh.iph->protocol == IPPROTO_UDP)
 			ctrl |= UDPTCP;
-		if (hdr != sky2->tx_csum_start || offset != sky2->tx_csum_offset) {
-			sky2->tx_csum_start = hdr;
-			sky2->tx_csum_offset = offset;
+		if (tcpsum != sky2->tx_tcpsum) {
+			sky2->tx_tcpsum = tcpsum;
 			le = get_tx_le(sky2);
-			le->tx.csum.start = cpu_to_le16(hdr);
-			le->tx.csum.offset = cpu_to_le16(offset);
+			le->addr = cpu_to_le32(tcpsum);
 			le->length = 0;	/* initial checksum value */
 			le->ctrl = 1;	/* one packet */
 			le->opcode = OP_TCPLISW | HW_OWNER;
@@ -1289,7 +1289,7 @@
 	}
 	le = get_tx_le(sky2);
-	le->tx.addr = cpu_to_le32((u32) mapping);
+	le->addr = cpu_to_le32((u32) mapping);
 	le->length = cpu_to_le16(len);
 	le->ctrl = ctrl;
 	le->opcode = mss ? (OP_LARGESEND | HW_OWNER) : (OP_PACKET | HW_OWNER);
@@ -1307,14 +1307,14 @@
 		addr64 = high32(mapping);
 		if (addr64 != sky2->tx_addr64) {
 			le = get_tx_le(sky2);
-			le->tx.addr = cpu_to_le32(addr64);
+			le->addr = cpu_to_le32(addr64);
 			le->ctrl = 0;
 			le->opcode = OP_ADDR64 | HW_OWNER;
 			sky2->tx_addr64 = addr64;
 		}
 		le = get_tx_le(sky2);
-		le->tx.addr = cpu_to_le32((u32) mapping);
+		le->addr = cpu_to_le32((u32) mapping);
 		le->length = cpu_to_le16(frag->size);
 		le->ctrl = ctrl;
 		le->opcode = OP_BUFFER | HW_OWNER;
@@ -1919,8 +1919,8 @@
 		dev = hw->dev[le->link];
 		sky2 = netdev_priv(dev);
-		length = le->length;
-		status = le->status;
+		length = le16_to_cpu(le->length);
+		status = le32_to_cpu(le->status);
 		switch (le->opcode & ~HW_OWNER) {
 		case OP_RXSTAT:
@@ -1964,7 +1964,7 @@
 		case OP_RXCHKS:
 			skb = sky2->rx_ring[sky2->rx_next].skb;
 			skb->ip_summed = CHECKSUM_HW;
-			skb->csum = le16_to_cpu(status);
+			skb->csum = status & 0x;
 			break;
 		case OP_TXINDEXLE:
@@ -3266,12 +3266,13 @@
 	hw->pm_cap = pm_cap;
 #ifdef __BIG_ENDIAN
-	/* byte swap descriptors in hardware */
+	/* The sk98lin vendor driver uses hardware byte swapping but
+	 * this driver uses software swapping.
+	 */
 	{
 		u32 reg;
-
[2.6 patch] net/sctp/: cleanups
This patch contains the following cleanups:
- make the following needlessly global function static:
  - socket.c: sctp_apply_peer_addr_params()
- add proper prototypes for several global functions in include/net/sctp/sctp.h

Note that this fixes wrong prototypes for the following functions:
- sctp_snmp_proc_exit()
- sctp_eps_proc_exit()
- sctp_assocs_proc_exit()

The latter was spotted by the GNU C compiler and reported by David Woodhouse.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---
 include/net/sctp/sctp.h |   13 +
 net/sctp/ipv6.c         |    1 -
 net/sctp/protocol.c     |    7 ---
 net/sctp/socket.c       |   14 +++---
 4 files changed, 20 insertions(+), 15 deletions(-)

--- linux-2.6.18-rc5-mm1/include/net/sctp/sctp.h.old	2006-09-05 16:50:33.0 +0200
+++ linux-2.6.18-rc5-mm1/include/net/sctp/sctp.h	2006-09-05 16:54:18.0 +0200
@@ -128,6 +128,8 @@
 				int flags);
 extern struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
 extern int sctp_register_pf(struct sctp_pf *, sa_family_t);
+int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
+			void *ptr);
 /*
  * sctp/socket.c
@@ -178,6 +180,17 @@
 				struct sock *oldsk, struct sock *newsk);
 /*
+ * sctp/proc.c
+ */
+int sctp_snmp_proc_init(void);
+void sctp_snmp_proc_exit(void);
+int sctp_eps_proc_init(void);
+void sctp_eps_proc_exit(void);
+int sctp_assocs_proc_init(void);
+void sctp_assocs_proc_exit(void);
+
+
+/*
  * Section: Macros, externs, and inlines
  */
--- linux-2.6.18-rc5-mm1/net/sctp/socket.c.old	2006-09-05 16:49:15.0 +0200
+++ linux-2.6.18-rc5-mm1/net/sctp/socket.c	2006-09-05 16:49:27.0 +0200
@@ -2081,13 +2081,13 @@
  * SPP_SACKDELAY_ENABLE, setting both will have undefined
  * results.
  */
-int sctp_apply_peer_addr_params(struct sctp_paddrparams *params,
-				struct sctp_transport *trans,
-				struct sctp_association *asoc,
-				struct sctp_sock *sp,
-				int hb_change,
-				int pmtud_change,
-				int sackdelay_change)
+static int sctp_apply_peer_addr_params(struct sctp_paddrparams *params,
+				       struct sctp_transport *trans,
+				       struct sctp_association *asoc,
+				       struct sctp_sock *sp,
+				       int hb_change,
+				       int pmtud_change,
+				       int sackdelay_change)
 {
 	int error;
--- linux-2.6.18-rc5-mm1/net/sctp/ipv6.c.old	2006-09-05 16:50:51.0 +0200
+++ linux-2.6.18-rc5-mm1/net/sctp/ipv6.c	2006-09-05 16:50:58.0 +0200
@@ -78,7 +78,6 @@
 #include <asm/uaccess.h>
-extern int sctp_inetaddr_event(struct notifier_block *, unsigned long, void *);
 static struct notifier_block sctp_inet6addr_notifier = {
 	.notifier_call = sctp_inetaddr_event,
 };
--- linux-2.6.18-rc5-mm1/net/sctp/protocol.c.old	2006-09-05 16:53:10.0 +0200
+++ linux-2.6.18-rc5-mm1/net/sctp/protocol.c	2006-09-05 16:53:20.0 +0200
@@ -82,13 +82,6 @@
 kmem_cache_t *sctp_chunk_cachep __read_mostly;
 kmem_cache_t *sctp_bucket_cachep __read_mostly;
-extern int sctp_snmp_proc_init(void);
-extern int sctp_snmp_proc_exit(void);
-extern int sctp_eps_proc_init(void);
-extern int sctp_eps_proc_exit(void);
-extern int sctp_assocs_proc_init(void);
-extern int sctp_assocs_proc_exit(void);
-
 /* Return the address of the control sock. */
 struct sock *sctp_get_ctl_sock(void)
 {
[RFC] A change in periodic work scheduling in bcm43xx
Michael, based on user reports and my own experience, the current problems with NETDEV WATCHDOG tx timeouts and the device just falling over do not happen when periodic work is not preemptible. These problems seem to affect BCM4306 rev 2 and 3 chips. Since I changed BADNESS_LIMIT to 20 to disable preemption during periodic work, my device has stayed up continuously for more than 18 hours. Previously, the longest time between failures was less than 6 hours, and sometimes as short as 10 minutes. As you know, the present scheme for periodic work scheduling for bcm43xx in both wireless-2.6 and wireless-dev runs all 4 periodic tasks on certain ticks of the 15-second clock. Using your badness values of 1, 1, 5, and 10 for the 15, 30, 60, and 120 second periodic tasks, respectively, the badness repeat cycle is ..., 1, 2, 1, 7, 1, 2, 1, 17, ... I propose that we reduce the size of the spike in badness by shifting the 120 second task from a clock value of 8n to 8n+7, and the 60 second task from 4n to 4n+1. This way no more than 2 of the periodic tasks will be run in any clock period, and the badness repeat cycle becomes ..., 6, 2, 1, 2, 6, 2, 11, 2, ... The tasks are run with the same periodicity as before, just a little more asynchronously. I recall that they were completely asynchronous in early versions of this driver. Until we can locate and fix the problem that occurs during preemption, should we consider setting BADNESS_LIMIT to 20 in the wireless-2.6 kernels? For those of us whose cards have the problem, it certainly makes the device a lot more usable.
Larry

The patches to implement the scheduling change are as follows:

Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -3195,9 +3195,9 @@ static void do_periodic_work(struct bcm4
 	unsigned int state;
 	state = bcm->periodic_state;
-	if (state % 8 == 0)
+	if (state % 8 == 7)
 		bcm43xx_periodic_every120sec(bcm);
-	if (state % 4 == 0)
+	if (state % 4 == 1)
 		bcm43xx_periodic_every60sec(bcm);
 	if (state % 2 == 0)
 		bcm43xx_periodic_every30sec(bcm);
@@ -3216,8 +3216,8 @@ static int estimate_periodic_work_badnes
 {
 	int badness = 0;
-	if (state % 8 == 0) /* every 120 sec */
+	if (state % 8 == 7) /* every 120 sec */
 		badness += 10;
-	if (state % 4 == 0) /* every 60 sec */
+	if (state % 4 == 1) /* every 60 sec */
 		badness += 5;
 	if (state % 2 == 0) /* every 30 sec */
Re: [RFC] A change in periodic work scheduling in bcm43xx
On Tuesday 05 September 2006 19:58, Larry Finger wrote: [...] Until we can locate and fix the problem that occurs during preemption, should we consider setting BADNESS_LIMIT to 20 in the wireless-2.6 kernels? For those of us whose cards have the problem, it certainly makes the device a lot more usable. Oh well... And if we do this, it will take two weeks for the latency people to show up and request a revert of this again. Well, I _really_ don't want to have a patch like this, because it just papers over a real bug. There are only two choices: either we want preemption or we don't. It's worthless to tune the badness limit to the point where the bug is least likely to trigger. Sooner or later it _will_ trigger. What we really want is: 1st: a reliable way to reproduce the bug in a short time. Waiting 20 hours isn't really a good way of debugging. 2nd: if we can reproduce it in reasonable time, we can track down what is actually causing the bug. My thoughts on the bug: when preemptible work happens, we completely shut down IRQ handling and suspend the MAC. We do this because we must not take the IRQ spinlock if we want to be preemptible. By not taking the IRQ spinlock, we race against the DMA engine (and other parts), so we must shut down any data flow during the periodic work to ensure the IRQ handler does not trigger. The sad thing is that we don't know much about how the card and the firmware work (yet). So the big question is: how do we suspend the card in an easy and _inexpensive_ way? We currently mask all IRQs and suspend the MAC. I guess MAC suspending is part of the problem. I _guess_ the card is confused by suspending the MAC in the middle of possible transmissions. It's all just a guess. That's why I want a good way to reproduce the bug so we can do experiments. We could, for example, suspend the DMA TX channel before we suspend the MAC. We could try other things as well, for example not suspending the MAC at all and just masking IRQs. We must be _careful_ here. The preemptible periodic work is a damn fragile part of the whole driver, and it is easily possible to break it even more with a patch that looks correct. In short: we don't need a patch to paper over the bug; we need _ideas_ about what is actually going on. -- Greetings Michael.
Re: [IPSEC]: searching SAD without assumming L3 details
On Sat, Sep 02, 2006 at 09:43:02AM -0400, jamal wrote: Allow for searching the SAD from external data path points without assuming L3 details. The only customer of this exposure currently is pktgen. Any reason why xfrm_state_find can't be used? It doesn't look right to add generic code that's only used by pktgen. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
roaming support for d80211 stack
Hi, is anyone working on roaming support for the d80211 stack? Thanks, Mohamed
Re: e1000 Detected Tx Unit Hang
I haven't done the NAPI test yet. These are identical systems altogether; at most the CPU may be a different stepping, but that is all. The "16: 70540 0 IO-APIC-level uhci_hcd:usb4, eth0" line is the same on every GS12 I have. No overclocking, and the same BIOS. Tyan released version 1.8 about a month ago; I did the upgrade and saw the same effect. Then I thought about upgrading to 2.6.17.11 just to see if the driver would have any issues, and nothing: same deal. The only way I was able to control it was by using a dumb 10/100 unmanaged switch; then we had no issues. I will try without NAPI tomorrow (9-6-06) and will report back. My understanding of NAPI was that it drops packets by design under overload. Why would that cause a system lock? Are there any other kernel options you would like me to enable to track this better? If you need remote access to the system I can accommodate that too; just let me know what time zone you are in so we can schedule it. Let me know. Regards, Paul Aviles - Original Message - From: Jesse Brandeburg [EMAIL PROTECTED] To: Paul Aviles [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Tuesday, September 05, 2006 12:09 PM Subject: Re: e1000 Detected Tx Unit Hang On 9/3/06, Paul Aviles [EMAIL PROTECTED] wrote: Hey Jesse, thanks for your reply. Here is the stuff on /procs. The weird no problem, part is that I have several other identical systems and only one is affected. Today I moved the hard drive to another similar system and I am not seeing the problem so I am wondering if is something maybe wrong with the card eeprom? Is there a way to check that? I doubt it is an EEPROM problem. You can dump the EEPROMs with ethtool -e eth0 on both machines and compare them. Odd that only one system is having the problem. Could it be that the hardware on that box is having issues? Are you sure the machines are running the same BIOS version with the same settings? Any overclocking?
cat /proc/interrupts
           CPU0       CPU1
 16:      70540          0   IO-APIC-level  uhci_hcd:usb4, eth0

This could contribute to your problem. Were you able to test without NAPI? Jesse
Re: [2.6 patch] net/sctp/: cleanups
On Tue, 2006-09-05 at 23:57 +0200, Adrian Bunk wrote: This patch contains the following cleanups: - make the following needlessly global function static: - socket.c: sctp_apply_peer_addr_params() - add proper prototypes for the several global functions in include/net/sctp/sctp.h Note that this fixes wrong prototypes for the following functions: - sctp_snmp_proc_exit() - sctp_eps_proc_exit() - sctp_assocs_proc_exit() The latter was spotted by the GNU C compiler and reported by David Woodhouse. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Acked-by: Sridhar Samudrala [EMAIL PROTECTED] --- include/net/sctp/sctp.h | 13 + net/sctp/ipv6.c |1 - net/sctp/protocol.c |7 --- net/sctp/socket.c | 14 +++--- 4 files changed, 20 insertions(+), 15 deletions(-) --- linux-2.6.18-rc5-mm1/include/net/sctp/sctp.h.old 2006-09-05 16:50:33.0 +0200 +++ linux-2.6.18-rc5-mm1/include/net/sctp/sctp.h 2006-09-05 16:54:18.0 +0200 @@ -128,6 +128,8 @@ int flags); extern struct sctp_pf *sctp_get_pf_specific(sa_family_t family); extern int sctp_register_pf(struct sctp_pf *, sa_family_t); +int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev, +void *ptr); /* * sctp/socket.c @@ -178,6 +180,17 @@ struct sock *oldsk, struct sock *newsk); /* + * sctp/proc.c + */ +int sctp_snmp_proc_init(void); +void sctp_snmp_proc_exit(void); +int sctp_eps_proc_init(void); +void sctp_eps_proc_exit(void); +int sctp_assocs_proc_init(void); +void sctp_assocs_proc_exit(void); + + +/* * Section: Macros, externs, and inlines */ --- linux-2.6.18-rc5-mm1/net/sctp/socket.c.old2006-09-05 16:49:15.0 +0200 +++ linux-2.6.18-rc5-mm1/net/sctp/socket.c2006-09-05 16:49:27.0 +0200 @@ -2081,13 +2081,13 @@ * SPP_SACKDELAY_ENABLE, setting both will have undefined * results. 
*/ -int sctp_apply_peer_addr_params(struct sctp_paddrparams *params, - struct sctp_transport *trans, - struct sctp_association *asoc, - struct sctp_sock *sp, - int hb_change, - int pmtud_change, - int sackdelay_change) +static int sctp_apply_peer_addr_params(struct sctp_paddrparams *params, +struct sctp_transport *trans, +struct sctp_association *asoc, +struct sctp_sock *sp, +int hb_change, +int pmtud_change, +int sackdelay_change) { int error; --- linux-2.6.18-rc5-mm1/net/sctp/ipv6.c.old 2006-09-05 16:50:51.0 +0200 +++ linux-2.6.18-rc5-mm1/net/sctp/ipv6.c 2006-09-05 16:50:58.0 +0200 @@ -78,7 +78,6 @@ #include <asm/uaccess.h> -extern int sctp_inetaddr_event(struct notifier_block *, unsigned long, void *); static struct notifier_block sctp_inet6addr_notifier = { .notifier_call = sctp_inetaddr_event, }; --- linux-2.6.18-rc5-mm1/net/sctp/protocol.c.old 2006-09-05 16:53:10.0 +0200 +++ linux-2.6.18-rc5-mm1/net/sctp/protocol.c 2006-09-05 16:53:20.0 +0200 @@ -82,13 +82,6 @@ kmem_cache_t *sctp_chunk_cachep __read_mostly; kmem_cache_t *sctp_bucket_cachep __read_mostly; -extern int sctp_snmp_proc_init(void); -extern int sctp_snmp_proc_exit(void); -extern int sctp_eps_proc_init(void); -extern int sctp_eps_proc_exit(void); -extern int sctp_assocs_proc_init(void); -extern int sctp_assocs_proc_exit(void); - /* Return the address of the control sock. */ struct sock *sctp_get_ctl_sock(void) {
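The reason for moving the prototypes into include/net/sctp/sctp.h can be shown with a small sketch. protocol.c previously declared its own `extern int sctp_snmp_proc_exit(void);` even though the function returns void. A compiler that sees one .c file at a time never compares such a private declaration with the definition. In the sketch below all names are hypothetical (demo_*), and a single file stands in for the shared header plus the .c file that includes it:

```c
/*
 * Stand-in for a shared header. Because the definitions below see these
 * prototypes, defining demo_proc_exit() as "int demo_proc_exit(void)"
 * would be rejected with a "conflicting types" error. A private
 * "extern int demo_proc_exit(void);" in some other .c file would not be
 * checked against the definition at all.
 */
int demo_proc_init(void);
void demo_proc_exit(void);
int demo_proc_is_registered(void);

static int demo_proc_registered;	/* file-local state */

int demo_proc_init(void)
{
	demo_proc_registered = 1;
	return 0;
}

void demo_proc_exit(void)
{
	demo_proc_registered = 0;
}

int demo_proc_is_registered(void)
{
	return demo_proc_registered;
}

/*
 * Only used inside this file, so it is 'static', as the patch does for
 * sctp_apply_peer_addr_params(): it stays out of the global namespace.
 */
static int demo_apply_params(int hb_change, int pmtud_change)
{
	return (hb_change ? 1 : 0) + (pmtud_change ? 1 : 0);
}
```

In the real tree the prototype block would sit in a header included by every file that defines or calls these functions, which is what the patch sets up.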
[RFT] sky2 vs iptables
Hi, There's a strange sky2 bug on the Gentoo bugzilla: http://bugs.gentoo.org/show_bug.cgi?id=136508 sky2 seems to work OK, but breaks as soon as the iptables ruleset is loaded. Nothing can be pinged, etc. Can someone try and reproduce this? The iptables rule script has been uploaded here: http://bugs.gentoo.org/attachment.cgi?id=95694&action=view The very last command in that file is the one which produces an error and stops everything working: iptables: Unknown error 18446744073709551615 Apparently a sky2 null deref has also been seen at this point, although I don't have further details on that. Thanks! Daniel
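The error number in the report is not random. 18446744073709551615 is 2^64 - 1, which is what -1 looks like after conversion to an unsigned 64-bit integer. That suggests, as an inference the report does not confirm, that a -1 status was printed through an unsigned 64-bit conversion somewhere on the iptables error path. A small sketch of that conversion follows; both helper names are hypothetical:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Render a signed error value as it would appear after an unsigned
 * 64-bit conversion. Converting a negative value to uint64_t is well
 * defined in C: the result is the value modulo 2^64, so -1 becomes
 * 2^64 - 1. Returns the number of characters snprintf() produced.
 */
static int format_err_as_u64(long long err, char *buf, size_t len)
{
	return snprintf(buf, len, "%" PRIu64, (uint64_t)err);
}

/* Does a decimal string from a log equal -1 seen as an unsigned 64-bit value? */
static int is_u64_minus_one(const char *digits)
{
	char buf[32];

	format_err_as_u64(-1, buf, sizeof(buf));
	return strcmp(buf, digits) == 0;
}
```

Decoding the number this way only says that a -1 was printed. It does not tell which errno, if any, was lost along the way.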
[PATCH] myri10ge: update the firmware download URL in Kconfig
Jeff, Could you please push the following patch to Linus before 2.6.18? It updates the firmware download URL in Kconfig to match the header in drivers/net/myri10ge/myri10ge.c. Thanks! Brice Goglin From: Brice Goglin [EMAIL PROTECTED] [PATCH] myri10ge: update the firmware download URL in Kconfig Update the firmware download URL in Kconfig to match the header in drivers/net/myri10ge/myri10ge.c. Signed-off-by: Brice Goglin [EMAIL PROTECTED] --- drivers/net/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-rc/drivers/net/Kconfig === --- linux-rc.orig/drivers/net/Kconfig 2006-09-05 11:19:37.0 -0400 +++ linux-rc/drivers/net/Kconfig 2006-09-05 11:19:49.0 -0400 @@ -2393,7 +2393,7 @@ you will need a newer firmware image. You may get this image or more information, at: - http://www.myri.com/Myri-10G/ + http://www.myri.com/scs/download-Myri10GE.html To compile this driver as a module, choose M here and read file:Documentation/networking/net-modules.txt. The module
Re: ProxyARP and IPSec
Alexey Kuznetsov writes: Probably, you are not aware that standard IPsec tunnel device, if it is created: 1. Probably, will not accept fragmented frames, because IPsec cannot handle them. IPsec can handle them, though not particularly smoothly if the IPsec tunnel is only supposed to carry a particular port/protocol combination. 2. Probably, will have undefined MTU (65536), because of 1. An MTU that is more likely to make most things work (at least over Ethernet) is ETH_DATA_LEN - MAX_SA_LEN, where MAX_SA_LEN is however much is required for IPsec (something like IP + UDP [only with NAT-T] + ESP header + IV + padding + ESP trailer). The simplest thing is to just statically configure it. However, some implementations dynamically calculate the IPsec device MTU based on the maximum size required by any of the IPsec SAs that will go over the interface, using either a pessimistic (255) or optimistic (2) padding estimate. This can cause problems for OSPF adjacency if each side arrives at a different MTU, but that can be handled by either manually configuring the device MTU or explicitly configuring the MTU that OSPF will advertise (I think Quagga supports that). Actually, this is the reason why it is not implemented. It is dirty business. :-) And the person, who implements this, has to be really... unscrupulous. :-) Exactly the same issue occurs if one implements IPsec (or any other encryption method) at user level using a tun/tap device. Consequently, while I agree that fragmentation causes an additional level of problems if one wants to have port/protocol based selectors in IPsec, I believe that most (but not all) VPN users are quite satisfied with policies covering all traffic on all ports, and so will not encounter any IPsec-specific problems.
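The "ETH_DATA_LEN - MAX_SA_LEN" rule can be worked through with concrete numbers. The header sizes below are illustrative assumptions, not values from the message: an IPv4 outer header without options, UDP encapsulation when NAT-T is in use, a cipher with a 16-byte IV (for example AES-CBC), and a 2-byte trailer (pad length + next header). The ICV (authentication data) is not in the list above and is left out here too; a real calculation would add it. The helper names are hypothetical:

```c
#define ETH_DATA_LEN     1500	/* Ethernet payload size, as in <linux/if_ether.h> */
#define OUTER_IPV4_LEN     20	/* assumption: IPv4 outer header, no options */
#define NATT_UDP_LEN        8	/* UDP header, only with NAT-T */
#define ESP_HDR_LEN         8	/* SPI + sequence number */
#define ESP_IV_LEN         16	/* assumption: 16-byte IV */
#define ESP_TRAILER_LEN     2	/* pad length + next header */

#define PAD_PESSIMISTIC   255	/* worst-case padding estimate */
#define PAD_OPTIMISTIC      2	/* optimistic padding estimate */

/* Per-packet IPsec overhead: the MAX_SA_LEN of the message above. */
static int ipsec_max_sa_len(int natt, int pad_estimate)
{
	return OUTER_IPV4_LEN + (natt ? NATT_UDP_LEN : 0) + ESP_HDR_LEN +
	       ESP_IV_LEN + pad_estimate + ESP_TRAILER_LEN;
}

/* Tunnel device MTU that leaves room for that overhead on Ethernet. */
static int ipsec_dev_mtu(int natt, int pad_estimate)
{
	return ETH_DATA_LEN - ipsec_max_sa_len(natt, pad_estimate);
}
```

With NAT-T this gives 1191 for the pessimistic estimate and 1444 for the optimistic one. A 253-byte gap of that size is exactly what can leave two peers with different MTUs, which causes the OSPF adjacency problem described above.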
Re: [PATCH] FRV: do_gettimeofday() should no longer use tickadj
On Tue, 2006-09-05 at 16:35 +0100, David Howells wrote: Stop do_gettimeofday() on FRV from using tickadj, and model it after ARM instead. This patch also provides a placeholder macro for getting hardware timer data to be filled in when such is available. From this patch it looks like the FRV arch could be trivially converted to GENERIC_TIME. Would you consider the following, totally untested patch? Signed-off-by: John Stultz [EMAIL PROTECTED] Kconfig | 4 ++ kernel/time.c | 81 -- 2 files changed, 4 insertions(+), 81 deletions(-) diff --git a/arch/frv/Kconfig b/arch/frv/Kconfig index 95a3892..a601a17 100644 --- a/arch/frv/Kconfig +++ b/arch/frv/Kconfig @@ -29,6 +29,10 @@ config GENERIC_HARDIRQS bool default n +config GENERIC_TIME + bool + default y + config TIME_LOW_RES bool default y diff --git a/arch/frv/kernel/time.c b/arch/frv/kernel/time.c index d5b64e1..68a77fe 100644 --- a/arch/frv/kernel/time.c +++ b/arch/frv/kernel/time.c @@ -32,8 +32,6 @@ #define TICK_SIZE (tick_nsec / 1000) -extern unsigned long wall_jiffies; - unsigned long __nongprelbss __clkin_clock_speed_HZ; unsigned long __nongprelbss __ext_bus_clock_speed_HZ; unsigned long __nongprelbss __res_bus_clock_speed_HZ; @@ -145,85 +143,6 @@ void time_init(void) } /* - * This version of gettimeofday has near microsecond resolution. - */ -void do_gettimeofday(struct timeval *tv) -{ - unsigned long seq; - unsigned long usec, sec; - unsigned long max_ntp_tick; - - do { - unsigned long lost; - - seq = read_seqbegin(&xtime_lock); - - usec = 0; - lost = jiffies - wall_jiffies; - - /* -* If time_adjust is negative then NTP is slowing the clock -* so make sure not to go into next possible interval. -* Better to lose some accuracy than have time go backwards..
-*/ - if (unlikely(time_adjust < 0)) { - max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj; - usec = min(usec, max_ntp_tick); - - if (lost) - usec += lost * max_ntp_tick; - } - else if (unlikely(lost)) - usec += lost * (USEC_PER_SEC / HZ); - - sec = xtime.tv_sec; - usec += (xtime.tv_nsec / 1000); - } while (read_seqretry(&xtime_lock, seq)); - - while (usec >= 1000000) { - usec -= 1000000; - sec++; - } - - tv->tv_sec = sec; - tv->tv_usec = usec; -} - -EXPORT_SYMBOL(do_gettimeofday); - -int do_settimeofday(struct timespec *tv) -{ - time_t wtm_sec, sec = tv->tv_sec; - long wtm_nsec, nsec = tv->tv_nsec; - - if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC) - return -EINVAL; - - write_seqlock_irq(&xtime_lock); - /* -* This is revolting. We need to set xtime correctly. However, the -* value in this location is the value at the most recent update of -* wall time. Discover what correction gettimeofday() would have -* made, and then undo it! -*/ - nsec -= 0 * NSEC_PER_USEC; - nsec -= (jiffies - wall_jiffies) * TICK_NSEC; - - wtm_sec = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec); - wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec); - - set_normalized_timespec(&xtime, sec, nsec); - set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec); - - ntp_clear(); - write_sequnlock_irq(&xtime_lock); - clock_was_set(); - return 0; -} - -EXPORT_SYMBOL(do_settimeofday); - -/* * Scheduler clock - returns current time in nanosec units. */ unsigned long long sched_clock(void)
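The patch removes the FRV versions of do_gettimeofday() and do_settimeofday() and turns on GENERIC_TIME, so it relies on the generic timekeeping code to provide both. For reference, here is a standalone sketch of the arithmetic the removed do_gettimeofday() performed, with the seqlock retry loop left out. HZ, TICKADJ, the hw_usec parameter and the function name are hypothetical stand-ins, not FRV's actual configuration:

```c
#define USEC_PER_SEC 1000000UL
#define HZ 100UL	/* assumption for illustration */
#define TICKADJ 5UL	/* assumption: NTP slew per tick, in usec */

struct tv_sketch {
	unsigned long sec;
	unsigned long usec;
};

/*
 * xtime_sec/xtime_nsec: wall time at the last tick update
 * lost: jiffies not yet folded into xtime (jiffies - wall_jiffies)
 * time_adjust: negative while NTP is slowing the clock
 * hw_usec: hardware timer offset (the removed FRV code used 0)
 */
static struct tv_sketch frv_gettimeofday_sketch(unsigned long xtime_sec,
						unsigned long xtime_nsec,
						unsigned long lost,
						long time_adjust,
						unsigned long hw_usec)
{
	struct tv_sketch tv;
	unsigned long usec = hw_usec;
	unsigned long sec;

	if (time_adjust < 0) {
		/* NTP is slowing the clock: count at most a shortened tick
		 * per pending jiffy, so the result never runs ahead of
		 * what a later call would report */
		unsigned long max_ntp_tick = USEC_PER_SEC / HZ - TICKADJ;

		if (usec > max_ntp_tick)
			usec = max_ntp_tick;
		usec += lost * max_ntp_tick;
	} else {
		usec += lost * (USEC_PER_SEC / HZ);
	}

	sec = xtime_sec;
	usec += xtime_nsec / 1000;

	/* carry whole seconds out of the microsecond field */
	while (usec >= USEC_PER_SEC) {
		usec -= USEC_PER_SEC;
		sec++;
	}

	tv.sec = sec;
	tv.usec = usec;
	return tv;
}
```

With one pending jiffy at xtime = 10 s + 999000000 ns, the sketch returns 11 s + 9000 us normally and 11 s + 8995 us while NTP is slewing. Those 5 us are the tickadj correction that the original patch stops applying.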