Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card
Yes, it's what I'm looking for. I don't understand how to change the arp_ip_target with the gateway; arp_ip_target is a module option.

----- Original message -----
From: Jay Vosburgh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 10 January 2008, 00:26:38
Subject: Re: Re : Re : Re : Bonding : Monitoring of 4965 wireless card

[EMAIL PROTECTED] wrote:
I mean that instead of ARP-testing an IP on the LAN or elsewhere, I want it to test 127.0.0.1, but in order to do this it must go out and re-enter, and then use wlan0 to go out.

In other words, what I think you're saying (and I'm not entirely sure here) is that you want probes to go to a remote node on the network, and back, without having to actually know the identity of the remote node (because, presumably, on a roaming type of wireless configuration, your gateway and whatnot can change from time to time). Is that what you're looking for?

That isn't available now, but it might be straightforward to plug into the address update system to keep the arp_ip_target up to date with the current gateway as the gateway changes. I haven't looked into the details of doing that, but in theory it sounds straightforward.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

----- Original message -----
From: Jay Vosburgh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: John W. Linville [EMAIL PROTECTED]; netdev@vger.kernel.org
Sent: Wednesday, 9 January 2008, 22:36:00
Subject: Re: Re : Re : Bonding : Monitoring of 4965 wireless card

[EMAIL PROTECTED] wrote:
I don't know, but it seems like it prevents bonding from detecting the link of wlan0. I enslave wlan0 and I already use use_carrier=1.

The default for bonding is use_carrier=1, which makes bonding use the device driver's netif_carrier_on/off state for link detection. Bonding only checks via ethtool/MII if use_carrier=0.

I'll try ARP monitoring, but it is annoying that I can't test localhost. Is there a way to test localhost with ARP without passing through lo?
What do you mean by "test localhost with arp, without pass through lo"? ARP monitoring issues probes (ARPs) to a remote destination to confirm that there is connectivity; I'm not sure what localhost has to do with it.

In general, though, I have not tested bonding with wireless adapters, so I'm unfamiliar with how well it does or does not work.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

_
Keep just one email address! Copy your mail to Yahoo! Mail http://mail.yahoo.fr

-- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
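One way to approximate Jay's idea from user space, without kernel changes, is to look up the current default gateway whenever it changes and hand it to bonding. The sketch below only does the lookup. It parses text in the format of /proc/net/route and returns the gateway as a dotted quad. It assumes the hex words in that file are in the byte order of the machine running the code, and default_gateway() is a hypothetical helper. Pushing the result into the bond is not shown. Some kernels expose arp_ip_target through sysfs as well as a module option, so check the bonding documentation for your kernel version before relying on that.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define ROUTE_FLAG_UP      0x0001 /* same value as RTF_UP */
#define ROUTE_FLAG_GATEWAY 0x0002 /* same value as RTF_GATEWAY */

/*
 * Hypothetical helper: scan text in /proc/net/route format and write the
 * gateway of the first usable default route (Destination 00000000) into
 * out as a dotted quad. Returns 0 on success, -1 if none was found.
 */
int default_gateway(const char *route_text, char *out, size_t outlen)
{
	const char *line = route_text;

	assert(route_text != NULL && out != NULL);
	while (line != NULL && *line != '\0') {
		char iface[32];
		unsigned long dest, gw;
		unsigned int flags;

		/* The header line fails to parse 4 fields and is skipped. */
		if (sscanf(line, "%31s %lx %lx %x", iface, &dest, &gw, &flags) == 4 &&
		    dest == 0 && (flags & ROUTE_FLAG_UP) &&
		    (flags & ROUTE_FLAG_GATEWAY)) {
			struct in_addr addr;

			/*
			 * The kernel prints the network-order word as a
			 * host-order hex number, so storing it back into
			 * s_addr restores the original byte order.
			 */
			addr.s_addr = (in_addr_t)gw;
			return inet_ntop(AF_INET, &addr, out,
					 (socklen_t)outlen) ? 0 : -1;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

A watcher script or daemon could call this after each DHCP or roaming event and update the ARP target only when the answer changes.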
No idea about shaping through many pc
Hello all. I have been trying for more than 2 months to resolve a problem with my shaping. Maybe you can help me?

Scheme:

          +--- Shaping PC 1  ---+
 Cisco ---+--- Shaping PC N  ---+--- CISCO
          +--- Shaping PC 20 ---+

Network: over 10k users. Common bandwidth to the Internet is more than 1 Gbit/s. All computers run BGP with multipath turned on. The Cisco can't do per-packet load sharing (which would resolve all my problems =((( ), only by DST IP, SRC IP, or +Level4.

OK. Each user must get a speed of 1 Mbit/s. Let's look at the variants:

1. Create per-user rules of (1 Mbit/s / N computers). If the user uses N connections, all is great, but if they use 1 connection, their speed = 1 Mbit/s / N, which does not look good. All would be great if the Cisco could do PER-PACKET load sharing =(
2. Create per-user rules of 1 Mbit/s. If the user uses 1 connection, all is great, but if they use N connections, their speed is much more than the required limit =(

Why do I use 20 PCs? Because 1 PC normally forwards 100-150 Mbit/s... at which point it has 100% CPU usage on software interrupts...

Any idea how to resolve this problem?

In my dreams (feature request to netdev ;) ): one PC titled MASTER TC. All 20 PCs synchronize statistics with the MASTER and share common rules and statistics. Then I would use variant 2 and be happy... but that's not realistic? =(

Maybe there are other variants? Thanks for the help!

Slavon.

P.S. Sorry for my English =(
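A little arithmetic makes the trade-off between the two variants concrete. The sketch below is a hypothetical model, not code from any shaper. It assumes the Cisco's per-flow hashing spreads a user's connections over pcs_hit distinct PCs, and that each PC enforces its per-user rule on its own with no shared state.

```c
#include <assert.h>

/*
 * Hypothetical model of the two per-user shaping variants.
 *   limit_kbps: per-user target (1 Mbit/s = 1000 kbit/s)
 *   n_pcs:      number of shaping PCs behind the Cisco
 *   pcs_hit:    distinct PCs the user's connections were hashed to
 * Each PC enforces its rule independently, so the user's ceiling is the
 * per-PC rule multiplied by pcs_hit.
 */

/* Variant 1: every PC limits the user to limit / n_pcs. */
unsigned long variant1_ceiling(unsigned long limit_kbps, unsigned n_pcs,
			       unsigned pcs_hit)
{
	assert(n_pcs > 0 && pcs_hit <= n_pcs);
	return limit_kbps / n_pcs * pcs_hit;
}

/* Variant 2: every PC limits the user to the full limit. */
unsigned long variant2_ceiling(unsigned long limit_kbps, unsigned n_pcs,
			       unsigned pcs_hit)
{
	assert(n_pcs > 0 && pcs_hit <= n_pcs);
	return limit_kbps * pcs_hit;
}
```

With 20 PCs and a 1000 kbit/s target, variant 1 gives a single-connection user only 50 kbit/s. Variant 2 lets a user whose connections land on all 20 PCs reach 20000 kbit/s. Shared counters (the "MASTER TC" idea) are what would remove the pcs_hit factor.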
Re: [git patches] net driver fixes
Well, it's 2.6.24-rc7 already - any news?

I put this into my net-2.6 tree last night since Jeff asked me to look over critical networking driver stuff for a little while.

Thanks, they are upstream now and it did fix tulip on my PPC - network is stable again.

-- Meelis Roos ([EMAIL PROTECTED])
[PATCH] drivers/net/niu: Support for Marvell PHY
This patch makes necessary changes in the Neptune driver to support the new Marvell PHY. It also adds support for the LED blinking on Neptune cards with Marvell PHY. All registers are using defines in the niu.h header file as is already done for the BCM8704 registers.

diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/niu.c linux-2.6-changed/drivers/net/niu.c
--- linux-2.6/drivers/net/niu.c	2007-12-10 14:14:04.0 +0100
+++ linux-2.6-changed/drivers/net/niu.c	2008-01-09 14:32:59.0 +0100
@@ -801,22 +801,86 @@ static int bcm8704_init_user_dev3(struct
 	return 0;
 }
 
-static int xcvr_init_10g(struct niu *np)
+void mrvl88x2011_act_led(struct niu *np, int val)
+{
+	int err;
+	err = mdio_read(np, np->phy_addr, MRVL88X2011_USER_DEV2_ADDR,
+		MRVL88X2011_LED_8_TO_11_CTL);
+	err &= ~MRVL88X2011_LED(MRVL88X2011_LED_ACT,MRVL88X2011_LED_CTL_MASK);
+	err |= MRVL88X2011_LED(MRVL88X2011_LED_ACT,val);
+
+	(void) mdio_write(np, np->phy_addr, MRVL88X2011_USER_DEV2_ADDR,
+		MRVL88X2011_LED_8_TO_11_CTL, err);
+}
+
+void mrvl88x2011_led_blink_rate(struct niu *np, int rate)
+{
+	int err;
+
+	err = mdio_read(np, np->phy_addr, MRVL88X2011_USER_DEV2_ADDR,
+		MRVL88X2011_LED_BLINK_CTL);
+	if (err >= 0) {
+		err &= ~MRVL88X2011_LED_BLKRATE_MASK;
+		err |= (rate << 4);
+
+		(void) mdio_write(np, np->phy_addr, MRVL88X2011_USER_DEV2_ADDR,
+			MRVL88X2011_LED_BLINK_CTL, err);
+	}
+}
+
+static int xcvr_init_10g_mrvl88x2011(struct niu *np)
+{
+	int err;
+
+	/* Set LED functions */
+	mrvl88x2011_led_blink_rate(np, MRVL88X2011_LED_BLKRATE_134MS);
+	mrvl88x2011_act_led(np, MRVL88X2011_LED_CTL_OFF); /* led activity */
+
+	err = mdio_read(np, np->phy_addr, MRVL88X2011_USER_DEV3_ADDR,
+		MRVL88X2011_GENERAL_CTL);
+	if (err < 0) {
+		return(err);
+	}
+
+	err |= MRVL88X2011_ENA_XFPREFCLK;
+
+	err = mdio_write(np, np->phy_addr, MRVL88X2011_USER_DEV3_ADDR,
+		MRVL88X2011_GENERAL_CTL, err);
+	if (err < 0) {
+		return(err);
+	}
+
+	err = mdio_read(np, np->phy_addr, MRVL88X2011_USER_DEV1_ADDR,
+		MRVL88X2011_PMA_PMD_CTL_1);
+	if (err < 0) {
+		return(err);
+	}
+
+	if (np->link_config.loopback_mode == LOOPBACK_MAC) {
+		err |= MRVL88X2011_LOOPBACK;
+	}
+	else {
+		err &= ~MRVL88X2011_LOOPBACK;
+	}
+
+	err = mdio_write(np, np->phy_addr, MRVL88X2011_USER_DEV1_ADDR,
+		MRVL88X2011_PMA_PMD_CTL_1, err);
+	if (err < 0) {
+		return(err);
+	}
+
+	/* Enable PMD */
+	err = mdio_write(np, np->phy_addr, MRVL88X2011_USER_DEV1_ADDR,
+		MRVL88X2011_10G_PMD_TX_DIS, MRVL88X2011_ENA_PMDTX);
+
+	return (err);
+}
+
+static int xcvr_init_10g_bcm8704(struct niu *np)
 {
 	struct niu_link_config *lp = &np->link_config;
 	u16 analog_stat0, tx_alarm_status;
 	int err;
-	u64 val;
-
-	val = nr64_mac(XMAC_CONFIG);
-	val &= ~XMAC_CONFIG_LED_POLARITY;
-	val |= XMAC_CONFIG_FORCE_LED_ON;
-	nw64_mac(XMAC_CONFIG, val);
-
-	/* XXX shared resource, lock parent XXX */
-	val = nr64(MIF_CONFIG);
-	val |= MIF_CONFIG_INDIRECT_MODE;
-	nw64(MIF_CONFIG, val);
 
 	err = bcm8704_reset(np);
 	if (err)
@@ -896,6 +960,39 @@ static int xcvr_init_10g(struct niu *np)
 	return 0;
 }
 
+static int xcvr_init_10g(struct niu *np)
+{
+	int err;
+	u64 val;
+	int phy_id;
+
+	val = nr64_mac(XMAC_CONFIG);
+	val &= ~XMAC_CONFIG_LED_POLARITY;
+	val |= XMAC_CONFIG_FORCE_LED_ON;
+	nw64_mac(XMAC_CONFIG, val);
+
+	/* XXX shared resource, lock parent XXX */
+	val = nr64(MIF_CONFIG);
+	val |= MIF_CONFIG_INDIRECT_MODE;
+	nw64(MIF_CONFIG, val);
+
+	phy_id = phy_decode(np->parent->port_phy, np->port);
+	phy_id = np->parent->phy_probe_info.phy_id[phy_id][np->port];
+
+	/* handle different phy types */
+	switch((phy_id & NIU_PHY_ID_MASK)) {
+	case NIU_PHY_ID_MRVL88X2011:
+		err = xcvr_init_10g_mrvl88x2011(np);
+		break;
+
+	default: /* bcom 8704 */
+		err = xcvr_init_10g_bcm8704(np);
+		break;
+	}
+
+	return 0;
+}
+
 static int mii_reset(struct niu *np)
 {
 	int limit, err;
@@ -1082,18 +1179,71 @@ static int niu_link_status_common(struct
 	return 0;
 }
 
-static int link_status_10g(struct niu *np, int *link_up_p)
+static int link_status_10g_mrvl(struct niu *np, int *link_up_p)
 {
-	unsigned long flags;
-	int err, link_up;
+	int	err;
+	int	link_up = 0;
+	int	pma_status;
+	int	pcs_status;
 
-	link_up = 0;
-
Re: [PATCH] drivers/net/niu: Support for Marvell PHY
From: Mirko Lindner [EMAIL PROTECTED]
Date: Thu, 10 Jan 2008 10:33:01 +0100

This patch makes necessary changes in the Neptune driver to support the new Marvell PHY. It also adds support for the LED blinking on Neptune cards with Marvell PHY. All registers are using defines in the niu.h header file as is already done for the BCM8704 registers.

Applied.

Please provide a proper Signed-off-by: line next time as documented in linux/Documentation/SubmittingPatches

Also there were a lot of coding style and other errors. I fixed them up because I already put you through the wringer to fix up the original patch. I'll note most of them, but reading linux/CodingStyle would be a great idea:

-static int xcvr_init_10g(struct niu *np)
+void mrvl88x2011_act_led(struct niu *np, int val)
...
+void mrvl88x2011_led_blink_rate(struct niu *np, int rate)
+{

Both of these functions should be marked static, return an int, and check and return all errors indicated by mdio_read() and mdio_write().

+static int xcvr_init_10g_mrvl88x2011(struct niu *np)
+{
...
+	/* Set LED functions */
+	mrvl88x2011_led_blink_rate(np, MRVL88X2011_LED_BLKRATE_134MS);
+	mrvl88x2011_act_led(np, MRVL88X2011_LED_CTL_OFF); /* led activity */

Longer than 80-column lines, no error checking.

+	err = mdio_read(np, np->phy_addr, MRVL88X2011_USER_DEV3_ADDR,
+		MRVL88X2011_GENERAL_CTL);
+	if (err < 0) {
+		return(err);
+	}

Extraneous opening and closing braces wasting precious screen real-estate. return is not a function taking an argument; don't surround simple values with parentheses. These happened a lot, I won't mention the other instances.

+	err = mdio_write(np, np->phy_addr, MRVL88X2011_USER_DEV3_ADDR,
+		MRVL88X2011_GENERAL_CTL, err);

Bad indentation: the argument on the second line of the mdio_write() call should line up with the initial np arg on the previous line. If necessary, use tools or editors that help do this automatically for you.
Several folks (including me) use Emacs's C mode with coding style set to linux for that purpose.

+static int xcvr_init_10g(struct niu *np)
+{
+	int err;
+	u64 val;
+	int phy_id;

Multiple variables of the same type should be on one single line unless it would make the line too long.

+	/* handle different phy types */
+	switch((phy_id & NIU_PHY_ID_MASK)) {

A space is needed between the switch keyword and the opening parenthesis. Only one set of parentheses is sufficient here.

-static int link_status_10g(struct niu *np, int *link_up_p)
+static int link_status_10g_mrvl(struct niu *np, int *link_up_p)
 {
-	unsigned long flags;
-	int err, link_up;
+	int	err;
+	int	link_up = 0;
+	int	pma_status;
+	int	pcs_status;

No tabs please between variable type and names, and again list multiple variables of the same type on one single line in order to save precious screen real-estate.

+	pma_status = ((err & MRVL88X2011_LNK_STATUS_OK) ? 1:0);

Spaces are needed to make this easier to read: ? 1 : 0.

+	if (err == (
+	    PHYXS_XGXS_LANE_STAT_ALINGED | PHYXS_XGXS_LANE_STAT_LANE3 |
+	    PHYXS_XGXS_LANE_STAT_LANE2 | PHYXS_XGXS_LANE_STAT_LANE1 |
+	    PHYXS_XGXS_LANE_STAT_LANE0 | PHYXS_XGXS_LANE_STAT_MAGIC | 0x800)) {

This err == ( line should hold the first line of bit values, again to save lines.

+	mrvl88x2011_act_led(np, link_up ? MRVL88X2011_LED_CTL_PCS_ACT:MRVL88X2011_LED_CTL_OFF);

Line excessively exceeds 80 columns.

 	if (type == PHY_TYPE_PMA_PMD || type == PHY_TYPE_PCS) {
-		if ((id & NIU_PHY_ID_MASK) != NIU_PHY_ID_BCM8704)
+		if (((id & NIU_PHY_ID_MASK) != NIU_PHY_ID_BCM8704) &&
+		((id & NIU_PHY_ID_MASK) != NIU_PHY_ID_MRVL88X2011))

Not indented properly: the second line in the inner if statement should have its initial parenthesis match up with the second opening parenthesis on the previous line.
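The review's first point is that helpers such as mrvl88x2011_act_led() should be static, return an int, and propagate mdio errors. That point can be sketched outside the driver. The register accessors below are hypothetical stand-ins for the driver's mdio_read()/mdio_write(), which need a struct niu and real hardware. Only the MRVL88X2011_LED(n, v) layout, a 4-bit field per LED, follows the patch's niu.h defines.

```c
#include <assert.h>
#include <errno.h>

/* LED n occupies bits 4n..4n+3 of the LED control register. */
#define MRVL88X2011_LED(n, v)		((v) << ((n) * 4))
#define MRVL88X2011_LED_CTL_MASK	0x7
#define MRVL88X2011_LED_ACT		0x1

/* Hypothetical stand-ins: one 16-bit register and a bus-failure switch. */
static int fake_reg = 0xffff;
static int fake_fail;

static int fake_mdio_read(void)
{
	return fake_fail ? -EIO : fake_reg;
}

static int fake_mdio_write(int val)
{
	if (fake_fail)
		return -EIO;
	fake_reg = val & 0xffff;
	return 0;
}

/* Review-style helper: static, returns int, propagates every error. */
static int mrvl_act_led(int val)
{
	int err = fake_mdio_read();

	if (err < 0)
		return err;

	err &= ~MRVL88X2011_LED(MRVL88X2011_LED_ACT, MRVL88X2011_LED_CTL_MASK);
	err |= MRVL88X2011_LED(MRVL88X2011_LED_ACT, val);

	return fake_mdio_write(err);
}
```

Starting from 0xffff, setting LED 1 to 0x5 clears bits 4-6 and writes 0x50, leaving 0xffdf. When the bus fails, the caller now sees -EIO instead of a silently ignored write.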
+/* MRVL88X2011 register control */
+#define MRVL88X2011_ENA_XFPREFCLK	0x0001
+#define MRVL88X2011_ENA_PMDTX	0x
+#define MRVL88X2011_LOOPBACK	0x1
+#define MRVL88X2011_LED_ACT 0x1
+#define MRVL88X2011_LNK_STATUS_OK 0x4
+#define MRVL88X2011_LED_BLKRATE_MASK 0x70
+#define MRVL88X2011_LED_BLKRATE_034MS	0x0
+#define MRVL88X2011_LED_BLKRATE_067MS	0x1
+#define MRVL88X2011_LED_BLKRATE_134MS	0x2
+#define MRVL88X2011_LED_BLKRATE_269MS	0x3
+#define MRVL88X2011_LED_BLKRATE_538MS	0x4
+#define MRVL88X2011_LED_CTL_OFF 0x0
+#define MRVL88X2011_LED_CTL_PCS_ACT 0x5
+#define MRVL88X2011_LED_CTL_MASK 0x7
+#define MRVL88X2011_LED(n,v)	((v)<<((n)*4))
+#define MRVL88X2011_LED_STAT(n,v) ((v)>>((n)*4))

These lines inconsistently use tabs vs. spaces to create the indentation between the macro name and its definition.

Anyways, after cleaning up all of
[PATCH][NEIGH] Fix race between neigh_parms_release and neightbl_fill_parms
The neightbl_fill_parms() is called under the write-locked tbl->lock and accesses the parms->dev. The neigh_parms_release() calls the dev_put(parms->dev) without this lock. This creates a tiny race window in which the parms contains a potentially stale dev pointer.

To fix this race it's enough to move the dev_put() up under the tbl->lock, but note that the parms are held by neighbors and thus can live after neigh_parms_release() is called, so we still can have a parm with a bad dev pointer.

I didn't find where neigh->parms->dev is accessed, but still think that putting the dev is to be done in a place where the parms are really freed. Am I right with that?

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 29b8ee4..cc8a2f1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1316,8 +1316,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 			*p = parms->next;
 			parms->dead = 1;
 			write_unlock_bh(&tbl->lock);
-			if (parms->dev)
-				dev_put(parms->dev);
 			call_rcu(&parms->rcu_head, neigh_rcu_free_parms);
 			return;
 		}
@@ -1328,6 +1326,8 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 
 void neigh_parms_destroy(struct neigh_parms *parms)
 {
+	if (parms->dev)
+		dev_put(parms->dev);
 	kfree(parms);
 }
 
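The ownership rule this patch moves to can be modelled in a few lines of ordinary C. This is a hypothetical, single-threaded sketch: there is no RCU or locking, and the fake_* names are invented for illustration. It only shows that when neighbours still hold the parms after neigh_parms_release(), the device reference survives until the last holder goes away and the destroy step runs.

```c
#include <assert.h>
#include <stdlib.h>

struct fake_dev {
	int refcnt;
};

struct fake_parms {
	struct fake_dev *dev;
	int refcnt;	/* table link + one per neighbour using these parms */
	int dead;
};

static void fake_dev_put(struct fake_dev *dev)
{
	dev->refcnt--;
}

/* Counterpart of neigh_parms_destroy(): the dev reference is dropped here. */
static void fake_parms_destroy(struct fake_parms *parms)
{
	if (parms->dev)
		fake_dev_put(parms->dev);
	free(parms);
}

static void fake_parms_put(struct fake_parms *parms)
{
	if (--parms->refcnt == 0)
		fake_parms_destroy(parms);
}

/* Counterpart of neigh_parms_release(): unlink, drop only the table's ref. */
static void fake_parms_release(struct fake_parms *parms)
{
	parms->dead = 1;
	fake_parms_put(parms);
}
```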
Re: No idea about shaping through many pc
For proper link bandwidth sharing, I guess something like network counters would have to be shared between the PCs (with proper locking). I haven't heard of anything like this.

IMHO a way to do this: split the destination network into multiple parts and do routes on the Cisco. Let's say you have 192.168.0.0/16 and you have 4 balancing PCs, total bandwidth 1 Gbit/s (speed conforming to IEC, 1000 Mbit/s in 1 Gbit/s). Then you do on the Cisco:

192.168.0.0/18 via PC1 (shared speed 250 Mbit/s)
192.168.64.0/18 via PC2 (shared speed 250 Mbit/s)
192.168.128.0/18 via PC3 (shared speed 250 Mbit/s)
192.168.192.0/18 via PC4 (shared speed 250 Mbit/s)

You could probably write some scripts to check whether some PC has too much available bandwidth (5-minute average), and then give more bandwidth to another PC which needs it. For example, the 5-minute average counters show:

PC1 - occupies 100 Mbit/s
PC2 - occupies 50 Mbit/s
PC3 - occupies 150 Mbit/s
PC4 - occupies 230 Mbit/s

Then you change the link speeds:

PC1 max 200
PC2 max 150
PC3 max 250
PC4 max 400 (100 from PC2 and 50 from PC1)

Of course, the PCs must be capable of passing this traffic. And IMHO it is not normal that your PCs are not able to handle more than 200 Mbit/s of traffic. I have a complicated setup with 4 LAN 8139 cards, which is passing 200 Mbit/s of traffic in total. I am sure it can handle up to 300 Mbit/s, but I am already changing it to a PC with PCI-E e1000/Broadcom NetXtreme cards with offloading capabilities, large buffers and proper drivers with NAPI.
I have such hardware now handling, for example, 160 Mbit/s, and the counters are:

12:50:41  CPU  %user  %nice   %sys  %iowait   %irq  %soft  %steal  %idle   intr/s
12:50:42  all   0.00   0.00   0.00     0.00   0.25   1.24    0.00  98.51  4009.90
12:50:43  all   0.00   0.00   0.00     0.00   0.00   1.25    0.00  98.75  4024.75
12:50:44  all   0.00   0.00   0.00     0.00   0.00   1.50    0.00  98.50  4181.82
12:50:45  all   0.25   0.00   0.00     0.00   0.00   1.50    0.00  98.25  4626.73
12:50:46  all   0.00   0.00   0.00     0.00   0.00   1.50    0.00  98.50  4351.52
12:50:47  all   0.25   0.00   0.00     0.00   0.00   1.75    0.00  98.00  4805.88

It is 2.6.23.8 with some mistakes in the configuration; I am going to try 2.6.24-rc7 and some optimizations. Right now the profile looks like:

10957  17.0675  mwait_idle_with_hints
 7454  11.6110  read_hpet
 3883   6.0485  _raw_spin_lock
 1605   2.5001  timer_interrupt
 1363   2.1231  irq_entries_start

So maybe I will have to try changing the timers to TSC, disabling nmi_watchdog, and tuning up the network driver (bnx2). You probably have to check such things too.

On Thu, 10 Jan 2008 12:06:35 +0300, Badalian Vyacheslav wrote:
[...]

-- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L.
Re: [patch net-2.6.25 00/10][NETNS][IPV6] make sysctl per namespace - V3
From: Daniel Lezcano [EMAIL PROTECTED]
Date: Wed, 09 Jan 2008 17:45:33 +0100

The following patchset makes the ipv6 sysctl handle multiple network namespaces. Each instance of a network namespace has its own set of sysctl values, which means the behavior of the ipv6 stack can differ depending on the sysctl values set up in the different network namespaces.

I applied all of this to net-2.6.25 but what a rough half hour it was :-/

Starting at patch #5 there were tons of space-before-tab errors. And as I fixed them up, this made subsequent patches need rediffing, since the contextual lines in patches after #5 needed the whitespace fixed up as well.

I didn't push this back to you because this was already the 3rd round, but please show me some love and check this stuff out before submission. GIT gives you effective ways to verify the whitespace without even applying the patch.

~davem/bin/pcheck:

#!/bin/sh
set -x
git apply --check --whitespace=error-all $1
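David's pcheck script relies on git's own whitespace checks. To see what the "space before tab" complaint means mechanically, here is a small, hypothetical C helper. It scans a patch's added lines for a space immediately followed by a tab in the leading indentation. It illustrates the rule only; it is not git's implementation and no substitute for git apply --check --whitespace=error-all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count added lines ("+" but not the "+++" file header) whose leading
 * indentation contains a space immediately followed by a tab.
 */
int count_space_before_tab(const char *patch)
{
	int count = 0;
	const char *line = patch;

	assert(patch != NULL);
	while (*line != '\0') {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);

		if (len > 0 && line[0] == '+' && strncmp(line, "+++", 3) != 0) {
			for (size_t i = 1; i + 1 < len; i++) {
				if (line[i] != ' ' && line[i] != '\t')
					break;	/* end of leading indentation */
				if (line[i] == ' ' && line[i + 1] == '\t') {
					count++;
					break;
				}
			}
		}
		if (end == NULL)
			break;
		line = end + 1;
	}
	return count;
}
```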
Re: Linux IPv6 DAD not full conform to RFC 4862 ?
Hi,

On Wed, Jan 09, 2008 at 09:26:53PM +0100, Karsten Keil wrote:
Reading the section you reference, we do follow all the MUST requirements, and we log an error. Given that the disable section is a SHOULD, I think we can at least be somewhat more restrictive in our implementation. Perhaps we should just disable the interface iff the failed address is link-local AND there are no other functional addresses assigned to the interface.

I agree here, but it seems that the IPv6 Ready Logo Committee currently thinks that the interface has to be disabled to get the IPv6 Ready Logo in future. I already made that point in a discussion on the TAHI users list. JFYI, here is the answer from the TAHI list:

Hi, Karsten.

Thanks for your comments.

I know that it is SHOULD, but our test tool supports the test specification published by the IPv6 Ready Logo Program (http://www.ipv6ready.org/), and basically the test specification covers all of the MUSTs and SHOULDs.

As you may know, the IPv6 Ready Logo Committee is now also discussing the next major revision of the test specification. RFC 4862 Section 5.4.5 is one of the points under discussion. The public review is over, but if you have strong concerns about it, I recommend commenting to [EMAIL PROTECTED].

Personally, I think that mandating this function is the best way. But vendors' input will be really important for them.

Regards,
Yukiyo Akisada

So it would be good if some of the networking experts complained there.

-- Karsten Keil SuSE Labs ISDN and VOIP development SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)
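The restriction proposed in the quoted text reduces to a small predicate: disable the interface iff the failed address is link-local AND no other functional addresses remain. The sketch below is hypothetical. struct dad_state and should_disable_ipv6() are invented names, and the sketch says nothing about how the kernel actually tracks address state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of an interface after a DAD failure. */
struct dad_state {
	bool failed_is_link_local;	/* the duplicate was the link-local address */
	int other_usable_addrs;		/* addresses that passed DAD */
};

/* Disable IPv6 only if the link-local failed and nothing else works. */
bool should_disable_ipv6(const struct dad_state *s)
{
	assert(s != NULL);
	return s->failed_is_link_local && s->other_usable_addrs == 0;
}
```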
[PATCH 2.6.25] [ATM] Oops reading net/atm/arp
cat /proc/net/atm/arp causes a NULL pointer dereference in get_proc_net+0xc/0x3a. This happens because get_proc_net believes that the parent proc dir entry contains a struct net. Fix this assumption for the net/atm case.

The problem was introduced by commit c0097b07abf5f92ab135d024dd41bd2aada1512f from Eric W. Biederman/Daniel Lezcano.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

---

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index cfc4f6c..4823c96 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -96,6 +96,17 @@ static struct proc_dir_entry *proc_net_shadow(struct task_struct *task,
 	return task->nsproxy->net_ns->proc_net;
 }
 
+struct proc_dir_entry *proc_net_mkdir(struct net *net, const char *name,
+		struct proc_dir_entry *parent)
+{
+	struct proc_dir_entry *pde;
+	pde = proc_mkdir_mode(name, S_IRUGO | S_IXUGO, parent);
+	if (pde != NULL)
+		pde->data = net;
+	return pde;
+}
+EXPORT_SYMBOL_GPL(proc_net_mkdir);
+
 static __net_init int proc_net_ns_init(struct net *net)
 {
 	struct proc_dir_entry *root, *netd, *net_statd;
@@ -107,18 +118,16 @@ static __net_init int proc_net_ns_init(struct net *net)
 		goto out;
 
 	err = -EEXIST;
-	netd = proc_mkdir("net", root);
+	netd = proc_net_mkdir(net, "net", root);
 	if (!netd)
 		goto free_root;
 
 	err = -EEXIST;
-	net_statd = proc_mkdir("stat", netd);
+	net_statd = proc_net_mkdir(net, "stat", netd);
 	if (!net_statd)
 		goto free_net;
 
 	root->data = net;
-	netd->data = net;
-	net_statd->data = net;
 
 	net->proc_net_root = root;
 	net->proc_net = netd;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index a531682..8f92546 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -201,6 +201,8 @@ static inline struct proc_dir_entry *create_proc_info_entry(const char *name,
 extern struct proc_dir_entry *proc_net_fops_create(struct net *net,
 	const char *name, mode_t mode, const struct file_operations *fops);
 extern void proc_net_remove(struct net *net, const char *name);
+extern struct proc_dir_entry *proc_net_mkdir(struct net *net, const char *name,
+	struct proc_dir_entry *parent);
 
 #else
 
diff --git a/net/atm/proc.c b/net/atm/proc.c
index 5d9d5ff..565e75e 100644
--- a/net/atm/proc.c
+++ b/net/atm/proc.c
@@ -476,7 +476,7 @@ static void atm_proc_dirs_remove(void)
 		if (e->dirent)
 			remove_proc_entry(e->name, atm_proc_root);
 	}
-	remove_proc_entry("atm", init_net.proc_net);
+	proc_net_remove(&init_net, "atm");
 }
 
 int __init atm_proc_init(void)
@@ -484,7 +484,7 @@ int __init atm_proc_init(void)
 	static struct atm_proc_entry *e;
 	int ret;
 
-	atm_proc_root = proc_mkdir("atm", init_net.proc_net);
+	atm_proc_root = proc_net_mkdir(&init_net, "atm", init_net.proc_net);
 	if (!atm_proc_root)
 		goto err_out;
 	for (e = atm_proc_ents; e->name; e++) {
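A user-space model shows why the missing data pointer oopses. The structures and functions below are hypothetical stand-ins, not the procfs API. The only behaviour taken from the patch description is that the per-namespace lookup trusts the parent directory's data field. So a file under a directory created without proc_net_mkdir() gets NULL back, which the kernel then dereferences.

```c
#include <assert.h>
#include <stddef.h>

struct fake_net {
	int id;
};

struct fake_pde {
	const char *name;
	void *data;
	struct fake_pde *parent;
};

/* Like a plain proc_mkdir(): data stays NULL. */
static void fake_mkdir(struct fake_pde *pde, const char *name,
		       struct fake_pde *parent)
{
	pde->name = name;
	pde->data = NULL;
	pde->parent = parent;
}

/* Like proc_net_mkdir(): create the directory and tag it with its net. */
static void fake_net_mkdir(struct fake_pde *pde, struct fake_net *net,
			   const char *name, struct fake_pde *parent)
{
	fake_mkdir(pde, name, parent);
	pde->data = net;
}

/* The lookup the oops came from: trust the parent's data pointer. */
static struct fake_net *fake_get_proc_net(const struct fake_pde *file)
{
	return file->parent ? file->parent->data : NULL;
}
```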
[PATCH 2.6.25] [NEIGH] Make /proc/net/arp opening consistent with seq_net_open semantics
seq_open_net requires the first field of the seq->private data to be a struct seq_net_private. In reality this is a single pointer to a struct net for now. The patch makes the code consistent.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

---

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a9dda29..09f9fc6 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -223,7 +223,7 @@ extern void __neigh_for_each_release(struct neigh_table *tbl, int (*cb)(struct n
 extern void pneigh_for_each(struct neigh_table *tbl, void (*cb)(struct pneigh_entry *));
 
 struct neigh_seq_state {
-	struct net *net;
+	struct seq_net_private p;
 	struct neigh_table *tbl;
 	void *(*neigh_sub_iter)(struct neigh_seq_state *state,
 				struct neighbour *n, loff_t *pos);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8024933..19c0dd1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2142,7 +2142,7 @@ EXPORT_SYMBOL(__neigh_for_each_release);
 static struct neighbour *neigh_get_first(struct seq_file *seq)
 {
 	struct neigh_seq_state *state = seq->private;
-	struct net *net = state->net;
+	struct net *net = state->p.net;
 	struct neigh_table *tbl = state->tbl;
 	struct neighbour *n = NULL;
 	int bucket = state->bucket;
@@ -2183,7 +2183,7 @@ static struct neighbour *neigh_get_next(struct seq_file *seq,
 					loff_t *pos)
 {
 	struct neigh_seq_state *state = seq->private;
-	struct net *net = state->net;
+	struct net *net = state->p.net;
 	struct neigh_table *tbl = state->tbl;
 
 	if (state->neigh_sub_iter) {
@@ -2243,7 +2243,7 @@ static struct neighbour *neigh_get_idx(struct seq_file *seq, loff_t *pos)
 static struct pneigh_entry *pneigh_get_first(struct seq_file *seq)
 {
 	struct neigh_seq_state *state = seq->private;
-	struct net * net = state->net;
+	struct net * net = state->p.net;
 	struct neigh_table *tbl = state->tbl;
 	struct pneigh_entry *pn = NULL;
 	int bucket = state->bucket;
@@ -2266,7 +2266,7 @@ static struct pneigh_entry *pneigh_get_next(struct seq_file *seq,
 					    loff_t *pos)
 {
 	struct neigh_seq_state *state = seq->private;
-	struct net * net = state->net;
+	struct net * net = state->p.net;
 	struct neigh_table *tbl = state->tbl;
 
 	pn = pn->next;
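The contract seq_open_net() relies on is a C layout rule. Generic code casts seq->private to the common header type, which is only valid if that header is the first member. The hypothetical user-space mirror below uses invented fake_* names and checks the rule at compile time with C11 _Static_assert.

```c
#include <assert.h>
#include <stddef.h>

struct fake_net {
	int id;
};

/* Common header that generic code expects at the start of seq->private. */
struct fake_seq_net_private {
	struct fake_net *net;
};

struct fake_neigh_seq_state {
	struct fake_seq_net_private p;	/* must stay the first member */
	void *tbl;
	int bucket;
};

_Static_assert(offsetof(struct fake_neigh_seq_state, p) == 0,
	       "seq_net_private must be the first member");

/* Generic helper: reads the net through the common header only. */
static struct fake_net *fake_seq_file_net(void *private)
{
	return ((struct fake_seq_net_private *)private)->net;
}
```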
Re: Linux IPv6 DAD not full conform to RFC 4862 ?
On Wed, Jan 09, 2008 at 03:32:12PM -0800, David Miller wrote:
From: Karsten Keil [EMAIL PROTECTED]
Date: Wed, 9 Jan 2008 16:36:56 +0100

If the address is a link-local address formed from an interface identifier based on the hardware address, which is supposed to be uniquely assigned (e.g., EUI-64 for an Ethernet interface), IP operation on the interface SHOULD be disabled. By disabling IP operation, the node will then:

- not send any IP packets from the interface,
- silently drop any IP packets received on the interface, and
- not forward any IP packets to the interface (when acting as a router or processing a packet with a Routing header).

I question any RFC mandate that shuts down IP communication on a node because of packets received from remote systems. If the TAHI test can trigger this, so can a compromised system on your network, and won't that be fun? :-)

I agree, but on the other hand, an interface with a real duplicate HW address sending packets on the network can also cause very serious problems, and may not be as easy to detect as a system whose interface never comes up because of this. So maybe it makes sense to implement it as an option, not as the default.

And the DoS scenario is already here, even without disabling IP completely, since you can deny any IPv6 address assignment with faked DAD packets.

-- Karsten Keil SuSE Labs SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)
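The RFC text only applies to a link-local address whose interface identifier was derived from the hardware address. A hypothetical way to recognise that case, and to gate the shutdown behind an administrator's opt-in as Karsten suggests, is sketched below. The ff:fe test in bytes 11-12 is the usual marker of a MAC-derived modified EUI-64 identifier. It is only a heuristic, since another identifier can contain the same bytes, and neither function exists in the kernel.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* fe80::/10 with an interface identifier of the form xx:xx:xx:ff:fe:xx:xx:xx. */
bool is_hw_derived_link_local(const struct in6_addr *a)
{
	const unsigned char *b;

	assert(a != NULL);
	b = a->s6_addr;
	return b[0] == 0xfe && (b[1] & 0xc0) == 0x80 &&
	       b[11] == 0xff && b[12] == 0xfe;
}

/* Only shut IP down when the administrator opted in. */
bool disable_ip_on_dad_failure(const struct in6_addr *addr, bool opt_in)
{
	return opt_in && is_hw_derived_link_local(addr);
}
```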
Re: FW: ccid2/ccid3 oopses
| So maybe the cause triggering this oops is somewhere else.
|
| yes, probably. sorry - I didn't tell (or maybe I didn't know) when writing my first mail to the module authors, and forgot to add that before forwarding here.
|
| for me, the problem does not happen with the suse kernel of the day (2.6.24-rc6-git7-20080102160500-default, .config attached) but it happens with vanilla 2.6.24-rc6 (mostly allmodconfig, also attached)
|
There are 256 differences between the two .config files. I think there are other people on the list who will be able to give more information regarding the .config files. The differences that struck me in the one which doesn't work are that

-- CONFIG_DEBUG_KERNEL and
-- CONFIG_DEBUG_BUGVERBOSE

were not set. Both are very useful for bug-hunting; the latter is much better for decoding oopses.

Can't say anything about the Suse kernel. We use the plain kernel from www.kernel.org, specifically the netdev tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

If you can't get further here, try with a kernel.org kernel or check Suse forums.

1. The tests yesterday were done on the DCCP test tree based on the above netdev-2.6 2.6.24-rc7 tree, from git://eden-feed.erg.abdn.ac.uk/dccp_exp (dccp subtree). Tested your for-loop 60 seconds each for CCID3/4 -- no oops.
2. Also repeated the tests on an unmodified 2.6.24-rc7 tree from netdev-2.6 (today), 120 seconds for-loop each -- no oops.

As said, if the above does not help, try a www.kernel.org kernel (or one of the above trees) first.

| the easiest way to reproduce is:
|
| while true; do modprobe dccp_ccid2; modprobe -r dccp_ccid2; done
| (or the same loop with dccp_ccid3)
|
| after a short time, the kernel oopses (messages below)
Re: [PATCH netns-2.6.25 0/19] routing virtualization v2
From: Denis V. Lunev [EMAIL PROTECTED]
Date: Wed, 09 Jan 2008 21:03:03 +0300

This set adds namespace support for manipulating routing tables and rules in the different namespaces. So one could create a namespace and set up IPv4 routing there however one wants.

All 19 patches applied, thanks Denis.
Re: [PATCH][NEIGH] Fix race between neigh_parms_release and neightbl_fill_parms
From: Pavel Emelyanov [EMAIL PROTECTED] Date: Thu, 10 Jan 2008 13:56:53 +0300 The neightbl_fill_parms() is called under the write-locked tbl->lock and accesses the parms->dev. The neigh_parms_release() calls dev_put(parms->dev) without this lock. This creates a tiny race window in which the parms contain a potentially stale dev pointer. To fix this race it's enough to move the dev_put() up under the tbl->lock, but note that the parms are held by neighbors and thus can live after neigh_parms_release() is called, so we can still have a parm with a bad dev pointer. I didn't find where neigh->parms->dev is accessed, but I still think that putting the dev should be done in the place where the parms are really freed. Am I right with that? Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] It is accessed in lookup_neigh_parms(), neightbl_fill_parms(), and neightbl_fill_info() (hmmm, that BUG_ON(tbl->parms.dev) is cute). Your fix looks correct, patch applied, thanks!
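The approach Pavel describes, dropping the device reference only where the parms are really freed, is a standard refcounting pattern. Below is a minimal userspace sketch of it, assuming hypothetical fake_dev/fake_parms types, plain integer refcounts and no locking; it is not the kernel's neighbour code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct net_device / struct neigh_parms;
 * the kernel uses dev_hold()/dev_put() and its own refcounting. */
struct fake_dev {
	int refcnt;
};

struct fake_parms {
	int refcnt;		/* one for the table, one per neighbour */
	struct fake_dev *dev;	/* device reference owned by the parms */
};

static void fake_dev_put(struct fake_dev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/* Last reference gone: only now drop the device reference, so any
 * remaining holder of the parms never sees a stale dev pointer. */
static void fake_parms_put(struct fake_parms *p)
{
	assert(p->refcnt > 0);
	if (--p->refcnt == 0) {
		fake_dev_put(p->dev);
		free(p);
	}
}

/* Unlink from the table (done under the table lock in the kernel) and
 * drop the table's reference; p->dev is deliberately left alone. */
static void fake_parms_release(struct fake_parms *p)
{
	fake_parms_put(p);
}
```

After fake_parms_release(), a neighbour that still holds the parms sees a valid dev pointer; the device reference goes away only with the final fake_parms_put().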
Re: [PATCH 2.6.25] [ATM] Oops reading net/atm/arp
From: Denis V. Lunev [EMAIL PROTECTED] Date: Thu, 10 Jan 2008 14:28:53 +0300 cat /proc/net/atm/arp causes a NULL pointer dereference in get_proc_net+0xc/0x3a. This happens because proc_get_net believes that the parent proc dir entry contains a struct net. Fix this assumption for the net/atm case. The problem was introduced by commit c0097b07abf5f92ab135d024dd41bd2aada1512f from Eric W. Biederman/Daniel Lezcano. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] Applied, thanks.
Re: [PATCH 2.6.25] [ATM] Simplify /proc/net/atm/arp opening
From: Denis V. Lunev [EMAIL PROTECTED] Date: Thu, 10 Jan 2008 14:30:44 +0300 The initialization of the iterator state->ns.neigh_sub_iter is moved from arp_seq_open to clip_seq_start for convenience. This should not be a problem, as the iterator is used only after the seq_start callback. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] Applied, thanks.
Re: [PATCH 2.6.25] [NEIGH] Make /proc/net/arp opening consistent with seq_net_open semantics
From: Denis V. Lunev [EMAIL PROTECTED] Date: Thu, 10 Jan 2008 14:32:19 +0300 seq_open_net requires the first field of the seq->private data to be a struct seq_net_private. In reality this is a single pointer to a struct net for now. The patch makes the code consistent. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] Applied, thanks for correcting this.
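The "first field must be struct seq_net_private" rule works because C guarantees that a pointer to a struct, suitably converted, points to its first member. A minimal sketch of the layout, assuming hypothetical userspace copies of struct net and struct seq_net_private and a made-up arp_iter_state; the generic helper knows only the common prefix.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real struct net and
 * struct seq_net_private live in the kernel headers. */
struct net {
	int id;
};

struct seq_net_private {
	struct net *net;	/* the common part generic code relies on */
};

/* Per-file iterator state: the common part MUST come first. */
struct arp_iter_state {
	struct seq_net_private p;	/* first member: castable */
	int bucket;			/* file-specific iterator data */
};

/* Generic helper that only knows the common prefix, in the spirit of
 * a seq_file_net()-style accessor. */
static struct net *private_to_net(void *private_data)
{
	return ((struct seq_net_private *)private_data)->net;
}

/* "open": allocate the full private state, fill in the common part. */
static void *open_iter(struct net *net)
{
	struct arp_iter_state *st;

	assert(net != NULL);
	st = calloc(1, sizeof(*st));
	if (st)
		st->p.net = net;
	return st;
}
```

If another field were placed before p, private_to_net() would read the wrong bytes, which is the kind of mismatch the patch removes.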
Re: [patch net-2.6.25 00/10][NETNS][IPV6] make sysctl per namespace - V3
David Miller wrote: From: Daniel Lezcano [EMAIL PROTECTED] Date: Wed, 09 Jan 2008 17:45:33 +0100 The following patchset makes the ipv6 sysctls handle multiple network namespaces. Each network namespace instance has its own set of sysctl values, which means the behavior of the ipv6 stack can differ depending on the sysctl values set up in the different network namespaces. I applied all of this to net-2.6.25 but what a rough half hour it was :-/ Starting at patch #5 there were tons of space-before-tab errors. And as I fixed them up, this made subsequent patches need rediffing, since the contextual lines in patches after #5 needed the whitespace fixed up as well. I didn't push this back to you because this was already the 3rd round, but please show me some love and check this stuff out before submission. GIT gives you effective ways to verify the whitespace without even applying the patch.

~davem/bin/pcheck:
#!/bin/sh
set -x
git apply --check --whitespace=error-all $1

Sorry, I will check that in the future :| Many thanks for taking the time to fix that. -- Daniel.
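To show what a "space before tab" error looks like in a patch, here is a simplified sketch of the check for added lines. It is an illustration under stated assumptions, not git's implementation: it looks only at the leading indentation of '+' lines and skips the "+++ " file header. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the leading indentation of `line` contains a space that is
 * immediately followed by a tab (the pattern git's space-before-tab
 * rule reports). Only the initial indent is examined. */
static bool has_space_before_tab(const char *line)
{
	for (const char *p = line; *p == ' ' || *p == '\t'; p++) {
		if (p[0] == ' ' && p[1] == '\t')
			return true;
	}
	return false;
}

/* Apply the check to one line of a unified diff: only added lines
 * matter, and the "+++ b/file" header is not content. */
static bool added_line_has_space_before_tab(const char *patch_line)
{
	if (patch_line[0] != '+' || strncmp(patch_line, "+++ ", 4) == 0)
		return false;
	return has_space_before_tab(patch_line + 1);	/* skip the '+' */
}
```

In practice the pcheck script above is the simpler route; the sketch only makes the rule concrete.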
Re: Linux IPv6 DAD not full conform to RFC 4862 ?
On Wed, Jan 09, 2008 at 04:09:57PM -0500, Vlad Yasevich wrote:
Neil Horman wrote:
On Thu, Jan 10, 2008 at 01:38:57AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
In article [EMAIL PROTECTED] (at Wed, 9 Jan 2008 16:36:56 +0100), Karsten Keil [EMAIL PROTECTED] says:

So I think we should disable the interface now, if DAD fails on a hardware-based LLA.

I don't want to do this, at least not unconditionally.
Options (not exclusive):
- we could have a dad_reaction interface variable and
  >1: disable interface
  =1: disable IPv6
   0: ignore (as we do now)

I like the flexibility of this solution, but given that the only part of the RFC that we're missing at the moment is that we SHOULD disable the interface on DAD failure for a link-local address, I would think this scheme would be good:
  <0 : ignore, and delete the address from the interface (current behavior)
  =0 : disable the interface on DAD failure for a link-local address
  >0 : disable the interface on DAD failure for any address
Regards
Neil

Just a friendly reminder that such a scheme should only be applied to autoconfigured addresses. A manually configured duplicated address should not bring down the whole interface.

I agree, but I think that case would be covered by the default option above (sysctl < 0).
Neil

-vlad
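The three-way scheme Neil proposes can be made concrete as a small decision function. This is a hypothetical sketch of the proposal only: dad_reaction, the enum and the function are invented for illustration, no such kernel interface is implied, and the "manual addresses never disable the interface" rule follows Vlad's suggestion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actions; names are illustrative, not kernel API. */
enum dad_action {
	DAD_DEL_ADDR,		/* current behaviour: drop the address only */
	DAD_DISABLE_IFACE,	/* take the whole interface down */
};

/* Proposed "dad_reaction" semantics:
 *   < 0 : ignore, just delete the address (current behaviour)
 *   = 0 : disable the interface if the address is link-local
 *   > 0 : disable the interface for any address
 * Manually configured addresses never disable the interface. */
static enum dad_action dad_failure_action(int dad_reaction,
					  bool link_local, bool autoconf)
{
	if (!autoconf || dad_reaction < 0)
		return DAD_DEL_ADDR;
	if (dad_reaction == 0)
		return link_local ? DAD_DISABLE_IFACE : DAD_DEL_ADDR;
	return DAD_DISABLE_IFACE;
}
```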
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 9 Jan 2008 11:37:27 +0100 [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache In rt_cache_get_next(), there is no need to guard seq->private with rcu_dereference(), since seq is private to the thread running this function. Reading seq.private once (as guaranteed by rcu_dereference()), or several times if the compiler really is dumb enough, won't change the result. But we miss the real spots where rcu_dereference() is needed, in both rt_cache_get_first() and rt_cache_get_next(). Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Signed-off-by: Herbert Xu [EMAIL PROTECTED] I've applied this to net-2.6, thanks!
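The point of the patch is where an ordered load is needed: on pointers fetched from the shared hash chain, not on thread-private iterator state. The following is a rough userspace analogy and is not how the kernel implements RCU. C11 acquire/release atomics stand in for rcu_dereference()/rcu_assign_pointer() (acquire is stronger than what RCU needs), and there is no grace-period handling. The rt_entry type and helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Analogy only: no grace periods, so readers must not outlive nodes. */
struct rt_entry {
	int dst;
	_Atomic(struct rt_entry *) next;	/* chain link readers follow */
};

/* Writer: fully initialise the node, then publish it with a release
 * store (the role rcu_assign_pointer() plays). */
static void publish_head(_Atomic(struct rt_entry *) *head, struct rt_entry *n)
{
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(head, memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(head, n, memory_order_release);
}

/* Reader: every pointer loaded from the shared chain gets an ordered
 * load (the rcu_dereference() role). Thread-private data, such as a
 * seq_file's iterator state, needs none. */
static struct rt_entry *first_entry(_Atomic(struct rt_entry *) *head)
{
	return atomic_load_explicit(head, memory_order_acquire);
}

static struct rt_entry *next_entry(struct rt_entry *e)
{
	return atomic_load_explicit(&e->next, memory_order_acquire);
}

static int count_entries(_Atomic(struct rt_entry *) *head)
{
	int n = 0;

	for (struct rt_entry *e = first_entry(head); e; e = next_entry(e))
		n++;
	return n;
}
```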
Re: [PATCH v3] AX25: kill user triggable printks
From: maximilian attems [EMAIL PROTECTED] Date: Wed, 9 Jan 2008 11:21:10 +0100 sfuzz can easily trigger any of those. Move the printk message into the corresponding comment: this makes the intention of the code clear and easy to pick up on a scheduled removal. As a bonus, simplify the brace placement. Signed-off-by: maximilian attems [EMAIL PROTECTED] Applied, thanks.
EQL / doubts
Hi All,

I have a few questions about the EQL driver.

*) Why is tx_queue_len set to 5? For example, if we bond 3 lines and each has a tx_queue_len of 1000, will the bonding line's (eql) tx_queue_len be the sum of these three tx_queue_len values, i.e. 3000?

*) Why is list_add used instead of list_add_tail? A queue implementation would require list_add_tail. Why is the slave queue implemented as a stack?

File: linux/drivers/net/eql.c
Function: __eql_insert_slave(slave_queue_t *queue, slave_t *slave)
Code:

/* queue->lock must be held */
static int __eql_insert_slave(slave_queue_t *queue, slave_t *slave)
{
	if (!eql_is_full(queue)) {
		slave_t *duplicate_slave = NULL;

		duplicate_slave = __eql_find_slave_dev(queue, slave->dev);
		if (duplicate_slave != 0)
			eql_kill_one_slave(queue, duplicate_slave);

		list_add(&slave->list, &queue->all_slaves);
		// Why has list_add been used instead of list_add_tail? I expected queue->all_slaves to be a queue.

*) Is it possible to improve load-balancing performance using multiple processors? For example, if a server has two processors and N network interfaces, is it possible to assign one processor to tx and rx handling for N/2 interfaces and the other processor to tx/rx handling for the remaining N/2?

Thanks
Jeba
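The list_add versus list_add_tail question comes down to which end of a circular doubly linked list a node is inserted at. A minimal sketch, assuming a small userspace re-implementation written in the style of the kernel's list helpers (not copied from <linux/list.h>) and a hypothetical slave record:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-implementation of a circular doubly linked list in the
 * style of the kernel's list helpers; for illustration only. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void insert_between(struct list_node *n,
			   struct list_node *prev, struct list_node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Like list_add(): insert right after the head, so a walk from the
 * head sees the newest entry first (stack / LIFO order). */
static void list_push_front(struct list_node *n, struct list_node *head)
{
	insert_between(n, head, head->next);
}

/* Like list_add_tail(): insert just before the head (at the end), so
 * a walk from the head sees entries in insertion order (FIFO). */
static void list_push_back(struct list_node *n, struct list_node *head)
{
	insert_between(n, head->prev, head);
}

/* Hypothetical slave record with an embedded list node. */
struct slave {
	int id;
	struct list_node list;
};

#define node_to_slave(ptr) \
	((struct slave *)((char *)(ptr) - offsetof(struct slave, list)))

/* Write the ids in traversal order into out[]; return how many. */
static int collect_ids(struct list_node *head, int *out, int max)
{
	int n = 0;

	for (struct list_node *p = head->next; p != head && n < max; p = p->next)
		out[n++] = node_to_slave(p)->id;
	return n;
}
```

Whether the order matters depends on how the list is consumed. A consumer that always scans the whole list, for example to pick the least loaded entry, does not care which end new entries go to.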
Re: [PATCH 3/4] [XFRM]: Kill some bloat
On Tue, 8 Jan 2008, Ilpo Järvinen wrote:
On Mon, 7 Jan 2008, David Miller wrote:
From: Andi Kleen [EMAIL PROTECTED]
Date: Tue, 8 Jan 2008 06:00:07 +0100
On Mon, Jan 07, 2008 at 07:37:00PM -0800, David Miller wrote:

The vast majority of them are one, two, and three liners.

% awk ' { line++ } ; /^{/ { total++; start = line } ; /^}/ { len=line-start-3; if (len > 4) l++; if (len >= 10) k++; } ; END { print total, l, l/total, k, k/total }' include/net/tcp.h
68 28 0.411765 20 0.294118

41% are over 4 lines, 29% are >= 10 lines.

Take out the comments and whitespace lines, your script is too simplistic.

In addition, it triggered spuriously on every struct/enum closing brace :-) and was using the last known function-opening brace in there, so no wonder the numbers were that high... Counting with the corrected lines (len=line-start-1) and the spurious matches removed:

74 19 0.256757 7 0.0945946

Here are (finally) the measured bytes (a couple of the functions are missing because I had a couple of bugs in the regexps, and the #if trickery at the inline resulted in failed compiles):

12 funcs, 242+, 1697-, diff: -1455 tcp_set_state
13 funcs, 92+, 632-, diff: -540 tcp_is_cwnd_limited
12 funcs, 2836+, 3225-, diff: -389 tcp_current_ssthresh
5 funcs, 261+, 556-, diff: -295 tcp_prequeue
7 funcs, 2777+, 3049-, diff: -272 tcp_clear_retrans_hints_partial
11 funcs, 64+, 275-, diff: -211 tcp_win_from_space
6 funcs, 128+, 320-, diff: -192 tcp_prequeue_init
12 funcs, 45+, 209-, diff: -164 tcp_set_ca_state
7 funcs, 106+, 237-, diff: -131 tcp_fast_path_check
5 funcs, 167+, 291-, diff: -124 tcp_write_queue_purge
6 funcs, 43+, 160-, diff: -117 tcp_push_pending_frames
9 funcs, 55+, 159-, diff: -104 tcp_v4_check
6 funcs, 4+, 97-, diff: -93 tcp_packets_in_flight
7 funcs, 58+, 150-, diff: -92 tcp_fast_path_on
4 funcs, 4+, 91-, diff: -87 tcp_clear_options
6 funcs, 141+, 217-, diff: -76 tcp_openreq_init
8 funcs, 38+, 111-, diff: -73 tcp_unlink_write_queue
7 funcs, 32+, 103-, diff: -71 tcp_checksum_complete
7 funcs, 35+, 101-, diff: -66 __tcp_fast_path_on
5 funcs, 4+, 66-, diff: -62 tcp_receive_window
6 funcs, 67+, 128-, diff: -61 tcp_add_write_queue_tail
7 funcs, 30+, 86-, diff: -56 tcp_ca_event
6 funcs, 73+, 106-, diff: -33 tcp_paws_check
4 funcs, 4+, 36-, diff: -32 tcp_highest_sack_seq
6 funcs, 46+, 78-, diff: -32 tcp_fin_time
3 funcs, 4+, 35-, diff: -31 tcp_clear_all_retrans_hints
7 funcs, 30+, 51-, diff: -21 __tcp_add_write_queue_tail
3 funcs, 4+, 14-, diff: -10 tcp_enable_fack
4 funcs, 4+, 14-, diff: -10 keepalive_time_when
8 funcs, 66+, 73-, diff: -7 tcp_full_space
3 funcs, 4+, 5-, diff: -1 tcp_wnd_end
4 funcs, 97+, 97-, diff: +0 tcp_mib_init
3 funcs, 4+, 3-, diff: +1 tcp_skb_is_last
2 funcs, 4+, 2-, diff: +2 keepalive_intvl_when
2 funcs, 4+, 2-, diff: +2 tcp_is_fack
2 funcs, 4+, 2-, diff: +2 tcp_skb_mss
2 funcs, 4+, 2-, diff: +2 tcp_write_queue_empty
2 funcs, 4+, 2-, diff: +2 tcp_advance_highest_sack
2 funcs, 4+, 2-, diff: +2 tcp_advance_send_head
2 funcs, 4+, 2-, diff: +2 tcp_check_send_head
2 funcs, 4+, 2-, diff: +2 tcp_highest_sack_reset
2 funcs, 4+, 2-, diff: +2 tcp_init_send_head
2 funcs, 4+, 2-, diff: +2 tcp_sack_reset
6 funcs, 47+, 44-, diff: +3 tcp_space
5 funcs, 55+, 50-, diff: +5 tcp_too_many_orphans
3 funcs, 8+, 2-, diff: +6 tcp_minshall_update
3 funcs, 8+, 2-, diff: +6 tcp_update_wl
8 funcs, 25+, 14-, diff: +11 between
3 funcs, 14+, 2-, diff: +12 tcp_put_md5sig_pool
3 funcs, 14+, 2-, diff: +12 tcp_clear_xmit_timers
5 funcs, 30+, 17-, diff: +13 tcp_dec_pcount_approx_int
6 funcs, 33+, 20-, diff: +13 tcp_insert_write_queue_after
3 funcs, 17+, 2-, diff: +15 __tcp_checksum_complete
5 funcs, 17+, 2-, diff: +15 tcp_init_wl
4 funcs, 57+, 41-, diff: +16 tcp_dec_quickack_mode
4 funcs, 40+, 22-, diff: +18 __tcp_add_write_queue_head
5 funcs, 36+, 16-, diff: +20 tcp_highest_sack_combine
4 funcs, 40+, 18-, diff: +22 tcp_dec_pcount_approx
6 funcs, 29+, 5-, diff: +24 tcp_is_sack
4 funcs, 28+, 2-, diff: +26 tcp_is_reno
5 funcs, 50+, 24-, diff: +26 tcp_insert_write_queue_before
4 funcs, 83+, 56-, diff: +27 tcp_check_probe_timer
8 funcs, 69+, 14-, diff: +55 tcp_left_out
11 funcs, 2995+, 2893-, diff: +102 tcp_skb_pcount
30 funcs, 930+, 2-, diff: +928 before

-- i.
[PATCH net-2.6.25 0/6][NETNS]: Make ipv6_devconf (all and default) live in net namespaces
The ipv6_devconf_(all) and ipv6_devconf_dflt are currently global, but should be per-namespace. This set moves them onto the struct net, or, more precisely, onto the struct netns_ipv6, which has already been added. Unfortunately, much of the ipv6 code (e.g. the routing code) cannot yet provide a correct struct net to get the ipv6_devconf from, so that part of the job is to be done after the appropriate parts are virtualized. However, after this set a user can play with the ipv6_devconf inside a namespace without affecting the others. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
[PATCH net-2.6.25 1/6][NETNS]: Clean out the ipv6-related sysctls creation/destruction
The addrconf sysctls and neigh sysctls are always registered and unregistered in pairs, so they can be joined into one (well, two) functions that accept the struct inet6_dev and do all the work.

This also gets rid of unneeded ifdefs inside the code.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |   63 +++---
 1 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6a48bb8..27b35dd 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -102,7 +102,15 @@
 
 #ifdef CONFIG_SYSCTL
 static void addrconf_sysctl_register(struct inet6_dev *idev);
-static void addrconf_sysctl_unregister(struct ipv6_devconf *p);
+static void addrconf_sysctl_unregister(struct inet6_dev *idev);
+#else
+static inline void addrconf_sysctl_register(struct inet6_dev *idev)
+{
+}
+
+static inline void addrconf_sysctl_unregister(struct inet6_dev *idev)
+{
+}
 #endif
 
 #ifdef CONFIG_IPV6_PRIVACY
@@ -392,13 +400,7 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 
 	ipv6_mc_init_dev(ndev);
 	ndev->tstamp = jiffies;
-#ifdef CONFIG_SYSCTL
-	neigh_sysctl_register(dev, ndev->nd_parms, NET_IPV6,
-			      NET_IPV6_NEIGH, "ipv6",
-			      &ndisc_ifinfo_sysctl_change,
-			      NULL);
 	addrconf_sysctl_register(ndev);
-#endif
 
 	/* protected by rtnl_lock */
 	rcu_assign_pointer(dev->ip6_ptr, ndev);
@@ -2391,15 +2393,8 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 	case NETDEV_CHANGENAME:
 		if (idev) {
 			snmp6_unregister_dev(idev);
-#ifdef CONFIG_SYSCTL
-			addrconf_sysctl_unregister(&idev->cnf);
-			neigh_sysctl_unregister(idev->nd_parms);
-			neigh_sysctl_register(dev, idev->nd_parms,
-					      NET_IPV6, NET_IPV6_NEIGH, "ipv6",
-					      &ndisc_ifinfo_sysctl_change,
-					      NULL);
+			addrconf_sysctl_unregister(idev);
 			addrconf_sysctl_register(idev);
-#endif
 			err = snmp6_register_dev(idev);
 			if (err)
 				return notifier_from_errno(err);
@@ -2523,10 +2518,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	/* Shot the device (if unregistered) */
 	if (how == 1) {
-#ifdef CONFIG_SYSCTL
-		addrconf_sysctl_unregister(&idev->cnf);
-		neigh_sysctl_unregister(idev->nd_parms);
-#endif
+		addrconf_sysctl_unregister(idev);
 		neigh_parms_release(&nd_tbl, idev->nd_parms);
 		neigh_ifdown(&nd_tbl, dev);
 		in6_dev_put(idev);
@@ -4106,21 +4098,34 @@ out:
 	return;
 }
 
+static void __addrconf_sysctl_unregister(struct ipv6_devconf *p)
+{
+	struct addrconf_sysctl_table *t;
+
+	if (p->sysctl == NULL)
+		return;
+
+	t = p->sysctl;
+	p->sysctl = NULL;
+	unregister_sysctl_table(t->sysctl_header);
+	kfree(t->dev_name);
+	kfree(t);
+}
+
 static void addrconf_sysctl_register(struct inet6_dev *idev)
 {
+	neigh_sysctl_register(idev->dev, idev->nd_parms, NET_IPV6,
+			      NET_IPV6_NEIGH, "ipv6",
+			      &ndisc_ifinfo_sysctl_change,
+			      NULL);
 	__addrconf_sysctl_register(idev->dev->name, idev->dev->ifindex,
 			idev, &idev->cnf);
 }
 
-static void addrconf_sysctl_unregister(struct ipv6_devconf *p)
+static void addrconf_sysctl_unregister(struct inet6_dev *idev)
 {
-	if (p->sysctl) {
-		struct addrconf_sysctl_table *t = p->sysctl;
-		p->sysctl = NULL;
-		unregister_sysctl_table(t->sysctl_header);
-		kfree(t->dev_name);
-		kfree(t);
-	}
+	__addrconf_sysctl_unregister(&idev->cnf);
+	neigh_sysctl_unregister(idev->nd_parms);
 }
 
 
@@ -4232,8 +4237,8 @@ void addrconf_cleanup(void)
 	unregister_netdevice_notifier(&ipv6_dev_notf);
 
 #ifdef CONFIG_SYSCTL
-	addrconf_sysctl_unregister(&ipv6_devconf_dflt);
-	addrconf_sysctl_unregister(&ipv6_devconf);
+	__addrconf_sysctl_unregister(&ipv6_devconf_dflt);
+	__addrconf_sysctl_unregister(&ipv6_devconf);
 #endif
 
 	rtnl_lock();
--
1.5.3.4
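The patch removes the #ifdefs at the call sites by giving the compiled-out case empty static inline stubs. A minimal sketch of that pattern, assuming a hypothetical HYPOTHETICAL_CONFIG_SYSCTL macro in place of CONFIG_SYSCTL and a flag in place of the real table registration:

```c
#include <assert.h>

/* Stand-in for CONFIG_SYSCTL; comment it out to build the stub path. */
#define HYPOTHETICAL_CONFIG_SYSCTL

struct fake_idev {
	int sysctl_registered;	/* models "tables are registered" */
};

#ifdef HYPOTHETICAL_CONFIG_SYSCTL
static void fake_sysctl_register(struct fake_idev *idev)
{
	idev->sysctl_registered = 1;	/* real code registers tables */
}

static void fake_sysctl_unregister(struct fake_idev *idev)
{
	idev->sysctl_registered = 0;
}
#else
/* Feature compiled out: empty stubs that the compiler optimises away. */
static inline void fake_sysctl_register(struct fake_idev *idev) { (void)idev; }
static inline void fake_sysctl_unregister(struct fake_idev *idev) { (void)idev; }
#endif

/* Call site needs no #ifdef, like the NETDEV_CHANGENAME path above. */
static void fake_rename(struct fake_idev *idev)
{
	fake_sysctl_unregister(idev);
	fake_sysctl_register(idev);
}
```

Keeping the configuration switch in one place means each call site reads the same whichever way the option is set.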
[DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache
Hi David

Here is the DECNET part, shadowing commit 0bcceadceb0907094ba4e40bf9a7cd9b080f13fb ([IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache)

Thank you

[DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache

In dn_rt_cache_get_next(), there is no need to guard seq->private with rcu_dereference(), since seq is private to the thread running this function. Reading seq.private once (as guaranteed by rcu_dereference()), or several times if the compiler really is dumb enough, won't change the result.

But we miss the real spots where rcu_dereference() is needed, in both dn_rt_cache_get_first() and dn_rt_cache_get_next().

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 3e5..0e10ff2 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1665,12 +1665,12 @@ static struct dn_route *dn_rt_cache_get_first(struct seq_file *seq)
 			break;
 		rcu_read_unlock_bh();
 	}
-	return rt;
+	return rcu_dereference(rt);
 }
 
 static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_route *rt)
 {
-	struct dn_rt_cache_iter_state *s = rcu_dereference(seq->private);
+	struct dn_rt_cache_iter_state *s = seq->private;
 
 	rt = rt->u.dst.dn_next;
 	while(!rt) {
@@ -1680,7 +1680,7 @@ static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_rou
 		rcu_read_lock_bh();
 		rt = dn_rt_hash_table[s->bucket].chain;
 	}
-	return rt;
+	return rcu_dereference(rt);
 }
 
 static void *dn_rt_cache_seq_start(struct seq_file *seq, loff_t *pos)
SMP code / network stack
Hi All, If a server has multiple processors and N ethernet cards, is it possible for each processor to handle transmission separately? In other words, can each processor be responsible for tx on a few of the ethernet cards? Example: a server has 4 processors and 8 ethernet cards. Is it possible for each processor to transmit using only 2 ethernet cards, so that at any instant data is being sent out from all 8 ethernet cards? Thanks Jeba
[PATCH net-2.6.25 4/6][NETNS]: Create ipv6 devconf-s for namespaces
This is the core. Declare and register the pernet subsys for addrconf. The init callback will create the devconf-s. The init_net will reuse the existing statically declared confs, so that accessing them from inside the ipv6 code will still work.

The register_pernet_subsys() is moved above the ipv6_add_dev() call for loopback, because this function will need the net->devconf_dflt pointer to be already set.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 include/net/netns/ipv6.h |    2 +
 net/ipv6/addrconf.c      |   82 +++---
 2 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 10733a6..06b4dc0 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -28,5 +28,7 @@ struct netns_sysctl_ipv6 {
 
 struct netns_ipv6 {
 	struct netns_sysctl_ipv6 sysctl;
+	struct ipv6_devconf	*devconf_all;
+	struct ipv6_devconf	*devconf_dflt;
 };
 #endif
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index bde50c6..3ad081e 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4135,6 +4135,70 @@ static void addrconf_sysctl_unregister(struct inet6_dev *idev)
 
 #endif
 
+static int addrconf_init_net(struct net *net)
+{
+	int err;
+	struct ipv6_devconf *all, *dflt;
+
+	err = -ENOMEM;
+	all = &ipv6_devconf;
+	dflt = &ipv6_devconf_dflt;
+
+	if (net != &init_net) {
+		all = kmemdup(all, sizeof(ipv6_devconf), GFP_KERNEL);
+		if (all == NULL)
+			goto err_alloc_all;
+
+		dflt = kmemdup(dflt, sizeof(ipv6_devconf_dflt), GFP_KERNEL);
+		if (dflt == NULL)
+			goto err_alloc_dflt;
+	}
+
+	net->ipv6.devconf_all = all;
+	net->ipv6.devconf_dflt = dflt;
+
+#ifdef CONFIG_SYSCTL
+	err = __addrconf_sysctl_register(net, "all", NET_PROTO_CONF_ALL,
+			NULL, all);
+	if (err < 0)
+		goto err_reg_all;
+
+	err = __addrconf_sysctl_register(net, "default", NET_PROTO_CONF_DEFAULT,
+			NULL, dflt);
+	if (err < 0)
+		goto err_reg_dflt;
+#endif
+	return 0;
+
+#ifdef CONFIG_SYSCTL
+err_reg_dflt:
+	__addrconf_sysctl_unregister(all);
+err_reg_all:
+	kfree(dflt);
+#endif
+err_alloc_dflt:
+	kfree(all);
+err_alloc_all:
+	return err;
+}
+
+static void addrconf_exit_net(struct net *net)
+{
+#ifdef CONFIG_SYSCTL
+	__addrconf_sysctl_unregister(net->ipv6.devconf_dflt);
+	__addrconf_sysctl_unregister(net->ipv6.devconf_all);
+#endif
+	if (net != &init_net) {
+		kfree(net->ipv6.devconf_dflt);
+		kfree(net->ipv6.devconf_all);
+	}
+}
+
+static struct pernet_operations addrconf_ops = {
+	.init = addrconf_init_net,
+	.exit = addrconf_exit_net,
+};
+
 /*
  *      Device notifier
  */
@@ -4167,6 +4231,8 @@ int __init addrconf_init(void)
 		return err;
 	}
 
+	register_pernet_subsys(&addrconf_ops);
+
 	/* The addrconf netdev notifier requires that loopback_dev
 	 * has it's ipv6 private information allocated and setup
 	 * before it can bring up and give link-local addresses
@@ -4190,7 +4256,7 @@ int __init addrconf_init(void)
 		err = -ENOMEM;
 	rtnl_unlock();
 	if (err)
-		return err;
+		goto errlo;
 
 	ip6_null_entry.u.dst.dev = init_net.loopback_dev;
 	ip6_null_entry.rt6i_idev = in6_dev_get(init_net.loopback_dev);
@@ -4218,16 +4284,11 @@ int __init addrconf_init(void)
 
 	ipv6_addr_label_rtnl_register();
 
-#ifdef CONFIG_SYSCTL
-	__addrconf_sysctl_register(&init_net, "all", NET_PROTO_CONF_ALL,
-			NULL, &ipv6_devconf);
-	__addrconf_sysctl_register(&init_net, "default", NET_PROTO_CONF_DEFAULT,
-			NULL, &ipv6_devconf_dflt);
-#endif
-
 	return 0;
 errout:
 	unregister_netdevice_notifier(&ipv6_dev_notf);
+errlo:
+	unregister_pernet_subsys(&addrconf_ops);
 
 	return err;
 }
@@ -4240,10 +4301,7 @@ void addrconf_cleanup(void)
 
 	unregister_netdevice_notifier(&ipv6_dev_notf);
 
-#ifdef CONFIG_SYSCTL
-	__addrconf_sysctl_unregister(&ipv6_devconf_dflt);
-	__addrconf_sysctl_unregister(&ipv6_devconf);
-#endif
+	unregister_pernet_subsys(&addrconf_ops);
 
 	rtnl_lock();
 
--
1.5.3.4
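The init callback follows a common shape: the initial namespace reuses the static defaults, every other namespace gets private copies, and a failure unwinds through goto labels in reverse order. A simplified userspace sketch of that shape, assuming hypothetical devconf/ns types, malloc in place of kmemdup(), and no sysctl registration step. In this sketch only heap copies are ever freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of the per-namespace devconf setup. */
struct devconf {
	int forwarding;
	int mtu6;
};

struct ns {
	int is_init;			/* stands in for net == &init_net */
	struct devconf *all, *dflt;
};

static struct devconf static_all = { 0, 1280 };
static struct devconf static_dflt = { 0, 1280 };

/* Userspace stand-in for kmemdup(). */
static struct devconf *dup_conf(const struct devconf *src)
{
	struct devconf *c = malloc(sizeof(*c));

	if (c)
		memcpy(c, src, sizeof(*c));
	return c;
}

static int ns_conf_init(struct ns *net)
{
	struct devconf *all = &static_all, *dflt = &static_dflt;

	if (!net->is_init) {
		all = dup_conf(&static_all);
		if (!all)
			goto err_alloc_all;
		dflt = dup_conf(&static_dflt);
		if (!dflt)
			goto err_alloc_dflt;
	}
	net->all = all;
	net->dflt = dflt;
	return 0;

	/* Unwind in reverse order; reached only for heap copies. */
err_alloc_dflt:
	free(all);
err_alloc_all:
	return -1;	/* the patch returns -ENOMEM here */
}

static void ns_conf_exit(struct ns *net)
{
	if (!net->is_init) {	/* never free the static defaults */
		free(net->dflt);
		free(net->all);
	}
}
```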
[PATCH net-2.6.25 5/6][NETNS]: Use the per-net ipv6_devconf_dflt
All its users are in net/ipv6/addrconf.c's sysctl handlers. Since they already have the struct net to get it from, the per-net ipv6_devconf_dflt can already be used.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3ad081e..9b96de3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -334,7 +334,7 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 
 	rwlock_init(&ndev->lock);
 	ndev->dev = dev;
-	memcpy(&ndev->cnf, &ipv6_devconf_dflt, sizeof(ndev->cnf));
+	memcpy(&ndev->cnf, dev->nd_net->ipv6.devconf_dflt, sizeof(ndev->cnf));
 	ndev->cnf.mtu6 = dev->mtu;
 	ndev->cnf.sysctl = NULL;
 	ndev->nd_parms = neigh_parms_alloc(dev, &nd_tbl);
@@ -481,11 +481,11 @@ static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 	struct net *net;
 
 	net = (struct net *)table->extra2;
-	if (p == &ipv6_devconf_dflt.forwarding)
+	if (p == &net->ipv6.devconf_dflt->forwarding)
 		return;
 
 	if (p == &ipv6_devconf.forwarding) {
-		ipv6_devconf_dflt.forwarding = ipv6_devconf.forwarding;
+		net->ipv6.devconf_dflt->forwarding = ipv6_devconf.forwarding;
 		addrconf_forward_change(net);
 	} else if ((!*p) ^ (!old))
 		dev_forward_change((struct inet6_dev *)table->extra1);
--
1.5.3.4
[PATCH net-2.6.25 2/6][NETNS]: Make the __addrconf_sysctl_register return an error
This error code will be needed to abort the namespace creation if needed. Probably, this is to be checked when a new device is created (currently it is ignored).

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 27b35dd..18d4334 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4044,7 +4044,7 @@ static struct addrconf_sysctl_table
 	},
 };
 
-static void __addrconf_sysctl_register(char *dev_name, int ctl_name,
+static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
 		struct inet6_dev *idev, struct ipv6_devconf *p)
 {
 	int i;
@@ -4088,14 +4088,14 @@ static void __addrconf_sysctl_register(char *dev_name, int ctl_name,
 		goto free_procname;
 
 	p->sysctl = t;
-	return;
+	return 0;
 
 free_procname:
 	kfree(t->dev_name);
 free:
 	kfree(t);
 out:
-	return;
+	return -ENOBUFS;
 }
 
 static void __addrconf_sysctl_unregister(struct ipv6_devconf *p)
--
1.5.3.4
[PATCH net-2.6.25 3/6][NETNS]: Make the ctl-tables per-namespace
This includes passing the net to __addrconf_sysctl_register and saving it in ctl_table->extra2, to be used in the handlers (those needing it).

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |   24 ++--
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 18d4334..bde50c6 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -456,13 +456,13 @@ static void dev_forward_change(struct inet6_dev *idev)
 }
 
 
-static void addrconf_forward_change(void)
+static void addrconf_forward_change(struct net *net)
 {
 	struct net_device *dev;
 	struct inet6_dev *idev;
 
 	read_lock(&dev_base_lock);
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev(net, dev) {
 		rcu_read_lock();
 		idev = __in6_dev_get(dev);
 		if (idev) {
@@ -478,12 +478,15 @@ static void addrconf_forward_change(void)
 
 static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 {
+	struct net *net;
+
+	net = (struct net *)table->extra2;
 	if (p == &ipv6_devconf_dflt.forwarding)
 		return;
 
 	if (p == &ipv6_devconf.forwarding) {
 		ipv6_devconf_dflt.forwarding = ipv6_devconf.forwarding;
-		addrconf_forward_change();
+		addrconf_forward_change(net);
 	} else if ((!*p) ^ (!old))
 		dev_forward_change((struct inet6_dev *)table->extra1);
 
@@ -4044,8 +4047,8 @@ static struct addrconf_sysctl_table
 	},
 };
 
-static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
-		struct inet6_dev *idev, struct ipv6_devconf *p)
+static int __addrconf_sysctl_register(struct net *net, char *dev_name,
+		int ctl_name, struct inet6_dev *idev, struct ipv6_devconf *p)
 {
 	int i;
 	struct addrconf_sysctl_table *t;
@@ -4068,6 +4071,7 @@ static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
 	for (i=0; t->addrconf_vars[i].data; i++) {
 		t->addrconf_vars[i].data += (char*)p - (char*)&ipv6_devconf;
 		t->addrconf_vars[i].extra1 = idev; /* embedded; no ref */
+		t->addrconf_vars[i].extra2 = net;
 	}
 
 	/*
@@ -4082,7 +4086,7 @@ static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
 	addrconf_ctl_path[ADDRCONF_CTL_PATH_DEV].procname = t->dev_name;
 	addrconf_ctl_path[ADDRCONF_CTL_PATH_DEV].ctl_name = ctl_name;
 
-	t->sysctl_header = register_sysctl_paths(addrconf_ctl_path,
+	t->sysctl_header = register_net_sysctl_table(net, addrconf_ctl_path,
 			t->addrconf_vars);
 	if (t->sysctl_header == NULL)
 		goto free_procname;
@@ -4118,8 +4122,8 @@ static void addrconf_sysctl_register(struct inet6_dev *idev)
 			      NET_IPV6_NEIGH, "ipv6",
 			      &ndisc_ifinfo_sysctl_change,
 			      NULL);
-	__addrconf_sysctl_register(idev->dev->name, idev->dev->ifindex,
-			idev, &idev->cnf);
+	__addrconf_sysctl_register(idev->dev->nd_net, idev->dev->name,
+			idev->dev->ifindex, idev, &idev->cnf);
 }
 
 static void addrconf_sysctl_unregister(struct inet6_dev *idev)
@@ -4215,9 +4219,9 @@ int __init addrconf_init(void)
 	ipv6_addr_label_rtnl_register();
 
 #ifdef CONFIG_SYSCTL
-	__addrconf_sysctl_register("all", NET_PROTO_CONF_ALL,
+	__addrconf_sysctl_register(&init_net, "all", NET_PROTO_CONF_ALL,
 			NULL, &ipv6_devconf);
-	__addrconf_sysctl_register("default", NET_PROTO_CONF_DEFAULT,
+	__addrconf_sysctl_register(&init_net, "default", NET_PROTO_CONF_DEFAULT,
 			NULL, &ipv6_devconf_dflt);
 #endif
 
--
1.5.3.4
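Storing the owning net in each table entry's extra2 lets one shared handler serve every namespace. A stripped-down model of the idea, assuming hypothetical ctl_entry/net types that only mimic the extra1/extra2 opaque pointers of the real struct ctl_table:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: entries carry opaque pointers the handler
 * casts back, so no global "current namespace" is needed. */
struct net {
	int forward_changes;	/* counts calls, for illustration only */
};

struct ctl_entry {
	int *data;
	void *extra1;	/* per-device state (unused here) */
	void *extra2;	/* owning namespace, set at registration */
};

static void forward_change(struct net *net)
{
	net->forward_changes++;
}

/* Shared handler: recovers its namespace from extra2 instead of
 * assuming a single global one. */
static void forwarding_handler(struct ctl_entry *e, int new_value)
{
	struct net *net = (struct net *)e->extra2;

	*e->data = new_value;
	forward_change(net);
}

/* Registration fills in extra2, as the patch does with "net". */
static void register_entry(struct ctl_entry *e, int *data, struct net *net)
{
	e->data = data;
	e->extra1 = NULL;
	e->extra2 = net;
}
```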
[PATCH net-2.6.25 6/6][NETNS]: Use the per-net ipv6_devconf(_all) in sysctl handlers
Actually the net->ipv6.devconf_all can be used in only a few places, but to keep the /proc/sys/net/ipv6/conf/ sysctls working consistently in the namespace we should use the per-net devconf_all in the sysctl forwarding handler.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 9b96de3..cd90f9a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -456,7 +456,7 @@ static void dev_forward_change(struct inet6_dev *idev)
 }
 
 
-static void addrconf_forward_change(struct net *net)
+static void addrconf_forward_change(struct net *net, __s32 newf)
 {
 	struct net_device *dev;
 	struct inet6_dev *idev;
@@ -466,8 +466,8 @@ static void addrconf_forward_change(struct net *net)
 		rcu_read_lock();
 		idev = __in6_dev_get(dev);
 		if (idev) {
-			int changed = (!idev->cnf.forwarding) ^ (!ipv6_devconf.forwarding);
-			idev->cnf.forwarding = ipv6_devconf.forwarding;
+			int changed = (!idev->cnf.forwarding) ^ (!newf);
+			idev->cnf.forwarding = newf;
 			if (changed)
 				dev_forward_change(idev);
 		}
@@ -484,9 +484,10 @@ static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 	if (p == &net->ipv6.devconf_dflt->forwarding)
 		return;
 
-	if (p == &ipv6_devconf.forwarding) {
-		net->ipv6.devconf_dflt->forwarding = ipv6_devconf.forwarding;
-		addrconf_forward_change(net);
+	if (p == &net->ipv6.devconf_all->forwarding) {
+		__s32 newf = net->ipv6.devconf_all->forwarding;
+		net->ipv6.devconf_dflt->forwarding = newf;
+		addrconf_forward_change(net, newf);
 	} else if ((!*p) ^ (!old))
 		dev_forward_change((struct inet6_dev *)table->extra1);
 
--
1.5.3.4
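The (!a) ^ (!b) expression in addrconf_forward_change() is 1 exactly when a and b differ as booleans, so only real on/off transitions trigger the per-device callback. A small sketch of that propagation loop, assuming a hypothetical fake_idev type and a counter standing in for dev_forward_change():

```c
#include <assert.h>

/* Hypothetical stand-in for inet6_dev; change_calls counts how often
 * the per-device callback would have run. */
struct fake_idev {
	int forwarding;
	int change_calls;
};

static void fake_forward_change(struct fake_idev *idev)
{
	idev->change_calls++;
}

/* Set every device to newf, but react only to boolean transitions:
 * 1 -> 2 is not a change, 0 -> 2 is. */
static void propagate_forwarding(struct fake_idev *devs, int n, int newf)
{
	for (int i = 0; i < n; i++) {
		int changed = (!devs[i].forwarding) ^ (!newf);

		devs[i].forwarding = newf;
		if (changed)
			fake_forward_change(&devs[i]);
	}
}
```

Passing newf in explicitly, as the patch does, keeps the loop independent of which namespace's devconf_all the value came from.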
TCP/IP stack / SMP kernel
Hi all,

I am wondering how the TCP/IP stack runs in an SMP kernel on a multiprocessor system. Does the TCP/IP stack run on one processor, or is it shared among the different processors?

Thanks,
Jeba
Re: SMP code / network stack
On Thu, 10 Jan 2008 14:05:46 + Jeba Anandhan [EMAIL PROTECTED] wrote:
> Hi All, If a server has multiple processors and N ethernet cards, is it possible to have each processor handle transmission separately? In other words, can each processor be responsible for TX on a few ethernet cards? Example: the server has 4 processors and 8 ethernet cards. Is it possible for each processor to transmit using only 2 ethernet cards, so that at any instant data is being sent out from all 8 ethernet cards?

Hi Jeba

Modern ethernet cards have a big TX queue, so even one CPU is enough to keep several cards busy in parallel.

You can check /proc/interrupts and change /proc/irq/*/smp_affinity to direct IRQs to particular CPUs, but transmit is usually triggered by processes that might run on different CPUs.

If all ethernet cards are on the same IRQ, then you might have a problem...

Example on a dual processor:

# cat /proc/interrupts
           CPU0       CPU1
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 1830022231        847   IO-APIC-level  ehci_hcd, eth0
121:  163095662  166443627   IO-APIC-level  libata
NMI:          0          0
LOC:   85887285   85887193
ERR:          0
MIS:          0

You can see eth0 is on IRQ 97.

Then:

# cat /proc/irq/97/smp_affinity
0001
# echo 2 > /proc/irq/97/smp_affinity
# grep 97 /proc/interrupts
 97: 1830035216       2259   IO-APIC-level  ehci_hcd, eth0
# sleep 10
# grep 97 /proc/interrupts
 97: 1830035216       5482   IO-APIC-level  ehci_hcd, eth0

You can see that only CPU1 is now handling IRQ 97 (but CPU0 may still give eth0 some transmit work).

You might want to check /proc/net/softnet_stat too.

If your server is doing something very special (network traffic only, no disk access or number crunching), you might need to bind application processes to CPUs, not only network IRQs:
process A, using NICs eth0 and eth1, bound to CPU 0 (process and IRQs)
process B, using NICs eth2 and eth3, bound to CPU 1
process C, using NICs eth4 and eth5, bound to CPU 2
process D, using NICs eth6 and eth7, bound to CPU 3

Also, take a look at the ethtool -c ethX command.
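The layout above packs two NICs per CPU, and each NIC's IRQ then gets a hex bitmap in /proc/irq/<irq>/smp_affinity where bit N selects CPU N (so "echo 2" pins IRQ 97 to CPU1). A minimal sketch of computing those values; the helper names are hypothetical, and the actual IRQ numbers of eth0..eth7 must still be read from /proc/interrupts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the CPU that should service NIC number 'nic'
 * when NICs are packed 'nics_per_cpu' to a CPU (eth0,eth1 -> CPU0, ...). */
static int nic_to_cpu(int nic, int nics_per_cpu)
{
	assert(nic >= 0 && nics_per_cpu > 0);
	return nic / nics_per_cpu;
}

/* Bit N of the smp_affinity bitmap selects CPU N, so CPU1 -> 0x2. */
static unsigned long cpu_affinity_mask(int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(unsigned long)));
	return 1UL << cpu;
}

/* Format the mask as the hex string one would echo into
 * /proc/irq/<irq>/smp_affinity. Returns the snprintf() result. */
static int format_affinity(char *buf, size_t len, unsigned long mask)
{
	return snprintf(buf, len, "%lx", mask);
}
```

For the 4-CPU, 8-NIC example, eth5 maps to CPU 2 and mask "4"; the application process using that NIC would be bound to the same CPU (for instance with taskset or sched_setaffinity) so that process and IRQ share a cache.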
ipip tunnel code (IPV4)
Hello,

I am trying to learn the IPv4 ipip tunnel code (net/ipv4/ipip.c), and I have two small questions about the meaning of some variable names:

ipip_fb_tunnel_init - what does fb stand for?
tunnels_wc - what does wc stand for?

Regards,
Andy
Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
On Thu, Jan 10, 2008 at 11:58:09AM +1100, Herbert Xu wrote:
> On Wed, Jan 09, 2008 at 03:19:10PM -0800, Jay Vosburgh wrote:
>>> No, that's not the point. The point is to move the majority of the code into process context so that you can take the RTNL. Once you have taken the RTNL you can disable BH all you want and I don't care one bit.
>>
>> I'm not sure how we could move more code into a process context; much of the bonding driver is at the mercy of its callers, as in this case. The monitoring stuff and enslave / deslave is all in a process context now (workqueue). The transmit processing functions, for example, can't be assumed to be in any particular context as they're called by dev_queue_xmit.
>
> No, I'm not calling for you to move any more code into process context. I was replying to the comment that changing the read_lock calls in process context to read_lock_bh somehow undoes the benefit of moving softirq code into process context. It does not, since the point of the move is to be able to take the RTNL, which you can still do as long as you do it before you disable BH.

That wasn't the only purpose, Herbert. Making sure that dev_set_mac_address was called from process context was important at the time of the coding as well, since at least the tg3 driver took locks that could not be taken reliably in soft-irq context. Michael Chan fixed this here:

commit 986e0aeb9ae09127b401c3baa66f15b7a31f354c
Author: Michael Chan [EMAIL PROTECTED]
Date:   Sat May 5 12:10:20 2007 -0700

    [TG3]: Remove reset during MAC address changes.

so it wasn't as much of an issue after that, but moving as much of the code as possible to process context was important for that reason as well (hence we stopped trying to avoid bh-locks everywhere).
[PATCH net-2.6.25][NEIGH]: Add a comment describing what a NUD stands for.
When I studied the neighbor code, I puzzled for quite a long time over what NUD could mean. Finally I asked Alexey, and he said it was something like "neighbor unreachability detection". Is it worth adding a comment to help future developers understand what's going on?

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 09f9fc6..bc34144 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -26,6 +26,10 @@
 #include <linux/sysctl.h>
 #include <net/rtnetlink.h>
 
+/*
+ * NUD stands for "neighbor unreachability detection"
+ */
+
 #define NUD_IN_TIMER   (NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
 #define NUD_VALID      (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
 #define NUD_CONNECTED  (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)
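The NUD_* names are single-bit neighbour states, and the composite macros in the hunk are just masks over them. A small sketch of how such a mask test works; the per-state bit values are redefined locally from the kernel's <linux/neighbour.h> as I understand them (verify against your tree), so this builds without kernel headers, and nud_in() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* NUD state bits, mirroring <linux/neighbour.h>; redefined here
 * only so the sketch compiles in user space. */
#define NUD_INCOMPLETE 0x01
#define NUD_REACHABLE  0x02
#define NUD_STALE      0x04
#define NUD_DELAY      0x08
#define NUD_PROBE      0x10
#define NUD_FAILED     0x20
#define NUD_NOARP      0x40
#define NUD_PERMANENT  0x80
#define NUD_NONE       0x00

/* Composite masks as shown in include/net/neighbour.h above. */
#define NUD_IN_TIMER  (NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
#define NUD_VALID     (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
#define NUD_CONNECTED (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)

/* True if the neighbour's current state is one of the states in 'mask'. */
static bool nud_in(unsigned state, unsigned mask)
{
	return (state & mask) != 0;
}
```

For example, a STALE entry still counts as valid (its link-layer address may be used) but not as connected, which is why it gets re-probed before being trusted again.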
Re: No idea about shaping trough many pc
On Thu, Jan 10, 2008 at 12:06:35PM +0300, Badalian Vyacheslav wrote:
> Hello all. I have been trying for more than 2 months to resolve a problem with my shaping. Maybe you can help me?
>
> Scheme: [ASCII diagram, garbled in the archive: a Cisco router spreads traffic across Shaping PC 1 ... Shaping PC N ... Shaping PC 20, which all connect to a second CISCO router]
>
> Network: over 10k users. Common bandwidth to the Internet is more than 1 Gbps. All computers run BGP with multipath turned on. The Cisco can't do per-packet load sharing (that would solve all my problems =((( ), only by DST IP, SRC IP, or +Level4.
>
> OK. Each user must get 1 Mbps. Let's look at the variants:
> 1. Create per-user rules of 1 Mbps / N computers. If the user uses N connections, all is great, but if he uses 1 connection his speed is 1 Mbps / N, which doesn't look good. All would be great if the Cisco could do PER PACKET load sharing =(
> 2. Create per-user rules of 1 Mbps. If the user uses 1 connection, all is great, but if he uses N connections his speed is much higher than the intended limit =(
>
> Why do I use 20 PCs? Because 1 PC normally forwards 100-150 Mbps... when it has 100% CPU usage on software interrupts...

I have managed forwarding of 600Mbps using about 15% CPU load on a 500MHz Geode LX, using 4 100Mbit pcnet32 interfaces and a small tweak to how NAPI is implemented in the driver. Adding traffic shaping and such to the processing would certainly increase the CPU load, but hopefully not by much. The reason I didn't get more than 600Mbps was that the PCI bus was full.

> Any idea how to resolve this problem?
> In my dreams (feature request to netdev ;) ): a PC titled MASTER TC. All 20 PCs synchronize statistics with the MASTER and share common rules and statistics. Then I use variant 2 and will be happy... but that's not realistic? =( Maybe there are other variants?

Well, not sure about synchronizing and all that. I still think that if I can manage a 600Mbps forwarding rate using a slowpoke Geode, then a modern CPU like a Q6600 with a number of PCIe gigabit ports should be able to do quite a lot.
The tweak I did was to add a timer to the driver that I activate whenever I finish emptying the receive queue. When the timer expires, it adds the port back to the NAPI queue, and when the poll is called again it will either process whatever packets arrived during the delay, or actually unmask the IRQ and go back to IRQ mode. The delay I use is 1 jiffy; I run with HZ=1000 and set the queues to 256 packets, since 1ms at 100Mbps can carry at most about 200 packets (64-byte worst case).

Whenever I empty the queue, I simply check how many packets I just processed. If it is greater than 0, I arm the timer to expire on the next jiffy and leave the port masked after removing it from NAPI polling. If it was 0, then I must have been called again after the timer expired and still had no packets to process, in which case I unmask the IRQ and don't arm the timer. I had to change HZ to 1000, since at HZ=250 or HZ=100 I wouldn't be able to handle the worst-case number of packets (the pcnet32 has a maximum of 512 packets in a queue).

With NAPI, the normal behaviour is that whenever you empty the receive queue, you re-enable IRQs. But it doesn't take a very fast CPU to empty the queue every time, and then you pay the overhead of masking the IRQ every time you receive packets and process them, and then the overhead of unmasking it, only to get another IRQ for the next packet within a fraction of a millisecond. Delaying the unmask until the next jiffy can add up to 1ms of latency to packet processing (less on average), but the IRQ load drops dramatically and the overhead of managing the IRQ masking and the IRQ handler goes away. On this system the CPU load dropped from 90% at 500Mbps to 15% at 600Mbps, and the interrupt rate dropped from one IRQ every couple of packets to one IRQ at the start of each burst of packets.
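The two numbers behind this tweak (worst-case frames per jiffy, and the "packets processed?" decision when the ring runs empty) can be sketched as follows. This is a user-space model with hypothetical names, not the pcnet32 driver change itself; like the estimate in the text, it ignores preamble and inter-frame gap.

```c
#include <assert.h>
#include <stdbool.h>

/* Worst-case frames that can arrive during one timer tick at the given
 * link speed and frame size (preamble/IFG ignored, as in the text). */
static unsigned long worst_case_frames_per_tick(unsigned long link_bps,
                                                unsigned frame_bytes,
                                                unsigned hz)
{
	assert(frame_bytes > 0 && hz > 0);
	return link_bps / (8UL * frame_bytes) / hz;
}

/* What the poll routine does once the RX ring has been emptied. */
enum rx_idle_action {
	ARM_TIMER_KEEP_MASKED, /* packets were seen: poll again next jiffy */
	UNMASK_IRQ             /* idle poll after the timer: back to IRQ mode */
};

static enum rx_idle_action on_ring_empty(unsigned processed)
{
	return processed > 0 ? ARM_TIMER_KEEP_MASKED : UNMASK_IRQ;
}
```

At 100Mbps with 64-byte frames this gives 195 frames per tick at HZ=1000, which fits a 256-entry ring, but 781 at HZ=250, more than the pcnet32's 512-entry maximum; that matches the stated reason for running with HZ=1000.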
I believe some gigabit Ethernet ports and most 10 Gigabit ports can delay the IRQ until a certain number of packets have arrived, which is pretty much what I tried to emulate with my tweak, and it sure works amazingly well.

-- 
Len Sorensen
Re: [Bugme-new] [Bug 9719] New: when a system is configured as a bridge, and at the same time configured to have multipath weighted route, with one leg goes thru NAT and another without NAT, the nat
Andrew Morton wrote:
> Distribution: iptables 1.4.0 was used with kernel 2.6.23 and iptables 1.3.8 with 2.6.22.15
> Hardware Environment: 3 interfaces, 2 interfaces bridged to form br0, and another connects to the internet using pppoe.
> Software Environment: bridge, multipath routing
> Problem Description: when a system is configured as a bridge with an IP assigned to the br0 interface, and at the same time it is configured with a multipath weighted default route where one default route is NAT-ed and the other is not, then the NAT-ed interface will occasionally leak packets with private IPs.

That is most likely because the route changes over time (when the cache is flushed) and the NAT mappings for the connection have been set up on a different interface. The proper way to do this is to add routing rules based on fwmark and use CONNMARK to bind a connection to one of the interfaces after the initial multipath routing decision.
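The idea behind the CONNMARK suggestion is that the first packet's multipath decision is saved in the connection, and later packets reuse it instead of whatever a freshly flushed route cache would pick. The toy model below only illustrates that behaviour in C; it is not netfilter code. In a real setup the save/restore would be done with the CONNMARK target's --save-mark/--restore-mark and an ip rule matching fwmark to select a per-uplink routing table; the table, sizes and names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CONNS 64

/* Toy connection table: 'id' stands in for the conntrack tuple,
 * 'mark' for the connmark naming the chosen uplink (1 or 2). */
struct conn {
	uint32_t id;
	uint32_t mark;
};

static struct conn table[MAX_CONNS];
static int nconns;

/* Returns the uplink a packet of connection 'id' should use.
 * 'multipath_choice' is what the route cache would pick right now. */
static uint32_t route_packet(uint32_t id, uint32_t multipath_choice)
{
	for (int i = 0; i < nconns; i++)
		if (table[i].id == id)
			return table[i].mark;          /* "restore-mark": ignore new choice */

	assert(nconns < MAX_CONNS);
	table[nconns].id = id;
	table[nconns].mark = multipath_choice;  /* "save-mark" on first packet */
	nconns++;
	return multipath_choice;
}
```

Without the saved mark, the second packet of connection 42 in a scenario where the cache now prefers uplink 2 would leave via the other interface, which is exactly how un-NATed private addresses can leak.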
Re: virtio_net and SMP guests
On Thursday, 10 January 2008, Christian Borntraeger wrote:
On Thursday, 10 January 2008, Christian Borntraeger wrote:
On Tuesday, 18 December 2007, Rusty Russell wrote:
To me this points to doing interrupt suppression a different way. If we have a ->disable_cb() virtio function, and call it before we call netif_rx_schedule, does that fix it?

The fix looks good and I agree with it. There is one problem that I have been trying to track down for some days: the following BUG_ON triggers:

static void vring_disable_cb(struct virtqueue *_vq)
{
        struct vring_virtqueue *vq = to_vvq(_vq);

        START_USE(vq);
        BUG_ON(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
        vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
        END_USE(vq);
}

Ok, I found it:

static int virtnet_open(struct net_device *dev)
{
        struct virtnet_info *vi = netdev_priv(dev);

        try_fill_recv(vi);

        /* If we didn't even get one input buffer, we're useless. */
        if (vi->num == 0)
                return -ENOMEM;

---> interrupt for new packet
        static void skb_recv_done(struct virtqueue *rvq)
        {
                struct virtnet_info *vi = rvq->vdev->priv;
                /* Suppress further interrupts. */
                rvq->vq_ops->disable_cb(rvq);
                netif_rx_schedule(vi->dev, &vi->napi);
        }
---> poll is not yet possible, no softirq
---> return from interrupt

        napi_enable(&vi->napi);
        vi->rvq->vq_ops->disable_cb(vi->rvq);   ---> BUG: it's already disabled

Btw. this problem also happens on single-processor guests. What about the following patch:

---
 drivers/net/virtio_net.c |    9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: kvm/drivers/net/virtio_net.c
===
--- kvm.orig/drivers/net/virtio_net.c
+++ kvm/drivers/net/virtio_net.c
@@ -179,9 +179,12 @@ static void try_fill_recv(struct virtnet
 static void skb_recv_done(struct virtqueue *rvq)
 {
        struct virtnet_info *vi = rvq->vdev->priv;
-       /* Suppress further interrupts. */
-       rvq->vq_ops->disable_cb(rvq);
-       netif_rx_schedule(vi->dev, &vi->napi);
+       /* Schedule NAPI, Suppress further interrupts if successful. */
+
+       if (netif_rx_schedule_prep(vi->dev, &vi->napi)) {
+               rvq->vq_ops->disable_cb(rvq);
+               __netif_rx_schedule(vi->dev, &vi->napi);
+       }
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
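The patch works because netif_rx_schedule_prep() is a test-and-set: while NAPI is still disabled its SCHED bit is already set, so an early interrupt loses the race and no longer disables callbacks behind virtnet_open()'s back. A simplified user-space model of that ordering, using C11 atomics; the toy_* names are hypothetical and the model ignores the netif_running() check and the real virtqueue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy device state, not virtio/NAPI code. */
struct toy_dev {
	atomic_bool napi_sched; /* like NAPI_STATE_SCHED; also set while NAPI is disabled */
	bool cb_disabled;       /* like VRING_AVAIL_F_NO_INTERRUPT */
};

static void toy_disable_cb(struct toy_dev *d)
{
	assert(!d->cb_disabled); /* the BUG_ON in vring_disable_cb() */
	d->cb_disabled = true;
}

/* Like netif_rx_schedule_prep(): only the caller that sets the bit wins. */
static bool toy_schedule_prep(struct toy_dev *d)
{
	return !atomic_exchange(&d->napi_sched, true);
}

/* Patched skb_recv_done(): only the winner suppresses interrupts.
 * Returns true if a NAPI poll would now be scheduled. */
static bool toy_recv_done(struct toy_dev *d)
{
	if (toy_schedule_prep(d)) {
		toy_disable_cb(d);
		return true;
	}
	return false;
}
```

Replaying the sequence from the report: an interrupt arriving before napi_enable() now returns early, so the later disable_cb() in virtnet_open() finds callbacks still enabled and the BUG_ON no longer fires.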
[VLAN]: nested VLAN: fix lockdep's recursive locking warning
Allow VLANs to nest other VLANs without lockdep warnings (max. 2 levels, i.e. parent + child).

Thanks to Patrick McHardy for pointing out a bug in the first version of this patch.

Reported-by: Benny Amorsen
Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 4d14fded63dcaf9d5dcf78e2a8ea3f5de2c29eb9
tree 2f0792e8240151b1e5437b05130d1f569175f572
parent e2474f60798c97f5c05d29a906045dd1f416ba7f
author Jarek Poplawski [EMAIL PROTECTED] Thu, 10 Jan 2008 16:25:00 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 10 Jan 2008 16:25:00 +0100

 net/8021q/vlan.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 4add9bd..032bf44 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -323,6 +323,7 @@ static const struct header_ops vlan_header_ops = {
 static int vlan_dev_init(struct net_device *dev)
 {
        struct net_device *real_dev = VLAN_DEV_INFO(dev)->real_dev;
+       int subclass = 0;
 
        /* IFF_BROADCAST|IFF_MULTICAST; ??? */
        dev->flags  = real_dev->flags & ~IFF_UP;
@@ -349,7 +350,11 @@ static int vlan_dev_init(struct net_device *dev)
                dev->hard_start_xmit = vlan_dev_hard_start_xmit;
        }
 
-       lockdep_set_class(&dev->_xmit_lock, &vlan_netdev_xmit_lock_key);
+       if (real_dev->priv_flags & IFF_802_1Q_VLAN)
+               subclass = 1;
+
+       lockdep_set_class_and_subclass(&dev->_xmit_lock,
+                               &vlan_netdev_xmit_lock_key, subclass);
 
        return 0;
 }
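The patch gives the child VLAN's _xmit_lock a different lockdep subclass whenever its real device is itself a VLAN, so taking child then parent lock no longer looks recursive. A toy model of that choice, with hypothetical struct and function names standing in for net_device and the IFF_802_1Q_VLAN test, shows why only two levels are covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device: is_vlan stands in for (priv_flags & IFF_802_1Q_VLAN). */
struct toy_netdev {
	bool is_vlan;
	const struct toy_netdev *real_dev; /* NULL for a physical NIC */
};

/* Same rule as vlan_dev_init() after the patch: subclass 1 if the
 * real device is a VLAN, otherwise subclass 0. */
static int vlan_xmit_lock_subclass(const struct toy_netdev *vlan)
{
	assert(vlan->is_vlan && vlan->real_dev != NULL);
	return vlan->real_dev->is_vlan ? 1 : 0;
}
```

A third level (a VLAN on a VLAN on a VLAN) gets subclass 1 just like its parent, so lockdep would again see the same class and subclass taken twice; that is consistent with the "max. 2 levels" in the changelog.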
Re: [PATCH take2] Re: Nested VLAN causes recursive locking error
Jarek Poplawski wrote:
> As a matter of fact, I started to doubt it's a real problem: 2 VLAN headers in a row - does that work?

Yes, apparently some people are using this.

> Anyway, as Patrick pointed out, the previous patch was a bit buggy, and deeper nesting needs a little more (if it can work at all...). So, here is something minimal. Patrick, if you have something else in mind, then of course ignore this patch.

No, this seems fine, thanks. Even better would be a way to get the last lockdep subclass through lockdep somehow, but I couldn't find a clean way to do this. So I've applied your patch and also fixed macvlan.
Re: SMP code / network stack
Hi Eric,

Thanks for the reply. I have one more question. Suppose we have 2 processors and 4 ethernet cards, and only CPU0 does all the work for the cards. If we set the affinity of each ethernet card to a particular CPU, will it be more efficient? Will this be the default behavior?

# cat /proc/interrupts
           CPU0       CPU1
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 1830022231        847   IO-APIC-level  ehci_hcd, eth0
 97: 3830012232        847   IO-APIC-level  ehci_hcd, eth1
 97: 5830052231        847   IO-APIC-level  ehci_hcd, eth2
 97: 6830032213        847   IO-APIC-level  ehci_hcd, eth3
# sleep 10
# cat /proc/interrupts
           CPU0       CPU1
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 2031409801        847   IO-APIC-level  ehci_hcd, eth0
 97: 4813981390        847   IO-APIC-level  ehci_hcd, eth1
 97: 7123982139        847   IO-APIC-level  ehci_hcd, eth2
 97: 8030193010        847   IO-APIC-level  ehci_hcd, eth3

If instead we set the affinity of eth2 and eth3 (to CPU1), the output would be:

# cat /proc/interrupts
           CPU0       CPU1
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 1830022231        847   IO-APIC-level  ehci_hcd, eth0
 97: 3830012232        847   IO-APIC-level  ehci_hcd, eth1
 97: 5830052231        923   IO-APIC-level  ehci_hcd, eth2
 97: 6830032213       1230   IO-APIC-level  ehci_hcd, eth3
# sleep 10
# cat /proc/interrupts
           CPU0       CPU1
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 2300022231        847   IO-APIC-level  ehci_hcd, eth0
 97: 4010212232        847   IO-APIC-level  ehci_hcd, eth1
 97: 5830052231       1847   IO-APIC-level  ehci_hcd, eth2
 97: 6830032213       2337   IO-APIC-level  ehci_hcd, eth3

In this case, will performance improve?
Thanks,
Jeba

On Thu, 2008-01-10 at 15:45 +0100, Eric Dumazet wrote:
[MACVLAN]: Prevent nesting macvlan devices
Don't allow nesting macvlan devices, since it causes lockdep warnings and isn't really useful for anything.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 80a76fbde679793a17482a3dd842386801fca66b
tree 07f67e78ac0ae505a5de81e7e770a1b7d597f120
parent 4d14fded63dcaf9d5dcf78e2a8ea3f5de2c29eb9
author Patrick McHardy [EMAIL PROTECTED] Thu, 10 Jan 2008 16:25:01 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 10 Jan 2008 16:25:01 +0100

 drivers/net/macvlan.c |    7 +++
 1 files, changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 2e4bcd5..e8dc2f4 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -384,6 +384,13 @@ static int macvlan_newlink(struct net_device *dev,
        if (lowerdev == NULL)
                return -ENODEV;
 
+       /* Don't allow macvlans on top of other macvlans - its not really
+        * wrong, but lockdep can't handle it and its not useful for anything
+        * you couldn't do directly on top of the real device.
+        */
+       if (lowerdev->rtnl_link_ops == dev->rtnl_link_ops)
+               return -ENODEV;
+
        if (!tb[IFLA_MTU])
                dev->mtu = lowerdev->mtu;
        else if (dev->mtu > lowerdev->mtu)
e1000 performance issue in 4 simultaneous links
Hello,

I've noticed a performance issue when running netperf against 4 e1000 links connected end-to-end to another machine with 4 e1000 interfaces. I have two 4-port cards in my machine, but the test uses only 2 ports on each card.

When I run netperf on just one interface, I get a transfer rate of 940.95 * 10^6 bits/sec. If I run 4 netperf instances against 4 different interfaces, I get around 720 * 10^6 bits/sec. If I run the same test against 2 interfaces I also get 940 * 10^6 bits/sec, and if I run it against 3 interfaces I get around 850 * 10^6 bits/sec.

I got these results using the upstream netdev-2.6 branch kernel plus David Miller's set of 7 NAPI patches [1]. With kernel 2.6.23.12 the result is a bit worse: the transfer rate was around 600 * 10^6 bits/sec.

[1] http://marc.info/?l=linux-netdev&m=119977075917488&w=2

PS: I am not using a switch between the interfaces (they are connected end-to-end), and the connections are independent.

-- 
Breno Leitao [EMAIL PROTECTED]
Re: e1000 performance issue in 4 simultaneous links
Breno Leitao wrote:
> Hello, I've perceived that there is a performance issue when running netperf against 4 e1000 links connected end-to-end to another machine with 4 e1000 interfaces. I have 2 4-port interfaces on my machine, but the test is just considering 2 port for each interfaces card. When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec of transfer rate. If I run 4 netperf against 4 different interfaces, I get around 720 * 10^6 bits/sec.
> snip

I take it that's the average for individual interfaces, not the aggregate?

RX processing for multi-gigabits per second can be quite expensive. This can be mitigated by interrupt moderation and NAPI polling, jumbo frames (MTU > 1500) and/or Large Receive Offload (LRO). I don't think e1000 hardware does LRO, but the driver could presumably be changed to use Linux's software LRO.

Even with these optimisations, if all RX processing is done on a single CPU this can become a bottleneck. Does the test system have multiple CPUs? Are IRQs for the multiple NICs balanced across multiple CPUs?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
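Much of the RX cost is per frame, so the frame rate at line speed is what jumbo frames change. A quick sketch of that arithmetic, assuming untagged Ethernet with the standard 38 bytes of per-frame wire overhead (14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap) on top of the MTU; the function name is illustrative.

```c
#include <assert.h>

/* Bytes each frame occupies on the wire beyond the IP MTU:
 * 14 Ethernet header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap. */
#define ETH_WIRE_OVERHEAD 38ULL

/* Maximum full-size frames per second a link of 'link_bps' can carry. */
static unsigned long long frames_per_sec(unsigned long long link_bps,
                                         unsigned long long mtu)
{
	assert(mtu > 0);
	return link_bps / ((mtu + ETH_WIRE_OVERHEAD) * 8ULL);
}
```

On one gigabit link this is about 81.3k frames/s at MTU 1500 versus about 13.8k frames/s at MTU 9000, so per-packet RX work drops roughly sixfold with jumbo frames; with four saturated links the receiver handles four times those rates.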
Re: e1000 performance issue in 4 simultaneous links
Ben,

I am facing a performance issue when we bond multiple interfaces into a virtual interface. It could be related to this thread. My questions are:

*) When we use multiple NICs, will the performance of the overall system be the sum of all the individual links' XX bits/sec?
*) What factors improve performance when we have multiple interfaces? [e.g. tuning parameters in /proc]

Breno, I hope this thread will be helpful for the performance issue I have with the bonding driver.

Jeba
questions on NAPI processing latency and dropped network packets
Hi all,

I've got an issue that's popped up with a deployed system running 2.6.10. I'm looking for some help figuring out why incoming network packets aren't being processed fast enough.

After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware (e1000, NAPI is enabled). The error/dropped/fifo counts are going up in ethtool:

rx_packets: 32180834
rx_bytes: 5480756958
rx_errors: 862506
rx_dropped: 771345
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 91161
rx_missed_errors: 91161

This link is receiving roughly 13K packets/sec, and we're dropping roughly 51 packets/sec due to fifo errors.

Increasing the rx descriptor ring size from 256 up to around 3000 or so seems to make the problem stop, but it seems to me that this is just a workaround for the latency in processing the incoming packets.

So, I'm looking for some suggestions on how to fix this or to figure out where the latency is coming from.

Some additional information:

1) Interrupts are being processed on both CPUs:

[EMAIL PROTECTED]:/root> cat /proc/interrupts
           CPU0       CPU1
 30:17037564530785    U3-MPIC Level     eth0

2) top shows a fair amount of time processing softirqs, but very little time in ksoftirqd (or is that a sampling artifact?).
Tasks:  79 total,   1 running,  78 sleeping,   0 stopped,   0 zombie
Cpu0 : 23.6% us, 30.9% sy,  0.0% ni, 36.9% id,  0.0% wa,  0.3% hi,  8.3% si
Cpu1 : 30.4% us, 24.1% sy,  0.0% ni,  5.9% id,  0.0% wa,  0.7% hi, 38.9% si
Mem:   4007812k total,  2199148k used,  1808664k free,        0k buffers
Swap:        0k total,        0k used,        0k free,   219844k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5375 root      15   0 2682m 1.8g 6640 S 99.9 46.7  31:17.68 SigtranServices
 7696 root      17   0  6952 3212 1192 S  7.3  0.1   0:15.75 schedmon.ppc210
 7859 root      16   0  2688 1228  964 R  0.7  0.0   0:00.04 top
 2956 root       8  -8 18940 7436 5776 S  0.3  0.2   0:01.35 blademtc
    1 root      16   0  1660  620  532 S  0.0  0.0   0:30.62 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/0
    3 root      15   0     0    0    0 S  0.0  0.0   0:00.55 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/1
    5 root      15   0     0    0    0 S  0.0  0.0   0:00.43 ksoftirqd/1

3) /proc/sys/net/core/netdev_max_backlog is set to the default of 300.

So... does anyone have any ideas/suggestions?

Thanks,
Chris
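A larger ring only buys time, and it is easy to estimate how much. The sketch below computes how long an RX ring can absorb traffic if the host does not service it at all, assuming a steady arrival rate (the 13K packets/sec from the report) and an initially empty ring; the function name is illustrative.

```c
#include <assert.h>

/* Microseconds an RX ring of 'ring_entries' descriptors can absorb
 * traffic arriving at 'pps' packets/s with no servicing at all.
 * Assumes a steady arrival rate and an initially empty ring. */
static unsigned long long ring_absorb_us(unsigned long long ring_entries,
                                         unsigned long long pps)
{
	assert(pps > 0);
	return ring_entries * 1000000ULL / pps;
}
```

With 256 descriptors a gap of about 20 ms in RX servicing is enough to overflow the ring, while 3000 descriptors cover about 230 ms. This suggests, though it does not prove, that the drops come from occasional stalls of tens of milliseconds (for example softirq processing being delayed while the busy application runs) rather than from a sustained shortage of CPU.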
[PROCFS] [NETNS] issue with /proc/net entries
Hi Eric,

While testing the current network namespace stuff merged in net-2.6.25, I bumped into the following problem with the /proc/net/ entries. They don't always display the data of the current namespace, but sometimes display data from other namespaces.

I bisected the problem to the commit:
proc: remove/Fix proc generic d_revalidate
3790ee4bd86396558eedd86faac1052cb782e4e1

The problem: if a process in a particular network namespace changes its current directory to /proc/net, then processes in other network namespaces trying to look at /proc/net entries will see data from the first namespace (the one with CWD /proc/net). (See the test case below.)

As your comments in the commit suggest, you seem to be aware of some issues when CONFIG_NET_NS=y. Is this one of the corner cases you identified? Any idea how we can fix it?

Thanks.
Benjamin

Test case:
--
(1) Shell 1, in init namespace:
$ cat /proc/net/dev
lo ...
eth0 ...

(2) Shell 2, in another network namespace:
$ cat /proc/net/dev
lo ...

(3) Shell 1
$ cd /proc/net
$ cat dev
lo ...
eth0 ...

(4) Shell 2
$ cat /proc/net/dev
lo ...
eth0 ...

Argh, lo + eth0 in the child namespace: the device list of the init netns is displayed in /proc/net/dev of the child namespace :-(

(5) Shell 1
$ cd /

(6) Shell 2
$ cat /proc/net/dev
lo ...

Back to normality.

-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D
http://www.bull.com
[PATCH 2.6.23+] ingress classify to [nf]mark
Classid x:y => mark = (mark & x) | y (so classid :y acts like -j MARK --set-mark y, etc.).

--- linux-2.6.23-gentoo-r2/net/sched/Kconfig
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
@@ -222,6 +222,16 @@
          To compile this code as a module, choose M here: the
          module will be called sch_ingress.
 
+config NET_SCH_INGRESS_TC2MARK
+       bool "ingress classify -> mark"
+       depends on NET_SCH_INGRESS && NET_CLS_ACT
+       ---help---
+         This enables access to mark value via classid
+         Example: set tc filter ... flowid|classid 1:2
+         eq netfilter mark mark=mark&1|2
+
+         But classid may be undefined (?) - use flowid :0.
+
 comment "Classification"
 
 config NET_CLS
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -161,2 +161,5 @@
                        skb->tc_index = TC_H_MIN(res.classid);
+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+                       skb->mark = (skb->mark & (res.classid >> 16)) | TC_H_MIN(res.classid);
+#endif
                default:

-- 
WBR, Denis Kaganovich, [EMAIL PROTECTED] http://mahatma.bspu.unibel.by
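The added line treats the classid's major number as an AND mask on the existing mark and its minor number as bits to OR in. A user-space sketch of that computation; TOY_TC_H_MIN is a local copy of what the kernel's TC_H_MIN() does (keep the low 16 bits), and make_classid()/tc2mark() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of TC_H_MIN(): the low 16 bits of a handle. */
#define TOY_TC_H_MIN(h) ((h) & 0x0000FFFFU)

/* A classid "major:minor" is packed as (major << 16) | minor. */
static uint32_t make_classid(uint16_t major, uint16_t minor)
{
	return ((uint32_t)major << 16) | minor;
}

/* The patch's rule: mark = (mark & major) | minor. */
static uint32_t tc2mark(uint32_t old_mark, uint32_t classid)
{
	return (old_mark & (classid >> 16)) | TOY_TC_H_MIN(classid);
}
```

Note that, as written, the major number is at most 16 bits wide, so the AND can only preserve the low 16 bits of the old mark; any higher bits are always cleared.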
Re: e1000 performance issue in 4 simultaneous links
On Thu, 2008-01-10 at 16:36 +, Ben Hutchings wrote:
>> When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec of transfer rate. If I run 4 netperf against 4 different interfaces, I get around 720 * 10^6 bits/sec.
> snip
> I take it that's the average for individual interfaces, not the aggregate?

Right, each of these results is for an individual interface. Otherwise, we'd have a huge problem. :-)

> This can be mitigated by interrupt moderation and NAPI polling, jumbo frames (MTU > 1500) and/or Large Receive Offload (LRO). I don't think e1000 hardware does LRO, but the driver could presumably be changed to use Linux's software LRO.

Without using these features and keeping the MTU at 1500, do you think we could get better performance than this?

I also tried to increase my interface MTU to 9000, but I am afraid that netperf only transmits packets smaller than 1500 bytes. Still investigating.

> single CPU this can become a bottleneck. Does the test system have multiple CPUs? Are IRQs for the multiple NICs balanced across multiple CPUs?

Yes, this machine has 8 ppc 1.9GHz CPUs.
And the IRQs are balanced across the CPUs, as I see in /proc/interrupts:

# cat /proc/interrupts
        CPU0     CPU1     CPU2     CPU3     CPU4     CPU5     CPU6     CPU7
 16:  940760 1047904993777 975813   XICS Level  IPI
 18:       4        3        4        1        3        6        8        3   XICS Level  hvc_console
 19:       0        0        0        0        0        0        0        0   XICS Level  RAS_EPOW
273:   10728    10850    10937    10833    10884    10788    10868    10776   XICS Level  eth4
275:       0        0        0        0        0        0        0        0   XICS Level  ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
277:  234933   230275   229770   234048   235906   229858   229975   233859   XICS Level  eth6
278:  266225   267606   262844   265985   268789   266869   263110   267422   XICS Level  eth7
279: 893919857909867917 894881   XICS Level  eth0
305:  439246   439117   438495   436072   438053   440111   438973   438951   XICS Level  eth0 Neterion Xframe II 10GbE network adapter
321:    3268     3088     3143     3113     3305     2982     3326     3084   XICS Level  ipr
323:  268030   273207   269710   271338   270306   273258   270872   273281   XICS Level  eth16
324:  215012   221102   219494   216732   216531   220460   219718   218654   XICS Level  eth17
325:    7103     3580     7246     3475     7132     3394     7258     3435   XICS Level  pata_pdc2027x
BAD:    4216

Thanks,

-- 
Breno Leitao [EMAIL PROTECTED]
Re: questions on NAPI processing latency and dropped network packets
Chris Friesen wrote: Hi all, I've got an issue that's popped up with a deployed system running 2.6.10. I'm looking for some help figuring out why incoming network packets aren't being processed fast enough. After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware (e1000, NAPI is enabled). The error/dropped/fifo counts are going up in ethtool:
rx_packets: 32180834
rx_bytes: 5480756958
rx_errors: 862506
rx_dropped: 771345
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 91161
rx_missed_errors: 91161
This link is receiving roughly 13K packets/sec, and we're dropping roughly 51 packets/sec due to fifo errors. Increasing the rx descriptor ring size from 256 up to around 3000 or so seems to make the problem stop, but it seems to me that this is just a workaround for the latency in processing the incoming packets. So, I'm looking for some suggestions on how to fix this or to figure out where the latency is coming from. Some additional information:
1) Interrupts are being processed on both cpus:
[EMAIL PROTECTED]:/root cat /proc/interrupts
CPU0 CPU1
30:17037564530785 U3-MPIC Level eth0
2) top shows a fair amount of time processing softirqs, but very little time in ksoftirqd (or is that a sampling artifact?).
Tasks: 79 total, 1 running, 78 sleeping, 0 stopped, 0 zombie
Cpu0: 23.6% us, 30.9% sy, 0.0% ni, 36.9% id, 0.0% wa, 0.3% hi, 8.3% si
Cpu1: 30.4% us, 24.1% sy, 0.0% ni, 5.9% id, 0.0% wa, 0.7% hi, 38.9% si
Mem: 4007812k total, 2199148k used, 1808664k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 219844k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5375 root 15 0 2682m 1.8g 6640 S 99.9 46.7 31:17.68 SigtranServices
7696 root 17 0 6952 3212 1192 S 7.3 0.1 0:15.75 schedmon.ppc210
7859 root 16 0 2688 1228 964 R 0.7 0.0 0:00.04 top
2956 root 8 -8 18940 7436 5776 S 0.3 0.2 0:01.35 blademtc
1 root 16 0 1660 620 532 S 0.0 0.0 0:30.62 init
2 root RT 0 0 0 0 S 0.0 0.0 0:00.01 migration/0
3 root 15 0 0 0 0 S 0.0 0.0 0:00.55 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 0:00.01 migration/1
5 root 15 0 0 0 0 S 0.0 0.0 0:00.43 ksoftirqd/1
3) /proc/sys/net/core/netdev_max_backlog is set to the default of 300.
So... anyone have any ideas/suggestions?
You're using 2.6.10... you can always replace the e1000 module with the out-of-tree version from e1000.sf.net; this might help a bit, since the version in the 2.6.10 kernel is very, very old. It also appears that your app is eating up CPU time; perhaps setting the app to a nicer nice level might mitigate things a bit. Also turn off the in-kernel irq mitigation: it just causes cache misses, and you really need the network IRQ to sit on a single CPU most (if not all) of the time to get the best performance. Use the userspace irqbalance daemon instead to achieve this.
Auke
Re: [PATCH 2.6.23+] ingress classify to [nf]mark
Dzianis Kahanovich wrote:
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -161,2 +161,5 @@
 skb->tc_index = TC_H_MIN(res.classid);
+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+skb->mark = (skb->mark & (res.classid >> 16)) | TC_H_MIN(res.classid);
+#endif
 default:
Behaviour like this shouldn't depend on compile-time options.
Re: questions on NAPI processing latency and dropped network packets
Kok, Auke wrote: You're using 2.6.10... you can always replace the e1000 module with the out-of-tree version from e1000.sf.net, this might help a bit - the version in the 2.6.10 kernel is very very old. Do you have any reason to believe this would improve things? It seems like the problem lies in the NAPI/softirq code rather than in the e1000 driver itself, no? it also appears that your app is eating up CPU time. perhaps setting the app to a nicer nice level might mitigate things a bit. If we're not handling the softirq work from ksoftirqd, how would changing scheduler settings affect anything? Also turn off the in-kernel irq mitigation, it just causes cache misses and you really need the network irq to sit on a single cpu at most (if not all) the time to get the best performance. Use the userspace irqbalance daemon instead to achieve this. Using userspace irqbalance would take some effort to test and deploy properly. However, as a quick test I tried setting the IRQ affinity for this device and it didn't help. One thing that might be of interest is that the drops seem to be bursty rather than gradual. Here are some timestamps (in seconds) along with the number of overruns on eth0:
6552.15 overruns:260097
6552.69 overruns:260097
6553.32 overruns:260097
6553.83 overruns:260097
6554.35 overruns:260097
6554.87 overruns:260097
6555.41 overruns:260097
6555.94 overruns:260097
6556.51 overruns:260097
6557.07 overruns:260282
6557.58 overruns:260282
6558.23 overruns:260282
Chris
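Chris's timestamps make the burstiness easy to quantify. The C sketch below (the function name `largest_jump` is hypothetical, not from any tool in the thread) scans periodic samples of a monotonically increasing counter and reports the interval with the biggest jump. On the samples above it points at the 6556.51 to 6557.07 interval, where 185 overruns arrived within about 0.56 s after several seconds with none, rather than a steady trickle of ~51/sec.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest increase of a monotonically increasing counter between
 * consecutive samples. t[] holds sample times in seconds, c[] the counter
 * values, n the number of samples. *at receives the index i of the later
 * sample of the worst interval (the jump happened between t[i-1] and t[i])
 * and *interval its length in seconds; both are 0 if the counter never moves. */
unsigned long largest_jump(const double *t, const unsigned long *c,
                           size_t n, size_t *at, double *interval)
{
    assert(t != NULL && c != NULL && at != NULL && interval != NULL);
    unsigned long best = 0;
    *at = 0;
    *interval = 0.0;
    for (size_t i = 1; i < n; i++) {
        unsigned long d = c[i] - c[i - 1];   /* overruns in this interval */
        if (d > best) {
            best = d;
            *at = i;
            *interval = t[i] - t[i - 1];
        }
    }
    return best;
}
```

Feeding it the twelve samples from the post returns 185 with `*at == 9`.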
Re: questions on NAPI processing latency and dropped network packets
Chris Friesen wrote: Hi all, I've got an issue that's popped up with a deployed system running 2.6.10. I'm looking for some help figuring out why incoming network packets aren't being processed fast enough. After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware (e1000, NAPI is enabled). What's changed in your application? Any real-time threads in there? From the top output below, it looks like SigtranServices is consuming all your CPU... The error/dropped/fifo counts are going up in ethtool: rx_packets: 32180834 rx_bytes: 5480756958 rx_errors: 862506 rx_dropped: 771345 rx_length_errors: 0 rx_over_errors: 0 rx_crc_errors: 0 rx_frame_errors: 0 rx_fifo_errors: 91161 rx_missed_errors: 91161 This link is receiving roughly 13K packets/sec, and we're dropping roughly 51 packets/sec due to fifo errors. Increasing the rx descriptor ring size from 256 up to around 3000 or so seems to make the problem stop, but it seems to me that this is just a workaround for the latency in processing the incoming packets. So, I'm looking for some suggestions on how to fix this or to figure out where the latency is coming from. Some additional information: 1) Interrupts are being processed on both cpus: [EMAIL PROTECTED]:/root cat /proc/interrupts CPU0 CPU1 30:17037564530785 U3-MPIC Level eth0 2) top shows a fair amount of time processing softirqs, but very little time in ksoftirqd (or is that a sampling artifact?).
Tasks: 79 total, 1 running, 78 sleeping, 0 stopped, 0 zombie
Cpu0: 23.6% us, 30.9% sy, 0.0% ni, 36.9% id, 0.0% wa, 0.3% hi, 8.3% si
Cpu1: 30.4% us, 24.1% sy, 0.0% ni, 5.9% id, 0.0% wa, 0.7% hi, 38.9% si
Mem: 4007812k total, 2199148k used, 1808664k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 219844k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5375 root 15 0 2682m 1.8g 6640 S 99.9 46.7 31:17.68 SigtranServices
7696 root 17 0 6952 3212 1192 S 7.3 0.1 0:15.75 schedmon.ppc210
7859 root 16 0 2688 1228 964 R 0.7 0.0 0:00.04 top
2956 root 8 -8 18940 7436 5776 S 0.3 0.2 0:01.35 blademtc
1 root 16 0 1660 620 532 S 0.0 0.0 0:30.62 init
2 root RT 0 0 0 0 S 0.0 0.0 0:00.01 migration/0
3 root 15 0 0 0 0 S 0.0 0.0 0:00.55 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 0:00.01 migration/1
5 root 15 0 0 0 0 S 0.0 0.0 0:00.43 ksoftirqd/1
3) /proc/sys/net/core/netdev_max_backlog is set to the default of 300.
So... anyone have any ideas/suggestions? Thanks, Chris
--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
Re: e1000 performance issue in 4 simultaneous links
Many, many things to check when running netperf :)
*) Are the cards on the same or separate PCImumble bus, and what sort of bus is it?
*) Is the two-interface performance measured with two interfaces on the same four-port card, or with one interface from each of the two four-port cards?
*) Is there a dreaded (IMO) irqbalance daemon running? One of the very first things I do when running netperf is terminate the irqbalance daemon with as extreme a prejudice as I can.
*) What is the distribution of interrupts from the interfaces to the CPUs? If you've tried to set that manually, the dreaded irqbalance daemon will come along shortly thereafter and ruin everything.
*) What does netperf say about the overall CPU utilization of the system(s) when the tests are running?
*) What does top say about the utilization of any single CPU in the system(s) when the tests are running?
*) Are you using the global -T option to spread the netperf/netserver processes across the CPUs, or leaving that all up to the stack/scheduler/etc.?
I suspect there could be more, but that is what comes to mind thus far as far as things I often check when running netperf.
rick jones
Re: SMP code / network stack
Arnaldo Carvalho de Melo wrote: On Thu, Jan 10, 2008 at 03:26:59PM +, Jeba Anandhan wrote: Hi Eric, Thanks for the reply. I have one more doubt. For example, suppose we have 2 processors and 4 Ethernet cards. Only CPU0 does all the work for the 8 cards. If we set each Ethernet card's IRQ affinity to a particular CPU, will it be efficient? Will this be the default behavior?
# cat /proc/interrupts
CPU0 CPU1
0: 11472559 74291833 IO-APIC-edge timer
2: 0 0 XT-PIC cascade
8: 0 1 IO-APIC-edge rtc
81: 0 0 IO-APIC-level ohci_hcd
97: 1830022231847 IO-APIC-level ehci_hcd, eth0
97: 3830012232847 IO-APIC-level ehci_hcd, eth1
97: 5830052231847 IO-APIC-level ehci_hcd, eth2
97: 6830032213847 IO-APIC-level ehci_hcd, eth3
Another thing to try: if you don't need USB 2.0 support, remove the ehci_hcd module. The NICs share IRQ 97 with ehci_hcd, so this will give slightly less overhead servicing IRQs on your system. I take it that you have no MSI support in these Ethernet cards?
Auke
Re: e1000 performance issue in 4 simultaneous links
Breno Leitao wrote: On Thu, 2008-01-10 at 16:36 +, Ben Hutchings wrote: When I run netperf on just one interface, I get a transfer rate of 940.95 * 10^6 bits/sec. If I run 4 netperf instances against 4 different interfaces, I get around 720 * 10^6 bits/sec. <snip> I take it that's the average for individual interfaces, not the aggregate? Right, each of these results is for an individual interface. Otherwise, we'd have a huge problem. :-) This can be mitigated by interrupt moderation and NAPI polling, jumbo frames (MTU > 1500) and/or Large Receive Offload (LRO). I don't think e1000 hardware does LRO, but the driver could presumably be changed to use Linux's software LRO. Without using these features and keeping the MTU at 1500, do you think we could get better performance than this? I also tried increasing my interface MTU to 9000, but I am afraid that netperf only transmits packets smaller than 1500 bytes. Still investigating. single CPU this can become a bottleneck. Does the test system have multiple CPUs? Are IRQs for the multiple NICs balanced across multiple CPUs? Yes, this machine has 8 PPC 1.9 GHz CPUs. And the IRQs are balanced across the CPUs, as I see in /proc/interrupts: which is wrong and hurts performance. You want your Ethernet IRQs to stick to one CPU for long periods to prevent cache thrashing. Please disable the in-kernel IRQ balancing code and use the userspace `irqbalance` daemon.
Gee, I should put that in my signature; I already wrote that twice today :) Auke
# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
16:940760 1047904993777 975813 XICS Level IPI
18: 4 3 4 1 3 6 8 3 XICS Level hvc_console
19: 0 0 0 0 0 0 0 0 XICS Level RAS_EPOW
273: 10728 10850 10937 10833 10884 10788 10868 10776 XICS Level eth4
275: 0 0 0 0 0 0 0 0 XICS Level ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
277: 234933 230275 229770 234048 235906 229858 229975 233859 XICS Level eth6
278: 266225 267606 262844 265985 268789 266869 263110 267422 XICS Level eth7
279:893919857909867917 894881 XICS Level eth0
305: 439246 439117 438495 436072 438053 440111 438973 438951 XICS Level eth0 Neterion Xframe II 10GbE network adapter
321: 3268 3088 3143 3113 3305 2982 3326 3084 XICS Level ipr
323: 268030 273207 269710 271338 270306 273258 270872 273281 XICS Level eth16
324: 215012 221102 219494 216732 216531 220460 219718 218654 XICS Level eth17
325: 7103 3580 7246 3475 7132 3394 7258 3435 XICS Level pata_pdc2027x
BAD: 4216
Thanks,
[PATCH] New driver sfc for Solarstorm SFC4000 controller - 4th attempt
This is a resubmission of a new driver for Solarflare network controllers. The driver supports several types of PHY (10Gbase-T, XFP, CX4) on six different 10G and 1G boards. Hardware based on this network controller is now available from SMC as part numbers SMC10GPCIe-XFP and SMC10GPCIe-10BT. The previous thread was: http://marc.info/?l=linux-netdev&m=119825632209357&w=2
Thanks to the people who looked at the previous patches. We have addressed the following from comments received after the 3rd submission:
- Kerneldoc style comments
- Kconfig changes
- Reduced size slightly
I am also sending a request to [EMAIL PROTECTED] for review of the MTD part of the driver. Previous reviewers have noted that the driver is quite large (but it would not be the largest network driver by source or compiled module size). I think it is a reasonable size for a driver that supports a fully featured NIC across a range of MACs, PHYs and silicon revisions. One aspect that is worth mentioning is that the NIC has no firmware. A benefit is no dreaded binary blob! A downside is that more support code is needed, but this tends to be around initialisation and is readable, commented C. To give a small breakdown of the sizes of the different driver parts (wc output: lines, words, bytes):
Core control/datapath | 5001 16405 139467 = efx.c rx.c tx.c
Controller HW support | 3653 11823 107554 = falcon.c
HW defs | 1588 4838 47050 = falcon_hwdefs.h
Board support | 1848 7105 52455
MAC support | 1623 4977 51007
PHY support | 2196 7904 67711
Headers | 4565 20645 162402
Self test code | 863 3088 24981
Ethtool support | 751 2144 22845
MTD code (separate module) | 1021 3200 26944
Debugfs code (Kconfig option) | 863 2543 24896
Are there further review comments that we need to address before it can be merged?
The patch (against net-2.6.25) is at: https://support.solarflare.com/netdev/4/net-2.6.25-sfc-2.2.0038.patch The new files may also be downloaded as a tarball: https://support.solarflare.com/netdev/4/net-2.6.25-sfc-2.2.0038.tgz And for verification there is: https://support.solarflare.com/netdev/4/MD5SUMS Regards -- Rob Stonehouse
Re: e1000 performance issue in 4 simultaneous links
I also tried increasing my interface MTU to 9000, but I am afraid that netperf only transmits packets smaller than 1500 bytes. Still investigating. It may seem like picking a tiny nit, but netperf never transmits packets. It only provides buffers of a specified size to the stack. It is then the stack that transmits and determines the size of the packets on the network. Drifting a bit more... While there are settings, conditions and known stack behaviours where one can be confident of the packet size on the network based on the options passed to netperf, generally speaking one should not ass-u-me a direct relationship between the options one passes to netperf and the size of the packets on the network. And for jumbo frames to be effective they must be set on both ends; otherwise the TCP MSS exchange will result in the smaller of the two MTUs winning, as it were. single CPU this can become a bottleneck. Does the test system have multiple CPUs? Are IRQs for the multiple NICs balanced across multiple CPUs? Yes, this machine has 8 PPC 1.9 GHz CPUs. And the IRQs are balanced across the CPUs, as I see in /proc/interrupts: That suggests to me, anyway, that the dreaded irqbalanced is running, shuffling the interrupts as you go. Not often a happy place for running netperf when one wants consistent results.
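As a rough illustration of Rick's point about the MSS exchange, here is a simplified C sketch. It assumes IPv4, TCP without options (40 bytes of headers), and that each end derives its MSS from its own interface MTU; it ignores path MTU discovery. The function name is hypothetical. With a 9000-byte MTU on one end and 1500 on the other, the connection ends up using a 1460-byte MSS, so raising the MTU on only one side buys nothing.

```c
#include <assert.h>

/* IPv4 header plus TCP header, assuming neither carries options. */
#define IPV4_TCP_HDR_LEN 40u

/* MSS a TCP connection ends up with when each end advertises an MSS derived
 * from its own interface MTU: the smaller side wins. */
unsigned effective_mss(unsigned mtu_local, unsigned mtu_remote)
{
    unsigned mtu = mtu_local < mtu_remote ? mtu_local : mtu_remote;
    assert(mtu > IPV4_TCP_HDR_LEN);
    return mtu - IPV4_TCP_HDR_LEN;
}
```

Only `effective_mss(9000, 9000)` yields the 8960-byte segments jumbo frames are meant to provide.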
# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
16:940760 1047904993777975813 XICS Level IPI
18: 4 3 4 1 3 6 8 3 XICS Level hvc_console
19: 0 0 0 0 0 0 0 0 XICS Level RAS_EPOW
273: 10728 10850 10937 10833 10884 10788 10868 10776 XICS Level eth4
275: 0 0 0 0 0 0 0 0 XICS Level ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
277: 234933 230275 229770 234048 235906 229858 229975 233859 XICS Level eth6
278: 266225 267606 262844 265985 268789 266869 263110 267422 XICS Level eth7
279:893919857909867917 894881 XICS Level eth0
305: 439246 439117 438495 436072 438053 440111 438973 438951 XICS Level eth0 Neterion Xframe II 10GbE network adapter
321: 3268 3088 3143 3113 3305 2982 3326 3084 XICS Level ipr
323: 268030 273207 269710 271338 270306 273258 270872 273281 XICS Level eth16
324: 215012 221102 219494 216732 216531 220460 219718 218654 XICS Level eth17
325: 7103 3580 7246 3475 7132 3394 7258 3435 XICS Level pata_pdc2027x
BAD: 4216
IMO, what you want (in the absence of multi-queue NICs) is one CPU taking the interrupts of one port/interface, and each port/interface's interrupts going to a separate CPU. So, something that looks roughly like this (concocted example):
CPU0 CPU1 CPU2 CPU3
1: 1234 0 0 0 eth0
2: 0 1234 0 0 eth1
3: 0 0 1234 0 eth2
4: 0 0 0 1234 eth3
which you should be able to achieve via the method I think someone else has already mentioned, echoing values into /proc/irq/<irq>/smp_affinity, after you have slain the dreaded irqbalance daemon.
rick jones
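Rick's one-interface-per-CPU layout comes down to writing a hex CPU bitmask into /proc/irq/<irq>/smp_affinity. Below is a minimal C sketch of that, assuming a single target CPU numbered below 32 (larger masks are written as comma-separated 32-bit groups, which this sketch does not produce). The helper names are hypothetical, writing the file needs root, and a running irqbalance daemon may later overwrite the value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hex CPU bitmask for a single CPU, in the format smp_affinity accepts.
 * Simplification: only CPUs 0..31 are handled. */
int affinity_mask_for_cpu(unsigned cpu, char *buf, size_t len)
{
    assert(cpu < 32);
    return snprintf(buf, len, "%x", 1u << cpu);
}

/* procfs path of the affinity file for one IRQ number. */
int affinity_path(unsigned irq, char *buf, size_t len)
{
    return snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);
}

/* Pin one IRQ to one CPU. Needs root; returns 0 on success, -1 on error. */
int pin_irq_to_cpu(unsigned irq, unsigned cpu)
{
    char path[64], mask[16];
    affinity_path(irq, path, sizeof path);
    affinity_mask_for_cpu(cpu, mask, sizeof mask);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    size_t n = strlen(mask);
    int ok = fwrite(mask, 1, n, f) == n && fputc('\n', f) != EOF;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, with the IRQ numbers from Breno's output, `pin_irq_to_cpu(277, 0)` and `pin_irq_to_cpu(278, 1)` would put eth6 on CPU0 and eth7 on CPU1, the same as echoing 1 and 2 into the corresponding files from a shell.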
Re: questions on NAPI processing latency and dropped network packets
1) Interrupts are being processed on both cpus: [EMAIL PROTECTED]:/root cat /proc/interrupts CPU0 CPU1 30:17037564530785 U3-MPIC Level eth0 IIRC none of the e1000-driven cards are multi-queue, so while the above shows that interrupts from eth0 have been processed on both CPUs at various points in the past, it doesn't necessarily mean that they are being processed on both CPUs at the same time, right? rick jones
RE: ipip tunnel code (IPV4)
Andy, -Original Message- From: Andy Johnson [mailto:[EMAIL PROTECTED] Sent: Thursday, January 10, 2008 6:35 AM To: netdev@vger.kernel.org Subject: ipip tunnel code (IPV4) Hello, I am trying to learn the IPV4 ipip tunnel code (net/ipv4/ipip.c) and I have two little questions about the semantics of variables: ipip_fb_tunnel_init: what does fb stand for? In tunnels_wc: what does wc stand for? Similar names occur in net/ipv6/sit.c, which is the IPv6-in-IPv4 analog of ipip.c. I am 90% certain that wc stands for wildcard - it is used for selecting the default tunnel interface when no other tunnel interfaces match a specific (src, dst) pair. In that light, I assume fb stands for something like fallback, although I am not certain. It would seem to fit, though, because the fallback tunnel interface is the one that is selected by a wildcard match. I would be interested if anyone could confirm or correct my assumptions. Thanks - Fred [EMAIL PROTECTED] Regards, Andy
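Fred's reading can be illustrated with a simplified C model of the lookup precedence: the most specific tunnel wins, and the all-wildcard entry is the fallback when nothing else matches. This is a sketch under stated assumptions, not the kernel code. ipip.c appears to keep separate hash tables (tunnels_r_l, tunnels_r, tunnels_l and tunnels_wc), but treat that correspondence as an assumption; here a flat array is simply scanned in four passes, with an address of 0 standing for "unspecified". All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified tunnel: an address of 0 means "unspecified" (wildcard). */
struct tunnel {
    uint32_t local;    /* source address, 0 = any */
    uint32_t remote;   /* destination address, 0 = any */
    const char *name;
};

/* Does tunnel c match (remote, local) at the given specificity level?
 * 0: both addresses match, 1: remote only, 2: local only, 3: full wildcard. */
static int matches(const struct tunnel *c, int level, uint32_t remote, uint32_t local)
{
    switch (level) {
    case 0:
        return c->remote != 0 && c->local != 0 &&
               c->remote == remote && c->local == local;
    case 1:
        return c->local == 0 && c->remote != 0 && c->remote == remote;
    case 2:
        return c->remote == 0 && c->local != 0 && c->local == local;
    default:
        return c->remote == 0 && c->local == 0;   /* the fallback tunnel */
    }
}

/* Return the most specific tunnel for (remote, local), falling back to the
 * all-wildcard tunnel; NULL if there is none. */
const struct tunnel *tunnel_lookup(const struct tunnel *t, size_t n,
                                   uint32_t remote, uint32_t local)
{
    assert(t != NULL || n == 0);
    for (int level = 0; level < 4; level++)
        for (size_t i = 0; i < n; i++)
            if (matches(&t[i], level, remote, local))
                return &t[i];
    return NULL;
}
```

A packet matching no configured (src, dst) pair lands on the entry with both addresses unset, which is the role the fallback tunnel plays.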
Re: questions on NAPI processing latency and dropped network packets
Chris Friesen wrote: Kok, Auke wrote: You're using 2.6.10... you can always replace the e1000 module with the out-of-tree version from e1000.sf.net, this might help a bit - the version in the 2.6.10 kernel is very very old. Do you have any reason to believe this would improve things? It seems like the problem lies in the NAPI/softirq code rather than in the e1000 driver itself, no? Your real issue is that your userspace app is hogging the CPU. While networking is not really CPU-intensive, it does require that the CPU be given ample time at frequent intervals to run cleanups and prevent FIFO overruns. Alternatively, you can increase your rx/tx ring descriptor count (with ethtool), which makes it easier for the hardware to go unserviced for a longer period: with more buffers available, the card can keep going longer while userspace is hogging the CPU. it also appears that your app is eating up CPU time. perhaps setting the app to a nicer nice level might mitigate things a bit. If we're not handling the softirq work from ksoftirqd, how would changing scheduler settings affect anything? Correct, it might not. Also turn off the in-kernel irq mitigation, it just causes cache misses and you really need the network irq to sit on a single cpu at most (if not all) the time to get the best performance. Use the userspace irqbalance daemon instead to achieve this. Using userspace irqbalance would be some effort to test and deploy properly. However, as a quick test I tried setting the irq affinity for this device and it didn't help. irqbalance is a simple userspace app that drops into any system seamlessly and does the best job all around - often it even beats manual tuning of smp_affinity ;) Auke
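To put numbers on the ring-size suggestion, here is a small C sketch (the helper name is hypothetical) of how long an RX ring can absorb traffic while nobody services it, using Chris's figure of roughly 13K packets/sec. It assumes steady arrivals and one descriptor per packet, so it is an idealized upper bound: the default 256 descriptors cover about 19.7 ms of stall, while about 3000 cover about 231 ms, consistent with the larger ring hiding the bursts.

```c
#include <assert.h>

/* Milliseconds an RX ring of `ring` descriptors can absorb packets arriving
 * at `pps` packets/second with no servicing at all. Idealized: steady
 * arrivals, one descriptor per packet. */
double ring_headroom_ms(unsigned ring, double pps)
{
    assert(pps > 0.0);
    return 1000.0 * (double)ring / pps;
}
```

The ring size itself is changed with ethtool's -G option, e.g. `ethtool -G eth0 rx 3000`, subject to the driver's maximum.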
Re: questions on NAPI processing latency and dropped network packets
Rick Jones wrote: 1) Interrupts are being processed on both cpus: [EMAIL PROTECTED]:/root cat /proc/interrupts CPU0 CPU1 30:17037564530785 U3-MPIC Level eth0 IIRC none of the e1000 driven cards are multi-queue The PCI Express variants are, but the functionality is almost always disabled (and relatively new anyway). Even with multiqueue, you can still have only a single IRQ line (which mostly defeats the purpose, of course). , so while the above shows that interrupts from eth0 have been processed on both CPUs at various points in the past, it doesn't necessarily mean that they are being processed on both CPUs at the same time right? It never will: an IRQ can only be processed on one CPU at a time anyway. Obviously the IRQ here has been migrated at least ONCE from one of the CPUs to the other. Unfortunately you can't see from /proc/interrupts whether this happens frequently or not, or how many times it happened before. Auke
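Since /proc/interrupts only shows cumulative counts, one way to see where an IRQ is being serviced right now is to sample the same row twice and compare. Below is a minimal C sketch (helper names hypothetical) that parses the per-CPU counters from one row and reports the CPU whose count grew most between samples. It assumes the usual `IRQ: count count ... controller name` layout and stops at the first non-numeric field.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse the per-CPU counters from one /proc/interrupts row, e.g.
 * " 30:   1703756   4530785   U3-MPIC Level  eth0".
 * Stores up to max_cpus counters; returns how many were read,
 * or -1 if the row has no "IRQ:" prefix. */
int parse_irq_row(const char *line, unsigned long *counts, int max_cpus)
{
    assert(line != NULL && counts != NULL && max_cpus >= 0);
    const char *p = strchr(line, ':');
    if (p == NULL)
        return -1;
    p++;                                   /* skip past "IRQ:" */
    int n = 0;
    while (n < max_cpus) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                         /* reached controller/device name */
        char *end;
        counts[n++] = strtoul(p, &end, 10);
        p = end;
    }
    return n;
}

/* CPU whose counter grew the most between two samples of the same row,
 * i.e. where that IRQ is currently being serviced; -1 if n == 0. */
int busiest_cpu(const unsigned long *before, const unsigned long *after, int n)
{
    int best = -1;
    unsigned long best_delta = 0;
    for (int i = 0; i < n; i++) {
        unsigned long d = after[i] - before[i];
        if (best < 0 || d > best_delta) {
            best = i;
            best_delta = d;
        }
    }
    return best;
}
```

Read the eth0 row, wait a second, read it again, and pass both parsed arrays to `busiest_cpu()`. If the answer keeps changing from one pair of samples to the next, the IRQ is being migrated.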
RE: [ipw3945-devel] [PATCH 2/5] iwlwifi: iwl3945 synchronize interrupt and tasklet for down iwlwifi
On , Joonwoo Park wrote:
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -6262,6 +6262,10 @@ static void __iwl_down(struct iwl_priv *priv)
 /* tell the device to stop sending interrupts */
 iwl_disable_interrupts(priv);
+/* synchronize irq and tasklet */
+synchronize_irq(priv->pci_dev->irq);
+tasklet_kill(&priv->irq_tasklet);
+
Could synchronize_irq() be moved into iwl_disable_interrupts()? I am also wondering if we cannot call tasklet_kill() before iwl_disable_interrupts() ... thus preventing it from being scheduled when we are going down. Reinette
Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module
Stephen Hemminger wrote: On Wed, 9 Jan 2008 16:03:00 -0800 Andrew Morton [EMAIL PROTECTED] wrote: (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 9 Jan 2008 13:05:34 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9721
Summary: wake on lan fails with sky2 module
Product: ACPI
Version: 2.5
KernelVersion: 2.6.24-rc7
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Power-Sleep-Wake
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]
This post-2.6.23 regression was assigned to ACPI but is quite possibly a net driver problem?
Latest working kernel version: 2.6.23.12
Earliest failing kernel version: 2.6.24-rc6 (earlier kernels not tested; 2.6.24-rc7 still failing)
Distribution: Ubuntu 8.04 (but with the kernel built from kernel.org and the system modified to make Wake-on-LAN work, i.e. network cards are not shut down on poweroff)
Hardware Environment: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20) onboard an Asus P5W DH motherboard, uses the sky2 module
Software Environment:
Problem Description: When enabling Wake-on-LAN with 'ethtool -s eth0 wol', I get the following status:
21:56:29 ~ # sudo ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pg
Wake-on: g (wol enabled)
Current message level: 0x00ff (255)
Link detected: yes
but after shutting down, the PC doesn't wake up when the magic packet is sent. The status lights of the network card are still on (so the card seems to be online).
Same system, with only the kernel changed to 2.6.23.12 and the same procedure as above: Wake-on-LAN works. Steps to reproduce: enable WoL on your network card using the sky2 module and see whether it fails for you too. If you need more information, just tell me; it's my first bug report. regards Wake from power off works on 2.6.24-rc7 for me. Wake from suspend doesn't, because NetworkManager, HAL, or some other userspace tool gets confused. I just rechecked it with a Fujitsu Lifebook, which has sky2 (88E8055). There are many variations of this chip, and it may be a chip-specific problem or an ACPI/BIOS issue. If you don't enable Wake on LAN in the BIOS, the driver can't do it for you. Also, check how you are shutting down. Also, since the device has to restart the PHY, it could be a switch issue if you have some fancy-pants switch doing intrusion detection or something, but I doubt that. Is it a clean or fast shutdown? Most distributions mark network devices as down on shutdown, but if the distribution does something stupid like removing the driver module, then the driver is unable to set up Wake on LAN. The Wake on LAN setup is done in one place in the driver; add a printk to see if it is ever called. I only tried wake from shutdown (poweroff), and as I wrote, on the same system with kernel 2.6.23.12 (nothing changed but vmlinuz and initrd, with the same kernel config on 2.6.24-rc6/7 (make oldconfig, default answer to all questions)), it works, so it seems to me like a problem in the kernel. Every wake-up setting (wake up by PCI device, RTC alarm, modem ...) in the BIOS is also enabled; otherwise it couldn't work in 2.6.23.12 (and Windows). If you say your sky2 card works, it might be an ACPI problem not related to sky2, as I thought. When I am at home I'll try to start my PC with a timer (via /proc/acpi/alarm) from kernel 2.6.24-rc7 to check whether ACPI wakeup works, and report back (if it is any help in finding the source of my problem).
And regarding printk, I'll try to find out what you mean (my first steps into kernel debugging :) - I think you mean adding a line in the source to print out something when the function is called). regards
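For testing, it may help to know what the magic packet actually contains (`Wake-on: g` in the ethtool output means wake on MagicPacket). The payload is six 0xFF bytes followed by the target MAC address repeated 16 times, 102 bytes in all. Senders commonly put it in a UDP broadcast, but the NIC just looks for the pattern in the received frame. A minimal C sketch that builds the payload follows; the function name is hypothetical and sending it is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WOL_MAGIC_LEN 102   /* 6 sync bytes + 16 copies of a 6-byte MAC */

/* Build a Wake-on-LAN magic packet payload: six 0xFF bytes followed by
 * the target MAC address repeated 16 times. */
void build_magic_packet(const uint8_t mac[6], uint8_t out[WOL_MAGIC_LEN])
{
    assert(mac != NULL && out != NULL);
    memset(out, 0xFF, 6);
    for (int i = 0; i < 16; i++)
        memcpy(out + 6 + 6 * i, mac, 6);
}
```

The resulting buffer could then be sent from another machine with an ordinary UDP socket that has SO_BROADCAST enabled.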
Re: [PROCFS] [NETNS] issue with /proc/net entries
Benjamin Thery [EMAIL PROTECTED] writes: Hi Eric, While testing the current network namespace stuff merged in net-2.6.25, I bumped into the following problem with the /proc/net/ entries. It doesn't always display the actual data of the current namespace, but sometimes displays data from other namespaces. I bisected the problem to the commit: proc: remove/Fix proc generic d_revalidate 3790ee4bd86396558eedd86faac1052cb782e4e1 The problem: If a process in a particular network namespace changes its current directory to /proc/net, then processes in other network namespaces trying to look at /proc/net entries will see data from the first namespace (the one with CWD /proc/net). (See test case below.) As your comments in the commit suggest, you seem to be aware of some issues when CONFIG_NET_NS=y. Is this one of the corner cases you identified? Any idea how we can fix it? Yes. It isn't especially hard. I have most of it in my queue; I just need to get the silly patches out of there. Essentially we need to fix the caching of proc_generic entries so that we can have a proper d_revalidate implementation. Getting d_revalidate and the caching correct for /proc/net will take just a bit more work. We need to make /proc/net a symlink to something like /proc/self/net so that we don't get excess revalidates when switching between different processes. Otherwise we can't properly handle the case you have described, where being in the directory causes the wrong version of /proc/net to show up. Changing the contents of the dentry for /proc/net should only happen during unshare, not when we switch between processes; otherwise we get into the "d_revalidate leaks mount points" problem again. We also need the check to see if something is mounted on top of us before we drop the dentry. But if we don't even try until we know the dentry is invalid, it should not be too bad.
Eric
Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #2)
Jeff Garzik wrote: Looks pretty decent. Main comments (style mostly, driver operation path seems sound): Thanks again for the comments. I am about to send an updated patch just before my vacation, and before I do, let me just quickly touch on your comments below: * kill the bitfields and unions [in descriptor structs]. they are not endian-safe as presented, generate poor code, and are otherwise undesirable. That bitfield was unused and so I removed the code. I don't see any more bitfields at all now in this driver. * the basic operations are too verbose: E1000_READ_REG(hw, REGISTER) is far more readable as ER32(REGISTER), following the style of other drivers. Furthermore, the E1000_ prefix, in addition to being overly redundant (used in each register read/write), is also incorrect, because this is not E1000... Partially I agree, and I refined the register writes to remove the need for the hw part. However, the hardware *is* e1000; we ended up making a new driver since it just does not make sense to add all of this infrastructure for older chipsets anymore. Renaming everything (from e1000_ to igb_) would just make life really hard for us when looking up historical diffs, history etc., and most importantly when comparing with e1000/e1000e when we encounter an issue that might affect the other drivers. For now it is easier to just leave these alone. However, I do not rule out that we change this at a later stage ... * in general, rename everything with e1000_ prefix. this will eliminate plenty of human confusion in the long run. I'm doing this for all functions, which solves the namespace collisions. The e1000-specific static structs (which are the same in igb as they are in e1000 and e1000e) as well as the registers (ditto) I'll keep unchanged for now. * API: unless you have chips in the lab that will require an API hook, don't create one.
For example, a direct call to e1000_acquire_nvm_82575() should replace all -acquire_nvm() hooks if there are no chips in pipeline GUARANTEED to have a different -acquire_nvm() feature. Noted Note also that there are already many less hooks as there are in e1000e. We did already make an effort to scrub as many as we can. Cheers, Auke -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
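The following is a minimal userspace sketch of the trade-off Jeff describes. It reaches the same operation in two ways: through a function-pointer hook and through a direct call. All names here (`struct hw`, `struct nvm_ops`, `acquire_nvm_82575()`) are hypothetical stand-ins and are not the igb driver's actual types or functions.

```c
#include <assert.h>

/* Hypothetical hardware state; not igb's struct e1000_hw. */
struct hw {
	int nvm_owned; /* 1 while software holds the NVM semaphore */
};

/* Hypothetical stand-in for a chip-specific routine such as
 * e1000_acquire_nvm_82575(). */
static int acquire_nvm_82575(struct hw *hw)
{
	if (hw->nvm_owned)
		return -1; /* already held */
	hw->nvm_owned = 1;
	return 0;
}

/* Style 1: an ops table. Only pays off once another chip needs a
 * different routine. */
struct nvm_ops {
	int (*acquire)(struct hw *hw);
};

static int nvm_acquire_via_hook(const struct nvm_ops *ops, struct hw *hw)
{
	return ops->acquire(hw); /* indirect call through the hook */
}

/* Style 2: a single supported chip, so call the routine directly. */
static int nvm_acquire_direct(struct hw *hw)
{
	return acquire_nvm_82575(hw);
}
```

Both paths do the same work. The hook only earns its indirection once a second implementation actually exists. With one chip, the direct call is easier to read and to grep for.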
debian iproute2 patches branch rebased.
Hello Stephen!

I've rebased the patches branch we carry in Debian on top of the new 080108 release of iproute2. See the patches branch of git://git.debian.org/git/collab-maint/pkg-iproute

I've dropped one of the patches you picked up[1], so there's now one of the old ones left and a new manpage for routel/routef. (Any reason you didn't pull the actual commit we served you with git?)

The remaining old patch fixes the infinite loop in ip addr flush exactly the same way you fixed the same problem in ip neigh flush[2].

An additional patch will be provided in a followup mail (not available in Debian) that was created at the request of Patrick McHardy. This one makes max rounds configurable (0 means try indefinitely, so you can restore the old behaviour). Patrick and I disagree on what the default should be[3]. He thinks the 'ip route flush' aka 'loop forever' behaviour should stay, while I vote for the 'ip neigh flush' behaviour of bailing out after N attempts. IMNSHO looping infinitely is an *insane* default, especially since this is a tool used in bootup scripts.

[1]: See commit ea5dd59c03b36fe2acec8f03a8d7a2f7b7036b04
[2]: See commit 660818498d0f5a3f52c05355a3e82c23f670fcc1. The commit message seems to be wrong about "Limit ip route flush...", since it's actually ip neigh flush that's being modified.
[3]: Read the thread from here on: http://www.spinics.net/lists/netdev/msg44920.html

commit 1eef590948f81b5c84e8450d5c95dd73744b4278
Author: Andreas Henriksson [EMAIL PROTECTED]
Date:   Thu Jan 3 16:48:56 2008 +0100

    Add routel and routef man page.
diff --git a/Makefile b/Makefile
index de04176..723eb5d 100644
--- a/Makefile
+++ b/Makefile
@@ -56,6 +56,7 @@ install: all
 	ln -sf lnstat.8 $(DESTDIR)$(MANDIR)/man8/rtstat.8
 	ln -sf lnstat.8 $(DESTDIR)$(MANDIR)/man8/ctstat.8
 	ln -sf rtacct.8 $(DESTDIR)$(MANDIR)/man8/nstat.8
+	ln -sf routel.8 $(DESTDIR)$(MANDIR)/man8/routef.8
 	install -m 0755 -d $(DESTDIR)$(MANDIR)/man3
 	install -m 0644 $(shell find man/man3 -maxdepth 1 -type f) $(DESTDIR)$(MANDIR)/man3
diff --git a/man/man8/routel.8 b/man/man8/routel.8
new file mode 100644
index 000..cdf8f55
--- /dev/null
+++ b/man/man8/routel.8
@@ -0,0 +1,32 @@
+.TH ROUTEL 8 "3 Jan, 2008" "iproute2" "Linux"
+.SH NAME
+.LP
+routel \- list routes with pretty output format
+.br
+routef \- flush routes
+.SH SYNTAX
+.LP
+routel [\fItablenr\fP [\fIraw ip args...\fP]]
+.br
+routef
+.SH DESCRIPTION
+.LP
+These programs are a set of helper scripts you can use instead of raw iproute2 commands.
+.br
+The routel script will list routes in a format that some might consider easier to interpret than the ip route list equivalent.
+.br
+The routef script does not take any arguments and will simply flush the routing table down the drain. Beware! This means deleting all routes which will make your network unusable!
+
+.SH FILES
+.LP
+\fI/usr/bin/routef\fP
+.br
+\fI/usr/bin/routel\fP
+.SH AUTHORS
+.LP
+The routel script was written by Stephen R. van den Berg [EMAIL PROTECTED], 1999/04/18 and donated to the public domain.
+.br
+This manual page was written by Andreas Henriksson [EMAIL PROTECTED], for the Debian GNU/Linux system.
+.SH SEE ALSO
+.LP
+ip(8)

commit 1d1dab5826d1a9091e0bb2cf832f0785dc2add63
Author: Daniel Silverstone [EMAIL PROTECTED]
Date:   Fri Oct 19 13:32:24 2007 +0200

    Avoid infinite loop in ip addr flush.

    Fix ip addr flush the same way ip neigh flush was previously fixed,
    by bailing out if the flush hasn't completed after MAX_ROUNDS (10) tries.
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index d1c6620..34379d0 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -34,6 +34,8 @@
 #include "ll_map.h"
 #include "ip_common.h"
 
+#define MAX_ROUNDS 10
+
 static struct
 {
 	int ifindex;
@@ -667,7 +669,7 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
 		filter.flushp = 0;
 		filter.flushe = sizeof(flushb);
 
-		for (;;) {
+		while (round < MAX_ROUNDS) {
 			if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
 				perror("Cannot send dump request");
 				exit(1);
@@ -694,6 +696,8 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
 				fflush(stdout);
 			}
 		}
+		fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n", MAX_ROUNDS); fflush(stderr);
+		return 1;
 	}
 
 	if (filter.family != AF_PACKET) {

-- 
Regards,
Andreas Henriksson
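The bounded-retry pattern in this patch can be sketched on its own, as shown below. This is a simplified illustration, not iproute2 code. The `flush_pass_fn` callback and the two example passes are hypothetical. They stand in for one dump-and-delete pass over the kernel's address list.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ROUNDS 10

/* Hypothetical callback: one dump-and-delete pass; returns how many
 * entries were found (0 means nothing was left to flush). */
typedef int (*flush_pass_fn)(void *ctx);

/* Repeat passes until one finds nothing, giving up after MAX_ROUNDS.
 * Returns 0 on success, 1 if the flush is still incomplete. */
static int flush_bounded(flush_pass_fn pass, void *ctx)
{
	int round = 0;

	while (round < MAX_ROUNDS) {
		if (pass(ctx) == 0)
			return 0;
		round++;
	}
	fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n",
		MAX_ROUNDS);
	fflush(stderr);
	return 1;
}

/* Example pass: reports what it found, then removes one entry. */
static int shrinking_table(void *ctx)
{
	int *remaining = ctx;
	int found = *remaining;

	if (*remaining > 0)
		(*remaining)--;
	return found;
}

/* Example pass: something keeps re-adding entries, so it never empties. */
static int stuck_table(void *ctx)
{
	(void)ctx;
	return 1;
}
```

The caller gets a clear failure code and a message instead of a hang. This is the property that matters when the command runs from boot scripts.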
Please pull 'fixes-jgarzik' branch of wireless-2.6
Jeff,

A couple more fixes for 2.6.24. The one from Mattias Nissler is already in your upstream tree...FYI.

Let me know if there are problems!

Thanks,

John

---

Individual patches available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/fixes-jgarzik/

---

The following changes since commit 3ce54450461bad18bbe1f9f5aa3ecd2f8e8d1235:
  Linus Torvalds (1):
        Linux 2.6.24-rc7

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git fixes-jgarzik

Ivo van Doorn (1):
      rt2x00: Corectly initialize rt2500usb MAC

Mattias Nissler (1):
      rt2x00: Allow rt61 to catch up after a missing tx report

 drivers/net/wireless/rt2x00/rt2500usb.c |    2 +-
 drivers/net/wireless/rt2x00/rt61pci.c   |   13 +
 2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/rt2x00/rt2500usb.c b/drivers/net/wireless/rt2x00/rt2500usb.c
index 50775f9..18b1f91 100644
--- a/drivers/net/wireless/rt2x00/rt2500usb.c
+++ b/drivers/net/wireless/rt2x00/rt2500usb.c
@@ -257,7 +257,7 @@ static const struct rt2x00debug rt2500usb_rt2x00debug = {
 static void rt2500usb_config_mac_addr(struct rt2x00_dev *rt2x00dev,
 				      __le32 *mac)
 {
-	rt2500usb_register_multiwrite(rt2x00dev, MAC_CSR2, &mac,
+	rt2500usb_register_multiwrite(rt2x00dev, MAC_CSR2, mac,
 				      (3 * sizeof(__le16)));
 }
 
diff --git a/drivers/net/wireless/rt2x00/rt61pci.c b/drivers/net/wireless/rt2x00/rt61pci.c
index 01dbef1..0d9436d 100644
--- a/drivers/net/wireless/rt2x00/rt61pci.c
+++ b/drivers/net/wireless/rt2x00/rt61pci.c
@@ -1738,6 +1738,7 @@ static void rt61pci_txdone(struct rt2x00_dev *rt2x00dev)
 {
 	struct data_ring *ring;
 	struct data_entry *entry;
+	struct data_entry *entry_done;
 	struct data_desc *txd;
 	u32 word;
 	u32 reg;
@@ -1791,6 +1792,18 @@ static void rt61pci_txdone(struct rt2x00_dev *rt2x00dev)
 		    !rt2x00_get_field32(word, TXD_W0_VALID))
 			return;
 
+		entry_done = rt2x00_get_data_entry_done(ring);
+		while (entry != entry_done) {
+			/* Catch up. Just report any entries we missed as
+			 * failed. */
+			WARNING(rt2x00dev,
+				"TX status report missed for entry %p\n",
+				entry_done);
+			rt2x00pci_txdone(rt2x00dev, entry_done, TX_FAIL_OTHER,
+					 0);
+			entry_done = rt2x00_get_data_entry_done(ring);
+		}
+
 		/*
 		 * Obtain the status about this packet.
 		 */

-- 
John W. Linville
[EMAIL PROTECTED]
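The catch-up loop in the rt61pci hunk can be illustrated with a toy TX ring. This sketch assumes a fixed-size ring indexed by integers. `struct tx_ring` and `txdone()` are hypothetical and loosely stand in for rt2x00_get_data_entry_done() and rt2x00pci_txdone(). The real driver also logs a WARNING for each missed entry.

```c
#include <assert.h>

#define RING_SIZE 8

/* Hypothetical TX ring; not rt2x00's struct data_ring. */
struct tx_ring {
	unsigned done;         /* oldest entry still waiting for a status report */
	int status[RING_SIZE]; /* 0 = pending, 1 = success, -1 = failed */
};

/* The hardware reported a status for entry 'reported' (must be < RING_SIZE).
 * Older entries whose reports were missed are completed as failures first,
 * then the reported entry is completed. Returns how many were marked failed. */
static unsigned txdone(struct tx_ring *ring, unsigned reported)
{
	unsigned missed = 0;

	while (ring->done != reported) {
		ring->status[ring->done] = -1; /* report missed: mark as failed */
		ring->done = (ring->done + 1) % RING_SIZE;
		missed++;
	}
	ring->status[reported] = 1; /* the entry the hardware actually reported */
	ring->done = (ring->done + 1) % RING_SIZE;
	return missed;
}
```

Without the loop, a single lost report would leave the "done" pointer permanently one entry behind the hardware.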
Please pull 'upstream-jgarzik-2' branch of wireless-2.6
Jeff,

This is additive on top of the pull request posted on Tuesday evening:

	http://marc.info/?l=linux-wireless&m=119985065704687&w=2

If you pull this one, you will get that one as well.

Please let me know if there are any problems!

Thanks,

John

---

Individual patches are available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream-jgarzik-2/

---

The following changes since commit deb27641a93290475f6c66b99d2fceabbc28d6fb:
  Michael Buesch (1):
        zd1211rw: fix alignment for QOS and WDS frames

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-jgarzik-2

John W. Linville (2):
      b43: finish removal of pio support
      iwlwifi: fix-up damage from rebase of namespace separation patches

Michael Buesch (3):
      b43: Add N-PHY register definitions
      b43: Fix PHY register routing
      b43: Remove the PHY spinlock

Pavel Roskin (1):
      hostap_cs: don't match revisions in presense of the MAC chip name

 drivers/net/wireless/b43/Makefile           |    1 +
 drivers/net/wireless/b43/b43.h              |   17 +-
 drivers/net/wireless/b43/debugfs.c          |    6 +-
 drivers/net/wireless/b43/lo.c               |   64 ++--
 drivers/net/wireless/b43/main.c             |    4 -
 drivers/net/wireless/b43/nphy.c             |   34 ++
 drivers/net/wireless/b43/nphy.h             |  706 +++
 drivers/net/wireless/b43/phy.c              |  235 +
 drivers/net/wireless/b43/phy.h              |   57 +--
 drivers/net/wireless/b43/pio.c              |  652 -
 drivers/net/wireless/b43/pio.h              |  153 --
 drivers/net/wireless/hostap/hostap_cs.c     |   15 +-
 drivers/net/wireless/iwlwifi/iwl3945-base.c |    9 +-
 drivers/net/wireless/iwlwifi/iwl4965-base.c |    9 +-
 14 files changed, 949 insertions(+), 1013 deletions(-)
 create mode 100644 drivers/net/wireless/b43/nphy.c
 create mode 100644 drivers/net/wireless/b43/nphy.h
 delete mode 100644 drivers/net/wireless/b43/pio.c
 delete mode 100644 drivers/net/wireless/b43/pio.h

Omnibus patch attached as upstream-jgarzik-2.patch.bz2

-- 
John W. Linville
[EMAIL PROTECTED]
iproute2: make max rounds in ip {neigh,addr} flush configurable.
On Thu, 2008-01-10 at 20:54 +0100, Andreas Henriksson wrote:
> An additional patch will be provided in a followup mail (not available in Debian) that was created at the request of Patrick McHardy. This one makes max rounds configurable (0 means try indefinitely, so you can restore the old behaviour).

In my opinion 10 tries should be enough for anyone, but here's the patch anyway. This one is on top of the patches in the previous mail.

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index ff9e318..232fd64 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -67,6 +67,7 @@ static void usage(void)
 	fprintf(stderr, "ip addr del IFADDR dev STRING\n");
 	fprintf(stderr, "ip addr {show|flush} [ dev STRING ] [ scope SCOPE-ID ]\n");
 	fprintf(stderr, " [ to PREFIX ] [ FLAG-LIST ] [ label PATTERN ]\n");
+	fprintf(stderr, " [ maxrounds N ]\n");
 	fprintf(stderr, "IFADDR := PREFIX | ADDR peer PREFIX\n");
 	fprintf(stderr, " [ broadcast ADDR ] [ anycast ADDR ]\n");
 	fprintf(stderr, " [ label STRING ] [ scope SCOPE-ID ]\n");
@@ -566,6 +567,7 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
 	struct nlmsg_list *l, *n;
 	char *filter_dev = NULL;
 	int no_link = 0;
+	unsigned maxrounds = MAX_ROUNDS;
 
 	ipaddr_reset_filter(oneline);
 	filter.showqueue = 1;
@@ -630,6 +632,10 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
 		} else if (strcmp(*argv, "label") == 0) {
 			NEXT_ARG();
 			filter.label = *argv;
+		} else if (strcmp(*argv, "maxrounds") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&maxrounds, *argv, 0))
+				invarg("maxrounds must be 0 (infinite) or higher", "maxrounds");
 		} else {
 			if (strcmp(*argv, "dev") == 0) {
 				NEXT_ARG();
@@ -669,7 +675,7 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
 		filter.flushp = 0;
 		filter.flushe = sizeof(flushb);
 
-		while (round < MAX_ROUNDS) {
+		while (maxrounds == 0 || round < maxrounds) {
 			if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
 				perror("Cannot send dump request");
 				exit(1);
@@ -696,7 +702,10 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
 				fflush(stdout);
 			}
 		}
-		fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n", MAX_ROUNDS); fflush(stderr);
+		fprintf(stderr,
+			"*** Flush remains incomplete after %u rounds. ***\n",
+			maxrounds);
+		fflush(stderr);
 		return 1;
 	}
 
diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index db684f5..61fac66 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -53,6 +53,7 @@ static void usage(void)
 			" [ nud { permanent | noarp | stale | reachable } ]\n"
 			" | proxy ADDR } [ dev DEV ]\n");
 	fprintf(stderr, "ip neigh {show|flush} [ to PREFIX ] [ dev DEV ] [ nud STATE ]\n");
+	fprintf(stderr, " [ maxrounds N ]\n");
 	exit(-1);
 }
 
@@ -321,6 +322,7 @@ int do_show_or_flush(int argc, char **argv, int flush)
 {
 	char *filter_dev = NULL;
 	int state_given = 0;
+	unsigned maxrounds = MAX_ROUNDS;
 
 	ipneigh_reset_filter();
 
@@ -361,6 +363,10 @@ int do_show_or_flush(int argc, char **argv, int flush)
 			if (state == 0)
 				state = 0x100;
 			filter.state |= state;
+		} else if (strcmp(*argv, "maxrounds") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&maxrounds, *argv, 0))
+				invarg("maxrounds must be 0 (infinite) or higher", "maxrounds");
 		} else {
 			if (strcmp(*argv, "to") == 0) {
 				NEXT_ARG();
@@ -392,7 +398,7 @@ int do_show_or_flush(int argc, char **argv, int flush)
 		filter.flushe = sizeof(flushb);
 		filter.state &= ~NUD_FAILED;
 
-		while (round < MAX_ROUNDS) {
+		while (maxrounds == 0 || round < maxrounds) {
 			if (rtnl_wilddump_request(&rth, filter.family, RTM_GETNEIGH) < 0) {
 				perror("Cannot send dump request");
 				exit(1);
@@ -418,8 +424,10 @@ int do_show_or_flush(int argc, char **argv, int flush)
 				fflush(stdout);
 			}
 		}
-		printf("*** Flush not complete bailing out after %d
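The following is a small sketch of the `maxrounds` semantics proposed here: 0 disables the limit, and any other value bounds the loop. It assumes a standalone parser in place of iproute2's `get_unsigned()`/`invarg()` helpers. `parse_maxrounds()` and `keep_flushing()` are hypothetical names. The condition in `keep_flushing()` is the same expression the patch puts in the `while`.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_ROUNDS 10

/* Parse the argument of "maxrounds N"; 0 is valid and means "no limit".
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_maxrounds(const char *arg, unsigned *out)
{
	char *end;
	unsigned long val;

	/* strtoul() would accept leading spaces and a minus sign; reject both. */
	if (arg == NULL || !isdigit((unsigned char)arg[0]))
		return -1;
	errno = 0;
	val = strtoul(arg, &end, 0);
	if (errno != 0 || *end != '\0' || val > UINT_MAX)
		return -1;
	*out = (unsigned)val;
	return 0;
}

/* Loop condition from the patch: 0 means keep going forever. */
static int keep_flushing(unsigned round, unsigned maxrounds)
{
	return maxrounds == 0 || round < maxrounds;
}
```

Treating 0 as "infinite" lets one unsigned option express both behaviours debated in this thread without a separate flag.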
Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
On Thu, Jan 10, 2008 at 09:51:44AM -0500, Andy Gospodarek wrote:
> That wasn't the only purpose, Herbert. Making sure that calls to dev_set_mac_address were called from process context was important at the time of the coding as well since at least the tg3 driver took locks that could not be taken reliably in soft-irq context. Michael Chan fixed this here:

Sure, but where do you call that function while holding the bond lock?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
Herbert Xu [EMAIL PROTECTED] wrote:

> On Thu, Jan 10, 2008 at 09:51:44AM -0500, Andy Gospodarek wrote:
>> That wasn't the only purpose, Herbert. Making sure that calls to dev_set_mac_address were called from process context was important at the time of the coding as well since at least the tg3 driver took locks that could not be taken reliably in soft-irq context. Michael Chan fixed this here:
>
> Sure, but where do you call that function while holding the bond lock?

If I recall correctly, the problem was that tg3, et al., did things that might sleep, and bonding was calling from a timer context, which couldn't sleep. It wasn't about the lock.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Re: Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card
[EMAIL PROTECTED] wrote:

> Yes, it's what I'm looking for. I don't understand how to change the arp_ip_target to the gateway; arp_ip_target is a module option.

If you're running a relatively recent bonding driver (version 3.0.0 or later), the arp_ip_targets can be changed on the fly via sysfs, e.g.,

	echo +10.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target
	echo -20.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target

You can check out Documentation/networking/bonding.txt (in the kernel source code) for more details.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

> ----- Original Message -----
> From: Jay Vosburgh [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Cc: netdev@vger.kernel.org
> Sent: Thursday, January 10, 2008, 00:26:38
> Subject: Re: Re : Re : Re : Bonding : Monitoring of 4965 wireless card
>
> [EMAIL PROTECTED] wrote:
>
>> I mean that instead of having ARP test an IP on the LAN or elsewhere, I want it to test 127.0.0.1, but in order to do this it must go out and re-enter, and then use wlan0 to go out.
>
> In other words, what I think you're saying (and I'm not entirely sure here) is that you want probes to go to a remote node on the network, and back, without having to actually know the identity of the remote node (because, presumably, on a roaming type of wireless configuration, your gateway and whatnot can change from time to time). Is that what you're looking for?
>
> That isn't available now, but it might be straightforward to plug into the address update system to keep the arp_ip_target set to the current gateway as the gateway changes. I haven't looked into the details of doing that, but in theory it sounds straightforward.
>
> 	-J
>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
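To track a changing gateway from a program rather than a shell, the same sysfs writes shown above can be issued directly. This is a sketch under several assumptions:

* The bond is named `bond0`.
* The driver accepts one `+ADDR` or `-ADDR` command per write, as the `echo` examples suggest.
* `arp_target_cmd()` and `arp_target_follow_gateway()` are hypothetical helpers.

Discovering the current gateway is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Path from the echo examples; assumes the bond is named bond0. */
#define ARP_TARGET_PATH "/sys/class/net/bond0/bonding/arp_ip_target"

/* Send one "+ADDR" (add) or "-ADDR" (remove) command, with a separate
 * open/write/close per command, like separate echo invocations.
 * Returns 0 on success, -1 on error. */
static int arp_target_cmd(const char *path, char op, const char *addr)
{
	char buf[64];
	int len, fd;
	ssize_t n;

	if (op != '+' && op != '-')
		return -1;
	len = snprintf(buf, sizeof(buf), "%c%s\n", op, addr);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;
	fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -1;
	n = write(fd, buf, (size_t)len);
	if (close(fd) != 0 || n != len)
		return -1;
	return 0;
}

/* Hypothetical helper: point the ARP monitor at a new gateway.
 * The new target is added before the old one is removed. */
static int arp_target_follow_gateway(const char *path, const char *old_gw,
				     const char *new_gw)
{
	if (arp_target_cmd(path, '+', new_gw) != 0)
		return -1;
	return old_gw ? arp_target_cmd(path, '-', old_gw) : 0;
}
```

A call such as `arp_target_follow_gateway(ARP_TARGET_PATH, "20.0.0.1", "10.0.0.1")` would mirror the two `echo` lines. Adding the new target before removing the old one avoids a moment with no ARP target configured. Whether that matters for a given bonding mode is worth checking against bonding.txt.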
Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
Andy Gospodarek [EMAIL PROTECTED] wrote:
[...]
> That wasn't the only purpose, Herbert. Making sure that calls to dev_set_mac_address were called from process context was important at the time of the coding as well since at least the tg3 driver took locks that could not be taken reliably in soft-irq context. Michael Chan fixed this here:
>
> commit 986e0aeb9ae09127b401c3baa66f15b7a31f354c
> Author: Michael Chan [EMAIL PROTECTED]
> Date:   Sat May 5 12:10:20 2007 -0700
>
>     [TG3]: Remove reset during MAC address changes.
>
> so it wasn't as much of an issue after that, but moving as much of the code to process context was important for that as well (hence the move to not continue to try to not use bh-locks everywhere).

Well, not for tg3 perhaps, but other network device drivers do the same thing (if memory serves, any USB Ethernet adapter will have issues there).

Also, I believe the netlink notifier callback, rtnetlink_event, which every dev_set_whatever calls, does a possibly-sleeping memory allocation (rtmsg_ifinfo -> nlmsg_new -> alloc_skb(GFP_KERNEL)); so we don't really want to hold extra locks for that, either.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
On Thu, Jan 10, 2008 at 04:03:53PM -0500, Andy Gospodarek wrote:
> >> Sure, but where do you call that function while holding the bond lock?
> >
> > If I recall correctly, the problem was that tg3, et al., did things that might sleep, and bonding was calling from a timer context, which couldn't sleep. It wasn't about the lock.
>
> Exactly, I was just about to post the same.

In other words, changing read_lock on bond->lock to read_lock_bh doesn't affect this one bit.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH take2] Re: Nested VLAN causes recursive locking error
On Thu, Jan 10, 2008 at 04:31:22PM +0100, Patrick McHardy wrote:
...
> No, this seems fine, thanks. Even better would be a way to get the last lockdep subclass through lockdep somehow, but I couldn't find a clean way for this. So I've applied your patch and also fixed macvlan.

As a matter of fact, this simplified version was done mainly to remove the bad-looking effect of a global counter that was never decremented. Of course, your proposal of using the parent's subclass + 1 would be better if deeper nesting is required, so I could try to enhance this (probably with such an additional lockdep macro) after some hint.

But some 'quirks' are still possible there: removing and adding devices 'properly' would often require resetting many subclasses, so quite a lot of activity with more devices. And such nesting is probably not very common, since nobody has requested it until now...

Thanks,
Jarek P.
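The following toy model illustrates the "parent's subclass + 1" idea. It assumes each stacked device only needs to know the device directly below it. `struct netdev` and `lock_subclass()` are hypothetical. In the kernel, the resulting number would be handed to lockdep as a subclass (for example via spin_lock_nested()). Lockdep only supports a small fixed number of subclasses (MAX_LOCKDEP_SUBCLASSES, 8).

```c
#include <assert.h>
#include <stddef.h>

/* Lockdep's limit on subclasses (MAX_LOCKDEP_SUBCLASSES). */
#define MAX_SUBCLASSES 8

/* Hypothetical device model: a VLAN-style device sits on a lower device. */
struct netdev {
	const char *name;
	struct netdev *lower; /* NULL for a physical device */
};

/* Subclass = nesting depth, i.e. the lower device's subclass + 1.
 * It is derived from the device stack on demand, so nothing global has
 * to be incremented or reset when devices come and go.
 * Returns -1 when the nesting is deeper than lockdep can track. */
static int lock_subclass(const struct netdev *dev)
{
	int depth = 0;

	for (const struct netdev *d = dev->lower; d != NULL; d = d->lower)
		depth++;
	return depth < MAX_SUBCLASSES ? depth : -1;
}
```

Because the value is recomputed from the stack rather than stored, removing and re-adding a device does not require resetting anything else.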
Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
On Thu, Jan 10, 2008 at 12:50:46PM -0800, Jay Vosburgh wrote:
> Herbert Xu [EMAIL PROTECTED] wrote:
>
> > On Thu, Jan 10, 2008 at 09:51:44AM -0500, Andy Gospodarek wrote:
> >> That wasn't the only purpose, Herbert. Making sure that calls to dev_set_mac_address were called from process context was important at the time of the coding as well since at least the tg3 driver took locks that could not be taken reliably in soft-irq context. Michael Chan fixed this here:
> >
> > Sure, but where do you call that function while holding the bond lock?
>
> If I recall correctly, the problem was that tg3, et al., did things that might sleep, and bonding was calling from a timer context, which couldn't sleep. It wasn't about the lock.

Exactly, I was just about to post the same.
RE: e1000 performance issue in 4 simultaneous links
Breno Leitao wrote:
> When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec of transfer rate. If I run 4 netperf against 4 different interfaces, I get around 720 * 10^6 bits/sec.

This is actually a known issue that we have worked on with your company before. It comes down to your system's default behavior of round-robining interrupts (see cat /proc/interrupts while running the test) combined with e1000's way of exiting/rescheduling NAPI.

The default round-robin behavior of the interrupts on your system is the root cause of this issue, and here is what happens. 4 interfaces start generating interrupts; if you're lucky, the round-robin balancer has them all on different CPUs. As the e1000 driver goes into and out of polling mode, the round-robin balancer keeps moving the interrupt to the next CPU. Eventually 2 or more driver instances end up on the same CPU, which causes both driver instances to stay in NAPI polling mode, because of the amount of work being done and because there are always more than netdev->weight packets to process for each instance. This keeps *hardware* interrupts for each interface *disabled*.

Staying in NAPI polling mode causes higher CPU utilization on that one processor, which guarantees that when the hardware round-robin balancer moves any other network interrupt onto that CPU, it too will join the NAPI polling mode chain. So no matter how many processors you have, with this round-robin style of hardware interrupts, if there is a lot of work to do (more than weight) at each softirq, all network interfaces are guaranteed to end up on the same CPU eventually (the busiest one). Your performance becomes the same as if you had booted with maxcpus=1.

I hope this explanation makes sense, but what it comes down to is that combining hardware round-robin balancing with NAPI is a BAD IDEA.
In general, hardware round-robin balancing behaves badly, and I'm sure it is causing all sorts of other performance issues that you may not even be aware of. I'm sure your problem will go away if you run e1000 in interrupt mode (use make CFLAGS_EXTRA=-DE1000_NO_NAPI).

> If I run the same test against 2 interfaces I get a 940 * 10^6 bits/sec transfer rate also, and if I run it against 3 interfaces I get around 850 * 10^6 bits/sec performance.
>
> I got these results using the upstream netdev-2.6 branch kernel plus David Miller's 7 NAPI patches set[1]. In the kernel 2.6.23.12 the result is a bit worse, and the transfer rate was around 600 * 10^6 bits/sec.

Thank you for testing the latest kernel.org kernel.

Hope this helps.
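The following toy model illustrates the NAPI mechanism described above. It is not e1000 code. It assumes a fixed per-pass budget (`weight`) and a steady arrival rate. Each pass processes at most `weight` packets. Only a pass that finishes under budget leaves polling mode and re-enables the interrupt. An instance that always has at least `weight` packets waiting therefore never does.

```c
#include <assert.h>

/* Toy state for one NAPI instance (hypothetical, not struct napi_struct). */
struct napi_model {
	unsigned backlog; /* packets waiting in the RX ring */
	int irq_enabled;  /* 0 while the instance is in polling mode */
};

/* One softirq pass: process up to 'weight' packets, while 'arrivals' new
 * packets land in the ring. Returns the work done in this pass. */
static unsigned poll_once(struct napi_model *n, unsigned weight,
			  unsigned arrivals)
{
	unsigned work = n->backlog < weight ? n->backlog : weight;

	n->backlog = n->backlog - work + arrivals;
	if (work < weight)
		n->irq_enabled = 1; /* under budget: complete, unmask the IRQ */
	else
		n->irq_enabled = 0; /* budget used up: stay on the poll list */
	return work;
}
```

Under sustained load, the busy instance stays in polling mode indefinitely. In the scenario above, that is what pins the interrupt (and eventually its neighbours) to one CPU.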
Re: Please pull 'fixes-jgarzik' branch of wireless-2.6
On Thu, Jan 10, 2008 at 02:49:22PM -0500, John W. Linville wrote:
> Jeff,
>
> A couple more fixes for 2.6.24. The one from Mattias Nissler is already in your upstream tree...FYI.
>
> Let me know if there are problems!

Please disregard this request. The 'upstream-jgarzik-2' request is still valid.

Thanks,

John

-- 
John W. Linville
[EMAIL PROTECTED]
Re: [PATCH take2] Re: Nested VLAN causes recursive locking error
On Thu, Jan 10, 2008 at 10:08:16PM +0100, Jarek Poplawski wrote:
...
> But some 'quirks' are still possible there: removing and adding devices 'properly' would often require resetting many subclasses,

...Hmm, they are probably always removed from/with the children, so then there's no problem! (I know, I could've checked...)

Jarek P.
Re: questions on NAPI processing latency and dropped network packets
James Chapman wrote:

> What's changed in your application? Any real-time threads in there? From the top output below, looks like SigtranServices is consuming all your CPU...

There are two CPUs, and SigtranServices is multithreaded with many threads. Most of these threads are affined to cpu0, a couple to cpu1. None of the threads are realtime. Top is showing 37% idle on cpu0 and 6% idle on cpu1, so not all the CPU is being consumed.

However, I'm wondering if we're hitting bursts and are just running out of time. I'm going to try a system with MAX_SOFTIRQ_RESTART bumped up a bit, and also enable profiling.

Chris
Re: [PATCH 2.6.23+] ingress classify to [nf]mark
On Thu, 2008-10-01 at 17:05 -0200, Dzianis Kahanovich wrote:
> To classid x:y = mark=mark<<x|y (classid :y = -j MARK --set-mark y, etc).
>
> --- linux-2.6.23-gentoo-r2/net/sched/Kconfig
> +++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
> @@ -222,6 +222,16 @@
[..]
>  		skb->tc_index = TC_H_MIN(res.classid);
> +#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
> +		skb->mark = (skb->mark<<(res.classid>>16))|TC_H_MIN(res.classid);
> +#endif
>  	default:

Please either use the ipt action and netfilter fwmarker for this activity, or create a new action. If you choose the latter (for example, because you want to compute the mark dynamically), look at net/sched/act_simple.c as a starting point; I can help you if you have any questions.

If you want to use the ipt action, the syntax would be something like:

---
tc qdisc add dev XXX ingress
tc filter add dev XXX parent : protocol ip prio 5 \
	u32 blah bleh \
	flowid 1:12 action ipt -j mark --set-mark 13
---

cheers,
jamal
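The mapping computed by the quoted patch can be checked in isolation. This sketch redefines the major/minor masks locally so it builds without kernel headers. They use the same values as TC_H_MAJ()/TC_H_MIN() in <linux/pkt_sched.h>. `classid_to_mark()` is a hypothetical name. Note that tc classids are hexadecimal, so `flowid 1:12` means major 0x1, minor 0x12.

```c
#include <assert.h>
#include <stdint.h>

/* Same masks as TC_H_MAJ()/TC_H_MIN() in <linux/pkt_sched.h>. */
#define H_MAJ(h) ((h) & 0xFFFF0000U)
#define H_MIN(h) ((h) & 0x0000FFFFU)

/* classid x:y turns the mark into (mark << x) | y, as in the quoted
 * one-liner. Shifting a 32-bit value by 32 or more is undefined in C,
 * so this sketch drops the old mark in that case. */
static uint32_t classid_to_mark(uint32_t mark, uint32_t classid)
{
	uint32_t shift = H_MAJ(classid) >> 16;
	uint32_t base = shift < 32 ? mark << shift : 0;

	return base | H_MIN(classid);
}
```

The guard for large shift counts is an addition in this sketch. The quoted one-liner shifts by the major number unconditionally. As jamal suggests, a dedicated action (or the ipt action) keeps this kind of policy out of the ingress qdisc.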
Please pull 'fixes-jgarzik' branch of wireless-2.6 (use this one)
[2nd try -- it turns out the Mattias Nissler patch needed an extra tweak. It will probably also cause build breakage when you rebase, since rt2x00lib_txdone(...) becomes rt2x00pci_txdone(rt2x00dev,...) in 2.6.25, so FYI... :-) This also includes another patch (the 4-byte boundary one) which is already in your upstream branch, so beware of that one too. :-)]

Jeff,

Three more fixes for 2.6.24. The one from Mattias Nissler is already in your upstream tree...FYI.

Let me know if there are problems!

Thanks,

John

---

Individual patches available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/fixes-jgarzik/

---

The following changes since commit 3ce54450461bad18bbe1f9f5aa3ecd2f8e8d1235:
  Linus Torvalds (1):
        Linux 2.6.24-rc7

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git fixes-jgarzik

Ivo van Doorn (2):
      rt2x00: Corectly initialize rt2500usb MAC
      rt2x00: Put 802.11 data on 4 byte boundary

Mattias Nissler (1):
      rt2x00: Allow rt61 to catch up after a missing tx report

 drivers/net/wireless/rt2x00/rt2500usb.c |    2 +-
 drivers/net/wireless/rt2x00/rt2x00pci.c |   20
 drivers/net/wireless/rt2x00/rt2x00usb.c |   17 +++--
 drivers/net/wireless/rt2x00/rt61pci.c   |   12
 4 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/rt2x00/rt2500usb.c b/drivers/net/wireless/rt2x00/rt2500usb.c
index 50775f9..18b1f91 100644
--- a/drivers/net/wireless/rt2x00/rt2500usb.c
+++ b/drivers/net/wireless/rt2x00/rt2500usb.c
@@ -257,7 +257,7 @@ static const struct rt2x00debug rt2500usb_rt2x00debug = {
 static void rt2500usb_config_mac_addr(struct rt2x00_dev *rt2x00dev,
 				      __le32 *mac)
 {
-	rt2500usb_register_multiwrite(rt2x00dev, MAC_CSR2, &mac,
+	rt2500usb_register_multiwrite(rt2x00dev, MAC_CSR2, mac,
 				      (3 * sizeof(__le16)));
 }
 
diff --git a/drivers/net/wireless/rt2x00/rt2x00pci.c b/drivers/net/wireless/rt2x00/rt2x00pci.c
index 2780df0..6d5d9ab 100644
--- a/drivers/net/wireless/rt2x00/rt2x00pci.c
+++ b/drivers/net/wireless/rt2x00/rt2x00pci.c
@@ -124,7 +124,10 @@ void rt2x00pci_rxdone(struct rt2x00_dev *rt2x00dev)
 	struct data_entry *entry;
 	struct data_desc *rxd;
 	struct sk_buff *skb;
+	struct ieee80211_hdr *hdr;
 	struct rxdata_entry_desc desc;
+	int header_size;
+	int align;
 	u32 word;
 
 	while (1) {
@@ -138,17 +141,26 @@ void rt2x00pci_rxdone(struct rt2x00_dev *rt2x00dev)
 		memset(&desc, 0x00, sizeof(desc));
 		rt2x00dev->ops->lib->fill_rxdone(entry, &desc);
 
+		hdr = (struct ieee80211_hdr *)entry->data_addr;
+		header_size =
+		    ieee80211_get_hdrlen(le16_to_cpu(hdr->frame_control));
+
+		/*
+		 * The data behind the ieee80211 header must be
+		 * aligned on a 4 byte boundary.
+		 */
+		align = NET_IP_ALIGN + (2 * (header_size % 4 == 0));
+
 		/*
 		 * Allocate the sk_buffer, initialize it and copy
 		 * all data into it.
 		 */
-		skb = dev_alloc_skb(desc.size + NET_IP_ALIGN);
+		skb = dev_alloc_skb(desc.size + align);
 		if (!skb)
 			return;
 
-		skb_reserve(skb, NET_IP_ALIGN);
-		skb_put(skb, desc.size);
-		memcpy(skb->data, entry->data_addr, desc.size);
+		skb_reserve(skb, align);
+		memcpy(skb_put(skb, desc.size), entry->data_addr, desc.size);
 
 		/*
 		 * Send the frame to rt2x00lib for further processing.
diff --git a/drivers/net/wireless/rt2x00/rt2x00usb.c b/drivers/net/wireless/rt2x00/rt2x00usb.c
index 1f5675d..ab4797e 100644
--- a/drivers/net/wireless/rt2x00/rt2x00usb.c
+++ b/drivers/net/wireless/rt2x00/rt2x00usb.c
@@ -221,7 +221,9 @@ static void rt2x00usb_interrupt_rxdone(struct urb *urb)
 	struct data_ring *ring = entry->ring;
 	struct rt2x00_dev *rt2x00dev = ring->rt2x00dev;
 	struct sk_buff *skb;
+	struct ieee80211_hdr *hdr;
 	struct rxdata_entry_desc desc;
+	int header_size;
 	int frame_size;
 
 	if (!test_bit(DEVICE_ENABLED_RADIO, &rt2x00dev->flags) ||
@@ -253,9 +255,20 @@ static void rt2x00usb_interrupt_rxdone(struct urb *urb)
 	skb_put(skb, frame_size);
 
 	/*
-	 * Trim the skb_buffer to only contain the valid
-	 * frame data (so ignore the device's descriptor).
+	 * The data behind the ieee80211 header must be
+	 * aligned on a 4 byte boundary.
+	 * After that trim the entire buffer down to only
+	 * contain the valid frame data excluding the device
+	 * descriptor.
 	 */
+	hdr = (struct ieee80211_hdr *)entry->skb->data;
+
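The alignment rule in the rt2x00pci hunk can be checked against a few common 802.11 header lengths: 24 (data), 26 (QoS data), 30 (four-address) and 32 (four-address QoS). This sketch makes two assumptions: NET_IP_ALIGN is 2, the usual value, and the buffer itself starts 4-byte aligned. `rx_align()` and `payload_offset()` are hypothetical helpers wrapping the patch's expression.

```c
#include <assert.h>

/* Assumed value; NET_IP_ALIGN is 2 on most architectures. */
#define NET_IP_ALIGN 2

/* Headroom the patch reserves before copying a received frame. */
static int rx_align(int header_size)
{
	return NET_IP_ALIGN + (2 * (header_size % 4 == 0));
}

/* Offset of the data behind the 802.11 header from the buffer start. */
static int payload_offset(int header_size)
{
	return rx_align(header_size) + header_size;
}
```

Headers whose length is a multiple of 4 get 4 bytes of headroom, and the others get 2, so the payload always starts on a 4-byte boundary. If an architecture defined NET_IP_ALIGN as 0, the same expression would leave the payload 2 bytes off for these headers. The formula therefore appears to rely on the value being 2.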
Re: AF_UNIX MSG_PEEK bug?
Here's what I think is a better patch. Or maybe just a simpler one.

However, I'm still unsure what effect this patch might have on file descriptor passing. Reading the prior code, and the parallel portions/comments in unix_dgram_recvmsg(), it looks like there's been a lot of uncertainty as to how file descriptor passing should be handled during MSG_PEEK operations. To quote:

	/* It is questionable: on PEEK we could:
	   - do not return fds - good, but too simple 8)
	   - return fds, and do not return them on read (old strategy,
	     apparently wrong)
	   - clone fds (I chose it for now, it is the most universal
	     solution)

	   POSIX 1003.1g does not actually define this clearly
	   at all. POSIX 1003.1g doesn't define a lot of things
	   clearly however!
	 */

With this patch, passed file descriptors are ignored during MSG_PEEK. This is essentially the first case in the comment above. What I can't figure out is why this is incorrect. I suspect there's some history here that I can't find via Google, mailing list archives, or revision logs.

So, that said, here's a cleaner patch. It's still not ready to be applied until file descriptor passing is better understood.
Thanks,

Brent

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 060bba4..6d6cdb4 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1750,6 +1750,8 @@ static int unix_stream_recvmsg(struct ki
 	int target;
 	int err = 0;
 	long timeo;
+	struct sk_buff *skb;
+	struct sk_buff_head peek_stack;
 
 	err = -EINVAL;
 	if (sk->sk_state != TCP_ESTABLISHED)
@@ -1759,6 +1761,9 @@ static int unix_stream_recvmsg(struct ki
 	if (flags&MSG_OOB)
 		goto out;
 
+	if (flags & MSG_PEEK)
+		skb_queue_head_init(&peek_stack);
+
 	target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
 	timeo = sock_rcvtimeo(sk, flags&MSG_DONTWAIT);
 
@@ -1778,7 +1783,6 @@ static int unix_stream_recvmsg(struct ki
 
 	do {
 		int chunk;
-		struct sk_buff *skb;
 
 		unix_state_lock(sk);
 		skb = skb_dequeue(&sk->sk_receive_queue);
@@ -1864,19 +1868,14 @@ static int unix_stream_recvmsg(struct ki
 
 			if (siocb->scm->fp)
 				break;
-		}
-		else
-		{
-			/* It is questionable, see note in unix_dgram_recvmsg.
-			 */
-			if (UNIXCB(skb).fp)
-				siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+		} else
+			__skb_queue_head(&peek_stack, skb);
+	} while (size);
 
-			/* put message back and return */
+	/* Push all peeked skbs back onto receive queue */
+	if (flags & MSG_PEEK)
+		while ((skb = __skb_dequeue(&peek_stack)))
 			skb_queue_head(&sk->sk_receive_queue, skb);
-			break;
-		}
-	} while (size);
 
 	mutex_unlock(&u->readlock);
 	scm_recv(sock, msg, siocb->scm, flags);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
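The core of Brent's patch is an order-preserving peek: skbs are dequeued onto a private `peek_stack` and then pushed back onto the head of the receive queue. A minimal userspace model shows why the final order matches the original. It is a sketch only: `struct buf`, `struct queue` and `peek_n()` are hypothetical stand-ins for `struct sk_buff`, `struct sk_buff_head` and the recvmsg loop, with no locking and no file-descriptor handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked queue standing in for sk_buff_head
 * (the kernel list is doubly linked and locked). */
struct buf {
	int id;
	struct buf *next;
};

struct queue {
	struct buf *head;
};

/* Analogue of __skb_queue_head(): push at the front. */
static void queue_head(struct queue *q, struct buf *b)
{
	b->next = q->head;
	q->head = b;
}

/* Analogue of skb_dequeue(): pop from the front, NULL if empty. */
static struct buf *dequeue(struct queue *q)
{
	struct buf *b = q->head;

	if (b)
		q->head = b->next;
	return b;
}

/* Peek at up to n buffers: move them onto a private LIFO stack, then
 * push them back onto the head of the receive queue. The stack reverses
 * the order and head insertion reverses it again, so the receive queue
 * ends up as it was. Returns the number of buffers examined. */
static int peek_n(struct queue *rx, int n)
{
	struct queue peek_stack = { NULL };
	struct buf *b;
	int seen = 0;

	assert(n >= 0);
	while (seen < n && (b = dequeue(rx)) != NULL) {
		/* ...copy b's data to the caller here... */
		queue_head(&peek_stack, b);
		seen++;
	}
	while ((b = dequeue(&peek_stack)) != NULL)
		queue_head(rx, b);
	return seen;
}
```

Peeking at two of three queued buffers A, B, C leaves the stack as B, A; pushing B and then A back onto the head restores A, B, C.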
Re: AF_UNIX MSG_PEEK bug?
> and the parallel portions/comments in unix_dgram_recvmsg(), it looks like
> there's been a lot of uncertainty as to how file descriptor passing should
> be handled during MSG_PEEK operations. To quote:

The specs basically don't answer the question. What is critical is that the behaviour does not change compared with older Linux releases.
Re : Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card
I tried ARP monitoring, but it doesn't work! To probe an IP, the interface must have an address, but dhcpcd is only launched by ifplugd once bond0 has link... so it goes around in circles.

So I went back to miimon, and found that bonding detects when wlan0 is associated and makes it the active interface. But when I toggle rf_kill, it doesn't react. So I tried deassociating and, like magic, it detects the interface as down! I presume it is a bug in the wlan driver, which does not reinitialise the link state of the wlan interface. So I wrote a small ACPI script to provide that behavior.

----- Original message -----
From: Jay Vosburgh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 10 January 2008, 21:59:20
Subject: Re: Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card

[EMAIL PROTECTED] wrote:
> Yes, it's what I'm looking for. I don't understand how to change the
> arp_ip_target to the gateway; arp_ip_target is a module option.

If you're running a relatively recent bonding driver (version 3.0.0 or later), the arp_ip_targets can be changed on the fly via sysfs, e.g.,

echo +10.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target
echo -20.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target

You can check out Documentation/networking/bonding.txt (in the kernel source code) for more details.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

----- Original message -----
From: Jay Vosburgh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 10 January 2008, 00:26:38
Subject: Re: Re : Re : Re : Bonding : Monitoring of 4965 wireless card

[EMAIL PROTECTED] wrote:
> I mean that instead of ARP testing an IP on the LAN or elsewhere, I want it
> to test 127.0.0.1, but to do this it must go out and re-enter, and then use
> wlan0 to go out.
In other words, what I think you're saying (and I'm not entirely sure here) is that you want probes to go to a remote node on the network, and back, without having to actually know the identity of the remote node (because, presumably, on a roaming type of wireless configuration, your gateway and whatnot can change from time to time). Is that what you're looking for?

That isn't available now, but might be straightforward to plug into the address update system to keep the arp_ip_target up to date as the current gateway as the gateway changes. I haven't looked into the details of doing that, but in theory it sounds straightforward.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
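Jay's two suggestions (point arp_ip_target at the current gateway, and change targets at runtime through the bonding sysfs file) can be combined in a small userspace helper, for example one called from the ACPI or ifplugd hooks mentioned in this thread. The following is a sketch under stated assumptions, not part of the bonding driver: it assumes the /proc/net/route column layout (Iface, Destination, Gateway, Flags, ... in hex), RTF_GATEWAY == 0x0002, and the `bond0` path from Jay's example; `parse_default_gw()`, `format_target()` and `write_sysfs()` are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_RTF_GATEWAY 0x0002 /* RTF_GATEWAY from <linux/route.h> */

/* Parse one data line of /proc/net/route. For a default route with a
 * gateway, write the gateway as a dotted quad into ip and return 1;
 * otherwise return 0. The kernel prints the raw __be32 address with
 * %08X, so storing the parsed value back into s_addr recovers the
 * network-order bytes on either endianness. */
int parse_default_gw(const char *line, char *ip, size_t len)
{
	char iface[32];
	unsigned int dest, gw, flags;
	struct in_addr addr;

	assert(ip != NULL && len >= INET_ADDRSTRLEN);
	if (sscanf(line, "%31s %x %x %x", iface, &dest, &gw, &flags) != 4)
		return 0;	/* header line or malformed input */
	if (dest != 0 || !(flags & SKETCH_RTF_GATEWAY))
		return 0;	/* not a default route via a gateway */
	addr.s_addr = gw;
	return inet_ntop(AF_INET, &addr, ip, len) != NULL;
}

/* Build "+a.b.c.d\n" (add) or "-a.b.c.d\n" (remove), as echo(1) would
 * write it. Returns 0 on success, -1 if buf is too small. */
int format_target(char *buf, size_t n, char op, const char *ip)
{
	int r = snprintf(buf, n, "%c%s\n", op, ip);

	return (r < 0 || (size_t)r >= n) ? -1 : 0;
}

/* Write a command to a sysfs attribute such as
 * /sys/class/net/bond0/bonding/arp_ip_target (requires root). */
int write_sysfs(const char *path, const char *cmd)
{
	size_t len = strlen(cmd);
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fwrite(cmd, 1, len, f) == len;
	ok = (fclose(f) == 0) && ok;
	return ok ? 0 : -1;
}
```

On a gateway change, a hook could remove the old target with a `'-'` command and add the new one with `'+'`, mirroring the two echo commands above. Whether the bonding driver of a given version accepts a trailing newline is worth confirming against Documentation/networking/bonding.txt.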
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
Eric Dumazet wrote, On 01/09/2008 11:37 AM:
...
> [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
...
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index d337706..28484f3 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
>  			break;
>  		rcu_read_unlock_bh();
>  	}
> -	return r;
> +	return rcu_dereference(r);
>  }
>
>  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  {
> -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> +	struct rt_cache_iter_state *st = seq->private;
>
>  	r = r->u.dst.rt_next;
>  	while (!r) {
> @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  		rcu_read_lock_bh();
>  		r = rt_hash_table[st->bucket].chain;
>  	}
> -	return r;
> +	return rcu_dereference(r);
>  }

It seems this optimization could have a side effect: if updates are done during such a loop, and r is seen as !NULL during the while() check but NULL after rcu_dereference(), the listing/counting could stop too soon. So, IMHO, the first version of this patch is probably more reliable. (Alternatively, an additional check is needed before the return.)

Regards,
Jarek P.
[PATCH 1/4] ipg: balance locking in irq handler
Spotted-by: [EMAIL PROTECTED]
Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/ipg.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..cd1650e 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1630,6 +1630,8 @@ static irqreturn_t ipg_interrupt_handler(int irq, void *dev_inst)
 #ifdef JUMBO_FRAME
 	ipg_nic_rxrestore(dev);
 #endif
+	spin_lock(&sp->lock);
+
 	/* Get interrupt source information, and acknowledge
 	 * some (i.e. TxDMAComplete, RxDMAComplete, RxEarly,
 	 * IntRequested, MacControlFrame, LinkEvent) interrupts
@@ -1647,9 +1649,7 @@ static irqreturn_t ipg_interrupt_handler(int irq, void *dev_inst)
 	handled = 1;
 
 	if (unlikely(!netif_running(dev)))
-		goto out;
-
-	spin_lock(&sp->lock);
+		goto out_unlock;
 
 	/* If RFDListEnd interrupt, restore all used RFDs. */
 	if (status & IPG_IS_RFD_LIST_END) {
@@ -1733,9 +1733,9 @@ out_enable:
 	ipg_w16(IPG_IE_TX_DMA_COMPLETE | IPG_IE_RX_DMA_COMPLETE |
 		IPG_IE_HOST_ERROR | IPG_IE_INT_REQUESTED | IPG_IE_TX_COMPLETE |
 		IPG_IE_LINK_EVENT | IPG_IE_UPDATE_STATS, INT_ENABLE);
-
+out_unlock:
 	spin_unlock(&sp->lock);
-out:
+
 	return IRQ_RETVAL(handled);
 }
 
--
1.5.3.3
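Patch 1/4 moves spin_lock() ahead of the interrupt-status read and sends the !netif_running() exit to a new out_unlock label placed just before spin_unlock(), so every exit taken after the lock is acquired passes through the unlock. That single-exit shape can be modelled in userspace with a lock that asserts on misuse. This is a heavily simplified sketch: `struct sketch_lock`, `irq_handler()` and the "not our interrupt" branch are hypothetical, and the real handler does far more between the labels.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for spinlock_t that records whether it is held,
 * so the sketch can check that every exit path releases it. */
struct sketch_lock {
	int held;
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	assert(!l->held);	/* would deadlock on a real spinlock */
	l->held = 1;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	assert(l->held);	/* unlocking an unheld lock is a bug */
	l->held = 0;
}

/* Control-flow shape of ipg_interrupt_handler() after patch 1/4: the
 * lock is taken before the status is examined, and every later exit
 * funnels through the unlock. Returns 1 if the interrupt was handled. */
static int irq_handler(struct sketch_lock *l, unsigned int status, bool running)
{
	int handled = 0;

	sketch_lock_acquire(l);

	if (!status)		/* hypothetical "not our interrupt" check */
		goto out_enable;
	handled = 1;

	if (!running)		/* !netif_running(dev) in the driver */
		goto out_unlock;

	/* ...Rx/Tx servicing would go here... */

out_enable:
	/* ...interrupt sources would be re-enabled here... */
out_unlock:
	sketch_lock_release(l);
	return handled;
}
```

Because the assertions fire on a double lock or an unbalanced unlock, calling the sketch once per path is enough to check that each exit leaves the lock released.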
[PATCH 0/4] Pull request for 'ipg-fixes' branch
Please pull from branch 'ipg-fixes' in repository

git://git.kernel.org/pub/scm/linux/kernel/git/romieu/netdev-2.6.git ipg-fixes

to get the changes below.

I have tested the driver with a PIV HT based motherboard. The network controller is connected to a fast ethernet switch (yeah, I'm cheap). A second host is performing two loops of scp in both directions for a 400Mb file. The files are sha1sumed after each scp. I added a 'ping -q -f -l16' from the computer under test after some time.

After 35 copies from the computer under test and 28 copies to it (the ping eats a bit):

             total       used       free     shared    buffers     cached
Mem:       1019040    1003860      15180          0      20556     936792
-/+ buffers/cache:      46512     972528
Swap:      2031608          0    2031608

Before:

             total       used       free     shared    buffers     cached
Mem:       1019040     572036     447004          0      14988     525924
-/+ buffers/cache:      31124     987916
Swap:      2031608          0    2031608

/proc/slabinfo before and after the test are attached.

The driver is still a POMS but it seems better now.

I will not be available to work further on this issue before Sunday. I'd appreciate being Cc'ed though.
Distance from 'net-2.6/master' (27d1cba21fcc50c37eef5042c6be9fa7135e88fc)
-------------------------------------------------------------------------

286c83ce6e8263a5c4c55a57b4c1040800de0171
d42f3afc953f9c99ffe84667a3ecf0d3b69f3d64
358bf4b8e8cbde5d6411b219e93a61728c892685
a58cceed4464ba8ae94294184c15f43e92a5de89

Diffstat
--------

 drivers/net/ipg.c |   36 ++++++++++++------------------------
 1 files changed, 12 insertions(+), 24 deletions(-)

Shortlog
--------

Francois Romieu (4):
      ipg: balance locking in irq handler
      ipg: plug Tx completion leak
      ipg: fix queue stop condition in the xmit handler
      ipg: fix Tx completion irq request

Patch
-----

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..50f0c17 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -857,21 +857,14 @@ static void init_tfdlist(struct net_device *dev)
 static void ipg_nic_txfree(struct net_device *dev)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
-	void __iomem *ioaddr = sp->ioaddr;
-	unsigned int curr;
-	u64 txd_map;
-	unsigned int released, pending;
-
-	txd_map = (u64)sp->txd_map;
-	curr = ipg_r32(TFD_LIST_PTR_0) -
-		do_div(txd_map, sizeof(struct ipg_tx)) - 1;
+	unsigned int released, pending, dirty;
 
 	IPG_DEBUG_MSG("_nic_txfree\n");
 
 	pending = sp->tx_current - sp->tx_dirty;
+	dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 
 	for (released = 0; released < pending; released++) {
-		unsigned int dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 		struct sk_buff *skb = sp->TxBuff[dirty];
 		struct ipg_tx *txfd = sp->txd + dirty;
 
@@ -882,11 +875,8 @@ static void ipg_nic_txfree(struct net_device *dev)
 		 * If the TFDDone bit is set, free the associated
 		 * buffer.
 		 */
-		if (dirty == curr)
-			break;
-
-		/* Setup TFDDONE for compatible issue. */
-		txfd->tfc |= cpu_to_le64(IPG_TFC_TFDDONE);
+		if (!(txfd->tfc & cpu_to_le64(IPG_TFC_TFDDONE)))
+			break;
 
 		/* Free the transmit buffer. */
 		if (skb) {
@@ -898,6 +888,7 @@ static void ipg_nic_txfree(struct net_device *dev)
 
 			sp->TxBuff[dirty] = NULL;
 		}
+		dirty = (dirty + 1) % IPG_TFDLIST_LENGTH;
 	}
 
 	sp->tx_dirty += released;
@@ -1630,6 +1621,8 @@ static irqreturn_t ipg_interrupt_handler(int irq, void *dev_inst)
 #ifdef JUMBO_FRAME
 	ipg_nic_rxrestore(dev);
 #endif
+	spin_lock(&sp->lock);
+
 	/* Get interrupt source information, and acknowledge
 	 * some (i.e. TxDMAComplete, RxDMAComplete, RxEarly,
 	 * IntRequested, MacControlFrame, LinkEvent) interrupts
@@ -1647,9 +1640,7 @@ static irqreturn_t ipg_interrupt_handler(int irq, void *dev_inst)
 	handled = 1;
 
 	if (unlikely(!netif_running(dev)))
-		goto out;
-
-	spin_lock(&sp->lock);
+		goto out_unlock;
 
 	/* If RFDListEnd interrupt, restore all used RFDs. */
 	if (status & IPG_IS_RFD_LIST_END) {
@@ -1733,9 +1724,9 @@ out_enable:
 	ipg_w16(IPG_IE_TX_DMA_COMPLETE | IPG_IE_RX_DMA_COMPLETE |
 		IPG_IE_HOST_ERROR | IPG_IE_INT_REQUESTED | IPG_IE_TX_COMPLETE |
 		IPG_IE_LINK_EVENT | IPG_IE_UPDATE_STATS, INT_ENABLE);
-
+out_unlock:
 	spin_unlock(&sp->lock);
-out:
+
 	return IRQ_RETVAL(handled);
 }
 
@@ -1943,10 +1934,7 @@ static int ipg_nic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 */
 	if (sp->tenmbpsmode)
 		txfd->tfc |=
[PATCH 2/4] ipg: plug Tx completion leak
The Tx skb release could not free more than one skb per call. Add to that the fact that the xmit handler does not check for a queue-full condition, and you have a recipe for a quick leak.

Let's release every pending Tx descriptor that has been given back to the host CPU by the network controller. The xmit handler suggests that this is signalled through the IPG_TFC_TFDDONE bit.

Remove the former curr computation: it does not produce anything usable in its current form.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/ipg.c |   19 +++++--------------
 1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index cd1650e..9752902 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -857,21 +857,14 @@ static void init_tfdlist(struct net_device *dev)
 static void ipg_nic_txfree(struct net_device *dev)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
-	void __iomem *ioaddr = sp->ioaddr;
-	unsigned int curr;
-	u64 txd_map;
-	unsigned int released, pending;
-
-	txd_map = (u64)sp->txd_map;
-	curr = ipg_r32(TFD_LIST_PTR_0) -
-		do_div(txd_map, sizeof(struct ipg_tx)) - 1;
+	unsigned int released, pending, dirty;
 
 	IPG_DEBUG_MSG("_nic_txfree\n");
 
 	pending = sp->tx_current - sp->tx_dirty;
+	dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 
 	for (released = 0; released < pending; released++) {
-		unsigned int dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 		struct sk_buff *skb = sp->TxBuff[dirty];
 		struct ipg_tx *txfd = sp->txd + dirty;
 
@@ -882,11 +875,8 @@ static void ipg_nic_txfree(struct net_device *dev)
 		 * If the TFDDone bit is set, free the associated
 		 * buffer.
 		 */
-		if (dirty == curr)
-			break;
-
-		/* Setup TFDDONE for compatible issue. */
-		txfd->tfc |= cpu_to_le64(IPG_TFC_TFDDONE);
+		if (!(txfd->tfc & cpu_to_le64(IPG_TFC_TFDDONE)))
+			break;
 
 		/* Free the transmit buffer. */
 		if (skb) {
@@ -898,6 +888,7 @@ static void ipg_nic_txfree(struct net_device *dev)
 
 			sp->TxBuff[dirty] = NULL;
 		}
+		dirty = (dirty + 1) % IPG_TFDLIST_LENGTH;
 	}
 
 	sp->tx_dirty += released;
--
1.5.3.3
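Patch 2/4 replaces the single-descriptor release, which stopped at a computed `curr`, with a loop that walks forward from `tx_dirty` and stops at the first descriptor whose TFDDone bit is still clear. The reclaim logic can be sketched in userspace as follows. `RING_LEN`, `TFD_DONE` and `struct tx_ring` are hypothetical stand-ins for IPG_TFDLIST_LENGTH, IPG_TFC_TFDDONE and the driver's private state; there is no DMA, endianness conversion or locking here.

```c
#include <assert.h>
#include <stdint.h>

#define RING_LEN 8u        /* stand-in for IPG_TFDLIST_LENGTH */
#define TFD_DONE (1u << 0) /* stand-in for IPG_TFC_TFDDONE */

/* Hypothetical, simplified driver state: free-running producer and
 * consumer counters plus a per-descriptor control word. */
struct tx_ring {
	unsigned int tx_current;   /* descriptors queued so far */
	unsigned int tx_dirty;     /* descriptors reclaimed so far */
	uint32_t tfc[RING_LEN];    /* control word; the NIC sets TFD_DONE */
	int buf[RING_LEN];         /* nonzero = buffer still attached */
};

/* Reclaim completed descriptors in order, stopping at the first one the
 * NIC has not handed back yet. Returns the number released. */
static unsigned int txfree(struct tx_ring *r)
{
	unsigned int pending = r->tx_current - r->tx_dirty;
	unsigned int dirty = r->tx_dirty % RING_LEN;
	unsigned int released;

	assert(pending <= RING_LEN);
	for (released = 0; released < pending; released++) {
		if (!(r->tfc[dirty] & TFD_DONE))
			break;			/* still owned by the NIC */
		r->buf[dirty] = 0;		/* "free" the buffer */
		dirty = (dirty + 1) % RING_LEN;
	}
	r->tx_dirty += released;
	return released;
}
```

Stepping `dirty` modulo the ring length stays consistent with `tx_dirty % RING_LEN` across counter wraparound only when the ring length is a power of two, which the sketch assumes.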
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> Eric Dumazet wrote, On 01/09/2008 11:37 AM:
> ...
>> [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
> ...
>
> It seems this optimization could've a side effect: if during such a loop
> updates are done, and r is seen !NULL during while() check, but NULL after
> rcu_dereference(), the listing/counting could stop too soon. So, IMHO,
> probably the first version of this patch is more reliable. (Or
> alternatively additional check is needed before return.)

Looks to me like r is a local variable (argument list), so there should not be any possibility of it being changed by some other task, right?

						Thanx, Paul
[PATCH 4/4] ipg: fix Tx completion irq request
The current logic only requests an ack for the first pending packet, so no irq is triggered once the CPU submits several packets in quick succession. Let's request an irq for every packet instead.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/ipg.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index b234b29..50f0c17 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1934,10 +1934,7 @@ static int ipg_nic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 */
 	if (sp->tenmbpsmode)
 		txfd->tfc |= cpu_to_le64(IPG_TFC_TXINDICATE);
-	else if (!((sp->tx_current - sp->tx_dirty + 1) >
-	    IPG_FRAMESBETWEENTXDMACOMPLETES)) {
-		txfd->tfc |= cpu_to_le64(IPG_TFC_TXDMAINDICATE);
-	}
+	txfd->tfc |= cpu_to_le64(IPG_TFC_TXDMAINDICATE);
 
 	/* Based on compilation option, determine if FCS is to be
 	 * appended to transmit frame by IPG. */
--
1.5.3.3
[PATCH 3/4] ipg: fix queue stop condition in the xmit handler
Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/ipg.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 9752902..b234b29 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1994,7 +1994,7 @@ static int ipg_nic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	ipg_w32(IPG_DC_TX_DMA_POLL_NOW, DMA_CTRL);
 
 	if (sp->tx_current == (sp->tx_dirty + IPG_TFDLIST_LENGTH))
-		netif_wake_queue(dev);
+		netif_stop_queue(dev);
 
 	spin_unlock_irqrestore(&sp->lock, flags);
 
--
1.5.3.3
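Patch 3/4 is a one-word fix: when `tx_current == tx_dirty + IPG_TFDLIST_LENGTH` every descriptor is in flight, so the queue must be stopped rather than woken. A userspace sketch of that full-ring check with free-running unsigned counters follows; `RING_LEN`, `struct txq`, `ring_full()`, `ring_in_flight()` and `xmit_one()` are hypothetical names, and `stopped` merely models netif_stop_queue().

```c
#include <assert.h>
#include <stdbool.h>

#define RING_LEN 8u  /* stand-in for IPG_TFDLIST_LENGTH */

/* Descriptors handed to the NIC but not yet reclaimed. Unsigned
 * subtraction stays correct when the counters wrap around. */
static unsigned int ring_in_flight(unsigned int tx_current, unsigned int tx_dirty)
{
	unsigned int n = tx_current - tx_dirty;

	assert(n <= RING_LEN);
	return n;
}

/* The condition tested at the end of ipg_nic_hard_start_xmit(). */
static bool ring_full(unsigned int tx_current, unsigned int tx_dirty)
{
	return tx_current == tx_dirty + RING_LEN;
}

/* Hypothetical queue state; stopped models netif_stop_queue(). */
struct txq {
	unsigned int tx_current;
	unsigned int tx_dirty;
	bool stopped;
};

/* Queue one packet, then stop the queue if that used the last free
 * descriptor (patch 3/4 replaces a netif_wake_queue() call here). */
static void xmit_one(struct txq *q)
{
	assert(!q->stopped);
	assert(ring_in_flight(q->tx_current, q->tx_dirty) < RING_LEN);
	q->tx_current++;
	if (ring_full(q->tx_current, q->tx_dirty))
		q->stopped = true;
}
```

With an 8-entry ring the queue stays open for the first seven packets and stops on the eighth; the same comparison holds after `tx_current` wraps past zero.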
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
>
> It seems this optimization could've a side effect: if during such a loop
> updates are done, and r is seen !NULL during while() check, but NULL after
> rcu_dereference(), the listing/counting could stop too soon. So, IMHO,
> probably the first version of this patch is more reliable. (Or
> alternatively additional check is needed before return.)

No, while the value of r->u.dst.rt_next can change between two readings, the value of r cannot.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
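Paul's and Herbert's point is that `r` is a private local copy: once loaded, its value cannot change, only the shared `r->u.dst.rt_next` field it was read from can. The same single-load discipline can be sketched in userspace with C11 atomics. This is an analogy, not kernel code: `struct node`, `deref()` and `count_nodes()` are hypothetical, and an acquire load stands in for rcu_dereference().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node standing in for struct rtable; a writer may
 * update next concurrently. */
struct node {
	int val;
	_Atomic(struct node *) next;
};

/* Userspace stand-in for rcu_dereference(): an acquire load. */
static struct node *deref(_Atomic(struct node *) *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}

/* Walk the list, loading each shared pointer exactly once into the
 * local r. The NULL test and the following dereference both use that
 * private copy, so they cannot disagree even if a writer updates the
 * shared next field in between. */
static size_t count_nodes(_Atomic(struct node *) *head)
{
	size_t n = 0;
	struct node *r;

	assert(head != NULL);
	for (r = deref(head); r != NULL; r = deref(&r->next))
		n++;
	return n;
}
```

A concurrent writer could still make the walk see an older or newer list, but it cannot make a pointer that already tested non-NULL turn into NULL, which is the scenario Jarek was concerned about.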
Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #3)
Kok, Auke wrote:
> All,
>
> here is the third version of the igb (82575) ethernet controller driver.
> This driver was previously posted 2007-07-13 and 2007-12-11.
>
> Many comments received were addressed:
> - removed indirection wrappers in the same way as e1000e and ixgbe.
> - cleaned up largely against sparse, checkpatch
> - removed module parameters and moved functionality to ethtool ioctls
> - new NAPI API rewrites
> - by default the driver runs in multiqueue mode with 2 to 40 RX queues
>   enabled.
>
> and specifically in this version:
> - register macros were condensed for readability
> - fixed namespace collisions by renaming functions to igb_*
>
> Since the driver is still too large to post to this list (although the
> patch shrank from 558k to 416k to 407k, roughly 27% smaller overall), I am
> attaching the bzipped patch here. You can get the same driver
> alternatively from here:
>
> http://foo-projects.org/~sofar/0001-igb-PCI-Express-82575-Gigabit-Ethernet-driver.patch [407k]
> http://foo-projects.org/~sofar/0001-igb-PCI-Express-82575-Gigabit-Ethernet-driver.patch.bz2 [74k]
>
> or through git:
>
> git://lost.foo-projects.org/~ahkok/git/linux-2.6 #igb
>
> There are several concerns still open for this driver:
>
> - hardware code is still a large API. We're expecting more hardware to be
>   supported by this driver in the future. The API has already been
>   scrubbed but we anticipate that the remaining hooks will be used in the
>   future.
> - The register defines are still named E1000_ as they are mostly identical
>   to the e1000 chipsets (igb register space is a superset of most recent
>   e1000 register sets).

I think we can throw it into netdev#upstream if you're ready...

	Jeff
Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #3)
Jeff Garzik wrote:
> Kok, Auke wrote:
>> All,
>>
>> here is the third version of the igb (82575) ethernet controller driver.
>> [...]
>
> I think we can throw it into netdev#upstream if you're ready...

yes, of course :) Both the patch file and the git tree should work for you.

Cheers,

Auke
Re: [ipw3945-devel] [PATCH 4/5] iwlwifi: iwl3945 eliminate sleepable task queue from context
Hi Joonwoo,

We already did something similar in our code base. Could you please take a look at this patch?

http://intellinuxwireless.org/repos/?p=iwlwifi.git;a=commitdiff;h=57aa02255e9d7be5e2494683fc2793bd1d0707e2

Thanks,
-yi

On Wed, 2008-01-09 at 20:02 +0900, Joonwoo Park wrote:
> Eliminate task queuing of iwl_pci_probe; register hw with ieee80211
> immediately.
>
> Signed-off-by: Joonwoo Park [EMAIL PROTECTED]
> ---
>  drivers/net/wireless/iwlwifi/iwl3945-base.c |   66 +-
>  1 files changed, 43 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> index f95f226..7e8d8b3 100644
> --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> @@ -6171,27 +6171,9 @@ static void iwl_alive_start(struct iwl_priv *priv)
>  	if (iwl_is_rfkill(priv))
>  		return;
>
> -	if (!priv->mac80211_registered) {
> -		/* Unlock so any user space entry points can call back into
> -		 * the driver without a deadlock... */
> -		mutex_unlock(&priv->mutex);
> -		iwl_rate_control_register(priv->hw);
> -		rc = ieee80211_register_hw(priv->hw);
> -		priv->hw->conf.beacon_int = 100;
> -		mutex_lock(&priv->mutex);
> -
> -		if (rc) {
> -			iwl_rate_control_unregister(priv->hw);
> -			IWL_ERROR("Failed to register network "
> -				  "device (error %d)\n", rc);
> -			return;
> -		}
> -
> -		priv->mac80211_registered = 1;
> +	iwl_reset_channel_flag(priv);
>
> -		iwl_reset_channel_flag(priv);
> -	} else
> -		ieee80211_start_queues(priv->hw);
> +	ieee80211_start_queues(priv->hw);
>
>  	priv->active_rate = priv->rates_mask;
>  	priv->active_rate_basic = priv->rates_mask & IWL_BASIC_RATES_MASK;
> @@ -6369,7 +6351,8 @@ static int __iwl_up(struct iwl_priv *priv)
>
>  	/* clear (again), then enable host interrupts */
>  	iwl_write32(priv, CSR_INT, 0x);
> -	iwl_enable_interrupts(priv);
> +	if (priv->mac80211_registered)
> +		iwl_enable_interrupts(priv);
>
>  	/* really make sure rfkill handshake bits are cleared */
>  	iwl_write32(priv, CSR_UCODE_DRV_GP1_CLR, CSR_UCODE_SW_BIT_RFKILL);
> @@ -6887,10 +6870,21 @@ static void iwl_bg_scan_completed(struct work_struct *work)
>
>  static int iwl_mac_start(struct ieee80211_hw *hw)
>  {
> +	int ret;
>  	struct iwl_priv *priv = hw->priv;
>
>  	IWL_DEBUG_MAC80211("enter\n");
>
> +	ret = wait_event_interruptible_timeout(priv->wait_command_queue,
> +			iwl_is_ready(priv), HOST_COMPLETE_TIMEOUT);
> +
> +	if (ret == -ERESTARTSYS)
> +		return ret;
> +	else if (ret == 0 && !iwl_is_ready(priv)) {
> +		IWL_ERROR("IWL ready timeout\n");
> +		return -ETIMEDOUT;
> +	}
> +
>  	/* we should be verifying the device is ready to be opened */
>  	mutex_lock(&priv->mutex);
>
> @@ -8299,6 +8293,19 @@ static void iwl_cancel_deferred_work(struct iwl_priv *priv)
>  	cancel_work_sync(&priv->beacon_update);
>  }
>
> +static int iwl_register_hw(struct iwl_priv *priv)
> +{
> +	int err;
> +	IWL_DEBUG_INFO("register_hw\n");
> +	iwl_rate_control_register(priv->hw);
> +	err = ieee80211_register_hw(priv->hw);
> +	if (!err) {
> +		priv->hw->conf.beacon_int = 100;
> +		priv->mac80211_registered = 1;
> +	}
> +	return err;
> +}
> +
>  static struct attribute *iwl_sysfs_entries[] = {
>  	&dev_attr_antenna.attr,
>  	&dev_attr_channels.attr,
> @@ -8546,11 +8553,24 @@ static int iwl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		goto out_pci_alloc;
>  	}
>
> +	err = __iwl_up(priv);
> +	if (err) {
> +		IWL_ERROR("Could not make up interface : %d\n", err);
> +		mutex_unlock(&priv->mutex);
> +		goto out_pci_alloc;
> +	}
> +
>  	mutex_unlock(&priv->mutex);
>
> -	IWL_DEBUG_INFO("Queing UP work.\n");
> +	err = iwl_register_hw(priv);
> +	if (err) {
> +		iwl_rate_control_unregister(priv->hw);
> +		IWL_ERROR("Failed to register network "
> +			  "device (error %d)\n", err);
> +		goto out_pci_alloc;
> +	}
>
> -	queue_work(priv->workqueue, &priv->up);
> +	iwl_enable_interrupts(priv);
>
>  	return 0;
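The patched iwl_mac_start() waits for the device with wait_event_interruptible_timeout() and distinguishes three outcomes. The decision logic alone can be sketched in userspace as below. `mac_start_check()` and `SKETCH_ERESTARTSYS` are hypothetical names, the `ready_now` flag stands in for the second iwl_is_ready(priv) call, and the value 512 mirrors the kernel-internal ERESTARTSYS, which is not exported to userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* ERESTARTSYS lives in the kernel's include/linux/errno.h and is not
 * exported to userspace; it is redefined here only for the sketch. */
#define SKETCH_ERESTARTSYS 512

/* Map the result of wait_event_interruptible_timeout() to an error code
 * the way the patched iwl_mac_start() does:
 *   ret < 0  : interrupted by a signal, pass -ERESTARTSYS up;
 *   ret == 0 : the timeout expired, but the readiness condition is
 *              re-checked in case it became true right at the deadline;
 *   ret > 0  : the condition became true in time. */
static int mac_start_check(long ret, bool ready_now)
{
	/* the only negative value the wait can return is -ERESTARTSYS */
	assert(ret >= 0 || ret == -SKETCH_ERESTARTSYS);
	if (ret < 0)
		return (int)ret;
	if (ret == 0 && !ready_now)
		return -ETIMEDOUT;
	return 0;
}
```

The re-check on `ret == 0` is what keeps a device that became ready exactly at the deadline from being reported as a timeout.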