[PATCH] [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result in tv_usec >= 1000000
I found two problems in PSCHED_TADD() and PSCHED_TADD2().

1) These macros increment tv_sec if tv_usec > 1000000. But I think they should do so if tv_usec >= 1000000.

2) tv_usec became 1200000 or more when I experimented with CBQ. It is not correct for tv_usec to exceed 1000000, because it holds microseconds. To fix 2), I think the macros should compute delta / 1000000, add the quotient to tv_sec, and add the remainder to tv_usec.

In both cases, the time at which transmission is restarted becomes an illegal value, so it is not possible to communicate at the configured rate.

To fix these problems I created the following patch. Are there any comments?

[Experiment]
* kernel: linux-2.6.15.5
* CBQ settings
--
tc qdisc add dev $IF root handle 1:0 cbq bandwidth 100Mbit \
    avpkt 1000 mpu 64 ewma 5 cell 8
tc class add dev $IF parent 1:0 classid 1:10 cbq rate 32Kbit \
    prio 1 ewma 5 cell 8 avpkt 138 mpu 64 bandwidth 100Mbit \
    minburst 25 maxburst 50 bounded isolated
tc filter add dev $IF parent 1:0 protocol ip prio 16 u32 match \
    ip dport 4952 0x flowid 1:10
---
* Traffic
  dst port 4952: 138 bytes per 20 msec.
[Result]
* In cbq_ovl_classic():
  cl->undertime = { tv_sec = 1150368540, tv_usec = 1208301 }
                                                   ~~
  q->now = { tv_sec = 1150368539, tv_usec = 878917 }
  delay = 1329384
  cl->avgidle = -14781
  cl->offtime = 1295394

[Patch]
diff -Nur linux-2.6.17-rc6.orig/include/net/pkt_sched.h linux-2.6.17-rc6.mypatch/include/net/pkt_sched.h
--- linux-2.6.17-rc6.orig/include/net/pkt_sched.h 2006-06-06 09:57:02.0 +0900
+++ linux-2.6.17-rc6.mypatch/include/net/pkt_sched.h 2006-06-16 11:29:08.0 +0900
@@ -169,17 +169,31 @@
 #define PSCHED_TADD2(tv, delta, tv_res) \
 ({ \
-    int __delta = (tv).tv_usec + (delta); \
-    (tv_res).tv_sec = (tv).tv_sec; \
-    if (__delta > USEC_PER_SEC) { (tv_res).tv_sec++; __delta -= USEC_PER_SEC; } \
-    (tv_res).tv_usec = __delta; \
+    int __delta = (delta); \
+    (tv_res) = (tv); \
+    if((delta) > USEC_PER_SEC) { \
+        (tv_res).tv_sec += (delta) / USEC_PER_SEC; \
+        __delta -= (delta) % USEC_PER_SEC; \
+    } \
+    (tv_res).tv_usec += __delta; \
+    if((tv_res).tv_usec >= USEC_PER_SEC) { \
+        (tv_res).tv_sec++; \
+        (tv_res).tv_usec -= USEC_PER_SEC; \
+    } \
 })

 #define PSCHED_TADD(tv, delta) \
 ({ \
-    (tv).tv_usec += (delta); \
-    if ((tv).tv_usec > USEC_PER_SEC) { (tv).tv_sec++; \
-        (tv).tv_usec -= USEC_PER_SEC; } \
+    int __delta = (delta); \
+    if((delta) > USEC_PER_SEC) { \
+        (tv).tv_sec += (delta) / USEC_PER_SEC; \
+        __delta -= (delta) % USEC_PER_SEC; \
+    } \
+    (tv).tv_usec += __delta; \
+    if((tv).tv_usec >= USEC_PER_SEC) { \
+        (tv).tv_sec++; \
+        (tv).tv_usec -= USEC_PER_SEC; \
+    } \
 })

 /* Set/check that time is in the past perfect;

--
Shuya Maeda
-
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
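
Editorial sketch (not part of the posted patch): the normalisation described in the message, i.e. add delta / 1000000 to tv_sec and the remainder to tv_usec, then carry once, can be written as a plain C function. The names struct tv and tv_add_usec() are hypothetical stand-ins, and the sketch assumes a non-negative delta and an already normalised input. Note that it adds delta % USEC_PER_SEC to tv_usec directly, which is what the description above asks for.

```c
#include <assert.h>

#define USEC_PER_SEC 1000000L

/* Hypothetical stand-in for the kernel's struct timeval. */
struct tv {
    long tv_sec;
    long tv_usec;
};

/*
 * Add a non-negative delta (in microseconds) to tv and keep
 * tv_usec in the range [0, USEC_PER_SEC).
 * Assumes tv itself is already normalised.
 */
static struct tv tv_add_usec(struct tv tv, long delta)
{
    struct tv res = tv;

    res.tv_sec  += delta / USEC_PER_SEC;   /* whole seconds */
    res.tv_usec += delta % USEC_PER_SEC;   /* remainder, < 1 s */

    /* Both summands are below one second, so one carry is enough. */
    if (res.tv_usec >= USEC_PER_SEC) {
        res.tv_sec++;
        res.tv_usec -= USEC_PER_SEC;
    }

    assert(res.tv_usec >= 0 && res.tv_usec < USEC_PER_SEC);
    return res;
}
```

With the values from [Result] (q->now = { 1150368539, 878917 }, delay = 1329384), the unpatched macro yields tv_usec = 1208301 as reported, whereas this sketch yields { 1150368541, 208301 }.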
Re: 2.6.17: networking bug??
Mark Lord wrote:
> Unilaterally following the standard is all well and good for those who know
> how to get around it when a site becomes inaccessible, but not for Joe User.

So let's enable it in the kernel, and let the distros turn it off. The Joe User who isn't a kernel hacker won't be running 2.6.17 for a long time. He'll be running whatever his distro packages for him, and they will know how to disable (or patch out) window scaling. Someone who compiles his own kernel runs into all sorts of issues; this is just one more of them.

> If it always fails, or always works, that's not such a big problem. I would
> never have complained if I had never been able to access the web sites in
> question. But since it IS working in 2.6.16, and got broken in 2.6.17, I'm
> bloody well going to complain.

Yes. And make sure you complain to those running the bad box as well.

Helge Hafting
[PATCH, RFT] bcm43xx: AccessPoint mode
Hi,

This patch enables the usage of a bcm43xx card as an AP with the Devicescape 802.11 stack. Well, it does not work 100%, but at least it's very promising. We are able to create a bssid and correctly send beacon frames out. This patch is tested on BE and LE machines.

There seem to be issues with Devicescape and/or hostap. Trying to authenticate from a STA to the AP does not work. The packet is simply not processed. I was able to catch the auth request on the AP (using the wonderful dscape virtual interfaces). So the AP receives the packet, but loses it somewhere in the stack or hostapd.

Well, thanks to Alexander Tsvyashchenko and the OpenWRT team for the hard work to figure out how this all works. My part in this patch is mainly endianness fixes.

Please give it a test run.

Final note about hostapd: hostapd snapshot 0.5-2006-06-10 seems to work in the sense that it is able to bring up the device. hostapd snapshot 0.5-2006-06-11 seems to fail. I have not looked into this more closely yet.

Important notes from Alexander Tsvyashchenko's initial mail follow:
--
1) This version deals with TIM in a cleaner way (though PS mode is still not supported): instead of patching the dscape stack to skip TIM generation, it strips TIM when writing the probe response template and leaves it when writing the beacon template.

2) As the management interface no longer seems to be passed to the driver in the current dscape stack, all interface handling is left as it is; no changes should be needed there anymore.

...

Known limitations:
1) PS mode is not supported.

Testing instructions:

Although my previous patch to hostapd to make it interoperable with bcm43xx dscape has been merged already in their CVS version, due to the subsequent changes in the dscape stack current hostapd is again incompatible :-( So, to test this patch, the patch to hostapd should be applied. I used hostapd snapshot 0.5-2006-06-10; the patch for it is attached.
The patch is very hacky and requires a tricky way to bring everything up, but as the dscape stack is changed quite constantly, I just do not want to waste time fixing it in a proper way only to find a week later that dscape handling of the master interface was changed completely once more and everything is broken again ;-)

The attached patch for the dscape stack is not 100% necessary, but it seems to allow operating clients that request PS mode to be enabled at the AP (verified with a PDA client). The only thing it contains is disabling the actual PS handling in dscape.

So, the following sequence should be used to test AP mode:

1) Take hostapd snapshot 0.5-2006-06-10 (other recent versions should work OK also, though) and apply the attached hostapd patch.

2) Insert modules (80211, rate_control and bcm43xx-d80211).

3) iwconfig wlan0 mode master

4) ifconfig wlan0 up (this should be done by hostapd actually, but its operation with the current dscape stack seems to be broken)

5) Start hostapd (e.g. hostapd -B /etc/hostapd.conf). The config file can look like:

=
interface=wlan0
driver=devicescape
ssid=OpenWrt
channel=1
send_probe_response=0
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
debug=4
=

6) iwconfig wlan0 essid your-SSID-name (this also should not be required, but the current combination of hostapd + dscape doesn't seem to generate the config_interface callback when setting the beacon, so this is required just to force a call of config_interface).
Index: wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c
===
--- wireless-dev-dscapeports.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 2006-06-17 21:26:10.0 +0200
+++ wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 2006-06-18 23:36:31.0 +0200
@@ -152,7 +152,7 @@
     u32 status;

     status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-    if (!(status & BCM43xx_SBF_XFER_REG_BYTESWAP))
+    if (status & BCM43xx_SBF_XFER_REG_BYTESWAP)
         val = swab32(val);

     bcm43xx_write32(bcm, BCM43xx_MMIO_RAM_CONTROL, offset);
@@ -312,7 +312,7 @@
     }
 }

-void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+static void bcm43xx_time_lock(struct bcm43xx_private *bcm)
 {
     u32 status;

@@ -320,7 +320,19 @@
     status |= BCM43xx_SBF_TIME_UPDATE;
     bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
     mmiowb();
+}
+
+static void bcm43xx_time_unlock(struct bcm43xx_private *bcm)
+{
+    u32 status;
+
+    status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
+    status &= ~BCM43xx_SBF_TIME_UPDATE;
+    bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+}

+static void bcm43xx_tsf_write_locked(struct bcm43xx_private *bcm, u64 tsf)
+{
     /* Be careful with the in-progress timer.
      * First zero out the low register, so we have a full
      * register-overflow duration to complete the operation.
@@ -350,10
Re: [PATCH] AP (master) mode fixed (resubmit)
On Monday 19 June 2006 11:37, Francois Barre wrote:
> 2006/6/18, Michael Buesch [EMAIL PROTECTED]:
> > Ok, I got my Airport to generate Beacons on this BE machine.
> Hurray, I'm not alone running BE stuff here...
>
> > There was a bug hiding in bcm43xx_ram_write(). [..]
> Could you provide a small patch just for this issue please ? It's not that
> I'm too lazy to re-apply your whole patches again, but... Well, if you have
> it...

It is not an issue without the AP mode patch, because all callers of bcm43xx_ram_write() are buggy, too. So, caller buggy, callee buggy, result OK. ;)

> > But I can not associate to the bcm43xx-AP. But it seems like a dscape
> > problem. The authentication packet arrives at the machine (I can capture
> > it with the new cool virtual monitor interface), but it is not processed.
> > So the STA does not receive a response.
> Funny, I did have no problem associating with the AP. What happens exactly
> on the STA ? Did you manage to trace anything on ? What exactly is your
> hardware, Michael ?

I think I did something wrong while bringing the device up. It works now. But attached is a fixed patch, already. We had an off-by-two bug in common template write.

> Also, Alexander, did you have the opportunity to heavily test your AP code ?
> I mean, finding the maximum bandwidth a bcm43xx could provide while being an
> AP, the way it behaves with multiple STA associated,

I was able to associate now, but could not transmit ping packets, yet. Dunno what the problem is. Maybe the STA is broken, too. Let's see.
Index: wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c
===
--- wireless-dev-dscapeports.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 2006-06-17 21:26:10.0 +0200
+++ wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 2006-06-19 11:25:02.0 +0200
@@ -151,8 +151,10 @@
 {
     u32 status;

+    assert(offset % 4 == 0);
+
     status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-    if (!(status & BCM43xx_SBF_XFER_REG_BYTESWAP))
+    if (status & BCM43xx_SBF_XFER_REG_BYTESWAP)
         val = swab32(val);

     bcm43xx_write32(bcm, BCM43xx_MMIO_RAM_CONTROL, offset);
@@ -312,7 +314,7 @@
     }
 }

-void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+static void bcm43xx_time_lock(struct bcm43xx_private *bcm)
 {
     u32 status;

@@ -320,7 +322,19 @@
     status |= BCM43xx_SBF_TIME_UPDATE;
     bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
     mmiowb();
+}
+
+static void bcm43xx_time_unlock(struct bcm43xx_private *bcm)
+{
+    u32 status;
+
+    status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
+    status &= ~BCM43xx_SBF_TIME_UPDATE;
+    bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+}

+static void bcm43xx_tsf_write_locked(struct bcm43xx_private *bcm, u64 tsf)
+{
     /* Be careful with the in-progress timer.
      * First zero out the low register, so we have a full
      * register-overflow duration to complete the operation.
@@ -350,10 +364,13 @@
         mmiowb();
         bcm43xx_write16(bcm, BCM43xx_MMIO_TSF_0, v0);
     }
+}

-    status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-    status &= ~BCM43xx_SBF_TIME_UPDATE;
-    bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+{
+    bcm43xx_time_lock(bcm);
+    bcm43xx_tsf_write_locked(bcm, tsf);
+    bcm43xx_time_unlock(bcm);
 }

 static void bcm43xx_measure_channel_change_time(struct bcm43xx_private *bcm)
@@ -415,10 +432,11 @@
 static void bcm43xx_write_mac_bssid_templates(struct bcm43xx_private *bcm)
 {
     static const u8 zero_addr[ETH_ALEN] = { 0 };
-    const u8 *mac = NULL;
-    const u8 *bssid = NULL;
+    const u8 *mac;
+    const u8 *bssid;
     u8 mac_bssid[ETH_ALEN * 2];
     int i;
+    u32 tmp;

     bssid = bcm->interface.bssid;
     if (!bssid)
@@ -431,12 +449,13 @@
     memcpy(mac_bssid + ETH_ALEN, bssid, ETH_ALEN);

     /* Write our MAC address and BSSID to template ram */
-    for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-        bcm43xx_ram_write(bcm, 0x20 + i, *((u32 *)(mac_bssid + i)));
-    for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-        bcm43xx_ram_write(bcm, 0x78 + i, *((u32 *)(mac_bssid + i)));
-    for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-        bcm43xx_ram_write(bcm, 0x478 + i, *((u32 *)(mac_bssid + i)));
+    for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32)) {
+        tmp = (u32)(mac_bssid[i + 0]);
+        tmp |= (u32)(mac_bssid[i + 1]) << 8;
+        tmp |= (u32)(mac_bssid[i + 2]) << 16;
+        tmp |= (u32)(mac_bssid[i + 3]) << 24;
+        bcm43xx_ram_write(bcm, 0x20 + i, tmp);
+
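
Editorial sketch: the last hunk replaces a pointer cast of the byte array with explicit shifts. A stand-alone version of that packing step is shown below; the helper name pack_le32() is hypothetical and not part of the driver. Building the word from shifts gives the same numeric value on little-endian and big-endian hosts, whereas reading the bytes through a u32 pointer gives a host-dependent value (and may be misaligned).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pack four consecutive bytes into a 32-bit value, with the byte at
 * the lowest address in the least significant position. The result
 * does not depend on the byte order of the host CPU.
 */
static uint32_t pack_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Any byte swapping required by the device can then be applied in one place (in the patch, inside bcm43xx_ram_write()) rather than being implied by how each caller happened to load the bytes.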
[NET]: Prevent multiple qdisc runs
Hi Dave:

I'm nearly done with the generic segmentation offload stuff (although only TCPv4 is implemented for now), and I encountered this problem.

[NET]: Prevent multiple qdisc runs

Having two or more qdisc_run's contend against each other is bad because it can induce packet reordering if the packets have to be requeued. It appears that this is an unintended consequence of relinquishing the queue lock while transmitting. That in turn is needed for devices that spend a lot of time in their transmit routine.

There are no advantages to be had as devices with queues are inherently single-threaded (the loopback device is not but then it doesn't have a queue).

Even if you were to add a queue to a parallel virtual device (e.g., bolt a tbf filter in front of an ipip tunnel device), you would still want to process the queue in sequence to ensure that the packets are ordered correctly.

The solution here is to steal a bit from net_device to prevent this.

BTW, as qdisc_restart is no longer used by anyone as a module inside the kernel (IIRC it used to with netif_wake_queue), I have not exported the new __qdisc_run function.
Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e432b74..39919c8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -233,6 +233,7 @@ enum netdev_state_t
     __LINK_STATE_RX_SCHED,
     __LINK_STATE_LINKWATCH_PENDING,
     __LINK_STATE_DORMANT,
+    __LINK_STATE_QDISC_RUNNING,
 };

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index b94d1ad..75b5b93 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -218,12 +218,13 @@ extern struct qdisc_rate_table *qdisc_ge
         struct rtattr *tab);
 extern void qdisc_put_rtab(struct qdisc_rate_table *tab);

-extern int qdisc_restart(struct net_device *dev);
+extern void __qdisc_run(struct net_device *dev);

 static inline void qdisc_run(struct net_device *dev)
 {
-    while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0)
-        /* NOTHING */;
+    if (!netif_queue_stopped(dev) &&
+        !test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
+        __qdisc_run(dev);
 }

 extern int tc_classify(struct sk_buff *skb, struct tcf_proto *tp,
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index b1e4c5e..d7aca8e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -90,7 +90,7 @@ void qdisc_unlock_tree(struct net_device
    NOTE: Called under dev->queue_lock with locally disabled BH.
 */

-int qdisc_restart(struct net_device *dev)
+static inline int qdisc_restart(struct net_device *dev)
 {
     struct Qdisc *q = dev->qdisc;
     struct sk_buff *skb;
@@ -179,6 +179,14 @@ requeue:
     return q->q.qlen;
 }

+void __qdisc_run(struct net_device *dev)
+{
+    while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev))
+        /* NOTHING */;
+
+    clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
+}
+
 static void dev_watchdog(unsigned long arg)
 {
     struct net_device *dev = (struct net_device *)arg;
@@ -620,6 +628,5 @@ EXPORT_SYMBOL(qdisc_create_dflt);
 EXPORT_SYMBOL(qdisc_alloc);
 EXPORT_SYMBOL(qdisc_destroy);
 EXPORT_SYMBOL(qdisc_reset);
-EXPORT_SYMBOL(qdisc_restart);
 EXPORT_SYMBOL(qdisc_lock_tree);
 EXPORT_SYMBOL(qdisc_unlock_tree);
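
Editorial sketch: the guard in the patch is a test-and-set on a "running" bit, so that only the caller that flips the bit enters the run loop and everyone else returns immediately. The user-space model below uses C11 atomic_flag in place of test_and_set_bit()/clear_bit(). All names (fake_dev, fake_restart, fake_qdisc_run) are hypothetical, and the second caller is simulated by re-entering from inside the loop so that the example stays deterministic; in the kernel the competing callers would be other CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for net_device. */
struct fake_dev {
    atomic_flag running;  /* plays the role of __LINK_STATE_QDISC_RUNNING */
    int pending;          /* packets still queued */
    int runs_started;     /* how many callers actually entered the loop */
};

static void fake_qdisc_run(struct fake_dev *dev);

/*
 * Stand-in for sending one packet. While "transmitting" it calls
 * fake_qdisc_run() again to mimic a second caller arriving mid-run.
 * Returns true while packets remain.
 */
static bool fake_restart(struct fake_dev *dev)
{
    assert(dev->pending >= 0);
    if (dev->pending == 0)
        return false;
    dev->pending--;
    fake_qdisc_run(dev);          /* nested attempt: must be a no-op */
    return dev->pending > 0;
}

static void fake_qdisc_run(struct fake_dev *dev)
{
    /* Only the caller that changes the flag from clear to set proceeds. */
    if (atomic_flag_test_and_set(&dev->running))
        return;

    dev->runs_started++;
    while (fake_restart(dev))
        /* NOTHING */;

    atomic_flag_clear(&dev->running);
}
```

A caller that finds the flag set does not lose its packet: the packet is already on the queue, and the runner that owns the flag will drain it.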
Re: [NET]: Prevent multiple qdisc runs
Herbert,

I take it you saw a lot of requeues happening that prompted this? What were the circumstances? The _only_ times i have seen it happen is when the (PCI) bus couldn't handle the incoming rate or there was a bug in the driver.

Also: what happens to the packet that comes in from either local or is being forwarded and finds the qdisc_is_running flag is set? I couldn't tell if the intent was to drop it or not. The answer for TCP is probably simpler than for packets being forwarded.

cheers,
jamal

On Mon, 2006-19-06 at 22:15 +1000, Herbert Xu wrote:
> Hi Dave:
>
> I'm nearly done with the generic segmentation offload stuff (although only
> TCPv4 is implemented for now), and I encountered this problem.
>
> [NET]: Prevent multiple qdisc runs
[DOC]: generic netlink
Folks,

Attached is a document that should help people wishing to use the generic netlink interface. It is a WIP so a lot more to go if i see interest. The doc has been around for a while; i spent part of yesterday and this morning cleaning it up. If you have sent me comments before, please forgive me for having misplaced them - just send again.

cheers,
jamal

PS:- I don't have a good place to put this doc and point to, hence the 17K attachment

1.0 Problem Statement
---

Netlink is a robust wire-format IPC typically used for kernel-user communication, although it could also be used as a communication carrier between user-user and kernel-kernel.

A typical netlink connection setup is of the form:

    netlink_socket = socket(PF_NETLINK, socket_type, netlink_family);

where netlink_family selects the netlink bus to communicate on. Example of a family would be NETLINK_ROUTE which is 0x0 or NETLINK_XFRM which is 0x6. [Refer to RFC 3549 for a high level view and look at include/linux/netlink.h for some of the allocated families].

Over the years, due to its robust design, netlink has become very popular. This has resulted in the danger of running out of family numbers to issue. In netconf 2005 in Montreal it was decided to find ways to work around the allocation challenge and as a result the NETLINK_GENERIC bus was born.

This document gives a mid-level view of NETLINK_GENERIC and how to use it. The reader does not necessarily have to know what netlink is, but needs to know at least the encapsulation used - which is described in the next section. There are some implicit assumptions about what netlink is or what structures like TLVs are etc. I apologize i don't have much time to give a tutorial - invite me to some odd conference and i will be forced to do better than this doc. Better send patches to this doc.

2.0 High Level view

In order to illustrate the way different components talk to each other, the diagram below is used to provide an abstraction on how the operations happen.
There are two (three depending on your perspective) components:

1) The generic netlink connection, which for illustration is referred to as a bus. The generic netlink bus is shown as split between user and kernel domains: this means programs can connect to the bus from either kernel or user space.

2) Components that talk to each other after attaching to the bus.
   a) Two users are shown in user space.
   b) Three in the kernel.

All boxes have kernel-wide unique identifiers that can be used to address them. Typically, user space boxes exist to control one or more kernel level boxen, i.e. they update some attributes that exist in a kernel level box. Any of these boxes can communicate with each other by first connecting to the bus and then sending messages addressed to any box.

      +----------+          +----------+
      |  user1   |   ....   |  user-n  |
      +----+-----+          +-----+----+
           |                      |
   +-------+----------------------+--------+
   | user                                  |   User space/domain
   +- - - - -  Generic Netlink Bus  - - - -+   -------------------
   | kernel                                |   Kernel space/domain
   +----+-------------+-------------+------+
        |             |             |
  +-----+-----+  +----+----+  +-----+----+
  | controller|  | foobar  |  |  googah  |
  +-----------+  +---------+  +----------+

The controller is a special built-in user of the bus. It is the repository of info on kernel components that have attached to the bus. It has a reserved address identifier of 0x10. By querying the controller, one could find out that both foobar and googah are registered and what their IDs are etc. Essentially it's a namespace translator not unlike DNS is for IP addresses. More later on this.

To get to the point of the most common usage of netlink (user space control of a kernel component), the diagram below breaks things down for a single user program that controls a kernel module called foobar. The example is simple for illustration purposes; as an example, user space could control a lot more kernel modules.

[Diagram, truncated in the archive: a box labelled "user program" with four numbered flows: (1) gnl discovery, (2) gnl events, (3) foobar config/query, (4) foobar events.]
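
Editorial sketch: asking the controller for a family's ID amounts to sending a generic netlink message addressed to the controller's reserved ID (0x10). The helper below only builds such a request in a caller-supplied buffer; it does not open a socket or send anything. It assumes the constants GENL_ID_CTRL, CTRL_CMD_GETFAMILY and CTRL_ATTR_FAMILY_NAME from the Linux UAPI headers <linux/netlink.h> and <linux/genetlink.h>, none of which are named in the text above, and build_getfamily() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Build a "get family" request for the generic netlink controller.
 * Layout: struct nlmsghdr | struct genlmsghdr | one string attribute.
 * buf must be suitably aligned for struct nlmsghdr.
 * Returns the message length, or 0 if buf is too small.
 */
static size_t build_getfamily(void *buf, size_t buflen, const char *family)
{
    size_t name_len = strlen(family) + 1;   /* include the trailing NUL */
    size_t total = NLMSG_LENGTH(GENL_HDRLEN)
                 + NLA_ALIGN(NLA_HDRLEN + name_len);
    struct nlmsghdr *nlh = buf;
    struct genlmsghdr *gh;
    struct nlattr *attr;

    assert(buf != NULL && family != NULL);
    if (total > buflen)
        return 0;

    memset(buf, 0, total);
    nlh->nlmsg_len   = (__u32)total;
    nlh->nlmsg_type  = GENL_ID_CTRL;        /* address the controller */
    nlh->nlmsg_flags = NLM_F_REQUEST;

    gh = NLMSG_DATA(nlh);
    gh->cmd     = CTRL_CMD_GETFAMILY;
    gh->version = 1;

    attr = (struct nlattr *)((char *)gh + GENL_HDRLEN);
    attr->nla_len  = (__u16)(NLA_HDRLEN + name_len);
    attr->nla_type = CTRL_ATTR_FAMILY_NAME;
    memcpy((char *)attr + NLA_HDRLEN, family, name_len);

    return total;
}
```

In a real program the buffer would then be written to a socket created with socket(PF_NETLINK, SOCK_RAW, NETLINK_GENERIC), and the family ID would be read from the controller's reply.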
Re: [NET]: Prevent multiple qdisc runs
Hi Jamal:

On Mon, Jun 19, 2006 at 09:33:51AM -0400, jamal wrote:
> I take it you saw a lot of requeues happening that prompted this? What were
> the circumstances? The _only_ times i have seen it happen is when the (PCI)
> bus couldnt handle the incoming rate or there was a bug in the driver.

Actually I discovered the problem only because the generic segmentation offload stuff that I'm working on needs to deal with the situation where a super-packet is partially transmitted. Requeueing causes all sorts of nasty problems so I chose to keep it within the net_device structure. To do so requires qdisc_run to be serialised against each other. I then found out that we want this anyway because otherwise the requeued packets could be reordered.

> Also: what happens to the packet that comes in from either local or is being
> forwarded and finds the qdisc_is_running flag is set? I couldnt tell if the
> intent was to drop it or not. The answer for TCP is probably simpler than
> for packets being forwarded.

The qdisc_is_running only prevents qdisc_run from occurring (because it's already running), it does not impact on the queueing of the packet.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [NET]: Prevent multiple qdisc runs
Herbert,

On Mon, 2006-19-06 at 23:42 +1000, Herbert Xu wrote:
> Hi Jamal:
>
> On Mon, Jun 19, 2006 at 09:33:51AM -0400, jamal wrote:
[..]
> Actually I discovered the problem only because the generic segmentation
> offload stuff that I'm working on needs to deal with the situation where a
> super-packet is partially transmitted. Requeueing causes all sorts of nasty
> problems so I chose to keep it within the net_device structure. To do so
> requires qdisc_run to be serialised against each other. I then found out
> that we want this anyway because otherwise the requeued packets could be
> reordered.

Ok, I am trying to visualize but having a hard time: Re-queueing is done at the front of the queue to maintain ordering whereas queueing is done at the tail (i.e it is a FIFO). i.e even if p2 comes in and gets queued while p1 is being processed, requeueing of p1 will put it in front of p2. Your super-packet issue may be different though ..

> > Also: what happens to the packet that comes in from either local or is
> > being forwarded and finds the qdisc_is_running flag is set? I couldnt tell
> > if the intent was to drop it or not. The answer for TCP is probably
> > simpler than for packets being forwarded.
>
> The qdisc_is_running only prevents qdisc_run from occuring (because it's
> already running), it does not impact on the queueing of the packet.

I will wait for your answer on the other part before responding to this.

cheers,
jamal
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Stephen Hemminger wrote:
> Does this fix it?
> # sysctl -w net.ipv4.tcp_abc=0

That did not help. I have 1 minute outputs from tcpdump under both 2.6.11.12 and 2.6.16.20. You will see a large size difference between the files. Since the 2.6.11.12 one is 2 MBytes, I thought I would post them via the web instead of via attachments. Look at:

http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

And again, thanks to all of you for looking into this.

--
Dr. Harry Edmon                 E-MAIL: [EMAIL PROTECTED]
206-543-0547                            [EMAIL PROTECTED]
Dept of Atmospheric Sciences    FAX:    206-543-0308
University of Washington, Box 351640, Seattle, WA 98195-1640
Re: [NET]: Prevent multiple qdisc runs
On Mon, Jun 19, 2006 at 10:23:29AM -0400, jamal wrote:
> Ok, I am trying to visualize but having a hard time: Re-queueing is done at
> the front of the queue to maintain ordering whereas queueing is done at the
> tail (i.e it is a FIFO). i.e even if p2 comes in and gets queued while p1 is
> being processed, requeueing of p1 will put it in front of p2.

Correct. When qdisc_run happens we take an skb off the head of the queue. If it can't be transmitted right away, we try to put it back in the same spot. If you have two qdisc_run's happening at the same time then that spot could be different.

> Your super-packet issue may be different though ..

The reordering issue is not related to super-packets.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
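
Editorial sketch: the claim that "that spot could be different" can be illustrated with a toy queue. Two runners each take a packet off the head, both fail to transmit, and both put their packet back at the head; the final order then depends on who requeues last. The queue type and function names below are hypothetical, and the interleaving is scripted by hand. Whether the kernel's locking actually permits this interleaving is exactly what the rest of the thread discusses.

```c
#include <assert.h>
#include <string.h>

#define QCAP 8

/* Tiny array-backed queue of packet ids; index 0 is the head. */
struct pktq {
    int pkt[QCAP];
    int len;
};

static void enqueue_tail(struct pktq *q, int id)
{
    assert(q->len < QCAP);
    q->pkt[q->len++] = id;
}

static int dequeue_head(struct pktq *q)
{
    int id;

    assert(q->len > 0);
    id = q->pkt[0];
    memmove(&q->pkt[0], &q->pkt[1], (size_t)(q->len - 1) * sizeof(int));
    q->len--;
    return id;
}

static void requeue_head(struct pktq *q, int id)
{
    assert(q->len < QCAP);
    memmove(&q->pkt[1], &q->pkt[0], (size_t)q->len * sizeof(int));
    q->pkt[0] = id;
    q->len++;
}

/*
 * Queue starts as 1, 2, 3. Runner A dequeues packet 1, runner B
 * dequeues packet 2, neither can transmit, and both requeue at the
 * head in the order selected by a_requeues_first.
 */
static struct pktq two_runners(int a_requeues_first)
{
    struct pktq q = { {0}, 0 };
    int a, b;

    enqueue_tail(&q, 1);
    enqueue_tail(&q, 2);
    enqueue_tail(&q, 3);

    a = dequeue_head(&q);   /* runner A holds packet 1 */
    b = dequeue_head(&q);   /* runner B holds packet 2 */

    if (a_requeues_first) {
        requeue_head(&q, a);
        requeue_head(&q, b);    /* packet 2 lands in front of packet 1 */
    } else {
        requeue_head(&q, b);
        requeue_head(&q, a);    /* original order restored */
    }
    return q;
}
```

With a single runner there is only one packet outstanding at a time, so a requeue always restores the original order; that is the property the patch aims to guarantee.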
Re: [NET]: Prevent multiple qdisc runs
On Tue, 2006-20-06 at 00:29 +1000, Herbert Xu wrote:
> Correct. When qdisc_run happens we take an skb off the head of the queue.
> If it can't be transmitted right away, we try to put it back in the same
> spot. If you have two qdisc_run's happening at the same time then that spot
> could be different.

Ok, but: The queue lock will ensure only one of the qdisc runs (assuming different CPUs) will be able to dequeue at any one iota in time, no? And if you assume that the cpu that manages to get the tx lock as well is going to be contending for the qlock in order to requeue, then the only scenario i can see the race happening is when you have one CPU faster than the other. Did i miss something?

cheers,
jamal
Re: [RFT] pcnet32 NAPI changes
On Fri, Jun 16, 2006 at 12:11:54PM -0700, Don Fry wrote:
> This patch is a collection of changes to pcnet32 which does the following:
> - Fix section mismatch warning.
> - fix set_ringparam to correctly handle memory allocation failures
> - fix off-by-one in get_ringparam.
> - cleanup at end of loopback_test when not up.
> - Add NAPI to driver, fixing set_ringparam and loopback_test to work
>   correctly with poll.
> - for multicast, do not reset the chip unless cannot enter suspend mode to
>   avoid race with poll.
>
> The set_ringparam code is larger than I would prefer, but it will not leave
> null pointers around for the code to stumble over when memory allocation
> fails. If anyone has a better idea, please let me know.
>
> Some complexity could be avoided by allocating memory for the maximum number
> of tx and rx buffers at probe time. Requiring 14k for the tx ring and
> arrays, and another 14k for rx; instead of about 10k total for the default
> sizes.

So 28k vs 10k? Why are these adjustable if it makes that little difference? Is there any advantage to making them smaller?

> It is NAPI only, unlike Len Sorensen's version which allows for compile
> time selection.

Some drivers are NAPI only, others have compile options. Which is preferred? I just figured making it an option was less intrusive, although I can't imagine a good reason for not wanting to use the NAPI version at all times. I certainly know I intend to use it that way.

> I have tested these changes with a 79C971, 973, 976, and 978 on a ppc64
> machine, and 970A, 972, 973, 975, and 976 on an x86 machine. I have not
> tested these changes with VMware or Xen.

I will give it a try with our system and see how it runs.

Len Sorensen
Re: [DOC]: generic netlink
On Mon, 19 Jun 2006, jamal wrote:
> Attached is a document that should help people wishing to use generic
> netlink interface. It is a WIP so a lot more to go if i see interest.

Thanks for writing this up.

It seems that TIPC is multiplexing all of its commands through TIPC_GENL_CMD. I wonder, if this is how other protocols are likely to utilize genl, then we could possibly drop the command registration code completely and one command op can be registered by the protocol during genl_register_family().

This would both simplify the genl code and API, and help ensure consistency of users.

- James
--
James Morris [EMAIL PROTECTED]
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Harry Edmon [EMAIL PROTECTED] wrote:
> I have a system with a strange network performance degradation from
> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6. The
> system has dual single-core Xeons with hyperthreading on.
cut

Hi Harry

Can you check which high-res timesource you are using? In the kernel log look for:

kernel: Using tsc for high-res timesource
kernel: Using pmtmr for high-res timesource

I have experienced some network performance degradation when using the pmtmr timesource, on an Opteron AMD system. It seems that the default timesource changed between 2.6.15 and 2.6.16.

If you use pmtmr, try to reboot with the kernel option clock=tsc.

On my Opteron AMD system I can normally route 400 kpps, but with timesource pmtmr I could only route around 83 kpps. (I found the timer to be the issue by using oprofile.)

Cheers,
Jesper Brouer
--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---
Re: [PATCH, RFT] bcm43xx: Busting the 1G limit
On Sat, 2006-06-17 at 19:28 +0200, Michael Buesch wrote: Hi, This patch adds full 32-bit and 64-bit DMA support to the bcm43xx driver. Well, it _should_ do this. I can not test it, as I don't have a machine to trigger the 1G limit. The 1G limit should be exploitable on an AMD64 machine with more than 1G RAM. Please test and report, if it works or not. In the case of works not, please provide full dmesg log. Note that I am not sure which cards actually support full 32-bit or even 64-bit mode. Older cards might still only support 30-bit DMA. Hi. I tried this on both 2.6.17-rc6 and on wireless-dev, and got pretty much the same panic on both (modulo locking). My box is a turion with 2 GB of ram and a BCM4318. Here's the panic from wireless-dev: Unable to handle kernel NULL pointer dereference at 0020 RIP: 88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436} PGD 0 Oops: [1] PREEMPT CPU 0 Modules linked in: uhci_hdc ieee80211_crypt_wep cryptoloop loop snd_atiixp_modem snd_atiixp snd_ac97_codec snd_ac97_bus bcm43xx snd_pcm snd_timer ieee80211softmac ehci_hcd snd ohci1394 ieee80211 ohci_hdc sdhci ieee1394 yenta_socket usbcore mmc_core soundcore rsrc_nonstatic ieee80211_crypt 8139too snd_page_alloc pcmcia_core Pid: 6139, comm: iwconfig Not tainted 2.6.17-rc6-dfg1-g57aab842-dirty #1 RIP: 0010:[88104f24] 88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436} RSP: 0018:81445df8 EFLAGS: 00010002 RAX: 0063 RBX: 0001 RCX: RDX: RSI: 0082 RDI: 0001 RBP: 81445e28 R08: 0002e8c7 R09: R10: R11: fffa R12: R13: 30d1 R14: 81445eb8 R15: 00d0 FS: 2b8b5dc68d20() GS:81445eb8() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0020 CR3: 76314000 CR4: 06e0 Process iwconfig (pid: 6139, threadinfo 8100748ae000, task 810075840890) Stack: 81007510f050 81007510f050 8800 814453b8 2800 814453f8 880f13cb 81445e78 81007541e740 Call Trace: IRQ 880f13cb{:bcm43xx:bcm43xx_interrupt_tasklet +2379} 81092b78{tasklet_action+72} 810126d0{__do_softirq+80} 8106872a{call_softirq+30} 81075e04{do_softirq +52} 
81092cf4{irq_exit+63} 81075e51{do_IRQ+65} 81067dae{ret_from_intr+0} EOI 810078c8{_raw_spin_lock+296} 8106dc9e{_spin_lock+30} 810202dd{unlink_file_vma+61} 810206a8{free_pagetables+152} 8103ee97{exit_mmap+135} 810416b6{mmput+54} 81047b63{exit_mmap+243} 81016a9a{do_exit+602} 81012e1c{__fput+428} 810506e0{debug_mutex_init+0} 81054862{sys_exit_group+18} 81067892{system_call+126} Code: 45 3b 7c 24 20 7c 28 49 c7 c0 be 9e 10 88 b9 59 03 00 00 48 RIP 88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436} RSP 81445df8 CR2: 0020 0Kernel panic - not syncing: Aiee, killing interrupt handler! Daniel
Re: [DOC]: generic netlink
On Mon, 2006-19-06 at 11:13 -0400, James Morris wrote:

> It seems that TIPC is multiplexing all of its commands through
> TIPC_GENL_CMD.

TIPC is a deviation; they had the 100 ioctls and therefore did a direct one-to-one mapping.

> I wonder, if this is how other protocols are likely to utilize genl, then
> we could possibly drop the command registration code completely and one
> command op can be registered by the protocol during genl_register_family().

The intent is to have a handful of commands, as in classical netlink (e.g. route or qdisc), where you are controlling data that sits in the kernel; i.e. when you have an attribute or a vector of attributes, the commands will have the semantics ADD/DEL/GET/DUMP only. Other than TIPC, the two other users i have seen use it in this manner. But you are right: if usage tends to lean in some other way we could get rid of it (I think TIPC is a bad example).

> This would both simplify the genl code and API, and help ensure
> consistency of users.

You are talking from an SELinux perspective i take it? My view: if you want to have ACLs against such commands, then it becomes easier to say "can only do ADD but not DEL", for example. (We need to resolve the genl_rcv_msg() check on commands to be in sync with SELinux, as was pointed out by Thomas.)

cheers,
jamal
Re: [DOC]: generic netlink
On Mon, 19 Jun 2006, jamal wrote:

> Other than TIPC, the two other users i have seen use it in this manner.
> But, you are right if usage tends to lean in some other way we could get
> rid of it (I think TIPC is a bad example).

OK, perhaps make a note in the docs about this, keep an eye out when new code is submitted, and encourage people not to do this.

> > This would both simplify the genl code and API, and help ensure
> > consistency of users.
>
> You are talking from an SELinux perspective i take it?

Actually, what would help SELinux is the opposite: forcing everyone to use separate commands and assigning security attributes to each one. But because TIPC is already multiplexing, that is not feasible.

Instead, I think the way to go for SELinux is to have each nl family provide a permission callback, so SELinux can pass the skb back to the nl module, which then returns a type of permission ('read', 'write', 'readpriv'). This way, the nl module can create and manage its own internal table of command permissions and also know exactly where in the message to dig for the command specifier.

> My view: If you want to have ACLs against such commands then it becomes
> easier to say can only do ADD but not DEL for example (We need to resolve
> genl_rcv_msg() check on commands to be in sync with SELinux as was
> pointed out by Thomas)

This already exists, to some extent, but only for some protocols. You can see examples of existing permission tables managed by SELinux in:

security/selinux/nlmsgtab.c

The hope is to move this out of SELinux and into each nl module, which is much more manageable and scalable.

- James
--
James Morris [EMAIL PROTECTED]
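The permission-callback idea described above could look roughly like the following userspace sketch. It is not code from SELinux or from the generic netlink tree; every name in it (`toy_family`, `toy_msg`, `toy_perm`, `lookup_perm`, the `NL_PERM_*` and `TOY_CMD_*` values) is hypothetical, and a plain struct stands in for the skb. It only shows the shape of the proposal: the family owns the table that maps its commands to a permission class, and the security layer asks the family instead of keeping its own table.

```c
#include <assert.h>
#include <stddef.h>

/* Permission classes mentioned in the mail; the enum itself is hypothetical. */
enum nl_perm { NL_PERM_NONE, NL_PERM_READ, NL_PERM_WRITE, NL_PERM_READPRIV };

/* Stand-in for the skb: just enough to find the family and the command. */
struct toy_msg {
	int family;
	int cmd;
};

/* Each family registers a callback that classifies its own commands. */
struct toy_family {
	int id;
	enum nl_perm (*perm)(const struct toy_msg *msg);
};

enum { TOY_CMD_GET = 1, TOY_CMD_ADD, TOY_CMD_DEL };

/* Family-private policy: GET is a read, ADD/DEL are writes. */
static enum nl_perm toy_perm(const struct toy_msg *msg)
{
	switch (msg->cmd) {
	case TOY_CMD_GET:
		return NL_PERM_READ;
	case TOY_CMD_ADD:
	case TOY_CMD_DEL:
		return NL_PERM_WRITE;
	default:
		return NL_PERM_NONE;
	}
}

/* Security-layer side: hand the message back to the owning family. */
static enum nl_perm lookup_perm(const struct toy_family *fams, size_t n,
				const struct toy_msg *msg)
{
	size_t i;

	assert(msg != NULL);
	for (i = 0; i < n; i++)
		if (fams[i].id == msg->family && fams[i].perm)
			return fams[i].perm(msg);
	return NL_PERM_NONE; /* unknown family: no permission class */
}
```

With this split, a policy such as "may GET but not DEL" only needs the permission class returned by the family; the security layer never has to know where in the message the command lives.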
Re: [DOC]: generic netlink
jamal wrote:

> On Mon, 2006-19-06 at 11:13 -0400, James Morris wrote:
>
> > It seems that TIPC is multiplexing all of its commands through
> > TIPC_GENL_CMD.
>
> TIPC is a deviation; they had the 100 ioctls and therefore did a direct
> one-to-one mapping.
>
> > I wonder, if this is how other protocols are likely to utilize genl,
> > then we could possibly drop the command registration code completely and
> > one command op can be registered by the protocol during
> > genl_register_family().
>
> The intent is to have a handful of commands as in classical netlink (eg
> route or qdisc etc) where you are controlling data that sits in the
> kernel; i.e when you have an attribute or a vector of attributes, then
> the commands will be of the semantics: ADD/DEL/GET/DUMP only. Other than
> TIPC the two other users i have seen use it in this manner. But, you are
> right if usage tends to lean in some other way we could get rid of it (I
> think TIPC is a bad example).

The taskstats interface, currently in -mm, is one user of genetlink:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17-rc6/2.6.17-rc6-mm2/broken-out/per-task-delay-accounting-taskstats-interface.patch

Based on Jamal's suggestions, we found it useful to have the limited-set-of-commands model and ended up having to register just one GET command. In subsequent discussions, it emerged that a SET command would also be handy.

But I'm not too clear about the advantages of trying to limit the number of commands registered by a given exploiter of genetlink (say TIPC or taskstats), other than following the conventional usage of netlink.

E.g. in the taskstats code, userspace needs to GET data on a per-pid and per-tgid basis from the kernel and supplies the specific pid or tgid. We could have registered two commands (say GET_PID and GET_TGID), in which case the interpretation of the supplied uint32 would be implicit in the command. But we went with the model where we have only one GET command and the type of the parameter is specified via netlink attributes.

In our case it didn't matter: since the type of data returned is very similar, and so is the parameter supplied (pid/tgid), one GET suffices.

But I'm wondering whether userspace should consciously try to limit the commands, or whether it would be better, from a performance standpoint, to permit a reasonably larger fan-out at the genetlink command level (for each exploiter). I guess this introduces more overhead for in-kernel structures (the linked list of command structures that needs to be kept around) while saving time on a second level of parsing within the exploiter-defined function that services the GET command.

The small-set model looks like a good compromise. Reducing the number of commands to one is not a good idea IMHO, for reasons similar to why ioctl-type syscalls aren't encouraged: since the genetlink layer has code for demultiplexing anyway, we might as well use it and avoid an extra level of indirection.

--Shailabh

> > This would both simplify the genl code and API, and help ensure
> > consistency of users.
>
> You are talking from an SELinux perspective i take it? My view: If you
> want to have ACLs against such commands then it becomes easier to say can
> only do ADD but not DEL for example (We need to resolve genl_rcv_msg()
> check on commands to be in sync with SELinux as was pointed out by Thomas)
>
> cheers,
> jamal
Re: [Patch 1/1] AF_UNIX Datagram getpeersec (with minor fix)
James Morris [EMAIL PROTECTED] wrote on 06/18/2006 04:04:06 AM:

> On Sun, 18 Jun 2006, Catherine Zhang wrote:
>
> I'd also mention here that this is to complement the SO_PEERSEC option
> for stream sockets.

OK.

> There's an implementation issue, which I'm sure has been mentioned
> previously. This code should not be calling SELinux API functions.
>
> > @@ -62,6 +70,12 @@ static __inline__ void scm_recv(struct s
> >  	if (test_bit(SOCK_PASSCRED, &sock->flags))
> >  		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(scm->creds), &scm->creds);
> > +	if (test_bit(SOCK_PASSSEC, &sock->flags)) {
> > +		err = selinux_ctxid_to_string(scm->sid, &scontext, &scontext_len);

I remember this issue being discussed, but no conclusion was reached. The reason we cannot use socket_getpeersec_dgram directly is that it takes an skb as its argument instead of a socket. If we want to reuse the same hook for UNIX, then we have to change the interface. I was debating whether I should add another hook for the UNIX domain...

Let me check whether it is possible to reuse socket_getpeersec_dgram without too much disruption or complication, and I will repost.

thanks,
Catherine
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Jesper Dangaard Brouer wrote:

> Harry Edmon [EMAIL PROTECTED] wrote:
>
> > I have a system with a strange network performance degradation from
> > 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.
> > The system has dual single-core Xeons with hyperthreading on.
>
> [cut]
>
> Hi Harry
>
> Can you check which high-res timesource you are using?
>
> In the kernel log look for:
>   kernel: Using tsc for high-res timesource
>   kernel: Using pmtmr for high-res timesource
>
> I have experienced some network performance degradation when using the
> pmtmr timesource, on an Opteron AMD system. It seems that the default
> timesource changed between 2.6.15 and 2.6.16.
>
> If you use pmtmr try to reboot with kernel option clock=tsc.
>
> On my Opteron AMD system i normally can route 400 kpps, but with
> timesource pmtmr i could only route around 83 kpps. (I found the timer
> to be the issue by using oprofile).

We have CONFIG_HPET_TIMER=y, so we do not see these messages.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Andi Kleen wrote:

> Incoming packets are only time stamped when someone asks for the
> timestamps.

Doesn't that add scheduling latency to the timestamps? Or is it a flag that gets set to trigger timestamping at packet arrival?

Chris
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, 19 Jun 2006, Andi Kleen wrote:

> > If you use pmtmr try to reboot with kernel option clock=tsc.
>
> That's dangerous advice - when the system chooses not to use TSC it often
> has a reason.

Sorry, it was not meant as general advice, just something to try out. It really solved my network performance issue...

> > On my Opteron AMD system i normally can route 400 kpps, but with
> > timesource pmtmr i could only route around 83 kpps. (I found the timer
> > to be the issue by using oprofile).
>
> Unless you're using packet sniffing or any other application that
> requests time stamps on a socket then the timer shouldn't make much
> difference. Incoming packets are only time stamped when someone asks for
> the timestamps.

I do not know what caused the issue on my machine, but I can look into it if you would like to know. I do have VLAN interfaces on the machine, and it seems that eth1 runs in PROMISC mode (eth1.xxx does not). Could it be caused by that?

Regards,
Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---
[PATCH 2/2] update sunrpc to use in-kernel sockets API - ver2
This patch updates sunrpc to use in-kernel sockets API. Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED] Acked-by: James Morris [EMAIL PROTECTED] diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -388,7 +388,7 @@ svc_sendto(struct svc_rqst *rqstp, struc /* send head */ if (slen == xdr-head[0].iov_len) flags = 0; - len = sock-ops-sendpage(sock, rqstp-rq_respages[0], 0, xdr-head[0].iov_len, flags); + len = kernel_sendpage(sock, rqstp-rq_respages[0], 0, xdr-head[0].iov_len, flags); if (len != xdr-head[0].iov_len) goto out; slen -= xdr-head[0].iov_len; @@ -400,7 +400,7 @@ svc_sendto(struct svc_rqst *rqstp, struc while (pglen 0) { if (slen == size) flags = 0; - result = sock-ops-sendpage(sock, *ppage, base, size, flags); + result = kernel_sendpage(sock, *ppage, base, size, flags); if (result 0) len += result; if (result != size) @@ -413,7 +413,7 @@ svc_sendto(struct svc_rqst *rqstp, struc } /* send tail */ if (xdr-tail[0].iov_len) { - result = sock-ops-sendpage(sock, rqstp-rq_respages[rqstp-rq_restailpage], + result = kernel_sendpage(sock, rqstp-rq_respages[rqstp-rq_restailpage], ((unsigned long)xdr-tail[0].iov_base) (PAGE_SIZE-1), xdr-tail[0].iov_len, 0); @@ -434,13 +434,10 @@ out: static int svc_recv_available(struct svc_sock *svsk) { - mm_segment_toldfs; struct socket *sock = svsk-sk_sock; int avail, err; - oldfs = get_fs(); set_fs(KERNEL_DS); - err = sock-ops-ioctl(sock, TIOCINQ, (unsigned long) avail); - set_fs(oldfs); + err = kernel_sock_ioctl(sock, TIOCINQ, (unsigned long) avail); return (err = 0)? avail : err; } @@ -472,7 +469,7 @@ svc_recvfrom(struct svc_rqst *rqstp, str * at accept time. 
FIXME */ alen = sizeof(rqstp-rq_addr); - sock-ops-getname(sock, (struct sockaddr *)rqstp-rq_addr, alen, 1); + kernel_getpeername(sock, (struct sockaddr *)rqstp-rq_addr, alen); dprintk(svc: socket %p recvfrom(%p, %Zu) = %d\n, rqstp-rq_sock, iov[0].iov_base, iov[0].iov_len, len); @@ -758,7 +755,6 @@ svc_tcp_accept(struct svc_sock *svsk) struct svc_serv *serv = svsk-sk_server; struct socket *sock = svsk-sk_sock; struct socket *newsock; - const struct proto_ops *ops; struct svc_sock *newsvsk; int err, slen; @@ -766,29 +762,23 @@ svc_tcp_accept(struct svc_sock *svsk) if (!sock) return; - err = sock_create_lite(PF_INET, SOCK_STREAM, IPPROTO_TCP, newsock); - if (err) { + clear_bit(SK_CONN, svsk-sk_flags); + err = kernel_accept(sock, newsock, O_NONBLOCK); + if (err 0) { if (err == -ENOMEM) printk(KERN_WARNING %s: no more sockets!\n, serv-sv_name); - return; - } - - dprintk(svc: tcp_accept %p allocated\n, newsock); - newsock-ops = ops = sock-ops; - - clear_bit(SK_CONN, svsk-sk_flags); - if ((err = ops-accept(sock, newsock, O_NONBLOCK)) 0) { - if (err != -EAGAIN net_ratelimit()) + else if (err != -EAGAIN net_ratelimit()) printk(KERN_WARNING %s: accept failed (err %d)!\n, serv-sv_name, -err); - goto failed;/* aborted connection or whatever */ + return; } + set_bit(SK_CONN, svsk-sk_flags); svc_sock_enqueue(svsk); slen = sizeof(sin); - err = ops-getname(newsock, (struct sockaddr *) sin, slen, 1); + err = kernel_getpeername(newsock, (struct sockaddr *) sin, slen); if (err 0) { if (net_ratelimit()) printk(KERN_WARNING %s: peername failed (err %d)!\n, @@ -1407,14 +1397,14 @@ svc_create_socket(struct svc_serv *serv, if (sin != NULL) { if (type == SOCK_STREAM) sock-sk-sk_reuse = 1; /* allow address reuse */ - error = sock-ops-bind(sock, (struct sockaddr *) sin, + error = kernel_bind(sock, (struct sockaddr *) sin, sizeof(*sin)); if (error 0) goto bummer; } if (protocol == IPPROTO_TCP) { - if ((error = sock-ops-listen(sock, 64)) 0) + if ((error = kernel_listen(sock, 64)) 0) goto 
bummer; } diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c --- a/net/sunrpc/xprtsock.c +++
[PATCH 1/2] in-kernel sockets API - ver2
This patch implements wrapper functions that provide a convenient way to access the sockets API for in-kernel users like sunrpc, cifs ocsf2 etc and any future users. The only change from the version i submitted last week is the renaming of kernel_ioctl to kernel_sock_ioctl. I left the exports to use EXPORT_SYMBOL() to match with the existing interfaces sock_create_kern(), kernel_sendmsg(), kernel_recvmsg etc. Thanks Sridhar Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED] Acked-by: James Morris [EMAIL PROTECTED] diff --git a/include/linux/net.h b/include/linux/net.h --- a/include/linux/net.h +++ b/include/linux/net.h @@ -208,6 +208,25 @@ extern int kernel_recvmsg(struct struct kvec *vec, size_t num, size_t len, int flags); +extern int kernel_bind(struct socket *sock, struct sockaddr *addr, + int addrlen); +extern int kernel_listen(struct socket *sock, int backlog); +extern int kernel_accept(struct socket *sock, struct socket **newsock, +int flags); +extern int kernel_connect(struct socket *sock, struct sockaddr *addr, + int addrlen, int flags); +extern int kernel_getsockname(struct socket *sock, struct sockaddr *addr, + int *addrlen); +extern int kernel_getpeername(struct socket *sock, struct sockaddr *addr, + int *addrlen); +extern int kernel_getsockopt(struct socket *sock, int level, int optname, +char *optval, int *optlen); +extern int kernel_setsockopt(struct socket *sock, int level, int optname, +char *optval, int optlen); +extern int kernel_sendpage(struct socket *sock, struct page *page, int offset, + size_t size, int flags); +extern int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg); + #ifndef CONFIG_SMP #define SOCKOPS_WRAPPED(name) name #define SOCKOPS_WRAP(name, fam) diff --git a/net/socket.c b/net/socket.c --- a/net/socket.c +++ b/net/socket.c @@ -2160,6 +2160,109 @@ static long compat_sock_ioctl(struct fil } #endif +int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen) +{ + return sock-ops-bind(sock, addr, 
addrlen); +} + +int kernel_listen(struct socket *sock, int backlog) +{ + return sock-ops-listen(sock, backlog); +} + +int kernel_accept(struct socket *sock, struct socket **newsock, int flags) +{ + struct sock *sk = sock-sk; + int err; + + err = sock_create_lite(sk-sk_family, sk-sk_type, sk-sk_protocol, + newsock); + if (err 0) + goto done; + + err = sock-ops-accept(sock, *newsock, flags); + if (err 0) { + sock_release(*newsock); + goto done; + } + + (*newsock)-ops = sock-ops; + +done: + return err; +} + +int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, + int flags) +{ + return sock-ops-connect(sock, addr, addrlen, flags); +} + +int kernel_getsockname(struct socket *sock, struct sockaddr *addr, +int *addrlen) +{ + return sock-ops-getname(sock, addr, addrlen, 0); +} + +int kernel_getpeername(struct socket *sock, struct sockaddr *addr, +int *addrlen) +{ + return sock-ops-getname(sock, addr, addrlen, 1); +} + +int kernel_getsockopt(struct socket *sock, int level, int optname, + char *optval, int *optlen) +{ + mm_segment_t oldfs = get_fs(); + int err; + + set_fs(KERNEL_DS); + if (level == SOL_SOCKET) + err = sock_getsockopt(sock, level, optname, optval, optlen); + else + err = sock-ops-getsockopt(sock, level, optname, optval, + optlen); + set_fs(oldfs); + return err; +} + +int kernel_setsockopt(struct socket *sock, int level, int optname, + char *optval, int optlen) +{ + mm_segment_t oldfs = get_fs(); + int err; + + set_fs(KERNEL_DS); + if (level == SOL_SOCKET) + err = sock_setsockopt(sock, level, optname, optval, optlen); + else + err = sock-ops-setsockopt(sock, level, optname, optval, + optlen); + set_fs(oldfs); + return err; +} + +int kernel_sendpage(struct socket *sock, struct page *page, int offset, + size_t size, int flags) +{ + if (sock-ops-sendpage) + return sock-ops-sendpage(sock, page, offset, size, flags); + + return sock_no_sendpage(sock, page, offset, size, flags); +} + +int kernel_sock_ioctl(struct socket *sock, int cmd, 
unsigned long arg) +{ + mm_segment_t oldfs = get_fs(); + int err; + + set_fs(KERNEL_DS); + err = sock-ops-ioctl(sock, cmd, arg); +
Re: [PATCH 2/2] NET: Accurate packet scheduling for ATM/ADSL (userspace)
On Thu, 15 Jun 2006, jamal wrote:

> On Thu, 2006-15-06 at 10:47 +1000, Russell Stuart wrote:
> > On Wed, 2006-06-14 at 11:57 +0100, Alan Cox wrote:
> > > The other problem I see with this code is it is very tightly tied to
> > > ATM cell sizes, not to solving the generic question of packetisation.
> >
> > Others have made this point also. I can't speak for Jesper, but I did
> > consider making it generic.

I also considered making it generic, but chose to make my patch as non-intrusive as possible to the kernel (and to handle as much as possible in userspace).

Actually, I do think that the kernel part of the patch is very generic. The patch simply allows us to align the rate table/array. With the kernel patch in place, we can work on the userspace TC program to support more and more types of exotic link layer modeling.

> > The issue was that doing so would add more code, but I don't personally
> > know of any real world situation that would use the generic solution. I
> > didn't fancy the thought of arguing on these lists for code that no one
> > would actually use. ;-) If someone could put up their hand and say
> > "Hey, I need this", then expanding the patch to accommodate them would
> > be a pleasure.
>
> I like generic code too. It is probably doable by just looking at
> netdevice->type and figuring the link layer technology. Totally in user
> space and building the compensated-for tables there before telling the
> kernel (advantage is no kernel changes and therefore it would work with
> older kernels as well).

I think you have got the setup all wrong. The Linux middlebox/router has two Ethernet interfaces, one of which is connected to the ADSL modem. Thus, the Linux Ethernet card cannot determine that it is connected to an ADSL line.

The patch is the solution to the classical problem people have when trying to configure traffic control on an ADSL link:

Q: The packet scheduling does not work all the time?
A: Try to decrease the bandwidth.

The issue here is that ATM does not have a fixed overhead (due to alignment and padding). This means that a fixed reduction of the bandwidth is not the solution. We could reduce the bandwidth to the worst-case overhead, which is 62%, but I do not think that is a good solution...

With the patch, you can now simply configure HTB to use the rate that was specified by the ISP.

Please read chapter 6 (Achieving Queue Control), pages 55-65, where I demonstrate that the naive approach of reducing bandwidth does not work when the packet distribution on the link changes.

http://www.adsl-optimizer.dk/thesis/

Cheers,
Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---
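To make the "no fixed overhead" point concrete, here is a minimal userspace sketch of the ATM/AAL5 arithmetic. It is not code from the patch. It relies only on the standard cell geometry (53-byte cells carrying 48 bytes of payload, and an 8-byte AAL5 trailer added before padding to a whole number of cells). The function name `atm_wire_size` and the `encap_overhead` parameter are assumptions for illustration; the real encapsulation overhead depends on how the ADSL link is set up.

```c
#include <assert.h>

#define ATM_CELL_SIZE     53 /* bytes on the wire per cell */
#define ATM_CELL_PAYLOAD  48 /* payload bytes carried per cell */
#define AAL5_TRAILER       8 /* AAL5 trailer, added before padding */

/*
 * Hypothetical helper: bytes actually sent on an ATM link for one packet.
 * encap_overhead is the link-specific encapsulation overhead in bytes,
 * supplied by the caller (an assumption; it varies between setups).
 */
static unsigned int atm_wire_size(unsigned int pkt_len,
				  unsigned int encap_overhead)
{
	unsigned int pdu = pkt_len + encap_overhead + AAL5_TRAILER;
	/* Round up: the last cell is padded to a full 48-byte payload. */
	unsigned int cells = (pdu + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;

	assert(cells >= 1);
	return cells * ATM_CELL_SIZE;
}
```

With zero encapsulation overhead, a 40-byte packet fits exactly one cell (53 bytes on the wire), while a 41-byte packet needs two cells (106 bytes). The relative overhead therefore jumps with packet size, which is why scaling the configured rate by one fixed factor cannot be right for every packet mix.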
Re: [PATCH, RFT] bcm43xx: Busting the 1G limit
On Monday 19 June 2006 17:23, Daniel Gryniewicz wrote:

> On Sat, 2006-06-17 at 19:28 +0200, Michael Buesch wrote:
> > Hi,
> >
> > This patch adds full 32-bit and 64-bit DMA support to the bcm43xx
> > driver. Well, it _should_ do this. I can not test it, as I don't have a
> > machine to trigger the 1G limit. The 1G limit should be exploitable on
> > an AMD64 machine with more than 1G RAM.
> >
> > Please test and report whether it works or not. If it does not work,
> > please provide a full dmesg log.
> >
> > Note that I am not sure which cards actually support full 32-bit or
> > even 64-bit mode. Older cards might still only support 30-bit DMA.
>
> Hi.
>
> I tried this on both 2.6.17-rc6 and on wireless-dev, and got pretty much
> the same panic on both (modulo locking). My box is a turion with 2 GB of
> ram and a BCM4318. Here's the panic from wireless-dev:
>
> Unable to handle kernel NULL pointer dereference at 0020 RIP:
> 88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436}

I am still not absolutely sure where this oops comes from. Could you remove at least 1G of your RAM and retry?

--
Greetings Michael.
Re: [RFT] pcnet32 NAPI changes
On Fri, Jun 16, 2006 at 12:11:54PM -0700, Don Fry wrote:

> This patch is a collection of changes to pcnet32 which does the
> following:
> - Fix section mismatch warning.
> - fix set_ringparam to correctly handle memory allocation failures
> - fix off-by-one in get_ringparam.
> - cleanup at end of loopback_test when not up.
> - Add NAPI to driver, fixing set_ringparam and loopback_test to work
>   correctly with poll.
> - for multicast, do not reset the chip unless cannot enter suspend mode
>   to avoid race with poll.
>
> The set_ringparam code is larger than I would prefer, but it will not
> leave null pointers around for the code to stumble over when memory
> allocation fails. If anyone has a better idea, please let me know.
>
> Some complexity could be avoided by allocating memory for the maximum
> number of tx and rx buffers at probe time. Requiring 14k for the tx ring
> and arrays, and another 14k for rx; instead of about 10k total for the
> default sizes.
>
> It is NAPI only, unlike Len Sorensen's version which allows for compile
> time selection. Some drivers are NAPI only, others have compile options.
> Which is preferred?

I believe a compile option is preferred for non-gigabit drivers, given that NAPI will be eating a lot of cycles for infrequent packets (especially on 10Mb). I believe there was a thread about this last year when e100 was having NAPI problems.

A general nit: there are a lot of magic numbers in the code, most of them existing prior to this patch. The driver would benefit from a little clean-up.

Also nothing to do with this patch, but I noticed it when the code was moved. A comment about why the following is necessary might be nice:

lp->rx_ring[i].buf_length = le16_to_cpu(2 - PKT_BUF_SZ);

Thanks,
Jon

> I have tested these changes with a 79C971, 973, 976, and 978 on a ppc64
> machine, and 970A, 972, 973, 975, and 976 on an x86 machine. I have not
> tested these changes with VMware or Xen.
--- linux-2.6.17-rc6/drivers/net/orig.pcnet32.c 2006-06-15 11:49:39.0 -0700 +++ linux-2.6.17-rc6/drivers/net/pcnet32.c2006-06-16 11:30:45.0 -0700 @@ -22,8 +22,8 @@ */ #define DRV_NAME pcnet32 -#define DRV_VERSION 1.32 -#define DRV_RELDATE 18.Mar.2006 +#define DRV_VERSION 1.33-NAPI +#define DRV_RELDATE 16.Jun.2006 #define PFX DRV_NAME : static const char *const version = @@ -277,13 +277,12 @@ struct pcnet32_private { u32 phymask; }; -static void pcnet32_probe_vlbus(void); static int pcnet32_probe_pci(struct pci_dev *, const struct pci_device_id *); static int pcnet32_probe1(unsigned long, int, struct pci_dev *); static int pcnet32_open(struct net_device *); static int pcnet32_init_ring(struct net_device *); static int pcnet32_start_xmit(struct sk_buff *, struct net_device *); -static int pcnet32_rx(struct net_device *); +static int pcnet32_poll(struct net_device *dev, int *budget); static void pcnet32_tx_timeout(struct net_device *dev); static irqreturn_t pcnet32_interrupt(int, void *, struct pt_regs *); static int pcnet32_close(struct net_device *); @@ -425,6 +424,215 @@ static struct pcnet32_access pcnet32_dwi .reset = pcnet32_dwio_reset }; +static void pcnet32_netif_stop(struct net_device *dev) +{ + dev-trans_start = jiffies; + netif_poll_disable(dev); + netif_tx_disable(dev); +} + +static void pcnet32_netif_start(struct net_device *dev) +{ + netif_wake_queue(dev); + netif_poll_enable(dev); +} + +/* + * Allocate space for the new sized tx ring. + * Free old resources + * Save new resources. + * Any failure keeps old resources. + * Must be called with lp-lock held. 
+ */ +static void pcnet32_realloc_tx_ring(struct net_device *dev, + struct pcnet32_private *lp, + unsigned int size) +{ + dma_addr_t new_ring_dma_addr; + dma_addr_t *new_dma_addr_list; + struct pcnet32_tx_head *new_tx_ring; + struct sk_buff **new_skb_list; + + pcnet32_purge_tx_ring(dev); + + new_tx_ring = pci_alloc_consistent(lp-pci_dev, +sizeof(struct pcnet32_tx_head) * +(1 size), +new_ring_dma_addr); + if (new_tx_ring == NULL) { + if (pcnet32_debug NETIF_MSG_DRV) + printk(\n KERN_ERR PFX +%s: Consistent memory allocation failed.\n, +dev-name); + return; + } + memset(new_tx_ring, 0, sizeof(struct pcnet32_tx_head) * (1 size)); + + new_dma_addr_list = kcalloc(sizeof(dma_addr_t), (1 size), GFP_ATOMIC); + if (!new_dma_addr_list) { + if (pcnet32_debug NETIF_MSG_DRV) + printk(\n KERN_ERR PFX +
Re: [RFT] pcnet32 NAPI changes
On Mon, Jun 19, 2006 at 03:41:40PM -0500, Jon Mason wrote:

> I believe it is preferred to be a compile option for non-gigabit drivers,
> given that it will be eating a lot of cycles for infrequent packets
> (especially for the 10Mb). I believe there was a thread about this last
> year when e100 was having NAPI problems.

How does NAPI eat cycles? It goes back to interrupt mode when the queue is empty, and only on an RX interrupt does it turn polling on again.

It is certainly possible that there are bugs in a NAPI conversion, which I guess could be a reason to keep the option of sticking with the old method. Then again, not having the option ensures the bugs get found sooner.

> A general nit. There are a lot of magic numbers in the code, most
> existing prior to this patch. The driver would benefit from a little
> clean-up.
>
> Also nothing to do with this patch, but I noticed it when the code was
> moved. A comment about why the following is necessary might be nice:
>
> lp->rx_ring[i].buf_length = le16_to_cpu(2 - PKT_BUF_SZ);

I suspect many drivers are in need of some cleanup.

Len Sorensen
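The interrupt/poll switching described above can be modelled with a small toy state machine. This is a userspace sketch only, not pcnet32 or kernel NAPI code; `toy_nic`, `toy_rx_arrive` and `toy_poll` are hypothetical names. It assumes the usual scheme: the RX interrupt handler masks further RX interrupts and schedules polling, and the poll routine re-enables interrupts once the ring is drained.

```c
#include <assert.h>

enum rx_mode { RX_IRQ, RX_POLL };

struct toy_nic {
	enum rx_mode mode;
	unsigned int queued;     /* packets waiting in the rx ring */
	unsigned int interrupts; /* rx interrupts actually taken */
};

/* A packet arrives: it raises an interrupt only while in interrupt mode. */
static void toy_rx_arrive(struct toy_nic *nic)
{
	nic->queued++;
	if (nic->mode == RX_IRQ) {
		nic->interrupts++;
		nic->mode = RX_POLL; /* handler masks rx irq, schedules poll */
	}
}

/* One poll pass: handle up to budget packets, re-arm the irq when empty. */
static unsigned int toy_poll(struct toy_nic *nic, unsigned int budget)
{
	unsigned int done = 0;

	assert(nic->mode == RX_POLL);
	while (nic->queued > 0 && done < budget) {
		nic->queued--;
		done++;
	}
	if (nic->queued == 0)
		nic->mode = RX_IRQ; /* queue drained: back to interrupt mode */
	return done;
}
```

In this model a burst of three packets costs one interrupt and two poll passes with a budget of two, and an idle link costs nothing, because no polling happens until the next interrupt.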
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 19 June 2006 19:34, Chris Friesen wrote:

> Andi Kleen wrote:
>
> > Incoming packets are only time stamped when someone asks for the
> > timestamps.
>
> Doesn't that add scheduling latency to the timestamps? Or is it a flag
> that gets set to trigger timestamping at packet arrival?

It's a flag (or, more precisely, a global counter).

-Andi
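A rough userspace sketch of that idea, assuming nothing about the real kernel symbols: a global counter records how many users currently want timestamps, and the receive path reads the clock at packet arrival only while that counter is non-zero. All names here (`stamp_users`, `stamp_enable`, `stamp_disable`, `rx_packet`, `struct toy_pkt`, `clock_reads`) are hypothetical, and a real implementation would need an atomic counter rather than a plain int.

```c
#include <assert.h>
#include <time.h>

struct toy_pkt {
	time_t stamp; /* valid only when stamped != 0 */
	int stamped;
};

static int stamp_users;          /* hypothetical global user counter */
static unsigned int clock_reads; /* how often the (costly) clock was read */

/* Called when a socket starts or stops asking for timestamps. */
static void stamp_enable(void)
{
	stamp_users++;
}

static void stamp_disable(void)
{
	assert(stamp_users > 0);
	stamp_users--;
}

/* Receive path: stamp at arrival, but only if someone asked for it. */
static void rx_packet(struct toy_pkt *p)
{
	p->stamped = 0;
	if (stamp_users > 0) {
		p->stamp = time(NULL);
		clock_reads++;
		p->stamped = 1;
	}
}
```

Because the stamp is taken in the receive path itself, there is no added scheduling latency; and while the counter is zero the clock is never read, so the cost of a slow timesource is only paid when at least one user has requested timestamps.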
Re: [Bugme-new] [Bug 6681] New: TC crash and rule freeze
[EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=6681
>
> Summary: TC crash and rule freeze
> Kernel Version: 2.6.16-gentoo-r6
> Status: NEW
> Severity: normal
> Owner: [EMAIL PROTECTED]
> Submitter: [EMAIL PROTECTED]
>
> Most recent kernel where this bug did not occur: 2.6.16-gentoo-r6
> Distribution: Gentoo
> Hardware Environment:
> 00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
> 00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)
> 00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
> 00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
> 00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
> 00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
> 00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
> 00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
> 00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
> 01:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
> 01:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
> 01:0b.0 SCSI storage controller: Adaptec ASC-39320 U320 (rev 03)
> 01:0b.1 SCSI storage controller: Adaptec ASC-39320 U320 (rev 03)
>
> Software Environment: sys-apps/iproute2-2.6.16.20060323
>
> Problem Description:
> #cat dmeseg
> Unable to handle kernel NULL pointer dereference at virtual address 000c
> printing eip:
> c0217c26
> *pde =
> Oops: [#1]
> SMP
> Modules linked in: sch_sfq cls_u32 sch_red sch_htb iptable_filter ip_tables x_tables uhci_hcd ehci_hcd usbcore
> CPU:0
> EIP:0060:[c0217c26]Not tainted VLI
> EFLAGS: 00010287 (2.6.16-gentoo-r6 #2)
> EIP is at __rb_erase_color+0x94/0x1ad
> eax: f55f4954 ebx: f724bb54 ecx: f724bb54 edx:
> esi: edi: f7679468 ebp: f7679468 esp: f676bbbc
> ds: 007b es: 007b ss: 0068
> Process tc (pid: 24294, threadinfo=f676a000 task=f7d7da90)
> Stack: 0f724bb54 f7679468 e6bbf154 c0217e36 f724bb54 f7679468 e6bbf000 e6bbf06c f7679000 f7679080 f8903366 e6bbf154 f7679468 0004 00d0 00010006 000103c9 f7679000 c0311ffa f7679000
> Call Trace:
> [c0217e36] rb_erase+0xf7/0x12d
> [f8903366] htb_destroy_class+0xec/0x15d [sch_htb]
> [c0311ffa] tc_ctl_tclass+0x1b1/0x288
> [c030d69e] rtnetlink_dump_ifinfo+0x6c/0x89
> [c030dcef] rtnetlink_rcv_msg+0x171/0x233
> [c031759f] netlink_dump+0x94/0x1e2
> [c030db7e] rtnetlink_rcv_msg+0x0/0x233
> [c0317a45] netlink_rcv_skb+0x46/0xad
> [c030db7e] rtnetlink_rcv_msg+0x0/0x233
> [c0317aec] netlink_run_queue+0x40/0xd0
> [c030db7e] rtnetlink_rcv_msg+0x0/0x233
> [c030db5e] rtnetlink_rcv+0x2e/0x4e
> [c030db7e] rtnetlink_rcv_msg+0x0/0x233
> [c031735c] netlink_data_ready+0x60/0x62
> [c03164ed] netlink_sendskb+0x32/0x61
> [c031704d] netlink_sendmsg+0x291/0x304
> [c02f9b0d] sock_sendmsg+0xeb/0x10d
> [c02f9b0d] sock_sendmsg+0xeb/0x10d
> [c0131fa6] autoremove_wake_function+0x0/0x57
> [c021a084] copy_from_user+0x46/0x7e
> [c0300ae4] verify_iovec+0x44/0x9e
> [c02fb525] sys_sendmsg+0x15a/0x272
> [c0140ca6] filemap_nopage+0x30d/0x38a
> [c0152e83] page_add_file_rmap+0x2a/0x2e
> [c014e014] do_no_page+0x219/0x278
> [c021a084] copy_from_user+0x46/0x7e
> [c02fbaf7] sys_socketcall+0x28d/0x294
> [c0102ca7] sysenter_past_esp+0x54/0x75
> Code: 04 01 00 00 00 c7 43 04 00 00 00 00 89 7c 24 04 89 1c 24 e8 6b fe ff ff 8b 53 0c eb 8e 8b 53 08 8b 72 04 85 f6 0f 84 82 00 00 00 8b 4a 0c 85 c9 74 0a 83 79 04 01 0f 85 00 01 00 00 8b 72 08 85

It crashed in net/sched/somewhere.
Re: [Bugme-new] [Bug 6682] New: BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100% CPU
[EMAIL PROTECTED] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=6682
>
> Summary: BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100% CPU
> Kernel Version: 2.6.15.6
> Status: NEW
> Severity: normal
> Owner: [EMAIL PROTECTED]
> Submitter: [EMAIL PROTECTED]
>
> Most recent kernel where this bug did not occur: (unknown)
> Distribution: Gentoo
> Hardware Environment: 2x Xeon 2.66, 1 GB RAM, NICS: 2 x e1000, and one
> double port e100. Based on Intel E7501 architecture (2U rack Intel chassis).
> Software Environment: quagga 0.98.6
> Problem Description: ksoftirqd/0 takes 100% of CPU. Further investigation
> shows no sign of a network flood or anything similar (and 2 of the 3 NICs are
> e1000 with NAPI). Occasionally there are "BUG: soft lockup detected on CPU#0!"
> messages.
>
> Steps to reproduce:
> There is no simple way to reproduce. I think that everything started when we
> attached a second provider with BGP support. We are using quagga, which injects
> about 186 000 routes into the kernel. When running for a while (at least a few
> hours, sometimes a day) we get 100% usage on ksoftirqd/0 and the following
> messages in the logs:
>
> BUG: soft lockup detected on CPU#0!
>
> Pid: 6506, comm:zebra
> EIP: 0060:[c027f6fd] CPU: 0
> EIP is at _spin_lock+0x7/0xf
> EFLAGS: 0286 Not tainted (2.6.15.6)
> EAX: f6203180 EBX: e6fbf000 ECX: EDX: f6bec000
> ESI: f6203000 EDI: eddb4b80 EBP: fff4 DS: 007b ES: 007b
> CR0: 8005003b CR2: aca6dff0 CR3: 361ad000 CR4: 06d0
> [c02396f9] dev_queue_xmit+0xe0/0x203
> [c0250de8] ip_output+0x1e1/0x237
> [c024f3f5] ip_forward+0x181/0x1df
> [c024e21a] ip_rcv+0x40c/0x485
> [c0239bd0] netif_receive_skb+0x12f/0x165
> [f885aa4c] e1000_clean_rx_irq+0x389/0x410 [e1000]
> [f885a1ca] e1000_clean+0x94/0x12f [e1000]
> [c0239d5a] net_rx_action+0x69/0xf0
> [c011a305] __do_softirq+0x55/0xbd
> [c011a39a] do_softirq+0x2d/0x31
> [c011a3f8] local_bh_enable+0x5a/0x65
> [c024a0a1] rt_run_flush+0x5f/0x80
> [c027623f] fn_hash_insert+0x352/0x39f
> [c027364c] inet_rtm_newroute+0x57/0x62
> [c02413ed] rtnetlink_rcv_msg+0x1a8/0x1cb
> [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> [c0247c1e] netlink_rcv_skb+0x3a/0x8b
> [c0247cb1] netlink_run_queue+0x42/0xc3
> [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> [c0241227] rtnetlink_rcv+0x22/0x40
> [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> [c024764c] netlink_data_ready+0x17/0x54
> [c0246a99] netlink_sendskb+0x1f/0x39
> [c0247449] netlink_sendmsg+0x27b/0x28c
> [c0231467] sock_sendmsg+0xce/0xe9
> [c0112b36] __wake_up+0x27/0x3b
> [c01a6216] copy_to_user+0x38/0x42
> [c01a625a] copy_from_user+0x3a/0x60
> [c01a625a] copy_from_user+0x3a/0x60
> [c0126be2] autoremove_wake_function+0x0/0x3a
> [c0236bcd] verify_iovec+0x49/0x7f
> [c02327f2] sys_sendmsg+0x152/0x1a8
> [c0147a62] do_sync_read+0xb8/0xeb
> [c01a6216] copy_to_user+0x38/0x42
> [c0126be2] autoremove_wake_function+0x0/0x3a
> [c0122b7a] getrusage+0x34/0x43
> [c0168504] inotify_dentry_parent_queue_event+0x29/0x7c
> [c01a625a] copy_from_user+0x3a/0x60
> [c0232b6b] sys_socketcall+0x167/0x180
> [c0102433] sysenter_past_esp+0x54/0x75
>
> BUG: soft lockup detected on CPU#0!
>
> Pid: 6506, comm:zebra
> EIP: 0060:[f8952052] CPU: 0
> EIP is at u32_classify+0x52/0x170 [cls_u32]
> EFLAGS: 0206 Not tainted (2.6.15.6)
> EAX: e2fbd020 EBX: f48649c0 ECX: 0010 EDX: 29b09d5a
> ESI: f48649ec EDI: 0001 EBP: e2fbd020 DS: 007b ES: 007b
> CR0: 8005003b CR2: 08154004 CR3: 361ad000 CR4: 06d0
> [f88462fa] ipt_do_table+0x2de/0x2fd [ip_tables]
> [f883b523] ip_nat_fn+0x177/0x185 [iptable_nat]
> [f88e159f] ip_refrag+0x23/0x5f [ip_conntrack]
> [c0244d82] tc_classify+0x2c/0x3f
> [f895514b] htb_classify+0x14b/0x1dd [sch_htb]
> [f8955638] htb_enqueue+0x1d/0x13a [sch_htb]
> [c02396fd] dev_queue_xmit+0xe4/0x203
> [c0250de8] ip_output+0x1e1/0x237
> [c024f3f5] ip_forward+0x181/0x1df
> [c024e21a] ip_rcv+0x40c/0x485
> [c0239bd0] netif_receive_skb+0x12f/0x165
> [f885aa4c] e1000_clean_rx_irq+0x389/0x410 [e1000]
> [f885a1ca] e1000_clean+0x94/0x12f [e1000]
> [c0239d5a] net_rx_action+0x69/0xf0
> [c011a305] __do_softirq+0x55/0xbd
> [c011a39a] do_softirq+0x2d/0x31
> [c011a3f8] local_bh_enable+0x5a/0x65
> [c024a0a1] rt_run_flush+0x5f/0x80
> [c027623f] fn_hash_insert+0x352/0x39f
> [c027364c] inet_rtm_newroute+0x57/0x62
> [c02413ed] rtnetlink_rcv_msg+0x1a8/0x1cb
> [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> [c0247c1e] netlink_rcv_skb+0x3a/0x8b
> [c0247cb1] netlink_run_queue+0x42/0xc3
> [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> [c0241227] rtnetlink_rcv+0x22/0x40
> [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> [c024764c] netlink_data_ready+0x17/0x54
> [c0246a99] netlink_sendskb+0x1f/0x39
> [c0247449] netlink_sendmsg+0x27b/0x28c
> [c0231467] sock_sendmsg+0xce/0xe9
> [c0112b36] __wake_up+0x27/0x3b
> [c01a625a] copy_from_user+0x3a/0x60
> [c01a625a] copy_from_user+0x3a/0x60
> [c0126be2]
Re: [NET]: Prevent multiple qdisc runs
On Mon, Jun 19, 2006 at 10:36:50AM -0400, jamal wrote:
>
> Ok, but: The queue lock will ensure only one of the qdisc runs (assuming
> different CPUs) will be able to dequeue at any one iota in time, no?
> And if you assume that the cpu that manages to get the tx lock as well is
> going to be contending for the qlock in order to requeue, then the only
> scenario i can see the race happening is when you have one CPU faster than
> the other. Did i miss something?

First of all you could receive an IRQ in between dropping xmit_lock and
regaining the queue lock. Secondly we now have lockless drivers where this
assumption also does not hold.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [DOC]: generic netlink
jamal wrote:
> Folks,
>
> Attached is a document that should help people wishing to use generic
> netlink interface. It is a WIP so a lot more to go if i see interest.
> The doc has been around for a while, i spent part of yesterday and
> this morning cleaning it up. If you have sent me comments before, please
> forgive me for having misplaced them - just send again.

Jamal,

Completing the documentation on generic netlink usage will definitely be
useful. I'd be happy to help out with this since I've recently gone through
trying to understand and use genetlink for the taskstats interface. Hopefully
this will help other users like me who aren't netlink experts to begin with!

I've sent you a patch to the document that attempts to cover the following
TODOs (I didn't see any point in sending it to the whole list since it's harder
to read patches to documentation). Please use as you see fit.

> TODO:
> a) Add a more complete compiling kernel module with events.
> Have Thomas put his Mashimaro example and point to it.

(not the Mashimaro example, nor a completely compiled module, but snippets of
pseudo code taken from the user space program used in taskstats development,
modified to the foobar example you've used)

> b) Describe some details on how user space - kernel works
> probably using libnl??
> c) Describe discovery using the controller..

I'll provide another patch that will cover d) and e) in the set below, again
in the context of the foobar example, which might need to be modified a bit.

> d) talk about policies etc
> e) talk about how something coming from user space eventually
> gets to you.
> f) Talk about the TLV manipulation stuff from Thomas.
> g) submit controller patch to iproute2

One point... do d), f) etc. belong in a separate doc describing usage of
netlink attributes? It's useful here too but perhaps not directly related to
genetlink.

> PS:- I dont have a good place to put this doc and point to, hence the 17K
> attachment

http://www.kernel.org/pub/linux/kernel/people/hadi/ ?
(unless your permissions have been revoked for lack of use ! :-)

Having the current document will be useful to see what edits have been
accepted and work on that instead of the original.

--Shailabh
[PATCH] bcm43xx-d80211: AccessPoint mode related fixes
Hi John,

Please apply this to wireless-dev.
There is no real reason to delay it, even _if_ there might
be still bugs in it. It's a development tree. That's what it
is for. ;)

--

Get AccessPoint mode working in bcm43xx-d80211.
This patch is derived from Alexander Tsvyashchenko's original patch.
I (mb) extended it by endianness fixes and other bugfixes.

From: Alexander Tsvyashchenko [EMAIL PROTECTED]
Signed-off-by: Michael Buesch [EMAIL PROTECTED]

Index: wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c
===================================================================
--- wireless-dev-dscapeports.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 2006-06-17 21:26:10.0 +0200
+++ wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 2006-06-19 11:25:02.0 +0200
@@ -151,8 +151,10 @@
 {
 	u32 status;
 
+	assert(offset % 4 == 0);
+
 	status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-	if (!(status & BCM43xx_SBF_XFER_REG_BYTESWAP))
+	if (status & BCM43xx_SBF_XFER_REG_BYTESWAP)
 		val = swab32(val);
 
 	bcm43xx_write32(bcm, BCM43xx_MMIO_RAM_CONTROL, offset);
@@ -312,7 +314,7 @@
 	}
 }
 
-void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+static void bcm43xx_time_lock(struct bcm43xx_private *bcm)
 {
 	u32 status;
 
@@ -320,7 +322,19 @@
 	status |= BCM43xx_SBF_TIME_UPDATE;
 	bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
 	mmiowb();
+}
+
+static void bcm43xx_time_unlock(struct bcm43xx_private *bcm)
+{
+	u32 status;
+
+	status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
+	status &= ~BCM43xx_SBF_TIME_UPDATE;
+	bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+}
 
+static void bcm43xx_tsf_write_locked(struct bcm43xx_private *bcm, u64 tsf)
+{
 	/* Be careful with the in-progress timer.
 	 * First zero out the low register, so we have a full
 	 * register-overflow duration to complete the operation.
@@ -350,10 +364,13 @@
 		mmiowb();
 		bcm43xx_write16(bcm, BCM43xx_MMIO_TSF_0, v0);
 	}
+}
 
-	status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-	status &= ~BCM43xx_SBF_TIME_UPDATE;
-	bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+{
+	bcm43xx_time_lock(bcm);
+	bcm43xx_tsf_write_locked(bcm, tsf);
+	bcm43xx_time_unlock(bcm);
 }
 
 static void bcm43xx_measure_channel_change_time(struct bcm43xx_private *bcm)
@@ -415,10 +432,11 @@
 static void bcm43xx_write_mac_bssid_templates(struct bcm43xx_private *bcm)
 {
 	static const u8 zero_addr[ETH_ALEN] = { 0 };
-	const u8 *mac = NULL;
-	const u8 *bssid = NULL;
+	const u8 *mac;
+	const u8 *bssid;
 	u8 mac_bssid[ETH_ALEN * 2];
 	int i;
+	u32 tmp;
 
 	bssid = bcm->interface.bssid;
 	if (!bssid)
@@ -431,12 +449,13 @@
 	memcpy(mac_bssid + ETH_ALEN, bssid, ETH_ALEN);
 
 	/* Write our MAC address and BSSID to template ram */
-	for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-		bcm43xx_ram_write(bcm, 0x20 + i, *((u32 *)(mac_bssid + i)));
-	for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-		bcm43xx_ram_write(bcm, 0x78 + i, *((u32 *)(mac_bssid + i)));
-	for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-		bcm43xx_ram_write(bcm, 0x478 + i, *((u32 *)(mac_bssid + i)));
+	for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32)) {
+		tmp = (u32)(mac_bssid[i + 0]);
+		tmp |= (u32)(mac_bssid[i + 1]) << 8;
+		tmp |= (u32)(mac_bssid[i + 2]) << 16;
+		tmp |= (u32)(mac_bssid[i + 3]) << 24;
+		bcm43xx_ram_write(bcm, 0x20 + i, tmp);
+	}
 }
 
 static void bcm43xx_set_slot_time(struct bcm43xx_private *bcm, u16 slot_time)
@@ -460,49 +479,6 @@
 		bcm->short_slot = 0;
 }
 
-/* FIXME: To get the MAC-filter working, we need to implement the
- *        following functions (and rename them :)
- */
-#if 0
-static void bcm43xx_disassociate(struct bcm43xx_private *bcm)
-{
-	bcm43xx_mac_suspend(bcm);
-	bcm43xx_macfilter_clear(bcm, BCM43xx_MACFILTER_ASSOC);
-
-	bcm43xx_ram_write(bcm, 0x0026, 0x);
-	bcm43xx_ram_write(bcm, 0x0028, 0x);
-	bcm43xx_ram_write(bcm, 0x007E, 0x);
-	bcm43xx_ram_write(bcm, 0x0080, 0x);
-	bcm43xx_ram_write(bcm, 0x047E, 0x);
-	bcm43xx_ram_write(bcm, 0x0480, 0x);
-
-	if (bcm->current_core->rev 3) {
-		bcm43xx_write16(bcm, 0x0610, 0x8000);
-		bcm43xx_write16(bcm, 0x060E, 0x);
-	} else
-		bcm43xx_write32(bcm, 0x0188, 0x8000);
-
-	bcm43xx_shm_write32(bcm, BCM43xx_SHM_WIRELESS, 0x0004, 0x03ff);
-
-#if 0
-	if (bcm43xx_current_phy(bcm)->type ==
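
The endianness fix in the template-RAM hunk replaces a `*((u32 *)...)` cast, whose result depends on host byte order, with an explicit byte-by-byte assembly of each 32-bit word. A minimal user-space sketch of that technique follows; `pack_le32` is a hypothetical helper name used only for illustration and is not part of the driver.

```c
#include <stdint.h>

/* Assemble four bytes into a 32-bit value with p[0] as the least
 * significant byte. Unlike casting the byte pointer to a 32-bit pointer,
 * the result is the same on little-endian and big-endian hosts and does
 * not require p to be 4-byte aligned.
 * (pack_le32 is a hypothetical name, not a bcm43xx function.) */
uint32_t pack_le32(const uint8_t *p)
{
	uint32_t v;

	v  = (uint32_t)p[0];
	v |= (uint32_t)p[1] << 8;
	v |= (uint32_t)p[2] << 16;
	v |= (uint32_t)p[3] << 24;
	return v;
}
```

With this form, the 12-byte MAC+BSSID buffer from the patch would be written as three such words, one per loop iteration.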
Re: [Bugme-new] [Bug 6698] New: unregister_netdevice hangs indefinitely from /proc/sys/net/ipv6/conf/all/forwarding
[EMAIL PROTECTED] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=6698
>
> Summary: unregister_netdevice hangs indefinitely from
>          /proc/sys/net/ipv6/conf/all/forwarding
> Kernel Version: 2.6.17-rc6
> Status: NEW
> Severity: normal
> Owner: [EMAIL PROTECTED]
> Submitter: [EMAIL PROTECTED]
>
> Most recent kernel where this bug did not occur: none known (yet)
> Distribution: reproduced on Debian/stable, SuSE/10.0, SuSE/10.1
> Hardware Environment: reproduced on UML, i386, x86/64
> Software Environment: reproduced with openvpn and UML tap devices
> Problem Description:
> After adding IPv6 to my previously working openvpn tunneling setup, a (really
> old) IPv6-related bug started to occur:
> http://lkml.org/lkml/2003/8/21/1
> I also reproduced this bug with kernel 2.6.15.1 (vanilla, uml),
> 2.6.16.13 (SuSE version, x86/64) and linux-2.6.13 (SuSE version, i386).
>
> Steps to reproduce:
> echo 0 > /proc/sys/net/ipv6/conf/all/forwarding # this is important initialization
> Have (any version of) openvpn open a tunnel using a tap (virtual ethernet)
> device. In the up script do:
> echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
>
> This can be easily tested with these lines:
> apt-get install openvpn
> modprobe tun
> mknod /dev/net/tun c 10 200
> echo 0 > /proc/sys/net/ipv6/conf/all/forwarding
> echo "echo 1 > /proc/sys/net/ipv6/conf/all/forwarding" > /tmp/up ; chmod a+x /tmp/up
> openvpn --dev-type tap --remote tunnel.lsmod.de 5003 --ifconfig 10.9.0.2 255.255.255.0 --dev-node /dev/net/tun --up /tmp/up
> # at this point you can verify your tunnel setup by ping 10.9.0.1
>
> # on the server I have this:
> openvpn --dev-type tap --ifconfig 10.9.0.1 255.255.255.0 --port 5003 --dev-node /dev/net/tun --float
> # you need UDP port 5003 to pass through your firewall for this
>
> Alternatively get a user-mode-linux (UML) binary and do something along the
> lines of:
> apt-get install uml-utilities
> TAP=`tunctl -b`
> ifconfig $TAP 192.168.121.1 netmask 255.255.255.252
> echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
> /path/to/linux eth0=tuntap,$TAP ...
> # booting up to the point where the tap dev is really bound
> # (at ifconfig eth0 192.168.121.2 within the UML)
> tunctl -d $TAP
>
> After 20 seconds kill the openvpn or linux process. This hangs indefinitely,
> leaving the openvpn process in D state. syslog states every 10 secs:
> unregister_netdevice: waiting for tap0 to become free. Usage count = 1
> The kernel will then hang ifconfig and ip commands, probably because the
> waiting-for-tap0 code still holds a mutex.
>
> After a dozen reboots of trying I found a work-around: replacing the critical
> line with
> (sleep 2 ; echo 1 > /proc/sys/net/ipv6/conf/all/forwarding )
> A sleep 1 does not suffice.
> Doing the echo before calling openvpn also works fine, so there seems to be a
> timing problem or race condition during initialization of IPv6 on the
> newly created tap0 device.

Thought to be an ipv6 refcount leak.
Re: [PATCH, RFT] bcm43xx: Busting the 1G limit
On Mon, 2006-06-19 at 22:43 +0200, Michael Buesch wrote:
> On Monday 19 June 2006 17:23, Daniel Gryniewicz wrote:
> > On Sat, 2006-06-17 at 19:28 +0200, Michael Buesch wrote:
> > > Hi,
> > >
> > > This patch adds full 32-bit and 64-bit DMA support to
> > > the bcm43xx driver. Well, it _should_ do this. I can not
> > > test it, as I don't have a machine to trigger the 1G limit.
> > > The 1G limit should be exploitable on an AMD64 machine with more
> > > than 1G RAM.
> > >
> > > Please test and report, if it works or not. In the case of
> > > works not, please provide full dmesg log.
> > > Note that I am not sure which cards actually support full 32-bit
> > > or even 64-bit mode. Older cards might still only support
> > > 30-bit DMA.
> >
> > Hi.
> >
> > I tried this on both 2.6.17-rc6 and on wireless-dev, and got pretty much
> > the same panic on both (modulo locking). My box is a turion with 2 GB
> > of ram and a BCM4318. Here's the panic from wireless-dev:
> >
> > Unable to handle kernel NULL pointer dereference at 0020 RIP:
> > 88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436}
>
> I am still not absolutely sure where this oops comes from.
> Could you remove at least 1G of your RAM and retry?

I took out 1G of RAM (2 1G sticks), and there was no more panic. It
still didn't work (no output from iwlist scan), but also no panic.
dmesg output was:

Jun 19 18:00:54 athena bcm43xx: Radio turned on
Jun 19 18:00:54 athena bcm43xx: ASSERTION FAILED (radio_attenuation 10) at: drivers/net/wireless/bcm43xx/bcm43xx_phy.c:1485:bcm43xx_find_lopair()
Jun 19 18:00:54 athena bcm43xx: ASSERTION FAILED (radio_attenuation 10) at: drivers/net/wireless/bcm43xx/bcm43xx_phy.c:1485:bcm43xx_find_lopair()
Jun 19 18:00:54 athena bcm43xx: Chip initialized
Jun 19 18:00:54 athena bcm43xx: 32-bit DMA initialized
Jun 19 18:00:54 athena bcm43xx: 80211 cores initialized
Jun 19 18:00:54 athena bcm43xx: Keys cleared
Jun 19 18:00:54 athena SoftMAC: Associate: Scanning for networks first.
Jun 19 18:00:54 athena SoftMAC: Associate: failed to initiate scan. Is device up?

followed by a bunch of:

Jun 19 18:01:15 athena SoftMAC: Start scanning with channel: 1
Jun 19 18:01:15 athena SoftMAC: Scanning 14 channels
Jun 19 18:01:15 athena SoftMAC: Scanning finished

followed by:

Jun 19 18:02:03 athena SoftMAC: Associate: Scanning for networks first.
Jun 19 18:02:03 athena SoftMAC: Start scanning with channel: 1
Jun 19 18:02:03 athena SoftMAC: Scanning 14 channels
Jun 19 18:02:03 athena bcm43xx: set security called
Jun 19 18:02:03 athena bcm43xx:.level = 0
Jun 19 18:02:03 athena bcm43xx:.enabled = 0
Jun 19 18:02:03 athena bcm43xx:.encrypt = 0
Jun 19 18:02:03 athena SoftMAC: Scanning finished
Jun 19 18:02:03 athena SoftMAC: Associate: Scanning for networks first.
Jun 19 18:02:03 athena SoftMAC: Start scanning with channel: 1
Jun 19 18:02:03 athena SoftMAC: Scanning 14 channels
Jun 19 18:02:03 athena SoftMAC: Scanning finished
Jun 19 18:02:03 athena SoftMAC: Associate: Scanning for networks first.
Jun 19 18:02:03 athena SoftMAC: Start scanning with channel: 1
Jun 19 18:02:03 athena SoftMAC: Scanning 14 channels
Jun 19 18:02:04 athena SoftMAC: Scanning finished
Jun 19 18:02:04 athena SoftMAC: Unable to find matching network after scan!

and finally:

Jun 19 18:02:44 athena bcm43xx: Radio turned off
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0200 (RX) max used slots: 0/64
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x02A0 (TX) max used slots: 0/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0280 (TX) max used slots: 0/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0260 (TX) max used slots: 0/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0240 (TX) max used slots: 0/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0220 (TX) max used slots: 2/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0200 (TX) max used slots: 0/512

At that point, I removed the bcm43xx module and switched over to my
prism54 card in order to get net access. This was all on wireless-dev as
of yesterday with the 1G limit patch from this thread.

Let me know if there's anything I can try, I'd love to get this working
properly.

Daniel
Re: [Bugme-new] [Bug 6682] New: BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100% CPU
On Mon, Jun 19, 2006 at 03:20:10PM -0700, Andrew Morton wrote:
> [EMAIL PROTECTED] wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=6682
> >
> > Summary: BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100% CPU
> > Kernel Version: 2.6.15.6
> > Status: NEW
> > Severity: normal
> > Owner: [EMAIL PROTECTED]
> > Submitter: [EMAIL PROTECTED]
> >
> > Most recent kernel where this bug did not occur: (unknown)
> > Distribution: Gentoo
> > Hardware Environment: 2x Xeon 2.66, 1 GB RAM, NICS: 2 x e1000, and one
> > double port e100. Based on Intel E7501 architecture (2U rack Intel chassis).
> > Software Environment: quagga 0.98.6
> > Problem Description: ksoftirqd/0 takes 100% of CPU. Further investigation
> > shows no sign of a network flood or anything similar (and 2 of the 3 NICs
> > are e1000 with NAPI). Occasionally there are "BUG: soft lockup detected on
> > CPU#0!" messages.
> >
> > Steps to reproduce:
> > There is no simple way to reproduce. I think that everything started when
> > we attached a second provider with BGP support. We are using quagga, which
> > injects about 186 000 routes into the kernel. When running for a while (at
> > least a few hours, sometimes a day) we get 100% usage on ksoftirqd/0 and
> > the following messages in the logs:

Is it possible that there is a routing loop, either in the overall
configuration or in some intermediate point in the route injection?
Both CPUs seem to be receiving ethernet packets at the time of the
oops.

Thanx, Paul

> > BUG: soft lockup detected on CPU#0!
> >
> > Pid: 6506, comm:zebra
> > EIP: 0060:[c027f6fd] CPU: 0
> > EIP is at _spin_lock+0x7/0xf
> > EFLAGS: 0286 Not tainted (2.6.15.6)
> > EAX: f6203180 EBX: e6fbf000 ECX: EDX: f6bec000
> > ESI: f6203000 EDI: eddb4b80 EBP: fff4 DS: 007b ES: 007b
> > CR0: 8005003b CR2: aca6dff0 CR3: 361ad000 CR4: 06d0
> > [c02396f9] dev_queue_xmit+0xe0/0x203
> > [c0250de8] ip_output+0x1e1/0x237
> > [c024f3f5] ip_forward+0x181/0x1df
> > [c024e21a] ip_rcv+0x40c/0x485
> > [c0239bd0] netif_receive_skb+0x12f/0x165
> > [f885aa4c] e1000_clean_rx_irq+0x389/0x410 [e1000]
> > [f885a1ca] e1000_clean+0x94/0x12f [e1000]
> > [c0239d5a] net_rx_action+0x69/0xf0
> > [c011a305] __do_softirq+0x55/0xbd
> > [c011a39a] do_softirq+0x2d/0x31
> > [c011a3f8] local_bh_enable+0x5a/0x65
> > [c024a0a1] rt_run_flush+0x5f/0x80
> > [c027623f] fn_hash_insert+0x352/0x39f
> > [c027364c] inet_rtm_newroute+0x57/0x62
> > [c02413ed] rtnetlink_rcv_msg+0x1a8/0x1cb
> > [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> > [c0247c1e] netlink_rcv_skb+0x3a/0x8b
> > [c0247cb1] netlink_run_queue+0x42/0xc3
> > [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> > [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> > [c0241227] rtnetlink_rcv+0x22/0x40
> > [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> > [c024764c] netlink_data_ready+0x17/0x54
> > [c0246a99] netlink_sendskb+0x1f/0x39
> > [c0247449] netlink_sendmsg+0x27b/0x28c
> > [c0231467] sock_sendmsg+0xce/0xe9
> > [c0112b36] __wake_up+0x27/0x3b
> > [c01a6216] copy_to_user+0x38/0x42
> > [c01a625a] copy_from_user+0x3a/0x60
> > [c01a625a] copy_from_user+0x3a/0x60
> > [c0126be2] autoremove_wake_function+0x0/0x3a
> > [c0236bcd] verify_iovec+0x49/0x7f
> > [c02327f2] sys_sendmsg+0x152/0x1a8
> > [c0147a62] do_sync_read+0xb8/0xeb
> > [c01a6216] copy_to_user+0x38/0x42
> > [c0126be2] autoremove_wake_function+0x0/0x3a
> > [c0122b7a] getrusage+0x34/0x43
> > [c0168504] inotify_dentry_parent_queue_event+0x29/0x7c
> > [c01a625a] copy_from_user+0x3a/0x60
> > [c0232b6b] sys_socketcall+0x167/0x180
> > [c0102433] sysenter_past_esp+0x54/0x75
> >
> > BUG: soft lockup detected on CPU#0!
> >
> > Pid: 6506, comm:zebra
> > EIP: 0060:[f8952052] CPU: 0
> > EIP is at u32_classify+0x52/0x170 [cls_u32]
> > EFLAGS: 0206 Not tainted (2.6.15.6)
> > EAX: e2fbd020 EBX: f48649c0 ECX: 0010 EDX: 29b09d5a
> > ESI: f48649ec EDI: 0001 EBP: e2fbd020 DS: 007b ES: 007b
> > CR0: 8005003b CR2: 08154004 CR3: 361ad000 CR4: 06d0
> > [f88462fa] ipt_do_table+0x2de/0x2fd [ip_tables]
> > [f883b523] ip_nat_fn+0x177/0x185 [iptable_nat]
> > [f88e159f] ip_refrag+0x23/0x5f [ip_conntrack]
> > [c0244d82] tc_classify+0x2c/0x3f
> > [f895514b] htb_classify+0x14b/0x1dd [sch_htb]
> > [f8955638] htb_enqueue+0x1d/0x13a [sch_htb]
> > [c02396fd] dev_queue_xmit+0xe4/0x203
> > [c0250de8] ip_output+0x1e1/0x237
> > [c024f3f5] ip_forward+0x181/0x1df
> > [c024e21a] ip_rcv+0x40c/0x485
> > [c0239bd0] netif_receive_skb+0x12f/0x165
> > [f885aa4c] e1000_clean_rx_irq+0x389/0x410 [e1000]
> > [f885a1ca] e1000_clean+0x94/0x12f [e1000]
> > [c0239d5a] net_rx_action+0x69/0xf0
> > [c011a305] __do_softirq+0x55/0xbd
> > [c011a39a] do_softirq+0x2d/0x31
> > [c011a3f8] local_bh_enable+0x5a/0x65
> > [c024a0a1] rt_run_flush+0x5f/0x80
> > [c027623f] fn_hash_insert+0x352/0x39f
> > [c027364c] inet_rtm_newroute+0x57/0x62
> > [c02413ed] rtnetlink_rcv_msg+0x1a8/0x1cb
> > [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
> > [c0247c1e] netlink_rcv_skb+0x3a/0x8b
> > [c0247cb1] netlink_run_queue+0x42/0xc3
> > [c0241245]
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
jamal wrote:
> - For further reflection: Have you considered the case where the rate
> table has already been considered on some link speed in user space and
> then somewhere post-config the physical link speed changes? This would
> happen in the case where ethernet AN is involved and the partner makes
> some changes (use ethtool).
>
> I would say the last bullet is a more interesting problem than a corner
> case of some link layer technology that has high overhead.
> Your work would be more interesting if it was generic for many link
> layers instead of just ATM.

I've thought about this a couple of times. Scaling the virtual clock
rate should be enough for simple qdiscs like TBF or HTB, which have
a linear relation between time and bandwidth. I haven't really thought
about the effects on HFSC yet; on a small scale the relation is
non-linear.

But this is a different problem from trying to accommodate
link-layer overhead.
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
jamal wrote:
> You are still speaking ATM (and the above may still be valid), but:
> Could you for example look at the netdevice-type and from that figure
> out the link layer overhead and compensate for it.
> Obviously a lot more useful if such activity is doable in user space
> without any knowledge of the kernel? and therefore zero change to the
> kernel and everything then becomes forward and backward compatible.

It would be nice to have support for HFSC as well, which unfortunately
needs to be done in the kernel since it doesn't use rate tables.
What about qdiscs like SFQ (which uses the packet size in quantum
calculations)? I guess it would make sense to use the wire-length
there as well.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Harry Edmon [EMAIL PROTECTED] wrote:
>
> That did not help. I have 1 minute outputs from tcpdump under both
> 2.6.11.12 and 2.6.16.20. You will see a large size difference between
> the files. Since the 2.6.11.12 one is 2 MBytes, I thought I would post
> them via the web instead of via attachments. Look at:
>
> http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
> http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

The latter shows that it took 40ms to generate an ACK. What does
'vmstat 1' show while this is happening?
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[patch] ieee80211: fix not allocating IV+ICV space when using encryption in ieee80211_tx_frame
We should preallocate IV+ICV space when encrypting the frame.
Currently no problem shows up just because dev_alloc_skb aligns the
data len to SMP_CACHE_BYTES which can be used for ICV.

Thanks,
Hong

diff -urp a/net/ieee80211/ieee80211_tx.c b/net/ieee80211/ieee80211_tx.c
--- a/net/ieee80211/ieee80211_tx.c 2006-06-20 09:36:13.0 +0800
+++ b/net/ieee80211/ieee80211_tx.c 2006-06-20 09:32:39.0 +0800
@@ -562,10 +562,12 @@ int ieee80211_tx_frame(struct ieee80211_
 	struct net_device_stats *stats = &ieee->stats;
 	struct sk_buff *skb_frag;
 	int priority = -1;
+	int fraglen = total_len;
+	struct ieee80211_crypt_data *crypt = ieee->crypt[ieee->tx_keyidx];
 
 	spin_lock_irqsave(&ieee->lock, flags);
 
-	if (encrypt_mpdu && !ieee->sec.encrypt)
+	if (encrypt_mpdu && (!ieee->sec.encrypt || !crypt))
 		encrypt_mpdu = 0;
 
 	/* If there is no driver handler to take the TXB, dont' bother
@@ -581,20 +583,25 @@ int ieee80211_tx_frame(struct ieee80211_
 		goto success;
 	}
 
-	if (encrypt_mpdu)
+	if (encrypt_mpdu) {
 		frame->frame_ctl |= cpu_to_le16(IEEE80211_FCTL_PROTECTED);
+		/* mpdu_prefix_len will be add to the headroom */
+		fraglen += crypt->ops->extra_mpdu_postfix_len;
+	}
 
 	/* When we allocate the TXB we allocate enough space for the reserve
 	 * and full fragment bytes (bytes_per_frag doesn't include prefix,
 	 * postfix, header, FCS, etc.) */
-	txb = ieee80211_alloc_txb(1, total_len, ieee->tx_headroom, GFP_ATOMIC);
+	txb = ieee80211_alloc_txb(1, fraglen, ieee->tx_headroom +
+				  crypt->ops->extra_mpdu_prefix_len,
+				  GFP_ATOMIC);
 	if (unlikely(!txb)) {
 		printk(KERN_WARNING "%s: Could not allocate TXB\n",
 		       ieee->dev->name);
 		goto failed;
 	}
 
 	txb->encrypted = 0;
-	txb->payload_size = total_len;
+	txb->payload_size = fraglen;
 
 	skb_frag = txb->fragments[0];
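
The sizing idea behind the patch can be sketched in isolation: space that the cipher puts before the payload (such as an IV) is reserved as extra headroom, and space it appends (such as an ICV) is added to the fragment length. The struct and function names below are hypothetical stand-ins, not the ieee80211 API; only the two length fields mirror names used in the patch. As an assumption that differs from the posted hunk, this sketch adds the prefix length only when encryption is actually in effect, so it never reads the ops pointer when it is NULL.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-cipher ops structure; only the two
 * length fields referenced by the patch are modelled here. */
struct crypt_ops_sketch {
	int extra_mpdu_prefix_len;   /* bytes placed before the payload, e.g. IV */
	int extra_mpdu_postfix_len;  /* bytes placed after the payload, e.g. ICV */
};

struct txb_sizes {
	int headroom;  /* bytes to reserve in front of the frame */
	int fraglen;   /* bytes to allocate for the fragment itself */
};

/* Compute the allocation sizes for a single-fragment frame.
 * When encrypt is zero or ops is NULL, nothing extra is reserved. */
struct txb_sizes txb_sizes_for(int total_len, int tx_headroom, int encrypt,
			       const struct crypt_ops_sketch *ops)
{
	struct txb_sizes s = { tx_headroom, total_len };

	if (encrypt && ops != NULL) {
		s.headroom += ops->extra_mpdu_prefix_len;
		s.fraglen += ops->extra_mpdu_postfix_len;
	}
	return s;
}
```

For WEP, where the IV field and the ICV are 4 bytes each, a 100-byte frame with 8 bytes of driver headroom would come out as 12 bytes of headroom and a 104-byte fragment.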
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Wed, Jun 14, 2006 at 11:40:04AM +0200, Jesper Dangaard Brouer wrote:
> The Linux traffic's control engine inaccurately calculates
> transmission times for packets sent over ADSL links. For
> some packet sizes the error rises to over 50%. This occurs
> because ADSL uses ATM as its link layer transport, and ATM
> transmits packets in fixed sized 53 byte cells.

What if AAL5 is used? The cell-alignment math is going to be wrong
there surely?
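
For reference, the AAL5 framing the question refers to can be worked through numerically. AAL5 appends an 8-byte trailer to the PDU and pads the result to a whole number of 48-byte cell payloads; each cell then occupies 53 bytes on the wire. The sketch below is not taken from the patch under discussion: the function names are hypothetical, and it assumes `len` already includes whatever encapsulation header is in use on the link.

```c
enum {
	ATM_CELL_SIZE    = 53,  /* bytes on the wire per cell */
	ATM_CELL_PAYLOAD = 48,  /* payload bytes carried per cell */
	AAL5_TRAILER     = 8    /* AAL5 CPCS trailer appended to each PDU */
};

/* Number of ATM cells needed for one AAL5 PDU of len bytes.
 * The trailer is added first, then the total is rounded up to a
 * whole number of 48-byte cell payloads. */
unsigned int aal5_cells(unsigned int len)
{
	return (len + AAL5_TRAILER + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;
}

/* Bytes actually transmitted on the ATM link for a len-byte PDU. */
unsigned int aal5_wire_len(unsigned int len)
{
	return aal5_cells(len) * ATM_CELL_SIZE;
}
```

Under these assumptions a 40-byte PDU fits in one cell (53 bytes on the wire), while a 41-byte PDU needs two cells (106 bytes), which illustrates how a size-based estimate that ignores the trailer and padding can be off by more than 50% for small packets.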