Re: [PATCH 3/6] [DCCP]: Bug-Fix - AWL was never updated
On Jan 28, 2008 11:16 PM, Gerrit Renker <[EMAIL PROTECTED]> wrote:
> This patch was triggered by finding the following message in the syslog:
> "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...]
>  P.ackno exists or LAWL(82947089) <= P.ackno(82948208)
>  <= S.AWH(82948728), sending SYNC..."
>
> Note the difference between AWH and AWL: it is 1639 packets (while Sequence
> Window was actually at 100). A closer look at the trace showed that
> LAWL = AWL = 82947089 equalled the ISS on the Response.
>
> The cause of the bug was that AWL was only ever set on the first packet - the
> DCCP-Request sent by dccp_v{4,6}_connect().
>
> The fix is to continually update AWL/AWH with each new packet (as GSS=AWH).
>
> In addition, AWL/AWH are now updated to enforce more stringent checks on the
> initial sequence numbers when connecting:
>  * AWL is initialised to ISS and remains at this value;
>  * AWH is always set to GSS (via dccp_update_gss());
>  * so on the first Request: AWL = AWH = ISS,
>    and on the n-th Request: AWL = ISS, AWH = ISS+n.
>
> As a consequence, only Response packets that refer to Requests sent by this
> host will pass, all others are discarded. This is the intention and in effect
> implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1.
>
> Note: A problem that remains is that ISS can potentially be under-run even
> after the initial handshake; this is addressed a subsequent patch.
>
> Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>

Yes I had seen this and had worked out that variables weren't being updated as they should be but hadn't got as far as a fix before I stopped my coding days so much :-(

Acked-by: Ian McDonald <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
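The AWL/AWH rule in the patch description above can be sketched in a few lines of userspace C. This is only an illustration, not the kernel code: the struct and function names are hypothetical, n is taken to count retransmitted Requests, and the 48-bit wraparound of real DCCP sequence numbers is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified sender-side sequence state (not the kernel's). */
struct seq_state {
    uint64_t iss;   /* initial sequence number sent */
    uint64_t gss;   /* greatest sequence number sent */
    uint64_t awl;   /* acknowledgement window low */
    uint64_t awh;   /* acknowledgement window high */
};

/* First Request: AWL = AWH = ISS. */
static void seq_init(struct seq_state *s, uint64_t iss)
{
    s->iss = iss;
    s->gss = iss;
    s->awl = iss;
    s->awh = iss;
}

/* Each retransmitted Request: GSS advances, AWH follows GSS, AWL stays at ISS. */
static void seq_on_request_retransmit(struct seq_state *s)
{
    s->gss += 1;
    s->awh = s->gss;
    s->awl = s->iss;
}

/* An acknowledgement number passes only if AWL <= ackno <= AWH.
 * Assumption: no wraparound handling, unlike real DCCP arithmetic. */
static bool ackno_valid(const struct seq_state *s, uint64_t ackno)
{
    return s->awl <= ackno && ackno <= s->awh;
}
```

With this model, a Response that acknowledges anything other than one of the Requests actually sent falls outside [AWL, AWH] and is rejected, which is the behaviour the patch description aims for.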
Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
On 12/3/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> WARNING: After reading some messages from Ingo Molnar on lkml I think we
> should really trim the number of lists we use for kernel development. And
> since I moved back to using mutt for reading e-mails, something I should
> have never, ever stopped doing, I guess we should move the DCCP discussions
> to netdev, where we hopefully can get more people interested and reviewing
> the work we do, so please consider moving DCCP discussion to
> netdev@vger.kernel.org, where lots of smart networking folks are present and
> can help our efforts on turning RFCs to code.

I (and others too) don't necessarily have time to read netdev so would vote on keeping dccp. I would totally agree with making sure that we cross-post to netdev as well as dccp.

Ian
Re: Will RFC1146 (tcp alternative checksum options) be implemented in Linux tcp stack ?
On 10/16/07, Yanping Du <[EMAIL PROTECTED]> wrote: > Hi, > > We found the standard 16-bit tcp checksum is not > strong enough in some cases. Is there any roadmap on > implementing RFC1146 (tcp alternative checksum > options) in Linux tcp stack ? If yes, how soon will > that be in ? > > Please kindly copy reply to my email address as I've > not subscribed the netdev@ mailing list at present. > > http://www.faqs.org/rfcs/rfc1146.html > > Thanks! > -Yanping > > Yanping, The way that features get added to Linux is that someone interested writes it. You can't just say - is this on the roadmap, as there is no roadmap really! I have been interested in network features from an academic point of view and so I wrote what I needed (along with others) and that was added to the Linux kernel. So have a go at implementing it if you consider it important and come back here with some patches. Then others will help review it until the patches are good. I will let others comment on whether the checksums are a good idea or not. Ian -- Web1: http://wand.net.nz/~iam4/ Web2: http://www.jandi.co.nz Blog: http://iansblog.jandi.co.nz - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
On 8/30/07, David Miller <[EMAIL PROTECTED]> wrote:
> In fact this is a great example why we don't treat RFCs as dictations
> from the gods. They are often wrong, impractical, or full of fatal
> flaws.

Correct - they often have flaws in them, just like all documents. If that is the case we should try and get the RFCs fixed. I've raised this in a discussion in the ICCRG group and will see if I get any sort of response.

Ian
--
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz
Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
On 8/30/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Ian McDonald" <[EMAIL PROTECTED]>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
>
> > So I'm suspecting that the default should be changed to 1000 to match
> > the RFC which would solve this issue. I note that the RFC is a SHOULD
> > rather than a MUST. I had a quick look around and not sure why Linux
> > overrides the RFC on this one.
>
> Everyone uses this value, even BSD since ancient times.
>
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.

Understand what you are saying. That is why I questioned it, as 200 msecs makes no sense on a LAN with < 1 msec RTT. So if the current value is ridiculous and 1000 is even more so, why do we use it? Just because that is how TCP is written, I'm guessing.

I know that in DCCP CCID3 the RTO is 4 x RTT (from memory - it might be a slight variation) but we ended up putting a minimum on it as you also face a problem if it fires too frequently (i.e. link is in usecs).

I might ask around on research lists and see why this issue has never been revisited.

Now to the original issue - high RTT links. If that is an issue, and I believe it would be, then it's probably better to do this on a per route basis or similar, although then we're becoming a de facto X x rtt type setup. Rereading the RFC this actually doesn't seem prohibited and here is the code from DCCP CCID3 that we use:

    /*
     * Update timeout interval for the nofeedback timer.
     * We use a configuration option to increase the lower bound.
     * This can help avoid triggering the nofeedback timer too
     * often ('spinning') on LANs with small RTTs.
     */
    hctx->ccid3hctx_t_rto = max_t(u32, 4 * hctx->ccid3hctx_rtt,
                                  CONFIG_IP_DCCP_CCID3_RTO *
                                  (USEC_PER_SEC/1000));
    /*
     * Schedule no feedback timer to expire in
     * max(t_RTO, 2 * s/X) = max(t_RTO, 2 * t_ipi)
     */
    t_nfb = max(hctx->ccid3hctx_t_rto, 2 * hctx->ccid3hctx_t_ipi);

    ccid3_pr_debug("%s(%p), Scheduled no feedback timer to "
                   "expire in %lu jiffies (%luus)\n",
                   dccp_role(sk), sk, usecs_to_jiffies(t_nfb), t_nfb);

    sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
                   jiffies + usecs_to_jiffies(t_nfb));

Maybe the TCP code could do this also (with a sysctl to turn behaviour off and on) and then it would save system administrators having to "tune" the TCP stack if they want this sort of behaviour.

Ian
--
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz
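To make the arithmetic in the CCID3 snippet above concrete, here is a small standalone sketch of the same two max() computations. It is an illustration under assumptions rather than the kernel code: the function names are made up, the lower bound is passed in as a parameter standing in for CONFIG_IP_DCCP_CCID3_RTO (in milliseconds), and all times are in microseconds.

```c
#include <stdint.h>

/* t_RTO = max(4 * RTT, floor), with the floor given in milliseconds.
 * rto_floor_ms is a hypothetical stand-in for CONFIG_IP_DCCP_CCID3_RTO. */
static uint64_t ccid3_t_rto_us(uint64_t rtt_us, uint64_t rto_floor_ms)
{
    uint64_t four_rtt = 4 * rtt_us;
    uint64_t floor_us = rto_floor_ms * 1000;   /* ms -> us */

    return four_rtt > floor_us ? four_rtt : floor_us;
}

/* Nofeedback timer interval = max(t_RTO, 2 * t_ipi). */
static uint64_t ccid3_nofeedback_us(uint64_t t_rto_us, uint64_t t_ipi_us)
{
    uint64_t two_ipi = 2 * t_ipi_us;

    return t_rto_us > two_ipi ? t_rto_us : two_ipi;
}
```

For a LAN with a 500 us RTT and a 100 ms floor, 4 x RTT is only 2 ms, so the floor wins and the timer does not spin; on a 50 ms path the 4 x RTT term (200 ms) takes over.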
Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
On 8/30/07, Rick Jones <[EMAIL PROTECTED]> wrote:
> Enable configuration of the minimum TCP Retransmission Timeout via
> a new sysctl "tcp_rto_min" to help those who's networks (eg cellular)
> have quite variable RTTs avoid spurrious RTOs.
>
> Signed-off-by: Rick Jones <[EMAIL PROTECTED]>
> Signed-off-by: Lamont Jones <[EMAIL PROTECTED]>
> ---
>
> diff -r 1559df81a153 Documentation/networking/ip-sysctl.txt
> --- a/Documentation/networking/ip-sysctl.txt Mon Aug 13 05:00:33 2007 +
> +++ b/Documentation/networking/ip-sysctl.txt Wed Aug 22 10:42:55 2007 -0700
> @@ -339,6 +339,13 @@ tcp_rmem - vector of 3 INTEGERs: min, de
>         selected receiver buffers for TCP socket. This value does not override
>         net.core.rmem_max, "static" selection via SO_RCVBUF does not use this.
>         Default: 87380*2 bytes.
> +
> +tcp_rto_min - INTEGER
> +       The minimum value for the TCP Retransmission Timeout, expressed
> +       in milliseconds for the convenience of the user.
> +       This is bounded at the low-end by TCP_RTO_MIN and by TCP_RTO_MAX at
> +       the high-end.
> +       Default: 200.
>

Hmmm... RFC2988 says:

   (2.4) Whenever RTO is computed, if it is less than 1 second then the
         RTO SHOULD be rounded up to 1 second.

         Traditionally, TCP implementations use coarse grain clocks to
         measure the RTT and trigger the RTO, which imposes a large
         minimum value on the RTO. Research suggests that a large
         minimum RTO is needed to keep TCP conservative and avoid
         spurious retransmissions [AP99]. Therefore, this
         specification requires a large minimum RTO as a conservative
         approach, while at the same time acknowledging that at some
         future point, research may show that a smaller minimum RTO is
         acceptable or superior.

I went and had a look and this RFC has not been obsoleted. RFC3390 also backs this assertion up.

So I'm suspecting that the default should be changed to 1000 to match the RFC which would solve this issue. I note that the RFC is a SHOULD rather than a MUST. I had a quick look around and not sure why Linux overrides the RFC on this one.

Ian
--
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz
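As a rough illustration of what the minimum discussed above does to the computed value, the sketch below follows the RFC 2988 estimator (SRTT, RTTVAR, RTO = SRTT + max(G, 4 * RTTVAR)) and then clamps the result between a configurable minimum and maximum. It is not the Linux implementation: the struct and function names are hypothetical, times are in milliseconds, and integer division truncates where the RFC uses exact fractions.

```c
#include <stdint.h>

/* Hypothetical per-connection RTO estimator state, all times in ms. */
struct rto_est {
    int      have_sample;   /* 0 until the first RTT measurement */
    uint32_t srtt_ms;       /* smoothed RTT */
    uint32_t rttvar_ms;     /* RTT variation */
    uint32_t rto_ms;        /* last computed RTO */
};

/* Feed one RTT measurement r_ms; g_ms is the clock granularity G.
 * Returns the new RTO clamped to [rto_min_ms, rto_max_ms]. */
static uint32_t rto_update(struct rto_est *e, uint32_t r_ms, uint32_t g_ms,
                           uint32_t rto_min_ms, uint32_t rto_max_ms)
{
    uint32_t var_term, rto;

    if (!e->have_sample) {
        /* First measurement: SRTT = R, RTTVAR = R/2. */
        e->srtt_ms = r_ms;
        e->rttvar_ms = r_ms / 2;
        e->have_sample = 1;
    } else {
        /* Later measurements: beta = 1/4, alpha = 1/8; RTTVAR is
         * updated before SRTT, using the old SRTT. */
        uint32_t diff = e->srtt_ms > r_ms ? e->srtt_ms - r_ms
                                          : r_ms - e->srtt_ms;
        e->rttvar_ms = (3 * e->rttvar_ms + diff) / 4;
        e->srtt_ms = (7 * e->srtt_ms + r_ms) / 8;
    }

    var_term = 4 * e->rttvar_ms;
    if (var_term < g_ms)
        var_term = g_ms;
    rto = e->srtt_ms + var_term;

    /* The step under discussion: round up to the minimum, cap at the maximum. */
    if (rto < rto_min_ms)
        rto = rto_min_ms;
    if (rto > rto_max_ms)
        rto = rto_max_ms;

    e->rto_ms = rto;
    return rto;
}
```

With a 10 ms first sample the unclamped value is 30 ms, so the result is decided entirely by the minimum: 200 with the default quoted in the patch, 1000 with the RFC's SHOULD.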
Re: [PATCH 2.6.22-rc5] TCP: Make TCP_RTO_MAX a variable
On 7/12/07, OBATA Noboru <[EMAIL PROTECTED]> wrote: > Ian McDonald wrote: > > On 6/26/07, OBATA Noboru <[EMAIL PROTECTED]> wrote: > > > >> From: OBATA Noboru <[EMAIL PROTECTED]> > >> > >> Make TCP_RTO_MAX a variable, and allow a user to change it via a > >> new sysctl entry /proc/sys/net/ipv4/tcp_rto_max. A user can > >> then guarantee TCP retransmission to be more controllable, say, > >> at least once per 10 seconds, by setting it to 10. This is > >> quite helpful on failover-capable network devices, such as an > >> active-backup bonding device. On such devices, it is desirable > >> that TCP retransmits a packet shortly after the failover, which > >> is what I would like to do with this patch. Please see > >> Background and Problem below for rationale in detail. > >> > > RFC2988 says this: > > (2.4) Whenever RTO is computed, if it is less than 1 second then the > > RTO SHOULD be rounded up to 1 second. > > > > Traditionally, TCP implementations use coarse grain clocks to > > measure the RTT and trigger the RTO, which imposes a large > > minimum value on the RTO. Research suggests that a large > > minimum RTO is needed to keep TCP conservative and avoid > > spurious retransmissions [AP99]. Therefore, this > > specification requires a large minimum RTO as a conservative > > approach, while at the same time acknowledging that at some > > future point, research may show that a smaller minimum RTO is > > acceptable or superior. > > > > (2.5) A maximum value MAY be placed on RTO provided it is at least 60 > > seconds. > > > > Your code doesn't seem to meet requirements of section 2.5 as your > > minimum is 1 second. > > (At the risk of having another Emily Litella moment entering a > discussion late...) > > I thought that those sorts of things were generally referring to the > _default_ setting? I believe so. And the requirement of section 2.5 is rather weak (it says "MAY"). 
It is weak in saying you don't have to have a maximum, but if you do have one IT IS AT LEAST 60 seconds (emphasis mine). So the time period is a strong requirement if you decide to implement - which is a weak requirement. Ian -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
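The distinction made above (having a maximum is optional, but any maximum has to be at least 60 seconds) can be written down as a small check, together with the doubling that the maximum is meant to bound. This is a hedged sketch with made-up names and millisecond units, not code from the patch under review.

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 2988 (2.5): a maximum MAY be placed on RTO provided it is at least 60 s. */
#define RFC2988_RTO_MAX_FLOOR_MS 60000u

/* Would a proposed upper bound satisfy section 2.5? (hypothetical helper) */
static bool rto_max_allowed(uint32_t rto_max_ms)
{
    return rto_max_ms >= RFC2988_RTO_MAX_FLOOR_MS;
}

/* Back off the timer after a retransmission timeout: double the RTO,
 * bounded above by the configured maximum. */
static uint32_t rto_backoff(uint32_t rto_ms, uint32_t rto_max_ms)
{
    uint64_t doubled = (uint64_t)rto_ms * 2;   /* avoid 32-bit overflow */

    return doubled > rto_max_ms ? rto_max_ms : (uint32_t)doubled;
}
```

A maximum of 10 seconds, as in the quoted example, makes the backoff sequence from 1 second read 2, 4, 8, 10, 10, ... seconds, and rto_max_allowed() reports that such a bound falls below what section 2.5 permits.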
Re: [2.6 patch] the overdue eepro100 removal
On 7/10/07, Bill Davidsen <[EMAIL PROTECTED]> wrote: If there were any benefit to removing a working driver I would at least be able to see it as a resources issue, but as far as I can see you just seem to have a personal preference for the e100 driver and want to force others to use it because you are so much better able to decide what users need than the system administrators. That's one of the reasons people choose open source, because they have a choice, and can use what's best for them. And be thankful it is open source. If Microsoft drops a driver in Vista you don't have a choice. If Linux drops a driver you can go and patch it back in if you feel that passionate about it. Unfortunately things change in life but at least you have the choice of being stuck with the old bit-rotting driver if you really want to. Ian -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.22-rc5] TCP: Make TCP_RTO_MAX a variable
On 6/26/07, OBATA Noboru <[EMAIL PROTECTED]> wrote: From: OBATA Noboru <[EMAIL PROTECTED]> Make TCP_RTO_MAX a variable, and allow a user to change it via a new sysctl entry /proc/sys/net/ipv4/tcp_rto_max. A user can then guarantee TCP retransmission to be more controllable, say, at least once per 10 seconds, by setting it to 10. This is quite helpful on failover-capable network devices, such as an active-backup bonding device. On such devices, it is desirable that TCP retransmits a packet shortly after the failover, which is what I would like to do with this patch. Please see Background and Problem below for rationale in detail. RFC2988 says this: (2.4) Whenever RTO is computed, if it is less than 1 second then the RTO SHOULD be rounded up to 1 second. Traditionally, TCP implementations use coarse grain clocks to measure the RTT and trigger the RTO, which imposes a large minimum value on the RTO. Research suggests that a large minimum RTO is needed to keep TCP conservative and avoid spurious retransmissions [AP99]. Therefore, this specification requires a large minimum RTO as a conservative approach, while at the same time acknowledging that at some future point, research may show that a smaller minimum RTO is acceptable or superior. (2.5) A maximum value MAY be placed on RTO provided it is at least 60 seconds. Your code doesn't seem to meet requirements of section 2.5 as your minimum is 1 second. I think if you're trying to solve the bonding issue then you should solve that issue, not hack the TCP implementation as that opens it up to abuse in other ways. Ian -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tcp_probe: __attribute__ string location
On 6/6/07, David Miller <[EMAIL PROTECTED]> wrote: From: Randy Dunlap <[EMAIL PROTECTED]> Date: Tue, 5 Jun 2007 18:01:41 -0700 > From: Randy Dunlap <[EMAIL PROTECTED]> > > gcc doesn't like the location of the __attribute__ string here: > net/ipv4/tcp_probe.c:83: warning: empty declaration > > so move it to before the function and all is well. > > Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]> Yeah I noticed this one too and a similar fix is in my net-2.6 GIT tree, but thanks anyways Randy. I'm wondering if either of you can actually load tcp_probe at present. We had reports on dccp mailing list that dccp_probe and tcp_probe can't load at present and produce a back trace. It appears related to the jprobe stuff according to Arnaldo. If the bug reporter or Arnaldo doesn't follow up on it I'll track it down a little more later and post to correct place. I'll also copy this change across to DCCP at sometime as well as several others that we haven't transferred across as well. Ian -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] New driver API to speed up small packets xmits
On 5/11/07, Vlad Yasevich <[EMAIL PROTECTED]> wrote: The win might be biggest on a system were a lot of applications send a lot of small packets. Some number will aggregate in the prio queue and then get shoved into a driver in one go. That's assuming that the device doesn't run out of things to send first But... this is all conjecture until we see the code. Agree -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] New driver API to speed up small packets xmits
On 5/11/07, Vlad Yasevich <[EMAIL PROTECTED]> wrote: >> May be for TCP? What about other protocols? > > There are other protocols?-) True, UDP, and I suppose certain modes of > SCTP might be sending streams of small packets, as might TCP with > TCP_NODELAY set. > > Do they often queue-up outside the driver? Not sure if DCCP might fall into this category as well... Yes DCCP definitely can queue outside the driver. I think the idea of this patch is gather some number of these small packets and shove them at the driver in one go instead of each small packet at a time. I might be helpful, but reserve judgment till I see more numbers. -vlad As I see this proposed patch it is about reducing the number of "task switches" between the driver and the protocol. I use task switch in speech marks as it isn't really as is in the kernel. So in other words we are hoping that spending more time in each area would keep the cache hot and work to be done if locks held. This of course requires that the complexity added doesn't outweigh the gains - otherwise you could end up in a worse scenario where the driver doesn't send packets because the protocol is busy linking them. As far as I can tell you're not combining packets?? This would definitely break UDP/DCCP which are datagram based. Will be interesting to see results but I'm a little sceptical. -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][SOCK]: shrink struct sock
On 5/4/07, David Miller <[EMAIL PROTECTED]> wrote: sk_buff_head is due for being killed from the whole tree. Nobody really needs the qlen, few things really need the lock, and those that do can define their own as needed :-) I've got out of tree research code that uses the qlen quite significantly. However it's not high speed networking so can compute it myself if needed... -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: {Spam?} Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.
On 4/15/07, Robert P. J. Day <[EMAIL PROTECTED]> wrote: in fact, according to this: http://lkml.org/lkml/2006/1/13/139 that notice was put in the feature removal file well over a year ago, during 2.6.15. so that would seem to be more than adequate time for everyone to prepare for it. but it must have been deleted from that file since then as well. Yes and that was never merged and so was resent on January 19th, 2006: http://www.nabble.com/-2.6-patch--schedule-SHAPER-for-removal-t949871.html At that point people debated about it being too short notice and the patch never went in. I therefore think we can't just remove with NO notice. Ian -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.
On 4/15/07, Robert P. J. Day <[EMAIL PROTECTED]> wrote: Remove the obsolete code for the traffic shaper. Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]> Apart from the merits of removing this which I can't comment on, I thought the usual procedure was to place a removal in Documentation/feature-removal-schedule.txt to notify people of what is going to be removed. Then wait the period you determine there and then remove. Ian -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [1/3] 2.6.21-rc6: known regressions
On 4/14/07, Linus Torvalds <[EMAIL PROTECTED]> wrote: Note: Ingo also reports what looks like a memory corruption due to the 6b6b6b6b pattern on presumably the same box. The 6b6b6b6b pattern is POISON_FREE, implying some kind of slab misuse, most likely a use-after-free, although possibly just due to overrunning a slab into the next one or something like that. What I'm leading up to is that I'm wondering if these mysterious network driver bugs aren't due to the network drivers themselves, but due to some higher-level problem. I think the hangs that Ingo sees with forcedeth were preceded by mysterious and "impossible" NULL pointer oopses. Ingo? Davem - have there been network infrastructure changes that migt be suspect? Jeff and/or Greg - anything in the generic network driver/device driver level? We had some trouble earlier with the transition to the driver core, and kref miscounting. Related? The last Oops Ingo saw was a module refcounting one, iirc. It does seem networking related somehow. Yeah, it could be obviously be a combination of independent bugs both in e1000/ and forcedeth drivers, but maybe there is something in common here... I don't know if this is a red herring or not but I reported on March 13th slab corruption and it looked like file_free_rcu - these are fairly recent changes I think (rcu)? Anyway original message is at http://lkml.org/lkml/2007/3/12/364 My apologies if this is not related. Ian -- Web: http://wand.net.nz/~iam4/ Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)
On 3/5/07, Matt Mackall <[EMAIL PROTECTED]> wrote: > This is due to the recent sysfs restructuring I think. IIRC the fix is > to upgrade hal to a current git version. If that's the cause, the fix is to back out whatever was done to break userspace. Breaking userspace is not ok. Upgrading from 2.6.x to 2.6.x+1 should not entail replacing substantial parts of userspace, especially with NOT-EVEN-FRAKKING-RELEASED-YET CODE. I will try a new HAL when it shows up in Debian/unstable and not a moment sooner. But you're running a kernel that's not in Debian/unstable so this seems a bit hypocritical. When you work with bleeding edge kernels you have to be prepared to work around things. Hell for ages git wasn't in Debian - unstable even, udev would break things etc. Just my 2c worth. -- Web: http://wand.net.nz/~iam4 Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?
On 2/21/07, bert hubert <[EMAIL PROTECTED]> wrote:
> I'm trying to figure out which processes have the most impact, I had already
> killed anything non-essential. But that still leaves 140 pids.
>
> Bert

That sounds way too many pids. I run a script to shut down processes when I do testing as it makes a HUGE difference to my timing of things which can be quite critical.

Here's my list of 46 and that includes me sshing into a box and checking for processes:

UID        PID  PPID CMD
root         1     0 init [2]
root         2     1 [ksoftirqd/0]
root         3     1 [watchdog/0]
root         4     1 [events/0]
root         5     1 [khelper]
root         6     1 [kthread]
root        40     6 [kblockd/0]
root        41     6 [kacpid]
root       110     6 [cqueue/0]
root       111     6 [ata/0]
root       112     6 [ata_aux]
root       113     6 [kseriod]
root       135     6 [rt-test-0]
root       137     6 [rt-test-1]
root       139     6 [rt-test-2]
root       141     6 [rt-test-3]
root       143     6 [rt-test-4]
root       145     6 [rt-test-5]
root       147     6 [rt-test-6]
root       149     6 [rt-test-7]
root       151     6 [pdflush]
root       152     6 [pdflush]
root       153     6 [kswapd0]
root       154     6 [aio/0]
root       838     6 [kedac]
root       843     6 [kjournald]
root      1720     6 [ksuspend_usbd]
root      1721     6 [khubd]
root      1741     6 [kpsmoused]
root      2544     1 /sbin/syslogd
root      2554     1 /sbin/klogd -x
root      2851     1 /usr/sbin/inetd
root      2863     1 /usr/sbin/sshd
ntp       2954     1 /usr/sbin/ntpd -p /var/run/ntpd.pid -u 111:111 -g
root      3061     1 /bin/login --
root      3062     1 /sbin/getty 38400 tty2
root      3063     1 /sbin/getty 38400 tty3
root      3064     1 /sbin/getty 38400 tty4
root      3065     1 /sbin/getty 38400 tty5
root      3066     1 /sbin/getty 38400 tty6
ian       3083  3061 -bash
root     21518  2863 sshd: ian [priv]
ian      21520 21518 sshd: [EMAIL PROTECTED]/1
ian      21521 21520 -bash
ian      21747 21521 ps -ef

--
Web: http://wand.net.nz/~iam4
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group
Re: [patch 3/3] tcp: remove experimental variants from default list
On 2/13/07, David Miller <[EMAIL PROTECTED]> wrote: This is not the internet of 15 years ago, please wake up everyone. We cannot sit on eggs for 5 years to make sure they hatch perfectly like was previously possible. OK. I get the point. I am more conservative by nature and more of an academic. Now there's been some explanation I'm happier for the change to go ahead. I just think changes like this that affect the Internet should be discussed a little. Now it's been discussed a little I feel better. Ian -- Web: http://wand.net.nz/~iam4 Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/3] tcp: remove experimental variants from default list
On 2/13/07, Baruch Even <[EMAIL PROTECTED]> wrote: * Stephen Hemminger <[EMAIL PROTECTED]> [070212 18:04]: > The TCP Vegas implementation is buggy, and BIC is too agressive > so they should not be in the default list. Westwood is okay, but > not well tested. Since no one really agrees on the relative merits and problems of the different algorithms and since the users themselves dont know, dont care and have no clue on what should be the correct behaviour to report bugs (see the old bic bugs, the htcp bugs, the recent sack bugs) I would suggest to avoid making the whole internet a guinea pig and get back to reno. If someone really needs to push high BDP flows he should test it himself and choose what works for his kernel at the time. For myself and anyone who asks me I recommend to set the default to reno. For the few who really need high speed flows, they should test kernel and protocol combination. Baruch I agree wholeheartedly with Baruch. If we are going to remove BIC as default we should go back to Reno as Cubic is even less tested in production use than BIC. Unless of course the papers you saw at PFLDNET showed that Cubic was a really good choice and you want to point us to those papers. Ian -- Web: http://wand.net.nz/~iam4 Blog: http://iansblog.jandi.co.nz WAND Network Research Group - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Status of kernel.org servers??
I've searched lkml archives but can't find anything there apart from one person complaining. Can anybody basically tell me how to get access to git trees in a way that works at present? I've tried git://git.kernel.org, git://git2.kernel.org, http://master.kernel.org, http://kernel.org all without success. Can anybody point to whats going on as well at present and a timeline/plan to resolve? Thanks, Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm
On 10/27/06, David Miller <[EMAIL PROTECTED]> wrote: From: "Ian McDonald" <[EMAIL PROTECTED]> Date: Fri, 27 Oct 2006 12:59:30 +1300 > I don't agree with this at all. I would love Firefox, BitTorrent etc > to implement usage of TCP-LP for example so they use "unused" > bandwidth only. > > With this change applications can't do this. > > If we are going to restrict by capabilities then I think we should > only restrict module loading - this way the admin of the box can > decide what algorithms can be used. You are using an example of a (supposedly) safe case of this as a justification for allowing all cases. It is bad, very bad, to allow arbitrary users to select arbitrary congestion control algorithms. It is just as bad as allowing them to disable congestion control completely if that were an option. OK understand your point here but I think low priority TCP has its use. Don't agree it is just as bad, but it is bad under the wrong circumstances - it's still better than UDP which has no congestion control... Don't want to make it over complicated though. I think the most sense would be to restrict it as shown as tcp-lp is the exception and allow tcp-lp via another mechanism. That is a situation where the user could specify how low priority they want the traffic to be... If I ever get enough time I'll have a go at it but can't see it this year :-( It actually makes more sense to tie the congestion control algorithm to the route/destination IP if we are going to change it but that is a whole another exercise in itself. If someone, for example, builds all the algorithms statically into their kernel, for testing as root, this lets all users on the machine do the same which is not right. This is the state at present as I understand it. However that doesn't make it right. 
-- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm
On 10/27/06, Hagen Paul Pfeifer <[EMAIL PROTECTED]> wrote: Check if user has CAP_NET_ADMIN capability to change congestion control algorithm. Under normal circumstances a application programmer doesn't have enough information to choose the "right" algorithm (expect he is the pchar/pathchar maintainer). At 99.9% only the local host administrator has the knowledge to select a proper standard, system-wide algorithm (the remaining 0.1% are for testing purpose). If we let the user select an alternative algorithm we introduce one potential weak spot - so we ban this eventuality. I don't agree with this at all. I would love Firefox, BitTorrent etc to implement usage of TCP-LP for example so they use "unused" bandwidth only. With this change applications can't do this. If we are going to restrict by capabilities then I think we should only restrict module loading - this way the admin of the box can decide what algorithms can be used. Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
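For readers following the thread, this is roughly what an application such as the ones mentioned above would do to pick an algorithm for a single socket. It is a minimal sketch, assuming a Linux system whose headers provide the TCP_CONGESTION socket option; the helper names set_congestion_control() and get_congestion_control() are made up for illustration and are not from any posted patch. Whether the kernel accepts a given name for an unprivileged caller is exactly the policy question under discussion here.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Ask the kernel to use the named congestion control algorithm on one
 * TCP socket. Returns 0 on success, -1 with errno set otherwise; the
 * kernel may refuse the request depending on its policy.
 * (Hypothetical helper name, for illustration only.)
 */
static int set_congestion_control(int fd, const char *name)
{
	assert(name != NULL);
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
			  name, strlen(name));
}

/*
 * Read back the algorithm currently attached to the socket.
 * buf is zeroed first and the last byte is never handed to the kernel,
 * so the result is always NUL-terminated.
 */
static int get_congestion_control(int fd, char *buf, socklen_t buflen)
{
	socklen_t len;

	assert(buf != NULL && buflen > 1);
	memset(buf, 0, buflen);
	len = buflen - 1;
	return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}
```

An application would call set_congestion_control(fd, name) right after socket() and before connect(), and treat a failure as "keep the system default" rather than as a fatal error.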
Re: getaddrinfo - should accept IPPROTO_SCTP no?
On 10/14/06, Rick Jones <[EMAIL PROTECTED]> wrote:
> I made some recent changes to netperf to workaround what is IMO a bug in
> the Solaris getaddrinfo() where it will clear the ai_protocol field even
> when one gives it a protocol in the hints.
>
> [If you happen to be trying to use the test-specific -D to set
> TCP_NODELAY in netperf on Solaris, you might want to grab netperf TOT to
> get this workaround as it relates to issues with setting TCP_NODELAY -
> modulo what it will do to being able to run the netperf SCTP tests on
> Linux...]
>
> In the process though I have stumbled across what appears to be a bug
> (?) in "Linux" getaddrinfo() - returning a -7 EAI_SOCKTYPE if given as
> hints SOCK_STREAM and IPPROTO_SCTP - this on a system that ostensibly
> supports SCTP. I've seen this on RHAS4U4 as well as another less well
> known distro.
>
> I'm about to see about concocting an additional workaround in netperf
> for this, but thought I'd ask if my assumption - that getaddrinfo()
> returning -7 when given IPPROTO_SCTP - is indeed a bug in getaddrinfo().
> Or am I just woefully behind in patches or completely offbase on what is
> correct behaviour for getaddrinfo and hints?
>
> FWIW, which may not be much, Solaris 10 06/06 seems content to accept
> IPPROTO_SCTP in the hints.
>
> thanks,
>
> rick jones
> http://www.netperf.org/svn/netperf2/trunk/

In all the DCCP code which has similar issues I just do the protocol selection on the socket call e.g.

	case TCP:
		new_sock = socket(AF_INET,SOCK_STREAM,0);
		break;
	case DCCP:
		new_sock = socket(AF_INET,SOCK_DCCP,IPPROTO_DCCP);
		break;
	case UDP:
		new_sock = socket(AF_INET,SOCK_DGRAM,0);
		break;

I'm sure you know all this anyway so apologies in advance for telling you something you probably already know!

We need to come up with a way to select service codes etc for DCCP which is another parameter needed for a DCCP socket when getaddrinfo is tidied up.
Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
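The kind of workaround discussed in this thread could look something like the sketch below. It is only an illustration under stated assumptions: resolve_with_protocol() is a hypothetical helper, not netperf's actual code. It first passes the hints through unchanged; if the resolver rejects the protocol, it retries with ai_protocol set to 0 and then writes the requested protocol back into every result, which also covers a resolver that clears ai_protocol.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve host/service while insisting on hints->ai_protocol.
 * Returns 0 or a getaddrinfo() EAI_* error code. On success the caller
 * owns *res and must release it with freeaddrinfo().
 * (Hypothetical helper, for illustration only.)
 */
static int resolve_with_protocol(const char *host, const char *service,
				 const struct addrinfo *hints,
				 struct addrinfo **res)
{
	struct addrinfo relaxed;
	struct addrinfo *ai;
	int rc;

	assert(hints != NULL && res != NULL);

	rc = getaddrinfo(host, service, hints, res);
	if (rc != 0 && hints->ai_protocol != 0) {
		/* The resolver refused the protocol: ask again without it. */
		relaxed = *hints;
		relaxed.ai_protocol = 0;
		rc = getaddrinfo(host, service, &relaxed, res);
	}
	if (rc != 0)
		return rc;

	/* Put the requested protocol back, whichever path succeeded. */
	if (hints->ai_protocol != 0)
		for (ai = *res; ai != NULL; ai = ai->ai_next)
			ai->ai_protocol = hints->ai_protocol;
	return 0;
}
```

The results can then be fed to socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol) in the usual loop; whether that socket() call succeeds still depends on the kernel supporting the protocol.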
Re: [PATCH] bcm43xx-softmac: fix warning from ignoring returned value from pci_enable_device
On 9/28/06, Larry Finger <[EMAIL PROTECTED]> wrote: Linus's tree now has a configuration option that prints a warning whenever the returned value of any routine is ignored. This patch fixes the only such warning for bcm43xx. Can you tell me how to make this check please so I can check my code in the kernel? I could look it up but obviously you can tell me quickly :-) Regards, Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [e2e] performance of BIC-TCP, High-Speed-TCP, H-TCP etc
I wasn't aware of the planned move to cubic in Linux. Can I ask the rationale for this ? Cubic is, of course, closely related to HTCP (borrowing the HTCP idea of using elapsed time since last backoff as the quantity used to adjust the cwnd increase rate) which *is* tested in the reported study. I'd be more than happy to run tests on cubic and I reckon we should do this sooner rather than later now that you have flagged up plans to rollout cubic. As I understand it, it is because Cubic is better than bic for differing rtts and bic is the current default. Stephen might like to add to it. More tests are always good! Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [e2e] performance of BIC-TCP, High-Speed-TCP, H-TCP etc
On 9/23/06, Douglas Leith <[EMAIL PROTECTED]> wrote: For those interested in TCP for high-speed environments, and perhaps also people interested in TCP evaluation generally, I'd like to point you towards the results of a detailed experimental study which are now available at: http://www.hamilton.ie/net/eval/ToNfinal.pdf This study consistently compares Scalable-TCP, HS-TCP, BIC-TCP, FAST-TCP and H-TCP performance under a wide range of conditions including with mixes of long and short-lived flows. This study has now been subject to peer review (to hopefully give it some legitimacy) and is due to appear in the Transactions on Networking. The conclusions (see summary below) seem especially topical as BIC-TCP is currently widely deployed as the default algorithm in Linux. Comments appreciated. Our measurements are publicly available - on the web or drop me a line if you'd like a copy. Summary: In this paper we present experimental results evaluating the performance of the Scalable-TCP, HS-TCP, BIC-TCP, FAST-TCP and H-TCP proposals in a series of benchmark tests. We find that many recent proposals perform surprisingly poorly in even the most simple test, namely achieving fairness between two competing flows in a dumbbell topology with the same round-trip times and shared bottleneck link. Specifically, both Scalable-TCP and FAST TCP exhibit very substantial unfairness in this test. We also find that Scalable-TCP, HS-TCP and BIC-TCP induce significantly greater RTT unfairness between competing flows with different round-trip times. The unfairness can be an order of magnitude greater than that with standard TCP and is such that flows with longer round-trip times can be completely starved of bandwidth. While the TCP proposals studied are all successful at improving the link utilisation in a relatively static environment with long-lived flows, in our tests many of the proposals exhibit poor responsiveness to changing network conditions. 
We observe that Scalable-TCP, HS-TCP and BIC-TCP can all suffer from extremely slow (>100s) convergence times following the startup of a new flow. We also observe that while FAST-TCP flows typically converge quickly initially, flows may later diverge again to create significant and sustained unfairness. --Doug Hamilton Institute www.hamilton.ie Interesting reading and I am replying to netdev@vger.kernel.org as well. I will read in more detail later but my first questions/comments are: - have you tested CUBIC subsequently as this is meant to fix many of the rtt issues? This is becoming the default in 2.6.19 probably. - have you tested subsequently on more recent kernels than 2.6.6? Looks like some very useful information. Regards, Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net/dccp: Allow default/fallback service code
Gerrit,

Not sure what happened here but I can't apply this with git-apply. Can you check and resubmit? It looks like a great patch though and I would love to test it! This would mean DCCP is easier to port to, which must be good.

Just a quick note - you didn't update the last changed date in Documentation/networking/dccp.txt. I think rather than updating it you can remove it, as people can find dates by looking at the git history.

Ian

On 9/12/06, Gerrit Renker <[EMAIL PROTECTED]> wrote:
> [DCCP]: Allow default/fallback service code.
>
> This has been discussed on [EMAIL PROTECTED] and removes the necessity
> for applications to supply service codes in each and every case. If an
> application does not want to provide a service code, that's fine, it
> will be given 0. Otherwise, service codes can be set via socket options
> as before. This patch has been tested using various client/server
> configurations (including listening on multiple service codes) and
> patches against Torvalds' tree.
>
> Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
> --
>  Documentation/networking/dccp.txt |7 +--
>  include/linux/dccp.h |6 +-
>  net/dccp/ipv4.c |3 ---
>  net/dccp/proto.c | 11 +--
>  4 files changed, 7 insertions(+), 20 deletions(-)
>
> diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt
> index c45daab..2f479af 100644
> --- a/Documentation/networking/dccp.txt
> +++ b/Documentation/networking/dccp.txt
> @@ -42,8 +42,11 @@ Socket options
>  DCCP_SOCKOPT_PACKET_SIZE is used for CCID3 to set default packet size for
>  calculations.
>
> -DCCP_SOCKOPT_SERVICE sets the service. This is compulsory as per the

--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: [PATCH 2.6.17][Trivial] net/dccp: update references to standards
Arnaldo - this looks good. Signed-off-by: Ian McDonald <[EMAIL PROTECTED]> On 9/15/06, Gerrit Renker <[EMAIL PROTECTED]> wrote: Sorry kmail garbled this, clean text below. - Gerrit -- diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig index 859e335..2c345c0 100644 --- a/net/dccp/Kconfig +++ b/net/dccp/Kconfig @@ -4,9 +4,9 @@ menu "DCCP Configuration (EXPERIMENTAL)" config IP_DCCP tristate "The DCCP Protocol (EXPERIMENTAL)" -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] [DCCP]: Shift constants into header
This shifts some constants from ccid3.c to ccid3.h This is not needed for in tree code (yet) but for my own work. Makes sense to have constants in header though. Signed-off-by: Ian McDonald <[EMAIL PROTECTED]> --- diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 67d2dc0..7b4699a 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -75,14 +75,6 @@ static struct dccp_tx_hist *ccid3_tx_his static struct dccp_rx_hist *ccid3_rx_hist; static struct dccp_li_hist *ccid3_li_hist; -/* TFRC sender states */ -enum ccid3_hc_tx_states { - TFRC_SSTATE_NO_SENT = 1, - TFRC_SSTATE_NO_FBACK, - TFRC_SSTATE_FBACK, - TFRC_SSTATE_TERM, -}; - #ifdef CCID3_DEBUG static const char *ccid3_tx_state_name(enum ccid3_hc_tx_states state) { diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h index 0a2cb75..df4ff13 100644 --- a/net/dccp/ccids/ccid3.h +++ b/net/dccp/ccids/ccid3.h @@ -65,6 +65,14 @@ enum ccid3_options { TFRC_OPT_RECEIVE_RATE= 194, }; +/* TFRC sender states */ +enum ccid3_hc_tx_states { + TFRC_SSTATE_NO_SENT = 1, + TFRC_SSTATE_NO_FBACK, + TFRC_SSTATE_FBACK, + TFRC_SSTATE_TERM, +}; + struct ccid3_options_received { u64 ccid3or_seqno:48, ccid3or_loss_intervals_idx:16; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/7] [DCCP]: Introduce two new socket options
This creates two new socket options DCCP_SOCKOPT_TX_PACKET_SIZE and DCCP_SOCKOPT_RX_PACKET_SIZE. DCCP_SOCKOPT_PACKET_SIZE doesn't work and packet size should be set independently on two half connections.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index a073164..ef1c57b 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -200,6 +200,8 @@ #define DCCP_SOCKOPT_PACKET_SIZE	1
 #define DCCP_SOCKOPT_SERVICE		2
 #define DCCP_SOCKOPT_CHANGE_L		3
 #define DCCP_SOCKOPT_CHANGE_R		4
+#define DCCP_SOCKOPT_TX_PACKET_SIZE	5
+#define DCCP_SOCKOPT_RX_PACKET_SIZE	6
 #define DCCP_SOCKOPT_CCID_RX_INFO	128
 #define DCCP_SOCKOPT_CCID_TX_INFO	192
[PATCH 1/7] [DCCP]: Introduce constants for CCID numbers
This change introduces a constant for CCID numbers.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index 2d7671c..a073164 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -169,6 +169,12 @@ enum {
 	DCCPO_MAX_CCID_SPECIFIC = 255,
 };
 
+/* DCCP CCIDS */
+enum {
+	DCCPC_CCID2 = 2,
+	DCCPC_CCID3 = 3,
+};
+
 /* DCCP features */
 enum {
 	DCCPF_RESERVED = 0,
@@ -320,7 +326,7 @@ static inline unsigned int dccp_hdr_len(
 /* initial values for each feature */
 #define DCCPF_INITIAL_SEQUENCE_WINDOW	100
 #define DCCPF_INITIAL_ACK_RATIO		2
-#define DCCPF_INITIAL_CCID		2
+#define DCCPF_INITIAL_CCID		DCCPC_CCID2
 #define DCCPF_INITIAL_SEND_ACK_VECTOR	1
 /* FIXME: for now we're default to 1 but it should really be 0 */
 #define DCCPF_INITIAL_SEND_NDP_COUNT	1
[PATCH 3/7] [DCCP]: Introduce dccp_probe
This adds DCCP probing shamelessly ripped off from TCP probes by Stephen Hemminger. I've put in here support for further CCID3 variables as well. Andrea/Arnaldo might look to extend for CCID2. Signed-off-by: Ian McDonald <[EMAIL PROTECTED]> --- diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig index 859e335..e2a095d 100644 --- a/net/dccp/Kconfig +++ b/net/dccp/Kconfig @@ -40,6 +40,22 @@ config IP_DCCP_DEBUG Just say N. +config NET_DCCPPROBE + tristate "DCCP connection probing" + depends on PROC_FS && KPROBES + ---help--- + This module allows for capturing the changes to DCCP connection + state in response to incoming packets. It is used for debugging + DCCP congestion avoidance modules. If you don't understand + what was just said, you don't need it: say N. + + Documentation on how to use the packet generator can be found + at http://linux-net.osdl.org/index.php/DccpProbe + + To compile this code as a module, choose M here: the + module will be called dccp_probe. + + endmenu endmenu diff --git a/net/dccp/Makefile b/net/dccp/Makefile index 7696e21..47b1371 100644 --- a/net/dccp/Makefile +++ b/net/dccp/Makefile @@ -11,6 +11,7 @@ dccp_ipv4-y := ipv4.o dccp-$(CONFIG_IP_DCCP_ACKVEC) += ackvec.o obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o +obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o dccp-$(CONFIG_SYSCTL) += sysctl.o diff --git a/net/dccp/dccp_probe.c b/net/dccp/dccp_probe.c new file mode 100644 index 000..4b65aad --- /dev/null +++ b/net/dccp/dccp_probe.c @@ -0,0 +1,197 @@ +/* + * dccpprobe - Observe the DCCP flow with kprobes. 
+ * + * The idea for this came from Werner Almesberger's umlsim + * Copyright (C) 2004, Stephen Hemminger <[EMAIL PROTECTED]> + * + * Modified for DCCP from Stephen Hemminger's code + * Copyright (C) 2006, Ian McDonald <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "dccp.h" +#include "ccid.h" +#include "ccids/ccid3.h" + +MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>"); +MODULE_DESCRIPTION("DCCP snooper"); +MODULE_LICENSE("GPL"); + +static int port = 0; +MODULE_PARM_DESC(port, "Port to match (0=all)"); +module_param(port, int, 0); + +static int bufsize = 64*1024; +MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)"); +module_param(bufsize, int, 0); + +static const char procname[] = "dccpprobe"; + +struct { + struct kfifo *fifo; + spinlock_tlock; + wait_queue_head_t wait; + struct timeval tstart; +} dccpw; + +static void printl(const char *fmt, ...) 
+{
+	va_list args;
+	int len;
+	struct timeval now;
+	char tbuf[256];
+
+	va_start(args, fmt);
+	do_gettimeofday(&now);
+
+	now.tv_sec -= dccpw.tstart.tv_sec;
+	now.tv_usec -= dccpw.tstart.tv_usec;
+	if (now.tv_usec < 0) {
+		--now.tv_sec;
+		now.tv_usec += 1000000;
+	}
+
+	len = sprintf(tbuf, "%lu.%06lu ",
+		      (unsigned long) now.tv_sec,
+		      (unsigned long) now.tv_usec);
+	len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
+	va_end(args);
+
+	kfifo_put(dccpw.fifo, tbuf, len);
+	wake_up(&dccpw.wait);
+}
+
+static int jdccp_sendmsg(struct kiocb *iocb, struct sock *sk,
+			 struct msghdr *msg, size_t size)
+{
+	const struct dccp_minisock *dmsk = dccp_msk(sk);
+	const struct inet_sock *inet = inet_sk(sk);
+	struct ccid3_hc_tx_sock *hctx;
+
+	if (dmsk->dccpms_tx_ccid == DCCPC_CCID3)
+		hctx = ccid3_hc_tx_sk(sk);
+	else
+		hctx = NULL;
+
+	if (port == 0 || ntohs(inet->dport) == port ||
+	    ntohs(inet->sport) == port) {
+		if (hctx)
+			printl("%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d %d %d %d %d\n",
+			       NIPQUAD(inet->saddr), ntohs(inet->sport),
+			       NIPQUAD(inet->daddr), ntohs(inet->dport), size,
+
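The timestamp that printl() prefixes to each log line is a subtraction with a borrow between seconds and microseconds. For the "%06lu" field to come out right, the borrow has to be one full second, that is 1000000 microseconds. A minimal user-space sketch of the same arithmetic follows; timeval_elapsed() is a made-up name used only for this illustration, not something in the patch.

```c
#include <assert.h>
#include <sys/time.h>

#define USEC_PER_SEC 1000000L

/*
 * Return now - start, normalised so that 0 <= tv_usec < USEC_PER_SEC.
 * Both inputs are expected to be normalised already.
 * (Hypothetical helper, for illustration only.)
 */
static struct timeval timeval_elapsed(struct timeval now,
				      struct timeval start)
{
	struct timeval d;

	assert(now.tv_usec >= 0 && now.tv_usec < USEC_PER_SEC);
	assert(start.tv_usec >= 0 && start.tv_usec < USEC_PER_SEC);

	d.tv_sec = now.tv_sec - start.tv_sec;
	d.tv_usec = now.tv_usec - start.tv_usec;
	if (d.tv_usec < 0) {
		/* Borrow one second and pay it back in microseconds. */
		--d.tv_sec;
		d.tv_usec += USEC_PER_SEC;
	}
	return d;
}
```

For example, a start time of 10.900000 s and a current time of 12.100000 s give an elapsed time of 1 s and 200000 us.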
[PATCH 0/7] [DCCP]: Further fixes and enhancements
Here is my latest set of patches for DCCP. If possible I would like these to go into 2.6.19. I have tested against 2.6.18rc5 and latest net-2.6.19 git tree of Dave M as well. Dave - Patches 1 and 2 are trivial and just introducing constants and using them. Patch 4 is shifting some code into a header. If patch 3 could be merged also that would be great - it is just about the same as Stephen Hemminger's TCP Probe code but instead for DCCP. Patch 3 depends on patch 1. I think it would be good for Arnaldo or another person to sign off on 5, 6 and 7 after a bit more of a look. These fix up packet size setting for CCID3 and also change them to work on half connections as you may have different packet sizes for each. I've tested myself thoroughly but others might have an opinion on the style of these. These three patches need to be applied in order. Ian - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] [DCCP]: Remove socket option
This removes DCCP_SOCKOPT_PACKET_SIZE for two reasons: * the current code doesn't work * tx and rx should be different (introduced in former patch) Signed-off-by: Ian McDonald <[EMAIL PROTECTED]> --- diff --git a/include/linux/dccp.h b/include/linux/dccp.h index ef1c57b..18fbbb4 100644 --- a/include/linux/dccp.h +++ b/include/linux/dccp.h @@ -196,7 +196,6 @@ struct dccp_so_feat { }; /* DCCP socket options */ -#define DCCP_SOCKOPT_PACKET_SIZE 1 #define DCCP_SOCKOPT_SERVICE 2 #define DCCP_SOCKOPT_CHANGE_L 3 #define DCCP_SOCKOPT_CHANGE_R 4 @@ -465,7 +464,6 @@ struct dccp_sock { struct dccp_service_list*dccps_service_list; struct timeval dccps_timestamp_time; __u32 dccps_timestamp_echo; - __u32 dccps_packet_size; __u16 dccps_l_ack_ratio; __u16 dccps_r_ack_ratio; unsigned long dccps_ndp_count; diff --git a/net/dccp/proto.c b/net/dccp/proto.c index c8f7d5a..c8c884e 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -458,7 +458,6 @@ out_free_val: static int do_dccp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { - struct dccp_sock *dp = dccp_sk(sk); struct dccp_minisock *dmsk = dccp_msk(sk); struct ccid3_hc_tx_sock *hctx; struct ccid3_hc_rx_sock *hcrx; @@ -478,10 +477,6 @@ static int do_dccp_setsockopt(struct soc err = 0; switch (optname) { - case DCCP_SOCKOPT_PACKET_SIZE: - dp->dccps_packet_size = val; - break; - case DCCP_SOCKOPT_CHANGE_L: if (optlen != sizeof(struct dccp_so_feat)) err = -EINVAL; @@ -605,10 +600,6 @@ static int do_dccp_getsockopt(struct soc return -EINVAL; switch (optname) { - case DCCP_SOCKOPT_PACKET_SIZE: - val = dp->dccps_packet_size; - len = sizeof(dp->dccps_packet_size); - break; case DCCP_SOCKOPT_TX_PACKET_SIZE: if (dmsk->dccpms_tx_ccid != DCCPC_CCID3) return -EINVAL; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] [DCCP]: Use constants for CCIDs
With constants for CCID numbers this now uses them in some places.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c
index 457dd3d..2efb505 100644
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -808,7 +808,7 @@ static void ccid2_hc_rx_packet_recv(stru
 }
 
 static struct ccid_operations ccid2 = {
-	.ccid_id		= 2,
+	.ccid_id		= DCCPC_CCID2,
 	.ccid_name		= "ccid2",
 	.ccid_owner		= THIS_MODULE,
 	.ccid_hc_tx_obj_size	= sizeof(struct ccid2_hc_tx_sock),
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 195aa95..67d2dc0 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -1240,7 +1240,7 @@ static int ccid3_hc_tx_getsockopt(struct
 }
 
 static struct ccid_operations ccid3 = {
-	.ccid_id		   = 3,
+	.ccid_id		   = DCCPC_CCID3,
 	.ccid_name		   = "ccid3",
 	.ccid_owner		   = THIS_MODULE,
 	.ccid_hc_tx_obj_size	   = sizeof(struct ccid3_hc_tx_sock),
[PATCH 6/7] [DCCP]: Fix setting of packet size in CCID3
Set initial packet size to defaults as existing code doesn't work as set_sockopt occurs after initialisation so dccps_packet_size is of no use really. Signed-off-by: Ian McDonald <[EMAIL PROTECTED]> --- diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 7b4699a..e6c8e4c 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -642,15 +642,9 @@ static int ccid3_hc_tx_parse_options(str static int ccid3_hc_tx_init(struct ccid *ccid, struct sock *sk) { - struct dccp_sock *dp = dccp_sk(sk); struct ccid3_hc_tx_sock *hctx = ccid_priv(ccid); - if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE && - dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE) - hctx->ccid3hctx_s = dp->dccps_packet_size; - else - hctx->ccid3hctx_s = TFRC_STD_PACKET_SIZE; - + hctx->ccid3hctx_s = TFRC_STD_PACKET_SIZE; /* Set transmission rate to 1 packet per second */ hctx->ccid3hctx_x = hctx->ccid3hctx_s; hctx->ccid3hctx_t_rto = USEC_PER_SEC; @@ -1113,17 +1107,11 @@ static void ccid3_hc_rx_packet_recv(stru static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) { - struct dccp_sock *dp = dccp_sk(sk); struct ccid3_hc_rx_sock *hcrx = ccid_priv(ccid); ccid3_pr_debug("%s, sk=%p\n", dccp_role(sk), sk); - if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE && - dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE) - hcrx->ccid3hcrx_s = dp->dccps_packet_size; - else - hcrx->ccid3hcrx_s = TFRC_STD_PACKET_SIZE; - + hcrx->ccid3hcrx_s = TFRC_STD_PACKET_SIZE; hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA; INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist); INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist); diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 962df0e..c8f7d5a 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -35,6 +35,7 @@ #include #include "ccid.h" #include "dccp.h" #include "feat.h" +#include "ccids/ccid3.h" DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly; @@ -457,7 +458,10 @@ out_free_val: static int do_dccp_setsockopt(struct sock *sk, int level, int optname, char __user 
*optval, int optlen) { - struct dccp_sock *dp; + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_minisock *dmsk = dccp_msk(sk); + struct ccid3_hc_tx_sock *hctx; + struct ccid3_hc_rx_sock *hcrx; int err; int val; @@ -471,7 +475,6 @@ static int do_dccp_setsockopt(struct soc return dccp_setsockopt_service(sk, val, optval, optlen); lock_sock(sk); - dp = dccp_sk(sk); err = 0; switch (optname) { @@ -497,6 +500,30 @@ static int do_dccp_setsockopt(struct soc optval); break; + case DCCP_SOCKOPT_TX_PACKET_SIZE: + if (dmsk->dccpms_tx_ccid != DCCPC_CCID3) + err = -EINVAL; + else + if (val >= TFRC_MIN_PACKET_SIZE && + val <= TFRC_MAX_PACKET_SIZE) { + hctx = ccid3_hc_tx_sk(sk); + hctx->ccid3hctx_s = val; + } else + err = -EINVAL; + break; + + case DCCP_SOCKOPT_RX_PACKET_SIZE: + if (dmsk->dccpms_rx_ccid != DCCPC_CCID3) + err = -EINVAL; + else + if (val >= TFRC_MIN_PACKET_SIZE && + val <= TFRC_MAX_PACKET_SIZE) { + hcrx = ccid3_hc_rx_sk(sk); + hcrx->ccid3hcrx_s = val; + } else + err = -EINVAL; + break; + default: err = -ENOPROTOOPT; break; @@ -565,7 +592,10 @@ out: static int do_dccp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { - struct dccp_sock *dp; + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_minisock *dmsk = dccp_msk(sk); + struct ccid3_hc_tx_sock *hctx; + struct ccid3_hc_rx_sock *hcrx; int val, len; if (get_user(len, optlen)) @@ -574,13 +604,25 @@ static int do_dccp_getsockopt(struct soc if (len < sizeof(int)) return -EINVAL; - dp = dccp_sk(sk); - switch (optname) { case DCCP_SOCKOPT_PACKET_SIZE: val = dp->dccps_packet_size; len = sizeof(dp->dccps_packet_size); break; + case DCCP_SOCKOPT_TX_PACKET_SIZE: + if (dmsk->dccpms_t
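The two new setsockopt cases apply the same rule to each half connection: the option is only valid when that half uses CCID3, and the value must lie within the TFRC packet size bounds. The user-space model below sketches that rule outside the kernel. Everything in it is an assumption made for illustration: struct half_conn and set_packet_size() do not exist in the patch, and the numeric bounds are placeholders for whatever net/dccp/ccids/ccid3.h defines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder bounds; the real ones come from net/dccp/ccids/ccid3.h. */
#define TFRC_MIN_PACKET_SIZE	16
#define TFRC_STD_PACKET_SIZE	256
#define TFRC_MAX_PACKET_SIZE	65535

/* Mirrors DCCPC_CCID2 / DCCPC_CCID3 from patch 1/7. */
enum { CCID2 = 2, CCID3 = 3 };

/* Hypothetical stand-in for one half connection (TX or RX). */
struct half_conn {
	int ccid;		/* congestion control ID in use */
	int packet_size;	/* the "s" value used by TFRC */
};

/*
 * Model of the DCCP_SOCKOPT_{TX,RX}_PACKET_SIZE handling: reject the
 * request unless this half runs CCID3 and val is within bounds.
 * Returns 0 on success or -EINVAL, leaving the old size untouched.
 */
static int set_packet_size(struct half_conn *hc, int val)
{
	assert(hc != NULL);

	if (hc->ccid != CCID3)
		return -EINVAL;
	if (val < TFRC_MIN_PACKET_SIZE || val > TFRC_MAX_PACKET_SIZE)
		return -EINVAL;

	hc->packet_size = val;
	return 0;
}
```

Because the TX and RX halves are separate objects, each can carry its own size, which is the reason given in this series for splitting the single option in two.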
Re: UDP Out of Sequence
On 9/21/06, Majumder, Rajib <[EMAIL PROTECTED]> wrote: Does this mean if we have 2 hosts connected back to back (there's no network device in between), sequence is guaranteed even in UDP? I think if you're trying to make the packets appear in order you need to untie the Gordian knot http://en.wikipedia.org/wiki/Gordian_Knot In other words you should fix the application rather than the near impossible task of trying to make the packets in order... Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
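One common way of "fixing the application" is to carry a sequence number in every datagram and let the receiver put things back in order within a small window. The sketch below is only one possible approach, under assumptions that are not from this thread: the sender numbers datagrams from 0, the window size of 8 is arbitrary, and struct reseq and reseq_arrive() are hypothetical names. Payload storage is left out; only the ordering logic is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 8	/* how far ahead of "next" we buffer; arbitrary */

struct reseq {
	uint32_t next;		/* next sequence number to deliver */
	bool held[WINDOW];	/* held[s % WINDOW]: datagram s is waiting */
};

/*
 * Note the arrival of datagram seq and return how many datagrams can
 * now be handed to the application in order. Duplicates, late arrivals
 * and datagrams beyond the window are ignored (return 0).
 */
static unsigned int reseq_arrive(struct reseq *r, uint32_t seq)
{
	unsigned int delivered = 0;

	assert(r != NULL);

	/* Unsigned subtraction rejects both "too old" and "too far ahead". */
	if ((uint32_t)(seq - r->next) >= WINDOW)
		return 0;

	r->held[seq % WINDOW] = true;

	/* Release the contiguous run starting at r->next. */
	while (r->held[r->next % WINDOW]) {
		r->held[r->next % WINDOW] = false;
		r->next++;
		delivered++;
	}
	return delivered;
}
```

A real receiver would also need a timeout to skip over datagrams that never arrive, since UDP gives no delivery guarantee either.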
Re: [PATCH] tcp: set congestion default through Kconfig
Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/ipv4/Kconfig | 39 +-- net/ipv4/sysctl_net_ipv4.c |7 +++ net/ipv4/tcp_cong.c|2 +- 3 files changed, 45 insertions(+), 3 deletions(-) Nice solution. Signed-off-by: Ian McDonald <[EMAIL PROTECTED]> -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP Pacing
On 9/13/06, Daniele Lacamera <[EMAIL PROTECTED]> wrote: On Tuesday 12 September 2006 23:26, Ian McDonald wrote: > Where is the published research? If you are going to mention research > you need URLs to papers and please put this in source code too so > people can check. I added the main reference to the code. I am going to give you all the pointers on this research, mainly recent congestion control proposals that include pacing. Thanks > I agree with Arnaldo's comments and also would add I don't like having > to select 1000 as HZ unit. Something is wrong if you need this as I > can run higher resolution timers without having to do this I removed that select in Kconfig, I agree it doesn't make sense at all, for portability. However, pacing works with 1ms resolution, so maybe a "depends HZ_1000" is still required. (How do you run 1ms timers with HZ!=1000?) The HZ refers to time slices per second mostly for user space - e.g. how often to task switch. -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
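To put numbers on the HZ question: a timer driven by the periodic tick can only fire on tick boundaries, so a 1 ms request is rounded up to a whole tick. The sketch below is a simplified model of that rounding, not the kernel's own conversion code; ms_to_ticks() and ticks_to_us() are names invented for this example.

```c
#include <assert.h>

/* Round a delay in milliseconds up to whole timer ticks for a given HZ. */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}

/* Length of a number of ticks in microseconds for a given HZ. */
static unsigned long ticks_to_us(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return ticks * (1000000UL / hz);
}
```

With HZ=1000 a 1 ms request is one tick of 1000 us. With HZ=250 it is still one tick, but that tick lasts 4000 us, and with HZ=100 it lasts 10000 us. A tick-based pacing timer is therefore only as fine as 1/HZ, which is why a timer mechanism that does not depend on the tick is the alternative to selecting HZ=1000.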
Re: TCP Pacing
On 9/13/06, Daniele Lacamera <[EMAIL PROTECTED]> wrote: Hello, Please let me insist once again on the importance of adding a TCP Pacing mechanism in our TCP, as many people are including this algorithm in their congestion control proposals. Recent researches have found out that it really can help improving performance in different scenarios, like satellites and long-delay high-speed channels (>100ms RTT, Gbit). Hybla module itself is cripple without this feature in its natural scenario. Where is the published research? If you are going to mention research you need URLs to papers and please put this in source code too so people can check. The following patch is totally non-invasive: it has a config option and a sysctl switch, both turned off by default. When the config option is enabled, it adds only 6B to the tcp_sock. I agree with Arnaldo's comments and also would add I don't like having to select 1000 as HZ unit. Something is wrong if you need this as I can run higher resolution timers without having to do this Haven't reviewed the rest of the code or tested. Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: i not find in the kernel code the code of this command
On 9/2/06, Franco <[EMAIL PROTECTED]> wrote: thanks for your response! Yes, The code is under net/sched in the source tree. The file act_police.c in the directoy net/sched don't exist. there is police.c that have a very similar code act_police.c (that i have found on internet) Go to http://kernel.org and download a recent kernel. if i want create a proc file in police.c, therefore modificate the kernel, i must install on my pc another version on linux. It is exact? As I said above get a newer version. -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: i not find in the kernel code the code of this command
On 9/2/06, Franco <[EMAIL PROTECTED]> wrote:
> I thought that this code was police.c but seem that it isn't i must implement a proc file in the code and recompiling the kernel.

I'm not sure I understand your question. Please tell me if I answer wrong!

The code is under net/sched in the source tree. The main file is act_police.c but it is in use elsewhere as well. grep for POLICE.

To build the code you need to alter your kernel options under 'make menuconfig': Networking, Networking Options, QoS and/or fair queueing. Actions must be selected and then Traffic Police.

I hope this helps.

Ian
--
Ian McDonald
Re: high latency with TCP connections
> I'm ready to rip out ABC entirely, to be honest. Or at least
> turn it off by default.

Turn it off by default for 2.6.18, then evaluate more for 2.6.19.

If it goes out in 2.6.18 there could probably be a good argument for going into the stable tree as well, to stop the likes of the JVM type issues that users keep hitting (which is fixed or going to be fixed by Sun).

--
Ian McDonald
Re: high latency with TCP connections
> The word performance in this list seems to always mean 'throughput'. It seems though that there could be some knob to tweak for those of us who don't care so much about throughput but care a great deal about latency.

SCTP has been mentioned. There is also DCCP - http://www.read.cs.ucla.edu/dccp/

--
Ian McDonald
Re: [PATCH 0/1] [DCCP]: Tidy up code slightly
> > I haven't seen this go into the 2.6.19 tree yet?
>
> Because I simply haven't applied it yet.

OK. My apologies for hassling you. I'm being too hasty and Arnaldo has correctly chastised me.

--
Ian McDonald
Re: [PATCH 0/1] [DCCP]: Tidy up code slightly
On 8/28/06, David Miller <[EMAIL PROTECTED]> wrote:
> From: Ian McDonald <[EMAIL PROTECTED]>
> Date: Mon, 28 Aug 2006 16:34:50 +1200
>
> > Arnaldo has pointed this one out to me in latest series of
> > patches. Can this go into 2.6.18 please?
>
> It's not a bug fix, so we'll defer it to 2.6.19

I haven't seen this go into the 2.6.19 tree yet?

Ian
--
Ian McDonald
Re: [PATCH 0/1] [DCCP]: Tidy up code slightly
On 8/28/06, David Miller <[EMAIL PROTECTED]> wrote:
> From: Ian McDonald <[EMAIL PROTECTED]>
> Date: Mon, 28 Aug 2006 16:34:50 +1200
>
> > Arnaldo has pointed this one out to me in latest series of
> > patches. Can this go into 2.6.18 please?
>
> It's not a bug fix, so we'll defer it to 2.6.19

I guess that's true unless we change the structure, which would make it a bug fix, but I'm happy for this to be in 2.6.19 since that hasn't happened.

Ian
[PATCH 1/1] [DCCP]: Tidyup CCID3 list handling
As Arnaldo Carvalho de Melo points out I should be using list_entry in case the structure changes in future. Current code functions but is reliant on position and requires type cast.

Noticed when doing this that I have one more variable than I needed so removing that also.

Signed off by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 090bc39..195aa95 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -900,7 +900,7 @@ found:
 static void ccid3_hc_rx_update_li(struct sock *sk, u64 seq_loss, u8 win_loss)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
-	struct dccp_li_hist_entry *next, *head;
+	struct dccp_li_hist_entry *head;
 	u64 seq_temp;
 
 	if (list_empty(&hcrx->ccid3hcrx_li_hist)) {
@@ -908,15 +908,15 @@ static void ccid3_hc_rx_update_li(struct
 		   &hcrx->ccid3hcrx_li_hist, seq_loss, win_loss))
 			return;
 
-		next = (struct dccp_li_hist_entry *)
-		   hcrx->ccid3hcrx_li_hist.next;
-		next->dccplih_interval = ccid3_hc_rx_calc_first_li(sk);
+		head = list_entry(hcrx->ccid3hcrx_li_hist.next,
+		   struct dccp_li_hist_entry, dccplih_node);
+		head->dccplih_interval = ccid3_hc_rx_calc_first_li(sk);
 	} else {
 		struct dccp_li_hist_entry *entry;
 		struct list_head *tail;
 
-		head = (struct dccp_li_hist_entry *)
-		   hcrx->ccid3hcrx_li_hist.next;
+		head = list_entry(hcrx->ccid3hcrx_li_hist.next,
+		   struct dccp_li_hist_entry, dccplih_node);
 		/* FIXME win count check removed as was wrong */
 		/* should make this check with receive history */
 		/* and compare there as per section 10.2 of RFC4342 */
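The reason list_entry is safer than a cast can be shown in a small userspace sketch. This is not the kernel code: `struct list_head` and `list_entry` below are simplified stand-ins, and `struct li_entry` is a hypothetical entry type whose list node is deliberately not the first member, so a plain cast of the node pointer would point at the wrong address.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Userspace equivalent of the kernel's list_entry()/container_of(). */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical entry type; the list node is NOT the first member. */
struct li_entry {
	unsigned int interval;
	struct list_head node;
};

/* Return the first entry on the list, or NULL if the list is empty. */
static struct li_entry *first_entry(struct list_head *head)
{
	if (head->next == head)
		return NULL;
	return list_entry(head->next, struct li_entry, node);
}

/* Build a one-element list and check that list_entry() recovers the entry. */
static int list_entry_demo(void)
{
	struct list_head head = { &head, &head };
	struct li_entry e = { 42, { &head, &head } };

	assert(first_entry(&head) == NULL);	/* list starts empty */

	head.next = &e.node;
	head.prev = &e.node;

	/* list_entry() finds the entry; the raw node pointer does not. */
	return first_entry(&head) == &e &&
	       (void *)head.next != (void *)&e;
}
```

With the node as the first member the cast and list_entry give the same pointer, which is why the old code functioned; the macro keeps working if the member is ever moved.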
[PATCH 0/1] [DCCP]: Tidy up code slightly
Dave,

Arnaldo has pointed this one out to me in latest series of patches. Can this go into 2.6.18 please?

(And I've checked for white space too!)

Ian
CCID2 patches
On 8/27/06, Andrea Bittau <[EMAIL PROTECTED]> wrote:
> The two sets of patches are at:
> http://darkircop.org/dccp
>

These look good in general and I know you have done a lot of work on these. Here are some comments. NB I haven't actually compiled or tested - just from reading the code.

You need a description of each patch and a signed off line in each patch. They can't be accepted in this form. Alternatively they could be resubmitted.

Please state whether the target is 2.6.18 (bug fixes only) or 2.6.19 (enhancements).

Please state any dependencies between patches.

In patch 01_ackvec_opt:

-int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
+u64 dccp_ackvec_parse(struct sock *sk, u64 ackno,
 		      const u8 opt, const u8 *value, const u8 len)
 {
-	if (len > DCCP_MAX_ACKVEC_LEN)
-		return -1;
-	/* dccp_ackvector_print(DCCP_SKB_CB(skb)->dccpd_ack_seq, value, len); */
-	dccp_ackvec_check_rcv_ackvector(dccp_sk(sk)->dccps_hc_rx_ackvec, sk,
-					DCCP_SKB_CB(skb)->dccpd_ack_seq,
-					len, value);
-	return 0;
+	return dccp_ackvec_check_rcv_ackvector(dccp_sk(sk)->dccps_hc_rx_ackvec,
+					       sk, ackno, len, value);

This becomes a one line function. This is only used in one place that I can see so this should go and that code should go there...

Also there is some weird shit going on as this is also defined as inline with return -1 in ackvec.h. This needs fixing as well.
In patch 05_ccid2_seq_alloc I don't get this code:

+static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hctx, int num)
+{
+	struct ccid2_seq *seqp;
+	int i;
+
+	/* check if we have space to preserve the pointer to the buffer */
+	if (hctx->ccid2hctx_seqbufc >= (sizeof(hctx->ccid2hctx_seqbuf) /
+					sizeof(struct ccid2_seq*)))
+		return -ENOMEM;
+
+	/* allocate buffer and initialize linked list */
+	seqp = kmalloc(sizeof(*seqp) * num, gfp_any());
+	if (seqp == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < (num - 1); i++) {
+		seqp[i].ccid2s_next = &seqp[i + 1];
+		seqp[i + 1].ccid2s_prev = &seqp[i];
+	}

If you are allocating an array of structures, in effect you shouldn't need to set next/prev pointers as they are allocated contiguously. If you are allocating groups of arrays, which I suspect you are, I still think the design is a bit ugly and wastes memory.

In 06_ccid2_ssthresh you don't justify why ssthresh can be infinite to start off with. And if it is allowed then I don't think you should do this by just picking a random high number. Change it to something like ~0.

In 07_ccid2_send_poll - changing to 1 msec poll is not nice. I know you said this should be dequeued. I've just added tx queuing to the 2.6.19 tree now so you can do this. Up to Dave/Arnaldo if this is OK as a short term solution.

In 08_ccid2_cwnd:

+static void ccid2_congestion_event(struct ccid2_hc_tx_sock *hctx,
+				   struct ccid2_seq *seqp)
+{
+	if (time_before(seqp->ccid2s_sent, hctx->ccid2hctx_last_cong)) {
+		dccp_pr_debug("Multiple losses in one RTT---treating as one\n");

I think that should be a ccid2_pr_debug not a dccp_pr_debug.

In 10_ccid2_change: in ccid2_change_srtt why do you do the test there, because you are going to take twice as many instructions if you are going to alter. Is this to minimise sets?

In 11_ccid2_profile: in ccid2_profile_time the code there is ugly. I am guessing you are assuming 32 bit Intel vs 64 bit Intel. The world doesn't revolve around Intel.
If you need something architecture specific, that should be put into the arch subtree, not here.

In ccid2_hc_tx_init why are you using 6000? What is the significance? Make it a constant defined somewhere.

--
Ian McDonald
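One way to read the comment on 05_ccid2_seq_alloc is that, for a single contiguous array, neighbours can be found by index arithmetic rather than stored next/prev pointers. The sketch below is a userspace illustration under that assumption only; `struct seq_rec`, `struct seq_ring` and the helper names are hypothetical and are not taken from the CCID2 patches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-packet record; field names are illustrative only. */
struct seq_rec {
	unsigned long long seq;
	unsigned long sent;
};

/* Ring of records held in one contiguous allocation. */
struct seq_ring {
	struct seq_rec *buf;
	size_t num;
};

/* Allocate num zeroed records; returns 0 on success, -1 on failure. */
static int seq_ring_init(struct seq_ring *r, size_t num)
{
	r->buf = calloc(num, sizeof(*r->buf));
	if (r->buf == NULL)
		return -1;
	r->num = num;
	return 0;
}

/* Neighbours are found by index arithmetic instead of stored pointers. */
static size_t seq_ring_next(const struct seq_ring *r, size_t i)
{
	return (i + 1) % r->num;
}

static size_t seq_ring_prev(const struct seq_ring *r, size_t i)
{
	return (i + r->num - 1) % r->num;
}

static void seq_ring_free(struct seq_ring *r)
{
	free(r->buf);
	r->buf = NULL;
	r->num = 0;
}

/* Walk forwards and backwards across the wrap-around point. */
static void seq_ring_selfcheck(void)
{
	struct seq_ring r;

	assert(seq_ring_init(&r, 4) == 0);
	assert(seq_ring_next(&r, 3) == 0);
	assert(seq_ring_prev(&r, 0) == 3);
	seq_ring_free(&r);
}
```

This covers only the single-array case. If several buffers are chained together, as the review suspects, some linkage between buffers would still be needed.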
Re: [PATCH 0/7] [DCCP]: Fixes and enhancements
On 8/27/06, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Ian McDonald" <[EMAIL PROTECTED]>
> Date: Sun, 27 Aug 2006 16:57:17 +1200
>
> > Yes I see that now. However I can't see #5 in net-2.6.git in your tree
> > or Linus' where 1-4 made it in...
>
> Resend it to me privately and I'll figure out what happened.

Done.

--
Ian McDonald
Re: [PATCH 0/7] [DCCP]: Fixes and enhancements
> > On 8/27/06, David Miller <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I would love 3, 4, 5 to go into 2.6.18 as these resolve long standing CCID3
> > > > issues that have been in the DCCP tree since inception and have caught a
> > > > number of people.
> > >
> > > Ok, I'll toss 1-5 into 2.6.18
> >
> > Thanks for that. Are 6 and 7 going into 2.6.19 or do you want Arnaldo
> > to have a bit more of a look? 6 in particular is trivial.
>
> Those patches are already in net-2.6.19

Yes I see that now. However I can't see #5 in net-2.6.git in your tree or Linus' where 1-4 made it in...

Ian
--
Ian McDonald
Re: [PATCH 0/7] [DCCP]: Fixes and enhancements
On 8/27/06, David Miller <[EMAIL PROTECTED]> wrote:
> > I would love 3, 4, 5 to go into 2.6.18 as these resolve long standing CCID3
> > issues that have been in the DCCP tree since inception and have caught a
> > number of people.
>
> Ok, I'll toss 1-5 into 2.6.18

Thanks for that. Are 6 and 7 going into 2.6.19 or do you want Arnaldo to have a bit more of a look? 6 in particular is trivial.

> One thing I don't understand is this description from patch 5:
>
>	This gives a theoretical speed of 71.9 Kbits/s. I measured across three runs with this patch set and got 70.1 Kbits/s. Without this patchset the average was 232 Kbits/s which means Linux can't be used for CCID3 research properly.
>
> Decreasing the transfer rate is desirable? I read this as saying this "fix" drops the transfer rate down from 232Kb/sec to 70.1Kb/sec. What's going on here?

DCCP CCID3 (RFC 4342) uses TFRC (RFC 3448) to calculate the desired rate to send at, based on feedback from the receiver. The reason for this is that TFRC does not use ACKs/windows to control the rate; it calculates a rate so that the flow is "fair" when competing with TCP. TFRC is designed to be smoother than TCP at dealing with loss - more sine wave than saw tooth.

The calculation is based on the work Padhye et al did in this paper - http://citeseer.ist.psu.edu/padhye98modeling.html

As it turns out this is based on TCP Reno at that time, and modern TCP variants are more efficient when dealing with loss, as can be verified through iperf, but we should implement what the RFC says.

Basically the implementation in the DCCP code was buggy and was transmitting too fast, so I have made it conform to the RFC much more closely.
Ian
--
Ian McDonald
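As a rough check of the figures discussed in this thread, the TFRC throughput equation from RFC 3448 (section 3.1) can be evaluated in userspace with the parameters given in the patch 5/7 description (MSS 256 bytes, RTT 105 ms, 5% loss, delayed acks 1). This is a floating-point sketch only, not how the kernel computes it; the function names are made up for illustration, t_RTO = 4*R is the simplification the RFC recommends, and a small Newton-Raphson square root is used so the sketch does not depend on libm.

```c
#include <assert.h>

/* Newton-Raphson square root, used so the sketch needs no libm. */
static double sqrt_newton(double v)
{
	double x = v > 1.0 ? v : 1.0;
	int i;

	if (v <= 0.0)
		return 0.0;
	for (i = 0; i < 60; i++)
		x = 0.5 * (x + v / x);
	return x;
}

/*
 * TFRC throughput equation, RFC 3448 section 3.1:
 *
 *   X = s / (R*sqrt(2*b*p/3) + t_RTO * 3*sqrt(3*b*p/8) * p * (1 + 32*p^2))
 *
 * s:   packet size in bytes
 * rtt: round trip time R in seconds
 * p:   loss event rate (0 < p <= 1)
 * b:   packets acknowledged by a single ACK
 * t_RTO is taken as 4*R. Returns the allowed rate in bytes per second.
 */
static double tfrc_rate(double s, double rtt, double p, double b)
{
	double t_rto = 4.0 * rtt;
	double denom = rtt * sqrt_newton(2.0 * b * p / 3.0) +
		       t_rto * 3.0 * sqrt_newton(3.0 * b * p / 8.0) *
		       p * (1.0 + 32.0 * p * p);

	return s / denom;
}

/* Evaluate the equation for the parameters in the patch 5/7 description. */
static double tfrc_example_kbits(void)
{
	double x = tfrc_rate(256.0, 0.105, 0.05, 1.0);	/* bytes per second */
	double kbits = x * 8.0 / 1000.0;

	assert(kbits > 71.0 && kbits < 73.0);
	return kbits;
}
```

With these inputs the equation gives roughly 71.9 Kbits/s, which agrees with the theoretical figure quoted from patch 5.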
Re: [PATCH 0/7] [DCCP]: Fixes and enhancements
> I spent all of today on USAGI's IPSEC/MIPV6 patches and related issues, so I'll look into this tomorrow. Thanks Ian.

Yes I saw that. Take your time as this is nowhere near as important!

Regards,

Ian
--
Ian McDonald
[PATCH 6/7] [DCCP]: Shift sysctls into feat.h
This shifts further sysctls into feat.h. No change in functionality - shifting code only. Signed off by: Ian McDonald <[EMAIL PROTECTED]> --- diff --git a/net/dccp/feat.h b/net/dccp/feat.h index b44c455..cee553d 100644 --- a/net/dccp/feat.h +++ b/net/dccp/feat.h @@ -27,5 +27,10 @@ extern int dccp_feat_clone(struct sock extern int dccp_feat_init(struct dccp_minisock *dmsk); extern int dccp_feat_default_sequence_window; +extern int dccp_feat_default_rx_ccid; +extern int dccp_feat_default_tx_ccid; +extern int dccp_feat_default_ack_ratio; +extern int dccp_feat_default_send_ack_vector; +extern int dccp_feat_default_send_ndp_count; #endif /* _DCCP_FEAT_H */ diff --git a/net/dccp/sysctl.c b/net/dccp/sysctl.c index c1ba945..38bc157 100644 --- a/net/dccp/sysctl.c +++ b/net/dccp/sysctl.c @@ -11,18 +11,12 @@ #include #include +#include "feat.h" #ifndef CONFIG_SYSCTL #error This file should not be compiled without CONFIG_SYSCTL defined #endif -extern int dccp_feat_default_sequence_window; -extern int dccp_feat_default_rx_ccid; -extern int dccp_feat_default_tx_ccid; -extern int dccp_feat_default_ack_ratio; -extern int dccp_feat_default_send_ack_vector; -extern int dccp_feat_default_send_ndp_count; - static struct ctl_table dccp_default_table[] = { { .ctl_name = NET_DCCP_DEFAULT_SEQ_WINDOW, - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] [DCCP]: Introduce tx buffering
This adds transmit buffering to DCCP. I have tested with CCID2/3 and with loss and rate limiting. Signed off by: Ian McDonald <[EMAIL PROTECTED]> --- diff --git a/include/linux/dccp.h b/include/linux/dccp.h index 676333b..2d7671c 100644 --- a/include/linux/dccp.h +++ b/include/linux/dccp.h @@ -438,6 +438,7 @@ struct dccp_ackvec; * @dccps_role - Role of this sock, one of %dccp_role * @dccps_ndp_count - number of Non Data Packets since last data packet * @dccps_hc_rx_ackvec - rx half connection ack vector + * @dccps_xmit_timer - timer for when CCID is not ready to send */ struct dccp_sock { /* inet_connection_sock has to be the first member of dccp_sock */ @@ -470,6 +471,7 @@ struct dccp_sock { enum dccp_role dccps_role:2; __u8dccps_hc_rx_insert_options:1; __u8dccps_hc_tx_insert_options:1; + struct timer_list dccps_xmit_timer; }; static inline struct dccp_sock *dccp_sk(const struct sock *sk) diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index 84b477d..f9f0721 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -130,7 +130,7 @@ extern void dccp_send_delayed_ack(struct extern void dccp_send_sync(struct sock *sk, const u64 seq, const enum dccp_pkt_type pkt_type); -extern int dccp_write_xmit(struct sock *sk, struct sk_buff *skb, long *timeo); +extern void dccp_write_xmit(struct sock *sk, int block); extern void dccp_write_space(struct sock *sk); extern void dccp_init_xmit_timers(struct sock *sk); diff --git a/net/dccp/output.c b/net/dccp/output.c index 58669be..5986cb9 100644 --- a/net/dccp/output.c +++ b/net/dccp/output.c @@ -198,7 +198,7 @@ static int dccp_wait_for_ccid(struct soc while (1) { prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); - if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) + if (sk->sk_err) goto do_error; if (!*timeo) goto do_nonblock; @@ -234,37 +234,72 @@ do_interrupted: goto out; } -int dccp_write_xmit(struct sock *sk, struct sk_buff *skb, long *timeo) +static void dccp_write_xmit_timer(unsigned long data) { + struct sock *sk = 
(struct sock *)data; + struct dccp_sock *dp = dccp_sk(sk); + + bh_lock_sock(sk); + if (sock_owned_by_user(sk)) + sk_reset_timer(sk, &dp->dccps_xmit_timer, jiffies+1); + else + dccp_write_xmit(sk, 0); + bh_unlock_sock(sk); + sock_put(sk); +} + +void dccp_write_xmit(struct sock *sk, int block) { - const struct dccp_sock *dp = dccp_sk(sk); - int err = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb, + struct dccp_sock *dp = dccp_sk(sk); + struct sk_buff *skb; + long timeo = 3; /* If a packet is taking longer than 2 secs + we have other issues */ + + while ((skb = skb_peek(&sk->sk_write_queue))) { + int err = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb, skb->len); + + if (err > 0) { + if (!block) { + sk_reset_timer(sk, &dp->dccps_xmit_timer, + msecs_to_jiffies(err)+jiffies); + break; + } else + err = dccp_wait_for_ccid(sk, skb, &timeo); + if (err) { + printk(KERN_CRIT "%s:err at dccp_wait_for_ccid" +" %d\n", __FUNCTION__, err); + dump_stack(); + } + } - if (err > 0) - err = dccp_wait_for_ccid(sk, skb, timeo); + skb_dequeue(&sk->sk_write_queue); + if (err == 0) { + struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); + const int len = skb->len; - if (err == 0) { - struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); - const int len = skb->len; - - if (sk->sk_state == DCCP_PARTOPEN) { - /* See 8.1.5. Handshake Completion */ - inet_csk_schedule_ack(sk); - inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK, + if (sk->sk_state == DCCP_PARTOPEN) { + /* See 8.1.5. Handshake Completion */ + inet_csk_schedule_ack(sk); + inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK, inet_csk(sk)->icsk_rto,
[PATCH 5/7] [DCCP]: Fix CCID3 to correct performance
This fixes CCID3 to give much closer performance to RFC4342. CCID3 is meant to alter sending rate based on RTT and loss. The performance was verified against: http://wand.net.nz/~perry/max_download.php For example I tested with netem and had the following parameters: Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%. This gives a theoretical speed of 71.9 Kbits/s. I measured across three runs with this patch set and got 70.1 Kbits/s. Without this patchset the average was 232 Kbits/s which means Linux can't be used for CCID3 research properly. I also tested with netem turned off so box just acting as router with 1.2 msec RTT. The performance with this is the same with or without the patch at around 30 Mbit/s. Signed off by: Ian McDonald <[EMAIL PROTECTED]> --- diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 0f85970..dad20c9 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -342,6 +342,8 @@ static int ccid3_hc_tx_send_packet(struc new_packet->dccphtx_ccval = DCCP_SKB_CB(skb)->dccpd_ccval = hctx->ccid3hctx_last_win_count; + timeval_add_usecs(&hctx->ccid3hctx_t_nom, + hctx->ccid3hctx_t_ipi); } out: return rc; @@ -413,7 +415,8 @@ static void ccid3_hc_tx_packet_sent(stru case TFRC_SSTATE_NO_FBACK: case TFRC_SSTATE_FBACK: if (len > 0) { - hctx->ccid3hctx_t_nom = now; + timeval_sub_usecs(&hctx->ccid3hctx_t_nom, + hctx->ccid3hctx_t_ipi); ccid3_calc_new_t_ipi(hctx); ccid3_calc_new_delta(hctx); timeval_add_usecs(&hctx->ccid3hctx_t_nom, @@ -757,8 +760,7 @@ static void ccid3_hc_rx_send_feedback(st } hcrx->ccid3hcrx_tstamp_last_feedback = now; - hcrx->ccid3hcrx_last_counter = packet->dccphrx_ccval; - hcrx->ccid3hcrx_seqno_last_counter = packet->dccphrx_seqno; + hcrx->ccid3hcrx_ccval_last_counter = packet->dccphrx_ccval; hcrx->ccid3hcrx_bytes_recv = 0; /* Convert to multiples of 10us */ @@ -782,7 +784,7 @@ static int ccid3_hc_rx_insert_options(st if (!(sk->sk_state == DCCP_OPEN || sk->sk_state == DCCP_PARTOPEN)) return 0; - 
DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_last_counter; + DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_ccval_last_counter; if (dccp_packet_without_ack(skb)) return 0; @@ -854,6 +856,11 @@ static u32 ccid3_hc_rx_calc_first_li(str interval = 1; } found: + if (!tail) { + LIMIT_NETDEBUG(KERN_WARNING "%s: tail is null\n", + __FUNCTION__); + return ~0; + } rtt = timeval_delta(&tstamp, &tail->dccphrx_tstamp) * 4 / interval; ccid3_pr_debug("%s, sk=%p, approximated RTT to %uus\n", dccp_role(sk), sk, rtt); @@ -864,9 +871,20 @@ found: delta = timeval_delta(&tstamp, &hcrx->ccid3hcrx_tstamp_last_feedback); x_recv = usecs_div(hcrx->ccid3hcrx_bytes_recv, delta); + if (x_recv == 0) + x_recv = hcrx->ccid3hcrx_x_recv; + tmp1 = (u64)x_recv * (u64)rtt; do_div(tmp1,1000); tmp2 = (u32)tmp1; + + if (!tmp2) { + LIMIT_NETDEBUG(KERN_WARNING "tmp2 = 0 " + "%s: x_recv = %u, rtt =%u\n", + __FUNCTION__, x_recv, rtt); + return ~0; + } + fval = (hcrx->ccid3hcrx_s * 10) / tmp2; /* do not alter order above or you will get overflow on 32 bit */ p = tfrc_calc_x_reverse_lookup(fval); @@ -882,31 +900,101 @@ found: static void ccid3_hc_rx_update_li(struct sock *sk, u64 seq_loss, u8 win_loss) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); + struct dccp_li_hist_entry *next, *head; + u64 seq_temp; - if (seq_loss != DCCP_MAX_SEQNO + 1 && - list_empty(&hcrx->ccid3hcrx_li_hist)) { - struct dccp_li_hist_entry *li_tail; + if (list_empty(&hcrx->ccid3hcrx_li_hist)) { + if (!dccp_li_hist_interval_new(ccid3_li_hist, + &hcrx->ccid3hcrx_li_hist, seq_loss, win_loss)) + return; - li_tail = dccp_li_hist_interval_new(ccid3_li_hist, - &hcrx->ccid3hcrx_li_hist, - seq_loss, win_loss); - if (li_tail == NULL) + next = (struct dccp_li_hist_entry *) + hcrx->ccid3hcrx_li_hist.next; +
[PATCH 4/7] [DCCP]: Introduce dccp_rx_hist_find_entry
This adds a new function dccp_rx_hist_find_entry.

Signed off by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 7b6b03e..1c68182 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -365,6 +365,25 @@ struct dccp_tx_hist_entry *
 
 EXPORT_SYMBOL_GPL(dccp_tx_hist_find_entry);
 
+int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
+			    u8 *ccval)
+{
+	struct dccp_rx_hist_entry *packet = NULL, *entry;
+
+	list_for_each_entry(entry, list, dccphrx_node)
+		if (entry->dccphrx_seqno == seq) {
+			packet = entry;
+			break;
+		}
+
+	if (packet)
+		*ccval = packet->dccphrx_ccval;
+
+	return packet != NULL;
+}
+
+EXPORT_SYMBOL_GPL(dccp_rx_hist_find_entry);
+
 void dccp_tx_hist_purge_older(struct dccp_tx_hist *hist,
 			      struct list_head *list,
 			      struct dccp_tx_hist_entry *packet)
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 27c4309..aea9c5d 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -106,6 +106,8 @@ static inline void dccp_tx_hist_entry_de
 extern struct dccp_tx_hist_entry *
 		dccp_tx_hist_find_entry(const struct list_head *list,
 					const u64 seq);
+extern int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
+				   u8 *ccval);
 
 static inline void dccp_tx_hist_add_entry(struct list_head *list,
 					  struct dccp_tx_hist_entry *entry)
[PATCH 3/7] [DCCP]: Introduces follows48 function
This adds a new function to see if two sequence numbers follow each other.

Signed off by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index b8931d3..84b477d 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -81,6 +81,14 @@ static inline u64 max48(const u64 seq1,
 	return after48(seq1, seq2) ? seq1 : seq2;
 }
 
+/* is seq1 next seqno after seq2 */
+static inline int follows48(const u64 seq1, const u64 seq2)
+{
+	int diff = (seq1 & 0x) - (seq2 & 0x);
+
+	return diff==1;
+}
+
 enum {
 	DCCP_MIB_NUM = 0,
 	DCCP_MIB_ACTIVEOPENS,			/* ActiveOpens */
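DCCP sequence numbers are 48 bits wide and wrap around (RFC 4340), so a "follows" test has to work modulo 2^48. The following is a userspace sketch of one way to write such a test. The name `follows48_sketch` and the `SEQ48_MASK` constant are assumptions made for illustration and are not taken from the patch, which may compare the numbers differently.

```c
#include <assert.h>
#include <stdint.h>

/* All 48 bits set; DCCP sequence numbers are 48 bits wide (RFC 4340). */
#define SEQ48_MASK 0xFFFFFFFFFFFFULL

/* Return 1 if seq1 is the sequence number immediately after seq2, modulo 2^48. */
static int follows48_sketch(uint64_t seq1, uint64_t seq2)
{
	return ((seq2 + 1) & SEQ48_MASK) == (seq1 & SEQ48_MASK);
}

/* A few cases, including the wrap from the largest 48-bit value to 0. */
static void follows48_selfcheck(void)
{
	assert(follows48_sketch(101, 100));
	assert(!follows48_sketch(102, 100));
	assert(!follows48_sketch(100, 100));
	assert(follows48_sketch(0, SEQ48_MASK));
}
```

Adding one to seq2 before masking is what makes the wrap-around case come out right; subtracting two masked values and comparing the difference with 1 would miss it.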
[PATCH 2/7] [DCCP]: Update contact details and copyright
Just updating copyright and contacts Signed off by: Ian McDonald <[EMAIL PROTECTED]> --- diff --git a/CREDITS b/CREDITS index 29be6d1..0fe904e 100644 --- a/CREDITS +++ b/CREDITS @@ -2209,7 +2209,7 @@ S: (address available on request) S: USA N: Ian McDonald -E: [EMAIL PROTECTED] +E: [EMAIL PROTECTED] E: [EMAIL PROTECTED] W: http://wand.net.nz/~iam4 W: http://imcdnzl.blogspot.com diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index c39bff7..0f85970 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -2,7 +2,7 @@ * net/dccp/ccids/ccid3.c * * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. - * Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]> + * Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]> * * An implementation of the DCCP protocol * @@ -1230,7 +1230,7 @@ static __exit void ccid3_module_exit(voi } module_exit(ccid3_module_exit); -MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, " +MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, " "Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>"); MODULE_DESCRIPTION("DCCP TFRC CCID3 CCID"); MODULE_LICENSE("GPL"); diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h index 5ade4f6..22cb9f8 100644 --- a/net/dccp/ccids/ccid3.h +++ b/net/dccp/ccids/ccid3.h @@ -1,13 +1,13 @@ /* * net/dccp/ccids/ccid3.h * - * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. * * An implementation of the DCCP protocol * * This code has been developed by the University of Waikato WAND * research group. 
For further information please see http://www.wand.net.nz/ - * or e-mail Ian McDonald - [EMAIL PROTECTED] + * or e-mail Ian McDonald - [EMAIL PROTECTED] * * This code also uses code from Lulea University, rereleased as GPL by its * authors: diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index 5d7b7d8..b93d9fc 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -2,7 +2,7 @@ * net/dccp/ccids/lib/loss_interval.c * * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. - * Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]> + * Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]> * Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> * * This program is free software; you can redistribute it and/or modify diff --git a/net/dccp/ccids/lib/loss_interval.h b/net/dccp/ccids/lib/loss_interval.h index 43bf782..dcb370a 100644 --- a/net/dccp/ccids/lib/loss_interval.h +++ b/net/dccp/ccids/lib/loss_interval.h @@ -4,7 +4,7 @@ #define _DCCP_LI_HIST_ * net/dccp/ccids/lib/loss_interval.h * * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. - * Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]> + * Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]> * Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> * * This program is free software; you can redistribute it and/or modify it diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 6739be1..7b6b03e 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -1,13 +1,13 @@ /* * net/dccp/packet_history.c * - * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. * * An implementation of the DCCP protocol * * This code has been developed by the University of Waikato WAND * research group. 
For further information please see http://www.wand.net.nz/ - * or e-mail Ian McDonald - [EMAIL PROTECTED] + * or e-mail Ian McDonald - [EMAIL PROTECTED] * * This code also uses code from Lulea University, rereleased as GPL by its * authors: @@ -391,7 +391,7 @@ void dccp_tx_hist_purge(struct dccp_tx_h EXPORT_SYMBOL_GPL(dccp_tx_hist_purge); -MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, " +MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, " "Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>"); MODULE_DESCRIPTION("DCCP TFRC library"); MODULE_LICENSE("GPL"); diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 673c209..27c4309 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -1,13 +1,13 @@ /* * net/dccp/packet_history.h * - * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. * * An implementation of the DCCP protocol * * This code has been developed by the University of Waikato WAND * research
[PATCH 1/7] [DCCP]: Fix typo
This fixes a small typo in net/dccp/ccids/lib/packet_history.c

Signed off by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index ad98d6a..6739be1 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -1,5 +1,5 @@
 /*
- * net/dccp/packet_history.h
+ * net/dccp/packet_history.c
  *
  * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
  *
[PATCH 0/7] [DCCP]: Fixes and enhancements
Please find following a series of patches for DCCP. These have been tested against torvalds/linux-2.6.git and davem/net-2.6.19.git

My opinion is that 1 and 2 can go straight into 2.6.18 as documentation changes only - Dave - are you able to do this, as Arnaldo is very busy at present?

I would love 3, 4, 5 to go into 2.6.18 as these resolve long standing CCID3 issues that have been in the DCCP tree since inception and have caught a number of people.

Number 6 is just shifting code around to tidy it up and introduces no change in logic. You could argue for it to go in either 2.6.18 or 2.6.19!

Number 7 is implementing transmit buffering and is 2.6.19 material. Andrea - this might be quite useful for you in CCID2 as well I believe.

These patches can all be applied independently except 3, 4, 5 which are a group.

Also on http://wand.net.nz/~iam4/dccp/patches/ are the following further patches which are not ready for merge but others might be interested in:
- DCCP-Probe a la TCP-Probe
- The starts of memory buffer limiting (this is not actually needed for number 7 as it is actually receive where problems occur, which is an existing issue)
- My research code

--
Ian McDonald
Re: [take12 0/3] kevent: Generic event handling mechanism.
> I wonder whether designing-in a millisecond granularity is the right
> thing to do. If in a few years the kernel is running tickless with
> high-res clock interrupt sources, that might look a bit lumpy.

I'd second that - when working on DCCP I've done a lot of the work in microseconds and, because of its design, that made quite a difference compared with milliseconds.

I haven't followed kevents in great detail but it sounds like something that could be useful for me with higher resolution timers than milliseconds.

--
Ian McDonald
Re: means to artificially alter the bandwidth of a system
>Hi,
>
>For research purposes we are considering developing a program to alter
>the bandwidth of a system in software, for instance: a machine has
>100 MB/s and we change it to 1 MB/s.
>
>Does something like this already exist? Or is there a way to do this
>without creating a program/kernel module?

Of course: see http://linux-net.osdl.org/index.php/Iproute2 (especially tc)

>Any help will be highly appreciated!
>
>Irfan Habib

HGN

You may also want to look at Netem http://linux-net.osdl.org/index.php/Netem if you want to play with delay and loss as well.

The examples there are good but I can send you scripts as well if you wish.

Ian
--
Ian McDonald
Re: Who maintains the website ?
On 7/26/06, Christophe Devriese <[EMAIL PROTECTED]> wrote:
> I would like to have a VLAN page on the main page, so that I can update
> it a bit with relevant info, and then include the link to the external
> site as it's basically a "here is a patch, here is a usage" page, while
> an explanation of the different stuff would be nice (such as the
> forwarding path, the vlan acceleration, where packets go ...).

What is the external page? If it doesn't exist consider putting the content on the wiki itself so others can improve it.

Prepare the Wiki page, including a link to the existing VLAN link on the front page, and then we can see what can be done. You can create a VLAN page without having to change the wiki front page initially... If it looks good then Stephen, or myself or others can change the front page.

--
Ian McDonald
Re: bandwidth limitation help
On 7/26/06, Piotrowski, Ted P. <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am new to the mailing list so I'm not sure if anybody reads these,
> but here goes nothing. I recently read the Linux Advanced Routing &
> Traffic Control HOWTO and have been trying to test my applications
> using bandwidth limitation. None of the examples described in the
> HOWTO simulate the conditions I need to test my software.
>
> What I would like is for my bandwidth limitation to empty my UDP buffer
> at a given rate. I have tried using a simple TBF to do this, but all
> that happens is that my application floods the TBF buffer at link speed
> and the TBF buffer quickly overflows and drops packets. I want the
> packets to actually stay in the UDP buffer and be emptied at a given
> rate without modifying my application.
>
> I don't know if any of you are familiar with netem, but it can be used
> in conjunction with tc to add delay to a link. Surprisingly, packets
> delayed by netem appear to remain in the UDP buffer until it is time
> for them to be sent. I would like this same behavior of keeping the
> packets in the UDP buffer, but with bandwidth limitation on the rate at
> which the buffer empties, not just packet delay.
>
> Has anybody ever done anything like this or can point me to some
> resources?

Have a look at: http://linux-net.osdl.org/index.php/Netem

I have written my own test scenarios using examples from this website but I can also send you my small scripts if you want.

Ian
--
Ian McDonald
Re: Who maintains the website ?
On 7/26/06, Christophe Devriese <[EMAIL PROTECTED]> wrote:
> The http://linux-net.osdl.org/index.php/Main_Page website I mean.

It's a Wiki so anybody can alter content on the website. The exception to this is that particular page - the main page. If you want something altered on that particular page send email to one of the sysops.

--
Ian McDonald
Re: Netchannles: first stage has been completed. Further ideas.
> If we consider netchannels as Van Jacobson described them, then a mutex
> is not needed, since it is impossible to have several readers or
> writers. But in the socket case, even if there is only one userspace
> consumer, that lock must be held to protect against bh (or introduce
> several queues and complicate their management a lot (ucopy for
> example)).

As I recall Van's talk you don't need a lock with a ring buffer if you have a start and an end variable pointing to locations within the ring buffer.

He didn't explain this in great depth as it is computer science 101 but here is how I would explain it:

Once the socket is initialised the consumer is the only one that sets the start variable and the network driver only reads it. It is the other way around for the end variable. As long as the writes are atomic then you are fine. You only need one ring buffer in this scenario and two atomic variables.

Having atomic writes does have overhead but far less than locking semantics.

--
Ian McDonald
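As an illustration of the start/end scheme described above, here is a minimal user-space sketch in C11. It is not code from Van's talk or from the kernel: the names ring, ring_put, ring_get and RING_SIZE are made up for the example, and the acquire/release ordering on the two indices is an assumption added here, since on weakly ordered CPUs an atomic write alone does not guarantee that the slot contents are visible before the index update.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8	/* hypothetical capacity; must be a power of two */

struct ring {
	int slots[RING_SIZE];
	atomic_size_t start;	/* written only by the consumer */
	atomic_size_t end;	/* written only by the producer */
};

/* Producer side (the network driver in the description above). */
static bool ring_put(struct ring *r, int value)
{
	size_t end = atomic_load_explicit(&r->end, memory_order_relaxed);
	size_t start = atomic_load_explicit(&r->start, memory_order_acquire);

	if (end - start == RING_SIZE)
		return false;	/* full */
	r->slots[end & (RING_SIZE - 1)] = value;
	/* Publish the new end only after the slot has been filled. */
	atomic_store_explicit(&r->end, end + 1, memory_order_release);
	return true;
}

/* Consumer side (the socket reader in the description above). */
static bool ring_get(struct ring *r, int *value)
{
	size_t start = atomic_load_explicit(&r->start, memory_order_relaxed);
	size_t end = atomic_load_explicit(&r->end, memory_order_acquire);

	if (start == end)
		return false;	/* empty */
	*value = r->slots[start & (RING_SIZE - 1)];
	/* Release the slot only after its contents have been read. */
	atomic_store_explicit(&r->start, start + 1, memory_order_release);
	return true;
}
```

The indices are free-running counters that are masked on access, so "full" and "empty" can be told apart without wasting a slot. This only holds for exactly one producer and one consumer; with more of either, a lock or a compare-and-swap scheme would be needed again.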
[PATCH 1/1] net: fix __sk_stream_mem_reclaim
__sk_stream_mem_reclaim is only called by sk_stream_mem_reclaim.

As such the check on sk->sk_forward_alloc is not needed and can be removed.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/core/stream.c b/net/core/stream.c
index e948969..d1d7dec 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -196,15 +196,13 @@ EXPORT_SYMBOL(sk_stream_error);
 
 void __sk_stream_mem_reclaim(struct sock *sk)
 {
-	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
-		atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
-			   sk->sk_prot->memory_allocated);
-		sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
-		if (*sk->sk_prot->memory_pressure &&
-		    (atomic_read(sk->sk_prot->memory_allocated) <
-		     sk->sk_prot->sysctl_mem[0]))
-			*sk->sk_prot->memory_pressure = 0;
-	}
+	atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
+		   sk->sk_prot->memory_allocated);
+	sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
+	if (*sk->sk_prot->memory_pressure &&
+	    (atomic_read(sk->sk_prot->memory_allocated) <
+	     sk->sk_prot->sysctl_mem[0]))
+		*sk->sk_prot->memory_pressure = 0;
 }
 
 EXPORT_SYMBOL(__sk_stream_mem_reclaim);
[PATCH 1/1] net: fix __sk_stream_mem_reclaim
__sk_stream_mem_reclaim is only called by sk_stream_mem_reclaim.

As such the check on sk->sk_forward_alloc is not needed and can be removed. At the same time remove the EXPORT_SYMBOL, which is no longer needed, and shift the function into include/net/sock.h

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/include/net/sock.h b/include/net/sock.h
index 324b3ea..3a62b5b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -694,7 +694,6 @@ static inline struct inode *SOCK_INODE(s
 	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
 }
 
-extern void __sk_stream_mem_reclaim(struct sock *sk);
 extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind);
 
 #define SK_STREAM_MEM_QUANTUM ((int)PAGE_SIZE)
@@ -704,6 +703,17 @@ static inline int sk_stream_pages(int am
 	return (amt + SK_STREAM_MEM_QUANTUM - 1) / SK_STREAM_MEM_QUANTUM;
 }
 
+static void __sk_stream_mem_reclaim(struct sock *sk)
+{
+	atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
+		   sk->sk_prot->memory_allocated);
+	sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
+	if (*sk->sk_prot->memory_pressure &&
+	    (atomic_read(sk->sk_prot->memory_allocated) <
+	     sk->sk_prot->sysctl_mem[0]))
+		*sk->sk_prot->memory_pressure = 0;
+}
+
 static inline void sk_stream_mem_reclaim(struct sock *sk)
 {
 	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM)
diff --git a/net/core/stream.c b/net/core/stream.c
index e948969..8ff97e6 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -194,21 +194,6 @@ int sk_stream_error(struct sock *sk, int
 
 EXPORT_SYMBOL(sk_stream_error);
 
-void __sk_stream_mem_reclaim(struct sock *sk)
-{
-	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
-		atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
-			   sk->sk_prot->memory_allocated);
-		sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
-		if (*sk->sk_prot->memory_pressure &&
-		    (atomic_read(sk->sk_prot->memory_allocated) <
-		     sk->sk_prot->sysctl_mem[0]))
-			*sk->sk_prot->memory_pressure = 0;
-	}
-}
-
-EXPORT_SYMBOL(__sk_stream_mem_reclaim);
-
 int sk_stream_mem_schedule(struct sock *sk, int size, int kind)
 {
 	int amt = sk_stream_pages(size);
Re: Unnecessary check in __sk_stream_mem_reclaim?
On 7/12/06, Herbert Xu <[EMAIL PROTECTED]> wrote:
> Ian McDonald <[EMAIL PROTECTED]> wrote:
> >
> > It looks to me like this check here in net/core/stream.c for
> > __sk_stream_mem_reclaim:
> >	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
> >
> > is unnecessary.
>
> It's needed after skb's have been freed which can push
> sk_forward_alloc above a quantum.

I'm not saying the check is unneeded - just saying doing it twice is unneeded.

Sorry Herbert for two copies - forgot to add netdev first time.

--
Ian McDonald
Unnecessary check in __sk_stream_mem_reclaim?
Folks,

It looks to me like this check here in net/core/stream.c for __sk_stream_mem_reclaim:

	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {

is unnecessary.

It is also done in include/net/sock.h for sk_stream_mem_reclaim which, if the test succeeds, calls __sk_stream_mem_reclaim. This is the only use of it in the kernel.

Now sk_stream_mem_reclaim seems to be in its current form for performance reasons, which makes sense, so I think it makes sense to remove the check from __sk_stream_mem_reclaim.

The danger of removing the check is that an external module could use the function - which I suspect is highly unlikely. This could be overcome by removing the EXPORT_SYMBOL and shifting the function into the header file, although this would result in multiple instances being linked in. I am guessing that there is a smarter way to do this though which still results in the symbol not being exported. I don't know my way around the linking/exporting very well.

Comments? I guess if this was done it would have to be put in the feature removal schedule though because it is currently exported?

Ian
--
Ian McDonald
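To make the argument above concrete, here is a small user-space model in C of the wrapper/helper pair. Everything in it is a hypothetical stand-in (model_sock, model_reclaim, MEM_QUANTUM and the plain int counter are not kernel definitions), and the memory-pressure handling is left out; it only shows that when the wrapper is the sole caller, a second threshold test inside the helper can never change the outcome.

```c
/* Hypothetical stand-ins for the kernel structures; not real kernel code. */
#define MEM_QUANTUM 4096	/* assumed page size, must be a power of two */

struct model_sock {
	int forward_alloc;	/* models sk->sk_forward_alloc, in bytes */
};

/* Models *sk->sk_prot->memory_allocated, counted in quanta (pages). */
static int memory_allocated;

/* Helper without its own threshold test, as proposed above. */
static void model_reclaim_inner(struct model_sock *sk)
{
	/* Give back every whole quantum ... */
	memory_allocated -= sk->forward_alloc / MEM_QUANTUM;
	/* ... and keep only the sub-quantum remainder on the socket. */
	sk->forward_alloc &= MEM_QUANTUM - 1;
}

/* Wrapper: the only caller, and the only place the test is made. */
static void model_reclaim(struct model_sock *sk)
{
	if (sk->forward_alloc >= MEM_QUANTUM)
		model_reclaim_inner(sk);
}
```

For example, a socket holding 3 * 4096 + 100 bytes hands back three quanta and keeps 100 bytes, while a socket holding only 100 bytes never reaches the helper at all.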
Re: [2.6 patch] net/dccp/: possible cleanups
Comments below:

On 6/29/06, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> This patch contains the following possible cleanups:
> - sysctl.c: the Kconfig rules already disallow CONFIG_SYSCTL=n,
>   there's no need for an additional check

Agree

> - proper extern declarations for some variables in dccp.h

NAK - have sent another patch to shift these to feat.h. Arnaldo is reviewing patches next week.

> - make the following needlessly global function static:
>   - ipv4.c: dccp_v4_checksum()

Agree

> - #if 0 the following unused functions:
>   - ackvec.c: dccp_ackvector_print()
>   - ackvec.c: dccp_ackvec_print()
>   - output.c: dccp_send_delayed_ack()

NAK on the first two. These are for debugging and DCCP still needs improving so I think it is worthwhile having them there in the short term so we can quickly call them if needed. I will leave Arnaldo or Andrea to comment on the last one...

Ian
--
Ian McDonald
Re: Locking validator output on DCCP
On 6/22/06, Ian McDonald <[EMAIL PROTECTED]> wrote:
On 6/21/06, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-06-21 at 10:34 +1000, Herbert Xu wrote:
> > > As I read this it is not a recursive lock as sk_clone is occurring
> > > second and is actually creating a new socket so they are trying to
> > > lock on different sockets.
> > >
> > > Can someone tell me whether I am correct in my thinking or not? If I
> > > am then I will work out how to tell the lock validator not to worry
> > > about it.
> >
> > I agree, this looks bogus. Ingo, could you please take a look?
>
> Fix is relatively easy:
>
>
> sk_clone creates a new socket, and thus can never deadlock, and in fact
> can be called with the original socket locked. This therefore is a
> legitimate nesting case; mark it as such.
>
> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
>
>
> ---
>  net/core/sock.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.17-rc6-mm2/net/core/sock.c
> ===
> --- linux-2.6.17-rc6-mm2.orig/net/core/sock.c
> +++ linux-2.6.17-rc6-mm2/net/core/sock.c
> @@ -846,7 +846,7 @@ struct sock *sk_clone(const struct sock
>  		/* SANITY */
>  		sk_node_init(&newsk->sk_node);
>  		sock_lock_init(newsk);
> -		bh_lock_sock(newsk);
> +		bh_lock_sock_nested(newsk);
>
>  		atomic_set(&newsk->sk_rmem_alloc, 0);
>  		atomic_set(&newsk->sk_wmem_alloc, 0);
>
> When I do this it now shifts around. I'll investigate further (probably tomorrow).

Now get

Jun 22 14:20:48 localhost kernel: [ 1276.424531] =
Jun 22 14:20:48 localhost kernel: [ 1276.424541] [ INFO: possible recursive locking detected ]
Jun 22 14:20:48 localhost kernel: [ 1276.424546] -
Jun 22 14:20:48 localhost kernel: [ 1276.424553] idle/0 is trying to acquire lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424559] (&sk->sk_lock.slock#5/1){-+..}, at: [] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.424585]
Jun 22 14:20:48 localhost kernel: [ 1276.424587] but task is already holding lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424592] (&sk->sk_lock.slock#5/1){-+..}, at: [] tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424616]
Jun 22 14:20:48 localhost kernel: [ 1276.424618] other info that might help us debug this:
Jun 22 14:20:48 localhost kernel: [ 1276.424624] 2 locks held by idle/0:
Jun 22 14:20:48 localhost kernel: [ 1276.424628] #0: (&tp->rx_lock){-+..}, at: [] rtl8139_poll+0x42/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.424666] #1: (&sk->sk_lock.slock#5/1){-+..}, at: [] tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424685]
Jun 22 14:20:48 localhost kernel: [ 1276.424686] stack backtrace:
Jun 22 14:20:48 localhost kernel: [ 1276.425002] [] show_trace_log_lvl+0x53/0xff
Jun 22 14:20:48 localhost kernel: [ 1276.425038] [] show_trace+0x16/0x19
Jun 22 14:20:48 localhost kernel: [ 1276.425068] [] dump_stack+0x1a/0x1f
Jun 22 14:20:48 localhost kernel: [ 1276.425099] [] __lock_acquire+0x8e6/0x902
Jun 22 14:20:48 localhost kernel: [ 1276.425311] [] lock_acquire+0x4e/0x66
Jun 22 14:20:48 localhost kernel: [ 1276.425510] [] _spin_lock_nested+0x26/0x36
Jun 22 14:20:48 localhost kernel: [ 1276.425726] [] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.427191] [] inet_csk_clone+0xf/0x67
Jun 22 14:20:48 localhost kernel: [ 1276.428879] [] tcp_create_openreq_child+0x15/0x32b
Jun 22 14:20:48 localhost kernel: [ 1276.430598] [] tcp_v4_syn_recv_sock+0x47/0x29c
Jun 22 14:20:48 localhost kernel: [ 1276.432313] [] tcp_v6_syn_recv_sock+0x37/0x534 [ipv6]
Jun 22 14:20:48 localhost kernel: [ 1276.432482] [] tcp_check_req+0x1a0/0x2db
Jun 22 14:20:48 localhost kernel: [ 1276.434198] [] tcp_v4_do_rcv+0x9f/0x2fe
Jun 22 14:20:48 localhost kernel: [ 1276.435911] [] tcp_v4_rcv+0x932/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.437632] [] ip_local_deliver+0x159/0x1f1
Jun 22 14:20:48 localhost kernel: [ 1276.439305] [] ip_rcv+0x3e9/0x416
Jun 22 14:20:48 localhost kernel: [ 1276.440977] [] netif_receive_skb+0x287/0x317
Jun 22 14:20:48 localhost kernel: [ 1276.442542] [] rtl8139_poll+0x294/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.442590] [] net_rx_action+0x8b/0x17c
Jun 22 14:20:48 localhost kernel: [ 1276.444160] [] __do_softirq+0x54/0xb3
Jun 22 14:20:48 localhost kernel: [ 1276.444335] [] do_softirq+0x2f/0x47
Jun 22 14:20:48 localhost kernel: [ 1276.60] [] irq_exit+0x39/0x46
Jun 22 14:20:48 localhost kernel: [ 1276.444585] [] do_IRQ+0x77/0x84
Jun 22 14:20:48 localhost k
Re: Locking validator output on DCCP
On 6/21/06, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Herbert Xu <[EMAIL PROTECTED]> wrote:
>
> > > Can someone tell me whether I am correct in my thinking or not? If I
> > > am then I will work out how to tell the lock validator not to worry
> > > about it.
> >
> > I agree, this looks bogus. Ingo, could you please take a look?
>
> sure - Ian, could you try Arjan's fix below?
>
> Ingo
>
> Subject: lock validator: annotate vlan "master" device locks
> From: Arjan van de Ven <[EMAIL PROTECTED]>

The fix you sent here was the incorrect one but I did test Arjan's as per previous e-mail.

Real dumb question time. The lock validator is testing for recursive lock holding. Given that this is a lock at a different address, can we eliminate all such cases? Or are you trying to detect code here that keeps on locking the same type of lock in case of error, which we should explicitly flag...

Ian
--
Ian McDonald
Re: Locking validator output on DCCP
On 6/21/06, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-06-21 at 10:34 +1000, Herbert Xu wrote:
> > > As I read this it is not a recursive lock as sk_clone is occurring
> > > second and is actually creating a new socket so they are trying to
> > > lock on different sockets.
> > >
> > > Can someone tell me whether I am correct in my thinking or not? If I
> > > am then I will work out how to tell the lock validator not to worry
> > > about it.
> >
> > I agree, this looks bogus. Ingo, could you please take a look?
>
> Fix is relatively easy:
>
> sk_clone creates a new socket, and thus can never deadlock, and in fact
> can be called with the original socket locked. This therefore is a
> legitimate nesting case; mark it as such.
>
> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
>
> ---
>  net/core/sock.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.17-rc6-mm2/net/core/sock.c
> ===
> --- linux-2.6.17-rc6-mm2.orig/net/core/sock.c
> +++ linux-2.6.17-rc6-mm2/net/core/sock.c
> @@ -846,7 +846,7 @@ struct sock *sk_clone(const struct sock
>  		/* SANITY */
>  		sk_node_init(&newsk->sk_node);
>  		sock_lock_init(newsk);
> -		bh_lock_sock(newsk);
> +		bh_lock_sock_nested(newsk);
>
>  		atomic_set(&newsk->sk_rmem_alloc, 0);
>  		atomic_set(&newsk->sk_wmem_alloc, 0);

When I do this it now shifts around. I'll investigate further (probably tomorrow).

Now get

Jun 22 14:20:48 localhost kernel: [ 1276.424531] =
Jun 22 14:20:48 localhost kernel: [ 1276.424541] [ INFO: possible recursive locking detected ]
Jun 22 14:20:48 localhost kernel: [ 1276.424546] -
Jun 22 14:20:48 localhost kernel: [ 1276.424553] idle/0 is trying to acquire lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424559] (&sk->sk_lock.slock#5/1){-+..}, at: [] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.424585]
Jun 22 14:20:48 localhost kernel: [ 1276.424587] but task is already holding lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424592] (&sk->sk_lock.slock#5/1){-+..}, at: [] tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424616]
Jun 22 14:20:48 localhost kernel: [ 1276.424618] other info that might help us debug this:
Jun 22 14:20:48 localhost kernel: [ 1276.424624] 2 locks held by idle/0:
Jun 22 14:20:48 localhost kernel: [ 1276.424628] #0: (&tp->rx_lock){-+..}, at: [] rtl8139_poll+0x42/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.424666] #1: (&sk->sk_lock.slock#5/1){-+..}, at: [] tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424685]
Jun 22 14:20:48 localhost kernel: [ 1276.424686] stack backtrace:
Jun 22 14:20:48 localhost kernel: [ 1276.425002] [] show_trace_log_lvl+0x53/0xff
Jun 22 14:20:48 localhost kernel: [ 1276.425038] [] show_trace+0x16/0x19
Jun 22 14:20:48 localhost kernel: [ 1276.425068] [] dump_stack+0x1a/0x1f
Jun 22 14:20:48 localhost kernel: [ 1276.425099] [] __lock_acquire+0x8e6/0x902
Jun 22 14:20:48 localhost kernel: [ 1276.425311] [] lock_acquire+0x4e/0x66
Jun 22 14:20:48 localhost kernel: [ 1276.425510] [] _spin_lock_nested+0x26/0x36
Jun 22 14:20:48 localhost kernel: [ 1276.425726] [] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.427191] [] inet_csk_clone+0xf/0x67
Jun 22 14:20:48 localhost kernel: [ 1276.428879] [] tcp_create_openreq_child+0x15/0x32b
Jun 22 14:20:48 localhost kernel: [ 1276.430598] [] tcp_v4_syn_recv_sock+0x47/0x29c
Jun 22 14:20:48 localhost kernel: [ 1276.432313] [] tcp_v6_syn_recv_sock+0x37/0x534 [ipv6]
Jun 22 14:20:48 localhost kernel: [ 1276.432482] [] tcp_check_req+0x1a0/0x2db
Jun 22 14:20:48 localhost kernel: [ 1276.434198] [] tcp_v4_do_rcv+0x9f/0x2fe
Jun 22 14:20:48 localhost kernel: [ 1276.435911] [] tcp_v4_rcv+0x932/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.437632] [] ip_local_deliver+0x159/0x1f1
Jun 22 14:20:48 localhost kernel: [ 1276.439305] [] ip_rcv+0x3e9/0x416
Jun 22 14:20:48 localhost kernel: [ 1276.440977] [] netif_receive_skb+0x287/0x317
Jun 22 14:20:48 localhost kernel: [ 1276.442542] [] rtl8139_poll+0x294/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.442590] [] net_rx_action+0x8b/0x17c
Jun 22 14:20:48 localhost kernel: [ 1276.444160] [] __do_softirq+0x54/0xb3
Jun 22 14:20:48 localhost kernel: [ 1276.444335] [] do_softirq+0x2f/0x47
Jun 22 14:20:48 localhost kernel: [ 1276.60] [] irq_exit+0x39/0x46
Jun 22 14:20:48 localhost kernel: [ 1276.444585] [] do_IRQ+0x77/0x84
Jun 22 14:20:48 localhost kernel: [ 1276.444621] [] common_interrupt+0x25/0x2c

--
Ian McDonald
Locking validator output on DCCP
Folks,

I am getting this when I am using DCCP with 2.6.17-rc6-mm2 with Ingo's lock dependency patch:

Jun 21 09:38:58 localhost kernel: [ 102.068588]
Jun 21 09:38:58 localhost kernel: [ 102.068592] =
Jun 21 09:38:58 localhost kernel: [ 102.068602] [ INFO: possible recursive locking detected ]
Jun 21 09:38:58 localhost kernel: [ 102.068608] -
Jun 21 09:38:58 localhost kernel: [ 102.068615] idle/0 is trying to acquire lock:
Jun 21 09:38:58 localhost kernel: [ 102.068620] (&sk->sk_lock.slock#3){-+..}, at: [] sk_clone+0x5a/0x190
Jun 21 09:38:58 localhost kernel: [ 102.068644]
Jun 21 09:38:58 localhost kernel: [ 102.068646] but task is already holding lock:
Jun 21 09:38:58 localhost kernel: [ 102.068651] (&sk->sk_lock.slock#3){-+..}, at: [] sk_receive_skb+0xe6/0xfe
Jun 21 09:38:58 localhost kernel: [ 102.068668]
Jun 21 09:38:58 localhost kernel: [ 102.068670] other info that might help us debug this:
Jun 21 09:38:58 localhost kernel: [ 102.068676] 2 locks held by idle/0:
Jun 21 09:38:58 localhost kernel: [ 102.068679] #0: (&tp->rx_lock){-+..}, at: [] rtl8139_poll+0x42/0x41c [8139too]
Jun 21 09:38:58 localhost kernel: [ 102.068722] #1: (&sk->sk_lock.slock#3){-+..}, at: [] sk_receive_skb+0xe6/0xfe
Jun 21 09:38:58 localhost kernel: [ 102.068739]
Jun 21 09:38:58 localhost kernel: [ 102.068741] stack backtrace:
Jun 21 09:38:58 localhost kernel: [ 102.069053] [] show_trace_log_lvl+0x53/0xff
Jun 21 09:38:58 localhost kernel: [ 102.069091] [] show_trace+0x16/0x19
Jun 21 09:38:58 localhost kernel: [ 102.069121] [] dump_stack+0x1a/0x1f
Jun 21 09:38:58 localhost kernel: [ 102.069151] [] __lock_acquire+0x8e6/0x902
Jun 21 09:38:58 localhost kernel: [ 102.069363] [] lock_acquire+0x4e/0x66
Jun 21 09:38:58 localhost kernel: [ 102.069562] [] _spin_lock+0x24/0x32
Jun 21 09:38:58 localhost kernel: [ 102.069777] [] sk_clone+0x5a/0x190
Jun 21 09:38:58 localhost kernel: [ 102.071244] [] inet_csk_clone+0xf/0x67
Jun 21 09:38:58 localhost kernel: [ 102.072932] [] dccp_create_openreq_child+0x17/0x2fe [dccp]
Jun 21 09:38:58 localhost kernel: [ 102.072993] [] dccp_v4_request_recv_sock+0x47/0x260 [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [ 102.073020] [] dccp_check_req+0x128/0x264 [dccp]
Jun 21 09:38:58 localhost kernel: [ 102.073049] [] dccp_v4_do_rcv+0x74/0x290 [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [ 102.073067] [] sk_receive_skb+0x6b/0xfe
Jun 21 09:38:58 localhost kernel: [ 102.074607] [] dccp_v4_rcv+0x4ea/0x66e [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [ 102.074651] [] ip_local_deliver+0x159/0x1f1
Jun 21 09:38:58 localhost kernel: [ 102.076322] [] ip_rcv+0x3e9/0x416
Jun 21 09:38:58 localhost kernel: [ 102.077995] [] netif_receive_skb+0x287/0x317
Jun 21 09:38:58 localhost kernel: [ 102.079562] [] rtl8139_poll+0x294/0x41c [8139too]
Jun 21 09:38:58 localhost kernel: [ 102.079610] [] net_rx_action+0x8b/0x17c
Jun 21 09:38:58 localhost kernel: [ 102.081181] [] __do_softirq+0x54/0xb3
Jun 21 09:38:58 localhost kernel: [ 102.081357] [] do_softirq+0x2f/0x47
Jun 21 09:38:58 localhost kernel: [ 102.081482] [] irq_exit+0x39/0x46
Jun 21 09:38:58 localhost kernel: [ 102.081608] [] do_IRQ+0x77/0x84
Jun 21 09:38:58 localhost kernel: [ 102.081644] [] common_interrupt+0x25/0x2c
Jun 21 09:38:58 localhost kernel: [ 154.463644] CCID: Registered CCID 3 (ccid3)

The code of sk_clone (net/core/sock.c) is:

	struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
	{
		struct sock *newsk = sk_alloc(sk->sk_family, priority, sk->sk_prot, 0);

		if (newsk != NULL) {
			struct sk_filter *filter;

			memcpy(newsk, sk, sk->sk_prot->obj_size);

			/* SANITY */
			sk_node_init(&newsk->sk_node);
			sock_lock_init(newsk);

The relevant code is the sock_lock_init

The code of sk_receive_skb (net/core/sock.c) is:

	int sk_receive_skb(struct sock *sk, struct sk_buff *skb)
	{
		int rc = NET_RX_SUCCESS;

		if (sk_filter(sk, skb, 0))
			goto discard_and_relse;

		skb->dev = NULL;

		bh_lock_sock(sk);

The relevant code is the bh_lock_sock.

As I read this it is not a recursive lock as sk_clone is occurring second and is actually creating a new socket so they are trying to lock on different sockets.

Can someone tell me whether I am correct in my thinking or not? If I am then I will work out how to tell the lock validator not to worry about it.

Thanks,

Ian
--
Ian McDonald
Re: How to submit a new module to linux kernel?
On 5/23/06, Erik Mouw <[EMAIL PROTECTED]> wrote:
> On Mon, May 22, 2006 at 03:18:12PM +0800, #ZHOU BIN# wrote:
> > I'm new in this mailing list. I implemented a new TCP congestion
> > control module for linux kernel 2.6.16.13.
> > Does anybody know how to apply for the integration of it into the
> > linux kernel? How long will this process take?
>
> See Documentation/SubmittingPatches in your kernel tree.

I would also add that for this type of patch a peer reviewed paper outlining the congestion control work would be useful/needed.

Every person wants to improve TCP and has written a congestion control mechanism (myself included!) but that doesn't mean it is worth including in the kernel (mine certainly isn't).

In particular for TCP congestion control you need to show in what cases it is better than others, in what cases it is worse, and how fair it is compared to other TCP flows.

Ian
--
Ian McDonald
Re: address pingable with interface down
> So where's the linux networking faq?
>
> I've been lurking here long enough to know that there's no shortage of
> faqs, but there's no canonical netdev faq that i'm aware of. Maybe one
> should be started?
>
> Jason

http://linux-net.osdl.org/index.php/ is the canonical linux networking wiki. I've added this FAQ under IPv4. I'm sure if this isn't the best place someone will shift it, being a wiki :-)

Ian
--
Ian McDonald
Re: latest -stable breaks Squid
On 5/4/06, Ben Greear <[EMAIL PROTECTED]> wrote:
> Herbert Xu wrote:
> > Dave Jones <[EMAIL PROTECTED]> wrote:
> >
> >>So I pushed out an update for Fedora Core 5 users yesterday
> >>that moved the kernel from 2.6.16.9 to 2.6.16.13.
> >>I've since heard "My network performance is awful", and worse
> >>yet, some apps seem broken as in the report below.
> >>
> >>Anyone have any ideas ?
> >
> >
> > Try reverting the e1000 truesize patch. Although the fix is 100%
> > correct, it might have a negative impact on user-space apps with
> > particularly small rcvbuf settings. Prior to the fix, due to the
> > incorrect accounting we are essentially enlarging rcvbuf by as much
> > as 10 times.
>
> At least one of the reports shows problems with non e1000 NICs, so it's
> probably not just the e1000 change.
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=190620
>
> Ben

Wouldn't it be more likely to be commit 5d0b6f2bdaf7e016e750cd24164a241512d968a3, as this touches net/ipv4/tcp_output.c and is also in the same general area?

--
Ian McDonald
Re: [offlist] Re: [LARTC] how to do probabilistic packet loss in kernel?
On 4/20/06, George Nychis <[EMAIL PROTECTED]> wrote:
> Hey Martin,
>
> I was able to do it with netem and it's working great now.
>
> I've actually moved on to another challenge, I would like to drop
> packets at the hardware level such as to see rate control.
>
Have a look at:
http://linux-net.osdl.org/index.php/Netem#Rate_control

Works well for me...

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: how to do probabilistic packet loss in kernel?
On 4/17/06, George P Nychis <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am using iproute2 to set up forwarding, adding routes like "ip route add
> 192.168.1.3 via 192.168.1.2"
>
> I was wondering where in the kernel I can insert probabilistic packet loss
> only for forwarded packets? So that for instance I can drop 5% of all
> forwarded packets?
>
Have a look at:
http://linux-net.osdl.org/index.php/Netem
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Iperf support for DCCP and selectable TCP congestion control
Folks,

I've just posted a patch at
http://wand.net.nz/~iam4/software/congestion-iperf-2.0.2-1.diff
which adds the ability to select the TCP congestion control mechanism in
iperf for TCP performance testing.

Thanks to Angelo Castellani for writing this. I have tidied it up a little
and merged it into the patch, which also has DCCP support.

I hope the netdev people don't mind me spamming the list, but I find this
quite a useful tool when testing TCP/DCCP changes, to see whether regression
or progression is occurring...

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: Writing a rate based transport protocol
> The qdiscs would ideally exist at the layer 2 / layer 3 boundary like
> existing qdiscs, but the problem is getting the scheduling parameters down
> that far. Perhaps a transport protocol could create a tagged route entry
> with the appropriate parameters, the routing layer could assign skbs to it
> by flow tag, and the qdisc could refer to the route entry and some sort of
> skb sequence number to derive the appropriate scheduling information.
>
I'm trying to do something a little bit different: I send an expiry time
down through the msg structure with dccp_sendmsg, then check it in
dccp_write_xmit and discard there. This is trivial to implement (even I
managed it!) but I think you are wanting to do it one layer down.

If you want further info or source code then feel free to take this
discussion further with me offline...

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
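A rough user-space sketch of the expiry check described above, for illustration only. The names `struct sketch_pkt` and `xmit_unexpired()`, the microsecond timestamps, and the "0 means never expires" convention are all assumptions made for this example. This is not the DCCP code, which carries the expiry through the msg structure and checks it in dccp_write_xmit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet record: payload omitted, only the expiry matters. */
struct sketch_pkt {
	uint32_t id;
	uint64_t expiry_usec;	/* 0 means "never expires" (assumption) */
};

/*
 * Walk the queue in order and copy the ids of packets still worth
 * sending into sent[]; packets whose expiry has passed are skipped.
 * Returns the number of ids written to sent[].
 */
static size_t xmit_unexpired(const struct sketch_pkt *q, size_t n,
			     uint64_t now_usec, uint32_t *sent)
{
	size_t i, out = 0;

	assert(q != NULL && sent != NULL);
	for (i = 0; i < n; i++) {
		if (q[i].expiry_usec != 0 && q[i].expiry_usec <= now_usec)
			continue;	/* stale: discard instead of transmitting */
		sent[out++] = q[i].id;
	}
	return out;
}
```

Doing the check at transmit time rather than at enqueue time means a packet that goes stale while waiting in the queue is still dropped.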
Re: Writing a rate based transport protocol
On 3/23/06, Mark Butler <[EMAIL PROTECTED]> wrote:
> I understand that timed intervals between individual packets is not
> realistic in general. What I have in mind is a fixed granularity
> transmission timer, where packets are assigned to buckets, and
> transmitted one bucket per timer expiration.

Why is it not realistic?

> From a protocol design point of view, the main question is which is
> more expensive, rate based timer expiration, or generating ACKs at a
> high enough rate to self clock. With 1 Gb/sec reliable transport
> protocol, every other packet ACK generation, and Ethernet MTU size
> packets, ACKs are generated every 30 usec on average. With Van Jacobson
> style pre-queuing a large percentage of them are wasted overhead,
> because a long series of them accumulate in the prequeue before the
> receiving thread is activated.

Various others have looked at things like sending a certain number of ACKs
per RTT rather than one per packet or per fixed number of packets.

For both interpacket intervals and differing ACK strategies, have a look at
TFRC:
http://www.icir.org/tfrc/

This seems to work quite well for me in all the testing I've done, although
I have only tested up to 100 Mbits. That tested OK on 500 MHz machines, so
newer machines should handle faster rates well.

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
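The ACK spacing quoted above can be checked with a little arithmetic. The sketch below is an illustration, not anything from the thread: `ack_interval_nsec()` is a hypothetical helper, and it assumes a saturated link, 1500-byte packets and no framing overhead. Under those assumptions a 1 Gb/sec link with an ACK for every second packet gives 24 usec between ACKs, the same order as the roughly 30 usec quoted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time between ACKs, in nanoseconds, when the receiver ACKs every
 * pkts_per_ack full-sized packets on a saturated link.
 * Framing overhead is ignored (assumption), so real intervals are
 * somewhat longer.
 */
static uint64_t ack_interval_nsec(uint64_t link_bps, uint32_t pkt_bytes,
				  uint32_t pkts_per_ack)
{
	uint64_t bits = (uint64_t)pkt_bytes * 8u * pkts_per_ack;

	assert(link_bps > 0);
	return bits * 1000000000ull / link_bps;
}
```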
Re: Writing a rate based transport protocol
> The bigger problem is that to be effective rate control needs accurate
> real time. Linux is doing better at real time, but still providing useful
> high speed inter packet spacing is beyond the current capabilities. To get
> around this I think most high speed 10G cards provide some form of rate
> control in firmware.
>
At present most of the network timing for TCP (and probably other protocols)
is in milliseconds for measures such as RTT. We found when writing DCCP
CCID3 that this did not provide enough granularity when you find the
interpacket interval by taking 1/transmit rate*constant.

As such we put quite a few things in microseconds.

If you are serious about writing this, have a look at the net/dccp files and
ccid3 in particular. For example, we also put a whole lot of integer
division code in there which you will find useful.

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
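To make the granularity point concrete, here is a minimal sketch of the interpacket interval calculation in integer arithmetic. It is an illustration under assumptions, not the ccid3 code: `ipi_usecs()` and `ipi_msecs()` are hypothetical helpers, and the rate is taken to be in bytes per second.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ull
#define MSEC_PER_SEC 1000ull

/* Interval between packets, in microseconds, for a given sending rate. */
static uint64_t ipi_usecs(uint32_t pkt_bytes, uint64_t rate_bytes_per_sec)
{
	assert(rate_bytes_per_sec > 0);
	return pkt_bytes * USEC_PER_SEC / rate_bytes_per_sec;
}

/* Same calculation in milliseconds, to show what the coarser unit loses. */
static uint64_t ipi_msecs(uint32_t pkt_bytes, uint64_t rate_bytes_per_sec)
{
	assert(rate_bytes_per_sec > 0);
	return pkt_bytes * MSEC_PER_SEC / rate_bytes_per_sec;
}
```

For 1500-byte packets at 12,500,000 bytes/sec (100 Mbit/sec), the microsecond version gives 120 while the millisecond version truncates to 0, which would leave no pacing at all.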
Re: net-2.6.17 rebased...
And *** again... someone pointed out my mailer is wrapping lines. I'll
go hide in the corner and beat myself up. This time attached as I
can't get gmail to defeat line wrapping.

I promise I'll get it right next patch so I don't humiliate myself
quite so much next time

Dave,

If you get a chance can you push the ccid3 divide by zero fix upstream
to Linus for 2.6.16 as it has no functionality changed and eliminates
a nasty little bug...

The commit for this is b6da19617f4ab610d3d90bcbdf65fa7e2b3d7b53 in your tree

I have also put at end of this e-mail after reapplying on linus tree
so above commit doesn't have fuzz...

[DCCP] ccid3: Divide by zero fix

In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which leads
to a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check for zero
return now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index aa68e0a..35d1d34 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -2,7 +2,7 @@
  *  net/dccp/ccids/ccid3.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
  *
  *  An implementation of the DCCP protocol
  *
@@ -1033,9 +1033,13 @@ static void ccid3_hc_rx_packet_recv(stru
 	p_prev = hcrx->ccid3hcrx_p;
 
 	/* Calculate loss event rate */
-	if (!list_empty(&hcrx->ccid3hcrx_li_hist))
+	if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
+		u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+
 		/* Scaling up by 100 as fixed decimal */
-		hcrx->ccid3hcrx_p = 100 / dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+		if (i_mean != 0)
+			hcrx->ccid3hcrx_p = 100 / i_mean;
+	}
 
 	if (hcrx->ccid3hcrx_p > p_prev) {
 		ccid3_hc_rx_send_feedback(sk);
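The guard in the patch can be tried out in isolation with a sketch like the one below. This is a stand-alone illustration, not the kernel function: `update_loss_rate()` is a hypothetical name, and the scale factor is simply copied from the patch text as it appears above. As in the patch, the previous value is kept when the mean interval comes back as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOSS_SCALE 100u	/* scale factor as it appears in the patch text */

/*
 * Update the scaled loss event rate *p from the mean loss interval.
 * A zero mean leaves *p untouched instead of dividing by zero.
 */
static void update_loss_rate(uint32_t *p, uint32_t i_mean)
{
	assert(p != NULL);
	if (i_mean != 0)
		*p = LOSS_SCALE / i_mean;
}
```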
Re: net-2.6.17 rebased...
F**k - just pasted in the wrong file. Trying again

On 3/2/06, Ian McDonald <[EMAIL PROTECTED]> wrote:
> On 3/2/06, David S. Miller <[EMAIL PROTECTED]> wrote:
> >
> > This tree was getting crufty, so I rebased it today.
> > It was actually a lot easier than I had anticipated.
> >
> > master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.17.git
> >
> Dave,
>
> If you get a chance can you push the ccid3 divide by zero fix upstream
> to Linus for 2.6.16 as it has no functionality changed and eliminates
> a nasty little bug...
>
> The commit for this is b6da19617f4ab610d3d90bcbdf65fa7e2b3d7b53 in your tree
>
I have also put at end of this e-mail after reapplying on linus tree
so above commit doesn't have fuzz...

[DCCP] ccid3: Divide by zero fix

In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which leads
to a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check for zero
return now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index aa68e0a..35d1d34 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -2,7 +2,7 @@
  *  net/dccp/ccids/ccid3.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
  *
  *  An implementation of the DCCP protocol
  *
@@ -1033,9 +1033,13 @@ static void ccid3_hc_rx_packet_recv(stru
 	p_prev = hcrx->ccid3hcrx_p;
 
 	/* Calculate loss event rate */
-	if (!list_empty(&hcrx->ccid3hcrx_li_hist))
+	if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
+		u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+
 		/* Scaling up by 100 as fixed decimal */
-		hcrx->ccid3hcrx_p = 100 / dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+		if (i_mean != 0)
+			hcrx->ccid3hcrx_p = 100 / i_mean;
+	}
 
 	if (hcrx->ccid3hcrx_p > p_prev) {
 		ccid3_hc_rx_send_feedback(sk);
Re: net-2.6.17 rebased...
On 3/2/06, David S. Miller <[EMAIL PROTECTED]> wrote:
>
> This tree was getting crufty, so I rebased it today.
> It was actually a lot easier than I had anticipated.
>
> master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.17.git
>
Dave,

If you get a chance can you push the ccid3 divide by zero fix upstream
to Linus for 2.6.16 as it has no functionality changed and eliminates
a nasty little bug...

The commit for this is b6da19617f4ab610d3d90bcbdf65fa7e2b3d7b53 in your tree

I have also put at end of this e-mail after reapplying on linus tree
so above commit doesn't have fuzz...

[DCCP] ccid3: Divide by zero fix

In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which leads
to a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check for zero
return now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

--- 2b82f96f1291c42ee9485465801f5f51897bec64
+++ ff426a9009993445a15cfcac6c88be1e39e07913
@@ -2,7 +2,7 @@
  *  net/dccp/ccids/ccid3.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
  *
  *  An implementation of the DCCP protocol
  *
@@ -1014,9 +1014,13 @@ static void ccid3_hc_rx_packet_recv(stru
 	p_prev = hcrx->ccid3hcrx_p;
 
 	/* Calculate loss event rate */
-	if (!list_empty(&hcrx->ccid3hcrx_li_hist))
+	if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
+		u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+
 		/* Scaling up by 100 as fixed decimal */
-		hcrx->ccid3hcrx_p = 100 / dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+		if (i_mean != 0)
+			hcrx->ccid3hcrx_p = 100 / i_mean;
+	}
 
 	if (hcrx->ccid3hcrx_p > p_prev) {
 		ccid3_hc_rx_send_feedback(sk);
Re: tg3 losing promisc rx_mode bit
On 2/24/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> This is a known problem caused by ASF or IPMI firmware overwriting the
> promiscuous mode bit. I will have someone contact you to get the
> firmware upgraded.
>
> Thanks.
>
Thinking out loud here without reading the source: can the driver check the
firmware version and make some noise if it finds an affected version like
this one?

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: [PATCH] mtu probing: move tcp-specific data out of inet_connection_sock
> No Ian. John was the one that moved those fields out of tcp.h and into
> inet_connection_sock.h:
>
> http://master.kernel.org/git/?p=linux/kernel/git/acme/net-2.6.17.git;a=commit;h=55bb045aa49d5e5234c6213d1ed0bfef0c636971
>
> When we get to fix the DCCP PMTU code we can revisit if this move is
> interesting.
>
OK. Will teach me to hit send without researching my facts.

Sorry to all. Carry on as normal...

Ian
-- 
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: [PATCH] mtu probing: move tcp-specific data out of inet_connection_sock
On 2/17/06, John Heffner <[EMAIL PROTECTED]> wrote:
> This moves some TCP-specific MTU probing state out of
> inet_connection_sock back to tcp_sock.
>
> Signed-off-by: John Heffner <[EMAIL PROTECTED]>
>
Why do you want to do this? What benefit does it give?

I would like to see PMTU done in DCCP, and inet_connection_sock seems a
better place for this state, which is probably why Arnaldo put it there.

Ian
-- 
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: KERNEL: assertion (!sk->sk_forward_alloc) failed
On 2/10/06, Boris B. Zhmurov <[EMAIL PROTECTED]> wrote:
> Hello, Ian McDonald.
>
> On 09.02.2006 22:25 you said the following:
>
> > Is it possible for you to download 2.6.16-rc2 or similar and see if it
> > goes away?
>
> It'll be better, if I get only patch fixs that problem, not all 2.6.16-rc2.
>
Oops, I didn't read Jesse's earlier message properly. The patch that
probably fixed it is (from his message):

> I think the commit id that is missing from 2.6.14.X is
> fb5f5e6e0cebd574be737334671d1aa8f170d5f3
>
> but here is the web link if i gave the wrong info
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fb5f5e6e0cebd574be737334671d1aa8f170d5f3
-- 
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand
Re: KERNEL: assertion (!sk->sk_forward_alloc) failed
On 2/10/06, Boris B. Zhmurov <[EMAIL PROTECTED]> wrote:
> Hello, Jesse Brandeburg.
>
> On 08.02.2006 23:07 you said the following:
>
> > whats the relevance of e1000?
> >
> > I though Herbert had fixed these
>
> Nope :( I had this messages on 2.6.14.2 and now I have it on 2.6.15.3.
>
For what it's worth, I had these messages for a while and, from memory, they
were fixed 2 or 3 weeks ago in Dave's 2.6.16 net tree or net-2.6 tree.

Is it possible for you to download 2.6.16-rc2 or similar and see if it
goes away?

Ian
-- 
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand
Re: [PATCH] [IPVS] Shrink ip_vs_*.c includes
> Unfortunately this seems like it is going to be more tedious than
> we first thought. I would guess writing some sort of tool to analyse
> symbols and headers is the way to go. Else it seems more or less
> impossible to clean up headers, even on a small scale.
>
Search the netdev archives or look at Arnaldo's kernel.org space, as he
once wrote some scripts to do this.
-- 
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand
Re: [e2e] FW: Performance evaluation of high speed TCPs
> Seriously, where's the value in comparing buggy implementations - isn't
> that just a waste of all our time ? If we are genuine about wanting to
> understand tcp performance then I think we just have to take the hit from
> issues such as this that are outside all of our control.
>
A real part of the problem here is that Linux doesn't have a full TCP
testing suite and doesn't have build checking to catch regressions in TCP
variants. As I understand it, the only thing tested in nightly builds is
throughput for the default TCP.

Stephen Hemminger has done some work on TCP Probes, but this is where I
think real progress could be made in improving Linux TCP. I may get around
to doing this myself at some point in my research but would welcome other
people doing it also!

Ian
-- 
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand
Re: [PATCH] 1/1 net/core: use USEC_PER_SEC and line spacing
> Yup, that is my current understanding. Heck:
>
> 1. using hrtimers in DCCP
> 2. Jumping into VJ's net channels to implement Eddie Kohler packet
>    rings DCCP API
> 3. reviewing Andrea's CCID2 code and merging it
> 4. reviewing Ian's work on using sk_write_queue
> 5. reviewing/merging Andrea's feature negotiation patches
> 6. making DCCP rock solid (Hi Sorbo 8-) )
> 7. getting ostra to be easy to use and usable with userspace code
> 8. Real Customer Work(tm)
>
> too many things on my plate right now 8)
>
Hold off on #4 as I've found that I'm only creating a queue with depth
1. Reworking at present

Ian
-- 
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand
Re: [PATCH] 1/1 net/core: use USEC_PER_SEC and line spacing
On 2/1/06, Herbert Xu <[EMAIL PROTECTED]> wrote:
> Ian McDonald <[EMAIL PROTECTED]> wrote:
> >
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -162,7 +162,8 @@ static int sock_set_timeout(long *timeo_
> >        if (tv.tv_sec == 0 && tv.tv_usec == 0)
> >                return 0;
> >        if (tv.tv_sec < (MAX_SCHEDULE_TIMEOUT/HZ - 1))
> > -               *timeo_p = tv.tv_sec*HZ + (tv.tv_usec+(1000000/HZ-1))/(1000000/HZ);
> > +               *timeo_p = tv.tv_sec*HZ +
> > +                       (tv.tv_usec+(USEC_PER_SEC/HZ-1))/(USEC_PER_SEC/HZ);
>
> Is there a macro for this calculation? If not could we add one?
>
I don't know if there is or not. There is similar code in DCCP.

I think the way forward is to use hrtimers
(http://lwn.net/Articles/167897/) as there are currently problems with
NTP changing time which affects jiffies.

In the meantime this patch makes the code a little bit tidier so I
think it should go in...

Ian
-- 
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand
[PATCH] 1/1 net/core: use USEC_PER_SEC and line spacing
This uses the USEC_PER_SEC constant instead of the literal 1000000. It also
fixes lines longer than 80 characters in a couple of places.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>

diff --git a/net/core/sock.c b/net/core/sock.c
index 6e00811..1d06ec9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -162,7 +162,8 @@ static int sock_set_timeout(long *timeo_
 	if (tv.tv_sec == 0 && tv.tv_usec == 0)
 		return 0;
 	if (tv.tv_sec < (MAX_SCHEDULE_TIMEOUT/HZ - 1))
-		*timeo_p = tv.tv_sec*HZ + (tv.tv_usec+(1000000/HZ-1))/(1000000/HZ);
+		*timeo_p = tv.tv_sec*HZ +
+			(tv.tv_usec+(USEC_PER_SEC/HZ-1))/(USEC_PER_SEC/HZ);
 	return 0;
 }
 
@@ -561,7 +562,8 @@ int sock_getsockopt(struct socket *sock,
 			v.tm.tv_usec = 0;
 		} else {
 			v.tm.tv_sec = sk->sk_rcvtimeo / HZ;
-			v.tm.tv_usec = ((sk->sk_rcvtimeo % HZ) * 1000000) / HZ;
+			v.tm.tv_usec = ((sk->sk_rcvtimeo % HZ)
+					* USEC_PER_SEC) / HZ;
 		}
 		break;
 
@@ -572,7 +574,8 @@ int sock_getsockopt(struct socket *sock,
 			v.tm.tv_usec = 0;
 		} else {
 			v.tm.tv_sec = sk->sk_sndtimeo / HZ;
-			v.tm.tv_usec = ((sk->sk_sndtimeo % HZ) * 1000000) / HZ;
+			v.tm.tv_usec = ((sk->sk_sndtimeo % HZ)
+					* USEC_PER_SEC) / HZ;
 		}
 		break;
 
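For anyone wanting to play with the conversion this patch touches, here is a user-space sketch of both directions. It is an illustration under assumptions, not kernel code: `SKETCH_HZ` is an assumed tick rate of 250 (the kernel's HZ is a build option), `timeout_to_ticks()` and `ticks_to_timeout()` are hypothetical helpers, and the MAX_SCHEDULE_TIMEOUT bound and the all-zero special case from sock_set_timeout are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HZ	250L		/* assumed tick rate, not the kernel's HZ */
#define USEC_PER_SEC	1000000L

/* Convert seconds + microseconds to ticks, rounding the fraction up. */
static long timeout_to_ticks(long sec, long usec)
{
	const long usec_per_tick = USEC_PER_SEC / SKETCH_HZ;

	assert(sec >= 0 && usec >= 0 && usec < USEC_PER_SEC);
	return sec * SKETCH_HZ + (usec + usec_per_tick - 1) / usec_per_tick;
}

/* Split a tick count back into seconds and microseconds. */
static void ticks_to_timeout(long ticks, long *sec, long *usec)
{
	assert(ticks >= 0 && sec != NULL && usec != NULL);
	*sec = ticks / SKETCH_HZ;
	*usec = (ticks % SKETCH_HZ) * USEC_PER_SEC / SKETCH_HZ;
}
```

The round-up means a request of 1 second and 1 microsecond becomes 251 ticks, which reads back as 1 second and 4000 microseconds at this tick rate, so the timeout is never shorter than requested.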