Re: [PATCH 3/6] [DCCP]: Bug-Fix - AWL was never updated

2008-01-28 Thread Ian McDonald
On Jan 28, 2008 11:16 PM, Gerrit Renker <[EMAIL PROTECTED]> wrote:
> This patch was triggered by finding the following message in the syslog:
>  "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...]
>   P.ackno exists or LAWL(82947089) <= P.ackno(82948208)
>   <= S.AWH(82948728), sending SYNC..."
>
> Note the difference between AWH and AWL: it is 1639 packets (while Sequence
> Window was actually at 100). A closer look at the trace showed that
> LAWL = AWL = 82947089 equalled the ISS on the Response.
>
> The cause of the bug was that AWL was only ever set on the first packet - the
> DCCP-Request sent by dccp_v{4,6}_connect().
>
> The fix is to continually update AWL/AWH with each new packet (as GSS=AWH).
>
> In addition, AWL/AWH are now updated to enforce more stringent checks on the
> initial sequence numbers when connecting:
>  * AWL is initialised to ISS and remains at this value;
>  * AWH is always set to GSS (via dccp_update_gss());
>  * so on the first Request: AWL = AWH = ISS,
>    and on the n-th Request: AWL = ISS, AWH = ISS+n.
>
> As a consequence, only Response packets that refer to Requests sent by this
> host will pass, all others are discarded. This is the intention and in effect
> implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1.
>
> Note: A problem that remains is that ISS can potentially be under-run even
>   after the initial handshake; this is addressed in a subsequent patch.
>
> Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>

Yes I had seen this and had worked out that variables weren't being
updated as they should be but hadn't got as far as a fix before I
stopped my coding days so much :-(

Acked-by: Ian McDonald <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
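
A minimal sketch of the window rule described in the patch text above, written as standalone C rather than kernel code. The struct and function names are hypothetical (the kernel keeps this state in struct dccp_sock); only the rule itself is taken from the description: while connecting, AWL stays at ISS and AWH follows GSS. The comparison is done modulo 2^48 because DCCP sequence numbers are 48 bits wide.

```c
#include <stdbool.h>
#include <stdint.h>

/* DCCP sequence numbers are 48 bits wide. */
#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Hypothetical, simplified state; not the kernel's struct dccp_sock. */
struct ack_window {
	uint64_t iss;	/* initial sequence number sent */
	uint64_t gss;	/* greatest sequence number sent */
	uint64_t awl;	/* lowest acceptable acknowledgement number */
	uint64_t awh;	/* highest acceptable acknowledgement number */
};

/* First Request: AWL = AWH = GSS = ISS. */
static void ack_window_init(struct ack_window *w, uint64_t iss)
{
	w->iss = w->gss = w->awl = w->awh = iss & SEQ48_MASK;
}

/* Every packet sent while connecting: AWH follows GSS, AWL stays at ISS. */
static void ack_window_sent(struct ack_window *w, uint64_t seqno)
{
	w->gss = seqno & SEQ48_MASK;
	w->awh = w->gss;
}

/* True if AWL <= ackno <= AWH in circular 48-bit arithmetic. */
static bool ack_window_valid(const struct ack_window *w, uint64_t ackno)
{
	return ((ackno - w->awl) & SEQ48_MASK) <=
	       ((w->awh - w->awl) & SEQ48_MASK);
}
```

With ISS = 82947089, the window after the first Request contains only ISS; after two further Requests it is [ISS, ISS+2], and a Response acknowledging anything outside that range would be rejected by this check.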


Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Ian McDonald
On 12/3/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> WARNING: After reading some messages from Ingo Molnar on lkml I think we should really
>  trim the number of lists we use for kernel development. And since I moved
>  back to using mutt for reading e-mails, something I should have never, ever
>  stopped doing, I guess we should move the DCCP discussions to netdev,
>  where we hopefully can get more people interested and reviewing the work we
>  do, so please consider moving DCCP discussion to netdev@vger.kernel.org,
>  where lots of smart networking folks are present and can help our efforts
>  on turning RFCs to code.
>
I (and others too) don't necessarily have time to read netdev so would
vote on keeping dccp. I would totally agree to making sure that we
cross-post to netdev as well as dccp.

Ian


Re: Will RFC1146 (tcp alternative checksum options) be implemented in Linux tcp stack ?

2007-10-15 Thread Ian McDonald
On 10/16/07, Yanping Du <[EMAIL PROTECTED]> wrote:
> Hi,
>
>   We found the standard 16-bit tcp checksum is not
> strong enough in some cases. Is there any roadmap on
> implementing RFC1146 (tcp alternative checksum
> options) in Linux tcp stack ? If yes, how soon will
> that be in ?
>
> Please kindly copy reply to my email address as I've
> not subscribed the netdev@ mailing list at present.
>
>   http://www.faqs.org/rfcs/rfc1146.html
>
> Thanks!
> -Yanping
>
>
Yanping,

The way that features get added to Linux is that someone interested
writes it. You can't just say - is this on the roadmap, as there is no
roadmap really!

I have been interested in network features from an academic point of
view and so I wrote what I needed (along with others) and that was
added to the Linux kernel.

So have a go at implementing it if you consider it important and come
back here with some patches. Then others will help review it until the
patches are good.

I will let others comment on whether the checksums are a good idea or not.

Ian
-- 
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz


Re: [PATCH] make _minimum_ TCP retransmission timeout configurable

2007-08-29 Thread Ian McDonald
On 8/30/07, David Miller <[EMAIL PROTECTED]> wrote:
> In fact this is a great example why we don't treat RFCs as dictations
> from the gods.  They are often wrong, impractical, or full of fatal
> flaws.
>
Correct - they often have flaws in them, just like all documents. If
that is the case we should try and get the RFCs fixed. I've raised
this in a discussion in the ICCRG group and will see if I get any sort
of response.

Ian
-- 
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz


Re: [PATCH] make _minimum_ TCP retransmission timeout configurable

2007-08-29 Thread Ian McDonald
On 8/30/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Ian McDonald" <[EMAIL PROTECTED]>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
>
> > So I'm suspecting that the default should be changed to 1000 to match
> > the RFC which would solve this issue. I note that the RFC is a SHOULD
> > rather than a MUST. I had a quick look around and not sure why Linux
> > overrides the RFC on this one.
>
> Everyone uses this value, even BSD since ancient times.
>
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
>
I understand what you are saying. That is why I questioned it, as 200 msecs
makes no sense on a LAN with < 1 msec RTT. So if the current value is
ridiculous and 1000 is even more so, why do we use it? Just because that
is how TCP is written, I'm guessing.

I know that in DCCP CCID3 the RTO is 4 x RTT (from memory - it might
be a slight variation) but we ended up putting a minimum on it as you
also face a problem if it fires too frequently (i.e. link is in
usecs).

I might ask around on research lists and see why this issue has never
been revisited.

Now to the original issue - high RTT links. If that is an issue, and I
believe it would be, then it's probably better to do this on a per
route basis or similar, although then we're becoming a de facto X x RTT
type setup. Rereading the RFC this actually doesn't seem prohibited
and here is the code from DCCP CCID3 that we use:

	/*
	 * Update timeout interval for the nofeedback timer.
	 * We use a configuration option to increase the lower bound.
	 * This can help avoid triggering the nofeedback timer too
	 * often ('spinning') on LANs with small RTTs.
	 */
	hctx->ccid3hctx_t_rto = max_t(u32, 4 * hctx->ccid3hctx_rtt,
					   CONFIG_IP_DCCP_CCID3_RTO *
					   (USEC_PER_SEC/1000));
	/*
	 * Schedule no feedback timer to expire in
	 * max(t_RTO, 2 * s/X)  =  max(t_RTO, 2 * t_ipi)
	 */
	t_nfb = max(hctx->ccid3hctx_t_rto, 2 * hctx->ccid3hctx_t_ipi);

	ccid3_pr_debug("%s(%p), Scheduled no feedback timer to "
		       "expire in %lu jiffies (%luus)\n",
		       dccp_role(sk),
		       sk, usecs_to_jiffies(t_nfb), t_nfb);

	sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
		       jiffies + usecs_to_jiffies(t_nfb));

Maybe the TCP code could do this also (with a sysctl to turn behaviour
off and on) and then it would save system administrators having to
"tune" the TCP stack if they want this sort of behaviour.
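
The arithmetic in the CCID3 excerpt above can be tried outside the kernel. The following is only a sketch: the function names are made up for illustration, floor_ms stands in for CONFIG_IP_DCCP_CCID3_RTO, and the values in the usage note are examples rather than kernel defaults.

```c
#include <stdint.h>

/* Larger of two unsigned 32-bit values. */
static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * t_RTO = max(4 * RTT, floor). RTT and the result are in microseconds;
 * floor_ms is a lower bound in milliseconds (the role played by
 * CONFIG_IP_DCCP_CCID3_RTO in the excerpt).
 */
static uint32_t rto_with_floor_us(uint32_t rtt_us, uint32_t floor_ms)
{
	return max_u32(4 * rtt_us, floor_ms * 1000);
}

/* Nofeedback timer interval: max(t_RTO, 2 * t_ipi), in microseconds. */
static uint32_t nofeedback_timeout_us(uint32_t t_rto_us, uint32_t t_ipi_us)
{
	return max_u32(t_rto_us, 2 * t_ipi_us);
}
```

On a LAN with a 500 us RTT and an example floor of 100 ms, the floor wins and t_RTO is 100000 us; on a 600 ms path the 4 x RTT term wins and t_RTO is 2400000 us, which is the "X x RTT with a minimum" behaviour discussed above.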

Ian
-- 
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz


Re: [PATCH] make _minimum_ TCP retransmission timeout configurable

2007-08-29 Thread Ian McDonald
On 8/30/07, Rick Jones <[EMAIL PROTECTED]> wrote:
> Enable configuration of the minimum TCP Retransmission Timeout via
> a new sysctl "tcp_rto_min" to help those whose networks (eg cellular)
> have quite variable RTTs avoid spurious RTOs.
>
> Signed-off-by: Rick Jones <[EMAIL PROTECTED]>
> Signed-off-by: Lamont Jones <[EMAIL PROTECTED]>
> ---
>
> diff -r 1559df81a153 Documentation/networking/ip-sysctl.txt
> --- a/Documentation/networking/ip-sysctl.txt  Mon Aug 13 05:00:33 2007 +
> +++ b/Documentation/networking/ip-sysctl.txt  Wed Aug 22 10:42:55 2007 -0700
> @@ -339,6 +339,13 @@ tcp_rmem - vector of 3 INTEGERs: min, de
> selected receiver buffers for TCP socket. This value does not override
> net.core.rmem_max, "static" selection via SO_RCVBUF does not use this.
> Default: 87380*2 bytes.
> +
> +tcp_rto_min - INTEGER
> +   The minimum value for the TCP Retransmission Timeout, expressed
> +   in milliseconds for the convenience of the user.
> +   This is bounded at the low-end by TCP_RTO_MIN and by TCP_RTO_MAX at
> +   the high-end.
> +   Default: 200.
>

Hmmm... RFC2988 says:
   (2.4) Whenever RTO is computed, if it is less than 1 second then the
 RTO SHOULD be rounded up to 1 second.

 Traditionally, TCP implementations use coarse grain clocks to
 measure the RTT and trigger the RTO, which imposes a large
 minimum value on the RTO.  Research suggests that a large
 minimum RTO is needed to keep TCP conservative and avoid
 spurious retransmissions [AP99].  Therefore, this
 specification requires a large minimum RTO as a conservative
 approach, while at the same time acknowledging that at some
 future point, research may show that a smaller minimum RTO is
 acceptable or superior.

I went and had a look and this RFC has not been obsoleted. RFC3390
also backs this assertion up.

So I'm suspecting that the default should be changed to 1000 to match
the RFC which would solve this issue. I note that the RFC is a SHOULD
rather than a MUST. I had a quick look around and am not sure why Linux
overrides the RFC on this one.
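
To make the effect of the minimum concrete, here is a small sketch of the RFC 2988 estimator (alpha = 1/8, beta = 1/4, K = 4) with the lower bound passed in as a parameter. It uses integer milliseconds for readability and is not how the Linux stack stores these values; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical estimator state, in milliseconds. */
struct rto_state {
	int      have_sample;	/* 0 until the first RTT measurement */
	uint32_t srtt_ms;	/* smoothed round-trip time */
	uint32_t rttvar_ms;	/* round-trip time variation */
};

/*
 * Feed one RTT measurement r_ms and return the resulting RTO.
 * granularity_ms is the clock granularity G; rto_min_ms is the
 * lower bound (1000 in RFC 2988 section 2.4, 200 in the patch).
 */
static uint32_t rto_update_ms(struct rto_state *s, uint32_t r_ms,
			      uint32_t granularity_ms, uint32_t rto_min_ms)
{
	uint32_t delta, var4, rto;

	if (!s->have_sample) {
		/* (2.2) first measurement: SRTT = R, RTTVAR = R/2 */
		s->srtt_ms = r_ms;
		s->rttvar_ms = r_ms / 2;
		s->have_sample = 1;
	} else {
		/* (2.3) RTTVAR is updated first, using the old SRTT */
		delta = s->srtt_ms > r_ms ? s->srtt_ms - r_ms
					  : r_ms - s->srtt_ms;
		s->rttvar_ms = (3 * s->rttvar_ms + delta) / 4;
		s->srtt_ms = (7 * s->srtt_ms + r_ms) / 8;
	}

	/* RTO = SRTT + max(G, 4 * RTTVAR), then apply the minimum */
	var4 = 4 * s->rttvar_ms;
	rto = s->srtt_ms + (var4 > granularity_ms ? var4 : granularity_ms);
	return rto < rto_min_ms ? rto_min_ms : rto;
}
```

With a 1 ms LAN round trip the computed value is a couple of milliseconds, so the result is entirely determined by the floor: 1000 ms under the RFC text, 200 ms under the proposed default. With a steady 400 ms round trip the floor no longer matters once the estimate is above it.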

Ian
-- 
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz


Re: [PATCH 2.6.22-rc5] TCP: Make TCP_RTO_MAX a variable

2007-07-12 Thread Ian McDonald

On 7/12/07, OBATA Noboru <[EMAIL PROTECTED]> wrote:

> > Ian McDonald wrote:
> > > On 6/26/07, OBATA Noboru <[EMAIL PROTECTED]> wrote:
> > >
> > >> From: OBATA Noboru <[EMAIL PROTECTED]>
> > >>
> > >> Make TCP_RTO_MAX a variable, and allow a user to change it via a
> > >> new sysctl entry /proc/sys/net/ipv4/tcp_rto_max.  A user can
> > >> then guarantee TCP retransmission to be more controllable, say,
> > >> at least once per 10 seconds, by setting it to 10.  This is
> > >> quite helpful on failover-capable network devices, such as an
> > >> active-backup bonding device.  On such devices, it is desirable
> > >> that TCP retransmits a packet shortly after the failover, which
> > >> is what I would like to do with this patch.  Please see
> > >> Background and Problem below for rationale in detail.
> > >>
> > > RFC2988 says this:
> > >   (2.4) Whenever RTO is computed, if it is less than 1 second then the
> > > RTO SHOULD be rounded up to 1 second.
> > >
> > > Traditionally, TCP implementations use coarse grain clocks to
> > > measure the RTT and trigger the RTO, which imposes a large
> > > minimum value on the RTO.  Research suggests that a large
> > > minimum RTO is needed to keep TCP conservative and avoid
> > > spurious retransmissions [AP99].  Therefore, this
> > > specification requires a large minimum RTO as a conservative
> > > approach, while at the same time acknowledging that at some
> > > future point, research may show that a smaller minimum RTO is
> > > acceptable or superior.
> > >
> > >   (2.5) A maximum value MAY be placed on RTO provided it is at least 60
> > > seconds.
> > >
> > > Your code doesn't seem to meet requirements of section 2.5 as your
> > > minimum is 1 second.
> >
> > (At the risk of having another Emily Litella moment entering a
> > discussion late...)
> >
> > I thought that those sorts of things were generally referring to the
> > _default_ setting?
>
> I believe so.  And the requirement of section 2.5 is rather weak
> (it says "MAY").


It is weak in saying you don't have to have a maximum, but if you do
have one IT IS AT LEAST 60 seconds (emphasis mine). So the time period
is a strong requirement if you decide to implement - which is a weak
requirement.
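
The distinction can be stated as a check on the configured value, next to the back-off that the maximum is meant to bound (RFC 2988 section 5.5 doubles the RTO on each expiry and allows the 2.5 maximum as its upper limit). This is a sketch with hypothetical names, working in whole seconds; it is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Section 2.5: a maximum is optional, but if used it must be >= 60 s. */
#define RFC2988_LOWEST_RTO_MAX_S 60

/* True if a configured RTO ceiling satisfies section 2.5. */
static bool rto_max_meets_rfc2988(uint32_t rto_max_s)
{
	return rto_max_s >= RFC2988_LOWEST_RTO_MAX_S;
}

/*
 * RTO after `retries` timer expiries: doubled each time (section 5.5)
 * and capped at rto_max_s.
 */
static uint32_t backed_off_rto_s(uint32_t rto_s, unsigned int retries,
				 uint32_t rto_max_s)
{
	while (retries-- > 0 && rto_s < rto_max_s)
		rto_s *= 2;
	return rto_s < rto_max_s ? rto_s : rto_max_s;
}
```

A ceiling of 10 seconds, as in the patch description, fails the check, while 60 or more passes. Starting from a 1 second RTO, ten expiries end at 10 seconds with the first ceiling and at 60 seconds with the second.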

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [2.6 patch] the overdue eepro100 removal

2007-07-09 Thread Ian McDonald

On 7/10/07, Bill Davidsen <[EMAIL PROTECTED]> wrote:

> If there were any benefit to removing a working driver I would at least
> be able to see it as a resources issue, but as far as I can see you just
> seem to have a personal preference for the e100 driver and want to force
> others to use it because you are so much better able to decide what
> users need than the system administrators. That's one of the reasons
> people choose open source, because they have a choice, and can use
> what's best for them.


And be thankful it is open source. If Microsoft drops a driver in
Vista you don't have a choice. If Linux drops a driver you can go and
patch it back in if you feel that passionate about it.

Unfortunately things change in life but at least you have the choice
of being stuck with the old bit-rotting driver if you really want to.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [PATCH 2.6.22-rc5] TCP: Make TCP_RTO_MAX a variable

2007-06-25 Thread Ian McDonald

On 6/26/07, OBATA Noboru <[EMAIL PROTECTED]> wrote:

> From: OBATA Noboru <[EMAIL PROTECTED]>
>
> Make TCP_RTO_MAX a variable, and allow a user to change it via a
> new sysctl entry /proc/sys/net/ipv4/tcp_rto_max.  A user can
> then guarantee TCP retransmission to be more controllable, say,
> at least once per 10 seconds, by setting it to 10.  This is
> quite helpful on failover-capable network devices, such as an
> active-backup bonding device.  On such devices, it is desirable
> that TCP retransmits a packet shortly after the failover, which
> is what I would like to do with this patch.  Please see
> Background and Problem below for rationale in detail.


RFC2988 says this:
  (2.4) Whenever RTO is computed, if it is less than 1 second then the
RTO SHOULD be rounded up to 1 second.

Traditionally, TCP implementations use coarse grain clocks to
measure the RTT and trigger the RTO, which imposes a large
minimum value on the RTO.  Research suggests that a large
minimum RTO is needed to keep TCP conservative and avoid
spurious retransmissions [AP99].  Therefore, this
specification requires a large minimum RTO as a conservative
approach, while at the same time acknowledging that at some
future point, research may show that a smaller minimum RTO is
acceptable or superior.

  (2.5) A maximum value MAY be placed on RTO provided it is at least 60
seconds.

Your code doesn't seem to meet requirements of section 2.5 as your
minimum is 1 second.

I think if you're trying to solve the bonding issue then you should
solve that issue, not hack the TCP implementation as that opens it up
to abuse in other ways.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [PATCH] tcp_probe: __attribute__ string location

2007-06-05 Thread Ian McDonald

On 6/6/07, David Miller <[EMAIL PROTECTED]> wrote:

> From: Randy Dunlap <[EMAIL PROTECTED]>
> Date: Tue, 5 Jun 2007 18:01:41 -0700
>
> > From: Randy Dunlap <[EMAIL PROTECTED]>
> >
> > gcc doesn't like the location of the __attribute__ string here:
> > net/ipv4/tcp_probe.c:83: warning: empty declaration
> >
> > so move it to before the function and all is well.
> >
> > Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
>
> Yeah I noticed this one too and a similar fix is in
> my net-2.6 GIT tree, but thanks anyways Randy.


I'm wondering if either of you can actually load tcp_probe at present.
We had reports on dccp mailing list that dccp_probe and tcp_probe
can't load at present and produce a back trace. It appears related to
the jprobe stuff according to Arnaldo.

If the bug reporter or Arnaldo doesn't follow up on it I'll track it
down a little more later and post to correct place.

I'll also copy this change across to DCCP at some time, along with
several others that we haven't transferred across yet.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [RFC] New driver API to speed up small packets xmits

2007-05-10 Thread Ian McDonald

On 5/11/07, Vlad Yasevich <[EMAIL PROTECTED]> wrote:

> The win might be biggest on a system where a lot of applications send a lot of
> small packets.  Some number will aggregate in the prio queue and then get shoved
> into a driver in one go.


That's assuming that the device doesn't run out of things to send first


> But...  this is all conjecture until we see the code.


Agree

--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [RFC] New driver API to speed up small packets xmits

2007-05-10 Thread Ian McDonald

On 5/11/07, Vlad Yasevich <[EMAIL PROTECTED]> wrote:

> >> May be for TCP?  What about other protocols?
> >
> > There are other protocols?-)  True, UDP, and I suppose certain modes of
> > SCTP might be sending streams of small packets, as might TCP with
> > TCP_NODELAY set.
> >
> > Do they often queue-up outside the driver?
>
> Not sure if DCCP might fall into this category as well...


Yes DCCP definitely can queue outside the driver.


> I think the idea of this patch is to gather some number of these small packets and
> shove them at the driver in one go instead of each small packet at a time.
>
> It might be helpful, but reserve judgment till I see more numbers.
>
> -vlad


As I see this proposed patch it is about reducing the number of "task
switches" between the driver and the protocol. I use task switch in
speech marks as it isn't really one, since it is all in the kernel. So in
other words we are hoping that spending more time in each area would keep
the cache hot and let more work be done while locks are held. This of
course requires that the complexity added doesn't outweigh the gains -
otherwise you could end up in a worse scenario where the driver doesn't
send packets because the protocol is busy linking them.

As far as I can tell you're not combining packets?? This would
definitely break UDP/DCCP which are datagram based.

Will be interesting to see results but I'm a little sceptical.
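
Since the patch itself had not been posted at this point, the following is only a guess at the shape of the idea: packets are collected and handed to the driver in one call, but each stays a separate packet so datagram boundaries are preserved. Every name here is hypothetical, including the driver callback, and counting_xmit is just a stand-in driver so the sketch can be exercised.

```c
#include <stddef.h>

#define BATCH_MAX 8	/* arbitrary batch size for illustration */

/* One queued datagram; never merged with its neighbours. */
struct packet {
	const void *data;
	size_t      len;
};

/* Hypothetical driver entry point: takes n separate packets at once. */
typedef size_t (*xmit_batch_fn)(const struct packet *pkts, size_t n);

struct batch {
	struct packet pkts[BATCH_MAX];
	size_t        count;
};

/*
 * Hand everything queued so far to the driver in a single call.
 * Simplification: the queue is emptied even if the driver accepted
 * fewer packets; real code would have to requeue the remainder.
 */
static size_t batch_flush(struct batch *b, xmit_batch_fn xmit)
{
	size_t sent = b->count ? xmit(b->pkts, b->count) : 0;

	b->count = 0;
	return sent;
}

/* Queue one datagram and flush automatically when the batch is full. */
static size_t batch_add(struct batch *b, const void *data, size_t len,
			xmit_batch_fn xmit)
{
	b->pkts[b->count].data = data;
	b->pkts[b->count].len = len;
	b->count++;
	return b->count == BATCH_MAX ? batch_flush(b, xmit) : 0;
}

/* Stand-in driver that only counts calls and packets. */
static size_t xmit_calls, xmit_packets;

static size_t counting_xmit(const struct packet *pkts, size_t n)
{
	(void)pkts;
	xmit_calls++;
	xmit_packets += n;
	return n;
}
```

Adding BATCH_MAX packets results in one driver call carrying BATCH_MAX distinct packets, which is the reduced number of protocol/driver transitions discussed above. It also shows the cost raised earlier in the thread: nothing reaches the device until the batch fills or is flushed.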

--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [PATCH][SOCK]: shrink struct sock

2007-05-04 Thread Ian McDonald

On 5/4/07, David Miller <[EMAIL PROTECTED]> wrote:

> sk_buff_head is due for being killed from the whole tree.  Nobody
> really needs the qlen, few things really need the lock, and those that
> do can define their own as needed :-)


I've got out of tree research code that uses the qlen quite
significantly. However it's not high speed networking so I can compute
it myself if needed...
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: {Spam?} Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Ian McDonald

On 4/15/07, Robert P. J. Day <[EMAIL PROTECTED]> wrote:

> in fact, according to this:
>
> http://lkml.org/lkml/2006/1/13/139
>
> that notice was put in the feature removal file well over a year ago,
> during 2.6.15.  so that would seem to be more than adequate time for
> everyone to prepare for it.  but it must have been deleted from that
> file since then as well.


Yes and that was never merged and so was resent on January 19th, 2006:
http://www.nabble.com/-2.6-patch--schedule-SHAPER-for-removal-t949871.html

At that point people debated about it being too short notice and the
patch never went in.

I therefore think we can't just remove with NO notice.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: {Spam?} [PATCH] NET: Remove obsolete traffic shaper code.

2007-04-14 Thread Ian McDonald

On 4/15/07, Robert P. J. Day <[EMAIL PROTECTED]> wrote:


> Remove the obsolete code for the traffic shaper.
>
> Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>


Apart from the merits of removing this which I can't comment on, I
thought the usual procedure was to place a removal in
Documentation/feature-removal-schedule.txt to notify people of what is
going to be removed. Then wait the period you determine there and then
remove.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [1/3] 2.6.21-rc6: known regressions

2007-04-13 Thread Ian McDonald

On 4/14/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:


> Note: Ingo also reports what looks like a memory corruption due to
> the 6b6b6b6b pattern on presumably the same box.
>
> The 6b6b6b6b pattern is POISON_FREE, implying some kind of slab misuse,
> most likely a use-after-free, although possibly just due to overrunning a
> slab into the next one or something like that.
>
> What I'm leading up to is that I'm wondering if these mysterious network
> driver bugs aren't due to the network drivers themselves, but due to some
> higher-level problem. I think the hangs that Ingo sees with forcedeth were
> preceded by mysterious and "impossible" NULL pointer oopses. Ingo?
>
> Davem - have there been network infrastructure changes that might be
> suspect? Jeff and/or Greg - anything in the generic network driver/device
> driver level? We had some trouble earlier with the transition to the
> driver core, and kref miscounting. Related? The last Oops Ingo saw was a
> module refcounting one, iirc.
>
> It does seem networking related somehow. Yeah, it could obviously be a
> combination of independent bugs both in e1000/ and forcedeth drivers, but
> maybe there is something in common here...


I don't know if this is a red herring or not but I reported on March
13th slab corruption and it looked like file_free_rcu - these are
fairly recent changes I think (rcu)?

Anyway original message is at http://lkml.org/lkml/2007/3/12/364

My apologies if this is not related.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-04 Thread Ian McDonald

On 3/5/07, Matt Mackall <[EMAIL PROTECTED]> wrote:

> > This is due to the recent sysfs restructuring I think. IIRC the fix is
> > to upgrade hal to a current git version.
>
> If that's the cause, the fix is to back out whatever was done to break
> userspace. Breaking userspace is not ok. Upgrading from 2.6.x to
> 2.6.x+1 should not entail replacing substantial parts of userspace,
> especially with NOT-EVEN-FRAKKING-RELEASED-YET CODE.
>
> I will try a new HAL when it shows up in Debian/unstable and not a
> moment sooner.


But you're running a kernel that's not in Debian/unstable so this
seems a bit hypocritical.

When you work with bleeding edge kernels you have to be prepared to
work around things. Hell for ages git wasn't in Debian - unstable
even, udev would break things etc.

Just my 2c worth.
--
Web: http://wand.net.nz/~iam4
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-20 Thread Ian McDonald

On 2/21/07, bert hubert <[EMAIL PROTECTED]> wrote:

> I'm trying to figure out which processes have the most impact, I had already
> killed anything non-essential. But that still leaves 140 pids.
>
> Bert


That sounds way too many pids. I run a script to shut down processes
when I do testing as it makes a HUGE difference to my timing of things
which can be quite critical.

Here's my list of 46 and that includes me sshing into a box and
checking for processes:
UID        PID  PPID CMD
root         1     0 init [2]
root         2     1 [ksoftirqd/0]
root         3     1 [watchdog/0]
root         4     1 [events/0]
root         5     1 [khelper]
root         6     1 [kthread]
root        40     6 [kblockd/0]
root        41     6 [kacpid]
root       110     6 [cqueue/0]
root       111     6 [ata/0]
root       112     6 [ata_aux]
root       113     6 [kseriod]
root       135     6 [rt-test-0]
root       137     6 [rt-test-1]
root       139     6 [rt-test-2]
root       141     6 [rt-test-3]
root       143     6 [rt-test-4]
root       145     6 [rt-test-5]
root       147     6 [rt-test-6]
root       149     6 [rt-test-7]
root       151     6 [pdflush]
root       152     6 [pdflush]
root       153     6 [kswapd0]
root       154     6 [aio/0]
root       838     6 [kedac]
root       843     6 [kjournald]
root      1720     6 [ksuspend_usbd]
root      1721     6 [khubd]
root      1741     6 [kpsmoused]
root      2544     1 /sbin/syslogd
root      2554     1 /sbin/klogd -x
root      2851     1 /usr/sbin/inetd
root      2863     1 /usr/sbin/sshd
ntp       2954     1 /usr/sbin/ntpd -p /var/run/ntpd.pid -u 111:111 -g
root      3061     1 /bin/login --
root      3062     1 /sbin/getty 38400 tty2
root      3063     1 /sbin/getty 38400 tty3
root      3064     1 /sbin/getty 38400 tty4
root      3065     1 /sbin/getty 38400 tty5
root      3066     1 /sbin/getty 38400 tty6
ian       3083  3061 -bash
root     21518  2863 sshd: ian [priv]
ian      21520 21518 sshd: [EMAIL PROTECTED]/1
ian      21521 21520 -bash
ian      21747 21521 ps -ef

--
Web: http://wand.net.nz/~iam4
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [patch 3/3] tcp: remove experimental variants from default list

2007-02-12 Thread Ian McDonald

On 2/13/07, David Miller <[EMAIL PROTECTED]> wrote:

> This is not the internet of 15 years ago, please wake up everyone.
> We cannot sit on eggs for 5 years to make sure they hatch perfectly
> like was previously possible.


OK. I get the point. I am more conservative by nature and more of an academic.

Now there's been some explanation I'm happier for the change to go
ahead. I just think changes like this that affect the Internet should
be discussed a little. Now it's been discussed a little I feel better.

Ian
--
Web: http://wand.net.nz/~iam4
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [patch 3/3] tcp: remove experimental variants from default list

2007-02-12 Thread Ian McDonald

On 2/13/07, Baruch Even <[EMAIL PROTECTED]> wrote:

> * Stephen Hemminger <[EMAIL PROTECTED]> [070212 18:04]:
> > The TCP Vegas implementation is buggy, and BIC is too aggressive
> > so they should not be in the default list. Westwood is okay, but
> > not well tested.
>
> Since no one really agrees on the relative merits and problems of the
> different algorithms and since the users themselves don't know, don't care
> and have no clue on what should be the correct behaviour to report bugs
> (see the old bic bugs, the htcp bugs, the recent sack bugs) I would
> suggest to avoid making the whole internet a guinea pig and get back to
> reno. If someone really needs to push high BDP flows he should test it
> himself and choose what works for his kernel at the time.
>
> For myself and anyone who asks me I recommend to set the default to
> reno. For the few who really need high speed flows, they should test
> kernel and protocol combination.
>
> Baruch


I agree wholeheartedly with Baruch. If we are going to remove BIC as
default we should go back to Reno as Cubic is even less tested in
production use than BIC.

Unless of course the papers you saw at PFLDNET showed that Cubic was a
really good choice and you want to point us to those papers.

Ian
--
Web: http://wand.net.nz/~iam4
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Status of kernel.org servers??

2006-11-16 Thread Ian McDonald

I've searched lkml archives but can't find anything there apart from
one person complaining.

Can anybody basically tell me how to get access to git trees in a way
that works at present?

I've tried git://git.kernel.org, git://git2.kernel.org,
http://master.kernel.org, http://kernel.org all without success.

Can anybody point to what's going on at present as well, and a
timeline/plan to resolve it?

Thanks,

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm

2006-10-26 Thread Ian McDonald

On 10/27/06, David Miller <[EMAIL PROTECTED]> wrote:

> From: "Ian McDonald" <[EMAIL PROTECTED]>
> Date: Fri, 27 Oct 2006 12:59:30 +1300
>
> > I don't agree with this at all. I would love Firefox, BitTorrent etc
> > to implement usage of TCP-LP for example so they use "unused"
> > bandwidth only.
> >
> > With this change applications can't do this.
> >
> > If we are going to restrict by capabilities then I think we should
> > only restrict module loading - this way the admin of the box can
> > decide what algorithms can be used.
>
> You are using an example of a (supposedly) safe case of this
> as a justification for allowing all cases.
>
> It is bad, very bad, to allow arbitrary users to select arbitrary
> congestion control algorithms.  It is just as bad as allowing them to
> disable congestion control completely if that were an option.


OK, I understand your point here, but I think low-priority TCP has its
use. I don't agree it is just as bad, but it is bad under the wrong
circumstances - it's still better than UDP, which has no congestion
control...

I don't want to make it overcomplicated though.

I think what makes the most sense is to restrict it as shown, since
tcp-lp is the exception, and to allow tcp-lp via another mechanism.
That is a situation where the user could specify how low priority they
want the traffic to be... If I ever get enough time I'll have a go at it
but can't see it this year :-(

It actually makes more sense to tie the congestion control algorithm
to the route/destination IP if we are going to change it, but that is a
whole other exercise in itself.


> If someone, for example, builds all the algorithms statically into
> their kernel, for testing as root, this lets all users on the machine
> do the same which is not right.


This is the state at present as I understand it. However that doesn't
make it right.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm

2006-10-26 Thread Ian McDonald

On 10/27/06, Hagen Paul Pfeifer <[EMAIL PROTECTED]> wrote:


Check if user has CAP_NET_ADMIN capability to change congestion control
algorithm.

Under normal circumstances an application programmer doesn't have enough
information to choose the "right" algorithm (unless he is the pchar/pathchar
maintainer). In 99.9% of cases only the local host administrator has the
knowledge to select a proper standard, system-wide algorithm (the remaining
0.1% are for testing purposes). If we let the user select an alternative
algorithm we introduce a potential weak spot, so we rule this out.


I don't agree with this at all. I would love Firefox, BitTorrent etc
to implement usage of TCP-LP for example so they use "unused"
bandwidth only.

With this change applications can't do this.

If we are going to restrict by capabilities then I think we should
only restrict module loading - this way the admin of the box can
decide what algorithms can be used.
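
For reference, an application makes this per-socket choice with the
TCP_CONGESTION socket option. What follows is only a minimal sketch,
assuming a Linux system: the helper name set_congestion_control() is
hypothetical, and "lp" is assumed to be the name under which the TCP-LP
module registers itself.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13	/* Linux value; assumed if the libc headers lack it */
#endif

/*
 * Hypothetical helper: ask the kernel to use the named congestion control
 * algorithm on one TCP socket. Returns 0 on success, or -1 with errno set
 * (for example when the algorithm is unavailable or not permitted).
 */
int set_congestion_control(int fd, const char *name)
{
	if (name == NULL || name[0] == '\0') {
		errno = EINVAL;
		return -1;
	}
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
			  (socklen_t)strlen(name));
}
```

A bulk-transfer application could call set_congestion_control(fd, "lp")
right after socket() and simply carry on with the system default if the
call fails, which is what it would see under the proposed capability check.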

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: getaddrinfo - should accept IPPROTO_SCTP no?

2006-10-13 Thread Ian McDonald

On 10/14/06, Rick Jones <[EMAIL PROTECTED]> wrote:

I made some recent changes to netperf to work around what is IMO a bug in the
Solaris getaddrinfo() where it will clear the ai_protocol field even when one
gives it a protocol in the hints.

[If you happen to be trying to use the test-specific -D to set TCP_NODELAY in
netperf on Solaris, you might want to grab netperf TOT to get this workaround as
it relates to issues with setting TCP_NODELAY - modulo what it will do to being
able to run the netperf SCTP tests on Linux...]

In the process though I have stumbled across what appears to be a bug (?) in
"Linux" getaddrinfo() - returning a -7 EAI_SOCKTYPE if given as hints
SOCK_STREAM and IPPROTO_SCTP - this on a system that ostensibly supports SCTP.
I've seen this on RHAS4U4 as well as another less well known distro.

I'm about to see about concocting an additional workaround in netperf for this,
but thought I'd ask if my assumption - that getaddrinfo() returning -7 when
given IPPROTO_SCTP - is indeed a bug in getaddrinfo().  Or am I just woefully
behind in patches or completely offbase on what is correct behaviour for
getaddrinfo and hints?

FWIW, which may not be much, Solaris 10 06/06 seems content to accept
IPPROTO_SCTP in the hints.

thanks,

rick jones
http://www.netperf.org/svn/netperf2/trunk/
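
A workaround of the kind Rick describes could look roughly like the sketch
below. It is not netperf's actual code: lookup_with_protocol() is a
hypothetical helper that retries the lookup without a protocol when the
resolver rejects the socktype/protocol pair, and then writes the wanted
protocol into every result (which also covers the Solaris case where
ai_protocol comes back cleared).

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical workaround: try the lookup with the protocol in the hints;
 * if the resolver rejects the socktype/protocol pair, retry without a
 * protocol, then stamp the wanted protocol onto every result.
 * Returns 0 on success or a getaddrinfo() error code.
 */
int lookup_with_protocol(const char *host, const char *service,
			 int socktype, int protocol,
			 struct addrinfo **res)
{
	struct addrinfo hints;
	struct addrinfo *ai;
	int rc;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = socktype;
	hints.ai_protocol = protocol;

	rc = getaddrinfo(host, service, &hints, res);
	if (rc == EAI_SOCKTYPE || rc == EAI_SERVICE) {
		/* Resolver does not know this protocol: ask again without it. */
		hints.ai_protocol = 0;
		rc = getaddrinfo(host, service, &hints, res);
	}
	if (rc != 0)
		return rc;

	for (ai = *res; ai != NULL; ai = ai->ai_next)
		ai->ai_protocol = protocol;
	return 0;
}
```

The caller then passes ai_family, ai_socktype and ai_protocol from the
result straight to socket() and releases the list with freeaddrinfo().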


In all the DCCP code which has similar issues I just do the protocol
selection on the socket call, e.g.

        case TCP:
                new_sock = socket(AF_INET, SOCK_STREAM, 0);
                break;
        case DCCP:
                new_sock = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
                break;
        case UDP:
                new_sock = socket(AF_INET, SOCK_DGRAM, 0);
                break;

I'm sure you know all this anyway so apologies in advance for telling
you something you probably already know!

We need to come up with a way to select service codes etc for DCCP
which is another parameter needed for a DCCP socket when getaddrinfo
is tidied up.
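
The switch fragment above can be fleshed out into a small self-contained
sketch. The names enum transport, transport_params() and
open_transport_socket() are made up for illustration, and the fallback
values for SOCK_DCCP and IPPROTO_DCCP are the Linux ones, assumed only for
libc headers that do not define them yet.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallbacks for older libc headers; the values are the Linux ones. */
#ifndef SOCK_DCCP
#define SOCK_DCCP 6
#endif
#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP 33
#endif

enum transport { TRANSPORT_TCP, TRANSPORT_DCCP, TRANSPORT_UDP };

struct sock_params {
	int type;
	int protocol;
};

/* Map a transport to the (type, protocol) pair passed to socket(). */
struct sock_params transport_params(enum transport t)
{
	struct sock_params p = { SOCK_DGRAM, 0 };	/* UDP values by default */

	switch (t) {
	case TRANSPORT_TCP:
		p.type = SOCK_STREAM;
		break;
	case TRANSPORT_DCCP:
		p.type = SOCK_DCCP;
		p.protocol = IPPROTO_DCCP;
		break;
	case TRANSPORT_UDP:
		break;
	}
	return p;
}

/* Returns a new socket descriptor, or -1 with errno set. */
int open_transport_socket(enum transport t)
{
	struct sock_params p = transport_params(t);

	return socket(AF_INET, p.type, p.protocol);
}
```

Keeping the mapping separate from the socket() call means the same table
can be reused once getaddrinfo() is able to return these values itself.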

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH] bcm43xx-softmac: fix warning from ignoring returned value from pci_enable_device

2006-09-28 Thread Ian McDonald

On 9/28/06, Larry Finger <[EMAIL PROTECTED]> wrote:

Linus's tree now has a configuration option that prints a warning whenever
the returned value of any routine is ignored. This patch fixes the only such
warning for bcm43xx.


Can you tell me how to make this check please so I can check my code
in the kernel? I could look it up but obviously you can tell me
quickly :-)

Regards,

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [e2e] performance of BIC-TCP, High-Speed-TCP, H-TCP etc

2006-09-22 Thread Ian McDonald

I wasn't aware of the planned move to cubic in Linux.  Can I ask the
rationale for this ?  Cubic is, of course, closely related to HTCP
(borrowing the HTCP idea of using elapsed time since last backoff as the
quantity used to adjust the cwnd increase rate) which *is* tested in the
reported study.  I'd be more than happy to run tests on cubic and I
reckon we should do this sooner rather than later now that you have
flagged up plans to rollout cubic.


As I understand it, it is because Cubic is better than bic for
differing rtts and bic is the current default. Stephen might like to
add to it.

More tests are always good!

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [e2e] performance of BIC-TCP, High-Speed-TCP, H-TCP etc

2006-09-22 Thread Ian McDonald

On 9/23/06, Douglas Leith <[EMAIL PROTECTED]> wrote:

For those interested in TCP for high-speed environments, and perhaps
also people interested in TCP evaluation generally, I'd like to point
you towards the results of a detailed experimental study which are now
available at:

http://www.hamilton.ie/net/eval/ToNfinal.pdf

This study consistently compares Scalable-TCP, HS-TCP, BIC-TCP, FAST-TCP
and H-TCP performance under a wide range of conditions including with
mixes of long and short-lived flows.  This study has now been subject to
peer review (to hopefully give it some legitimacy) and is due to appear
in the Transactions on Networking.

The conclusions (see summary below) seem especially topical as BIC-TCP
is currently widely deployed as the default algorithm in Linux.

Comments appreciated.  Our measurements are publicly available - on the
web or drop me a line if you'd like a copy.

Summary:
In this paper we present experimental results evaluating the
performance of the Scalable-TCP, HS-TCP, BIC-TCP, FAST-TCP and
H-TCP proposals in a series of benchmark tests.

We find that many recent proposals perform surprisingly poorly in
even the most simple test, namely achieving fairness between two
competing flows in a dumbbell topology with the same round-trip
times and shared bottleneck link. Specifically, both Scalable-TCP
and FAST TCP exhibit very substantial unfairness in this test.

We also find that Scalable-TCP, HS-TCP and BIC-TCP induce significantly
greater RTT unfairness between competing flows with different round-trip
times.  The unfairness can be an order of magnitude greater than that
with standard TCP and is such that flows with longer round-trip times
can be completely starved of bandwidth.

While the TCP proposals studied are all successful at improving
the link utilisation in a relatively static environment with
long-lived flows, in our tests many of the proposals exhibit poor
responsiveness to changing network conditions.  We observe that
Scalable-TCP, HS-TCP and BIC-TCP can all suffer from extremely
slow (>100s) convergence times following the startup of a new
flow. We also observe that while FAST-TCP flows typically converge
quickly initially, flows may later diverge again to create
significant and sustained unfairness.

--Doug

Hamilton Institute
www.hamilton.ie




Interesting reading and I am replying to netdev@vger.kernel.org as
well. I will read in more detail later but my first questions/comments
are:
- have you tested CUBIC subsequently as this is meant to fix many of
the rtt issues? This is becoming the default in 2.6.19 probably.
- have you tested subsequently on more recent kernels than 2.6.6?

Looks like some very useful information.

Regards,

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH] net/dccp: Allow default/fallback service code

2006-09-21 Thread Ian McDonald

Gerrit,

Not sure what happened here but I can't apply this with git-apply. Can
you check and resubmit? It looks like a great patch though and I would
love to test it! This would make DCCP easier to port to, which must be good.

Just a quick note - you didn't update the last-changed date in
Documentation/networking/dccp.txt. I think rather than updating it you
can remove it, as people can find dates by looking at the git history.

Ian

On 9/12/06, Gerrit Renker <[EMAIL PROTECTED]> wrote:

[DCCP]: Allow default/fallback service code.

This has been discussed on [EMAIL PROTECTED] and removes the necessity for
applications to supply service codes in each and every case.

If an application does not want to provide a service code, that's
fine, it will be given 0. Otherwise, service codes can be set
via socket options as before.

This patch has been tested using various client/server configurations
(including listening on multiple service codes) and patches against
Torvalds' tree.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
--
 Documentation/networking/dccp.txt |7 +--
 include/linux/dccp.h  |6 +-
 net/dccp/ipv4.c   |3 ---
 net/dccp/proto.c  |   11 +--
 4 files changed, 7 insertions(+), 20 deletions(-)


diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt
index c45daab..2f479af 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -42,8 +42,11 @@ Socket options
 DCCP_SOCKOPT_PACKET_SIZE is used for CCID3 to set default packet size for
 calculations.

-DCCP_SOCKOPT_SERVICE sets the service. This is compulsory as per the

--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH 2.6.17][Trivial] net/dccp: update references to standards

2006-09-21 Thread Ian McDonald

Arnaldo - this looks good.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>

On 9/15/06, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Sorry kmail garbled this, clean text below.

- Gerrit
--

diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig
index 859e335..2c345c0 100644
--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -4,9 +4,9 @@ menu "DCCP Configuration (EXPERIMENTAL)"
 config IP_DCCP
tristate "The DCCP Protocol (EXPERIMENTAL)"

--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


[PATCH 4/7] [DCCP]: Shift constants into header

2006-09-21 Thread Ian McDonald
This shifts some constants from ccid3.c to ccid3.h

This is not needed for in tree code (yet) but for my own work.

Makes sense to have constants in header though.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 67d2dc0..7b4699a 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -75,14 +75,6 @@ static struct dccp_tx_hist *ccid3_tx_his
 static struct dccp_rx_hist *ccid3_rx_hist;
 static struct dccp_li_hist *ccid3_li_hist;
 
-/* TFRC sender states */
-enum ccid3_hc_tx_states {
-   TFRC_SSTATE_NO_SENT = 1,
-   TFRC_SSTATE_NO_FBACK,
-   TFRC_SSTATE_FBACK,
-   TFRC_SSTATE_TERM,
-};
-
 #ifdef CCID3_DEBUG
 static const char *ccid3_tx_state_name(enum ccid3_hc_tx_states state)
 {
diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h
index 0a2cb75..df4ff13 100644
--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -65,6 +65,14 @@ enum ccid3_options {
TFRC_OPT_RECEIVE_RATE= 194,
 };
 
+/* TFRC sender states */
+enum ccid3_hc_tx_states {
+   TFRC_SSTATE_NO_SENT = 1,
+   TFRC_SSTATE_NO_FBACK,
+   TFRC_SSTATE_FBACK,
+   TFRC_SSTATE_TERM,
+};
+
 struct ccid3_options_received {
u64 ccid3or_seqno:48,
ccid3or_loss_intervals_idx:16;


[PATCH 5/7] [DCCP]: Introduce two new socket options

2006-09-21 Thread Ian McDonald
This creates two new socket options DCCP_SOCKOPT_TX_PACKET_SIZE
and DCCP_SOCKOPT_RX_PACKET_SIZE. DCCP_SOCKOPT_PACKET_SIZE doesn't
work and packet size should be set independently on two half
connections.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index a073164..ef1c57b 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -200,6 +200,8 @@ #define DCCP_SOCKOPT_PACKET_SIZE 1
 #define DCCP_SOCKOPT_SERVICE           2
 #define DCCP_SOCKOPT_CHANGE_L          3
 #define DCCP_SOCKOPT_CHANGE_R          4
+#define DCCP_SOCKOPT_TX_PACKET_SIZE    5
+#define DCCP_SOCKOPT_RX_PACKET_SIZE    6
 #define DCCP_SOCKOPT_CCID_RX_INFO      128
 #define DCCP_SOCKOPT_CCID_TX_INFO      192
 


[PATCH 1/7] [DCCP]: Introduce constants for CCID numbers

2006-09-21 Thread Ian McDonald
This change introduces a constant for CCID numbers.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index 2d7671c..a073164 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -169,6 +169,12 @@ enum {
DCCPO_MAX_CCID_SPECIFIC = 255,
 };
 
+/* DCCP CCIDS */
+enum {
+   DCCPC_CCID2 = 2,
+   DCCPC_CCID3 = 3,
+};
+
 /* DCCP features */
 enum {
DCCPF_RESERVED = 0,
@@ -320,7 +326,7 @@ static inline unsigned int dccp_hdr_len(
 /* initial values for each feature */
 #define DCCPF_INITIAL_SEQUENCE_WINDOW  100
 #define DCCPF_INITIAL_ACK_RATIO        2
-#define DCCPF_INITIAL_CCID 2
+#define DCCPF_INITIAL_CCID DCCPC_CCID2
 #define DCCPF_INITIAL_SEND_ACK_VECTOR  1
 /* FIXME: for now we're default to 1 but it should really be 0 */
 #define DCCPF_INITIAL_SEND_NDP_COUNT   1


[PATCH 3/7] [DCCP]: Introduce dccp_probe

2006-09-21 Thread Ian McDonald
This adds DCCP probing shamelessly ripped off from TCP probes by Stephen
Hemminger.

I've put in here support for further CCID3 variables as well.
Andrea/Arnaldo might look to extend for CCID2.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig
index 859e335..e2a095d 100644
--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -40,6 +40,22 @@ config IP_DCCP_DEBUG
 
  Just say N.
 
+config NET_DCCPPROBE
+   tristate "DCCP connection probing"
+   depends on PROC_FS && KPROBES
+   ---help---
+   This module allows for capturing the changes to DCCP connection
+   state in response to incoming packets. It is used for debugging
+   DCCP congestion avoidance modules. If you don't understand
+   what was just said, you don't need it: say N.
+
+   Documentation on how to use DCCP connection probing can be found
+   at http://linux-net.osdl.org/index.php/DccpProbe
+
+   To compile this code as a module, choose M here: the
+   module will be called dccp_probe.
+
+
 endmenu
 
 endmenu
diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 7696e21..47b1371 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -11,6 +11,7 @@ dccp_ipv4-y := ipv4.o
 dccp-$(CONFIG_IP_DCCP_ACKVEC) += ackvec.o
 
 obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
+obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o
 
 dccp-$(CONFIG_SYSCTL) += sysctl.o
 
diff --git a/net/dccp/dccp_probe.c b/net/dccp/dccp_probe.c
new file mode 100644
index 0000000..4b65aad
--- /dev/null
+++ b/net/dccp/dccp_probe.c
@@ -0,0 +1,197 @@
+/*
+ * dccpprobe - Observe the DCCP flow with kprobes.
+ *
+ * The idea for this came from Werner Almesberger's umlsim
+ * Copyright (C) 2004, Stephen Hemminger <[EMAIL PROTECTED]>
+ *
+ * Modified for DCCP from Stephen Hemminger's code
+ * Copyright (C) 2006, Ian McDonald <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dccp.h"
+#include "ccid.h"
+#include "ccids/ccid3.h"
+
+MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>");
+MODULE_DESCRIPTION("DCCP snooper");
+MODULE_LICENSE("GPL");
+
+static int port = 0;
+MODULE_PARM_DESC(port, "Port to match (0=all)");
+module_param(port, int, 0);
+
+static int bufsize = 64*1024;
+MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)");
+module_param(bufsize, int, 0);
+
+static const char procname[] = "dccpprobe";
+
+struct {
+   struct kfifo      *fifo;
+   spinlock_t        lock;
+   wait_queue_head_t wait;
+   struct timeval    tstart;
+} dccpw;
+
+static void printl(const char *fmt, ...)
+{
+   va_list args;
+   int len;
+   struct timeval now;
+   char tbuf[256];
+
+   va_start(args, fmt);
+   do_gettimeofday(&now);
+
+   now.tv_sec -= dccpw.tstart.tv_sec;
+   now.tv_usec -= dccpw.tstart.tv_usec;
+   if (now.tv_usec < 0) {
+   --now.tv_sec;
+   now.tv_usec += 1000000;
+   }
+
+   len = sprintf(tbuf, "%lu.%06lu ",
+ (unsigned long) now.tv_sec,
+ (unsigned long) now.tv_usec);
+   len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
+   va_end(args);
+
+   kfifo_put(dccpw.fifo, tbuf, len);
+   wake_up(&dccpw.wait);
+}
+
+static int jdccp_sendmsg(struct kiocb *iocb, struct sock *sk,
+   struct msghdr *msg, size_t size)
+{
+   const struct dccp_minisock *dmsk = dccp_msk(sk);
+   const struct inet_sock *inet = inet_sk(sk);
+   struct ccid3_hc_tx_sock *hctx;
+
+   if (dmsk->dccpms_tx_ccid == DCCPC_CCID3)
+   hctx = ccid3_hc_tx_sk(sk);
+   else
+   hctx = NULL;
+
+   if (port == 0 || ntohs(inet->dport) == port ||
+   ntohs(inet->sport) == port) {
+   if (hctx)
+   printl("%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d %d %d %d %d\n",
+  NIPQUAD(inet->saddr), ntohs(inet->sport),
+  NIPQUAD(inet->daddr), ntohs(inet->dport), size,
+  
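
The relative-timestamp arithmetic in printl() above can be tried out in
user space. This is only a sketch under that assumption: elapsed_since() is
a hypothetical name, and the kernel code works on its own struct timeval
via do_gettimeofday() rather than on the libc one used here.

```c
#include <sys/time.h>

/*
 * Return now - start, keeping tv_usec in [0, 999999] by borrowing one
 * second (1000000 microseconds) when the subtraction goes negative.
 */
struct timeval elapsed_since(struct timeval start, struct timeval now)
{
	now.tv_sec -= start.tv_sec;
	now.tv_usec -= start.tv_usec;
	if (now.tv_usec < 0) {
		--now.tv_sec;
		now.tv_usec += 1000000;
	}
	return now;
}
```

The borrow has to be a full second expressed in microseconds, otherwise
the "%06lu" field printed after the decimal point would be wrong whenever
the microsecond part wraps.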

[PATCH 0/7] [DCCP]: Further fixes and enhancements

2006-09-21 Thread Ian McDonald
Here is my latest set of patches for DCCP.

If possible I would like these to go into 2.6.19. I have tested against
2.6.18rc5 and latest net-2.6.19 git tree of Dave M as well.

Dave - Patches 1 and 2 are trivial and just introducing constants and using
them. Patch 4 is shifting some code into a header. If patch 3 could be
merged also that would be great - it is just about the same as Stephen
Hemminger's TCP Probe code but instead for DCCP. Patch 3 depends on patch 1.

I think it would be good for Arnaldo or another person to sign off on 5, 6
and 7 after a bit more of a look. These fix up packet size setting for 
CCID3 and also change them to work on half connections as you may have
different packet sizes for each. I've tested myself thoroughly but others
might have an opinion on the style of these. These three patches need to be 
applied in order.

Ian


[PATCH 7/7] [DCCP]: Remove socket option

2006-09-21 Thread Ian McDonald
This removes DCCP_SOCKOPT_PACKET_SIZE for two reasons:
* the current code doesn't work
* tx and rx should be different (introduced in former patch)

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index ef1c57b..18fbbb4 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -196,7 +196,6 @@ struct dccp_so_feat {
 };
 
 /* DCCP socket options */
-#define DCCP_SOCKOPT_PACKET_SIZE   1
 #define DCCP_SOCKOPT_SERVICE   2
 #define DCCP_SOCKOPT_CHANGE_L  3
 #define DCCP_SOCKOPT_CHANGE_R  4
@@ -465,7 +464,6 @@ struct dccp_sock {
struct dccp_service_list*dccps_service_list;
struct timeval  dccps_timestamp_time;
__u32   dccps_timestamp_echo;
-   __u32   dccps_packet_size;
__u16   dccps_l_ack_ratio;
__u16   dccps_r_ack_ratio;
unsigned long   dccps_ndp_count;
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index c8f7d5a..c8c884e 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -458,7 +458,6 @@ out_free_val:
 static int do_dccp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
struct dccp_minisock *dmsk = dccp_msk(sk);
struct ccid3_hc_tx_sock *hctx;
struct ccid3_hc_rx_sock *hcrx;
@@ -478,10 +477,6 @@ static int do_dccp_setsockopt(struct soc
err = 0;
 
switch (optname) {
-   case DCCP_SOCKOPT_PACKET_SIZE:
-   dp->dccps_packet_size = val;
-   break;
-
case DCCP_SOCKOPT_CHANGE_L:
if (optlen != sizeof(struct dccp_so_feat))
err = -EINVAL;
@@ -605,10 +600,6 @@ static int do_dccp_getsockopt(struct soc
return -EINVAL;
 
switch (optname) {
-   case DCCP_SOCKOPT_PACKET_SIZE:
-   val = dp->dccps_packet_size;
-   len = sizeof(dp->dccps_packet_size);
-   break;
case DCCP_SOCKOPT_TX_PACKET_SIZE:
if (dmsk->dccpms_tx_ccid != DCCPC_CCID3)
return -EINVAL;


[PATCH 2/7] [DCCP]: Use constants for CCIDs

2006-09-21 Thread Ian McDonald
With constants for CCID numbers this now uses them in some places.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c
index 457dd3d..2efb505 100644
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -808,7 +808,7 @@ static void ccid2_hc_rx_packet_recv(stru
 }
 
 static struct ccid_operations ccid2 = {
-   .ccid_id= 2,
+   .ccid_id= DCCPC_CCID2,
.ccid_name  = "ccid2",
.ccid_owner = THIS_MODULE,
.ccid_hc_tx_obj_size= sizeof(struct ccid2_hc_tx_sock),
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 195aa95..67d2dc0 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -1240,7 +1240,7 @@ static int ccid3_hc_tx_getsockopt(struct
 }
 
 static struct ccid_operations ccid3 = {
-   .ccid_id   = 3,
+   .ccid_id   = DCCPC_CCID3,
.ccid_name = "ccid3",
.ccid_owner= THIS_MODULE,
.ccid_hc_tx_obj_size   = sizeof(struct ccid3_hc_tx_sock),


[PATCH 6/7] [DCCP]: Fix setting of packet size in CCID3

2006-09-21 Thread Ian McDonald
Set the initial packet size to the default. The existing code doesn't
work because setsockopt occurs after initialisation, so the value in
dccps_packet_size is never picked up.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 7b4699a..e6c8e4c 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -642,15 +642,9 @@ static int ccid3_hc_tx_parse_options(str
 
 static int ccid3_hc_tx_init(struct ccid *ccid, struct sock *sk)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
struct ccid3_hc_tx_sock *hctx = ccid_priv(ccid);
 
-   if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE &&
-   dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE)
-   hctx->ccid3hctx_s = dp->dccps_packet_size;
-   else
-   hctx->ccid3hctx_s = TFRC_STD_PACKET_SIZE;
-
+   hctx->ccid3hctx_s = TFRC_STD_PACKET_SIZE;
/* Set transmission rate to 1 packet per second */
hctx->ccid3hctx_x = hctx->ccid3hctx_s;
hctx->ccid3hctx_t_rto = USEC_PER_SEC;
@@ -1113,17 +1107,11 @@ static void ccid3_hc_rx_packet_recv(stru
 
 static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
struct ccid3_hc_rx_sock *hcrx = ccid_priv(ccid);
 
ccid3_pr_debug("%s, sk=%p\n", dccp_role(sk), sk);
 
-   if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE &&
-   dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE)
-   hcrx->ccid3hcrx_s = dp->dccps_packet_size;
-   else
-   hcrx->ccid3hcrx_s = TFRC_STD_PACKET_SIZE;
-
+   hcrx->ccid3hcrx_s = TFRC_STD_PACKET_SIZE;
hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA;
INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist);
INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist);
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 962df0e..c8f7d5a 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -35,6 +35,7 @@ #include 
 #include "ccid.h"
 #include "dccp.h"
 #include "feat.h"
+#include "ccids/ccid3.h"
 
 DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly;
 
@@ -457,7 +458,10 @@ out_free_val:
 static int do_dccp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
 {
-   struct dccp_sock *dp;
+   struct dccp_sock *dp = dccp_sk(sk);
+   struct dccp_minisock *dmsk = dccp_msk(sk);
+   struct ccid3_hc_tx_sock *hctx;
+   struct ccid3_hc_rx_sock *hcrx;
int err;
int val;
 
@@ -471,7 +475,6 @@ static int do_dccp_setsockopt(struct soc
return dccp_setsockopt_service(sk, val, optval, optlen);
 
lock_sock(sk);
-   dp = dccp_sk(sk);
err = 0;
 
switch (optname) {
@@ -497,6 +500,30 @@ static int do_dccp_setsockopt(struct soc
 optval);
break;
 
+   case DCCP_SOCKOPT_TX_PACKET_SIZE:
+   if (dmsk->dccpms_tx_ccid != DCCPC_CCID3)
+   err = -EINVAL;
+   else
+   if (val >= TFRC_MIN_PACKET_SIZE &&
+  val <= TFRC_MAX_PACKET_SIZE) {
+   hctx = ccid3_hc_tx_sk(sk);
+   hctx->ccid3hctx_s = val;
+   } else
+   err = -EINVAL;
+   break;
+
+   case DCCP_SOCKOPT_RX_PACKET_SIZE:
+   if (dmsk->dccpms_rx_ccid != DCCPC_CCID3)
+   err = -EINVAL;
+   else
+   if (val >= TFRC_MIN_PACKET_SIZE &&
+  val <= TFRC_MAX_PACKET_SIZE) {
+   hcrx = ccid3_hc_rx_sk(sk);
+   hcrx->ccid3hcrx_s = val;
+   } else
+   err = -EINVAL;
+   break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -565,7 +592,10 @@ out:
 static int do_dccp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
 {
-   struct dccp_sock *dp;
+   struct dccp_sock *dp = dccp_sk(sk);
+   struct dccp_minisock *dmsk = dccp_msk(sk);
+   struct ccid3_hc_tx_sock *hctx;
+   struct ccid3_hc_rx_sock *hcrx;
int val, len;
 
if (get_user(len, optlen))
@@ -574,13 +604,25 @@ static int do_dccp_getsockopt(struct soc
if (len < sizeof(int))
return -EINVAL;
 
-   dp = dccp_sk(sk);
-
switch (optname) {
case DCCP_SOCKOPT_PACKET_SIZE:
val = dp->dccps_packet_size;
len = sizeof(dp->dccps_packet_size);
break;
+   case DCCP_SOCKOPT_TX_PACKET_SIZE:
+   if (dmsk->dccpms_t
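
The validation that the new TX/RX packet-size cases perform in
do_dccp_setsockopt() can be shown in isolation. This is a stand-alone
sketch, not the kernel code: ccid3_set_packet_size() is a hypothetical
function, and the TFRC_MIN_PACKET_SIZE and TFRC_MAX_PACKET_SIZE values
below are assumptions (the real limits are defined in
net/dccp/ccids/ccid3.h).

```c
#include <errno.h>

/* Assumed values; the real limits live in net/dccp/ccids/ccid3.h. */
#define TFRC_MIN_PACKET_SIZE	16
#define TFRC_MAX_PACKET_SIZE	65535
#define DCCPC_CCID3		3	/* as introduced in patch 1/7 */

/*
 * Hypothetical stand-alone version of the check: store val in *s only if
 * the half-connection runs CCID3 and val is within the TFRC limits.
 * Returns 0 on success or -EINVAL otherwise.
 */
int ccid3_set_packet_size(int ccid, int val, unsigned int *s)
{
	if (ccid != DCCPC_CCID3)
		return -EINVAL;
	if (val < TFRC_MIN_PACKET_SIZE || val > TFRC_MAX_PACKET_SIZE)
		return -EINVAL;
	*s = (unsigned int)val;
	return 0;
}
```

Checking the CCID first and the range second gives the same result as the
nested if/else in the patch, with one return path per failure.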

Re: UDP Out Of Sequence

2006-09-20 Thread Ian McDonald

On 9/21/06, Majumder, Rajib <[EMAIL PROTECTED]> wrote:

Does this mean if we have 2 hosts connected back to back (there's no network 
device in between), sequence is guaranteed even in UDP?


I think if you're trying to make the packets appear in order you need
to untie the Gordian knot http://en.wikipedia.org/wiki/Gordian_Knot

In other words, you should fix the application rather than attempt the
near-impossible task of making the packets arrive in order...
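
One common way of fixing the application is to carry a sequence number in
each datagram and let the receiver detect reordering itself. The sketch
below is purely illustrative and not from this thread: struct seq_tracker
and seq_tracker_update() are hypothetical names, and a 32-bit sender-side
counter in the payload is assumed.

```c
#include <stdint.h>

enum seq_verdict { SEQ_IN_ORDER, SEQ_GAP, SEQ_LATE_OR_DUP };

struct seq_tracker {
	int      started;	/* 0 until the first datagram is seen */
	uint32_t next;		/* next sequence number expected */
};

/*
 * Classify a datagram by the sequence number the sender put in its payload.
 * The signed 32-bit difference keeps the comparison correct across
 * wrap-around of the counter.
 */
enum seq_verdict seq_tracker_update(struct seq_tracker *t, uint32_t seq)
{
	int32_t diff;

	if (!t->started) {
		t->started = 1;
		t->next = seq + 1;
		return SEQ_IN_ORDER;
	}
	diff = (int32_t)(seq - t->next);
	if (diff < 0)
		return SEQ_LATE_OR_DUP;	/* arrived after a newer datagram */
	t->next = seq + 1;
	return diff == 0 ? SEQ_IN_ORDER : SEQ_GAP;
}
```

What the receiver does with a late datagram (drop it, buffer and reorder,
or just count it) is then an application decision rather than something
the network has to guarantee.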

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH] tcp: set congestion default through Kconfig

2006-09-18 Thread Ian McDonald

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/ipv4/Kconfig   |   39 +--
 net/ipv4/sysctl_net_ipv4.c |7 +++
 net/ipv4/tcp_cong.c|2 +-
 3 files changed, 45 insertions(+), 3 deletions(-)


Nice solution.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: TCP Pacing

2006-09-13 Thread Ian McDonald

On 9/13/06, Daniele Lacamera <[EMAIL PROTECTED]> wrote:

> On Tuesday 12 September 2006 23:26, Ian McDonald wrote:
> > Where is the published research? If you are going to mention research
> > you need URLs to papers and please put this in source code too so
> > people can check.
>
> I added the main reference to the code. I am going to give you all the
> pointers on this research, mainly recent congestion control proposals
> that include pacing.


Thanks


> > I agree with Arnaldo's comments and also would add I don't like having
> > to select 1000 as HZ unit. Something is wrong if you need this as I
> > can run higher resolution timers without having to do this
>
> I removed that select in Kconfig, I agree it doesn't make sense at all,
> for portability. However, pacing works with 1ms resolution, so maybe
> a "depends HZ_1000" is still required. (How do you run 1ms timers with
> HZ!=1000?)


The HZ refers to time slices per second mostly for user space - e.g.
how often to task switch.
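
To make the granularity question concrete, here is a minimal userspace
sketch (not kernel code) of how a delay in milliseconds maps onto whole
ticks for a given HZ. The helper names ms_to_ticks and effective_delay_us
and the simple round-up rule are assumptions for illustration only; the
kernel's own msecs_to_jiffies() is more careful about overflow.

```c
#include <assert.h>

/* Hypothetical helper: round a delay in milliseconds up to whole timer
 * ticks, for a tick rate of hz ticks per second. */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}

/* Hypothetical helper: the delay actually obtained, in microseconds,
 * once the request has been rounded up to whole ticks. */
static unsigned long effective_delay_us(unsigned long ms, unsigned long hz)
{
	return ms_to_ticks(ms, hz) * (1000000UL / hz);
}
```

Under these assumptions a 1 ms request with HZ=250 becomes one 4 ms tick,
which is why a tick-based timer alone cannot pace at 1 ms unless HZ=1000.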
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: TCP Pacing

2006-09-12 Thread Ian McDonald

On 9/13/06, Daniele Lacamera <[EMAIL PROTECTED]> wrote:

Hello,

Please let me insist once again on the importance of adding a TCP Pacing
mechanism to our TCP, as many people are including this algorithm in
their congestion control proposals. Recent research has found
that it really can help improve performance in different scenarios,
like satellites and long-delay high-speed channels (>100ms RTT, Gbit).
The Hybla module itself is crippled without this feature in its natural
scenario.


Where is the published research? If you are going to mention research
you need URLs to papers and please put this in source code too so
people can check.


The following patch is totally non-invasive: it has a config option and
a sysctl switch, both turned off by default. When the config option is
enabled, it adds only 6B to the tcp_sock.


I agree with Arnaldo's comments and also would add I don't like having
to select 1000 as HZ unit. Something is wrong if you need this as I
can run higher resolution timers without having to do this

Haven't reviewed the rest of the code or tested.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: i not find in the kernel code the code of this command

2006-09-04 Thread Ian McDonald

On 9/2/06, Franco <[EMAIL PROTECTED]> wrote:

Thanks for your response!
Yes, the code is under net/sched in the source tree.
The file act_police.c does not exist in the directory net/sched. There is
police.c, which has code very similar to the act_police.c that I found on
the internet.


Go to http://kernel.org and download a recent kernel.


If I want to create a proc file in police.c, and therefore modify the kernel,
must I install another version of Linux on my PC? Is that correct?


As I said above get a newer version.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: i not find in the kernel code the code of this command

2006-09-01 Thread Ian McDonald

On 9/2/06, Franco <[EMAIL PROTECTED]> wrote:

I thought that this code was police.c but it seems that it isn't.
I must implement a proc file in the code and recompile the kernel.


I'm not sure I understand your question. Please tell me if I answer wrong!

The code is under net/sched in the source tree. The main file is
act_police.c but it is in use elsewhere as well. grep for POLICE.

To build the code you need to alter your kernel options under 'make
menuconfig': Networking, Networking Options, QoS and/or fair queueing.
Actions must be selected and then Traffic Police.

I hope this helps.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand



Re: high latency with TCP connections

2006-08-31 Thread Ian McDonald

> I'm ready to rip out ABC entirely, to be honest.  Or at least
> turn it off by default.

Turn it off by default for 2.6.18, then evaluate more for 2.6.19


If it goes out in 2.6.18 there could probably be a good argument for
going into the stable tree as well... to stop the likes of the JVM
type issues that users keep hitting (which is fixed or going to be
fixed by Sun).
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: high latency with TCP connections

2006-08-31 Thread Ian McDonald

The word performance in this list seems to always mean 'throughput'.
It seems though that there could be some knob to tweak for those of us
who don't care so much about throughput but care a great deal about
latency.


SCTP has been mentioned. There is also DCCP - http://www.read.cs.ucla.edu/dccp/

--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH 0/1] [DCCP]: Tidy up code slightly

2006-08-29 Thread Ian McDonald

> I haven't seen this go into the 2.6.19 tree yet?

Because I simply haven't applied it yet.


OK. My apologies for hassling you. I'm being too hasty and Arnaldo has
correctly chastised me.

--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH 0/1] [DCCP]: Tidy up code slightly

2006-08-29 Thread Ian McDonald

On 8/28/06, David Miller <[EMAIL PROTECTED]> wrote:

From: Ian McDonald <[EMAIL PROTECTED]>
Date: Mon, 28 Aug 2006 16:34:50 +1200

> Arnaldo has pointed this one out to me in latest series of
> patches. Can this go into 2.6.18 please?

It's not a bug fix, so we'll defer it to 2.6.19


I haven't seen this go into the 2.6.19 tree yet?

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH 0/1] [DCCP]: Tidy up code slightly

2006-08-27 Thread Ian McDonald

On 8/28/06, David Miller <[EMAIL PROTECTED]> wrote:

From: Ian McDonald <[EMAIL PROTECTED]>
Date: Mon, 28 Aug 2006 16:34:50 +1200

> Arnaldo has pointed this one out to me in latest series of
> patches. Can this go into 2.6.18 please?

It's not a bug fix, so we'll defer it to 2.6.19


I guess that's true unless we change the structure which would make it
a bug fix but I'm happy for this to be in 2.6.19 since that hasn't
happened.

Ian


[PATCH 1/1] [DCCP]: Tidyup CCID3 list handling

2006-08-27 Thread Ian McDonald
As Arnaldo Carvalho de Melo points out, I should be using list_entry in case
the structure changes in future. The current code works but relies on the
list node being the first member of the structure and requires a type cast.

While doing this I noticed that I had one more variable than I needed, so
I am removing that as well.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 090bc39..195aa95 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -900,7 +900,7 @@ found:
 static void ccid3_hc_rx_update_li(struct sock *sk, u64 seq_loss, u8 win_loss)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
-   struct dccp_li_hist_entry *next, *head;
+   struct dccp_li_hist_entry *head;
u64 seq_temp;
 
if (list_empty(&hcrx->ccid3hcrx_li_hist)) {
@@ -908,15 +908,15 @@ static void ccid3_hc_rx_update_li(struct
   &hcrx->ccid3hcrx_li_hist, seq_loss, win_loss))
return;
 
-   next = (struct dccp_li_hist_entry *)
-  hcrx->ccid3hcrx_li_hist.next;
-   next->dccplih_interval = ccid3_hc_rx_calc_first_li(sk);
+   head = list_entry(hcrx->ccid3hcrx_li_hist.next,
+  struct dccp_li_hist_entry, dccplih_node);
+   head->dccplih_interval = ccid3_hc_rx_calc_first_li(sk);
} else {
struct dccp_li_hist_entry *entry;
struct list_head *tail;
 
-   head = (struct dccp_li_hist_entry *)
-  hcrx->ccid3hcrx_li_hist.next;
+   head = list_entry(hcrx->ccid3hcrx_li_hist.next,
+  struct dccp_li_hist_entry, dccplih_node);
/* FIXME win count check removed as was wrong */
/* should make this check with receive history */
/* and compare there as per section 10.2 of RFC4342 */
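
The difference between the cast and list_entry can be shown outside the
kernel. This is a minimal userspace sketch, assuming a stand-in macro
(list_entry_sketch) built on offsetof and a hypothetical entry type whose
list node is deliberately not the first member; none of these names come
from the patch above.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Userspace stand-in for the kernel's list_entry()/container_of():
 * step back from the embedded node to the start of the container. */
#define list_entry_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical entry: the list node is NOT the first member, so a plain
 * cast of the node pointer would point into the middle of the entry. */
struct li_entry_sketch {
	unsigned int interval;
	struct list_node node;
};

/* Return the first entry on a non-empty circular list. */
static struct li_entry_sketch *first_entry(struct list_node *head)
{
	assert(head->next != head); /* list must not be empty */
	return list_entry_sketch(head->next, struct li_entry_sketch, node);
}
```

With the node at a non-zero offset, a plain cast of head->next would no
longer give the address of the entry, while the offset-based lookup still
does.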


[PATCH 0/1] [DCCP]: Tidy up code slightly

2006-08-27 Thread Ian McDonald
Dave,

Arnaldo has pointed this one out to me in latest series of patches. Can this go 
into 2.6.18 please?

(And I've checked for white space too!)

Ian


CCID2 patches

2006-08-27 Thread Ian McDonald

On 8/27/06, Andrea Bittau <[EMAIL PROTECTED]> wrote:
> The two sets of patches are at:
> http://darkircop.org/dccp
>


These look good in general and I know you have done a lot of work on these.

Here are some comments. NB I haven't actually compiled or tested -
just from reading the code.

You need a description of each patch and signed off line in each
patch. They can't be accepted in this form. Alternatively they could
be resubmitted.

Please state whether the target is 2.6.18 (bug fixes only) or 2.6.19
(enhancements)

Please state any dependencies between patches.

In patch 01_ackvec_opt:

-int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
+u64 dccp_ackvec_parse(struct sock *sk, u64 ackno,
  const u8 opt, const u8 *value, const u8 len)
{
-   if (len > DCCP_MAX_ACKVEC_LEN)
-   return -1;
-
/* dccp_ackvector_print(DCCP_SKB_CB(skb)->dccpd_ack_seq, value, len); */
-   dccp_ackvec_check_rcv_ackvector(dccp_sk(sk)->dccps_hc_rx_ackvec, sk,
-   DCCP_SKB_CB(skb)->dccpd_ack_seq,
-   len, value);
-   return 0;
+   return dccp_ackvec_check_rcv_ackvector(dccp_sk(sk)->dccps_hc_rx_ackvec,
+  sk, ackno, len, value);

This becomes a one line function. This is only used in one place that I can
see so this should go and that code should go there... Also there is some
weird shit going on as this is also defined as inline with return -1 in
ackvec.h. This needs fixing as well.

In patch 05_ccid2_seq_alloc I don't get this code:

+static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hctx, int num)
+{
+   struct ccid2_seq *seqp;
+   int i;
+
+   /* check if we have space to preserve the pointer to the buffer */
+   if (hctx->ccid2hctx_seqbufc >= (sizeof(hctx->ccid2hctx_seqbuf) /
+   sizeof(struct ccid2_seq*)))
+   return -ENOMEM;
+
+   /* allocate buffer and initialize linked list */
+   seqp = kmalloc(sizeof(*seqp) * num, gfp_any());
+   if (seqp == NULL)
+   return -ENOMEM;
+
+   for (i = 0; i < (num - 1); i++) {
+   seqp[i].ccid2s_next = &seqp[i + 1];
+   seqp[i + 1].ccid2s_prev = &seqp[i];
+   }

If you are allocating an array of structures in effect you shouldn't
need to set next/prev pointers as they are allocated contiguously. If
you are allocating groups of arrays, which I suspect you are, I still
think the design is a bit ugly and wastes memory.
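
A minimal sketch of the alternative hinted at here, assuming a single
contiguous array used as a ring: neighbours are found by index arithmetic
rather than stored next/prev pointers. The names ring_next and ring_prev
are hypothetical, and this does not cover several separately allocated
arrays chained together.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the slot after i in a ring of n contiguous slots. */
static size_t ring_next(size_t i, size_t n)
{
	assert(n > 0 && i < n);
	return (i + 1) % n;
}

/* Index of the slot before i in a ring of n contiguous slots. */
static size_t ring_prev(size_t i, size_t n)
{
	assert(n > 0 && i < n);
	return (i + n - 1) % n;
}
```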

In 06_ccid2_ssthresh you don't justify why ssthresh can be infinite to
start off with. And if it is allowed then I don't think you should do
this by just picking a random high number. Change it to something like
~0
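
As a small illustration of the ~0 suggestion, assuming ssthresh is held
in an unsigned int (an assumption, not checked against the patch): ~0U is
the largest representable value, so any congestion window compares below
it. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name; the patch under review may use a different one. */
#define SSTHRESH_INFINITE_SKETCH (~0U)

/* Slow start applies while cwnd is below ssthresh. */
static int in_slow_start(unsigned int cwnd, unsigned int ssthresh)
{
	assert(cwnd > 0);
	return cwnd < ssthresh;
}
```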

In 07_ccid2_send_poll - changing to 1 msec poll is not nice. I know you
said this should be dequeued. I've just added tx queuing to the 2.6.19
tree now so you can do this. Up to Dave/Arnaldo if this is OK as a
short term solution.

In 08_ccid2_cwnd:
+static void ccid2_congestion_event(struct ccid2_hc_tx_sock *hctx,
+  struct ccid2_seq *seqp)
+{
+   if (time_before(seqp->ccid2s_sent, hctx->ccid2hctx_last_cong)) {
+   dccp_pr_debug("Multiple losses in one RTT---treating as one\n");

I think that should be a ccid2_pr_debug not a dccp_pr_debug

in 10_ccid2_change:

in ccid2_change_srtt why do you do the test there? If the value is
going to be altered you take twice as many instructions. Is
this to minimise sets?

in 11_ccid2_profile:

in ccid2_profile_time the code there is ugly. I am guessing you are
assuming 32 bit Intel vs 64 bit Intel. The world doesn't revolve
around Intel. If you need something architecture specific that should
be put into the arch subtree not here.

in ccid2_hc_tx_init why are you using 6000? What is the significance?
Make it a constant defined somewhere.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH 0/7] [DCCP]: Fixes and enhancements

2006-08-26 Thread Ian McDonald

On 8/27/06, David Miller <[EMAIL PROTECTED]> wrote:

From: "Ian McDonald" <[EMAIL PROTECTED]>
Date: Sun, 27 Aug 2006 16:57:17 +1200

> Yes I see that now. However I can't see #5 in net-2.6.git in your tree
> or Linus' where 1-4 made it in...

Resend it to me privately and I'll figure out what happened.


Done.

--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH 0/7] [DCCP]: Fixes and enhancements

2006-08-26 Thread Ian McDonald

> On 8/27/06, David Miller <[EMAIL PROTECTED]> wrote:
> > >
> > > I would love 3, 4, 5 to go into 2.6.18 as these resolve long standing CCID3
> > > issues that have been in the DCCP tree since inception and have caught a
> > > number of people.
> >
> > Ok, I'll toss 1-5 into 2.6.18
>
> Thanks for that. Are 6 and 7 going into 2.6.19 or do you want Arnaldo
> to have a bit more of a look? 6 in particular is trivial.

Those patches are already in net-2.6.19


Yes I see that now. However I can't see #5 in net-2.6.git in your tree
or Linus' where 1-4 made it in...

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH 0/7] [DCCP]: Fixes and enhancements

2006-08-26 Thread Ian McDonald

On 8/27/06, David Miller <[EMAIL PROTECTED]> wrote:

>
> I would love 3, 4, 5 to go into 2.6.18 as these resolve long standing CCID3
> issues that have been in the DCCP tree since inception and have caught a
> number of people.

Ok, I'll toss 1-5 into 2.6.18


Thanks for that. Are 6 and 7 going into 2.6.19 or do you want Arnaldo
to have a bit more of a look? 6 in particular is trivial.


One thing I don't understand is this description from patch 5:


This gives a theoretical speed of 71.9 Kbits/s. I measured across three
runs with this patch set and got 70.1 Kbits/s. Without this patchset the
average was 232 Kbits/s which means Linux can't be used for CCID3 research
properly.


Decreasing the transfer rate is desirable?  I read this as saying
this "fix" drops the transfer rate down from 232Kb/sec to
70.1Kb/sec.  What's going on here?


DCCP CCID3 (RFC 4342) uses TFRC (RFC 3448) to calculate the desired
rate to send at based on feedback from the receiver. TFRC does not use
ACKs or a window to control the rate; instead it calculates a rate so
that the flow is "fair" when competing with TCP.
TFRC is designed to be smoother than TCP at dealing with loss - more
sine wave than saw tooth.

The calculation is based on the work Padhye et al did in this paper -
http://citeseer.ist.psu.edu/padhye98modeling.html

As it turns out this is based on TCP Reno at that time and modern TCP
variants are more efficient when dealing with loss as can be verified
through iperf but we should implement what the RFC says.

Basically the implementation in the DCCP code was buggy and was
transmitting too fast, so I have made it conform to the RFC much
more closely.
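
For reference, the throughput equation from RFC 3448 section 3.1 can be
evaluated with the parameters given in the patch 5 description (MSS 256
bytes, RTT 105 ms, packet loss 5%). This is a sketch under stated
assumptions, not the kernel's code: it takes b = 1 (matching "Delayed
Acks 1"), uses the RFC's simplification t_RTO = 4 * RTT, treats the 5%
packet loss as the loss event rate p, and uses a hand-rolled square root
(sqrt_sketch, a hypothetical helper) so that no maths library is needed.

```c
#include <assert.h>

/* Newton's method square root, only here to avoid depending on libm. */
static double sqrt_sketch(double x)
{
	double r = x > 1.0 ? x : 1.0;
	int i;

	assert(x >= 0.0);
	for (i = 0; i < 60; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* TFRC throughput equation (RFC 3448, section 3.1), in bytes/second.
 * s: packet size (bytes), rtt: round-trip time (seconds),
 * p: loss event rate, b: packets acknowledged per ACK,
 * t_rto: retransmit timeout (seconds). */
static double tfrc_rate(double s, double rtt, double p, double b,
			double t_rto)
{
	double term1, term2;

	assert(p > 0.0 && rtt > 0.0);
	term1 = rtt * sqrt_sketch(2.0 * b * p / 3.0);
	term2 = t_rto * 3.0 * sqrt_sketch(3.0 * b * p / 8.0)
		* p * (1.0 + 32.0 * p * p);
	return s / (term1 + term2);
}
```

Calling tfrc_rate(256.0, 0.105, 0.05, 1.0, 4 * 0.105) and converting the
result from bytes/second to Kbits/s lands close to the theoretical figure
quoted from the patch description.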

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH 0/7] [DCCP]: Fixes and enhancements

2006-08-24 Thread Ian McDonald

I spent all of today on USAGI's IPSEC/MIPV6 patches and related
issues, so I'll look into this tomorrow.

Thanks Ian.


Yes I saw that. Take your time as this is nowhere near as important!

Regards,

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


[PATCH 6/7] [DCCP]: Shift sysctls into feat.h

2006-08-23 Thread Ian McDonald
This shifts further sysctls into feat.h. No change in 
functionality - shifting code only.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/feat.h b/net/dccp/feat.h
index b44c455..cee553d 100644
--- a/net/dccp/feat.h
+++ b/net/dccp/feat.h
@@ -27,5 +27,10 @@ extern int  dccp_feat_clone(struct sock 
 extern int  dccp_feat_init(struct dccp_minisock *dmsk);
 
 extern int  dccp_feat_default_sequence_window;
+extern int  dccp_feat_default_rx_ccid;
+extern int  dccp_feat_default_tx_ccid;
+extern int  dccp_feat_default_ack_ratio;
+extern int  dccp_feat_default_send_ack_vector;
+extern int  dccp_feat_default_send_ndp_count;
 
 #endif /* _DCCP_FEAT_H */
diff --git a/net/dccp/sysctl.c b/net/dccp/sysctl.c
index c1ba945..38bc157 100644
--- a/net/dccp/sysctl.c
+++ b/net/dccp/sysctl.c
@@ -11,18 +11,12 @@
 
 #include 
 #include 
+#include "feat.h"
 
 #ifndef CONFIG_SYSCTL
 #error This file should not be compiled without CONFIG_SYSCTL defined
 #endif
 
-extern int dccp_feat_default_sequence_window;
-extern int dccp_feat_default_rx_ccid;
-extern int dccp_feat_default_tx_ccid;
-extern int dccp_feat_default_ack_ratio;
-extern int dccp_feat_default_send_ack_vector;
-extern int dccp_feat_default_send_ndp_count;
-
 static struct ctl_table dccp_default_table[] = {
{
.ctl_name   = NET_DCCP_DEFAULT_SEQ_WINDOW,


[PATCH 7/7] [DCCP]: Introduce tx buffering

2006-08-23 Thread Ian McDonald
This adds transmit buffering to DCCP.

I have tested with CCID2/3 and with loss and rate limiting.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index 676333b..2d7671c 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -438,6 +438,7 @@ struct dccp_ackvec;
  * @dccps_role - Role of this sock, one of %dccp_role
  * @dccps_ndp_count - number of Non Data Packets since last data packet
  * @dccps_hc_rx_ackvec - rx half connection ack vector
+ * @dccps_xmit_timer - timer for when CCID is not ready to send
  */
 struct dccp_sock {
/* inet_connection_sock has to be the first member of dccp_sock */
@@ -470,6 +471,7 @@ struct dccp_sock {
enum dccp_role  dccps_role:2;
__u8dccps_hc_rx_insert_options:1;
__u8dccps_hc_tx_insert_options:1;
+   struct timer_list   dccps_xmit_timer;
 };
  
 static inline struct dccp_sock *dccp_sk(const struct sock *sk)
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index 84b477d..f9f0721 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -130,7 +130,7 @@ extern void dccp_send_delayed_ack(struct
 extern void dccp_send_sync(struct sock *sk, const u64 seq,
   const enum dccp_pkt_type pkt_type);
 
-extern int dccp_write_xmit(struct sock *sk, struct sk_buff *skb, long *timeo);
+extern void dccp_write_xmit(struct sock *sk, int block);
 extern void dccp_write_space(struct sock *sk);
 
 extern void dccp_init_xmit_timers(struct sock *sk);
diff --git a/net/dccp/output.c b/net/dccp/output.c
index 58669be..5986cb9 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -198,7 +198,7 @@ static int dccp_wait_for_ccid(struct soc
while (1) {
prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
 
-   if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
+   if (sk->sk_err)
goto do_error;
if (!*timeo)
goto do_nonblock;
@@ -234,37 +234,72 @@ do_interrupted:
goto out;
 }
 
-int dccp_write_xmit(struct sock *sk, struct sk_buff *skb, long *timeo)
+static void dccp_write_xmit_timer(unsigned long data) {
+   struct sock *sk = (struct sock *)data;
+   struct dccp_sock *dp = dccp_sk(sk);
+
+   bh_lock_sock(sk);
+   if (sock_owned_by_user(sk))
+   sk_reset_timer(sk, &dp->dccps_xmit_timer, jiffies+1);
+   else
+   dccp_write_xmit(sk, 0);
+   bh_unlock_sock(sk);
+   sock_put(sk);
+}
+
+void dccp_write_xmit(struct sock *sk, int block)
 {
-   const struct dccp_sock *dp = dccp_sk(sk);
-   int err = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb,
+   struct dccp_sock *dp = dccp_sk(sk);
+   struct sk_buff *skb;
+   long timeo = 3; /* If a packet is taking longer than 2 secs
+  we have other issues */
+
+   while ((skb = skb_peek(&sk->sk_write_queue))) {
+   int err = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb,
 skb->len);
+   
+   if (err > 0) {
+   if (!block) { 
+   sk_reset_timer(sk, &dp->dccps_xmit_timer, 
+   msecs_to_jiffies(err)+jiffies);
+   break;
+   } else 
+   err = dccp_wait_for_ccid(sk, skb, &timeo);
+   if (err) {
+   printk(KERN_CRIT "%s:err at dccp_wait_for_ccid"
+" %d\n", __FUNCTION__, err);
+   dump_stack();
+   }
+   }
 
-   if (err > 0)
-   err = dccp_wait_for_ccid(sk, skb, timeo);
+   skb_dequeue(&sk->sk_write_queue);
+   if (err == 0) {
+   struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
+   const int len = skb->len;
 
-   if (err == 0) {
-   struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
-   const int len = skb->len;
-
-   if (sk->sk_state == DCCP_PARTOPEN) {
-   /* See 8.1.5.  Handshake Completion */
-   inet_csk_schedule_ack(sk);
-   inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
+   if (sk->sk_state == DCCP_PARTOPEN) {
+   /* See 8.1.5.  Handshake Completion */
+   inet_csk_schedule_ack(sk);
+   inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
  inet_csk(sk)->icsk_rto,
  

[PATCH 5/7] [DCCP]: Fix CCID3 to correct performance

2006-08-23 Thread Ian McDonald
This fixes CCID3 to give much closer performance to RFC4342.

CCID3 is meant to alter sending rate based on RTT and loss.

The performance was verified against:
http://wand.net.nz/~perry/max_download.php

For example I tested with netem and had the following parameters:
Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%.

This gives a theoretical speed of 71.9 Kbits/s. I measured across three
runs with this patch set and got 70.1 Kbits/s. Without this patchset the
average was 232 Kbits/s which means Linux can't be used for CCID3 research
properly.

I also tested with netem turned off so box just acting as router with 1.2
msec RTT. The performance with this is the same with or without the patch
at around 30 Mbit/s.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 0f85970..dad20c9 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -342,6 +342,8 @@ static int ccid3_hc_tx_send_packet(struc
new_packet->dccphtx_ccval =
DCCP_SKB_CB(skb)->dccpd_ccval =
hctx->ccid3hctx_last_win_count;
+   timeval_add_usecs(&hctx->ccid3hctx_t_nom,
+ hctx->ccid3hctx_t_ipi);
}
 out:
return rc;
@@ -413,7 +415,8 @@ static void ccid3_hc_tx_packet_sent(stru
case TFRC_SSTATE_NO_FBACK:
case TFRC_SSTATE_FBACK:
if (len > 0) {
-   hctx->ccid3hctx_t_nom = now;
+   timeval_sub_usecs(&hctx->ccid3hctx_t_nom,
+ hctx->ccid3hctx_t_ipi);
ccid3_calc_new_t_ipi(hctx);
ccid3_calc_new_delta(hctx);
timeval_add_usecs(&hctx->ccid3hctx_t_nom,
@@ -757,8 +760,7 @@ static void ccid3_hc_rx_send_feedback(st
}
 
hcrx->ccid3hcrx_tstamp_last_feedback = now;
-   hcrx->ccid3hcrx_last_counter = packet->dccphrx_ccval;
-   hcrx->ccid3hcrx_seqno_last_counter   = packet->dccphrx_seqno;
+   hcrx->ccid3hcrx_ccval_last_counter   = packet->dccphrx_ccval;
hcrx->ccid3hcrx_bytes_recv   = 0;
 
/* Convert to multiples of 10us */
@@ -782,7 +784,7 @@ static int ccid3_hc_rx_insert_options(st
if (!(sk->sk_state == DCCP_OPEN || sk->sk_state == DCCP_PARTOPEN))
return 0;
 
-   DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_last_counter;
+   DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_ccval_last_counter;
 
if (dccp_packet_without_ack(skb))
return 0;
@@ -854,6 +856,11 @@ static u32 ccid3_hc_rx_calc_first_li(str
interval = 1;
}
 found:
+   if (!tail) {
+   LIMIT_NETDEBUG(KERN_WARNING "%s: tail is null\n",
+  __FUNCTION__);
+   return ~0;
+   }
rtt = timeval_delta(&tstamp, &tail->dccphrx_tstamp) * 4 / interval;
ccid3_pr_debug("%s, sk=%p, approximated RTT to %uus\n",
   dccp_role(sk), sk, rtt);
@@ -864,9 +871,20 @@ found:
delta = timeval_delta(&tstamp, &hcrx->ccid3hcrx_tstamp_last_feedback);
x_recv = usecs_div(hcrx->ccid3hcrx_bytes_recv, delta);
 
+   if (x_recv == 0) 
+   x_recv = hcrx->ccid3hcrx_x_recv;
+
tmp1 = (u64)x_recv * (u64)rtt;
do_div(tmp1,1000);
tmp2 = (u32)tmp1;
+
+   if (!tmp2) {
+   LIMIT_NETDEBUG(KERN_WARNING "tmp2 = 0 "
+  "%s: x_recv = %u, rtt =%u\n",
+  __FUNCTION__, x_recv, rtt);
+   return ~0;
+   }
+
fval = (hcrx->ccid3hcrx_s * 10) / tmp2;
/* do not alter order above or you will get overflow on 32 bit */
p = tfrc_calc_x_reverse_lookup(fval);
@@ -882,31 +900,101 @@ found:
 static void ccid3_hc_rx_update_li(struct sock *sk, u64 seq_loss, u8 win_loss)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
+   struct dccp_li_hist_entry *next, *head;
+   u64 seq_temp;
 
-   if (seq_loss != DCCP_MAX_SEQNO + 1 &&
-   list_empty(&hcrx->ccid3hcrx_li_hist)) {
-   struct dccp_li_hist_entry *li_tail;
+   if (list_empty(&hcrx->ccid3hcrx_li_hist)) {
+   if (!dccp_li_hist_interval_new(ccid3_li_hist,
+  &hcrx->ccid3hcrx_li_hist, seq_loss, win_loss))
+   return;
 
-   li_tail = dccp_li_hist_interval_new(ccid3_li_hist,
-   &hcrx->ccid3hcrx_li_hist,
-   seq_loss, win_loss);
-   if (li_tail == NULL)
+   next = (struct dccp_li_hist_entry *)
+  hcrx->ccid3hcrx_li_hist.next;
+  

[PATCH 4/7] [DCCP]: Introduce dccp_rx_hist_find_entry

2006-08-23 Thread Ian McDonald
This adds a new function dccp_rx_hist_find_entry.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 7b6b03e..1c68182 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -365,6 +365,25 @@ struct dccp_tx_hist_entry *
 
 EXPORT_SYMBOL_GPL(dccp_tx_hist_find_entry);
 
+int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, 
+   u8 *ccval)
+{
+   struct dccp_rx_hist_entry *packet = NULL, *entry;
+
+   list_for_each_entry(entry, list, dccphrx_node)
+   if (entry->dccphrx_seqno == seq) {
+   packet = entry;
+   break;
+   }
+
+   if (packet)
+   *ccval = packet->dccphrx_ccval;
+
+   return packet != NULL;
+}
+
+EXPORT_SYMBOL_GPL(dccp_rx_hist_find_entry);
+
 void dccp_tx_hist_purge_older(struct dccp_tx_hist *hist,
  struct list_head *list,
  struct dccp_tx_hist_entry *packet)
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 27c4309..aea9c5d 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -106,6 +106,8 @@ static inline void dccp_tx_hist_entry_de
 extern struct dccp_tx_hist_entry *
dccp_tx_hist_find_entry(const struct list_head *list,
const u64 seq);
+extern int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
+   u8 *ccval);
 
 static inline void dccp_tx_hist_add_entry(struct list_head *list,
  struct dccp_tx_hist_entry *entry)


[PATCH 3/7] [DCCP]: Introduces follows48 function

2006-08-23 Thread Ian McDonald
This adds a new function to see if two sequence numbers follow each other.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index b8931d3..84b477d 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -81,6 +81,14 @@ static inline u64 max48(const u64 seq1, 
return after48(seq1, seq2) ? seq1 : seq2;
 }
 
+/* is seq1 next seqno after seq2 */
+static inline int follows48(const u64 seq1, const u64 seq2)
+{
+   int diff = (seq1 & 0x) - (seq2 & 0x);
+   
+   return diff==1;
+}
+
 enum {
DCCP_MIB_NUM = 0,
DCCP_MIB_ACTIVEOPENS,   /* ActiveOpens */
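
DCCP sequence numbers are 48 bits wide (RFC 4340), so "follows" has to
hold across the wrap from 2^48 - 1 back to 0. The following is a minimal
userspace sketch of that check over the full 48-bit space. It is a
hypothetical variant for illustration, not the function from the patch,
and the names SEQ48_MASK_SKETCH and follows48_sketch are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the low 48 bits of a 64-bit value. */
#define SEQ48_MASK_SKETCH ((UINT64_C(1) << 48) - 1)

/* Hypothetical variant: does seq1 immediately follow seq2, modulo 2^48?
 * Both arguments are expected to already fit in 48 bits. */
static int follows48_sketch(uint64_t seq1, uint64_t seq2)
{
	assert(seq1 <= SEQ48_MASK_SKETCH && seq2 <= SEQ48_MASK_SKETCH);
	return ((seq2 + 1) & SEQ48_MASK_SKETCH) == seq1;
}
```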


[PATCH 2/7] [DCCP]: Update contact details and copyright

2006-08-23 Thread Ian McDonald
Just updating copyright and contacts

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/CREDITS b/CREDITS
index 29be6d1..0fe904e 100644
--- a/CREDITS
+++ b/CREDITS
@@ -2209,7 +2209,7 @@ S: (address available on request)
 S: USA
 
 N: Ian McDonald
-E: [EMAIL PROTECTED]
+E: [EMAIL PROTECTED]
 E: [EMAIL PROTECTED]
 W: http://wand.net.nz/~iam4
 W: http://imcdnzl.blogspot.com
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index c39bff7..0f85970 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -2,7 +2,7 @@
  *  net/dccp/ccids/ccid3.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
  *
  *  An implementation of the DCCP protocol
  *
@@ -1230,7 +1230,7 @@ static __exit void ccid3_module_exit(voi
 }
 module_exit(ccid3_module_exit);
 
-MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, "
+MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, "
  "Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>");
 MODULE_DESCRIPTION("DCCP TFRC CCID3 CCID");
 MODULE_LICENSE("GPL");
diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h
index 5ade4f6..22cb9f8 100644
--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -1,13 +1,13 @@
 /*
  *  net/dccp/ccids/ccid3.h
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
  *
  *  An implementation of the DCCP protocol
  *
  *  This code has been developed by the University of Waikato WAND
  *  research group. For further information please see http://www.wand.net.nz/
- *  or e-mail Ian McDonald - [EMAIL PROTECTED]
+ *  or e-mail Ian McDonald - [EMAIL PROTECTED]
  *
  *  This code also uses code from Lulea University, rereleased as GPL by its
  *  authors:
diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index 5d7b7d8..b93d9fc 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -2,7 +2,7 @@
  *  net/dccp/ccids/lib/loss_interval.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
  *
  *  This program is free software; you can redistribute it and/or modify
diff --git a/net/dccp/ccids/lib/loss_interval.h 
b/net/dccp/ccids/lib/loss_interval.h
index 43bf782..dcb370a 100644
--- a/net/dccp/ccids/lib/loss_interval.h
+++ b/net/dccp/ccids/lib/loss_interval.h
@@ -4,7 +4,7 @@ #define _DCCP_LI_HIST_
  *  net/dccp/ccids/lib/loss_interval.h
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
  *
  *  This program is free software; you can redistribute it and/or modify it
diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index 6739be1..7b6b03e 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -1,13 +1,13 @@
 /*
  *  net/dccp/packet_history.c
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
  *
  *  An implementation of the DCCP protocol
  *
  *  This code has been developed by the University of Waikato WAND
  *  research group. For further information please see http://www.wand.net.nz/
- *  or e-mail Ian McDonald - [EMAIL PROTECTED]
+ *  or e-mail Ian McDonald - [EMAIL PROTECTED]
  *
  *  This code also uses code from Lulea University, rereleased as GPL by its
  *  authors:
@@ -391,7 +391,7 @@ void dccp_tx_hist_purge(struct dccp_tx_h
 
 EXPORT_SYMBOL_GPL(dccp_tx_hist_purge);
 
-MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, "
+MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, "
  "Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>");
 MODULE_DESCRIPTION("DCCP TFRC library");
 MODULE_LICENSE("GPL");
diff --git a/net/dccp/ccids/lib/packet_history.h 
b/net/dccp/ccids/lib/packet_history.h
index 673c209..27c4309 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -1,13 +1,13 @@
 /*
  *  net/dccp/packet_history.h
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
  *
  *  An implementation of the DCCP protocol
  *
  *  This code has been developed by the University of Waikato WAND
  *  research 

[PATCH 1/7] [DCCP]: Fix typo

2006-08-23 Thread Ian McDonald
This fixes a small typo in net/dccp/ccids/lib/packet_history.c

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index ad98d6a..6739be1 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -1,5 +1,5 @@
 /*
- *  net/dccp/packet_history.h
+ *  net/dccp/packet_history.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
  *


[PATCH 0/7] [DCCP]: Fixes and enhancements

2006-08-23 Thread Ian McDonald
Please find following a series of patches for DCCP.

These have been tested against torvalds/linux-2.6.git and davem/net-2.6.19.git.

My opinion is that 1 and 2 can go straight into 2.6.18 as they are
documentation changes only. Dave, are you able to apply them, as Arnaldo is
very busy at present?

I would love 3, 4 and 5 to go into 2.6.18, as these resolve long-standing
CCID3 issues that have been in the DCCP tree since inception and have caught
out a number of people.

Number 6 just shifts code around to tidy it up and introduces no change
in logic. You could argue for it to go into either 2.6.18 or 2.6.19!

Number 7 implements transmit buffering and is 2.6.19 material.
Andrea, I believe this might be quite useful for you in CCID2 as well.

These patches can all be applied independently, except 3, 4 and 5, which
are a group.

Also on http://wand.net.nz/~iam4/dccp/patches/ are the following further
patches, which are not ready for merge but which others might be interested in:
- DCCP-Probe, à la TCP-Probe
- The start of memory buffer limiting (this is not actually needed for number
  7, as the problems occur on receive, which is an existing issue)
- My research code

-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-23 Thread Ian McDonald

I wonder whether designing-in a millisecond granularity is the right thing
to do.  If in a few years the kernel is running tickless with high-res clock
interrupt sources, that might look a bit lumpy.


I'd second that. When working on DCCP I've done a lot of the work in
microseconds rather than milliseconds, and because of its design that
made quite a difference.

I haven't followed kevents in great detail but it sounds like
something that could be useful for me with higher resolution timers
than milliseconds.
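
To put rough numbers on that, here is a small sketch of the gap between packets for a rate-paced sender. The packet size and link rates are illustration values only, and gap_usecs()/gap_msecs() are made-up helpers, not DCCP code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time between packets, in microseconds, for a
 * sender pacing pkt_bytes-sized packets at rate_bps bits per second. */
static uint64_t gap_usecs(uint64_t pkt_bytes, uint64_t rate_bps)
{
	assert(rate_bps > 0);
	return pkt_bytes * 8 * 1000000 / rate_bps;
}

/* The same interval truncated to whole milliseconds. */
static uint64_t gap_msecs(uint64_t pkt_bytes, uint64_t rate_bps)
{
	return gap_usecs(pkt_bytes, rate_bps) / 1000;
}
```

With 1500-byte packets at 100 Mbit/s the gap is 120 microseconds, which truncates to 0 milliseconds; at 10 Mbit/s it is 1200 microseconds, which a millisecond timer would see as 1.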
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: means to artificially alter the bandwidth of a system

2006-08-02 Thread Ian McDonald

>Hi,
>
>For research purposes we are considering developing a program to alter
>the bandwidth of a system in software; for instance, a machine has
>100 MB/s and we change it to 1 MB/s.
>
>Does something like this already exist? Or is there a way to do this
>without creating a program/kernel module?

Of course: see http://linux-net.osdl.org/index.php/Iproute2 (especially tc)

>Any help will be highly appreciated!
>
>Irfan Habib

HGN


You may also want to look at Netem
http://linux-net.osdl.org/index.php/Netem if you want to play with
delay and loss as well. The examples there are good, but I can send
you scripts as well if you wish.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: Who maintains the website ?

2006-07-26 Thread Ian McDonald

On 7/26/06, Christophe Devriese <[EMAIL PROTECTED]> wrote:

I would like to have a VLAN page on the main page, so that I can update it
a bit with relevant info, and then include the link to the external site
as it's basically a "here is a patch, here is a usage" page, while an
explanation of the different stuff would be nice (such as the forwarding
path, the vlan acceleration, where packets go ...).


What is the external page? If it doesn't exist consider putting the
content on the wiki itself so others can improve it.

Prepare the Wiki page including linking in the existing VLAN link on
the front page and then we can see what can be done. You can create a
VLAN page without having to change the wiki front page initially...

If it looks good then Stephen, or myself or others can change the front page.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: bandwidth limitation help

2006-07-25 Thread Ian McDonald

On 7/26/06, Piotrowski, Ted P. <[EMAIL PROTECTED]> wrote:

Hi,

I am new to the mailing list so I'm not sure if anybody reads these, but
here goes nothing. I recently read: Linux Advanced Routing & Traffic
Control HOWTO and have been trying to test my applications using
bandwidth limitation. All the examples described in the HOWTO do not
simulate the conditions I need to test my software. What I would like is
for my bandwidth limitation to empty my UDP buffer at a given rate. I
have tried using a simple TBF to do this, but all that happens is that
my application floods the TBF buffer at link speed and the TBF buffer
quickly overflows and drops packets. I want the packets to actually stay
in the UDP buffer and be emptied at a given rate without modifying my
application.

I don't know if any of you are familiar with netem, but it can be used
in conjunction with tc to add delay to a link. Surprisingly, packets
delayed by netem appear to remain in the UDP buffer until it is time for
them to be sent. I would like this same behavior of keeping the packets
in the UDP buffer, but with bandwidth limitation on the rate at which
the buffer empties, not just packet delay. Has anybody ever done
anything like this or can point me to some resources?


Have a look at:
http://linux-net.osdl.org/index.php/Netem

I have written my own test scenarios using examples from this website
but I can also send you my small scripts if you want.
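
For what it's worth, the overflow described in the question can be pictured with a toy token bucket. This is only a sketch with made-up numbers and names (struct toy_tbf, tbf_enqueue(), tbf_tick()); it is not the kernel's TBF qdisc, and time simply advances in whole ticks:

```c
#include <assert.h>

/* Toy token bucket filter with a bounded backlog. */
struct toy_tbf {
	int tokens;	/* packets that may leave right now */
	int burst;	/* bucket depth */
	int rate;	/* tokens added per tick */
	int queued;	/* packets waiting in the filter */
	int limit;	/* maximum backlog */
	int dropped;	/* packets refused because the backlog was full */
};

/* The application hands one packet to the filter. */
static void tbf_enqueue(struct toy_tbf *q)
{
	if (q->queued == q->limit)
		q->dropped++;
	else
		q->queued++;
}

/* One tick passes: refill the bucket, then send what the tokens allow.
 * Returns the number of packets sent during this tick. */
static int tbf_tick(struct toy_tbf *q)
{
	int sent;

	assert(q->queued <= q->limit);
	q->tokens += q->rate;
	if (q->tokens > q->burst)
		q->tokens = q->burst;
	sent = q->tokens < q->queued ? q->tokens : q->queued;
	q->tokens -= sent;
	q->queued -= sent;
	return sent;
}
```

If the application enqueues three packets per tick while the rate is one per tick, the backlog reaches its limit after a couple of ticks and every later tick drops the excess. The sender gets no feedback in this model, which matches the behaviour Ted describes.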

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: Who maintains the website ?

2006-07-25 Thread Ian McDonald

On 7/26/06, Christophe Devriese <[EMAIL PROTECTED]> wrote:

The http://linux-net.osdl.org/index.php/Main_Page website I mean.


It's a Wiki so anybody can alter content on the website. The exception
to this is that particular page - the main page. If you want something
altered on that particular page send email to one of the sysops.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: Netchannles: first stage has been completed. Further ideas.

2006-07-20 Thread Ian McDonald


If we consider netchannels as Van Jacobson described them, then a
mutex is not needed, since it is impossible to have several readers or
writers. But in the socket case, even if there is only one userspace
consumer, that lock must be held to protect against bh (or introduce
several queues and complicate their management a lot (ucopy for
example)).


As I recall Van's talk, you don't need a lock with a ring buffer if you
have a start and an end variable pointing to locations within the ring
buffer.

He didn't explain this in great depth as it is computer science 101
but here is how I would explain it:

Once the socket is initialised, the consumer is the only one that sets the
start variable, and the network driver only reads it. It is the other way
around for the end variable. As long as the writes are atomic you
are fine. You only need one ring buffer in this scenario and two
atomic variables.

Having atomic writes does have overhead, but far less than locking.
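
A minimal userspace sketch of that scheme, assuming exactly one producer and one consumer and using C11 atomics. The names (struct ring, ring_put(), ring_get(), RING_SIZE) are made up for illustration, and the acquire/release ordering is an addition on top of "the writes are atomic" so that slot contents are visible before the index that publishes them:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8	/* hypothetical capacity */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

struct ring {
	void *slot[RING_SIZE];
	atomic_size_t start;	/* written only by the consumer */
	atomic_size_t end;	/* written only by the producer */
};

/* Producer side (the network driver). Returns 0 if the ring is full. */
static int ring_put(struct ring *r, void *item)
{
	size_t end = atomic_load_explicit(&r->end, memory_order_relaxed);
	size_t start = atomic_load_explicit(&r->start, memory_order_acquire);

	if (end - start == RING_SIZE)
		return 0;
	r->slot[end % RING_SIZE] = item;
	/* Publish the slot before the new end index becomes visible. */
	atomic_store_explicit(&r->end, end + 1, memory_order_release);
	return 1;
}

/* Consumer side (the socket reader). Returns NULL if the ring is empty. */
static void *ring_get(struct ring *r)
{
	size_t start = atomic_load_explicit(&r->start, memory_order_relaxed);
	size_t end = atomic_load_explicit(&r->end, memory_order_acquire);
	void *item;

	if (start == end)
		return NULL;
	item = r->slot[start % RING_SIZE];
	/* Hand the slot back only after it has been read. */
	atomic_store_explicit(&r->start, start + 1, memory_order_release);
	return item;
}
```

Each index has a single writer, so neither side needs a lock. With a second producer or consumer this no longer holds.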
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


[PATCH 1/1] net: fix __sk_stream_mem_reclaim

2006-07-12 Thread Ian McDonald

__sk_stream_mem_reclaim is only called by sk_stream_mem_reclaim.

As such the check on sk->sk_forward_alloc is not needed and can be removed.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/core/stream.c b/net/core/stream.c
index e948969..d1d7dec 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -196,15 +196,13 @@ EXPORT_SYMBOL(sk_stream_error);

void __sk_stream_mem_reclaim(struct sock *sk)
{
-   if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
-   atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
-  sk->sk_prot->memory_allocated);
-   sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
-   if (*sk->sk_prot->memory_pressure &&
-   (atomic_read(sk->sk_prot->memory_allocated) <
-sk->sk_prot->sysctl_mem[0]))
-   *sk->sk_prot->memory_pressure = 0;
-   }
+   atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
+  sk->sk_prot->memory_allocated);
+   sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
+   if (*sk->sk_prot->memory_pressure &&
+   (atomic_read(sk->sk_prot->memory_allocated) <
+sk->sk_prot->sysctl_mem[0]))
+   *sk->sk_prot->memory_pressure = 0;
}

EXPORT_SYMBOL(__sk_stream_mem_reclaim);


[PATCH 1/1] net: fix __sk_stream_mem_reclaim

2006-07-12 Thread Ian McDonald
__sk_stream_mem_reclaim is only called by sk_stream_mem_reclaim.

As such the check on sk->sk_forward_alloc is not needed and can be removed.

At the same time, remove the EXPORT_SYMBOL, as it is no longer needed, and
move the function into include/net/sock.h

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/include/net/sock.h b/include/net/sock.h
index 324b3ea..3a62b5b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -694,7 +694,6 @@ static inline struct inode *SOCK_INODE(s
return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
 }
 
-extern void __sk_stream_mem_reclaim(struct sock *sk);
 extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind);
 
 #define SK_STREAM_MEM_QUANTUM ((int)PAGE_SIZE)
@@ -704,6 +703,17 @@ static inline int sk_stream_pages(int am
return (amt + SK_STREAM_MEM_QUANTUM - 1) / SK_STREAM_MEM_QUANTUM;
 }
 
+static void __sk_stream_mem_reclaim(struct sock *sk)
+{
+   atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
+  sk->sk_prot->memory_allocated);
+   sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
+   if (*sk->sk_prot->memory_pressure &&
+   (atomic_read(sk->sk_prot->memory_allocated) <
+sk->sk_prot->sysctl_mem[0]))
+   *sk->sk_prot->memory_pressure = 0;
+}
+
 static inline void sk_stream_mem_reclaim(struct sock *sk)
 {
if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM)
diff --git a/net/core/stream.c b/net/core/stream.c
index e948969..8ff97e6 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -194,21 +194,6 @@ int sk_stream_error(struct sock *sk, int
 
 EXPORT_SYMBOL(sk_stream_error);
 
-void __sk_stream_mem_reclaim(struct sock *sk)
-{
-   if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
-   atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
-  sk->sk_prot->memory_allocated);
-   sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
-   if (*sk->sk_prot->memory_pressure &&
-   (atomic_read(sk->sk_prot->memory_allocated) <
-sk->sk_prot->sysctl_mem[0]))
-   *sk->sk_prot->memory_pressure = 0;
-   }
-}
-
-EXPORT_SYMBOL(__sk_stream_mem_reclaim);
-
 int sk_stream_mem_schedule(struct sock *sk, int size, int kind)
 {
int amt = sk_stream_pages(size);


Re: Unnecessary check in __sk_stream_mem_reclaim?

2006-07-11 Thread Ian McDonald

On 7/12/06, Herbert Xu <[EMAIL PROTECTED]> wrote:

Ian McDonald <[EMAIL PROTECTED]> wrote:
>
> It looks to me like this check here in net/core/stream.c for
> __sk_stream_mem_reclaim:
>if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
>
> is unnecessary.

It's needed after skb's have been freed which can push sk_forward_alloc
above a quantum.


I'm not saying the check is unneeded - just saying doing it twice is unneeded.

Sorry Herbert for two copies - forgot to add netdev first time.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Unnecessary check in __sk_stream_mem_reclaim?

2006-07-11 Thread Ian McDonald

Folks,

It looks to me like this check here in net/core/stream.c for
__sk_stream_mem_reclaim:
if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {

is unnecessary.

It is also done in include/net/sock.h for sk_stream_mem_reclaim which
if the test succeeds calls __sk_stream_mem_reclaim. This is the only
use of it in the kernel.
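
A simplified userspace model of the two functions, to show where the test sits and what the arithmetic does. Everything here is a stand-in: MEM_QUANTUM replaces SK_STREAM_MEM_QUANTUM, struct toy_sock and memory_allocated replace the real socket and protocol counters, and the memory-pressure handling is left out:

```c
#include <assert.h>

#define MEM_QUANTUM 4096	/* stand-in for SK_STREAM_MEM_QUANTUM */

struct toy_sock {
	int forward_alloc;	/* bytes reserved but not in use */
};

static int memory_allocated;	/* quanta accounted to the protocol */

/* Out-of-line part: relies on the caller having checked that at least
 * one whole quantum can be given back. */
static void toy_mem_reclaim_slow(struct toy_sock *sk)
{
	assert(sk->forward_alloc >= MEM_QUANTUM);
	memory_allocated -= sk->forward_alloc / MEM_QUANTUM;
	/* Keep the sub-quantum remainder; needs a power-of-two quantum. */
	sk->forward_alloc &= MEM_QUANTUM - 1;
}

/* Inline wrapper: the one place where the threshold is tested. */
static inline void toy_mem_reclaim(struct toy_sock *sk)
{
	if (sk->forward_alloc >= MEM_QUANTUM)
		toy_mem_reclaim_slow(sk);
}
```

With forward_alloc at three quanta plus 100 bytes, one call gives back three quanta and leaves 100 bytes; a second call does nothing because the wrapper's test fails.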

Now sk_stream_mem_reclaim seems to be in its current form for
performance reasons, which makes sense, so I think the check should be
removed from __sk_stream_mem_reclaim instead.

The danger of removing the check is that an external module could use the
function, which I suspect is highly unlikely. This could be overcome by
removing the EXPORT_SYMBOL and shifting the function into the header file,
although this would result in multiple instances being linked in. I am
guessing that there is a smarter way to do this which still
results in the symbol not being exported. I don't know my way around
the linking/exporting very well.

Comments? I guess if this was done it would have to be put in feature
removal schedule though because it is currently exported?

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [2.6 patch] net/dccp/: possible cleanups

2006-06-28 Thread Ian McDonald

Comments below:

On 6/29/06, Adrian Bunk <[EMAIL PROTECTED]> wrote:

This patch contains the following possible cleanups:
- sysctl.c: the Kconfig rules already disallow CONFIG_SYSCTL=n,
there's no need for an additional check

Agree


- proper extern declarations for some variables in dccp.h

NAK - have sent another patch to shift these to feat.h. Arnaldo is
reviewing patches next week.


- make the following needlessly global function static:
  - ipv4.c: dccp_v4_checksum()

Agree


- #if 0 the following unused functions:
  - ackvec.c: dccp_ackvector_print()
  - ackvec.c: dccp_ackvec_print()
  - output.c: dccp_send_delayed_ack()


NAK on the first two. These are for debugging, and DCCP still needs
improving, so I think it is worthwhile keeping them there in the short
term so we can quickly call them if needed.

I will leave Arnaldo or Andrea to comment on last one...

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: Locking validator output on DCCP

2006-06-22 Thread Ian McDonald

On 6/22/06, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 6/21/06, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-06-21 at 10:34 +1000, Herbert Xu wrote:
> > > As I read this it is not a recursive lock as sk_clone is occurring
> > > second and is actually creating a new socket so they are trying to
> > > lock on different sockets.
> > >
> > > Can someone tell me whether I am correct in my thinking or not? If I
> > > am then I will work out how to tell the lock validator not to worry
> > > about it.
> >
> > I agree, this looks bogus.  Ingo, could you please take a look?
>
> Fix is relatively easy:
>
>
> sk_clone creates a new socket, and thus can never deadlock, and in fact
> can be called with the original socket locked. This therefore is a
> legitimate nesting case; mark it as such.
>
> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
>
>
> ---
>  net/core/sock.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.17-rc6-mm2/net/core/sock.c
> ===
> --- linux-2.6.17-rc6-mm2.orig/net/core/sock.c
> +++ linux-2.6.17-rc6-mm2/net/core/sock.c
> @@ -846,7 +846,7 @@ struct sock *sk_clone(const struct sock
> /* SANITY */
> sk_node_init(&newsk->sk_node);
> sock_lock_init(newsk);
> -   bh_lock_sock(newsk);
> +   bh_lock_sock_nested(newsk);
>
> atomic_set(&newsk->sk_rmem_alloc, 0);
> atomic_set(&newsk->sk_wmem_alloc, 0);
>
>
When I do this it now shifts around. I'll investigate further
(probably tomorrow).

Now get

Jun 22 14:20:48 localhost kernel: [ 1276.424531]
=
Jun 22 14:20:48 localhost kernel: [ 1276.424541] [ INFO: possible
recursive locking detected ]
Jun 22 14:20:48 localhost kernel: [ 1276.424546]
-
Jun 22 14:20:48 localhost kernel: [ 1276.424553] idle/0 is trying to
acquire lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424559]
(&sk->sk_lock.slock#5/1){-+..}, at: [] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.424585]
Jun 22 14:20:48 localhost kernel: [ 1276.424587] but task is already
holding lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424592]
(&sk->sk_lock.slock#5/1){-+..}, at: []
tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424616]
Jun 22 14:20:48 localhost kernel: [ 1276.424618] other info that might
help us debug this:
Jun 22 14:20:48 localhost kernel: [ 1276.424624] 2 locks held by idle/0:
Jun 22 14:20:48 localhost kernel: [ 1276.424628]  #0:
(&tp->rx_lock){-+..}, at: [] rtl8139_poll+0x42/0x41c
[8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.424666]  #1:
(&sk->sk_lock.slock#5/1){-+..}, at: []
tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424685]
Jun 22 14:20:48 localhost kernel: [ 1276.424686] stack backtrace:
Jun 22 14:20:48 localhost kernel: [ 1276.425002]  []
show_trace_log_lvl+0x53/0xff
Jun 22 14:20:48 localhost kernel: [ 1276.425038]  []
show_trace+0x16/0x19
Jun 22 14:20:48 localhost kernel: [ 1276.425068]  []
dump_stack+0x1a/0x1f
Jun 22 14:20:48 localhost kernel: [ 1276.425099]  []
__lock_acquire+0x8e6/0x902
Jun 22 14:20:48 localhost kernel: [ 1276.425311]  []
lock_acquire+0x4e/0x66
Jun 22 14:20:48 localhost kernel: [ 1276.425510]  []
_spin_lock_nested+0x26/0x36
Jun 22 14:20:48 localhost kernel: [ 1276.425726]  []
sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.427191]  []
inet_csk_clone+0xf/0x67
Jun 22 14:20:48 localhost kernel: [ 1276.428879]  []
tcp_create_openreq_child+0x15/0x32b
Jun 22 14:20:48 localhost kernel: [ 1276.430598]  []
tcp_v4_syn_recv_sock+0x47/0x29c
Jun 22 14:20:48 localhost kernel: [ 1276.432313]  []
tcp_v6_syn_recv_sock+0x37/0x534 [ipv6]
Jun 22 14:20:48 localhost kernel: [ 1276.432482]  []
tcp_check_req+0x1a0/0x2db
Jun 22 14:20:48 localhost kernel: [ 1276.434198]  []
tcp_v4_do_rcv+0x9f/0x2fe
Jun 22 14:20:48 localhost kernel: [ 1276.435911]  []
tcp_v4_rcv+0x932/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.437632]  []
ip_local_deliver+0x159/0x1f1
Jun 22 14:20:48 localhost kernel: [ 1276.439305]  []
ip_rcv+0x3e9/0x416
Jun 22 14:20:48 localhost kernel: [ 1276.440977]  []
netif_receive_skb+0x287/0x317
Jun 22 14:20:48 localhost kernel: [ 1276.442542]  []
rtl8139_poll+0x294/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.442590]  []
net_rx_action+0x8b/0x17c
Jun 22 14:20:48 localhost kernel: [ 1276.444160]  []
__do_softirq+0x54/0xb3
Jun 22 14:20:48 localhost kernel: [ 1276.444335]  []
do_softirq+0x2f/0x47
Jun 22 14:20:48 localhost kernel: [ 1276.60]  []
irq_exit+0x39/0x46
Jun 22 14:20:48 localhost kernel: [ 1276.444585]  [] do_IRQ+0x77/0x84
Jun 22 14:20:48 localhost k

Re: Locking validator output on DCCP

2006-06-21 Thread Ian McDonald

On 6/21/06, Ingo Molnar <[EMAIL PROTECTED]> wrote:


* Herbert Xu <[EMAIL PROTECTED]> wrote:

> > Can someone tell me whether I am correct in my thinking or not? If I
> > am then I will work out how to tell the lock validator not to worry
> > about it.
>
> I agree, this looks bogus.  Ingo, could you please take a look?

sure - Ian, could you try Arjan's fix below?

Ingo


Subject: lock validator: annotate vlan "master" device locks
From: Arjan van de Ven <[EMAIL PROTECTED]>


The fix you sent here was the incorrect one but I did test Arjan's as
per previous e-mail.

Real dumb question time. The lock validator is testing for recursive
lock holding. Given that this is a lock at a different address can we
eliminate all such cases? Or are you trying to detect code here that
keeps on locking same type of lock in case of error and we should
explicitly flag...
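
One way to picture the distinction, as a toy model only. It assumes the validator compares a lock class plus a nesting subclass rather than the lock's address, which is what the (&sk->sk_lock.slock#5/1) labels in the traces in this thread suggest. The names (struct toy_task, toy_acquire()) are made up and this is not how the real validator is written:

```c
#include <assert.h>

#define MAX_HELD 8	/* hypothetical limit for this toy model */

struct held_lock {
	int class_id;	/* shared by every lock of the same type */
	int subclass;	/* 0 = plain lock, 1 = "nested" annotation */
};

struct toy_task {
	struct held_lock held[MAX_HELD];
	int depth;
};

/* Returns 1 if the acquisition is accepted, 0 if the toy validator
 * would report possible recursive locking. */
static int toy_acquire(struct toy_task *t, int class_id, int subclass)
{
	int i;

	assert(t->depth < MAX_HELD);
	for (i = 0; i < t->depth; i++)
		if (t->held[i].class_id == class_id &&
		    t->held[i].subclass == subclass)
			return 0;
	t->held[t->depth].class_id = class_id;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return 1;
}
```

In this model two different sockets still collide if both locks are taken with the same class and subclass, and taking the second one with a different subclass is what marks the nesting as intended.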

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: Locking validator output on DCCP

2006-06-21 Thread Ian McDonald

On 6/21/06, Arjan van de Ven <[EMAIL PROTECTED]> wrote:

On Wed, 2006-06-21 at 10:34 +1000, Herbert Xu wrote:
> > As I read this it is not a recursive lock as sk_clone is occurring
> > second and is actually creating a new socket so they are trying to
> > lock on different sockets.
> >
> > Can someone tell me whether I am correct in my thinking or not? If I
> > am then I will work out how to tell the lock validator not to worry
> > about it.
>
> I agree, this looks bogus.  Ingo, could you please take a look?

Fix is relatively easy:


sk_clone creates a new socket, and thus can never deadlock, and in fact
can be called with the original socket locked. This therefore is a
legitimate nesting case; mark it as such.

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>


---
 net/core/sock.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.17-rc6-mm2/net/core/sock.c
===
--- linux-2.6.17-rc6-mm2.orig/net/core/sock.c
+++ linux-2.6.17-rc6-mm2/net/core/sock.c
@@ -846,7 +846,7 @@ struct sock *sk_clone(const struct sock
/* SANITY */
sk_node_init(&newsk->sk_node);
sock_lock_init(newsk);
-   bh_lock_sock(newsk);
+   bh_lock_sock_nested(newsk);

atomic_set(&newsk->sk_rmem_alloc, 0);
atomic_set(&newsk->sk_wmem_alloc, 0);



When I do this it now shifts around. I'll investigate further
(probably tomorrow).

Now get

Jun 22 14:20:48 localhost kernel: [ 1276.424531]
=
Jun 22 14:20:48 localhost kernel: [ 1276.424541] [ INFO: possible
recursive locking detected ]
Jun 22 14:20:48 localhost kernel: [ 1276.424546]
-
Jun 22 14:20:48 localhost kernel: [ 1276.424553] idle/0 is trying to
acquire lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424559]
(&sk->sk_lock.slock#5/1){-+..}, at: [] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.424585]
Jun 22 14:20:48 localhost kernel: [ 1276.424587] but task is already
holding lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424592]
(&sk->sk_lock.slock#5/1){-+..}, at: []
tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424616]
Jun 22 14:20:48 localhost kernel: [ 1276.424618] other info that might
help us debug this:
Jun 22 14:20:48 localhost kernel: [ 1276.424624] 2 locks held by idle/0:
Jun 22 14:20:48 localhost kernel: [ 1276.424628]  #0:
(&tp->rx_lock){-+..}, at: [] rtl8139_poll+0x42/0x41c
[8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.424666]  #1:
(&sk->sk_lock.slock#5/1){-+..}, at: []
tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424685]
Jun 22 14:20:48 localhost kernel: [ 1276.424686] stack backtrace:
Jun 22 14:20:48 localhost kernel: [ 1276.425002]  []
show_trace_log_lvl+0x53/0xff
Jun 22 14:20:48 localhost kernel: [ 1276.425038]  []
show_trace+0x16/0x19
Jun 22 14:20:48 localhost kernel: [ 1276.425068]  []
dump_stack+0x1a/0x1f
Jun 22 14:20:48 localhost kernel: [ 1276.425099]  []
__lock_acquire+0x8e6/0x902
Jun 22 14:20:48 localhost kernel: [ 1276.425311]  []
lock_acquire+0x4e/0x66
Jun 22 14:20:48 localhost kernel: [ 1276.425510]  []
_spin_lock_nested+0x26/0x36
Jun 22 14:20:48 localhost kernel: [ 1276.425726]  []
sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.427191]  []
inet_csk_clone+0xf/0x67
Jun 22 14:20:48 localhost kernel: [ 1276.428879]  []
tcp_create_openreq_child+0x15/0x32b
Jun 22 14:20:48 localhost kernel: [ 1276.430598]  []
tcp_v4_syn_recv_sock+0x47/0x29c
Jun 22 14:20:48 localhost kernel: [ 1276.432313]  []
tcp_v6_syn_recv_sock+0x37/0x534 [ipv6]
Jun 22 14:20:48 localhost kernel: [ 1276.432482]  []
tcp_check_req+0x1a0/0x2db
Jun 22 14:20:48 localhost kernel: [ 1276.434198]  []
tcp_v4_do_rcv+0x9f/0x2fe
Jun 22 14:20:48 localhost kernel: [ 1276.435911]  []
tcp_v4_rcv+0x932/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.437632]  []
ip_local_deliver+0x159/0x1f1
Jun 22 14:20:48 localhost kernel: [ 1276.439305]  []
ip_rcv+0x3e9/0x416
Jun 22 14:20:48 localhost kernel: [ 1276.440977]  []
netif_receive_skb+0x287/0x317
Jun 22 14:20:48 localhost kernel: [ 1276.442542]  []
rtl8139_poll+0x294/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.442590]  []
net_rx_action+0x8b/0x17c
Jun 22 14:20:48 localhost kernel: [ 1276.444160]  []
__do_softirq+0x54/0xb3
Jun 22 14:20:48 localhost kernel: [ 1276.444335]  []
do_softirq+0x2f/0x47
Jun 22 14:20:48 localhost kernel: [ 1276.60]  []
irq_exit+0x39/0x46
Jun 22 14:20:48 localhost kernel: [ 1276.444585]  [] do_IRQ+0x77/0x84
Jun 22 14:20:48 localhost kernel: [ 1276.444621]  []
common_interrupt+0x25/0x2c



--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

Locking validator output on DCCP

2006-06-20 Thread Ian McDonald

Folks,

I am getting this when I am using DCCP with 2.6.17-rc6-mm2 with Ingo's
lock dependency patch:

Jun 21 09:38:58 localhost kernel: [  102.068588]
Jun 21 09:38:58 localhost kernel: [  102.068592] =
Jun 21 09:38:58 localhost kernel: [  102.068602] [ INFO: possible recursive locking detected ]
Jun 21 09:38:58 localhost kernel: [  102.068608] -
Jun 21 09:38:58 localhost kernel: [  102.068615] idle/0 is trying to acquire lock:
Jun 21 09:38:58 localhost kernel: [  102.068620]  (&sk->sk_lock.slock#3){-+..}, at: [] sk_clone+0x5a/0x190
Jun 21 09:38:58 localhost kernel: [  102.068644]
Jun 21 09:38:58 localhost kernel: [  102.068646] but task is already holding lock:
Jun 21 09:38:58 localhost kernel: [  102.068651]  (&sk->sk_lock.slock#3){-+..}, at: [] sk_receive_skb+0xe6/0xfe
Jun 21 09:38:58 localhost kernel: [  102.068668]
Jun 21 09:38:58 localhost kernel: [  102.068670] other info that might help us debug this:
Jun 21 09:38:58 localhost kernel: [  102.068676] 2 locks held by idle/0:
Jun 21 09:38:58 localhost kernel: [  102.068679]  #0:  (&tp->rx_lock){-+..}, at: [] rtl8139_poll+0x42/0x41c [8139too]
Jun 21 09:38:58 localhost kernel: [  102.068722]  #1:  (&sk->sk_lock.slock#3){-+..}, at: [] sk_receive_skb+0xe6/0xfe
Jun 21 09:38:58 localhost kernel: [  102.068739]
Jun 21 09:38:58 localhost kernel: [  102.068741] stack backtrace:
Jun 21 09:38:58 localhost kernel: [  102.069053]  [] show_trace_log_lvl+0x53/0xff
Jun 21 09:38:58 localhost kernel: [  102.069091]  [] show_trace+0x16/0x19
Jun 21 09:38:58 localhost kernel: [  102.069121]  [] dump_stack+0x1a/0x1f
Jun 21 09:38:58 localhost kernel: [  102.069151]  [] __lock_acquire+0x8e6/0x902
Jun 21 09:38:58 localhost kernel: [  102.069363]  [] lock_acquire+0x4e/0x66
Jun 21 09:38:58 localhost kernel: [  102.069562]  [] _spin_lock+0x24/0x32
Jun 21 09:38:58 localhost kernel: [  102.069777]  [] sk_clone+0x5a/0x190
Jun 21 09:38:58 localhost kernel: [  102.071244]  [] inet_csk_clone+0xf/0x67
Jun 21 09:38:58 localhost kernel: [  102.072932]  [] dccp_create_openreq_child+0x17/0x2fe [dccp]
Jun 21 09:38:58 localhost kernel: [  102.072993]  [] dccp_v4_request_recv_sock+0x47/0x260 [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [  102.073020]  [] dccp_check_req+0x128/0x264 [dccp]
Jun 21 09:38:58 localhost kernel: [  102.073049]  [] dccp_v4_do_rcv+0x74/0x290 [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [  102.073067]  [] sk_receive_skb+0x6b/0xfe
Jun 21 09:38:58 localhost kernel: [  102.074607]  [] dccp_v4_rcv+0x4ea/0x66e [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [  102.074651]  [] ip_local_deliver+0x159/0x1f1
Jun 21 09:38:58 localhost kernel: [  102.076322]  [] ip_rcv+0x3e9/0x416
Jun 21 09:38:58 localhost kernel: [  102.077995]  [] netif_receive_skb+0x287/0x317
Jun 21 09:38:58 localhost kernel: [  102.079562]  [] rtl8139_poll+0x294/0x41c [8139too]
Jun 21 09:38:58 localhost kernel: [  102.079610]  [] net_rx_action+0x8b/0x17c
Jun 21 09:38:58 localhost kernel: [  102.081181]  [] __do_softirq+0x54/0xb3
Jun 21 09:38:58 localhost kernel: [  102.081357]  [] do_softirq+0x2f/0x47
Jun 21 09:38:58 localhost kernel: [  102.081482]  [] irq_exit+0x39/0x46
Jun 21 09:38:58 localhost kernel: [  102.081608]  [] do_IRQ+0x77/0x84
Jun 21 09:38:58 localhost kernel: [  102.081644]  [] common_interrupt+0x25/0x2c
Jun 21 09:38:58 localhost kernel: [  154.463644] CCID: Registered CCID 3 (ccid3)

The code of sk_clone (net/core/sock.c) is:

struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
{
	struct sock *newsk = sk_alloc(sk->sk_family, priority, sk->sk_prot, 0);

	if (newsk != NULL) {
		struct sk_filter *filter;

		memcpy(newsk, sk, sk->sk_prot->obj_size);

		/* SANITY */
		sk_node_init(&newsk->sk_node);
		sock_lock_init(newsk);

The relevant code is the sock_lock_init.

The code of sk_receive_skb (net/core/sock.c) is:

int sk_receive_skb(struct sock *sk, struct sk_buff *skb)
{
	int rc = NET_RX_SUCCESS;

	if (sk_filter(sk, skb, 0))
		goto discard_and_relse;

	skb->dev = NULL;

	bh_lock_sock(sk);

The relevant code is the bh_lock_sock.

As I read this, it is not a recursive lock: sk_clone runs second and is
actually creating a new socket, so the two calls are taking locks on
different sockets.

Can someone tell me whether my thinking is correct? If it is, I will
work out how to tell the lock validator not to worry about it.
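
For what it's worth, a validator that tracks lock classes rather than lock
instances can report this even when the two sockets are distinct objects.
The userspace sketch below is only a toy model of that idea, not the
kernel's lockdep code: toy_lock, toy_task and toy_acquire are made-up names,
and the subclass argument merely mimics what I understand annotations such
as spin_lock_nested() to provide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define TOY_MAX_HELD 8

struct toy_lock {
	int class_id;	/* shared by every lock initialised at the same site */
};

struct toy_held {
	int class_id;
	int subclass;
};

struct toy_task {
	struct toy_held held[TOY_MAX_HELD];
	size_t nheld;
};

/*
 * Record an acquisition. Returns false if the task already holds a lock
 * with the same (class, subclass) pair, which is the situation a
 * class-based validator would flag as possible recursive locking.
 * Also returns false if the held-lock table is full.
 */
bool toy_acquire(struct toy_task *t, const struct toy_lock *l, int subclass)
{
	for (size_t i = 0; i < t->nheld; i++) {
		if (t->held[i].class_id == l->class_id &&
		    t->held[i].subclass == subclass)
			return false;
	}
	if (t->nheld == TOY_MAX_HELD)
		return false;
	t->held[t->nheld].class_id = l->class_id;
	t->held[t->nheld].subclass = subclass;
	t->nheld++;
	return true;
}
```

If a parent and a cloned child socket share a class, taking the child's lock
at subclass 0 while the parent is held is rejected in this model, whereas
taking it at subclass 1 is accepted.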

Thanks,

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How to submit a new module to linux kernel?

2006-05-22 Thread Ian McDonald

On 5/23/06, Erik Mouw <[EMAIL PROTECTED]> wrote:
> On Mon, May 22, 2006 at 03:18:12PM +0800, #ZHOU BIN# wrote:
> > I'm new in this mailing list. I implemented a new TCP congestion
> > control module for linux kernel 2.6.16.13.
> > Does anybody know how to apply for the integration of it into the
> > linux kernel? How long will this process take?
>
> See Documentation/SubmittingPatches in your kernel tree.



I would also add that for this type of patch a peer reviewed paper
outlining the congestion control work would be useful/needed.

Every person wants to improve TCP and has written a congestion control
mechanism (myself included!) but that doesn't mean it is worth
including in the kernel (mine certainly isn't).

In particular, for TCP congestion control you need to show in which
cases it is better than others, in which cases it is worse, and how
fair it is to other TCP flows.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: address pingable with interface down

2006-05-10 Thread Ian McDonald

> So where's the linux networking faq? I've been lurking here long enough
> to know that there's no shortage of faqs, but there's no canonical
> netdev faq that i'm aware of. Maybe one should be started?
>
> Jason

http://linux-net.osdl.org/index.php/ is the canonical Linux networking wiki.

I've added this FAQ under IPv4. I'm sure that if this isn't the best place,
someone will move it, this being a wiki :-)

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: latest -stable breaks Squid

2006-05-03 Thread Ian McDonald

On 5/4/06, Ben Greear <[EMAIL PROTECTED]> wrote:

> Herbert Xu wrote:
> > Dave Jones <[EMAIL PROTECTED]> wrote:
> >
> >>So I pushed out an update for Fedora Core 5 users yesterday
> >>that moved the kernel from 2.6.16.9 to 2.6.16.13.
> >>I've since heard "My network performance is awful", and worse
> >>yet, some apps seem broken as in the report below.
> >>
> >>Anyone have any ideas ?
> >
> >
> > Try reverting the e1000 truesize patch.  Although the fix is 100%
> > correct, it might have a negative impact on user-space apps with
> > particularly small rcvbuf settings.  Prior to the fix, due to the
> > incorrect accounting we are essentially enlarging rcvbuf by as much
> > as 10 times.
>
> At least one of the reports shows problems with non e1000 NICs, so it's
> probably not just the e1000 change.
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=190620
>
> Ben


Wouldn't commit 5d0b6f2bdaf7e016e750cd24164a241512d968a3 be more likely,
as it touches net/ipv4/tcp_output.c and is also in the same general area?
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [offlist] Re: [LARTC] how to do probabilistic packet loss in kernel?

2006-04-19 Thread Ian McDonald
On 4/20/06, George Nychis <[EMAIL PROTECTED]> wrote:
> Hey Martin,
>
> I was able to do it with netem and its working great now.
>
> I've actually moved on to another challenge, I would like to drop
> packets at the hardware level such as to see rate control.
>
Have a look at:
http://linux-net.osdl.org/index.php/Netem#Rate_control

Works well for me...

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: how to do probabilistic packet loss in kernel?

2006-04-16 Thread Ian McDonald
On 4/17/06, George P Nychis <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am using iproute2 to set up forwarding, adding routes like "ip route add
> 192.168.1.3 via 192.168.1.2"
>
> I was wondering where in the kernel I can insert probabilistic packet loss
> only for forwarded packets?  So that for instance I can drop 5% of all
> forwarded packets?
>
Have a look at:
http://linux-net.osdl.org/index.php/Netem

--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Iperf support for DCCP and selectable TCP congestion control

2006-03-30 Thread Ian McDonald
Folks,

I've just posted a patch at
http://wand.net.nz/~iam4/software/congestion-iperf-2.0.2-1.diff which
adds the ability to select the TCP congestion control mechanism in
iperf for TCP performance testing. Thanks to Angelo Castellani for
writing this; I have tidied it up a little and merged it into the patch,
which also has DCCP support.

Hope the netdev people don't mind me spamming the list but I find this
quite a useful testing tool whenever testing TCP/DCCP changes to see
if regression or progression is occurring...

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: Writing a rate based transport protocol

2006-03-22 Thread Ian McDonald
>  The qdiscs would ideally exist at the layer 2 / layer 3 boundary like
> existing qdiscs, but the problem is getting the scheduling parameters down
> that far.  Perhaps a transport protocol could create a tagged route entry
> with the appropriate parameters, the routing layer could assign skbs to it
> by flow tag, and the qdisc could refer to the route entry and some sort of
> skb sequence number to derive the appropriate scheduling information.
>
I'm trying to do something a little bit different but I send expiry
time down through the msg structure with dccp_sendmsg and then check
it in dccp_write_xmit and discard there. This is trivial to implement
(even I managed it!) but I think you are wanting to do it down one
layer.
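
To make the idea concrete, here is a rough userspace sketch of an expiry
check at transmit time. It is not the DCCP code: toy_pkt, toy_pkt_alive and
toy_write_xmit are names invented for illustration, and treating an expiry
of 0 as "never expires" is my own assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor; real code would carry this with the skb. */
struct toy_pkt {
	uint64_t expiry_us;	/* absolute deadline; 0 means "never expires" */
	size_t len;		/* payload length in bytes */
};

/* True if the packet is still worth sending at time now_us. */
static bool toy_pkt_alive(const struct toy_pkt *p, uint64_t now_us)
{
	return p->expiry_us == 0 || now_us <= p->expiry_us;
}

/*
 * Walk the queue in order, count packets that would be transmitted and
 * count expired packets in *dropped instead of sending them.
 */
size_t toy_write_xmit(const struct toy_pkt *q, size_t n, uint64_t now_us,
		      size_t *dropped)
{
	size_t sent = 0;

	*dropped = 0;
	for (size_t i = 0; i < n; i++) {
		if (toy_pkt_alive(&q[i], now_us))
			sent++;		/* a real sender would transmit here */
		else
			(*dropped)++;	/* stale data is discarded, not sent */
	}
	return sent;
}
```

The point of checking at the last moment before transmit is that a packet
which was fresh when queued may have gone stale while waiting for the rate
limiter.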

If you want further info or sourcecode then feel free to take this
discussion further with me offline...

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: Writing a rate based transport protocol

2006-03-22 Thread Ian McDonald
On 3/23/06, Mark Butler <[EMAIL PROTECTED]> wrote:
> I understand that timed intervals between individual packets is not
> realistic in general.  What I have in mind is a fixed granularity
> transmission timer, where packets are assigned to buckets, and
> transmitted one bucket per timer expiration.

Why is it not realistic?
>
> From a protocol design point of view, the main question is which is
> more expensive, rate based timer expiration, or generating ACKs at a
> high enough rate to self clock.  With 1 Gb/sec reliable transport
> protocol, every other packet ACK generation, and Ethernet MTU size
> packets, ACKs are generated every 30 usec on average. With Van Jacobson
> style pre-queuing a large percentage of them are wasted overhead,
> because a long series of them accumulate in the prequeue before the
> receiving thread is activated.

Various others look at doing things like certain number of ACKs per
RTT rather than per packet or fixed number of packets.

For both using interpacket intervals and differing ACK strategies have
a look at TFRC:
http://www.icir.org/tfrc/

This seems to work quite well for me in all the testing I've done
although I have only tested up to 100 Mbits - but this tested OK on
500 MHz machines so newer machines should handle faster rates well.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: Writing a rate based transport protocol

2006-03-22 Thread Ian McDonald
> The bigger problem is that to be effective rate control needs accurate
> real time. Linux is doing better at real time, but still providing useful
> high speed inter packet spacing is beyond the current capabilities. To get
> around this I think most high speed 10G cards provide some form of rate
> control in firmware.
> -
At present most of the network timing for TCP (and probably other
protocols) is in milliseconds for measures such as RTT. We found when
writing DCCP CCID3 that this did not provide enough granularity
when you compute the interpacket interval as 1/transmit rate * constant.
As such we put quite a few things in microseconds.
If you are serious about writing this, have a look at the net/dccp files,
and ccid3 in particular. For example, we also put a whole lot of
integer division code in there which you will find useful.
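
As a rough illustration of the granularity problem (a sketch only, not the
ccid3 code; the function and macro names are made up), the interval can be
computed with integer arithmetic as packet size times a time constant
divided by the send rate:

```c
#include <stdint.h>

#define IPI_USEC_PER_SEC 1000000ULL
#define IPI_MSEC_PER_SEC 1000ULL

/*
 * Inter-packet interval in microseconds for packets of s bytes sent at
 * x bytes per second. Multiply before dividing to keep precision.
 */
uint64_t toy_ipi_usecs(uint32_t s, uint32_t x)
{
	if (x == 0)
		return UINT64_MAX;	/* no rate: avoid dividing by zero */
	return (uint64_t)s * IPI_USEC_PER_SEC / x;
}

/* Same calculation in milliseconds, to show where the precision goes. */
uint64_t toy_ipi_msecs(uint32_t s, uint32_t x)
{
	if (x == 0)
		return UINT64_MAX;
	return (uint64_t)s * IPI_MSEC_PER_SEC / x;
}
```

For 1500 byte packets at 12500000 bytes/sec (100 Mbit/s) the microsecond
version gives 120, while the millisecond version truncates to 0, which is
useless for pacing.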

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: net-2.6.17 rebased...

2006-03-01 Thread Ian McDonald
And *** again... someone pointed out my mailer is wrapping lines. I'll
go hide in the corner and beat myself up. This time attached as I
can't get gmail to defeat line wrapping. I promise I'll get it right
next patch so I don't humiliate myself quite so much next time

Dave,
If you get a chance can you push the ccid3 divide by zero fix upstream
to Linus for 2.6.16 as it has no functionality changed and eliminates
a nasty little bug...
The commit for this is b6da19617f4ab610d3d90bcbdf65fa7e2b3d7b53 in your tree

I have also put at end of this e-mail after reapplying on linus tree
so above commit doesn't have fuzz...

[DCCP] ccid3: Divide by zero fix

In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which leads to
a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check for zero return
now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index aa68e0a..35d1d34 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -2,7 +2,7 @@
  *  net/dccp/ccids/ccid3.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
  *
  *  An implementation of the DCCP protocol
  *
@@ -1033,9 +1033,13 @@ static void ccid3_hc_rx_packet_recv(stru
 	p_prev = hcrx->ccid3hcrx_p;
 	
 	/* Calculate loss event rate */
-	if (!list_empty(&hcrx->ccid3hcrx_li_hist))
+	if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
+		u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+
 		/* Scaling up by 100 as fixed decimal */
-		hcrx->ccid3hcrx_p = 100 / dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+		if (i_mean != 0)
+			hcrx->ccid3hcrx_p = 100 / i_mean;
+	}
 
 	if (hcrx->ccid3hcrx_p > p_prev) {
 		ccid3_hc_rx_send_feedback(sk);


Re: net-2.6.17 rebased...

2006-03-01 Thread Ian McDonald
F**k - just pasted in the wrong file. Trying again

On 3/2/06, Ian McDonald <[EMAIL PROTECTED]> wrote:
> On 3/2/06, David S. Miller <[EMAIL PROTECTED]> wrote:
> >
> > This tree was getting crufty, so I rebased it today.
> > It was actually a lot easier than I had anticipated.
> >
> > master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.17.git
> >
> Dave,
>
> If you get a chance can you push the ccid3 divide by zero fix upstream
> to Linus for 2.6.16 as it has no functionality changed and eliminates
> a nasty little bug...
>
> The commit for this is b6da19617f4ab610d3d90bcbdf65fa7e2b3d7b53 in your tree
>
I have also put at end of this e-mail after reapplying on linus tree
so above commit doesn't have fuzz...

[DCCP] ccid3: Divide by zero fix

In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which leads to
a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check for zero return
now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index aa68e0a..35d1d34 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -2,7 +2,7 @@
  *  net/dccp/ccids/ccid3.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
  *
  *  An implementation of the DCCP protocol
  *
@@ -1033,9 +1033,13 @@ static void ccid3_hc_rx_packet_recv(stru
p_prev = hcrx->ccid3hcrx_p;

/* Calculate loss event rate */
-   if (!list_empty(&hcrx->ccid3hcrx_li_hist))
+   if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
+   u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+
/* Scaling up by 100 as fixed decimal */
-   hcrx->ccid3hcrx_p = 100 /
dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+   if (i_mean != 0)
+   hcrx->ccid3hcrx_p = 100 / i_mean;
+   }

if (hcrx->ccid3hcrx_p > p_prev) {
ccid3_hc_rx_send_feedback(sk);


Re: net-2.6.17 rebased...

2006-03-01 Thread Ian McDonald
On 3/2/06, David S. Miller <[EMAIL PROTECTED]> wrote:
>
> This tree was getting crufty, so I rebased it today.
> It was actually a lot easier than I had anticipated.
>
> master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.17.git
>
Dave,

If you get a chance can you push the ccid3 divide by zero fix upstream
to Linus for 2.6.16 as it has no functionality changed and eliminates
a nasty little bug...

The commit for this is b6da19617f4ab610d3d90bcbdf65fa7e2b3d7b53 in your tree

I have also put at end of this e-mail after reapplying on linus tree
so above commit doesn't have fuzz...

[DCCP] ccid3: Divide by zero fix

In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which leads to
a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check for zero return
now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

--- 2b82f96f1291c42ee9485465801f5f51897bec64
+++ ff426a9009993445a15cfcac6c88be1e39e07913
@@ -2,7 +2,7 @@
  *  net/dccp/ccids/ccid3.c
  *
  *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
  *
  *  An implementation of the DCCP protocol
  *
@@ -1014,9 +1014,13 @@ static void ccid3_hc_rx_packet_recv(stru
p_prev = hcrx->ccid3hcrx_p;

/* Calculate loss event rate */
-   if (!list_empty(&hcrx->ccid3hcrx_li_hist))
+   if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
+   u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+
/* Scaling up by 100 as fixed decimal */
-   hcrx->ccid3hcrx_p = 100 /
dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+   if (i_mean != 0)
+   hcrx->ccid3hcrx_p = 100 / i_mean;
+   }

if (hcrx->ccid3hcrx_p > p_prev) {
ccid3_hc_rx_send_feedback(sk);


Re: tg3 losing promisc rx_mode bit

2006-02-23 Thread Ian McDonald
On 2/24/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> This is a known problem caused by ASF or IPMI firmware overwriting the
> promiscuous mode bit. I will have someone contact you to get the
> firmware upgraded.
>
> Thanks.
>
Thinking out loud here without reading source... - can you check the
version of the firmware and make noise if they have a version like
this one?

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH] mtu probing: move tcp-specific data out of inet_connection_sock

2006-02-16 Thread Ian McDonald
> No Ian. John was the one that moved those fields out of tcp.h and into
> inet_connection_sock.h:
>
> http://master.kernel.org/git/?p=linux/kernel/git/acme/net-2.6.17.git;a=commit;h=55bb045aa49d5e5234c6213d1ed0bfef0c636971
>
> When we get to fix the DCCP PMTU code we can revisit if this move is
> interesting.
>
OK. Will teach me to hit send without researching my facts. Sorry to
all. Carry on as normal...

Ian
--
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [PATCH] mtu probing: move tcp-specific data out of inet_connection_sock

2006-02-16 Thread Ian McDonald
On 2/17/06, John Heffner <[EMAIL PROTECTED]> wrote:
> This moves some TCP-specific MTU probing state out of
> inet_connection_sock back to tcp_sock.
>
> Signed-off-by: John Heffner <[EMAIL PROTECTED]>
>

Why do you want to do this? What benefit does it give?

I would like to see PMTU done in DCCP and this seems a better place
and probably why Arnaldo put it there

Ian
--
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: KERNEL: assertion (!sk->sk_forward_alloc) failed

2006-02-09 Thread Ian McDonald
On 2/10/06, Boris B. Zhmurov <[EMAIL PROTECTED]> wrote:
> Hello, Ian McDonald.
>
> On 09.02.2006 22:25 you said the following:
>
> > Is it possible for you to download 2.6.16-rc2 or similar and see if it
> > goes away?
>
> It would be better if I got only the patch that fixes that problem, not all of 2.6.16-rc2.
>
Oops I didn't read Jesse's message earlier properly.

The patch that probably fixed it is (from his message):
> I think the commit id that is missing from 2.6.14.X is
> fb5f5e6e0cebd574be737334671d1aa8f170d5f3
>
> but here is the web link if i gave the wrong info
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fb5f5e6e0cebd574be737334671d1aa8f170d5f3
--
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand


Re: KERNEL: assertion (!sk->sk_forward_alloc) failed

2006-02-09 Thread Ian McDonald
On 2/10/06, Boris B. Zhmurov <[EMAIL PROTECTED]> wrote:
> Hello, Jesse Brandeburg.
>
> On 08.02.2006 23:07 you said the following:
>
> > what's the relevance of e1000?
> >
> > I thought Herbert had fixed these
>
> Nope :( I had these messages on 2.6.14.2 and now I have them on 2.6.15.3.
>
For what it's worth I had these messages for a while and they got
fixed 2 or 3 weeks ago from memory in Dave's 2.6.16 net tree or net2.6
tree.

Is it possible for you to download 2.6.16-rc2 or similar and see if it
goes away?

Ian
--
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand


Re: [PATCH] [IPVS] Shrink ip_vs_*.c includes

2006-02-07 Thread Ian McDonald
> Unfortunately this seems like it is going to be more tedious than
> we first thought. I would guess writing some sort of tool to analyse
> symbols and headers is the way to go. Else it seems more or less
> impossible to clean up headers, even on a small scale.
>
Search the netdev archives or look at Arnaldo's kernel.org space, as he
once wrote some scripts to do this.

--
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand


Re: [e2e] FW: Performance evaluation of high speed TCPs

2006-02-02 Thread Ian McDonald
>  Seriously, where's the value in comparing buggy implementations - isn't
> that just a waste of all our time ?  If we are genuine about wanting to
> understand tcp performance then I think we just have to take the hit from
> issues such as this that are outside all of our control.
>
A real part of the problem here is that Linux doesn't have a full
TCP testing suite and doesn't have build checking for regressions
in TCP variants. As I understand it, the only thing tested in
nightly builds is throughput for the default TCP.

Stephen Hemminger has done some work on TCP Probes but this is where I
think real progress could be made in improving Linux TCP. I may get
around to doing this myself at some point in my research but would
welcome other people doing it also!

Ian
--
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand


Re: [PATCH] 1/1 net/core: use USEC_PER_SEC and line spacing

2006-02-01 Thread Ian McDonald
> Yup, that is my current understanding. Heck:
>
> 1. using hrtimers in DCCP
> 2. Jumping into VJ's net channels to implement Eddie Kohler packet
> rings DCCP API
> 3. reviewing Andrea's CCID2 code and merging it
> 4. reviewing Ian's work on using sk_write_queue
> 5. reviewing/merging Andrea's feature negotiation patches
> 6. making DCCP rock solid (Hi Sorbo 8-) )
> 7. getting ostra to be easy to use and usable with userspace code
> 8. Real Customer Work(tm)
>
> too many things on my plate right now 8)
>
Hold off on #4 as I've found that I'm only creating a queue with depth
1. Reworking at present

Ian
--
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand


Re: [PATCH] 1/1 net/core: use USEC_PER_SEC and line spacing

2006-02-01 Thread Ian McDonald
On 2/1/06, Herbert Xu <[EMAIL PROTECTED]> wrote:
> Ian McDonald <[EMAIL PROTECTED]> wrote:
> >
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -162,7 +162,8 @@ static int sock_set_timeout(long *timeo_
> >if (tv.tv_sec == 0 && tv.tv_usec == 0)
> >return 0;
> >if (tv.tv_sec < (MAX_SCHEDULE_TIMEOUT/HZ - 1))
> > -   *timeo_p = tv.tv_sec*HZ + 
> > (tv.tv_usec+(1000000/HZ-1))/(1000000/HZ);
> > +   *timeo_p = tv.tv_sec*HZ +
> > +   (tv.tv_usec+(USEC_PER_SEC/HZ-1))/(USEC_PER_SEC/HZ);
>
> Is there a macro for this calculation? If not could we add one?
>
I don't know if there is or not. There is similar code in DCCP. I
think the way forward is to use hrtimers
(http://lwn.net/Articles/167897/) as there are currently problems with
NTP changing time which affects jiffies. In the meantime this patch
makes the code a little bit tidier so I think it should go in...
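
For illustration, the calculation could be wrapped up along these lines.
This is only a userspace sketch: the toy_ names are invented, hz is passed
as a parameter rather than using the kernel's HZ, and I am assuming hz is
positive and no larger than TOY_USEC_PER_SEC.

```c
#define TOY_USEC_PER_SEC 1000000L

/*
 * Convert seconds plus microseconds to timer ticks, rounding any
 * partial tick up so the timeout is never shorter than requested.
 */
long toy_timeval_to_ticks(long sec, long usec, long hz)
{
	long usec_per_tick = TOY_USEC_PER_SEC / hz;

	return sec * hz + (usec + usec_per_tick - 1) / usec_per_tick;
}

/* Convert ticks back to seconds and microseconds (rounds down). */
void toy_ticks_to_timeval(long ticks, long hz, long *sec, long *usec)
{
	*sec = ticks / hz;
	*usec = ((ticks % hz) * TOY_USEC_PER_SEC) / hz;
}
```

With hz = 250 a tick is 4000 usec, so 4000 usec maps to one tick and 4001
usec rounds up to two.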

Ian
--
Ian McDonald
http://wand.net.nz/~iam4
WAND Network Research Group
University of Waikato
New Zealand


[PATCH] 1/1 net/core: use USEC_PER_SEC and line spacing

2006-01-31 Thread Ian McDonald
This uses the USEC_PER_SEC constant instead of a literal 1000000. It also
fixes lines longer than 80 characters in a couple of places.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>

diff --git a/net/core/sock.c b/net/core/sock.c
index 6e00811..1d06ec9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -162,7 +162,8 @@ static int sock_set_timeout(long *timeo_
if (tv.tv_sec == 0 && tv.tv_usec == 0)
return 0;
if (tv.tv_sec < (MAX_SCHEDULE_TIMEOUT/HZ - 1))
-   *timeo_p = tv.tv_sec*HZ + 
(tv.tv_usec+(1000000/HZ-1))/(1000000/HZ);
+   *timeo_p = tv.tv_sec*HZ +
+   (tv.tv_usec+(USEC_PER_SEC/HZ-1))/(USEC_PER_SEC/HZ);
return 0;
 }

@@ -561,7 +562,8 @@ int sock_getsockopt(struct socket *sock,
v.tm.tv_usec = 0;
} else {
v.tm.tv_sec = sk->sk_rcvtimeo / HZ;
-   v.tm.tv_usec = ((sk->sk_rcvtimeo % HZ) * 
1000000) / HZ;
+   v.tm.tv_usec = ((sk->sk_rcvtimeo % HZ)
+   * USEC_PER_SEC) / HZ;
}
break;

@@ -572,7 +574,8 @@ int sock_getsockopt(struct socket *sock,
v.tm.tv_usec = 0;
} else {
v.tm.tv_sec = sk->sk_sndtimeo / HZ;
-   v.tm.tv_usec = ((sk->sk_sndtimeo % HZ) * 
1000000) / HZ;
+   v.tm.tv_usec = ((sk->sk_sndtimeo % HZ)
+   * USEC_PER_SEC) / HZ;
}
break;

