Re: [IPV6]: Fix IPsec datagram fragmentation

2008-02-15 Thread David Stevens
I think the field name is kind of confusing here. It appears
that local_df == 1 means allow fragmentation and
local_df == 0 means don't fragment. But DF to me
means don't fragment, as in IP_DF, so the field value
is backwards from what I'd expect. And maybe that's
what happened to in your patch, too, Herbert.

For the future, maybe we should rename that, or reverse
the sense of it (in v4 as well). :-)

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: broken link-local multicast?

2008-02-14 Thread David Stevens
Pekka,
I first thought the interface might be down when I saw this
one, except the interface route is present in the route list he shows
later. That's normally deleted when the interface is down.
And a failure to be in the group just wouldn't answer--
shouldn't cause net unreachable (which is a failure to send it).

Marco,
You called this a firewall -- do you get the same thing
when you have no iptables rules?

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: AW: Problem receiving multicast/promiscuous-mode with kernel.2.6.24

2008-02-11 Thread David Stevens
Another relevant piece of information is what the socket is
bound to; netstat -au will tell you if you're bound to an address
that will match the incoming multicast packets.

If you suspect that the VLAN device is the problem, then it'd be
a good idea to try it on ordinary ethernet.

Re: imr_ifindex; using a 0 ifindex  and INADDR_ANY interface is valid,
but it usually doesn't make sense. What it means is I don't care what
interface I join the group on. But you'll only receive multicasts
delivered to the group on the interface the kernel picks for you, which
often won't be the one you really want.

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ipv6_chk_acast_addr remove unused loop

2008-02-08 Thread David Stevens
NACK.

Daniel,
This code is part of the in-kernel API for anycast, which is 
intended
to be the same as for multicast. By removing support for NULL there, 
you're
making a special case for the anycast code that isn't there in the 
multicast
code (can't support a NULL dev), and you really aren't buying all that 
much for it.

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem receiving multicast/promiscuous-mode with kernel.2.6.24

2008-02-07 Thread David Stevens
You need to join the multicast group on the interface you want
to receive it. If you're setting both imr_address and imr_index
to 0, you haven't specified an interface.

Instead of setting imr_ifindex to 0, try setting it like this:

imr.imr_ifindex = if_nametoindex(eth0);

 (or whatever interface you need).

You can verify that you've joined the group on the correct
interface with netstat -g, also. Make sure that the interface
field is correct for the group you want.

And because I can't resist mentioning it, inet_aton is deprecated--
suggest inet_pton() with appropriate arguments and return value
checking instead.

There was a bug fix where the sense of promiscuous mode was
backwards for multicasting, which is probably why it was accidently
(and incorrectly) working in earlier kernels.

If that doesn't do it for you, let me know; better if you can post the
full code (or even better, the smallest subset that demonstrates the
problem).

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] IPV6: RFC 2011 compatibility broken

2008-01-21 Thread David Stevens
RFC 2011 doesn't apply to IPv6, and the internal names of /proc entries
are not used by the SNMP protocol, but it is an unintentional 
incompatibility with the
previous Linux entry names, so I agree. :-)

+-DLS

Acked-by: David L Stevens [EMAIL PROTECTED]

[EMAIL PROTECTED] wrote on 01/21/2008 01:46:44 AM:

 [IPV6]: RFC 2011 compatibility broken
 
 The snmp6 entry name was changed, and it broke compatibility
 to RFC 2011.
 
 Signed-off-by: Wang Chen [EMAIL PROTECTED]

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated

2008-01-16 Thread David Stevens
Herbert Xu [EMAIL PROTECTED] wrote on 01/16/2008 03:49:01 AM:

 Actually having the icmp_out_count call in ip_push_pending_frames seems
 inconsistent.  Having it there means that we count raw socket ICMP 
packets
 too.  But we don't do that for any other protocol, e.g., raw UDP packets
 don't get counted.

Herbert,
The patch was to support the ICMPMsgStats table. Since none of 
certain
types of output ICMP messages are generated by the kernel, but are 
required
by the RFC, counting raw sockets is intentional (and the only way those 
ICMP
types can be counted at all).
Raw UDP packets would not be counted either before or after the 
patch,
but aren't part of the ICMPMsgStats table. Adding those might be 
worthwhile,
but it isn't quite the hole that the ICMP out stats were, since there is a
cooked interface for UDP output that counts the common use, at least.

Wang,
I think your patch is correct; did you test the same case for 
IPv6?

 +-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated

2008-01-16 Thread David Stevens
[EMAIL PROTECTED] wrote on 01/16/2008 03:17:29 PM:

 Fair enough.  How about moving this code back into icmp.c and just
 add a new count call in raw.c? The push pending function is used on
 the UDP fast path so the leaner it is the better.

I started out with it there, but it certainly wasn't
cleaner. You have to peek on the queue, which I didn't
want to do for all the SMP issues that brings in, and
then replicate all the code push_pending_frames does to
get you a protocol header pointer in one buffer, and there
are multiple ways in the raw path to get you there (so the
counter code itself would appear in multiple places).
I don't think the 2 instructions measurably impact
anything in the fast path and that is the earliest common
point for the multiple paths to generate ICMP output where
the header and type are available without replicating code.
So, simplicity is exactly why I put it there, rather than
the obvious places at the higher layer. :-) It's probably
the better, single place to put the other protocol out messages
counters, too, since those probably also have multiple paths
that end up here, but ICMPMsgStats was all that patch was
after.


+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()

2008-01-07 Thread David Stevens
Brian,
Looks good to me.

+-DLS


Acked-by: David L Stevens [EMAIL PROTECTED]

 How about the simple patch below?  I just removed the ENINVAL check from 

 my original patch, but it accomplishes the same thing.
...
 
 Signed-off-by: Brian Haley [EMAIL PROTECTED]
 ---
 diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
 index 2ed689a..5d4245a 100644
 --- a/net/ipv6/datagram.c
 +++ b/net/ipv6/datagram.c
 @@ -123,11 +123,11 @@ ipv4_connected:
  goto out;
   }
   sk-sk_bound_dev_if = usin-sin6_scope_id;
 - if (!sk-sk_bound_dev_if 
 - (addr_type  IPV6_ADDR_MULTICAST))
 -fl.oif = np-mcast_oif;
}
 
 +  if (!sk-sk_bound_dev_if  (addr_type  IPV6_ADDR_MULTICAST))
 + sk-sk_bound_dev_if = np-mcast_oif;
 +
/* Connect to link-local address requires an interface */
if (!sk-sk_bound_dev_if) {
   err = -EINVAL;

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 7/9][NETNS][IPV6] make mld_max_msf per namespace

2008-01-03 Thread David Stevens
Daniel Lezcano [EMAIL PROTECTED] wrote on 01/03/2008 03:00:48 AM:
...
 With this solution, we can handle different values for the namespaces 
 but these values are driven by the initial network namespace because 
 their values are lesser or equal to the one from the initial network 
 namespace.
 
 Is it acceptable ?

Daniel,
If you have the premise that there's a reason for them to be
different, then your original implementation is fine already. It
requires root privilege to change the value, so I don't mind the
ability to raise it to a higher value later.
I don't object, but I don't understand. I can't think of
any circumstances where I would want to modify it per namespace.
Making it small is not an effective restriction, since someone
*wanting* to use lots of sources can simply do them on different
sockets of the same group. The point is to catch accidental silly
use and it's protecting a global resource so differing values
just change the threshold at which you catch accidental silly
use in different namespaces.
Setting it to 0 might be a method of preventing its
use entirely in some namespaces, but it's part of the socket
interface-- disabling it isn't something you generally want to
do, either.
Are you intending to convert all variables to be
per-namespace? If not -- that is, if you will have global
sysctl variables, then I think this one should be one of
those. Actually, all of the IGMP  MLD variables are tied
naturally to the shared interfaces, so should be global,
I think.

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 7/9][NETNS][IPV6] make mld_max_msf per namespace

2008-01-02 Thread David Stevens
Daniel,
I'm not sure what benefit you get from making this per-namespace.
The point of it is really to prevent one (non-root, even) application from
killing machine performance with source filters (because maintaining them
is an n^2 algorithm). It's a weak constraint, but the resources it's 
protecting are
the processor and MLDv2 packet counts. If any one namespace has a
large value, all will have a problem still, and  (even without your 
patch),
lots of separate source filters can still cause a problem. What it catches
is one application creating thousands (or millions) of source filters and
killing the machine and network with MLDv2 reports as a result. Why
shouldn't that remain global?

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()

2007-12-19 Thread David Stevens
Vlad Yasevich [EMAIL PROTECTED] wrote on 12/19/2007 07:20:53 AM:

 But this still requires either a SO_BINDTODEVICE or sin6_scope_id.  This
 means the an application can call BINDTODEVICE(eth0), MULTICAST_IF(eth1)
 issue a connect on a UDP socket an succeed?  Seems wrong to me.
 
 Can you check section 6.7 of RFC 3542.

No, it requires one of SO_BINDTODEVICE, sin6_scope_id, or 
IPV6_MULTICAST_IF.
If you do an SO_BINDTODEVICE(eth0) and then an IPV6_MULTICAST_IF(eth1), 
the
IPV6_MULTICAST_IF will fail in setsockopt (EINVAL), because it requires a 
match
for bound sockets. I'm not sure if SO_BINDTODEVICE resets mcast_oif if you 
do
them in the reverse order, but that would be a bug in SO_BINDTODEVICE.
The precedence order as implemented already is:

SO_BINDTODEVICE is highest and always wins
sin6_scope_id next
IPV6_MULTICAST_IF

and the existing code has the rule that all link-local addresses require a
sin6_scope_id. The change (intended) is to relax the sin6_scope_id rule 
only
for link-local multicasts that have done either an SO_BINDTODEVICE or
IPV6_MULTICAST_IF already.

 +-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()

2007-12-19 Thread David Stevens
Brian Haley [EMAIL PROTECTED] wrote on 12/19/2007 07:35:46 AM:
...
  if (usin-sin6_scope_id)
  sk-sk_bound_dev_if = usin-sin6_scope_id;
  if (!sk-sk_bound_dev_if 
   (addr_type  IPV6_ADDR_MULTICAST))
  fl.oif = np-mcast_oif;
 
 This assignment will not get us past the next check...

Yeah, that's what I get for typing in off-the-cuff code. What
I was thinking was the fl.oif assignment instead was:
if (!sk-sk_bound_dev_if 
(addr_type  IPV6_ADDR_MULTICAST))
sk-sk_bound_dev_if = np-mcast_oif;

Which it is not, but maybe it could be, since this is a connect().

That patch looks better, but I'm wondering if we could just remove the
requirement that sin6_scope_id be set here if it's multicast, since it
is doing the following later in the code:

if (!fl.oif  (addr_typeIPV6_ADDR_MULTICAST))
fl.oif = np-mcast_oif;

So, really, all we need to do is get through the LINKLOCAL section
without error in the multicast case and we can remove the redundant
multicast check there. I think that'd be simpler.

I also note that sin6_scope_id appears not to be honored at all in
the non-linklocal case, which may be correct, but surprises me.

I want to look a little more at this; I know you have a customer
issue, so I'll make it quick.

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()

2007-12-19 Thread David Stevens
 
 We would still have to check np-mcast_oif is set in the link-local case 

 since we shouldn't be getting here with a zero.

Actually, that's one of the things I wanted to look into. I'm not
sure if there's a path through here with (even non-linklocal) multicasts
that end up without an interface set. And I'd to do some testing of
different cases to see if they behave as expected; also compare to the
similar sendmsg() cases.

 
 Don't worry about that, they can wait, and I'm leaving for 10 days 
 anyways...

Good, thanks. It has been this way for years now, and I'm out
too. :-) Enjoy,

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()

2007-12-18 Thread David Stevens
Brian Haley [EMAIL PROTECTED] wrote on 12/18/2007 12:57:54 PM:

 Trying to connect() to an IPv6 link-local multicast address by
 specifying the outgoing multicast interface doesn't work, you have to
 bind to a device first with an SO_BINDTODEVICE setsockopt() call.

No, you simply have to specify sin6_scope_id for link-scope
addresses, like you do in unicast cases. Your patch requires them
to match (if specified), but I don't think IPV6_MULTICAST_IF should
override or require a match for a valid sin6_scope_id (or be an error).
If I read it correctly, the existing code uses IPV6_MULTICAST_IF
if the sin6_scope_id is not set, otherwise honors the interface specified
in the connect. That seems like correct behaviour to me, and RFC 3493
doesn't address the relative precedence of the two that I see. This is
in the linklocal branch, and all unicast linklocal's require specifying
sin6_scope_id. Multicast doesn't if require a scope_id in the case where
you've done an IPV6_MULTICAST_IF, but it should still allow a different
scope_id when you have used IPV6_MULTICAST_IF.

Do you have application code that you believe is correct that
doesn't work?

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()

2007-12-18 Thread David Stevens
Brian,

OK, I see what you're trying to fix now.

I think the scope_id checks are not quite right-- they
should be something like this:

if (addr_typeIPV6_ADDR_LINKLOCAL) {
if (addr_len = sizeof(struct sockaddr_in6)) {
if (sk-sk_bound_dev_if  usin-sin6_scope_id 
sk-sk_bound_dev_if != usin-sin6_scope_id) {
err = -EINVAL;
goto out;
}
if (usin-sin6_scope_id)
sk-sk_bound_dev_if = usin-sin6_scope_id;
if (!sk-sk_bound_dev_if 
 (addr_type  IPV6_ADDR_MULTICAST))
fl.oif = np-mcast_oif;
}

/* connect to the link-local addres requires an interface */
if (!sk-sk_bound_dev_if) {
err = -EINVAL;
goto out;
}
}

That is (in English):

If I did an SO_BINDTODEVICE and specified sin6_scope_id,
then they better agree.
If I specified sin6_scope_id without SO_BINDTODEVICE, set
the device to that.
If I get this far without a device and it's multicast, use 
mcast_oif
If I get all through that and don't have a device, EINVAL.

Does that work for you?

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


all is not right with the world...

2007-12-09 Thread David Stevens
I don't know if you've fixed it already, but I was writing some
UDPv6 tests in 2.6.23.9 and was getting EDESTADDRREQ unexpectedly. When
I looked at the kernel code, I found:

gerrit/wscott 1.115   | int udpv6_sendmsg(struct kiocb *iocb, 
struct sock *sk,
shemminger1.57|struct 
msghdr *msg, size_t len)
torvalds  1.1 | {
...

yoshfuji  1.53|  } else if (!up-pending) 
{
yoshfuji  1.53|  if 
(sk-sk_state != TCP_ESTABLISHED)
yoshfuji  1.53|  return -EDESTADDRREQ;
yoshfuji  1.53|  daddr = 
np-daddr;

I wouldn't expect a UDP socket to be setting
sk_state to TCP_ESTABLISHED...

+-DLS

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: all is not right with the world...

2007-12-09 Thread David Stevens
Never mind...

Though I hate the TCP in the name (maybe overload UDP_CONNECTED?), I see 
datagram
sockets are in fact setting it, and my EDESTADDRREQ was a bug in my test 
program. :-)

+-DLS

[EMAIL PROTECTED] wrote on 12/09/2007 04:19:12 PM:

 I don't know if you've fixed it already, but I was writing some
 UDPv6 tests in 2.6.23.9 and was getting EDESTADDRREQ unexpectedly. When
 I looked at the kernel code, I found:
 
 gerrit/wscott 1.115   | int udpv6_sendmsg(struct kiocb *iocb, 
 struct sock *sk,
 shemminger1.57| struct 
 msghdr *msg, size_t len)
 torvalds  1.1 | {
 ...
 
 yoshfuji  1.53|  } else if 
(!up-pending) 
 {
 yoshfuji  1.53|  if 
 (sk-sk_state != TCP_ESTABLISHED)
 yoshfuji  1.53|  return -EDESTADDRREQ;
 yoshfuji  1.53|  daddr = 

 np-daddr;
 
 I wouldn't expect a UDP socket to be setting
 sk_state to TCP_ESTABLISHED...
 
 +-DLS
 
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6.25 1/4] include - Convert IP4 address class macros to inline functions

2007-11-14 Thread David Stevens
Maybe I'm more used to hex, but I personally don't think those are
more readable. It replaces 1-line macroes with 4-line functions, and
I think more vgrepping to pick out the relevant constant data.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/05] ipv6: RFC4214 Support

2007-11-07 Thread David Stevens
Pekka,
Thanks! That answers the question I had (i.e., you believe the 
legal
issues are resolved).

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/05] ipv6: RFC4214 Support

2007-11-06 Thread David Stevens
Last I heard, there are Intellectual Property claims with ISATAP,
which is why the RFC is not standards track and which makes it
effectively a proprietary protocol.

Unless that's been resolved, I think the claim by the IP owner is
that it can't be distributed without a license from them. So, maybe
not worth the effort for an experimental RFC.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/05] ipv6: RFC4214 Support

2007-11-06 Thread David Stevens
 I guess license is no longer required for implementers of ISATAP.
 Is it right, Fred?
 
 https://datatracker.ietf.org/ipr/550/

Does this also allow license-free redistribution?

I'm certainly no lawyer, but I don't see the point of
having a patent that doesn't restrict *something*. :-)

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/05] ipv6: RFC4214 Support

2007-11-06 Thread David Stevens
 give it away on this specific instance.  I'm not sure if you should 
 attribute to hidden agendas what you can explain by doing the right 
 thing (granted, very few companies do this which may make it suspect, 
 but still..).

Pekka,
I'm not assuming hidden agendas here; I simply don't know what
they mean by no license for implementers.  It doesn't say they
relinquish *all* licensing, which would be clearer if that's what they
mean. If implementers, distributors, and users are included, then
who's left that does need licensing? If that answer really is nobody,
then why bother with for implementers.?
So, I don't think it's a hidden agenda, I think they said what
they mean. I just don't know what they mean. :-)

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Configuring the same IP on multiple addresses

2007-10-29 Thread David Stevens
[EMAIL PROTECTED] wrote on 10/29/2007 11:03:37 AM:

 So, I am looking for technical reasons why this is permitted.

Vlad,
Is there a technical reason to disallow it? Rather than
anticipate all the possible uses for a machine, it's, of course,
generally better to restrict only the things you know can't
work, and allow everything else.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Configuring the same IP on multiple addresses

2007-10-29 Thread David Stevens
 For v6, there are plenty of operational reasons to not allow this.  You 
really
 turn unicast into anycast when you do this and there are special rules 
to
 be followed.

I don't see it that way. The only problem I can think of offhand
is that you can't use a multi-interface address to identify an interface
(for example, for multicasting) and get predictable results (it'll pick
the first one it finds with that address, in no particular order). But
you can still use interface indexes, which are unique.
Anycast is used for multiple distinct hosts, which isn't an issue
on the same host. It's already true, as you pointed out, that you can
receive a packet for any local address on any interface, so allowing
multiple instances means you still match it as local. Which interface
you match it on usually isn't relevant, and when it is are exactly the
cases where using duplicates might be appropriate.
I can see where it might be useful if you have policy
restrictions on some interfaces and want the particular address
to be both in and out of a set. But, I agree, it's generally more
trouble (for an administrator), but then administrators don't have
to assign the same address to multiple interfaces. 

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fw: [PATCH] [IPv4] SNMP: Refer correct memory location to display ICMP out-going statistics

2007-10-29 Thread David Stevens
Dave,
I didn't see a response for this one... in case it fell through 
the
cracks. Just want to make sure my bone-headed error doesn't hang
around too long. :-)

+-DLS

- Forwarded by David Stevens/Beaverton/IBM on 10/29/2007 01:51 PM 
-

Mitsuru Chinen [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
10/25/2007 06:59 PM

To
netdev@vger.kernel.org
cc
David Stevens/Beaverton/[EMAIL PROTECTED]
Subject
[PATCH] [IPv4] SNMP: Refer correct memory location to display ICMP 
out-going statistics






While displaying ICMP out-going statistics as Outname counters in
/proc/net/snmp, the memory location for ICMP in-coming statistics
was referred by mistake.

Acked-by: David L Stevens [EMAIL PROTECTED] 
Signed-off-by: Mitsuru Chinen [EMAIL PROTECTED]
---
 net/ipv4/proc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index fd16cb8..b7f7f8a 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -312,7 +312,7 @@ static void icmp_put(struct seq_file *seq)
 for (i=0; icmpmibmap[i].name != NULL; i++)
 seq_printf(seq,  %lu,
 snmp_fold_field((void **) 
icmpmsg_statistics,
- icmpmibmap[i].index));
+ icmpmibmap[i].index | 0x100));
 }
 
 /*
-- 
1.5.3.4

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] UDP: Make use of inet_iif() when doing socket lookups.

2007-10-25 Thread David Stevens
I don't see any problem with this. If you're using IP_MULTICAST_IF with
a bound socket, it forces you to use that interface, anyway.

I'm not sure multicasting works as expected in some other cases with
bound sockets (looks like there may be some holes to evade the
binding), but that's in old code. I don't see any problems from this
patch.

+-DLS

Acked-by: David L Stevens [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Mark snmp4_icmp_list[] as being unused

2007-10-25 Thread David Stevens
What about just removing it, or do you think it's
useful for documentation of the order? Either way,

Acked-by: David L Stevens [EMAIL PROTECTED]

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Mark snmp4_icmp_list[] as being unused

2007-10-25 Thread David Stevens
[EMAIL PROTECTED] wrote on 10/25/2007 04:05:13 PM:

 David Stevens [EMAIL PROTECTED] wrote:
 
  What about just removing it, or do you think it's
  useful for documentation of the order? Either way,
 
 I don't know whether it's intended to be used for something, perhaps a
 debugging macro.  Perhaps it should be #if'd out instead.

It was used for /proc/net/snmp header printing, but
most of the items in it were moved to a different MIB (by me).
The new values are printed in a backward-compatible way, but
maintaining the old order means not using this header map.
So, long-winded way of saying I believe it should be
removed, but I missed that when I made it obsolete. :-)

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] unexport icmpmsg_statistics

2007-10-24 Thread David Stevens
[EMAIL PROTECTED] wrote on 10/24/2007 09:24:10 AM:

 This patch removes the unused EXPORT_SYMBOL(icmpmsg_statistics).
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
 
 ---
 4ce74657ac0b1bdcb4c7bc359d05643f8cc4a08b 
 diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
 index 272c69e..233de06 100644
 --- a/net/ipv4/icmp.c
 +++ b/net/ipv4/icmp.c
 @@ -1104,5 +1104,4 @@ void __init icmp_init(struct net_proto_family 
*ops)
  EXPORT_SYMBOL(icmp_err_convert);
  EXPORT_SYMBOL(icmp_send);
  EXPORT_SYMBOL(icmp_statistics);
 -EXPORT_SYMBOL(icmpmsg_statistics);
  EXPORT_SYMBOL(xrlim_allow);

icmpmsg_statistics belongs with (and replaces some of the 
old...)
icmp_statistics. I'm not sure that any modules use it, but I think you
should remove both or neither.

+-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] unexport icmpmsg_statistics

2007-10-24 Thread David Stevens
[EMAIL PROTECTED] wrote on 10/24/2007 12:14:37 PM:

 On Wed, Oct 24, 2007 at 12:07:45PM -0700, David Stevens wrote:
  [EMAIL PROTECTED] wrote on 10/24/2007 09:24:10 AM:
  
   This patch removes the unused EXPORT_SYMBOL(icmpmsg_statistics).
   
   Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
   
   ---
   4ce74657ac0b1bdcb4c7bc359d05643f8cc4a08b 
   diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
   index 272c69e..233de06 100644
   --- a/net/ipv4/icmp.c
   +++ b/net/ipv4/icmp.c
   @@ -1104,5 +1104,4 @@ void __init icmp_init(struct net_proto_family 
  *ops)
EXPORT_SYMBOL(icmp_err_convert);
EXPORT_SYMBOL(icmp_send);
EXPORT_SYMBOL(icmp_statistics);
   -EXPORT_SYMBOL(icmpmsg_statistics);
EXPORT_SYMBOL(xrlim_allow);
  
  icmpmsg_statistics belongs with (and replaces some of the 
  old...)
  icmp_statistics. I'm not sure that any modules use it, but I think 
you
  should remove both or neither.
 
 icmp_statistics is used by the dccp_ipv4 and sctp modules.

The only items left in icmp_statistics are InMsgs, InErrs, 
OutMsgs, OutErrs,
so if dccp and sctp are sending or receiving any in or out ICMP messages, 
they
should be using the new macros (which reference icmpmsg_statistics, not
icmp_statistics) to count them.
I took a quick look at SCTP. I don't know if that's going through 
icmp_rcv()
or not; if so, I think it's double-counting; if not, then it isn't 
counting the
individual types (as it should), and it should have ICMPMSG macros doing 
that.

So, again, icmpmsg_statistics either should stay exported, or 
neither
icmpmsg_statistics nor icmp_statistics should be exported (depending on 
how
SCTP and DCCP code is resolved). It's incorrect in the current code to
incrememnt ICMP_MIB_INMSGS without incrementing one of the types too.

 +-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] unexport icmpmsg_statistics

2007-10-24 Thread David Stevens
I took a look at the DCCP references, and I think they're just
incrementing the wrong MIB variable -- e.g., it's incrementing
ICMP_MIB_INERRORS when the skb length is less than the
header indicates. That's not an ICMP_MIB_INERRORS error,
that's an IPSTATS_MIB_INHDRERRORS error. ICMP_MIB_INERRORS
is when you receive an ICMP error packet; an IP header error
is something else entirely.

That's followed by a failed lookup incrementing ICMP_MIB_INERRORS
which should be an unknown port error in the transport MIB (assuming
it has one-- it's not an ICMP error; could be an IP error, if the address
isn't local, rather than unknown port).

In SCTP, it appears to have similar problems. SCTP errors are not
ICMP errors, though it perhaps should be calling icmp_send() to
send one to the offending host for some of the cases.

I haven't seen any ICMP-relevant stats correctly referenced in
these yet.

I don't want to patch them directly, since I can't easily test them;
if someone who works with DCCP and SCTP would like to, I'd
be happy to review. Any volunteers?

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] unexport icmpmsg_statistics

2007-10-24 Thread David Stevens
My bad -- I see what it's doing, and it looks ok after all.

I thought I saw an INMSGS (but didn't).  These are ICMP errors that
went through icmp_rcv() and were counted correctly before getting
to the protocol error handlers. These are failures due mostly to not
having enough, or the right protocol info in the error packet being
handled. I'm not sure I'd count those as ICMP errors, since the
ICMP header itself is correct, but ok...

SCTP doesn't look so bad, though I think the references are
still questionable (but debatable) as ICMP errors.

sctp_v4_err is incrementing ICMP_MIB_INERRORS if there
isn't enough IP header to find the ports, I see. I'm not sure
that counts as an ICMP error, but it's not so terrible.

It's doing the same thing if a lookup fails to match vtag from
the encapsulated error packet. Again, I don't know that those
are ICMP errors (which normally are something wrong with
the ICMP header).

So, I stand corrected, and sorry about the histrionics. Since
these are arguably ICMP errors, and since errors is the only
thing being MIB-counted in DCCP and SCTP, then it now looks
ok to me as-is, and also ok to remove icmpmsg_statistics from
exporting.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: multicast: bug or feature

2007-10-19 Thread David Stevens
I don't know about a new knob, but it's the same notion
as rp_filter, so why not use rpf for RTN_LOCAL types?
Ie, allow RTN_LOCAL and RTN_UNICAST at the top, but
check rpf if the devs aren't equal or RTN_LOCAL

It seems like not a good thing to rely on in the first place, though;
usually receiving things from yourself is a bad thing. :-)

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: multicast: bug or feature

2007-10-19 Thread David Stevens
From looking at the code, it appears that validate
source is failing just because of the rp_filter. Do you have
rp_filter set to nonzero?
If so, it may do what you want just by setting that
to 0:

sysctl -w net.ipv4.conf.all.rp_filter=0

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: multicast: bug or feature

2007-10-19 Thread David Stevens
I don't know why you'd want it to be different for multicasting. If you
want to hear your own multicasts, you should use MULTICAST_LOOP;
hearing them off the wire indicates all the same bad things -- a forger,
a duplicate address or a routing loop. Those aren't any better for
multicasting than they are for unicasting, that I can see.

Why would you want that to be delivered, other than (apparently)
to test the multicast capability of a pair of interfaces while using
only 1 machine?

+-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: multicast: bug or feature

2007-10-19 Thread David Stevens
[EMAIL PROTECTED] wrote on 10/19/2007 04:43:27 AM:

 Vlad Yasevich [EMAIL PROTECTED] wrote:
 
  Now, to figure out what IPv6 does different and why it works.
  Seems to me that the two should have the same behavior.
 
 IPv6 on Linux uses a per-interface addressing model as opposed
 to the per-host model used by IPv4.

For link-local addresses, yes.

It's really a security feature; the ordinary
case where you'd receive something on an interface that's
using one of your source addresses is when someone is spoofing
you, has a duplicate address, or maybe an (unintentional)
routing loop. All of those are error cases, so dropping a
received packet that claims to be sent by you is a reasonable
thing to do.
If you're getting link-local source addresses for your
IPv6 multicast packets, that may explain it. The link-local
addresses are required to be unique and valid only for that
link, so IPv6 should not consider a different interface's
link-local address as local for a destination address, or
a packet using that source address as bogus.
For a global address, v4 and v6 use the same rules--
for a destination you can receive it on any interface for
any global address. So, if your source address was a global
IPv6 address and it worked, I'd guess IPv6 just isn't checking
the source address. I don't know that it's required by RFC for
either v4 or v6, though it's probably a good idea.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix memory leak in cleanup_ipv6_mibs()

2007-10-17 Thread David Stevens
Acked-by: David L Stevens [EMAIL PROTECTED]
 
 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
 
 ---
 
 diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
 index bc92938..1b1caf3 100644
 --- a/net/ipv6/af_inet6.c
 +++ b/net/ipv6/af_inet6.c
 @@ -747,6 +747,7 @@ static void cleanup_ipv6_mibs(void)
  {
 snmp_mib_free((void **)ipv6_statistics);
 snmp_mib_free((void **)icmpv6_statistics);
 +   snmp_mib_free((void **)icmpv6msg_statistics);
 snmp_mib_free((void **)udp_stats_in6);
 snmp_mib_free((void **)udplite_stats_in6);
  }
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: multicast: bug or feature

2007-10-17 Thread David Stevens
I'm not clear on your configuration.

Are the sender and receiver running on the same machine? Are
you saying eth0 and eth1 are connected on the same link?

+-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: multicast: bug or feature

2007-10-17 Thread David Stevens
Can you send the contents of /proc/net/igmp and the packet trace,
also? And the code?

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: multicast: bug or feature

2007-10-17 Thread David Stevens
You're joining the group on interface eth1, which is the
sender, right? You need to be a member on eth0 to receive it
there. I think your program needs another argument, to
specify the receiving interface, which you want to be
different from the sending interface.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Multicast problem

2007-10-16 Thread David Stevens
If you have icmp_echo_ignore_broadcasts set to 1, it won't respond to
multicasts. That should be all you need to do for 224.0.0.1.

Some smart switches rely on IGMP snooping to determine group membership,
and some of those don't understand IGMPv3, so if you have no v1 or v2 
queriers
on the network, you can force the IGMP version to be 2 instead of the 
default 3
by using a sysctl. But that's not the issue for 224.0.0.1, since the 
all-hosts group
is not reported via IGMP.

You might check the contents of /proc/net/igmp and /proc/net/dev_mcast to 
make
sure they have entries for the group. If so, I'd guess it might be a 
driver or device issue.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Multicast problem

2007-10-16 Thread David Stevens
 
 hmm maybe I misunderstand here, but the we are trying to make OSPF work
 in 2.6 and the problem appears to be that our board does not pick up
 multicasts.
 Is icmp_echo_ignore_broadcasts=0 required to make OSPF work in 2.6?
 
  Jocke
 

No, it's required to make ping to a multicast address
work, which is what your mail said was the problem. :-) ping
uses ICMP, OSPF does not.
If you receive ping responses with ignore_broadcasts=0
and not in promiscuous mode, then the problem may be a smart
switch that doesn't understand IGMPv3. In that case, you
can try:

sysctl -w net.ipv4.conf.all.force_igmp_version=2

You can also, of course, check /proc/net/igmp and
/proc/net/dev_mcast to make sure you've joined the OSPF
multicast address on the interface(s) you care about.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Multicast problem

2007-10-16 Thread David Stevens
 dev_mcast and igmp looks:
 [EMAIL PROTECTED]:/proc/net# m dev_mcast
 1eth01 0 01005e01
 2eth11 0 01005e01

These are the hardware multicast addresses for
224.0.0.1 (so, correct).

 [EMAIL PROTECTED]:/proc/net# m igmp
 Idx Device: Count Querier   GroupUsers TimerReporter
 1   eth0  : 1  V3
 E001 1 0: 0
 2   eth1  : 1  V2
 E001 1 0: 0
 4   lo: 0  V3
 E001 1 0: 0

...and these are the group memberships for 224.0.0.1, so
also correct. This should receive and answer pings to 224.0.0.1
as long as you don't have ICMP set to ignore broadcasts. You should
not need to put the device in promiscuous mode for a ping of
224.0.0.1 to work, so if you do, then you have a driver, device
or switch problem.
None of these interfaces have memberships in the OSPF groups,
so if you had OSPF running when you did this, it didn't join.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2

2007-10-11 Thread David Stevens
Acked-by: David L Stevens [EMAIL PROTECTED]

 
 Signed-off-by: Brian Haley [EMAIL PROTECTED]
 diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
 index 532425d..1334fc1 100644
 --- a/net/ipv6/ipv6_sockglue.c
 +++ b/net/ipv6/ipv6_sockglue.c
 @@ -539,12 +539,15 @@ done:
 case IPV6_MULTICAST_IF:
if (sk-sk_type == SOCK_STREAM)
   goto e_inval;
 -  if (sk-sk_bound_dev_if  sk-sk_bound_dev_if != val)
 - goto e_inval;
 
 -  if (__dev_get_by_index(init_net, val) == NULL) {
 - retv = -ENODEV;
 - break;
 +  if (val) {
 + if (sk-sk_bound_dev_if  sk-sk_bound_dev_if != val)
 +goto e_inval;
 +
 + if (__dev_get_by_index(init_net, val) == NULL) {
 +retv = -ENODEV;
 +break;
 + }
}
np-mcast_oif = val;
retv = 0;

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bytes received from recvmsg doesn't match FIONREAD

2007-10-11 Thread David Stevens
Questions of this sort should normally be directed to linux-net mailing 
list.

From the code you quoted, I see at least one case where it
will fail -- when the allocated buffer you pass to recvmsg is smaller than
value (ie. the datagram is too big for the read buffer).

If that's not the problem, I suggest you reproduce it with a small test
case and post real code and results when it happens.

+-DLS 


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493

2007-10-10 Thread David Stevens
What about just checking for 0 in the later test?

if (val  __dev_get_by_index(val) == NULL) {
...


+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493

2007-10-10 Thread David Stevens
Brian Haley [EMAIL PROTECTED] wrote on 10/10/2007 02:20:45 PM:

 David Stevens wrote:
  What about just checking for 0 in the later test?
  
  if (val  __dev_get_by_index(val) == NULL) {
 
 We could fail the next check right before that though:

Right, the semantics there would be if we have a bound
dev if, that's the only legal value here. Setting it to '0' in
that case doesn't really do anythng, anyway. But I don't care
about that semantic difference-- could even add val  to the
bound_dev_if check.
What I don't like is that your if creates an identical
duplicate code path for the functional part of it. In this case
it's trivial (the asignment), but makes the code look more
complex than it really is. If v4 does it that way, I don't
like that either. :-)
I agree with it in general, and may not be worth the
trouble, but I'd personally prefer something like:

if (sk-sk_type == SOCK_STREAM)
goto e_inval;
if (val  sk-sk_bound_dev_if  sk-sk_bound_dev_if != val)
goto e_inval;

if (val  __dev_get_by_index(val) != NULL) {
retv = -ENODEV;
break;
}
[at this point all validity checks are done and we're following
one code path to do the work; each check is easily
identifiable.]

np-mcast_oif = val;
retv = 0;
break;

Or maybe:

if (sk-sk_type == SOCK_STREAM)
goto e_inval;

if (val) {
if (sk-sk_bound_dev_if  sk-sk_bound_dev_if != val)
goto e_inval;
if (__dev_get_by_index(val != NULL) {
retv = -ENODEV;
break;
}
}
np-mcast_oif = val;
retv = 0;
break;

But anyway, I made the comment; I think some form of it
should go in. :-) If you like the original better, that's
ok with me, too.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][NETNS] Make ifindex generation per-namespace

2007-10-09 Thread David Stevens
Sorry if this is a dumb question, but what is the model you intend for
SNMP? Do you want each namespace to be its own virtual machine with
its own, separate MIB?

Ifindex's have to uniquely identify the interface (virtual or otherwise) 
to remote
queriers (not just local applications), so unless you pay the price of 
separating
all the SNMP MIBs per namespace too, it seems you'll need some way to
remap these for SNMP queries, right?

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fallback to ipv4 if we try to add join IPv4 multicast group via ipv4-mapped address.

2007-10-02 Thread David Stevens
Dmitry,
Good catch; a couple comments:

 struct ipv6_pinfo *np = inet6_sk(sk);
 int err;
 +   int addr_type = ipv6_addr_type(addr);
 +
 +   if (addr_type == IPV6_ADDR_MAPPED) {
 +  __be32 v4addr = addr-s6_addr32[3];
 +  struct ip_mreqn mreq;
 +  mreq.imr_multiaddr.s_addr = v4addr;
 +  mreq.imr_address.s_addr = INADDR_ANY;
 +  mreq.imr_ifindex = ifindex;
 +
 +  return ip_mc_join_group(sk, mreq);
 +   }

ipv6_addr_type() returns a bitmask, so you should use:

if (addr_type  IPV6_ADDR_MAPPED) {

Also, you should have a blank line after the mreq declaration.

Ditto for both in ipv6_mc_sock_drop().

I don't expect the multicast source filtering interface will
behave well for mapped addresses, either. The mapped multicast
address won't appear to be a multicast address (and return
error there), and all the source filters would have to be
v4mapped addresses and modify the v4 source filters for this
to do as you expect. So, there's more to it (and it may be a
bit messy) to support mapped multicast addresses fully. I'll
think about that part some more.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address

2007-10-02 Thread David Stevens
Brian,
ipv6_addr_type() returns a mask, so checking for equality will 
fail to
match if  any other (irrelevant) attributes are set. How about using 
bitwise
operators for that? Also, the error message is no longer descriptive of 
the
failure if it's a link-local multicast, but you could make it target 
address is not
link-local unicast.\n (in both places).

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address

2007-10-02 Thread David Stevens
Brian,
I don't think a few instructions is a performance issue in the 
redirect
paths (it'd be pretty broken if you're getting or generating lots of 
them), but I
know there are lots of other checks similar to that that will break with 
new
attributes, so doing that as a general clean-up separately is ok with me, 
too.

With the error message changes, you can add:

Acked-by: David L Stevens [EMAIL PROTECTED]

FWIW. :-)

+-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPV6] Fix ICMPv6 redirect handling with target multicast address

2007-09-28 Thread David Stevens
Brian,
A multicast address should never be the target of a neighbor
discovery request; the sender should use the mapping function for all
multicasts. So, I'm not sure that your example can ever happen, and it
certainly is ok to send ICMPv6 errors to multicast addresses in general.
But I don't see that it hurts anything. either (since it should never 
happen :-)),
so I don't particularly object, either.
I think it'd also be better if you add the check to be:

if (ipv6_addr_type(target)  
(IPV6_ADDR_LINKLOCAL|IPV6_ADDR_UNICAST))

or something along those lines, rather than reproducing ipv6_addr_type() 
code
separately in a new ipv6_addr_linklocal() function.

+-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] 2.6.22.6 networking [ipv4]: fix wrong destination when reply packetes

2007-09-20 Thread David Stevens
I'm not sure why it's using rt_src here, but there are relevant cases that
your description doesn't cover. For example, what happens if  the source
is not set in the original packet?  Does NAT affect this?

You quote RFC text for ICMP echo and the case where the receiving machine
is the final destination, but you're modifying code that is used for all 
ICMP
types and used for ICMP errors generated when acting as an intermediate
router.

In ordinary cases, and certainly with ICMP echo when the source is set in
the original packet and no rewriting is going on (and the address is not 
spoofed),
using the original source as the destination is fine. But have you tested 
or
considered the other cases?

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293) [RESEND]

2007-09-16 Thread David Stevens
Dave,
Thanks. That rev2 was for v6-only; I didn't see anythng about the
v4 patch (below, in case it fell through the cracks).

+-DLS


- Forwarded by David Stevens/Beaverton/IBM on 09/16/2007 09:02 PM 
-

David Stevens/Beaverton/[EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
09/10/2007 07:25 PM

To
[EMAIL PROTECTED], [EMAIL PROTECTED]
cc
netdev@vger.kernel.org
Subject
[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293)






Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches remove (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add IcmpMsg with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for Icmp to get the named data
from new counters.

IPv4 patch attached, IPv6 patch to follow.

+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]

diff -ruNp linux-2.6.22.5/include/linux/snmp.h 
linux-2.6.22.5_ICMPMSG/include/linux/snmp.h
--- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPMSG/include/linux/snmp.h 2007-08-23 
15:32:29.0 -0700
@@ -82,6 +82,8 @@ enum
__ICMP_MIB_MAX
 };
 
+#define __ICMPMSG_MIB_MAX 512  /* Out+In for all 8-bit ICMP types */
+
 /* icmp6 mib definitions */
 /*
  * RFC 2466:  ICMPv6-MIB
diff -ruNp linux-2.6.22.5/include/net/icmp.h 
linux-2.6.22.5_ICMPMSG/include/net/icmp.h
--- linux-2.6.22.5/include/net/icmp.h   2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPMSG/include/net/icmp.h   2007-08-23 
15:56:45.0 -0700
@@ -30,9 +30,16 @@ struct icmp_err {
 
 extern struct icmp_err icmp_err_convert[];
 DECLARE_SNMP_STAT(struct icmp_mib, icmp_statistics);
+DECLARE_SNMP_STAT(struct icmpmsg_mib, icmpmsg_statistics);
 #define ICMP_INC_STATS(field)  SNMP_INC_STATS(icmp_statistics, 
field)
 #define ICMP_INC_STATS_BH(field)   SNMP_INC_STATS_BH(icmp_statistics, 

field)
 #define ICMP_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmp_statistics, 
field)
+#define ICMPMSGOUT_INC_STATS(field)SNMP_INC_STATS(icmpmsg_statistics, 

field+256)
+#define ICMPMSGOUT_INC_STATS_BH(field) 
SNMP_INC_STATS_BH(icmpmsg_statistics, field+256)
+#define ICMPMSGOUT_INC_STATS_USER(field) 
SNMP_INC_STATS_USER(icmpmsg_statistics, field+256)
+#define ICMPMSGIN_INC_STATS(field) SNMP_INC_STATS(icmpmsg_statistics, 

field)
+#define ICMPMSGIN_INC_STATS_BH(field) 
SNMP_INC_STATS_BH(icmpmsg_statistics, field)
+#define ICMPMSGIN_INC_STATS_USER(field) 
SNMP_INC_STATS_USER(icmpmsg_statistics, field)
 
 struct dst_entry;
 struct net_proto_family;
@@ -42,6 +49,7 @@ extern void   icmp_send(struct sk_buff *sk
 extern int icmp_rcv(struct sk_buff *skb);
 extern int icmp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern voidicmp_init(struct net_proto_family *ops);
+extern voidicmp_out_count(unsigned char type);
 
 /* Move into dst.h ? */
 extern int xrlim_allow(struct dst_entry *dst, int timeout);
diff -ruNp linux-2.6.22.5/include/net/snmp.h 
linux-2.6.22.5_ICMPMSG/include/net/snmp.h
--- linux-2.6.22.5/include/net/snmp.h   2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPMSG/include/net/snmp.h   2007-08-23 
14:42:50.0 -0700
@@ -82,6 +82,11 @@ struct icmp_mib {
unsigned long   mibs[ICMP_MIB_MAX];
 } __SNMP_MIB_ALIGN__;
 
+#define ICMPMSG_MIB_MAX__ICMPMSG_MIB_MAX
+struct icmpmsg_mib {
+   unsigned long   mibs[ICMPMSG_MIB_MAX];
+} __SNMP_MIB_ALIGN__;
+
 /* ICMP6 (IPv6-ICMP) */
 #define ICMP6_MIB_MAX  __ICMP6_MIB_MAX
 struct icmpv6_mib {
diff -ruNp linux-2.6.22.5/net/ipv4/af_inet.c 
linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c
--- linux-2.6.22.5/net/ipv4/af_inet.c   2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c   2007-08-23 
14:47:26.0 -0700
@@ -1296,6 +1296,10 @@ static int __init init_ipv4_mibs(void)
  sizeof(struct icmp_mib),
  __alignof__(struct icmp_mib))  0)
goto err_icmp_mib;
+   if (snmp_mib_init((void **)icmpmsg_statistics,
+ sizeof(struct icmpmsg_mib),
+ __alignof__(struct icmpmsg_mib))  0)
+   goto err_icmpmsg_mib;
if (snmp_mib_init((void **)tcp_statistics,
  sizeof(struct

Fw: [PATCH] Add ICMPMsgStats MIB (RFC 4293) [rev 2]

2007-09-14 Thread David Stevens
Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches remove (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add IcmpMsg with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for Icmp to get the named data
from new counters.
[new to 2nd revision]
6) support per-interface ICMP stats
7) use common macro for per-device stat macros

IPv6 patch attached.

+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]

diff -ruNp linux-2.6.22.5/include/linux/snmp.h 
linux-2.6.22.5_ICMPv6MSG2/include/linux/snmp.h
--- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPv6MSG2/include/linux/snmp.h  2007-09-10 
15:02:43.0 -0700
@@ -91,35 +91,12 @@ enum
ICMP6_MIB_NUM = 0,
ICMP6_MIB_INMSGS,   /* InMsgs */
ICMP6_MIB_INERRORS, /* InErrors */
-   ICMP6_MIB_INDESTUNREACHS,   /* InDestUnreachs */
-   ICMP6_MIB_INPKTTOOBIGS, /* InPktTooBigs */
-   ICMP6_MIB_INTIMEEXCDS,  /* InTimeExcds */
-   ICMP6_MIB_INPARMPROBLEMS,   /* InParmProblems */
-   ICMP6_MIB_INECHOS,  /* InEchos */
-   ICMP6_MIB_INECHOREPLIES,/* InEchoReplies */
-   ICMP6_MIB_INGROUPMEMBQUERIES,   /* InGroupMembQueries */
-   ICMP6_MIB_INGROUPMEMBRESPONSES, /* InGroupMembResponses */
-   ICMP6_MIB_INGROUPMEMBREDUCTIONS,/* InGroupMembReductions 
*/
-   ICMP6_MIB_INROUTERSOLICITS, /* InRouterSolicits */
-   ICMP6_MIB_INROUTERADVERTISEMENTS,   /* InRouterAdvertisements 
*/
-   ICMP6_MIB_INNEIGHBORSOLICITS,   /* InNeighborSolicits */
-   ICMP6_MIB_INNEIGHBORADVERTISEMENTS, /* 
InNeighborAdvertisements */
-   ICMP6_MIB_INREDIRECTS,  /* InRedirects */
ICMP6_MIB_OUTMSGS,  /* OutMsgs */
-   ICMP6_MIB_OUTDESTUNREACHS,  /* OutDestUnreachs */
-   ICMP6_MIB_OUTPKTTOOBIGS,/* OutPktTooBigs */
-   ICMP6_MIB_OUTTIMEEXCDS, /* OutTimeExcds */
-   ICMP6_MIB_OUTPARMPROBLEMS,  /* OutParmProblems */
-   ICMP6_MIB_OUTECHOREPLIES,   /* OutEchoReplies */
-   ICMP6_MIB_OUTROUTERSOLICITS,/* OutRouterSolicits */
-   ICMP6_MIB_OUTNEIGHBORSOLICITS,  /* OutNeighborSolicits */
-   ICMP6_MIB_OUTNEIGHBORADVERTISEMENTS,/* 
OutNeighborAdvertisements */
-   ICMP6_MIB_OUTREDIRECTS, /* OutRedirects */
-   ICMP6_MIB_OUTGROUPMEMBRESPONSES,/* OutGroupMembResponses 
*/
-   ICMP6_MIB_OUTGROUPMEMBREDUCTIONS,   /* OutGroupMembReductions 
*/
__ICMP6_MIB_MAX
 };
 
+#define __ICMP6MSG_MIB_MAX 512 /* Out+In for all 8-bit ICMPv6 types */
+
 /* tcp mib definitions */
 /*
  * RFC 1213:  MIB-II TCP group
diff -ruNp linux-2.6.22.5/include/net/if_inet6.h 
linux-2.6.22.5_ICMPv6MSG2/include/net/if_inet6.h
--- linux-2.6.22.5/include/net/if_inet6.h   2007-08-22 
16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPv6MSG2/include/net/if_inet6.h2007-09-12 
14:11:49.0 -0700
@@ -154,6 +154,7 @@ struct ipv6_devstat {
struct proc_dir_entry   *proc_dir_entry;
DEFINE_SNMP_STAT(struct ipstats_mib, ipv6);
DEFINE_SNMP_STAT(struct icmpv6_mib, icmpv6);
+   DEFINE_SNMP_STAT(struct icmpv6msg_mib, icmpv6msg);
 };
 
 struct inet6_dev 
diff -ruNp linux-2.6.22.5/include/net/ipv6.h 
linux-2.6.22.5_ICMPv6MSG2/include/net/ipv6.h
--- linux-2.6.22.5/include/net/ipv6.h   2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPv6MSG2/include/net/ipv6.h2007-09-13 
15:09:28.0 -0700
@@ -111,45 +111,28 @@ struct frag_hdr {
 extern int sysctl_ipv6_bindv6only;
 extern int sysctl_mld_max_msf;
 
-/* MIBs */
-DECLARE_SNMP_STAT(struct ipstats_mib, ipv6_statistics);
-#define IP6_INC_STATS(idev,field)  ({  \
+#define _DEVINC(statname, modifier, idev, field)   \
+({ \
struct inet6_dev *_idev = (idev);   \
if (likely(_idev != NULL))  \
-   

Re: [PATCH][2/2] Add ICMPMsgStats MIB (RFC 4293)

2007-09-11 Thread David Stevens
Yoshifuji Hideaki [EMAIL PROTECTED] wrote on 09/11/2007 01:50:53 
AM:

 Dave, we've been supporting per-interface stats for IPv6, and
 you seem to remove them.  Please keep them.  Thank you.

The reason I didn't for ICMPMsgStats is the size. The RFC requires
in  out counters for all types, whether or not they are known by the OS,
so that's 512 stats.

Because the MIB variables in the existing infrastructure are not 
dynamically
allocated, that's:

512 * numInterfaces * numCPUs * 2protos

That seems like a lot of mostly-zero memory. Changing them to be run-time
allocated (and re-allocated when new types are seen) is another 
alternative,
but more significant changes.

Memory is Cheap.
-- Doug Comer

So maybe it's not so bad -- I'll roll another per-interface version
to see.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293)

2007-09-10 Thread David Stevens
Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches remove (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add IcmpMsg with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for Icmp to get the named data
from new counters.

IPv4 patch attached, IPv6 patch to follow.

+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]

diff -ruNp linux-2.6.22.5/include/linux/snmp.h 
linux-2.6.22.5_ICMPMSG/include/linux/snmp.h
--- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPMSG/include/linux/snmp.h 2007-08-23 
15:32:29.0 -0700
@@ -82,6 +82,8 @@ enum
__ICMP_MIB_MAX
 };
 
+#define __ICMPMSG_MIB_MAX 512  /* Out+In for all 8-bit ICMP types */
+
 /* icmp6 mib definitions */
 /*
  * RFC 2466:  ICMPv6-MIB
diff -ruNp linux-2.6.22.5/include/net/icmp.h 
linux-2.6.22.5_ICMPMSG/include/net/icmp.h
--- linux-2.6.22.5/include/net/icmp.h   2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPMSG/include/net/icmp.h   2007-08-23 
15:56:45.0 -0700
@@ -30,9 +30,16 @@ struct icmp_err {
 
 extern struct icmp_err icmp_err_convert[];
 DECLARE_SNMP_STAT(struct icmp_mib, icmp_statistics);
+DECLARE_SNMP_STAT(struct icmpmsg_mib, icmpmsg_statistics);
 #define ICMP_INC_STATS(field)  SNMP_INC_STATS(icmp_statistics, 
field)
 #define ICMP_INC_STATS_BH(field)   SNMP_INC_STATS_BH(icmp_statistics, 
field)
 #define ICMP_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmp_statistics, 
field)
+#define ICMPMSGOUT_INC_STATS(field)SNMP_INC_STATS(icmpmsg_statistics, 
field+256)
+#define ICMPMSGOUT_INC_STATS_BH(field) 
SNMP_INC_STATS_BH(icmpmsg_statistics, field+256)
+#define ICMPMSGOUT_INC_STATS_USER(field) 
SNMP_INC_STATS_USER(icmpmsg_statistics, field+256)
+#define ICMPMSGIN_INC_STATS(field) SNMP_INC_STATS(icmpmsg_statistics, 
field)
+#define ICMPMSGIN_INC_STATS_BH(field) 
SNMP_INC_STATS_BH(icmpmsg_statistics, field)
+#define ICMPMSGIN_INC_STATS_USER(field) 
SNMP_INC_STATS_USER(icmpmsg_statistics, field)
 
 struct dst_entry;
 struct net_proto_family;
@@ -42,6 +49,7 @@ extern void   icmp_send(struct sk_buff *sk
 extern int icmp_rcv(struct sk_buff *skb);
 extern int icmp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern voidicmp_init(struct net_proto_family *ops);
+extern voidicmp_out_count(unsigned char type);
 
 /* Move into dst.h ? */
 extern int xrlim_allow(struct dst_entry *dst, int timeout);
diff -ruNp linux-2.6.22.5/include/net/snmp.h 
linux-2.6.22.5_ICMPMSG/include/net/snmp.h
--- linux-2.6.22.5/include/net/snmp.h   2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPMSG/include/net/snmp.h   2007-08-23 
14:42:50.0 -0700
@@ -82,6 +82,11 @@ struct icmp_mib {
unsigned long   mibs[ICMP_MIB_MAX];
 } __SNMP_MIB_ALIGN__;
 
+#define ICMPMSG_MIB_MAX__ICMPMSG_MIB_MAX
+struct icmpmsg_mib {
+   unsigned long   mibs[ICMPMSG_MIB_MAX];
+} __SNMP_MIB_ALIGN__;
+
 /* ICMP6 (IPv6-ICMP) */
 #define ICMP6_MIB_MAX  __ICMP6_MIB_MAX
 struct icmpv6_mib {
diff -ruNp linux-2.6.22.5/net/ipv4/af_inet.c 
linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c
--- linux-2.6.22.5/net/ipv4/af_inet.c   2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c   2007-08-23 
14:47:26.0 -0700
@@ -1296,6 +1296,10 @@ static int __init init_ipv4_mibs(void)
  sizeof(struct icmp_mib),
  __alignof__(struct icmp_mib))  0)
goto err_icmp_mib;
+   if (snmp_mib_init((void **)icmpmsg_statistics,
+ sizeof(struct icmpmsg_mib),
+ __alignof__(struct icmpmsg_mib))  0)
+   goto err_icmpmsg_mib;
if (snmp_mib_init((void **)tcp_statistics,
  sizeof(struct tcp_mib),
  __alignof__(struct tcp_mib))  0)
@@ -1318,6 +1322,8 @@ err_udplite_mib:
 err_udp_mib:
snmp_mib_free((void **)tcp_statistics);
 err_tcp_mib:
+   snmp_mib_free((void **)icmpmsg_statistics);
+err_icmpmsg_mib:
snmp_mib_free((void **)icmp_statistics);
 err_icmp_mib:
snmp_mib_free((void **)ip_statistics);
diff -ruNp linux-2.6.22.5/net/ipv4/icmp.c 
linux-2.6.22.5_ICMPMSG/net/ipv4/icmp.c
--- linux-2.6.22.5/net/ipv4/icmp.c  2007-08-22 

[PATCH][2/2] Add ICMPMsgStats MIB (RFC 4293)

2007-09-10 Thread David Stevens
Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches remove (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add IcmpMsg with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for Icmp to get the named data
from new counters.

IPv6 patch attached.

+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]

diff -ruNp linux-2.6.22.5/include/linux/snmp.h 
linux-2.6.22.5_ICMPv6MSG/include/linux/snmp.h
--- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPv6MSG/include/linux/snmp.h   2007-09-10 
15:02:43.0 -0700
@@ -91,35 +91,12 @@ enum
ICMP6_MIB_NUM = 0,
ICMP6_MIB_INMSGS,   /* InMsgs */
ICMP6_MIB_INERRORS, /* InErrors */
-   ICMP6_MIB_INDESTUNREACHS,   /* InDestUnreachs */
-   ICMP6_MIB_INPKTTOOBIGS, /* InPktTooBigs */
-   ICMP6_MIB_INTIMEEXCDS,  /* InTimeExcds */
-   ICMP6_MIB_INPARMPROBLEMS,   /* InParmProblems */
-   ICMP6_MIB_INECHOS,  /* InEchos */
-   ICMP6_MIB_INECHOREPLIES,/* InEchoReplies */
-   ICMP6_MIB_INGROUPMEMBQUERIES,   /* InGroupMembQueries */
-   ICMP6_MIB_INGROUPMEMBRESPONSES, /* InGroupMembResponses */
-   ICMP6_MIB_INGROUPMEMBREDUCTIONS,/* InGroupMembReductions 
*/
-   ICMP6_MIB_INROUTERSOLICITS, /* InRouterSolicits */
-   ICMP6_MIB_INROUTERADVERTISEMENTS,   /* InRouterAdvertisements 
*/
-   ICMP6_MIB_INNEIGHBORSOLICITS,   /* InNeighborSolicits */
-   ICMP6_MIB_INNEIGHBORADVERTISEMENTS, /* 
InNeighborAdvertisements */
-   ICMP6_MIB_INREDIRECTS,  /* InRedirects */
ICMP6_MIB_OUTMSGS,  /* OutMsgs */
-   ICMP6_MIB_OUTDESTUNREACHS,  /* OutDestUnreachs */
-   ICMP6_MIB_OUTPKTTOOBIGS,/* OutPktTooBigs */
-   ICMP6_MIB_OUTTIMEEXCDS, /* OutTimeExcds */
-   ICMP6_MIB_OUTPARMPROBLEMS,  /* OutParmProblems */
-   ICMP6_MIB_OUTECHOREPLIES,   /* OutEchoReplies */
-   ICMP6_MIB_OUTROUTERSOLICITS,/* OutRouterSolicits */
-   ICMP6_MIB_OUTNEIGHBORSOLICITS,  /* OutNeighborSolicits */
-   ICMP6_MIB_OUTNEIGHBORADVERTISEMENTS,/* 
OutNeighborAdvertisements */
-   ICMP6_MIB_OUTREDIRECTS, /* OutRedirects */
-   ICMP6_MIB_OUTGROUPMEMBRESPONSES,/* OutGroupMembResponses 
*/
-   ICMP6_MIB_OUTGROUPMEMBREDUCTIONS,   /* OutGroupMembReductions 
*/
__ICMP6_MIB_MAX
 };
 
+#define __ICMP6MSG_MIB_MAX 512 /* Out+In for all 8-bit ICMPv6 types */
+
 /* tcp mib definitions */
 /*
  * RFC 1213:  MIB-II TCP group
diff -ruNp linux-2.6.22.5/include/net/ipv6.h 
linux-2.6.22.5_ICMPv6MSG/include/net/ipv6.h
--- linux-2.6.22.5/include/net/ipv6.h   2007-08-22 16:23:54.0 
-0700
+++ linux-2.6.22.5_ICMPv6MSG/include/net/ipv6.h 2007-09-10 
15:41:19.0 -0700
@@ -132,6 +132,7 @@ DECLARE_SNMP_STAT(struct ipstats_mib, ip
SNMP_INC_STATS_USER(ipv6_statistics, field);\
 })
 DECLARE_SNMP_STAT(struct icmpv6_mib, icmpv6_statistics);
+DECLARE_SNMP_STAT(struct icmpv6msg_mib, icmpv6msg_statistics);
 #define ICMP6_INC_STATS(idev, field)   ({  \
struct inet6_dev *_idev = (idev);   \
if (likely(_idev != NULL))  \
@@ -157,6 +158,14 @@ DECLARE_SNMP_STAT(struct icmpv6_mib, icm
SNMP_INC_STATS_OFFSET_BH(_idev-stats.icmpv6, field, 
_offset);   \
SNMP_INC_STATS_OFFSET_BH(icmpv6_statistics, field, _offset); \
 })
+
+#define ICMP6MSGOUT_INC_STATS(field) SNMP_INC_STATS(icmpv6msg_statistics, 
field+256)
+#define ICMP6MSGOUT_INC_STATS_BH(field) 
SNMP_INC_STATS_BH(icmpv6msg_statistics, field+256)
+#define ICMP6MSGOUT_INC_STATS_USER(field) 
SNMP_INC_STATS_USER(icmpv6msg_statistics, field+256)
+#define ICMP6MSGIN_INC_STATS(field) SNMP_INC_STATS(icmpv6msg_statistics, 
field)
+#define ICMP6MSGIN_INC_STATS_BH(field) 
SNMP_INC_STATS_BH(icmpv6msg_statistics, field)
+#define ICMP6MSGIN_INC_STATS_USER(field) 
SNMP_INC_STATS_USER(icmpv6msg_statistics, 

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Stevens
Stephen Hemminger [EMAIL PROTECTED] wrote on 08/24/2007 
08:52:03 AM:

 
 You need hardware support for deferred interrupts. Most devices have it 
 (e1000, sky2, tg3)
 and it interacts well with NAPI. It is not a generic thing you want done 
by the stack,
 you want the hardware to hold off interrupts until X packets or Y usecs 
have expired.

For generic hardware that doesn't support it, couldn't you use an 
estimater
and adjust the timer dynamicly in software based on sampled values? Switch 
to per-packet
interrupts when the receive rate is low...
Actually, that's how I thought NAPI worked before I found out 
otherwise (ie,
before I looked :-)).

The hardware-accelerated one is essentially siloing as done by 
ancient serial
devices on UNIX systems. If you had a tunable for a target count, and an 
estimator
for the time interval, then switch to per-packet when the estimator 
exceeds a tunable
max threshold (and also, I suppose, if you near overflowing the ring on 
the min
timer granularity), you get almost all of it, right?
Problem is if it increases rapidly, you may drop packets before 
you notice
that the ring is full in the current estimated interval.

 +-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] Add new timeval_to_sec function

2007-07-24 Thread David Stevens
Oliver Hartkopp [EMAIL PROTECTED] wrote on 07/23/2007 11:22:39 PM:

 When you like to create any timeout based on your calculated value, you
 might run into the problem that your calculated value is set to _zero_
 even if there was some time before the conversion. This might probably
 not what you indented to get.

BTW, these are stats (for human consumption), not timers.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel module to list sockets with their multicast group joins and respective processes

2007-07-16 Thread David Stevens
 sockets that join different groups receive messages from the respective 
 other group (if they are only bound to the wildcard address). Obviously 
 this is handled differently in Linux for IPv4, where the socket matching 

 for incoming message is done solely on the 4-tuple of addresses and 
ports, 
 and IPv6, where the explicit socket joins are taken into account [1,2]. 

These are not different in v4 vs v6. Group membership is 
per-interface,
delivery depends on the binding (for both v4 and v6), and this is the same
as BSD behavior since IP multicasting was invented.
There are two levels of checks, and I think you're confusing them.
inet6_mc_check() is checking that somebody (*anybody*) has joined the
group on the receiving interface, before delivering the packet to IP.
Delivery to a particular socket depends on source filter settings, if
they are in use, and bindings (only). The socket has a list of joined
addresses to track source filters and to check for correct join/leave
(a socket can't leave a group it hasn't joined), but that list does not
determine delivery of a multicast packet to the socket when source 
filtering
is not in use.
 +-DLS



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv6: optionaly validate RAs on raw sockets

2007-07-11 Thread David Stevens
I think #2 in your list is the right choice, and that has nothing to do 
with adding a
non-standard option (which I completely agree is a bad idea).

It looked like you're just checking if the machine is acting as a router 
or not and
if it comes from a link-local address; is that right? Of course, lots of 
apps already
check for am I a router and they don't require a new socket option. (!) 
See everything
in the quagga package, for example. And checking the address type in a app 
is
trivial.

The previous discussion about validation was talking about RA's that are 
forged,
so don't pass IPsec authentication checks. I don't see any reason at all 
to deliver those
to an application (ever), so no non-standard socket option required there. 
I don't know
if those are currently delivered on raw sockets or not, but if they are, I 
think it's
reasonable to have a patch that clones them only after authentication 
rather than before.

Prior discussion used FUD about some monitoring apps needing to see forged 
RA's.
I don't think there really are apps that need to see forged RA's, but if 
they really
want everything, they should use bpf or the like, just as they would need 
to do to
receive, for example, packets with invalid checksums.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv6: optionaly validate RAs on raw sockets

2007-07-11 Thread David Stevens
 Since you asked for another idea, how about using netlink to send 
_validated_ RA
 information to interested parties?
 
 -vlad

That sounds like a good idea to me (FWIW),
though I also still think a simple raw-socket
application would do it just fine, possibly with
no kernel modification at all.
But since the kernel wouldn't be maintaining
the DNS info, which was my real objection to the
original version,  netlink would work well too.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: a maze of twisty stats, most different

2007-06-28 Thread David Stevens
I think there's a more general problem that's a huge hassle. There are 
lots of
new SNMP MIB's, but no conventions (that I'm aware of, at least) to allow 
for
changes to the /proc entries that get them to applications. For example, 
the
route age data recently submitted. I agree that existing apps rely on 
these
and are generally very fragile, but in practice it means it isn't easy to 
do any
changes for RFC compliance with new MIBS, short of replicating existing 
data
in a new /proc entry along with the old one.

How do people feel about adding a MIB subtree in /proc that tracks MIB
updates with some basic rules:

1) all fields must be tagged with a label
2) all apps using it must match the label, and ignore anything they don't
match

Then future additions to a row just mean adding a label (like route age),
and items like the one-per-line v6 statistics could add new ones easily, 
too.
And the existing MIB and non-MIB stats can be the same format, though
possibly derived from different data.

I've got a patch for doing ICMPMsgStats, which replaces the individual
ICMP counters now defined (and which are deprecated), but changing
/proc/net/snmp6 doesn't seem so wise, given how fragile some of these
apps are.

Defined device stats could live in that proposed tree, too, and with basic
rules for users would allow Linux-only useful extensions without breaking
the rules that apps using it should follow (and therefore without breaking 
the
apps). That's independent of ethtool extensions, but at least marginally 
related...

Comments?

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scaling Max IP address limitation

2007-06-24 Thread David Stevens
 Unrelated wishful thinking
 I keep having hopeful dreams that one day netfilter will grow support 
 for cross-protocol NAT (IE: NAT a TCPv4 connection over TCPv6 to the 
 IPv6-only local web server, or vice versa).  It would seem that would 
 require a merged xtables program.

You don't actually need it (at least for easy cases like that),
because IPv6 defines IPv4 mapped IPv6 addresses of the form
:::a.b.c.d. These will generate IPv4 packets for a.b.c.d, from
a v6 socket.
Unless you're using v6only binding (a sysctl option), you can
connect to v6-only servers using a v4 network and a v4 address of the
server. The peer address on those connections will show up as a v4
mapped address, and all the traffic will be v4, but the socket layer
is all v6.

+-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-23 Thread David Stevens
[EMAIL PROTECTED] wrote on 06/23/2007 07:47:06 AM:

 Rémi and Simon give my responses very eloquently.  Although you could
 have yet-another-network-daemon redundantly process RA messages, the
 kernel is doing it already and it makes sense to just push this

It would be two pieces looking at the same packet, but it isn't
redundant processing. The kernel would ignore the RDNS header, and the
app would ignore everything else; everything would be processed once.

 Although parsing
 RA messages and processing expiry in userland looks barely-possible
 now,
barely possible?? See below.

 SeND support is really necessary for long-term IPv6 security, and
 duplicating SeND functionality in userland would be a nightmare.
 Futher, the neighbor discover protocol involves Router Solicitation
 messages which elicit the Router Advertisement reply, and we really
 don't want userland sending redundant Router Solicitation messages
 around, just because the kernel doesn't want to tell it what Router
 Advertisements it received.  I considered storing the *complete*
 Router Advertisement messages received and pushing them unparsed to
 userland, just to get around the bogus DNS in the kernel politics
 (hint: it's not a resolver in the kernel, it's just nameserver
 addresses being stored).  Does anyone really suggest that this would
 be a better solution?

No, in fact! I didn't hear anyone suggesting that all of
neighbor discovery be pushed out of the kernel. All I suggested is
that you read a raw ICMPv6 socket for RA's that have the RDNS header
and the app _process_the_RDNS_header. The kernel should still continue
to do everything it needs to with the kernel data in the RA. Then you
just need a hash table (or maybe just a list -- there shouldn't be a
lot of them) and a timer to delete them when the RDNS expiration hits.
Easy, right?
You might have to change the icmp6_filter, if RA's are not
already copied to raw sockets (I don't know either way offhand),
but that's a trivial kernel patch; otherwise, I don't believe you
have to do anything but read the socket and process the RDNS header
on RAs you receive.
 
 +-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-23 Thread David Stevens
Rémi Denis-Courmont [EMAIL PROTECTED] wrote on 06/23/2007 09:51:55 
AM:

 How do I authenticate SeND RA? How do I deal with the link going down 
 before the expiration? How do I know this interface is doing autoconf 
 at all?

The kernel should do the authentication, as it will for other 
RA's,
and should not deliver (IMAO) unauthenticated packets. If it is, I would
consider that a bug (for all cases, not just this), and that would be a
good thing to fix. :-)
An interface going down doesn't directly invalidate a DNS server
address, though it may not be the best through another interface. Since
it is a list, I think doing nothing for this case wouldn't be terrible.
This is no worse than the existing resolver code. But if you really need 
it,
you can monitor netlink, or poll the interface flags on whatever interval 
you
require for detection.
As for autoconf, that's available from sysctl, I assume from /proc
somewhere, too. That usually doesn't change, but if you want to account
for runtime configuration changes, you can always monitor netlink and
reread when new addresses appear, too.

There certainly may be complications I haven't thought of, since
I haven't implemented it. But I still don't see a good case for using the
kernel as a DNS database.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-23 Thread David Stevens
Rémi Denis-Courmont [EMAIL PROTECTED] wrote on 06/23/2007 11:13:01 
AM:

An implementation might perform additional validity checks on the
ICMPv6 message content and discard malformed packets.  However, a
portable application must not assume that such validity checks have
been performed.

This doesn't say that unauthenticated packets must be delivered,
and I don't think the portability of an RDNS daemon is an issue. But
even if you really wanted to run the same code on a non-Linux machine,
it just means that your daemon code would have to do its own 
authentication.
Reading /proc or netlink with packet formats you've defined to get this
information is not more portable to non-Linux machines, right?
I don't see any issue here. If an application is relying on the
ability to see forged packets for portability reasons, it's probably not
an application you want running on your machine. :-)

 That would encourage people into running open recursive DNS servers 
 which is widely known and documented as a bad practice. Definitely a 
 very bad idea.
I don't understand your point here. I'm talking about client
behaviour, and if the client fails for a server from a downed interface,
I don't see how that's different from removing the client from the
list, which is what you want to do. Nobody should feel encouraged to
do anything different on the server side-- at least not by me!
 
  But if you really need it, you can monitor netlink, or poll the
  interface flags on whatever interval you require for detection.
 
  As for autoconf, that's available from sysctl, I assume from
  /proc somewhere, too. That usually doesn't change, but if you want to
  account for runtime configuration changes, you can always monitor
  netlink and reread when new addresses appear, too.
 
 There are a bunch of parameters that determine whether an interface 
 accepts RAs or not. I doubt it's wise to try to reimplement that into 
 userspace, particularly if it is subject to change.

I'm not suggesting re-implementing anything; I'm saying you can
read the current state at application level, if you need it. If you
think it's difficult to get the correct information from existing
API's, then improving those API's is always worthwhile. I don't
believe it's excessively difficult to determine if autoconf is in
use, though.

 My point 
 is raw IPv6 sockets are not usable for the time being, and I do not see 
 anyway to fix without modifying the kernel.

I disagree about raw sockets being usable, but without modifying the
kernel isn't a constraint.
modifying the kernel != put DNS server info in the kernel;
if there's a bug, or some minor tweaking that'd help the feature along,
I'd support that. The important point for me is that the basic mechanisms
are already in place, and I think it'd be best to use those rather than
creating a new interface for all of this.

 The userspace DNS configuration daemon might need to be started later 
 than the kernel autoconf - another issue that needs help from the 
 kernel.
Easily done; the init scripts are what bring the interfaces up
in the first place, so start the daemon before those run. Adding an
entry in inittab so it'll be automatically restarted if it dies is also
a reasonable thing. RA's are resent periodically, and they can be lost
anyway, so not the end of the world if you miss one, either.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread David Stevens
Scott,
Why not make the application that writes resolv.conf
also listen on a raw ICMPv6 socket? I don't believe you'd need
any kernel changes, then, and it seems pretty simple and
straightforward.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread David Stevens
[EMAIL PROTECTED] wrote on 06/22/2007 06:17:46 PM:

 On 23/06/07 02:04, David Stevens wrote:
  Why not make the application that writes resolv.conf
  also listen on a raw ICMPv6 socket? I don't believe you'd need
  any kernel changes, then, and it seems pretty simple and
  straightforward.
 
 Because then it requires yet another network daemon, RA in 
 the kernel means there's no need for one to manage adding 
 auto-configured IP addresses... what's wrong with doing the 
 same for DNS?

It's not yet another one, since you have to run something
to get it in resolv.conf, anyway. That seems much better to me
than having the kernel track data that can only be used at the
application layer. The app itself looks like it'd be really simple.
Auto-configured addresses are used by the kernel. It has to
have those addresses. But the kernel doesn't do DNS look-ups, or
write resolv.conf; that's the difference, for me.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] don't put multicasts with mc_ttl=0 on the wire

2007-05-15 Thread David Stevens
Arthur,
I assume you're making use of the hack mentioned in route.c:

 ... This hack is not just for fun, it allows vic, vat and friends to 
work.
They bind socket to loopback, set ttl to zero and expect that it will 
work.
I don't know the details of the intent for this hack, but did you
test that it won't break them?
A multicast application that relies on the (unicast) routing table
at all is broken IMHO, as is an app that sets the TTL to 0, but shouldn't
all ttl 0 packets about to be sent to the wire be dropped? Wouldn't that
be a better way to handle this case?

+-DLS

[EMAIL PROTECTED] wrote on 05/15/2007 12:56:02 PM:

 A colleague of mine found that multicasts with a ttl of 0
 can be sent on the wire. This happens if the sender doesn't
 belong to the destination multicast group.
 
 With the following the multicast ttl is respected whether
 or not the sender belongs to the destination multicast group.
 
 Signed-off-by: Arthur Kepner [EMAIL PROTECTED]
 
 ---
 
  net/ipv4/route.c |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)
 
 diff --git a/net/ipv4/route.c b/net/ipv4/route.c
 index cb76e3c..bf25cf5 100644
 --- a/net/ipv4/route.c
 +++ b/net/ipv4/route.c
 @@ -2249,8 +2249,7 @@ static inline int __mkroute_output(struct rtable 
**result,
 }
 if (flags  (RTCF_BROADCAST | RTCF_MULTICAST)) {
rth-rt_spec_dst = fl-fl4_src;
 -  if (flags  RTCF_LOCAL 
 -  !(dev_out-flags  IFF_LOOPBACK)) {
 +  if (!(dev_out-flags  IFF_LOOPBACK)) {
   rth-u.dst.output = ip_mc_output;
   RT_CACHE_STAT_INC(out_slow_mc);
}
 
 -- 
 Arthur
 
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] Reverse sense of promisc tests in ip6_mc_input

2007-05-12 Thread David Stevens
Your patch looks correct to me.

+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]

[EMAIL PROTECTED] wrote on 05/12/2007 03:51:35 PM:

 From: Corey Mutter [EMAIL PROTECTED]
 
 Reverse the sense of the promiscuous-mode tests in ip6_mc_input(). 
 
 Signed-off-by: Corey Mutter [EMAIL PROTECTED]
 
 ---
 
 I had been suspicious of this code for a while, but just assumed I must 
 have been missing something. However, I did some tests and, sure enough, 

 I can open a socket and it will see IPv6 multicast packets come in, then 

 the packets will stop coming in when I turn on IFF_ALLMULTI on the 
 interface. 
 
 In my tests, this change works more like I'd expect; I can open a 
 socket, and it doesn't see any multicast packets (unless I join a group 
 or something) until after I turn on IFF_ALLMULTI. 
 
 --- linux-2.6.21.1/net/ipv6/ip6_input.c   2007-04-27 17:49:26.0 
-0400
 +++ linux/net/ipv6/ip6_input.c   2007-05-12 18:19:09.59775 -0400
 @@ -235,7 +235,7 @@ int ip6_mc_input(struct sk_buff *skb)
 IP6_INC_STATS_BH(ip6_dst_idev(skb-dst), IPSTATS_MIB_INMCASTPKTS);
 
 hdr = skb-nh.ipv6h;
 -   deliver = likely(!(skb-dev-flags  (IFF_PROMISC|IFF_ALLMULTI))) ||
 +   deliver = unlikely(skb-dev-flags  (IFF_PROMISC|IFF_ALLMULTI)) ||
 ipv6_chk_mcast_addr(skb-dev, hdr-daddr, NULL);
 
 /*
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] New driver API to speed up small packets xmits

2007-05-10 Thread David Stevens
The word small is coming up a lot in this discussion, and
I think packet size really has nothing to do with it. Multiple
streams generating packets of any size would benefit; the
key ingredient is a queue length greater than 1.

I think the intent is to remove queue lock cycles by taking
the whole list (at least up to the count of free ring buffers)
when the queue is greater than one packet, thus effectively
removing the lock expense for n-1 packets.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] New driver API to speed up small packets xmits

2007-05-10 Thread David Stevens
 Which worked _very_ well (the whole list) going in the other direction 
for the 
 netisr queue(s) in HP-UX 10.20.  OK, I promise no more old HP-UX stories 
for the 
 balance of the week :)

Yes, OSes I worked on in other lives usually took the whole queue
and then took responsibility for handling them outside the lock.
A dequeue_n() under 1 lock should still be better than n locks,
I'd think, since you may bounce to different CPUs during the n individual
cycles.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[IPV6] send ICMPv6 error on scope violations

2007-05-09 Thread David Stevens
When an IPv6 router is forwarding a packet with a link-local scope source
address off-link, RFC 4007 requires it to send an ICMPv6 destination
unreachable with code 2 (not neighbor), but Linux doesn't. Fix below.

+-DLS
[in-line for viewing, attached for applying]

Signed-off-by: David L Stevens [EMAIL PROTECTED]

--- linux-2.6.20.7/net/ipv6/ip6_output.c2007-04-13 
13:48:14.0 -0700
+++ linux-2.6.20.7T1/net/ipv6/ip6_output.c  2007-05-09 
12:32:45.0 -0700
@@ -449,10 +449,17 @@ int ip6_forward(struct sk_buff *skb)
 */
if (xrlim_allow(dst, 1*HZ))
ndisc_send_redirect(skb, n, target);
-   } else if 
(ipv6_addr_type(hdr-saddr)(IPV6_ADDR_MULTICAST|IPV6_ADDR_LOOPBACK
-   |IPV6_ADDR_LINKLOCAL)) {
+   } else {
+   int addrtype = ipv6_addr_type(hdr-saddr);
+
/* This check is security critical. */
-   goto error;
+   if (addrtype  (IPV6_ADDR_MULTICAST|IPV6_ADDR_LOOPBACK))
+   goto error;
+   if (addrtype  IPV6_ADDR_LINKLOCAL) {
+   icmpv6_send(skb, ICMPV6_DEST_UNREACH,
+   ICMPV6_NOT_NEIGHBOUR, 0, skb-dev);
+   goto error;
+   }
}
 
if (skb-len  dst_mtu(dst)) {




icmp6fix.patch
Description: Binary data


Re: [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread David Stevens
It looks to me like rejoin is essentially ip_mc_up(), and it'd be better
to call that than add a nearly identical function.

Also, real interfaces already do gratuitous IGMP advertisements when
they are bounced (the reason there is an ip_mc_up()). Could bonding,
when failing over, simply mark the master interface as down, switch, and
then mark the master as up again? In addition to doing the right
thing for both IPv4 and IPv6 multicasting w/o any code changes in those
layers, it may have similar benefits for ARP and neighbor discovery, 
right?
Maybe not-- haven't looked at it...

One down side for IPv6 (which apparently bonding doesn't support) is that
static addresses are lost when the device goes down, but that's a 
difference
form IPv4 that should be fixed.
+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread David Stevens
[EMAIL PROTECTED] wrote on 03/06/2007 03:15:41 PM:

 
 David Stevens [EMAIL PROTECTED] wrote:
 
 It looks to me like rejoin is essentially ip_mc_up(), and it'd be 
better
 to call that than add a nearly identical function.
 
Won't ip_mc_up() acquire an additional reference (via
 ip_mc_inc_group) to the IGMP_ALL_HOSTS im-users that would never be
 released (in the case of bonding calling the function out of the blue)?

Yes. I'm not sure that matters-- destroy_dev doesn't care how many
references to a group, and IGMP_ALL_HOSTS isn't advertised (so wouldn't
get a leave when you only down the interface, like other groups do).
But since ip_mc_up() is *entirely* that join plus group_added() on all
the existing groups, there really shouldn't be another. But the new device
will need the all-hosts group in its hardware multicast filter, too, if
it hasn't already been using multicasting. Your reload caller could just
dec_group that group after calling ip_mc_up().

In looking at it, the ip_mc_rejoin_group function (the new one
 added with the patch) is a lot more like igmp_group_added() than
 ip_mc_up().
No, group_added() is one group. mc_up() just calls group_added on all
of them, which I think is what the rejoin was trying to do.

I'm not sure if the extra bits in igmp_group_added() are
 worthy of concern; I'm thinking not, since im-loaded shouldn't be zero
 coming in for the bonding case.
im-loaded means the device has it in its multicast address 
filter.
If you're switching devices, and didn't do all the multicast stuff on all
the devices originally, then you want it to be 0 (and should make it so,
like ip_mc_down() does). :-)

I think the meat that the rejoin wants is what's in
 igmpv3_send_cr(), which appears to do the actual sending stuff.  I'm not
 sure if that's better to call directly (and risk locking adventures) or
 to just trip the timer via igmp_ifc_event().

No, no, no -- please don't mess with those directly. It'd be a 
maintenance
nightmare, and multicasting is device independent right now. :-) I'd hope 
there
wouldn't be any bonding-specific code needed at this layer, which is why I 
hope
something like using up/down would work out.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread David Stevens
  Marking the master down would, I believe, issue notifiers that
  the device has gone down.  Various things, network manager sort of
  applications in particular, listen to those, so I'm not sure it's a 
good
  idea.  I think there are other side effects as well, I'm thinking it
  would flush routes associated with the interface as well.

[BTW, you can call ip_mc_down()/ip_mc_up() directly w/o getting there
from the notifiers -- then no side-effects.]

Andy Gospodarek wrote:
 
 I agree with Jay here.  I hate that bonding has to have so much
 knowledge about upper layer protocols, but for the ones that are
 stateful like IGMP we will need fixes like the one proposed.

I have no problem with bonding having knowledge of ULP's (I
don't like it, but I don't have to look at it :-) ), but the
patch is doing it the other way around. What I don't like about the
proposed patch is that it's adding knowledge of bonding to IGMP.
And IGMP does work fine in this case, w/o flooding or the
proposed patch. It just has the risk of losing multicast packets
during one query interval, and that only happens if you're
using a switch that does IGMP snooping.
I'd like the patch a lot better if it were basicly this:

mc_bond_fudge(void)
{
ip_mc_down(masterdev);
/*do whatever you need to do to switch the slave */
ip_mc_up(masterdev);
}

That doesn't go through the notifier chain, uses existing
functions, doesn't have any refcnt issues, and most importantly
could/should reside in a bonding source file and not in igmp.c. :-)
But RTNL is required whether you use up/down or roll your
own variant, so it sounds like you have other issues to resolve too.

+-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] /proc/net/anycast6 unbalanced inet6_dev refcnt

2007-02-26 Thread David Stevens
Reading /proc/net/anycast6 when there is no anycast address
on an interface results in an ever-increasing inet6_dev reference
count, as well as a reference to the netdevice you can't get rid of.

Patch below  attached.

+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]
--- linux-2.6.20.1/net/ipv6/anycast.c   2007-02-19 22:34:32.0 
-0800
+++ linux-2.6.20.1T1/net/ipv6/anycast.c 2007-02-26 15:37:04.0 
-0800
@@ -462,6 +462,7 @@ static inline struct ifacaddr6 *ac6_get_
break;
}
read_unlock_bh(idev-lock);
+   in6_dev_put(idev);
}
return im;
 }



ac6.patch
Description: Binary data


Re: igmp: possible NULL dereference after GFP_ATOMIC allocation?

2007-01-30 Thread David Stevens
Alexey,
I think you're correct-- looks like it needs:

if (!skb)
return NULL;

just before the skb_put(), since an allocation (and failure)
could occur in either the igmpv3_newpack() call or in add_grhead().
Also, similar code in net/ipv6/mcast..c.

Will you be submitting the patch?

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] devinet: inetdev_init out label moved after RCU assignment

2007-01-05 Thread David Stevens
Yeah, sure.

+-DLS

Acked-by: David L Stevens [EMAIL PROTECTED]

 Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
 
 ---
 
 diff -Nurp linux-2.6.20-rc3-/net/ipv4/devinet.c 
linux-2.6.20-rc3/net/ipv4/devinet.c
 --- linux-2.6.20-rc3-/net/ipv4/devinet.c   2007-01-05 11:53:16.0 
+0100
 +++ linux-2.6.20-rc3/net/ipv4/devinet.c   2007-01-05 11:55:32.0 
+0100
 @@ -174,9 +174,10 @@ struct in_device *inetdev_init(struct ne
 ip_mc_init_dev(in_dev);
 if (dev-flags  IFF_UP)
ip_mc_up(in_dev);
 -out:
 +
 /* we can receive as soon as ip_ptr is set -- do this last */
 rcu_assign_pointer(dev-ip_ptr, in_dev);
 +out:
 return in_dev;
  out_kfree:
 kfree(in_dev);
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
Ben  Jarek,
Your analysis looks correct to me. It seems to me the problem is 
that
we don't want the in_device to be searchable until after the 
initialization is done.
What about moving the initialization of dev-ip_ptr in inetdev_init() to 
after the
out label?

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
Ben,
Here's a patch that I think will fix it, assuming the receive is 
on the
same device as the initialization. Can you try this out?

+-DLS
[inline for viewing, attached for applying]

Signed-off-by: David L Stevens [EMAIL PROTECTED]

diff -ruNp linux-2.6.19.1/net/ipv4/devinet.c 
linux-2.6.19.1T1/net/ipv4/devinet.c
--- linux-2.6.19.1/net/ipv4/devinet.c   2006-12-11 11:32:53.0 
-0800
+++ linux-2.6.19.1T1/net/ipv4/devinet.c 2007-01-03 14:37:56.0 
-0800
@@ -165,9 +165,8 @@ struct in_device *inetdev_init(struct ne
  NET_IPV4_NEIGH, ipv4, NULL, NULL);
 #endif
 
-   /* Account for reference dev-ip_ptr */
+   /* Account for reference dev-ip_ptr (below) */
in_dev_hold(in_dev);
-   rcu_assign_pointer(dev-ip_ptr, in_dev);
 
 #ifdef CONFIG_SYSCTL
devinet_sysctl_register(in_dev, in_dev-cnf);
@@ -176,6 +175,8 @@ struct in_device *inetdev_init(struct ne
if (dev-flags  IFF_UP)
ip_mc_up(in_dev);
 out:
+   /* we can receive as soon as ip_ptr is set -- do this last */
+   rcu_assign_pointer(dev-ip_ptr, in_dev);
return in_dev;
 out_kfree:
kfree(in_dev);
diff -ruNp linux-2.6.19.1/net/ipv6/addrconf.c 
linux-2.6.19.1T1/net/ipv6/addrconf.c
--- linux-2.6.19.1/net/ipv6/addrconf.c  2006-12-11 11:32:53.0 
-0800
+++ linux-2.6.19.1T1/net/ipv6/addrconf.c2007-01-03 
14:47:07.0 -0800
@@ -413,8 +413,6 @@ static struct inet6_dev * ipv6_add_dev(s
if (netif_carrier_ok(dev))
ndev-if_flags |= IF_READY;
 
-   /* protected by rtnl_lock */
-   rcu_assign_pointer(dev-ip6_ptr, ndev);
 
ipv6_mc_init_dev(ndev);
ndev-tstamp = jiffies;
@@ -425,6 +423,8 @@ static struct inet6_dev * ipv6_add_dev(s
  NULL);
addrconf_sysctl_register(ndev, ndev-cnf);
 #endif
+   /* protected by rtnl_lock */
+   rcu_assign_pointer(dev-ip6_ptr, ndev);
return ndev;
 }
 


initfix.patch
Description: Binary data


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
OK, sounds good.

By the way, I think you can probably hit it more often if you have
something on the virtual network sending lots of multicast traffic while
you're creating the interface. That'll increase the odds that you'll
get into ip_check_mc() with a partially initialized in_dev.

You can use ping -I intfX 224.0.0.1 (e.g.) to generate multicast
traffic, though you'd want more than one. :-)

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
Herbert,
You're right, I don't know whether it'll fix the problem Ben saw
or not, but it looks like the original code can do a receive before the
in_device is fully initialized, and that, of course, is bad.
If the device for ip_rcv() is not the same one we were
initializing when the receive interrupted, then the patch should have
no effect either way -- I don't think it'll hide other problems.
If it's hard to reproduce (which I guess is true), then you're
right, no soft lockup doesn't really tell us if it's fixed or not.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
Ben,
 If the ip_rcv() and the inetdev_init() are on the same
interface in your stack backtrace, it's a certainty at that point
that the lock value is still 0ed, because none of the initialization
occurs until after it has returned from the function it interrupted
to do the receive.
It'd have to be out of the register code and doing
ip_mc_init_dev() (after that call) to be a tight race with
lock creation.

+-DLS
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-02 Thread David Stevens
I've looked at this a little too -- it'd be nice to know who holds
the write lock.

I see ip_mc_destroy_dev() is bouncing through the lock for
each multicast address, though it starts at the beginning of
the list each time. I don't see a problem with it, but it'd be
simpler if it acquired the write lock once, grabbed and nulled
the list, released the lock and then called igmp_group_dropped()
 ip_ma_put() on each address from the local list copy.

Are you destroying/creating interfaces or doing a lot of multicasting at
the time? How many group memberships do you have?

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix IGMPV3_EXP() normalization bit shift value

2006-11-16 Thread David Stevens
The IGMPV3_EXP() macro doesn't correctly shift the normalization bit, so
time-out values are longer than they should be. Patch below for viewing
and attached for applying.

Thanks to Dirk Ooms for finding the problem in IGMPv3 - MLDv2 had a
similar problem that was already fixed a year ago. :-(

+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]

--- linux-2.6.19-rc5/include/linux/igmp.h   2006-11-16 
16:11:53.0 -0800
+++ linux-2.6.19-rc5T1/include/linux/igmp.h 2006-11-16 
16:15:46.0 -0800
@@ -191,7 +191,7 @@ struct ip_mc_list
 #define IGMPV3_MASK(value, nb) ((nb)=32 ? (value) : ((1(nb))-1)  
(value))
 #define IGMPV3_EXP(thresh, nbmant, nbexp, value) \
((value)  (thresh) ? (value) : \
-((IGMPV3_MASK(value, nbmant) | (1(nbmant+nbexp)))  \
+((IGMPV3_MASK(value, nbmant) | (1(nbmant)))  \
  (IGMPV3_MASK((value)  (nbmant), nbexp) + (nbexp
 
 #define IGMPV3_QQIC(value) IGMPV3_EXP(0x80, 4, 3, value)



mc.patch
Description: Binary data


Re: [PATCH] IPv6: only modify checksum for UDP

2006-11-13 Thread David Stevens
David Miller [EMAIL PROTECTED] wrote on 11/13/2006 04:50:58 PM:
 
 Puzzling :-)  Then why is the transformation only performed for
 UDP in the ipv4 stack?  It seems by your logic TCP would need
 to either do the if (sum==0) sum=~0;  thing or it would need
 to accept both 0 and ~0 in the checksum checking path.

That's actually what I was suggesting. In 1's-complement,
~0 == -0 which is still 0, so barring any special case (like UDP's
0 means no checksum rule), it should be equally valid for a
packet to have 0 or ~0 as the checksum (with otherwise identical
data)-- they are both correct, and equal to each other. That
extra 1's-complement 0 is, of course, why UDP can have the
special case of remapping 0-~0.
Since the patch was for output-side, it doesn't matter
whether you remap 0 to ~0 or not (except for the special case),
but a receiver technically should allow either.
In practice, it won't matter if the machines you're talking
to are remapping (or not) in the same way. But 0 and -0 (aka ~0)
are still equal. :-)
It obviously doesn't come up much.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv6: only modify checksum for UDP

2006-11-10 Thread David Stevens
Sorry, I saw this discussion a little late...

The Internet checksum is defined as a 1's-complement sum, so if  the
alternate 0 does not have a special meaning in a protocol, then by
1's-complement arithmetic, 0 == ~0.
So, it looks to me without the remapping that a valid checksum
may also fail, if it is simply computed in a different way (or on a 
different
architecture) such that one gets 0 and one gets ~0 as un-modified answers.
Since we're checking for equality on 2's-complement machines,
I think the easiest thing is to still re-map it. Otherwise, instead of 
testing
for 0, we have to test for both 0 and ~0 in the validity checks, right?

Disclaimer: I've just glanced through the discussion, so maybe
I've misunderstood the point or effect of the patch...

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 7254] New: leaving IP multicast group with incorrect imr_address unexpectedly succeeded

2006-10-03 Thread David Stevens
If the index is set, it doesn't look up the address (which may be 
expensive).
If you want to look up by address, the index must be 0.

I wouldn't  call that a bug.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group

2006-09-14 Thread David Stevens
Alexey Kuznetsov [EMAIL PROTECTED] wrote on 09/14/2006 03:30:37 AM:

 Hello!
 
  No, it returns 1 (allow) if there are no filters to explicitly
  filter it. I wrote that code. :-)
 
 I see. It did not behave this way old times.
 
 From your mails I understood that current behaviour matches another
 implementations (BSD whatever), is it true?

Hi, Alexey,

If you mean IPv6 code in current BSD derivatives, I don't know.

The IPv6 behaviour was different from IPv4 on Linux and was changed for
compatibility with IPv4 (discussion on netdev agreed that binding
should determine socket delivery, not group membership, or simply
that there was no reason to be different from long-standing IPv4 
practice).

The IPv4 code is that way for compatibility with everything else since
about ~4.3BSD (with the possible exception of Solaris 8, apparently).

FWIW, I think Deering's original interpretation is correct. Adding
a multicast address to an interface by joining a group is little
different from adding a unicast address via SIOCSIFADDR, which
certainly does affect packets delivered to the machine and to any
INADDR_ANY-bound socket. Binding to the multicast address and not
INADDR_ANY will give you only packets for that group, if that's
what you want, just as in the unicast address-specific bind.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group

2006-09-13 Thread David Stevens
[EMAIL PROTECTED] wrote on 09/13/2006 07:13:55 AM:

 Only
 the socket that is bound to the group address to which the packet was
 sent should get it.

This is not true on any OS I'm aware of, including the
original sockets multicast implementation on early BSD.

Multicast group membership is per-interface, not per-socket.
Joining a group on any socket on the machine allows packets for that
group to be delivered on the interface where it was joined.

Delivery of packets to a socket is determined by the binding, and
INADDR_ANY means any.

IPv6 behaves the same way.
+-DLS


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group

2006-09-13 Thread David Stevens
Jeff Layton [EMAIL PROTECTED] wrote on 09/13/2006 09:12:23 AM:

 Most of the RFC's I've looked at don't seem to have much to say about
 how multicasting works at the socket level. Is there an RFC or
 specification that spells this out, or is this one of those situations
 where things are somewhat open to interpretation?

RFC's are standards for protocols and are concerned with
interoperability among multiple implementations. They, in general,
do not define API's (and those that do are informational).
There are after-the-fact standards for sockets, including
POSIX, but older features are effectively defined by industry
practice; in this case, Deering's BSD implementation from nearly
20 years ago.
Sections 7.1  7.2 of RFC 988 [Deering, 1986] hint at this
interpretation (separation of membership from socket delivery),
esp. when it says Incoming multicast IP datagrams are
delivered to upper-layer protocol modules using the
same 'Receive IP' operation as normal, unicast
datagrams.
But Deering's own implementation, which
does not distinguish which socket joined the group,
is the source of the standard practice.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group

2006-09-13 Thread David Stevens
Alexey Kuznetsov [EMAIL PROTECTED] wrote on 09/13/2006 01:20:29 PM:

 Hello!
 
  IPv6 behaves the same way.
 
 Actually, Linux IPv6 filters received multicasts, inet6_mc_check() does
 this.

No, it returns 1 (allow) if there are no filters to explicitly
filter it. I wrote that code. :-)
+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: multicast group memberships purge on interface delete

2006-08-23 Thread David Stevens
Michal,

 My question/suggestion:
 Would it feasible to drop the relevant entries from sockets' multicast 
 membership lists on the interface
 delete?

Yes, I think this is needed. The original BSD code didn't have 
this
problem because it didn't support removal of a device. I wondered for a
while whether an app should care whether the device was removed or not on
a leave, but I think it probably should get ENODEV in that case, just
as it would for any invalid interface index, whether or not it used to
be valid. So, simply removing the memberships and filters on device
destroy looks right to me.

I'm not sure that maintaining a per-device socket list is 
necessary,
since removal of a device is relatively unusual, and that may make locking
and reference counting more complicated (maybe not). Probably a simple
search of all UDP sockets with non-null multicast lists and matching
interface index is sufficient.
IGMPv3 is there fully, and uses a more efficient way of tracking
interface and socket filter intersections and unions than the obvious
way as suggested by the RFC. I don't think it is affected either way by
this.

I'll either do or review a patch for this, depending on who
gets to it first. :-)

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possible leak of multicast source filter sctructure

2006-08-17 Thread David Stevens
Michal Ruzicka [EMAIL PROTECTED] wrote on 08/17/2006 05:26:35 
AM:

 I've reviewed your patch (the IPv4 part of it) and I think I can say 
that
 our soulutions are actually quite similar.
 But I can come up with one difference that I'd like know your opinion 
on.

If you have a duplicated address and the application is not using
interface index to identify the interface, then which one you join or 
leave
on is not unique, and which one you'll get is not defined. It is, in fact,
an error to leave the group on an interface with a different index, and
the duplicate scenario has effectively changed the meaning of that 
address's
interface.
In the same scenario, if you have not deleted any interfaces and
attempt to leave the group, your code may get the pppY interface from
ip_mc_find_dev(), not have a matching index for the group you actually
joined, and will return EADDRNOTAVAIL rather than leaving the group you
joined on.
In both cases, it'd simply be an error for an application to be
using address to identify the interface when that is not unique. If the
addresses are used (or reused) on multiple interfaces, the application
has to use interface index to work properly.

+-DLS


 Suppose the following situatuion:
 
 1) create pppX interface:
   IP: A.B.C.D
 
 2) join a multicast group by address: A.B.C.D
   ASSUMPTION: no other interface with the same IP address exists;
   the result should be that the group is joined on the pppX interface
 
 3) create pppY interface
   IP: A.B.C.D (Yes the same IP, not unusual on ppp links.)
 
 4) delete the pppX interace
 
 5) attempt to leave the multicast group by address: A.B.C.D
 
 6) * if your algorthim is used -EADDRNOTAVAIL is returned
* if mine is used the multicast group is left on the pppX interface
 
 Surely this won't be a common problem but I think it's worth pointing 
out
 here.
 
 Michal
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possible leak of multicast source filter sctructure #3a

2006-08-16 Thread David Stevens
Michal,
I believe the patch I submitted yesterday fixes this
problem, and in a simpler way.

+-DLS

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Re: Possible leak of multicast source filter sctructure

2006-08-15 Thread David Stevens
Here's a patch to fix source filter leakage when a device is removed
and a process leaves the group thereafter. (2nd part of prior discussion)

This patch also includes corresponding fixes for IPv6 multicast source
filters on device removal (1st  2nd parts of prior discussion, for IPv6).

+-DLS

[in-line for viewing, attached for applying]

Signed-off-by: David L Stevens [EMAIL PROTECTED]

diff -ruNp linux-2.6.17.8/net/ipv4/igmp.c 
linux-2.6.17.8-dls/net/ipv4/igmp.c
--- linux-2.6.17.8/net/ipv4/igmp.c  2006-08-06 21:18:54.0 
-0700
+++ linux-2.6.17.8-dls/net/ipv4/igmp.c  2006-08-15 16:30:18.0 
-0700
@@ -1796,29 +1796,35 @@ int ip_mc_leave_group(struct sock *sk, s
struct in_device *in_dev;
u32 group = imr-imr_multiaddr.s_addr;
u32 ifindex;
+   int ret = -EADDRNOTAVAIL;
 
rtnl_lock();
in_dev = ip_mc_find_dev(imr);
-   if (!in_dev) {
-   rtnl_unlock();
-   return -ENODEV;
-   }
ifindex = imr-imr_ifindex;
for (imlp = inet-mc_list; (iml = *imlp) != NULL; imlp = 
iml-next) {
-   if (iml-multi.imr_multiaddr.s_addr == group 
-   iml-multi.imr_ifindex == ifindex) {
-   (void) ip_mc_leave_src(sk, iml, in_dev);
+   if (iml-multi.imr_multiaddr.s_addr != group)
+   continue;
+   if (ifindex) {
+   if (iml-multi.imr_ifindex != ifindex)
+   continue;
+   } else if (imr-imr_address.s_addr  
imr-imr_address.s_addr !=
+   iml-multi.imr_address.s_addr)
+   continue;
+
+   (void) ip_mc_leave_src(sk, iml, in_dev);
 
-   *imlp = iml-next;
+   *imlp = iml-next;
 
+   if (in_dev)
ip_mc_dec_group(in_dev, group);
-   rtnl_unlock();
-   sock_kfree_s(sk, iml, sizeof(*iml));
-   return 0;
-   }
+   rtnl_unlock();
+   sock_kfree_s(sk, iml, sizeof(*iml));
+   return 0;
}
+   if (!in_dev)
+   ret = -ENODEV;
rtnl_unlock();
-   return -EADDRNOTAVAIL;
+   return ret;
 }
 
 int ip_mc_source(int add, int omode, struct sock *sk, struct
diff -ruNp linux-2.6.17.8/net/ipv6/mcast.c 
linux-2.6.17.8-dls/net/ipv6/mcast.c
--- linux-2.6.17.8/net/ipv6/mcast.c 2006-08-06 21:18:54.0 
-0700
+++ linux-2.6.17.8-dls/net/ipv6/mcast.c 2006-08-15 15:51:53.0 
-0700
@@ -269,13 +269,14 @@ int ipv6_sock_mc_drop(struct sock *sk, i
if ((dev = dev_get_by_index(mc_lst-ifindex)) != 
NULL) {
struct inet6_dev *idev = in6_dev_get(dev);
 
+   (void) ip6_mc_leave_src(sk, mc_lst, idev);
if (idev) {
-   (void) 
ip6_mc_leave_src(sk,mc_lst,idev);
__ipv6_dev_mc_dec(idev, 
mc_lst-addr);
in6_dev_put(idev);
}
dev_put(dev);
-   }
+   } else
+   (void) ip6_mc_leave_src(sk, mc_lst, NULL);
sock_kfree_s(sk, mc_lst, sizeof(*mc_lst));
return 0;
}
@@ -335,13 +336,14 @@ void ipv6_sock_mc_close(struct sock *sk)
if (dev) {
struct inet6_dev *idev = in6_dev_get(dev);
 
+   (void) ip6_mc_leave_src(sk, mc_lst, idev);
if (idev) {
-   (void) ip6_mc_leave_src(sk, mc_lst, idev);
__ipv6_dev_mc_dec(idev, mc_lst-addr);
in6_dev_put(idev);
}
dev_put(dev);
-   }
+   } else
+   (void) ip6_mc_leave_src(sk, mc_lst, NULL);
 
sock_kfree_s(sk, mc_lst, sizeof(*mc_lst));
 



mcd.patch
Description: Binary data


  1   2   >