Re: [IPV6]: Fix IPsec datagram fragmentation
I think the field name is kind of confusing here. It appears that local_df == 1 means allow fragmentation and local_df == 0 means don't fragment. But DF to me means don't fragment, as in IP_DF, so the field value is backwards from what I'd expect. And maybe that's what happened to in your patch, too, Herbert. For the future, maybe we should rename that, or reverse the sense of it (in v4 as well). :-) +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: broken link-local multicast?
Pekka, I first thought the interface might be down when I saw this one, except the interface route is present in the route list he shows later. That's normally deleted when the interface is down. And a failure to be in the group just wouldn't answer-- shouldn't cause net unreachable (which is a failure to send it). Marco, You called this a firewall -- do you get the same thing when you have no iptables rules? +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AW: Problem receiving multicast/promiscuous-mode with kernel.2.6.24
Another relevant piece of information is what the socket is bound to; netstat -au will tell you if you're bound to an address that will match the incoming multicast packets. If you suspect that the VLAN device is the problem, then it'd be a good idea to try it on ordinary ethernet. Re: imr_ifindex; using a 0 ifindex and INADDR_ANY interface is valid, but it usually doesn't make sense. What it means is I don't care what interface I join the group on. But you'll only receive multicasts delivered to the group on the interface the kernel picks for you, which often won't be the one you really want. +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ipv6_chk_acast_addr remove unused loop
NACK. Daniel, This code is part of the in-kernel API for anycast, which is intended to be the same as for multicast. By removing support for NULL there, you're making a special case for the anycast code that isn't there in the multicast code (can't support a NULL dev), and you really aren't buying all that much for it. +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem receiving multicast/promiscuous-mode with kernel.2.6.24
You need to join the multicast group on the interface you want to receive it. If you're setting both imr_address and imr_index to 0, you haven't specified an interface. Instead of setting imr_ifindex to 0, try setting it like this: imr.imr_ifindex = if_nametoindex(eth0); (or whatever interface you need). You can verify that you've joined the group on the correct interface with netstat -g, also. Make sure that the interface field is correct for the group you want. And because I can't resist mentioning it, inet_aton is deprecated-- suggest inet_pton() with appropriate arguments and return value checking instead. There was a bug fix where the sense of promiscuous mode was backwards for multicasting, which is probably why it was accidently (and incorrectly) working in earlier kernels. If that doesn't do it for you, let me know; better if you can post the full code (or even better, the smallest subset that demonstrates the problem). +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] IPV6: RFC 2011 compatibility broken
RFC 2011 doesn't apply to IPv6, and the internal names of /proc entries are not used by the SNMP protocol, but it is an unintentional incompatibility with the previous Linux entry names, so I agree. :-) +-DLS Acked-by: David L Stevens [EMAIL PROTECTED] [EMAIL PROTECTED] wrote on 01/21/2008 01:46:44 AM: [IPV6]: RFC 2011 compatibility broken The snmp6 entry name was changed, and it broke compatibility to RFC 2011. Signed-off-by: Wang Chen [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
Herbert Xu [EMAIL PROTECTED] wrote on 01/16/2008 03:49:01 AM: Actually having the icmp_out_count call in ip_push_pending_frames seems inconsistent. Having it there means that we count raw socket ICMP packets too. But we don't do that for any other protocol, e.g., raw UDP packets don't get counted. Herbert, The patch was to support the ICMPMsgStats table. Since none of certain types of output ICMP messages are generated by the kernel, but are required by the RFC, counting raw sockets is intentional (and the only way those ICMP types can be counted at all). Raw UDP packets would not be counted either before or after the patch, but aren't part of the ICMPMsgStats table. Adding those might be worthwhile, but it isn't quite the hole that the ICMP out stats were, since there is a cooked interface for UDP output that counts the common use, at least. Wang, I think your patch is correct; did you test the same case for IPv6? +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
[EMAIL PROTECTED] wrote on 01/16/2008 03:17:29 PM: Fair enough. How about moving this code back into icmp.c and just add a new count call in raw.c? The push pending function is used on the UDP fast path so the leaner it is the better. I started out with it there, but it certainly wasn't cleaner. You have to peek on the queue, which I didn't want to do for all the SMP issues that brings in, and then replicate all the code push_pending_frames does to get you a protocol header pointer in one buffer, and there are multiple ways in the raw path to get you there (so the counter code itself would appear in multiple places). I don't think the 2 instructions measurably impact anything in the fast path and that is the earliest common point for the multiple paths to generate ICMP output where the header and type are available without replicating code. So, simplicity is exactly why I put it there, rather than the obvious places at the higher layer. :-) It's probably the better, single place to put the other protocol out messages counters, too, since those probably also have multiple paths that end up here, but ICMPMsgStats was all that patch was after. +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
Brian, Looks good to me. +-DLS Acked-by: David L Stevens [EMAIL PROTECTED] How about the simple patch below? I just removed the ENINVAL check from my original patch, but it accomplishes the same thing. ... Signed-off-by: Brian Haley [EMAIL PROTECTED] --- diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index 2ed689a..5d4245a 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -123,11 +123,11 @@ ipv4_connected: goto out; } sk-sk_bound_dev_if = usin-sin6_scope_id; - if (!sk-sk_bound_dev_if - (addr_type IPV6_ADDR_MULTICAST)) -fl.oif = np-mcast_oif; } + if (!sk-sk_bound_dev_if (addr_type IPV6_ADDR_MULTICAST)) + sk-sk_bound_dev_if = np-mcast_oif; + /* Connect to link-local address requires an interface */ if (!sk-sk_bound_dev_if) { err = -EINVAL; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 7/9][NETNS][IPV6] make mld_max_msf per namespace
Daniel Lezcano [EMAIL PROTECTED] wrote on 01/03/2008 03:00:48 AM: ... With this solution, we can handle different values for the namespaces but these values are driven by the initial network namespace because their values are lesser or equal to the one from the initial network namespace. Is it acceptable ? Daniel, If you have the premise that there's a reason for them to be different, then your original implementation is fine already. It requires root privilege to change the value, so I don't mind the ability to raise it to a higher value later. I don't object, but I don't understand. I can't think of any circumstances where I would want to modify it per namespace. Making it small is not an effective restriction, since someone *wanting* to use lots of sources can simply do them on different sockets of the same group. The point is to catch accidental silly use and it's protecting a global resource so differing values just change the threshold at which you catch accidental silly use in different namespaces. Setting it to 0 might be a method of preventing its use entirely in some namespaces, but it's part of the socket interface-- disabling it isn't something you generally want to do, either. Are you intending to convert all variables to be per-namespace? If not -- that is, if you will have global sysctl variables, then I think this one should be one of those. Actually, all of the IGMP MLD variables are tied naturally to the shared interfaces, so should be global, I think. +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 7/9][NETNS][IPV6] make mld_max_msf per namespace
Daniel, I'm not sure what benefit you get from making this per-namespace. The point of it is really to prevent one (non-root, even) application from killing machine performance with source filters (because maintaining them is an n^2 algorithm). It's a weak constraint, but the resources it's protecting are the processor and MLDv2 packet counts. If any one namespace has a large value, all will have a problem still, and (even without your patch), lots of separate source filters can still cause a problem. What it catches is one application creating thousands (or millions) of source filters and killing the machine and network with MLDv2 reports as a result. Why shouldn't that remain global? +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
Vlad Yasevich [EMAIL PROTECTED] wrote on 12/19/2007 07:20:53 AM: But this still requires either a SO_BINDTODEVICE or sin6_scope_id. This means the an application can call BINDTODEVICE(eth0), MULTICAST_IF(eth1) issue a connect on a UDP socket an succeed? Seems wrong to me. Can you check section 6.7 of RFC 3542. No, it requires one of SO_BINDTODEVICE, sin6_scope_id, or IPV6_MULTICAST_IF. If you do an SO_BINDTODEVICE(eth0) and then an IPV6_MULTICAST_IF(eth1), the IPV6_MULTICAST_IF will fail in setsockopt (EINVAL), because it requires a match for bound sockets. I'm not sure if SO_BINDTODEVICE resets mcast_oif if you do them in the reverse order, but that would be a bug in SO_BINDTODEVICE. The precedence order as implemented already is: SO_BINDTODEVICE is highest and always wins sin6_scope_id next IPV6_MULTICAST_IF and the existing code has the rule that all link-local addresses require a sin6_scope_id. The change (intended) is to relax the sin6_scope_id rule only for link-local multicasts that have done either an SO_BINDTODEVICE or IPV6_MULTICAST_IF already. +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
Brian Haley [EMAIL PROTECTED] wrote on 12/19/2007 07:35:46 AM: ... if (usin-sin6_scope_id) sk-sk_bound_dev_if = usin-sin6_scope_id; if (!sk-sk_bound_dev_if (addr_type IPV6_ADDR_MULTICAST)) fl.oif = np-mcast_oif; This assignment will not get us past the next check... Yeah, that's what I get for typing in off-the-cuff code. What I was thinking was the fl.oif assignment instead was: if (!sk-sk_bound_dev_if (addr_type IPV6_ADDR_MULTICAST)) sk-sk_bound_dev_if = np-mcast_oif; Which it is not, but maybe it could be, since this is a connect(). That patch looks better, but I'm wondering if we could just remove the requirement that sin6_scope_id be set here if it's multicast, since it is doing the following later in the code: if (!fl.oif (addr_typeIPV6_ADDR_MULTICAST)) fl.oif = np-mcast_oif; So, really, all we need to do is get through the LINKLOCAL section without error in the multicast case and we can remove the redundant multicast check there. I think that'd be simpler. I also note that sin6_scope_id appears not to be honored at all in the non-linklocal case, which may be correct, but surprises me. I want to look a little more at this; I know you have a customer issue, so I'll make it quick. +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
We would still have to check np-mcast_oif is set in the link-local case since we shouldn't be getting here with a zero. Actually, that's one of the things I wanted to look into. I'm not sure if there's a path through here with (even non-linklocal) multicasts that end up without an interface set. And I'd to do some testing of different cases to see if they behave as expected; also compare to the similar sendmsg() cases. Don't worry about that, they can wait, and I'm leaving for 10 days anyways... Good, thanks. It has been this way for years now, and I'm out too. :-) Enjoy, +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
Brian Haley [EMAIL PROTECTED] wrote on 12/18/2007 12:57:54 PM: Trying to connect() to an IPv6 link-local multicast address by specifying the outgoing multicast interface doesn't work, you have to bind to a device first with an SO_BINDTODEVICE setsockopt() call. No, you simply have to specify sin6_scope_id for link-scope addresses, like you do in unicast cases. Your patch requires them to match (if specified), but I don't think IPV6_MULTICAST_IF should override or require a match for a valid sin6_scope_id (or be an error). If I read it correctly, the existing code uses IPV6_MULTICAST_IF if the sin6_scope_id is not set, otherwise honors the interface specified in the connect. That seems like correct behaviour to me, and RFC 3493 doesn't address the relative precedence of the two that I see. This is in the linklocal branch, and all unicast linklocal's require specifying sin6_scope_id. Multicast doesn't if require a scope_id in the case where you've done an IPV6_MULTICAST_IF, but it should still allow a different scope_id when you have used IPV6_MULTICAST_IF. Do you have application code that you believe is correct that doesn't work? +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
Brian, OK, I see what you're trying to fix now. I think the scope_id checks are not quite right-- they should be something like this: if (addr_typeIPV6_ADDR_LINKLOCAL) { if (addr_len = sizeof(struct sockaddr_in6)) { if (sk-sk_bound_dev_if usin-sin6_scope_id sk-sk_bound_dev_if != usin-sin6_scope_id) { err = -EINVAL; goto out; } if (usin-sin6_scope_id) sk-sk_bound_dev_if = usin-sin6_scope_id; if (!sk-sk_bound_dev_if (addr_type IPV6_ADDR_MULTICAST)) fl.oif = np-mcast_oif; } /* connect to the link-local addres requires an interface */ if (!sk-sk_bound_dev_if) { err = -EINVAL; goto out; } } That is (in English): If I did an SO_BINDTODEVICE and specified sin6_scope_id, then they better agree. If I specified sin6_scope_id without SO_BINDTODEVICE, set the device to that. If I get this far without a device and it's multicast, use mcast_oif If I get all through that and don't have a device, EINVAL. Does that work for you? +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
all is not right with the world...
I don't know if you've fixed it already, but I was writing some UDPv6 tests in 2.6.23.9 and was getting EDESTADDRREQ unexpectedly. When I looked at the kernel code, I found: gerrit/wscott 1.115 | int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, shemminger1.57|struct msghdr *msg, size_t len) torvalds 1.1 | { ... yoshfuji 1.53| } else if (!up-pending) { yoshfuji 1.53| if (sk-sk_state != TCP_ESTABLISHED) yoshfuji 1.53| return -EDESTADDRREQ; yoshfuji 1.53| daddr = np-daddr; I wouldn't expect a UDP socket to be setting sk_state to TCP_ESTABLISHED... +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: all is not right with the world...
Never mind... Though I hate the TCP in the name (maybe overload UDP_CONNECTED?), I see datagram sockets are in fact setting it, and my EDESTADDRREQ was a bug in my test program. :-) +-DLS [EMAIL PROTECTED] wrote on 12/09/2007 04:19:12 PM: I don't know if you've fixed it already, but I was writing some UDPv6 tests in 2.6.23.9 and was getting EDESTADDRREQ unexpectedly. When I looked at the kernel code, I found: gerrit/wscott 1.115 | int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, shemminger1.57| struct msghdr *msg, size_t len) torvalds 1.1 | { ... yoshfuji 1.53| } else if (!up-pending) { yoshfuji 1.53| if (sk-sk_state != TCP_ESTABLISHED) yoshfuji 1.53| return -EDESTADDRREQ; yoshfuji 1.53| daddr = np-daddr; I wouldn't expect a UDP socket to be setting sk_state to TCP_ESTABLISHED... +-DLS -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.25 1/4] include - Convert IP4 address class macros to inline functions
Maybe I'm more used to hex, but I personally don't think those are more readable. It replaces 1-line macroes with 4-line functions, and I think more vgrepping to pick out the relevant constant data. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/05] ipv6: RFC4214 Support
Pekka, Thanks! That answers the question I had (i.e., you believe the legal issues are resolved). +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/05] ipv6: RFC4214 Support
Last I heard, there are Intellectual Property claims with ISATAP, which is why the RFC is not standards track and which makes it effectively a proprietary protocol. Unless that's been resolved, I think the claim by the IP owner is that it can't be distributed without a license from them. So, maybe not worth the effort for an experimental RFC. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/05] ipv6: RFC4214 Support
I guess license is no longer required for implementers of ISATAP. Is it right, Fred? https://datatracker.ietf.org/ipr/550/ Does this also allow license-free redistribution? I'm certainly no lawyer, but I don't see the point of having a patent that doesn't restrict *something*. :-) +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/05] ipv6: RFC4214 Support
give it away on this specific instance. I'm not sure if you should attribute to hidden agendas what you can explain by doing the right thing (granted, very few companies do this which may make it suspect, but still..). Pekka, I'm not assuming hidden agendas here; I simply don't know what they mean by no license for implementers. It doesn't say they relinquish *all* licensing, which would be clearer if that's what they mean. If implementers, distributors, and users are included, then who's left that does need licensing? If that answer really is nobody, then why bother with for implementers.? So, I don't think it's a hidden agenda, I think they said what they mean. I just don't know what they mean. :-) +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuring the same IP on multiple addresses
[EMAIL PROTECTED] wrote on 10/29/2007 11:03:37 AM: So, I am looking for technical reasons why this is permitted. Vlad, Is there a technical reason to disallow it? Rather than anticipate all the possible uses for a machine, it's, of course, generally better to restrict only the things you know can't work, and allow everything else. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuring the same IP on multiple addresses
For v6, there are plenty of operational reasons to not allow this. You really turn unicast into anycast when you do this and there are special rules to be followed. I don't see it that way. The only problem I can think of offhand is that you can't use a multi-interface address to identify an interface (for example, for multicasting) and get predictable results (it'll pick the first one it finds with that address, in no particular order). But you can still use interface indexes, which are unique. Anycast is used for multiple distinct hosts, which isn't an issue on the same host. It's already true, as you pointed out, that you can receive a packet for any local address on any interface, so allowing multiple instances means you still match it as local. Which interface you match it on usually isn't relevant, and when it is are exactly the cases where using duplicates might be appropriate. I can see where it might be useful if you have policy restrictions on some interfaces and want the particular address to be both in and out of a set. But, I agree, it's generally more trouble (for an administrator), but then administrators don't have to assign the same address to multiple interfaces. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Fw: [PATCH] [IPv4] SNMP: Refer correct memory location to display ICMP out-going statistics
Dave, I didn't see a response for this one... in case it fell through the cracks. Just want to make sure my bone-headed error doesn't hang around too long. :-) +-DLS - Forwarded by David Stevens/Beaverton/IBM on 10/29/2007 01:51 PM - Mitsuru Chinen [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/25/2007 06:59 PM To netdev@vger.kernel.org cc David Stevens/Beaverton/[EMAIL PROTECTED] Subject [PATCH] [IPv4] SNMP: Refer correct memory location to display ICMP out-going statistics While displaying ICMP out-going statistics as Outname counters in /proc/net/snmp, the memory location for ICMP in-coming statistics was referred by mistake. Acked-by: David L Stevens [EMAIL PROTECTED] Signed-off-by: Mitsuru Chinen [EMAIL PROTECTED] --- net/ipv4/proc.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index fd16cb8..b7f7f8a 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -312,7 +312,7 @@ static void icmp_put(struct seq_file *seq) for (i=0; icmpmibmap[i].name != NULL; i++) seq_printf(seq, %lu, snmp_fold_field((void **) icmpmsg_statistics, - icmpmibmap[i].index)); + icmpmibmap[i].index | 0x100)); } /* -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] UDP: Make use of inet_iif() when doing socket lookups.
I don't see any problem with this. If you're using IP_MULTICAST_IF with a bound socket, it forces you to use that interface, anyway. I'm not sure multicasting works as expected in some other cases with bound sockets (looks like there may be some holes to evade the binding), but that's in old code. I don't see any problems from this patch. +-DLS Acked-by: David L Stevens [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Mark snmp4_icmp_list[] as being unused
What about just removing it, or do you think it's useful for documentation of the order? Either way, Acked-by: David L Stevens [EMAIL PROTECTED] +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Mark snmp4_icmp_list[] as being unused
[EMAIL PROTECTED] wrote on 10/25/2007 04:05:13 PM: David Stevens [EMAIL PROTECTED] wrote: What about just removing it, or do you think it's useful for documentation of the order? Either way, I don't know whether it's intended to be used for something, perhaps a debugging macro. Perhaps it should be #if'd out instead. It was used for /proc/net/snmp header printing, but most of the items in it were moved to a different MIB (by me). The new values are printed in a backward-compatible way, but maintaining the old order means not using this header map. So, long-winded way of saying I believe it should be removed, but I missed that when I made it obsolete. :-) +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] unexport icmpmsg_statistics
[EMAIL PROTECTED] wrote on 10/24/2007 09:24:10 AM: This patch removes the unused EXPORT_SYMBOL(icmpmsg_statistics). Signed-off-by: Adrian Bunk [EMAIL PROTECTED] --- 4ce74657ac0b1bdcb4c7bc359d05643f8cc4a08b diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 272c69e..233de06 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -1104,5 +1104,4 @@ void __init icmp_init(struct net_proto_family *ops) EXPORT_SYMBOL(icmp_err_convert); EXPORT_SYMBOL(icmp_send); EXPORT_SYMBOL(icmp_statistics); -EXPORT_SYMBOL(icmpmsg_statistics); EXPORT_SYMBOL(xrlim_allow); icmpmsg_statistics belongs with (and replaces some of the old...) icmp_statistics. I'm not sure that any modules use it, but I think you should remove both or neither. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] unexport icmpmsg_statistics
[EMAIL PROTECTED] wrote on 10/24/2007 12:14:37 PM: On Wed, Oct 24, 2007 at 12:07:45PM -0700, David Stevens wrote: [EMAIL PROTECTED] wrote on 10/24/2007 09:24:10 AM: This patch removes the unused EXPORT_SYMBOL(icmpmsg_statistics). Signed-off-by: Adrian Bunk [EMAIL PROTECTED] --- 4ce74657ac0b1bdcb4c7bc359d05643f8cc4a08b diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 272c69e..233de06 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -1104,5 +1104,4 @@ void __init icmp_init(struct net_proto_family *ops) EXPORT_SYMBOL(icmp_err_convert); EXPORT_SYMBOL(icmp_send); EXPORT_SYMBOL(icmp_statistics); -EXPORT_SYMBOL(icmpmsg_statistics); EXPORT_SYMBOL(xrlim_allow); icmpmsg_statistics belongs with (and replaces some of the old...) icmp_statistics. I'm not sure that any modules use it, but I think you should remove both or neither. icmp_statistics is used by the dccp_ipv4 and sctp modules. The only items left in icmp_statistics are InMsgs, InErrs, OutMsgs, OutErrs, so if dccp and sctp are sending or receiving any in or out ICMP messages, they should be using the new macros (which reference icmpmsg_statistics, not icmp_statistics) to count them. I took a quick look at SCTP. I don't know if that's going through icmp_rcv() or not; if so, I think it's double-counting; if not, then it isn't counting the individual types (as it should), and it should have ICMPMSG macros doing that. So, again, icmpmsg_statistics either should stay exported, or neither icmpmsg_statistics nor icmp_statistics should be exported (depending on how SCTP and DCCP code is resolved). It's incorrect in the current code to incrememnt ICMP_MIB_INMSGS without incrementing one of the types too. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] unexport icmpmsg_statistics
I took a look at the DCCP references, and I think they're just incrementing the wrong MIB variable -- e.g., it's incrementing ICMP_MIB_INERRORS when the skb length is less than the header indicates. That's not an ICMP_MIB_INERRORS error, that's an IPSTATS_MIB_INHDRERRORS error. ICMP_MIB_INERRORS is when you receive an ICMP error packet; an IP header error is something else entirely. That's followed by a failed lookup incrementing ICMP_MIB_INERRORS which should be an unknown port error in the transport MIB (assuming it has one-- it's not an ICMP error; could be an IP error, if the address isn't local, rather than unknown port). In SCTP, it appears to have similar problems. SCTP errors are not ICMP errors, though it perhaps should be calling icmp_send() to send one to the offending host for some of the cases. I haven't seen any ICMP-relevant stats correctly referenced in these yet. I don't want to patch them directly, since I can't easily test them; if someone who works with DCCP and SCTP would like to, I'd be happy to review. Any volunteers? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] unexport icmpmsg_statistics
My bad -- I see what it's doing, and it looks ok after all. I thought I saw an INMSGS (but didn't). These are ICMP errors that went through icmp_rcv() and were counted correctly before getting to the protocol error handlers. These are failures due mostly to not having enough, or the right protocol info in the error packet being handled. I'm not sure I'd count those as ICMP errors, since the ICMP header itself is correct, but ok... SCTP doesn't look so bad, though I think the references are still questionable (but debatable) as ICMP errors. sctp_v4_err is incrementing ICMP_MIB_INERRORS if there isn't enough IP header to find the ports, I see. I'm not sure that counts as an ICMP error, but it's not so terrible. It's doing the same thing if a lookup fails to match vtag from the encapsulated error packet. Again, I don't know that those are ICMP errors (which normally are something wrong with the ICMP header). So, I stand corrected, and sorry about the histrionics. Since these are arguably ICMP errors, and since errors is the only thing being MIB-counted in DCCP and SCTP, then it now looks ok to me as-is, and also ok to remove icmpmsg_statistics from exporting. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: multicast: bug or feature
I don't know about a new knob, but it's the same notion as rp_filter, so why not use rpf for RTN_LOCAL types? Ie, allow RTN_LOCAL and RTN_UNICAST at the top, but check rpf if the devs aren't equal or RTN_LOCAL It seems like not a good thing to rely on in the first place, though; usually receiving things from yourself is a bad thing. :-) +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: multicast: bug or feature
From looking at the code, it appears that validate source is failing just because of the rp_filter. Do you have rp_filter set to nonzero? If so, it may do what you want just by setting that to 0: sysctl -w net.ipv4.conf.all.rp_filter=0 +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: multicast: bug or feature
I don't know why you'd want it to be different for multicasting. If you want to hear your own multicasts, you should use MULTICAST_LOOP; hearing them off the wire indicates all the same bad things -- a forger, a duplicate address or a routing loop. Those aren't any better for multicasting than they are for unicasting, that I can see. Why would you want that to be delivered, other than (apparently) to test the multicast capability of a pair of interfaces while using only 1 machine? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: multicast: bug or feature
[EMAIL PROTECTED] wrote on 10/19/2007 04:43:27 AM: Vlad Yasevich [EMAIL PROTECTED] wrote: Now, to figure out what IPv6 does different and why it works. Seems to me that the two should have the same behavior. IPv6 on Linux uses a per-interface addressing model as opposed to the per-host model used by IPv4. For link-local addresses, yes. It's really a security feature; the ordinary case where you'd receive something on an interface that's using one of your source addresses is when someone is spoofing you, has a duplicate address, or maybe an (unintentional) routing loop. All of those are error cases, so dropping a received packet that claims to be sent by you is a reasonable thing to do. If you're getting link-local source addresses for your IPv6 multicast packets, that may explain it. The link-local addresses are required to be unique and valid only for that link, so IPv6 should not consider a different interface's link-local address as local for a destination address, or a packet using that source address as bogus. For a global address, v4 and v6 use the same rules-- for a destination you can receive it on any interface for any global address. So, if your source address was a global IPv6 address and it worked, I'd guess IPv6 just isn't checking the source address. I don't know that it's required by RFC for either v4 or v6, though it's probably a good idea. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix memory leak in cleanup_ipv6_mibs()
Acked-by: David L Stevens [EMAIL PROTECTED] Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index bc92938..1b1caf3 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -747,6 +747,7 @@ static void cleanup_ipv6_mibs(void) { snmp_mib_free((void **)ipv6_statistics); snmp_mib_free((void **)icmpv6_statistics); + snmp_mib_free((void **)icmpv6msg_statistics); snmp_mib_free((void **)udp_stats_in6); snmp_mib_free((void **)udplite_stats_in6); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: multicast: bug or feature
I'm not clear on your configuration. Are the sender and receiver running on the same machine? Are you saying eth0 and eth1 are connected on the same link? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: multicast: bug or feature
Can you send the contents of /proc/net/igmp and the packet trace, also? And the code? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: multicast: bug or feature
You're joining the group on interface eth1, which is the sender, right? You need to be a member on eth0 to receive it there. I think your program needs another argument, to specify the receiving interface, which you want to be different from the sending interface. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Multicast problem
If you have icmp_echo_ignore_broadcasts set to 1, it won't respond to multicasts. That should be all you need to do for 224.0.0.1. Some smart switches rely on IGMP snooping to determine group membership, and some of those don't understand IGMPv3, so if you have no v1 or v2 queriers on the network, you can force the IGMP version to be 2 instead of the default 3 by using a sysctl. But that's not the issue for 224.0.0.1, since the all-hosts group is not reported via IGMP. You might check the contents of /proc/net/igmp and /proc/net/dev_mcast to make sure they have entries for the group. If so, I'd guess it might be a driver or device issue. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Multicast problem
hmm maybe I misunderstand here, but the we are trying to make OSPF work in 2.6 and the problem appears to be that our board does not pick up multicasts. Is icmp_echo_ignore_broadcasts=0 required to make OSPF work in 2.6? Jocke No, it's required to make ping to a multicast address work, which is what your mail said was the problem. :-) ping uses ICMP, OSPF does not. If you receive ping responses with ignore_broadcasts=0 and not in promiscuous mode, then the problem may be a smart switch that doesn't understand IGMPv3. In that case, you can try: sysctl -w net.ipv4.conf.all.force_igmp_version=2 You can also, of course, check /proc/net/igmp and /proc/net/dev_mcast to make sure you've joined the OSPF multicast address on the interface(s) you care about. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Multicast problem
dev_mcast and igmp looks: [EMAIL PROTECTED]:/proc/net# m dev_mcast 1eth01 0 01005e01 2eth11 0 01005e01 These are the hardware multicast addresses for 224.0.0.1 (so, correct). [EMAIL PROTECTED]:/proc/net# m igmp Idx Device: Count Querier GroupUsers TimerReporter 1 eth0 : 1 V3 E001 1 0: 0 2 eth1 : 1 V2 E001 1 0: 0 4 lo: 0 V3 E001 1 0: 0 ...and these are the group memberships for 224.0.0.1, so also correct. This should receive and answer pings to 224.0.0.1 as long as you don't have ICMP set to ignore broadcasts. You should not need to put the device in promiscuous mode for a ping of 224.0.0.1 to work, so if you do, then you have a driver, device or switch problem. None of these interfaces have memberships in the OSPF groups, so if you had OSPF running when you did this, it didn't join. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2
Acked-by: David L Stevens [EMAIL PROTECTED] Signed-off-by: Brian Haley [EMAIL PROTECTED] diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 532425d..1334fc1 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -539,12 +539,15 @@ done: case IPV6_MULTICAST_IF: if (sk-sk_type == SOCK_STREAM) goto e_inval; - if (sk-sk_bound_dev_if sk-sk_bound_dev_if != val) - goto e_inval; - if (__dev_get_by_index(init_net, val) == NULL) { - retv = -ENODEV; - break; + if (val) { + if (sk-sk_bound_dev_if sk-sk_bound_dev_if != val) +goto e_inval; + + if (__dev_get_by_index(init_net, val) == NULL) { +retv = -ENODEV; +break; + } } np-mcast_oif = val; retv = 0; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bytes received from recvmsg doesn't match FIONREAD
Questions of this sort should normally be directed to linux-net mailing list. From the code you quoted, I see at least one case where it will fail -- when the allocated buffer you pass to recvmsg is smaller than value (ie. the datagram is too big for the read buffer). If that's not the problem, I suggest you reproduce it with a small test case and post real code and results when it happens. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493
What about just checking for 0 in the later test? if (val __dev_get_by_index(val) == NULL) { ... +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493
Brian Haley [EMAIL PROTECTED] wrote on 10/10/2007 02:20:45 PM: David Stevens wrote: What about just checking for 0 in the later test? if (val __dev_get_by_index(val) == NULL) { We could fail the next check right before that though: Right, the semantics there would be if we have a bound dev if, that's the only legal value here. Setting it to '0' in that case doesn't really do anythng, anyway. But I don't care about that semantic difference-- could even add val to the bound_dev_if check. What I don't like is that your if creates an identical duplicate code path for the functional part of it. In this case it's trivial (the asignment), but makes the code look more complex than it really is. If v4 does it that way, I don't like that either. :-) I agree with it in general, and may not be worth the trouble, but I'd personally prefer something like: if (sk-sk_type == SOCK_STREAM) goto e_inval; if (val sk-sk_bound_dev_if sk-sk_bound_dev_if != val) goto e_inval; if (val __dev_get_by_index(val) != NULL) { retv = -ENODEV; break; } [at this point all validity checks are done and we're following one code path to do the work; each check is easily identifiable.] np-mcast_oif = val; retv = 0; break; Or maybe: if (sk-sk_type == SOCK_STREAM) goto e_inval; if (val) { if (sk-sk_bound_dev_if sk-sk_bound_dev_if != val) goto e_inval; if (__dev_get_by_index(val != NULL) { retv = -ENODEV; break; } } np-mcast_oif = val; retv = 0; break; But anyway, I made the comment; I think some form of it should go in. :-) If you like the original better, that's ok with me, too. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][NETNS] Make ifindex generation per-namespace
Sorry if this is a dumb question, but what is the model you intend for SNMP? Do you want each namespace to be its own virtual machine with its own, separate MIB? Ifindex's have to uniquely identify the interface (virtual or otherwise) to remote queriers (not just local applications), so unless you pay the price of separating all the SNMP MIBs per namespace too, it seems you'll need some way to remap these for SNMP queries, right? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fallback to ipv4 if we try to add join IPv4 multicast group via ipv4-mapped address.
Dmitry, Good catch; a couple comments: struct ipv6_pinfo *np = inet6_sk(sk); int err; + int addr_type = ipv6_addr_type(addr); + + if (addr_type == IPV6_ADDR_MAPPED) { + __be32 v4addr = addr-s6_addr32[3]; + struct ip_mreqn mreq; + mreq.imr_multiaddr.s_addr = v4addr; + mreq.imr_address.s_addr = INADDR_ANY; + mreq.imr_ifindex = ifindex; + + return ip_mc_join_group(sk, mreq); + } ipv6_addr_type() returns a bitmask, so you should use: if (addr_type IPV6_ADDR_MAPPED) { Also, you should have a blank line after the mreq declaration. Ditto for both in ipv6_mc_sock_drop(). I don't expect the multicast source filtering interface will behave well for mapped addresses, either. The mapped multicast address won't appear to be a multicast address (and return error there), and all the source filters would have to be v4mapped addresses and modify the v4 source filters for this to do as you expect. So, there's more to it (and it may be a bit messy) to support mapped multicast addresses fully. I'll think about that part some more. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address
Brian, ipv6_addr_type() returns a mask, so checking for equality will fail to match if any other (irrelevant) attributes are set. How about using bitwise operators for that? Also, the error message is no longer descriptive of the failure if it's a link-local multicast, but you could make it target address is not link-local unicast.\n (in both places). +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address
Brian, I don't think a few instructions is a performance issue in the redirect paths (it'd be pretty broken if you're getting or generating lots of them), but I know there are lots of other checks similar to that that will break with new attributes, so doing that as a general clean-up separately is ok with me, too. With the error message changes, you can add: Acked-by: David L Stevens [EMAIL PROTECTED] FWIW. :-) +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPV6] Fix ICMPv6 redirect handling with target multicast address
Brian, A multicast address should never be the target of a neighbor discovery request; the sender should use the mapping function for all multicasts. So, I'm not sure that your example can ever happen, and it certainly is ok to send ICMPv6 errors to multicast addresses in general. But I don't see that it hurts anything. either (since it should never happen :-)), so I don't particularly object, either. I think it'd also be better if you add the check to be: if (ipv6_addr_type(target) (IPV6_ADDR_LINKLOCAL|IPV6_ADDR_UNICAST)) or something along those lines, rather than reproducing ipv6_addr_type() code separately in a new ipv6_addr_linklocal() function. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RESEND] 2.6.22.6 networking [ipv4]: fix wrong destination when reply packetes
I'm not sure why it's using rt_src here, but there are relevant cases that your description doesn't cover. For example, what happens if the source is not set in the original packet? Does NAT affect this? You quote RFC text for ICMP echo and the case where the receiving machine is the final destination, but you're modifying code that is used for all ICMP types and used for ICMP errors generated when acting as an intermediate router. In ordinary cases, and certainly with ICMP echo when the source is set in the original packet and no rewriting is going on (and the address is not spoofed), using the original source as the destination is fine. But have you tested or considered the other cases? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293) [RESEND]
Dave, Thanks. That rev2 was for v6-only; I didn't see anythng about the v4 patch (below, in case it fell through the cracks). +-DLS - Forwarded by David Stevens/Beaverton/IBM on 09/16/2007 09:02 PM - David Stevens/Beaverton/[EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/10/2007 07:25 PM To [EMAIL PROTECTED], [EMAIL PROTECTED] cc netdev@vger.kernel.org Subject [PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293) Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches remove (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add IcmpMsg with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for Icmp to get the named data from new counters. IPv4 patch attached, IPv6 patch to follow. +-DLS Signed-off-by: David L Stevens [EMAIL PROTECTED] diff -ruNp linux-2.6.22.5/include/linux/snmp.h linux-2.6.22.5_ICMPMSG/include/linux/snmp.h --- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPMSG/include/linux/snmp.h 2007-08-23 15:32:29.0 -0700 @@ -82,6 +82,8 @@ enum __ICMP_MIB_MAX }; +#define __ICMPMSG_MIB_MAX 512 /* Out+In for all 8-bit ICMP types */ + /* icmp6 mib definitions */ /* * RFC 2466: ICMPv6-MIB diff -ruNp linux-2.6.22.5/include/net/icmp.h linux-2.6.22.5_ICMPMSG/include/net/icmp.h --- linux-2.6.22.5/include/net/icmp.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPMSG/include/net/icmp.h 2007-08-23 15:56:45.0 -0700 @@ -30,9 +30,16 @@ struct icmp_err { extern struct icmp_err icmp_err_convert[]; DECLARE_SNMP_STAT(struct icmp_mib, icmp_statistics); +DECLARE_SNMP_STAT(struct icmpmsg_mib, icmpmsg_statistics); #define ICMP_INC_STATS(field) SNMP_INC_STATS(icmp_statistics, field) #define ICMP_INC_STATS_BH(field) SNMP_INC_STATS_BH(icmp_statistics, field) #define ICMP_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmp_statistics, field) +#define ICMPMSGOUT_INC_STATS(field)SNMP_INC_STATS(icmpmsg_statistics, field+256) +#define ICMPMSGOUT_INC_STATS_BH(field) SNMP_INC_STATS_BH(icmpmsg_statistics, field+256) +#define ICMPMSGOUT_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmpmsg_statistics, field+256) +#define ICMPMSGIN_INC_STATS(field) SNMP_INC_STATS(icmpmsg_statistics, field) +#define ICMPMSGIN_INC_STATS_BH(field) SNMP_INC_STATS_BH(icmpmsg_statistics, field) +#define ICMPMSGIN_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmpmsg_statistics, field) struct dst_entry; struct net_proto_family; @@ -42,6 +49,7 @@ extern void icmp_send(struct sk_buff *sk extern int icmp_rcv(struct sk_buff *skb); extern int icmp_ioctl(struct sock *sk, int cmd, unsigned long arg); extern voidicmp_init(struct net_proto_family *ops); +extern voidicmp_out_count(unsigned char type); /* Move into dst.h ? */ extern int xrlim_allow(struct dst_entry *dst, int timeout); diff -ruNp linux-2.6.22.5/include/net/snmp.h linux-2.6.22.5_ICMPMSG/include/net/snmp.h --- linux-2.6.22.5/include/net/snmp.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPMSG/include/net/snmp.h 2007-08-23 14:42:50.0 -0700 @@ -82,6 +82,11 @@ struct icmp_mib { unsigned long mibs[ICMP_MIB_MAX]; } __SNMP_MIB_ALIGN__; +#define ICMPMSG_MIB_MAX__ICMPMSG_MIB_MAX +struct icmpmsg_mib { + unsigned long mibs[ICMPMSG_MIB_MAX]; +} __SNMP_MIB_ALIGN__; + /* ICMP6 (IPv6-ICMP) */ #define ICMP6_MIB_MAX __ICMP6_MIB_MAX struct icmpv6_mib { diff -ruNp linux-2.6.22.5/net/ipv4/af_inet.c linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c --- linux-2.6.22.5/net/ipv4/af_inet.c 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c 2007-08-23 14:47:26.0 -0700 @@ -1296,6 +1296,10 @@ static int __init init_ipv4_mibs(void) sizeof(struct icmp_mib), __alignof__(struct icmp_mib)) 0) goto err_icmp_mib; + if (snmp_mib_init((void **)icmpmsg_statistics, + sizeof(struct icmpmsg_mib), + __alignof__(struct icmpmsg_mib)) 0) + goto err_icmpmsg_mib; if (snmp_mib_init((void **)tcp_statistics, sizeof(struct
Fw: [PATCH] Add ICMPMsgStats MIB (RFC 4293) [rev 2]
Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches remove (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add IcmpMsg with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for Icmp to get the named data from new counters. [new to 2nd revision] 6) support per-interface ICMP stats 7) use common macro for per-device stat macros IPv6 patch attached. +-DLS Signed-off-by: David L Stevens [EMAIL PROTECTED] diff -ruNp linux-2.6.22.5/include/linux/snmp.h linux-2.6.22.5_ICMPv6MSG2/include/linux/snmp.h --- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPv6MSG2/include/linux/snmp.h 2007-09-10 15:02:43.0 -0700 @@ -91,35 +91,12 @@ enum ICMP6_MIB_NUM = 0, ICMP6_MIB_INMSGS, /* InMsgs */ ICMP6_MIB_INERRORS, /* InErrors */ - ICMP6_MIB_INDESTUNREACHS, /* InDestUnreachs */ - ICMP6_MIB_INPKTTOOBIGS, /* InPktTooBigs */ - ICMP6_MIB_INTIMEEXCDS, /* InTimeExcds */ - ICMP6_MIB_INPARMPROBLEMS, /* InParmProblems */ - ICMP6_MIB_INECHOS, /* InEchos */ - ICMP6_MIB_INECHOREPLIES,/* InEchoReplies */ - ICMP6_MIB_INGROUPMEMBQUERIES, /* InGroupMembQueries */ - ICMP6_MIB_INGROUPMEMBRESPONSES, /* InGroupMembResponses */ - ICMP6_MIB_INGROUPMEMBREDUCTIONS,/* InGroupMembReductions */ - ICMP6_MIB_INROUTERSOLICITS, /* InRouterSolicits */ - ICMP6_MIB_INROUTERADVERTISEMENTS, /* InRouterAdvertisements */ - ICMP6_MIB_INNEIGHBORSOLICITS, /* InNeighborSolicits */ - ICMP6_MIB_INNEIGHBORADVERTISEMENTS, /* InNeighborAdvertisements */ - ICMP6_MIB_INREDIRECTS, /* InRedirects */ ICMP6_MIB_OUTMSGS, /* OutMsgs */ - ICMP6_MIB_OUTDESTUNREACHS, /* OutDestUnreachs */ - ICMP6_MIB_OUTPKTTOOBIGS,/* OutPktTooBigs */ - ICMP6_MIB_OUTTIMEEXCDS, /* OutTimeExcds */ - ICMP6_MIB_OUTPARMPROBLEMS, /* OutParmProblems */ - ICMP6_MIB_OUTECHOREPLIES, /* OutEchoReplies */ - ICMP6_MIB_OUTROUTERSOLICITS,/* OutRouterSolicits */ - ICMP6_MIB_OUTNEIGHBORSOLICITS, /* OutNeighborSolicits */ - ICMP6_MIB_OUTNEIGHBORADVERTISEMENTS,/* OutNeighborAdvertisements */ - ICMP6_MIB_OUTREDIRECTS, /* OutRedirects */ - ICMP6_MIB_OUTGROUPMEMBRESPONSES,/* OutGroupMembResponses */ - ICMP6_MIB_OUTGROUPMEMBREDUCTIONS, /* OutGroupMembReductions */ __ICMP6_MIB_MAX }; +#define __ICMP6MSG_MIB_MAX 512 /* Out+In for all 8-bit ICMPv6 types */ + /* tcp mib definitions */ /* * RFC 1213: MIB-II TCP group diff -ruNp linux-2.6.22.5/include/net/if_inet6.h linux-2.6.22.5_ICMPv6MSG2/include/net/if_inet6.h --- linux-2.6.22.5/include/net/if_inet6.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPv6MSG2/include/net/if_inet6.h2007-09-12 14:11:49.0 -0700 @@ -154,6 +154,7 @@ struct ipv6_devstat { struct proc_dir_entry *proc_dir_entry; DEFINE_SNMP_STAT(struct ipstats_mib, ipv6); DEFINE_SNMP_STAT(struct icmpv6_mib, icmpv6); + DEFINE_SNMP_STAT(struct icmpv6msg_mib, icmpv6msg); }; struct inet6_dev diff -ruNp linux-2.6.22.5/include/net/ipv6.h linux-2.6.22.5_ICMPv6MSG2/include/net/ipv6.h --- linux-2.6.22.5/include/net/ipv6.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPv6MSG2/include/net/ipv6.h2007-09-13 15:09:28.0 -0700 @@ -111,45 +111,28 @@ struct frag_hdr { extern int sysctl_ipv6_bindv6only; extern int sysctl_mld_max_msf; -/* MIBs */ -DECLARE_SNMP_STAT(struct ipstats_mib, ipv6_statistics); -#define IP6_INC_STATS(idev,field) ({ \ +#define _DEVINC(statname, modifier, idev, field) \ +({ \ struct inet6_dev *_idev = (idev); \ if (likely(_idev != NULL)) \ -
Re: [PATCH][2/2] Add ICMPMsgStats MIB (RFC 4293)
Yoshifuji Hideaki [EMAIL PROTECTED] wrote on 09/11/2007 01:50:53 AM: Dave, we've been supporting per-interface stats for IPv6, and you seem to remove them. Please keep them. Thank you. The reason I didn't for ICMPMsgStats is the size. The RFC requires in out counters for all types, whether or not they are known by the OS, so that's 512 stats. Because the MIB variables in the existing infrastructure are not dynamically allocated, that's: 512 * numInterfaces * numCPUs * 2protos That seems like a lot of mostly-zero memory. Changing them to be run-time allocated (and re-allocated when new types are seen) is another alternative, but more significant changes. Memory is Cheap. -- Doug Comer So maybe it's not so bad -- I'll roll another per-interface version to see. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293)
Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches remove (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add IcmpMsg with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for Icmp to get the named data from new counters. IPv4 patch attached, IPv6 patch to follow. +-DLS Signed-off-by: David L Stevens [EMAIL PROTECTED] diff -ruNp linux-2.6.22.5/include/linux/snmp.h linux-2.6.22.5_ICMPMSG/include/linux/snmp.h --- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPMSG/include/linux/snmp.h 2007-08-23 15:32:29.0 -0700 @@ -82,6 +82,8 @@ enum __ICMP_MIB_MAX }; +#define __ICMPMSG_MIB_MAX 512 /* Out+In for all 8-bit ICMP types */ + /* icmp6 mib definitions */ /* * RFC 2466: ICMPv6-MIB diff -ruNp linux-2.6.22.5/include/net/icmp.h linux-2.6.22.5_ICMPMSG/include/net/icmp.h --- linux-2.6.22.5/include/net/icmp.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPMSG/include/net/icmp.h 2007-08-23 15:56:45.0 -0700 @@ -30,9 +30,16 @@ struct icmp_err { extern struct icmp_err icmp_err_convert[]; DECLARE_SNMP_STAT(struct icmp_mib, icmp_statistics); +DECLARE_SNMP_STAT(struct icmpmsg_mib, icmpmsg_statistics); #define ICMP_INC_STATS(field) SNMP_INC_STATS(icmp_statistics, field) #define ICMP_INC_STATS_BH(field) SNMP_INC_STATS_BH(icmp_statistics, field) #define ICMP_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmp_statistics, field) +#define ICMPMSGOUT_INC_STATS(field)SNMP_INC_STATS(icmpmsg_statistics, field+256) +#define ICMPMSGOUT_INC_STATS_BH(field) SNMP_INC_STATS_BH(icmpmsg_statistics, field+256) +#define ICMPMSGOUT_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmpmsg_statistics, field+256) +#define ICMPMSGIN_INC_STATS(field) SNMP_INC_STATS(icmpmsg_statistics, field) +#define ICMPMSGIN_INC_STATS_BH(field) SNMP_INC_STATS_BH(icmpmsg_statistics, field) +#define ICMPMSGIN_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmpmsg_statistics, field) struct dst_entry; struct net_proto_family; @@ -42,6 +49,7 @@ extern void icmp_send(struct sk_buff *sk extern int icmp_rcv(struct sk_buff *skb); extern int icmp_ioctl(struct sock *sk, int cmd, unsigned long arg); extern voidicmp_init(struct net_proto_family *ops); +extern voidicmp_out_count(unsigned char type); /* Move into dst.h ? */ extern int xrlim_allow(struct dst_entry *dst, int timeout); diff -ruNp linux-2.6.22.5/include/net/snmp.h linux-2.6.22.5_ICMPMSG/include/net/snmp.h --- linux-2.6.22.5/include/net/snmp.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPMSG/include/net/snmp.h 2007-08-23 14:42:50.0 -0700 @@ -82,6 +82,11 @@ struct icmp_mib { unsigned long mibs[ICMP_MIB_MAX]; } __SNMP_MIB_ALIGN__; +#define ICMPMSG_MIB_MAX__ICMPMSG_MIB_MAX +struct icmpmsg_mib { + unsigned long mibs[ICMPMSG_MIB_MAX]; +} __SNMP_MIB_ALIGN__; + /* ICMP6 (IPv6-ICMP) */ #define ICMP6_MIB_MAX __ICMP6_MIB_MAX struct icmpv6_mib { diff -ruNp linux-2.6.22.5/net/ipv4/af_inet.c linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c --- linux-2.6.22.5/net/ipv4/af_inet.c 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c 2007-08-23 14:47:26.0 -0700 @@ -1296,6 +1296,10 @@ static int __init init_ipv4_mibs(void) sizeof(struct icmp_mib), __alignof__(struct icmp_mib)) 0) goto err_icmp_mib; + if (snmp_mib_init((void **)icmpmsg_statistics, + sizeof(struct icmpmsg_mib), + __alignof__(struct icmpmsg_mib)) 0) + goto err_icmpmsg_mib; if (snmp_mib_init((void **)tcp_statistics, sizeof(struct tcp_mib), __alignof__(struct tcp_mib)) 0) @@ -1318,6 +1322,8 @@ err_udplite_mib: err_udp_mib: snmp_mib_free((void **)tcp_statistics); err_tcp_mib: + snmp_mib_free((void **)icmpmsg_statistics); +err_icmpmsg_mib: snmp_mib_free((void **)icmp_statistics); err_icmp_mib: snmp_mib_free((void **)ip_statistics); diff -ruNp linux-2.6.22.5/net/ipv4/icmp.c linux-2.6.22.5_ICMPMSG/net/ipv4/icmp.c --- linux-2.6.22.5/net/ipv4/icmp.c 2007-08-22
[PATCH][2/2] Add ICMPMsgStats MIB (RFC 4293)
Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches remove (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add IcmpMsg with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for Icmp to get the named data from new counters. IPv6 patch attached. +-DLS Signed-off-by: David L Stevens [EMAIL PROTECTED] diff -ruNp linux-2.6.22.5/include/linux/snmp.h linux-2.6.22.5_ICMPv6MSG/include/linux/snmp.h --- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPv6MSG/include/linux/snmp.h 2007-09-10 15:02:43.0 -0700 @@ -91,35 +91,12 @@ enum ICMP6_MIB_NUM = 0, ICMP6_MIB_INMSGS, /* InMsgs */ ICMP6_MIB_INERRORS, /* InErrors */ - ICMP6_MIB_INDESTUNREACHS, /* InDestUnreachs */ - ICMP6_MIB_INPKTTOOBIGS, /* InPktTooBigs */ - ICMP6_MIB_INTIMEEXCDS, /* InTimeExcds */ - ICMP6_MIB_INPARMPROBLEMS, /* InParmProblems */ - ICMP6_MIB_INECHOS, /* InEchos */ - ICMP6_MIB_INECHOREPLIES,/* InEchoReplies */ - ICMP6_MIB_INGROUPMEMBQUERIES, /* InGroupMembQueries */ - ICMP6_MIB_INGROUPMEMBRESPONSES, /* InGroupMembResponses */ - ICMP6_MIB_INGROUPMEMBREDUCTIONS,/* InGroupMembReductions */ - ICMP6_MIB_INROUTERSOLICITS, /* InRouterSolicits */ - ICMP6_MIB_INROUTERADVERTISEMENTS, /* InRouterAdvertisements */ - ICMP6_MIB_INNEIGHBORSOLICITS, /* InNeighborSolicits */ - ICMP6_MIB_INNEIGHBORADVERTISEMENTS, /* InNeighborAdvertisements */ - ICMP6_MIB_INREDIRECTS, /* InRedirects */ ICMP6_MIB_OUTMSGS, /* OutMsgs */ - ICMP6_MIB_OUTDESTUNREACHS, /* OutDestUnreachs */ - ICMP6_MIB_OUTPKTTOOBIGS,/* OutPktTooBigs */ - ICMP6_MIB_OUTTIMEEXCDS, /* OutTimeExcds */ - ICMP6_MIB_OUTPARMPROBLEMS, /* OutParmProblems */ - ICMP6_MIB_OUTECHOREPLIES, /* OutEchoReplies */ - ICMP6_MIB_OUTROUTERSOLICITS,/* OutRouterSolicits */ - ICMP6_MIB_OUTNEIGHBORSOLICITS, /* OutNeighborSolicits */ - ICMP6_MIB_OUTNEIGHBORADVERTISEMENTS,/* OutNeighborAdvertisements */ - ICMP6_MIB_OUTREDIRECTS, /* OutRedirects */ - ICMP6_MIB_OUTGROUPMEMBRESPONSES,/* OutGroupMembResponses */ - ICMP6_MIB_OUTGROUPMEMBREDUCTIONS, /* OutGroupMembReductions */ __ICMP6_MIB_MAX }; +#define __ICMP6MSG_MIB_MAX 512 /* Out+In for all 8-bit ICMPv6 types */ + /* tcp mib definitions */ /* * RFC 1213: MIB-II TCP group diff -ruNp linux-2.6.22.5/include/net/ipv6.h linux-2.6.22.5_ICMPv6MSG/include/net/ipv6.h --- linux-2.6.22.5/include/net/ipv6.h 2007-08-22 16:23:54.0 -0700 +++ linux-2.6.22.5_ICMPv6MSG/include/net/ipv6.h 2007-09-10 15:41:19.0 -0700 @@ -132,6 +132,7 @@ DECLARE_SNMP_STAT(struct ipstats_mib, ip SNMP_INC_STATS_USER(ipv6_statistics, field);\ }) DECLARE_SNMP_STAT(struct icmpv6_mib, icmpv6_statistics); +DECLARE_SNMP_STAT(struct icmpv6msg_mib, icmpv6msg_statistics); #define ICMP6_INC_STATS(idev, field) ({ \ struct inet6_dev *_idev = (idev); \ if (likely(_idev != NULL)) \ @@ -157,6 +158,14 @@ DECLARE_SNMP_STAT(struct icmpv6_mib, icm SNMP_INC_STATS_OFFSET_BH(_idev-stats.icmpv6, field, _offset); \ SNMP_INC_STATS_OFFSET_BH(icmpv6_statistics, field, _offset); \ }) + +#define ICMP6MSGOUT_INC_STATS(field) SNMP_INC_STATS(icmpv6msg_statistics, field+256) +#define ICMP6MSGOUT_INC_STATS_BH(field) SNMP_INC_STATS_BH(icmpv6msg_statistics, field+256) +#define ICMP6MSGOUT_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmpv6msg_statistics, field+256) +#define ICMP6MSGIN_INC_STATS(field) SNMP_INC_STATS(icmpv6msg_statistics, field) +#define ICMP6MSGIN_INC_STATS_BH(field) SNMP_INC_STATS_BH(icmpv6msg_statistics, field) +#define ICMP6MSGIN_INC_STATS_USER(field) SNMP_INC_STATS_USER(icmpv6msg_statistics,
Re: RFC: issues concerning the next NAPI interface
Stephen Hemminger [EMAIL PROTECTED] wrote on 08/24/2007 08:52:03 AM: You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic thing you want done by the stack, you want the hardware to hold off interrupts until X packets or Y usecs have expired. For generic hardware that doesn't support it, couldn't you use an estimater and adjust the timer dynamicly in software based on sampled values? Switch to per-packet interrupts when the receive rate is low... Actually, that's how I thought NAPI worked before I found out otherwise (ie, before I looked :-)). The hardware-accelerated one is essentially siloing as done by ancient serial devices on UNIX systems. If you had a tunable for a target count, and an estimator for the time interval, then switch to per-packet when the estimator exceeds a tunable max threshold (and also, I suppose, if you near overflowing the ring on the min timer granularity), you get almost all of it, right? Problem is if it increases rapidly, you may drop packets before you notice that the ring is full in the current estimated interval. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] Add new timeval_to_sec function
Oliver Hartkopp [EMAIL PROTECTED] wrote on 07/23/2007 11:22:39 PM: When you like to create any timeout based on your calculated value, you might run into the problem that your calculated value is set to _zero_ even if there was some time before the conversion. This might probably not what you indented to get. BTW, these are stats (for human consumption), not timers. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel module to list sockets with their multicast group joins and respective processes
sockets that join different groups receive messages from the respective other group (if they are only bound to the wildcard address). Obviously this is handled differently in Linux for IPv4, where the socket matching for incoming message is done solely on the 4-tuple of addresses and ports, and IPv6, where the explicit socket joins are taken into account [1,2]. These are not different in v4 vs v6. Group membership is per-interface, delivery depends on the binding (for both v4 and v6), and this is the same as BSD behavior since IP multicasting was invented. There are two levels of checks, and I think you're confusing them. inet6_mc_check() is checking that somebody (*anybody*) has joined the group on the receiving interface, before delivering the packet to IP. Delivery to a particular socket depends on source filter settings, if they are in use, and bindings (only). The socket has a list of joined addresses to track source filters and to check for correct join/leave (a socket can't leave a group it hasn't joined), but that list does not determine delivery of a multicast packet to the socket when source filtering is not in use. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IPv6: optionaly validate RAs on raw sockets
I think #2 in your list is the right choice, and that has nothing to do with adding a non-standard option (which I completely agree is a bad idea). It looked like you're just checking if the machine is acting as a router or not and if it comes from a link-local address; is that right? Of course, lots of apps already check for am I a router and they don't require a new socket option. (!) See everything in the quagga package, for example. And checking the address type in a app is trivial. The previous discussion about validation was talking about RA's that are forged, so don't pass IPsec authentication checks. I don't see any reason at all to deliver those to an application (ever), so no non-standard socket option required there. I don't know if those are currently delivered on raw sockets or not, but if they are, I think it's reasonable to have a patch that clones them only after authentication rather than before. Prior discussion used FUD about some monitoring apps needing to see forged RA's. I don't think there really are apps that need to see forged RA's, but if they really want everything, they should use bpf or the like, just as they would need to do to receive, for example, packets with invalid checksums. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IPv6: optionaly validate RAs on raw sockets
Since you asked for another idea, how about using netlink to send _validated_ RA information to interested parties? -vlad That sounds like a good idea to me (FWIW), though I also still think a simple raw-socket application would do it just fine, possibly with no kernel modification at all. But since the kernel wouldn't be maintaining the DNS info, which was my real objection to the original version, netlink would work well too. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: a maze of twisty stats, most different
I think there's a more general problem that's a huge hassle. There are lots of new SNMP MIB's, but no conventions (that I'm aware of, at least) to allow for changes to the /proc entries that get them to applications. For example, the route age data recently submitted. I agree that existing apps rely on these and are generally very fragile, but in practice it means it isn't easy to do any changes for RFC compliance with new MIBS, short of replicating existing data in a new /proc entry along with the old one. How do people feel about adding a MIB subtree in /proc that tracks MIB updates with some basic rules: 1) all fields must be tagged with a label 2) all apps using it must match the label, and ignore anything they don't match Then future additions to a row just mean adding a label (like route age), and items like the one-per-line v6 statistics could add new ones easily, too. And the existing MIB and non-MIB stats can be the same format, though possibly derived from different data. I've got a patch for doing ICMPMsgStats, which replaces the individual ICMP counters now defined (and which are deprecated), but changing /proc/net/snmp6 doesn't seem so wise, given how fragile some of these apps are. Defined device stats could live in that proposed tree, too, and with basic rules for users would allow Linux-only useful extensions without breaking the rules that apps using it should follow (and therefore without breaking the apps). That's independent of ethtool extensions, but at least marginally related... Comments? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scaling Max IP address limitation
Unrelated wishful thinking I keep having hopeful dreams that one day netfilter will grow support for cross-protocol NAT (IE: NAT a TCPv4 connection over TCPv6 to the IPv6-only local web server, or vice versa). It would seem that would require a merged xtables program. You don't actually need it (at least for easy cases like that), because IPv6 defines IPv4 mapped IPv6 addresses of the form :::a.b.c.d. These will generate IPv4 packets for a.b.c.d, from a v6 socket. Unless you're using v6only binding (a sysctl option), you can connect to v6-only servers using a v4 network and a v4 address of the server. The peer address on those connections will show up as a v4 mapped address, and all the traffic will be v4, but the socket layer is all v6. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
[EMAIL PROTECTED] wrote on 06/23/2007 07:47:06 AM: Rémi and Simon give my responses very eloquently. Although you could have yet-another-network-daemon redundantly process RA messages, the kernel is doing it already and it makes sense to just push this It would be two pieces looking at the same packet, but it isn't redundant processing. The kernel would ignore the RDNS header, and the app would ignore everything else; everything would be processed once. Although parsing RA messages and processing expiry in userland looks barely-possible now, barely possible?? See below. SeND support is really necessary for long-term IPv6 security, and duplicating SeND functionality in userland would be a nightmare. Futher, the neighbor discover protocol involves Router Solicitation messages which elicit the Router Advertisement reply, and we really don't want userland sending redundant Router Solicitation messages around, just because the kernel doesn't want to tell it what Router Advertisements it received. I considered storing the *complete* Router Advertisement messages received and pushing them unparsed to userland, just to get around the bogus DNS in the kernel politics (hint: it's not a resolver in the kernel, it's just nameserver addresses being stored). Does anyone really suggest that this would be a better solution? No, in fact! I didn't hear anyone suggesting that all of neighbor discovery be pushed out of the kernel. All I suggested is that you read a raw ICMPv6 socket for RA's that have the RDNS header and the app _process_the_RDNS_header. The kernel should still continue to do everything it needs to with the kernel data in the RA. Then you just need a hash table (or maybe just a list -- there shouldn't be a lot of them) and a timer to delete them when the RDNS expiration hits. Easy, right? You might have to change the icmp6_filter, if RA's are not already copied to raw sockets (I don't know either way offhand), but that's a trivial kernel patch; otherwise, I don't believe you have to do anything but read the socket and process the RDNS header on RAs you receive. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Rémi Denis-Courmont [EMAIL PROTECTED] wrote on 06/23/2007 09:51:55 AM: How do I authenticate SeND RA? How do I deal with the link going down before the expiration? How do I know this interface is doing autoconf at all? The kernel should do the authentication, as it will for other RA's, and should not deliver (IMAO) unauthenticated packets. If it is, I would consider that a bug (for all cases, not just this), and that would be a good thing to fix. :-) An interface going down doesn't directly invalidate a DNS server address, though it may not be the best through another interface. Since it is a list, I think doing nothing for this case wouldn't be terrible. This is no worse than the existing resolver code. But if you really need it, you can monitor netlink, or poll the interface flags on whatever interval you require for detection. As for autoconf, that's available from sysctl, I assume from /proc somewhere, too. That usually doesn't change, but if you want to account for runtime configuration changes, you can always monitor netlink and reread when new addresses appear, too. There certainly may be complications I haven't thought of, since I haven't implemented it. But I still don't see a good case for using the kernel as a DNS database. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Rémi Denis-Courmont [EMAIL PROTECTED] wrote on 06/23/2007 11:13:01 AM: An implementation might perform additional validity checks on the ICMPv6 message content and discard malformed packets. However, a portable application must not assume that such validity checks have been performed. This doesn't say that unauthenticated packets must be delivered, and I don't think the portability of an RDNS daemon is an issue. But even if you really wanted to run the same code on a non-Linux machine, it just means that your daemon code would have to do its own authentication. Reading /proc or netlink with packet formats you've defined to get this information is not more portable to non-Linux machines, right? I don't see any issue here. If an application is relying on the ability to see forged packets for portability reasons, it's probably not an application you want running on your machine. :-) That would encourage people into running open recursive DNS servers which is widely known and documented as a bad practice. Definitely a very bad idea. I don't understand your point here. I'm talking about client behaviour, and if the client fails for a server from a downed interface, I don't see how that's different from removing the client from the list, which is what you want to do. Nobody should feel encouraged to do anything different on the server side-- at least not by me! But if you really need it, you can monitor netlink, or poll the interface flags on whatever interval you require for detection. As for autoconf, that's available from sysctl, I assume from /proc somewhere, too. That usually doesn't change, but if you want to account for runtime configuration changes, you can always monitor netlink and reread when new addresses appear, too. There are a bunch of parameters that determine whether an interface accepts RAs or not. I doubt it's wise to try to reimplement that into userspace, particularly if it is subject to change. I'm not suggesting re-implementing anything; I'm saying you can read the current state at application level, if you need it. If you think it's difficult to get the correct information from existing API's, then improving those API's is always worthwhile. I don't believe it's excessively difficult to determine if autoconf is in use, though. My point is raw IPv6 sockets are not usable for the time being, and I do not see anyway to fix without modifying the kernel. I disagree about raw sockets being usable, but without modifying the kernel isn't a constraint. modifying the kernel != put DNS server info in the kernel; if there's a bug, or some minor tweaking that'd help the feature along, I'd support that. The important point for me is that the basic mechanisms are already in place, and I think it'd be best to use those rather than creating a new interface for all of this. The userspace DNS configuration daemon might need to be started later than the kernel autoconf - another issue that needs help from the kernel. Easily done; the init scripts are what bring the interfaces up in the first place, so start the daemon before those run. Adding an entry in inittab so it'll be automatically restarted if it dies is also a reasonable thing. RA's are resent periodically, and they can be lost anyway, so not the end of the world if you miss one, either. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Scott, Why not make the application that writes resolv.conf also listen on a raw ICMPv6 socket? I don't believe you'd need any kernel changes, then, and it seems pretty simple and straightforward. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
[EMAIL PROTECTED] wrote on 06/22/2007 06:17:46 PM: On 23/06/07 02:04, David Stevens wrote: Why not make the application that writes resolv.conf also listen on a raw ICMPv6 socket? I don't believe you'd need any kernel changes, then, and it seems pretty simple and straightforward. Because then it requires yet another network daemon, RA in the kernel means there's no need for one to manage adding auto-configured IP addresses... what's wrong with doing the same for DNS? It's not yet another one, since you have to run something to get it in resolv.conf, anyway. That seems much better to me than having the kernel track data that can only be used at the application layer. The app itself looks like it'd be really simple. Auto-configured addresses are used by the kernel. It has to have those addresses. But the kernel doesn't do DNS look-ups, or write resolv.conf; that's the difference, for me. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] don't put multicasts with mc_ttl=0 on the wire
Arthur, I assume you're making use of the hack mentioned in route.c: ... This hack is not just for fun, it allows vic, vat and friends to work. They bind socket to loopback, set ttl to zero and expect that it will work. I don't know the details of the intent for this hack, but did you test that it won't break them? A multicast application that relies on the (unicast) routing table at all is broken IMHO, as is an app that sets the TTL to 0, but shouldn't all ttl 0 packets about to be sent to the wire be dropped? Wouldn't that be a better way to handle this case? +-DLS [EMAIL PROTECTED] wrote on 05/15/2007 12:56:02 PM: A colleague of mine found that multicasts with a ttl of 0 can be sent on the wire. This happens if the sender doesn't belong to the destination multicast group. With the following the multicast ttl is respected whether or not the sender belongs to the destination multicast group. Signed-off-by: Arthur Kepner [EMAIL PROTECTED] --- net/ipv4/route.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index cb76e3c..bf25cf5 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2249,8 +2249,7 @@ static inline int __mkroute_output(struct rtable **result, } if (flags (RTCF_BROADCAST | RTCF_MULTICAST)) { rth-rt_spec_dst = fl-fl4_src; - if (flags RTCF_LOCAL - !(dev_out-flags IFF_LOOPBACK)) { + if (!(dev_out-flags IFF_LOOPBACK)) { rth-u.dst.output = ip_mc_output; RT_CACHE_STAT_INC(out_slow_mc); } -- Arthur - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] Reverse sense of promisc tests in ip6_mc_input
Your patch looks correct to me. +-DLS Signed-off-by: David L Stevens [EMAIL PROTECTED] [EMAIL PROTECTED] wrote on 05/12/2007 03:51:35 PM: From: Corey Mutter [EMAIL PROTECTED] Reverse the sense of the promiscuous-mode tests in ip6_mc_input(). Signed-off-by: Corey Mutter [EMAIL PROTECTED] --- I had been suspicious of this code for a while, but just assumed I must have been missing something. However, I did some tests and, sure enough, I can open a socket and it will see IPv6 multicast packets come in, then the packets will stop coming in when I turn on IFF_ALLMULTI on the interface. In my tests, this change works more like I'd expect; I can open a socket, and it doesn't see any multicast packets (unless I join a group or something) until after I turn on IFF_ALLMULTI. --- linux-2.6.21.1/net/ipv6/ip6_input.c 2007-04-27 17:49:26.0 -0400 +++ linux/net/ipv6/ip6_input.c 2007-05-12 18:19:09.59775 -0400 @@ -235,7 +235,7 @@ int ip6_mc_input(struct sk_buff *skb) IP6_INC_STATS_BH(ip6_dst_idev(skb-dst), IPSTATS_MIB_INMCASTPKTS); hdr = skb-nh.ipv6h; - deliver = likely(!(skb-dev-flags (IFF_PROMISC|IFF_ALLMULTI))) || + deliver = unlikely(skb-dev-flags (IFF_PROMISC|IFF_ALLMULTI)) || ipv6_chk_mcast_addr(skb-dev, hdr-daddr, NULL); /* - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] New driver API to speed up small packets xmits
The word small is coming up a lot in this discussion, and I think packet size really has nothing to do with it. Multiple streams generating packets of any size would benefit; the key ingredient is a queue length greater than 1. I think the intent is to remove queue lock cycles by taking the whole list (at least up to the count of free ring buffers) when the queue is greater than one packet, thus effectively removing the lock expense for n-1 packets. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] New driver API to speed up small packets xmits
Which worked _very_ well (the whole list) going in the other direction for the netisr queue(s) in HP-UX 10.20. OK, I promise no more old HP-UX stories for the balance of the week :) Yes, OSes I worked on in other lives usually took the whole queue and then took responsibility for handling them outside the lock. A dequeue_n() under 1 lock should still be better than n locks, I'd think, since you may bounce to different CPUs during the n individual cycles. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[IPV6] send ICMPv6 error on scope violations
When an IPv6 router is forwarding a packet with a link-local scope source address off-link, RFC 4007 requires it to send an ICMPv6 destination unreachable with code 2 (not neighbor), but Linux doesn't. Fix below. +-DLS [in-line for viewing, attached for applying] Signed-off-by: David L Stevens [EMAIL PROTECTED] --- linux-2.6.20.7/net/ipv6/ip6_output.c2007-04-13 13:48:14.0 -0700 +++ linux-2.6.20.7T1/net/ipv6/ip6_output.c 2007-05-09 12:32:45.0 -0700 @@ -449,10 +449,17 @@ int ip6_forward(struct sk_buff *skb) */ if (xrlim_allow(dst, 1*HZ)) ndisc_send_redirect(skb, n, target); - } else if (ipv6_addr_type(hdr-saddr)(IPV6_ADDR_MULTICAST|IPV6_ADDR_LOOPBACK - |IPV6_ADDR_LINKLOCAL)) { + } else { + int addrtype = ipv6_addr_type(hdr-saddr); + /* This check is security critical. */ - goto error; + if (addrtype (IPV6_ADDR_MULTICAST|IPV6_ADDR_LOOPBACK)) + goto error; + if (addrtype IPV6_ADDR_LINKLOCAL) { + icmpv6_send(skb, ICMPV6_DEST_UNREACH, + ICMPV6_NOT_NEIGHBOUR, 0, skb-dev); + goto error; + } } if (skb-len dst_mtu(dst)) { icmp6fix.patch Description: Binary data
Re: [PATCH 3/3] bonding: Improve IGMP join processing
It looks to me like rejoin is essentially ip_mc_up(), and it'd be better to call that than add a nearly identical function. Also, real interfaces already do gratuitous IGMP advertisements when they are bounced (the reason there is an ip_mc_up()). Could bonding, when failing over, simply mark the master interface as down, switch, and then mark the master as up again? In addition to doing the right thing for both IPv4 and IPv6 multicasting w/o any code changes in those layers, it may have similar benefits for ARP and neighbor discovery, right? Maybe not-- haven't looked at it... One down side for IPv6 (which apparently bonding doesn't support) is that static addresses are lost when the device goes down, but that's a difference form IPv4 that should be fixed. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing
[EMAIL PROTECTED] wrote on 03/06/2007 03:15:41 PM: David Stevens [EMAIL PROTECTED] wrote: It looks to me like rejoin is essentially ip_mc_up(), and it'd be better to call that than add a nearly identical function. Won't ip_mc_up() acquire an additional reference (via ip_mc_inc_group) to the IGMP_ALL_HOSTS im-users that would never be released (in the case of bonding calling the function out of the blue)? Yes. I'm not sure that matters-- destroy_dev doesn't care how many references to a group, and IGMP_ALL_HOSTS isn't advertised (so wouldn't get a leave when you only down the interface, like other groups do). But since ip_mc_up() is *entirely* that join plus group_added() on all the existing groups, there really shouldn't be another. But the new device will need the all-hosts group in its hardware multicast filter, too, if it hasn't already been using multicasting. Your reload caller could just dec_group that group after calling ip_mc_up(). In looking at it, the ip_mc_rejoin_group function (the new one added with the patch) is a lot more like igmp_group_added() than ip_mc_up(). No, group_added() is one group. mc_up() just calls group_added on all of them, which I think is what the rejoin was trying to do. I'm not sure if the extra bits in igmp_group_added() are worthy of concern; I'm thinking not, since im-loaded shouldn't be zero coming in for the bonding case. im-loaded means the device has it in its multicast address filter. If you're switching devices, and didn't do all the multicast stuff on all the devices originally, then you want it to be 0 (and should make it so, like ip_mc_down() does). :-) I think the meat that the rejoin wants is what's in igmpv3_send_cr(), which appears to do the actual sending stuff. I'm not sure if that's better to call directly (and risk locking adventures) or to just trip the timer via igmp_ifc_event(). No, no, no -- please don't mess with those directly. It'd be a maintenance nightmare, and multicasting is device independent right now. :-) I'd hope there wouldn't be any bonding-specific code needed at this layer, which is why I hope something like using up/down would work out. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing
Marking the master down would, I believe, issue notifiers that the device has gone down. Various things, network manager sort of applications in particular, listen to those, so I'm not sure it's a good idea. I think there are other side effects as well, I'm thinking it would flush routes associated with the interface as well. [BTW, you can call ip_mc_down()/ip_mc_up() directly w/o getting there from the notifiers -- then no side-effects.] Andy Gospodarek wrote: I agree with Jay here. I hate that bonding has to have so much knowledge about upper layer protocols, but for the ones that are stateful like IGMP we will need fixes like the one proposed. I have no problem with bonding having knowledge of ULP's (I don't like it, but I don't have to look at it :-) ), but the patch is doing it the other way around. What I don't like about the proposed patch is that it's adding knowledge of bonding to IGMP. And IGMP does work fine in this case, w/o flooding or the proposed patch. It just has the risk of losing multicast packets during one query interval, and that only happens if you're using a switch that does IGMP snooping. I'd like the patch a lot better if it were basicly this: mc_bond_fudge(void) { ip_mc_down(masterdev); /*do whatever you need to do to switch the slave */ ip_mc_up(masterdev); } That doesn't go through the notifier chain, uses existing functions, doesn't have any refcnt issues, and most importantly could/should reside in a bonding source file and not in igmp.c. :-) But RTNL is required whether you use up/down or roll your own variant, so it sounds like you have other issues to resolve too. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] /proc/net/anycast6 unbalanced inet6_dev refcnt
Reading /proc/net/anycast6 when there is no anycast address on an interface results in an ever-increasing inet6_dev reference count, as well as a reference to the netdevice you can't get rid of. Patch below attached. +-DLS Signed-off-by: David L Stevens [EMAIL PROTECTED] --- linux-2.6.20.1/net/ipv6/anycast.c 2007-02-19 22:34:32.0 -0800 +++ linux-2.6.20.1T1/net/ipv6/anycast.c 2007-02-26 15:37:04.0 -0800 @@ -462,6 +462,7 @@ static inline struct ifacaddr6 *ac6_get_ break; } read_unlock_bh(idev-lock); + in6_dev_put(idev); } return im; } ac6.patch Description: Binary data
Re: igmp: possible NULL dereference after GFP_ATOMIC allocation?
Alexey, I think you're correct-- looks like it needs: if (!skb) return NULL; just before the skb_put(), since an allocation (and failure) could occur in either the igmpv3_newpack() call or in add_grhead(). Also, similar code in net/ipv6/mcast..c. Will you be submitting the patch? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] devinet: inetdev_init out label moved after RCU assignment
Yeah, sure. +-DLS Acked-by: David L Stevens [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- diff -Nurp linux-2.6.20-rc3-/net/ipv4/devinet.c linux-2.6.20-rc3/net/ipv4/devinet.c --- linux-2.6.20-rc3-/net/ipv4/devinet.c 2007-01-05 11:53:16.0 +0100 +++ linux-2.6.20-rc3/net/ipv4/devinet.c 2007-01-05 11:55:32.0 +0100 @@ -174,9 +174,10 @@ struct in_device *inetdev_init(struct ne ip_mc_init_dev(in_dev); if (dev-flags IFF_UP) ip_mc_up(in_dev); -out: + /* we can receive as soon as ip_ptr is set -- do this last */ rcu_assign_pointer(dev-ip_ptr, in_dev); +out: return in_dev; out_kfree: kfree(in_dev); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)
Ben Jarek, Your analysis looks correct to me. It seems to me the problem is that we don't want the in_device to be searchable until after the initialization is done. What about moving the initialization of dev-ip_ptr in inetdev_init() to after the out label? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)
Ben, Here's a patch that I think will fix it, assuming the receive is on the same device as the initialization. Can you try this out? +-DLS [inline for viewing, attached for applying] Signed-off-by: David L Stevens [EMAIL PROTECTED] diff -ruNp linux-2.6.19.1/net/ipv4/devinet.c linux-2.6.19.1T1/net/ipv4/devinet.c --- linux-2.6.19.1/net/ipv4/devinet.c 2006-12-11 11:32:53.0 -0800 +++ linux-2.6.19.1T1/net/ipv4/devinet.c 2007-01-03 14:37:56.0 -0800 @@ -165,9 +165,8 @@ struct in_device *inetdev_init(struct ne NET_IPV4_NEIGH, ipv4, NULL, NULL); #endif - /* Account for reference dev-ip_ptr */ + /* Account for reference dev-ip_ptr (below) */ in_dev_hold(in_dev); - rcu_assign_pointer(dev-ip_ptr, in_dev); #ifdef CONFIG_SYSCTL devinet_sysctl_register(in_dev, in_dev-cnf); @@ -176,6 +175,8 @@ struct in_device *inetdev_init(struct ne if (dev-flags IFF_UP) ip_mc_up(in_dev); out: + /* we can receive as soon as ip_ptr is set -- do this last */ + rcu_assign_pointer(dev-ip_ptr, in_dev); return in_dev; out_kfree: kfree(in_dev); diff -ruNp linux-2.6.19.1/net/ipv6/addrconf.c linux-2.6.19.1T1/net/ipv6/addrconf.c --- linux-2.6.19.1/net/ipv6/addrconf.c 2006-12-11 11:32:53.0 -0800 +++ linux-2.6.19.1T1/net/ipv6/addrconf.c2007-01-03 14:47:07.0 -0800 @@ -413,8 +413,6 @@ static struct inet6_dev * ipv6_add_dev(s if (netif_carrier_ok(dev)) ndev-if_flags |= IF_READY; - /* protected by rtnl_lock */ - rcu_assign_pointer(dev-ip6_ptr, ndev); ipv6_mc_init_dev(ndev); ndev-tstamp = jiffies; @@ -425,6 +423,8 @@ static struct inet6_dev * ipv6_add_dev(s NULL); addrconf_sysctl_register(ndev, ndev-cnf); #endif + /* protected by rtnl_lock */ + rcu_assign_pointer(dev-ip6_ptr, ndev); return ndev; } initfix.patch Description: Binary data
Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)
OK, sounds good. By the way, I think you can probably hit it more often if you have something on the virtual network sending lots of multicast traffic while you're creating the interface. That'll increase the odds that you'll get into ip_check_mc() with a partially initialized in_dev. You can use ping -I intfX 224.0.0.1 (e.g.) to generate multicast traffic, though you'd want more than one. :-) +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)
Herbert, You're right, I don't know whether it'll fix the problem Ben saw or not, but it looks like the original code can do a receive before the in_device is fully initialized, and that, of course, is bad. If the device for ip_rcv() is not the same one we were initializing when the receive interrupted, then the patch should have no effect either way -- I don't think it'll hide other problems. If it's hard to reproduce (which I guess is true), then you're right, no soft lockup doesn't really tell us if it's fixed or not. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)
Ben, If the ip_rcv() and the inetdev_init() are on the same interface in your stack backtrace, it's a certainty at that point that the lock value is still 0ed, because none of the initialization occurs until after it has returned from the function it interrupted to do the receive. It'd have to be out of the register code and doing ip_mc_init_dev() (after that call) to be a tight race with lock creation. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)
I've looked at this a little too -- it'd be nice to know who holds the write lock. I see ip_mc_destroy_dev() is bouncing through the lock for each multicast address, though it starts at the beginning of the list each time. I don't see a problem with it, but it'd be simpler if it acquired the write lock once, grabbed and nulled the list, released the lock and then called igmp_group_dropped() ip_ma_put() on each address from the local list copy. Are you destroying/creating interfaces or doing a lot of multicasting at the time? How many group memberships do you have? +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fix IGMPV3_EXP() normalization bit shift value
The IGMPV3_EXP() macro doesn't correctly shift the normalization bit, so time-out values are longer than they should be. Patch below for viewing and attached for applying. Thanks to Dirk Ooms for finding the problem in IGMPv3 - MLDv2 had a similar problem that was already fixed a year ago. :-( +-DLS Signed-off-by: David L Stevens [EMAIL PROTECTED] --- linux-2.6.19-rc5/include/linux/igmp.h 2006-11-16 16:11:53.0 -0800 +++ linux-2.6.19-rc5T1/include/linux/igmp.h 2006-11-16 16:15:46.0 -0800 @@ -191,7 +191,7 @@ struct ip_mc_list #define IGMPV3_MASK(value, nb) ((nb)=32 ? (value) : ((1(nb))-1) (value)) #define IGMPV3_EXP(thresh, nbmant, nbexp, value) \ ((value) (thresh) ? (value) : \ -((IGMPV3_MASK(value, nbmant) | (1(nbmant+nbexp))) \ +((IGMPV3_MASK(value, nbmant) | (1(nbmant))) \ (IGMPV3_MASK((value) (nbmant), nbexp) + (nbexp #define IGMPV3_QQIC(value) IGMPV3_EXP(0x80, 4, 3, value) mc.patch Description: Binary data
Re: [PATCH] IPv6: only modify checksum for UDP
David Miller [EMAIL PROTECTED] wrote on 11/13/2006 04:50:58 PM: Puzzling :-) Then why is the transformation only performed for UDP in the ipv4 stack? It seems by your logic TCP would need to either do the if (sum==0) sum=~0; thing or it would need to accept both 0 and ~0 in the checksum checking path. That's actually what I was suggesting. In 1's-complement, ~0 == -0 which is still 0, so barring any special case (like UDP's 0 means no checksum rule), it should be equally valid for a packet to have 0 or ~0 as the checksum (with otherwise identical data)-- they are both correct, and equal to each other. That extra 1's-complement 0 is, of course, why UDP can have the special case of remapping 0-~0. Since the patch was for output-side, it doesn't matter whether you remap 0 to ~0 or not (except for the special case), but a receiver technically should allow either. In practice, it won't matter if the machines you're talking to are remapping (or not) in the same way. But 0 and -0 (aka ~0) are still equal. :-) It obviously doesn't come up much. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IPv6: only modify checksum for UDP
Sorry, I saw this discussion a little late... The Internet checksum is defined as a 1's-complement sum, so if the alternate 0 does not have a special meaning in a protocol, then by 1's-complement arithmetic, 0 == ~0. So, it looks to me without the remapping that a valid checksum may also fail, if it is simply computed in a different way (or on a different architecture) such that one gets 0 and one gets ~0 as un-modified answers. Since we're checking for equality on 2's-complement machines, I think the easiest thing is to still re-map it. Otherwise, instead of testing for 0, we have to test for both 0 and ~0 in the validity checks, right? Disclaimer: I've just glanced through the discussion, so maybe I've misunderstood the point or effect of the patch... +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 7254] New: leaving IP multicast group with incorrect imr_address unexpectedly succeeded
If the index is set, it doesn't look up the address (which may be expensive). If you want to look up by address, the index must be 0. I wouldn't call that a bug. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group
Alexey Kuznetsov [EMAIL PROTECTED] wrote on 09/14/2006 03:30:37 AM: Hello! No, it returns 1 (allow) if there are no filters to explicitly filter it. I wrote that code. :-) I see. It did not behave this way old times. From your mails I understood that current behaviour matches another implementations (BSD whatever), is it true? Hi, Alexey, If you mean IPv6 code in current BSD derivatives, I don't know. The IPv6 behaviour was different from IPv4 on Linux and was changed for compatibility with IPv4 (discussion on netdev agreed that binding should determine socket delivery, not group membership, or simply that there was no reason to be different from long-standing IPv4 practice). The IPv4 code is that way for compatibility with everything else since about ~4.3BSD (with the possible exception of Solaris 8, apparently). FWIW, I think Deering's original interpretation is correct. Adding a multicast address to an interface by joining a group is little different from adding a unicast address via SIOCSIFADDR, which certainly does affect packets delivered to the machine and to any INADDR_ANY-bound socket. Binding to the multicast address and not INADDR_ANY will give you only packets for that group, if that's what you want, just as in the unicast address-specific bind. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group
[EMAIL PROTECTED] wrote on 09/13/2006 07:13:55 AM: Only the socket that is bound to the group address to which the packet was sent should get it. This is not true on any OS I'm aware of, including the original sockets multicast implementation on early BSD. Multicast group membership is per-interface, not per-socket. Joining a group on any socket on the machine allows packets for that group to be delivered on the interface where it was joined. Delivery of packets to a socket is determined by the binding, and INADDR_ANY means any. IPv6 behaves the same way. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group
Jeff Layton [EMAIL PROTECTED] wrote on 09/13/2006 09:12:23 AM: Most of the RFC's I've looked at don't seem to have much to say about how multicasting works at the socket level. Is there an RFC or specification that spells this out, or is this one of those situations where things are somewhat open to interpretation? RFC's are standards for protocols and are concerned with interoperability among multiple implementations. They, in general, do not define API's (and those that do are informational). There are after-the-fact standards for sockets, including POSIX, but older features are effectively defined by industry practice; in this case, Deering's BSD implementation from nearly 20 years ago. Sections 7.1 7.2 of RFC 988 [Deering, 1986] hint at this interpretation (separation of membership from socket delivery), esp. when it says Incoming multicast IP datagrams are delivered to upper-layer protocol modules using the same 'Receive IP' operation as normal, unicast datagrams. But Deering's own implementation, which does not distinguish which socket joined the group, is the source of the standard practice. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group
Alexey Kuznetsov [EMAIL PROTECTED] wrote on 09/13/2006 01:20:29 PM: Hello! IPv6 behaves the same way. Actually, Linux IPv6 filters received multicasts, inet6_mc_check() does this. No, it returns 1 (allow) if there are no filters to explicitly filter it. I wrote that code. :-) +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: multicast group memberships purge on interface delete
Michal, My question/suggestion: Would it feasible to drop the relevant entries from sockets' multicast membership lists on the interface delete? Yes, I think this is needed. The original BSD code didn't have this problem because it didn't support removal of a device. I wondered for a while whether an app should care whether the device was removed or not on a leave, but I think it probably should get ENODEV in that case, just as it would for any invalid interface index, whether or not it used to be valid. So, simply removing the memberships and filters on device destroy looks right to me. I'm not sure that maintaining a per-device socket list is necessary, since removal of a device is relatively unusual, and that may make locking and reference counting more complicated (maybe not). Probably a simple search of all UDP sockets with non-null multicast lists and matching interface index is sufficient. IGMPv3 is there fully, and uses a more efficient way of tracking interface and socket filter intersections and unions than the obvious way as suggested by the RFC. I don't think it is affected either way by this. I'll either do or review a patch for this, depending on who gets to it first. :-) +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Possible leak of multicast source filter sctructure
Michal Ruzicka [EMAIL PROTECTED] wrote on 08/17/2006 05:26:35 AM: I've reviewed your patch (the IPv4 part of it) and I think I can say that our soulutions are actually quite similar. But I can come up with one difference that I'd like know your opinion on. If you have a duplicated address and the application is not using interface index to identify the interface, then which one you join or leave on is not unique, and which one you'll get is not defined. It is, in fact, an error to leave the group on an interface with a different index, and the duplicate scenario has effectively changed the meaning of that address's interface. In the same scenario, if you have not deleted any interfaces and attempt to leave the group, your code may get the pppY interface from ip_mc_find_dev(), not have a matching index for the group you actually joined, and will return EADDRNOTAVAIL rather than leaving the group you joined on. In both cases, it'd simply be an error for an application to be using address to identify the interface when that is not unique. If the addresses are used (or reused) on multiple interfaces, the application has to use interface index to work properly. +-DLS Suppose the following situatuion: 1) create pppX interface: IP: A.B.C.D 2) join a multicast group by address: A.B.C.D ASSUMPTION: no other interface with the same IP address exists; the result should be that the group is joined on the pppX interface 3) create pppY interface IP: A.B.C.D (Yes the same IP, not unusual on ppp links.) 4) delete the pppX interace 5) attempt to leave the multicast group by address: A.B.C.D 6) * if your algorthim is used -EADDRNOTAVAIL is returned * if mine is used the multicast group is left on the pppX interface Surely this won't be a common problem but I think it's worth pointing out here. Michal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Possible leak of multicast source filter sctructure #3a
Michal, I believe the patch I submitted yesterday fixes this problem, and in a simpler way. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Re: Possible leak of multicast source filter sctructure
Here's a patch to fix source filter leakage when a device is removed and a process leaves the group thereafter. (2nd part of prior discussion) This patch also includes corresponding fixes for IPv6 multicast source filters on device removal (1st 2nd parts of prior discussion, for IPv6). +-DLS [in-line for viewing, attached for applying] Signed-off-by: David L Stevens [EMAIL PROTECTED] diff -ruNp linux-2.6.17.8/net/ipv4/igmp.c linux-2.6.17.8-dls/net/ipv4/igmp.c --- linux-2.6.17.8/net/ipv4/igmp.c 2006-08-06 21:18:54.0 -0700 +++ linux-2.6.17.8-dls/net/ipv4/igmp.c 2006-08-15 16:30:18.0 -0700 @@ -1796,29 +1796,35 @@ int ip_mc_leave_group(struct sock *sk, s struct in_device *in_dev; u32 group = imr-imr_multiaddr.s_addr; u32 ifindex; + int ret = -EADDRNOTAVAIL; rtnl_lock(); in_dev = ip_mc_find_dev(imr); - if (!in_dev) { - rtnl_unlock(); - return -ENODEV; - } ifindex = imr-imr_ifindex; for (imlp = inet-mc_list; (iml = *imlp) != NULL; imlp = iml-next) { - if (iml-multi.imr_multiaddr.s_addr == group - iml-multi.imr_ifindex == ifindex) { - (void) ip_mc_leave_src(sk, iml, in_dev); + if (iml-multi.imr_multiaddr.s_addr != group) + continue; + if (ifindex) { + if (iml-multi.imr_ifindex != ifindex) + continue; + } else if (imr-imr_address.s_addr imr-imr_address.s_addr != + iml-multi.imr_address.s_addr) + continue; + + (void) ip_mc_leave_src(sk, iml, in_dev); - *imlp = iml-next; + *imlp = iml-next; + if (in_dev) ip_mc_dec_group(in_dev, group); - rtnl_unlock(); - sock_kfree_s(sk, iml, sizeof(*iml)); - return 0; - } + rtnl_unlock(); + sock_kfree_s(sk, iml, sizeof(*iml)); + return 0; } + if (!in_dev) + ret = -ENODEV; rtnl_unlock(); - return -EADDRNOTAVAIL; + return ret; } int ip_mc_source(int add, int omode, struct sock *sk, struct diff -ruNp linux-2.6.17.8/net/ipv6/mcast.c linux-2.6.17.8-dls/net/ipv6/mcast.c --- linux-2.6.17.8/net/ipv6/mcast.c 2006-08-06 21:18:54.0 -0700 +++ linux-2.6.17.8-dls/net/ipv6/mcast.c 2006-08-15 15:51:53.0 -0700 @@ -269,13 +269,14 @@ int ipv6_sock_mc_drop(struct sock *sk, i if ((dev = dev_get_by_index(mc_lst-ifindex)) != NULL) { struct inet6_dev *idev = in6_dev_get(dev); + (void) ip6_mc_leave_src(sk, mc_lst, idev); if (idev) { - (void) ip6_mc_leave_src(sk,mc_lst,idev); __ipv6_dev_mc_dec(idev, mc_lst-addr); in6_dev_put(idev); } dev_put(dev); - } + } else + (void) ip6_mc_leave_src(sk, mc_lst, NULL); sock_kfree_s(sk, mc_lst, sizeof(*mc_lst)); return 0; } @@ -335,13 +336,14 @@ void ipv6_sock_mc_close(struct sock *sk) if (dev) { struct inet6_dev *idev = in6_dev_get(dev); + (void) ip6_mc_leave_src(sk, mc_lst, idev); if (idev) { - (void) ip6_mc_leave_src(sk, mc_lst, idev); __ipv6_dev_mc_dec(idev, mc_lst-addr); in6_dev_put(idev); } dev_put(dev); - } + } else + (void) ip6_mc_leave_src(sk, mc_lst, NULL); sock_kfree_s(sk, mc_lst, sizeof(*mc_lst)); mcd.patch Description: Binary data