Re: [PATCH] [IPV6] XFRM: Fix auditing rt6i_flags; use RTF_xxx flags instead of RTCF_xxx.
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Fri, 07 Dec 2007 10:41:48 -0800 (PST)

> RTCF_xxx flags (defined in include/linux/in_route.h) are available for
> IPv4 route (rtable) entries only. Use RTF_xxx flags instead,
> defined in include/linux/ipv6_route.h, for IPv6 route entries (rt6_info).
>
> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Applied, thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
2.6.24 net driver mis-patch
Jeff, this belonged in netdev-2.6: Linus's tree doesn't have the BNX2X
driver yet, but your 2.6.25-bound tree does.

As a result you added the ZLIB_INFLATE dependency to the TEHUTI driver
in 2.6.24 instead of BNX2X where it belongs. Please fix this, thanks :-)

commit 70eba18b5664f90d7620905e005b89388e5fd94b
Author: Eliezer Tamir <[EMAIL PROTECTED]>
Date:   Wed Dec 5 16:12:39 2007 +0200

    make bnx2x select ZLIB_INFLATE

    The bnx2x module depends on the zlib_inflate functions. The build will
    fail if ZLIB_INFLATE has not been selected manually or by building
    another module that automatically selects it.

    Modify BNX2X config option to 'select ZLIB_INFLATE' like BNX2 and others.
    This seems to fix it.

    Signed-off-by: Lee Schermerhorn <[EMAIL PROTECTED]>
    Acked-by: Eliezer Tamir <[EMAIL PROTECTED]>
    Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index d9107e5..6cde4ed 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2588,6 +2588,7 @@ config MLX4_DEBUG
 config TEHUTI
 	tristate "Tehuti Networks 10G Ethernet"
 	depends on PCI
+	select ZLIB_INFLATE
 	help
 	  Tehuti Networks 10G Ethernet NIC
Re: [PATCH net-2.6.25] qdisc: new rate limiter
Patrick McHardy wrote:
> Stephen Hemminger wrote:
>> +struct tc_rlim_qopt
>> +{
>> +	__u32 limit;	/* fifo limit (packets) */
>> +	__u32 rate;	/* bits per sec */
>
> This seems a bit small, 512mbit is the maximum rate.

It's 4gbit of course, so I guess it's enough :)
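The arithmetic behind the correction in this reply can be checked with a small userspace sketch. It assumes the field really is interpreted as bits per second, as the struct comment says; the helper names are made up for illustration and are not part of the patch. The 512M figure is what you get if the 32-bit maximum is divided by 8.

```c
#include <stdint.h>

/* Largest value a __u32 "rate" field can carry: 2^32 - 1. */
static uint64_t max_rate_raw(void)
{
	return UINT32_MAX;	/* 4294967295 */
}

/* The same limit in whole Mbit/s (decimal, 10^6), if the unit is bits/s. */
static uint64_t max_rate_mbit(void)
{
	return max_rate_raw() / 1000000ULL;
}

/* The limit divided by 8, which is where a "512M" figure comes from. */
static uint64_t max_rate_div8(void)
{
	return max_rate_raw() / 8ULL;
}
```

So a 32-bit bits-per-second field covers rates up to roughly 4.29 Gbit/s.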
Re: [PATCH net-2.6.25] qdisc: new rate limiter
Stephen Hemminger wrote:
> This is a time based rate limiter for use in network testing. When doing
> network tests it is often useful to test at reduced bandwidths. The existing
> Token Bucket Filter provides rate control, but causes bursty traffic that
> can cause different performance than real world. Another alternative is
> the PSPacer, but it depends on pause frames which may also cause issues.
>
> The qdisc depends on high resolution timers and clocks, so it will probably
> use more CPU than others making it a poor choice for use when doing traffic
> shaping for QOS.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
>
> --- a/include/linux/pkt_sched.h	2007-10-30 09:18:29.0 -0700
> +++ b/include/linux/pkt_sched.h	2007-12-07 13:37:50.0 -0800
> @@ -475,4 +475,10 @@ struct tc_netem_corrupt
>  #define NETEM_DIST_SCALE	8192
>
> +struct tc_rlim_qopt
> +{
> +	__u32 limit;	/* fifo limit (packets) */
> +	__u32 rate;	/* bits per sec */

This seems a bit small, 512mbit is the maximum rate.

> --- /dev/null	1970-01-01 00:00:00.0 +
> +++ b/net/sched/sch_rlim.c	2007-12-07 16:22:10.0 -0800
> @@ -0,0 +1,350 @@
> +static struct sk_buff *rlim_dequeue(struct Qdisc *sch)
> +{
> +	struct rlim_sched_data *q = qdisc_priv(sch);
> +	struct sk_buff *skb;
> +	ktime_t now = ktime_get();
> +
> +	/* if haven't reached the correct time slot, start timer */
> +	if (now.tv64 < q->next_send.tv64) {
> +		sch->flags |= TCQ_F_THROTTLED;
> +		hrtimer_start(&q->watchdog.timer, q->next_send,
> +			      HRTIMER_MODE_ABS);
> +		return NULL;
> +	}
> +
> +	skb = q->qdisc->dequeue(q->qdisc);
> +	if (skb) {
> +		q->next_send = ktime_add_ns(now, pkt_time(q, skb));
> +		sch->flags &= ~TCQ_F_THROTTLED;

qlen is not decremented here.

> +	}
> +	return skb;
> +}
> +
> +static int rlim_requeue(struct sk_buff *skb, struct Qdisc *sch)
> +{
> +	struct rlim_sched_data *q = qdisc_priv(sch);
> +	int ret;
> +
> +	ret = q->qdisc->ops->requeue(skb, q->qdisc);
> +	if (!ret) {
> +		q->next_send = ktime_sub_ns(q->next_send, pkt_time(q, skb));
> +		sch->q.qlen++;
> +		sch->qstats.requeues++;
> +	}
> +
> +	return ret;
> +}
> +
> +static void rlim_reset(struct Qdisc *sch)
> +{
> +	struct rlim_sched_data *q = qdisc_priv(sch);
> +
> +	qdisc_reset_queue(sch);

This should reset the child.

> +
> +	q->next_send = ktime_get();
> +	qdisc_watchdog_cancel(&q->watchdog);
> +}
> +
> +static int rlim_change(struct Qdisc *sch, struct rtattr *opt)
> +{
> +	struct rlim_sched_data *q = qdisc_priv(sch);
> +	const struct tc_rlim_qopt *qopt;
> +	int err;
> +
> +	if (opt == NULL || RTA_PAYLOAD(opt) < sizeof(struct tc_rlim_qopt))
> +		return -EINVAL;
> +
> +	qopt = RTA_DATA(opt);

Using nested attributes would make sure we don't run into
problems with extensibility.

> +	err = set_fifo_limit(q->qdisc, qopt->limit);
> +	if (err)
> +		return err;
> +
> +	q->limit = qopt->limit;
> +	if (qopt->rate == 0)
> +		q->cost = 0;	/* unlimited */
> +	else {
> +		q->cost = (u64)NSEC_PER_SEC << NSEC_SCALE;
> +		do_div(q->cost, qopt->rate);
> +	}
> +
> +	pr_debug("rlim_change: rate=%u cost=%llu\n",
> +		 qopt->rate, q->cost);
> +
> +	return 0;
> +}
> +
> +static struct Qdisc_class_ops rlim_class_ops = {

This can be const.

> +	.graft		= rlim_graft,
> +	.leaf		= rlim_leaf,
> +	.get		= rlim_get,
> +	.put		= rlim_put,
> +	.change		= rlim_change_class,
> +	.delete		= rlim_delete,
> +	.walk		= rlim_walk,
> +	.tcf_chain	= rlim_find_tcf,
> +	.dump		= rlim_dump_class,
> +};
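The scaled cost computed in rlim_change() and consumed by pkt_time() can be tried out in userspace. The following is only a sketch: rlim_cost() and rlim_pkt_time() are hypothetical stand-ins, do_div() is replaced by plain 64-bit division, and NSEC_SCALE is taken as 6 as in the posted sch_rlim.c. It treats "rate" as units per second and packet length in the same unit, which is an assumption; the thread does not settle whether the unit is bits or bytes.

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define NSEC_SCALE	6	/* cost is kept in 1/64 ns steps */

/* Nanoseconds per unit of data, scaled by 64; 0 means unlimited. */
static uint64_t rlim_cost(uint32_t rate)
{
	if (rate == 0)
		return 0;
	return (NSEC_PER_SEC << NSEC_SCALE) / rate;
}

/* Time in nanoseconds that a packet of len units occupies the link. */
static uint64_t rlim_pkt_time(uint64_t cost, uint32_t len)
{
	return (cost * len) >> NSEC_SCALE;
}
```

For example, with a rate of 125000 units per second the cost is 512000, and a 1500-unit packet pushes the next departure out by 12000000 ns (12 ms), which matches 1500 / 125000 seconds.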
Re: [PATCH] iproute2: support dotted-quad netmask notation.
On Thu, 2007-12-06 at 11:53 -0800, Stephen Hemminger wrote:
> On Tue, 4 Dec 2007 14:58:18 +0100
> Andreas Henriksson <[EMAIL PROTECTED]> wrote:
>
> > Suggested patch for allowing netmask to be specified in dotted quad format.
> > See http://bugs.debian.org/357172
> >
> > (Known problem: this will not prevent some invalid syntaxes,
> > ie. "255.0.255.0" will be treated as "255.255.255.0")
> >
> > Comments? Suggestions? Improvements?
>
> Fix the bug you mentioned?
>
> [... snip example code ...]

Updated patch, added your netmask validation code but without the check
that made 0.0.0.0 (default) and 255.255.255.255 (one address) invalid
netmasks as they are permitted in CIDR format.

Signed-off-by: Andreas Henriksson <[EMAIL PROTECTED]>

diff --git a/lib/utils.c b/lib/utils.c
index 4c42dfd..b4a6125 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -47,6 +47,41 @@ int get_integer(int *val, const char *arg, int base)
 	return 0;
 }
 
+/* a valid netmask must be 2^n - 1 (n = 1..31) */
+static int is_valid_netmask(const inet_prefix *addr)
+{
+	uint32_t host;
+
+	if (addr->family != AF_INET)
+		return 0;
+
+	host = ~ntohl(addr->data[0]);
+
+	return (host & (host + 1)) == 0;
+}
+
+static int get_netmask(unsigned *val, const char *arg, int base)
+{
+	inet_prefix addr;
+
+	if (!get_unsigned(val, arg, base))
+		return 0;
+
+	/* try converting dotted quad to CIDR */
+	if (!get_addr_1(&addr, arg, AF_INET)) {
+		u_int32_t mask;
+
+		*val=0;
+		for (mask = addr.data[0]; mask; mask >>= 1)
+			(*val)++;
+
+		if (is_valid_netmask(&addr))
+			return 0;
+	}
+
+	return -1;
+}
+
 int get_unsigned(unsigned *val, const char *arg, int base)
 {
 	unsigned long res;
@@ -304,7 +339,8 @@ int get_prefix_1(inet_prefix *dst, char *arg, int family)
 		dst->bitlen = 32;
 	}
 	if (slash) {
-		if (get_unsigned(&plen, slash+1, 0) || plen > dst->bitlen) {
+		if (get_netmask(&plen, slash+1, 0)
+		    || plen > dst->bitlen) {
 			err = -1;
 			goto done;
 		}

--
Regards,
Andreas Henriksson
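For readers who want to experiment with the dotted-quad conversion outside iproute2, here is a standalone sketch. It is not the code from the patch: netmask_to_prefix() is a hypothetical helper that uses inet_pton() instead of get_addr_1(), and it converts the mask to host byte order before counting bits so the result does not depend on endianness. The contiguity test is the same (host & (host + 1)) == 0 check used in the patch.

```c
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Convert a dotted-quad netmask to a prefix length.
 * Returns 0..32 on success, -1 if the string does not parse or the
 * mask is not contiguous (for example "255.0.255.0").
 */
static int netmask_to_prefix(const char *arg)
{
	struct in_addr in;
	uint32_t mask, host;
	int plen = 0;

	if (inet_pton(AF_INET, arg, &in) != 1)
		return -1;

	mask = ntohl(in.s_addr);	/* host byte order, MSB first */
	host = ~mask;			/* host bits must look like 2^n - 1 */
	if (host & (host + 1))
		return -1;

	for (; mask; mask <<= 1)	/* count the leading one bits */
		plen++;

	return plen;
}
```

As in the updated patch, 0.0.0.0 and 255.255.255.255 are accepted and map to /0 and /32.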
Re: [PATCH net-2.6.25] Cleanup IN_DEV_MFORWARD macro
On Fri, Dec 07, 2007 at 07:19:38PM +0300, Pavel Emelyanov wrote:
> This is essentially IN_DEV_ANDCONF with proper arguments.
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

Acked-by: Herbert Xu <[EMAIL PROTECTED]>

Thanks Pavel! I must have written that one before writing the
AND macro :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC] TCP illinois max rtt aging
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 15:05:59 +0200 (EET)

> On Fri, 7 Dec 2007, David Miller wrote:
>
> > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)
> >
> > > I guess if you get a large cumulative ACK, the amount of processing is
> > > still overwhelming (added DaveM if he has some idea how to combat it).
> > >
> > > Even a simple scenario (this isn't anything fancy at all, will occur all
> > > the time): Just one loss => rest skbs grow one by one into a single
> > > very large SACK block (and we do that efficiently for sure) => then the
> > > fast retransmit gets delivered and a cumulative ACK for whole orig_window
> > > arrives => clean_rtx_queue has to do a lot of processing. In this case we
> > > could optimize RB-tree cleanup away (by just blanking it all) but still
> > > getting rid of all those skbs is going to take a larger moment than I'd
> > > like to see.
> > >
> > > That tree blanking could be extended to cover anything which ACK more
> > > than half of the tree by just replacing the root (and dealing with
> > > potential recolorization of the root).
> >
> > Yes, it's the classic problem. But it ought to be at least
> > partially masked when TSO is in use, because we'll only process
> > a handful of SKBs. The more effectively TSO batches, the
> > less work clean_rtx_queue() will do.
>
> No, that's not what is going to happen, TSO won't help at all
> because one-by-one SACKs will fragment every single one of them
> (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO
> case, or am I missing something?

You're of course right, and it's ironic that I wrote the SACK
splitting code so I should have known this :-)

A possible approach just occurred to me wherein we maintain
the SACK state external to the SKBs so that we don't need to
mess with them at all.

That would allow us to eliminate the TSO splitting but it would
not remove the general problem of clean_rtx_queue()'s overhead.

I'll try to give some thought to this over the weekend.
[PATCH iproute2] rlim qdisc support.
Setup code for new rlim qdisc. For use by anyone who wants to test rlim before kernel inclusion. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- include/linux/pkt_sched.h |6 ++ tc/Makefile |1 + tc/q_rlim.c | 115 + 3 files changed, 122 insertions(+), 0 deletions(-) create mode 100644 tc/q_rlim.c diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 919af93..7973dc4 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -475,4 +475,10 @@ struct tc_netem_corrupt #define NETEM_DIST_SCALE 8192 +struct tc_rlim_qopt +{ + __u32 limit; /* fifo limit (packets) */ + __u32 rate; /* bits per sec */ +}; + #endif diff --git a/tc/Makefile b/tc/Makefile index a715566..e46954d 100644 --- a/tc/Makefile +++ b/tc/Makefile @@ -13,6 +13,7 @@ TCMODULES += q_tbf.o TCMODULES += q_cbq.o TCMODULES += q_rr.o TCMODULES += q_netem.o +TCMODULES += q_rlim.o TCMODULES += f_rsvp.o TCMODULES += f_u32.o TCMODULES += f_route.o diff --git a/tc/q_rlim.c b/tc/q_rlim.c new file mode 100644 index 000..5f634a8 --- /dev/null +++ b/tc/q_rlim.c @@ -0,0 +1,115 @@ +/* + * q_rlim.cRLIM. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Authors:Stephen Hemminger <[EMAIL PROTECTED]> + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "utils.h" +#include "tc_util.h" + +static void explain(void) +{ + fprintf(stderr, "Usage: ... 
rlim limit PACKETS rate KBPS\n"); +} + +static void explain1(char *arg) +{ + fprintf(stderr, "Illegal \"%s\"\n", arg); +} + + +#define usage() return(-1) + +static int rlim_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nlmsghdr *n) +{ + struct tc_rlim_qopt opt; + unsigned x; + + memset(&opt, 0, sizeof(opt)); + + while (argc > 0) { + if (matches(*argv, "limit") == 0) { + NEXT_ARG(); + if (opt.limit) { + fprintf(stderr, "Double \"limit\" spec\n"); + return -1; + } + if (get_size(&opt.limit, *argv)) { + explain1("limit"); + return -1; + } + } else if (strcmp(*argv, "rate") == 0) { + NEXT_ARG(); + if (opt.rate) { + fprintf(stderr, "Double \"rate\" spec\n"); + return -1; + } + + if (get_rate(&x, *argv)) { + explain1("rate"); + return -1; + } + opt.rate = x; + } else if (strcmp(*argv, "help") == 0) { + explain(); + return -1; + } else { + fprintf(stderr, "What is \"%s\"?\n", *argv); + explain(); + return -1; + } + argc--; argv++; + } + + if (opt.rate == 0) { + fprintf(stderr, "\"rate\" is required.\n"); + return -1; + } + + if (opt.limit == 0) { + fprintf(stderr, "\"limit\" is required.\n"); + return -1; + } + + addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt)); + return 0; +} + +static int rlim_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) +{ + struct tc_rlim_qopt *qopt; + SPRINT_BUF(b1); + + if (opt == NULL) + return 0; + + if (RTA_PAYLOAD(opt) < sizeof(*qopt)) + return -1; + qopt = RTA_DATA(opt); + fprintf(f, "limit %up rate %s", qopt->limit, sprint_rate(qopt->rate, b1)); + + return 0; +} + +struct qdisc_util rlim_qdisc_util = { + .id = "rlim", + .parse_qopt = rlim_parse_opt, + .print_qopt = rlim_print_opt, +}; + -- 1.5.3.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.25] qdisc: new rate limiter
This is a time based rate limiter for use in network testing. When doing network tests it is often useful to test at reduced bandwidths. The existing Token Bucket Filter provides rate control, but causes bursty traffic that can cause different performance than real world. Another alternative is the PSPacer, but it depends on pause frames which may also cause issues. The qdisc depends on high resolution timers and clocks, so it will probably use more CPU than others making it a poor choice for use when doing traffic shaping for QOS. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- a/include/linux/pkt_sched.h 2007-10-30 09:18:29.0 -0700 +++ b/include/linux/pkt_sched.h 2007-12-07 13:37:50.0 -0800 @@ -475,4 +475,10 @@ struct tc_netem_corrupt #define NETEM_DIST_SCALE 8192 +struct tc_rlim_qopt +{ + __u32 limit; /* fifo limit (packets) */ + __u32 rate; /* bits per sec */ +}; + #endif --- a/net/sched/Kconfig 2007-12-07 13:37:25.0 -0800 +++ b/net/sched/Kconfig 2007-12-07 13:37:50.0 -0800 @@ -196,6 +196,19 @@ config NET_SCH_NETEM If unsure, say N. +config NET_SCH_RLIM + tristate "Network Rate Limiter" + ---help--- + Say Y here if you want to use timer based network rate limiter + algorithm. + + See the top of for more details. + + To compile this code as a module, choose M here: the + module will be called sch_rlim. + + If unsure, say N. 
+ config NET_SCH_INGRESS tristate "Ingress Qdisc" ---help--- --- a/net/sched/Makefile2007-10-30 09:18:30.0 -0700 +++ b/net/sched/Makefile2007-12-07 13:37:50.0 -0800 @@ -28,6 +28,7 @@ obj-$(CONFIG_NET_SCH_TEQL)+= sch_teql.o obj-$(CONFIG_NET_SCH_PRIO) += sch_prio.o obj-$(CONFIG_NET_SCH_ATM) += sch_atm.o obj-$(CONFIG_NET_SCH_NETEM)+= sch_netem.o +obj-$(CONFIG_NET_SCH_RLIM) += sch_rlim.o obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o obj-$(CONFIG_NET_CLS_FW) += cls_fw.o --- /dev/null 1970-01-01 00:00:00.0 + +++ b/net/sched/sch_rlim.c 2007-12-07 16:22:10.0 -0800 @@ -0,0 +1,350 @@ +/* + * net/sched/sch_rate.cTimer based rate control + * + * Copyright (c) 2007 Stephen Hemminger <[EMAIL PROTECTED]> + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Simple Rate control + + Algorthim used in NISTnet and others. + Logically similar to Token Bucket, but more real time and less lumpy. + + A packet is not allowed to be dequeued until a after the deadline. + Each packet dequeued increases the deadline by rate * size. + + If qdisc throttles, it starts a timer, which will wake it up + when it is ready to transmit. This scheduler works much better + if high resolution timers are available. + + Like classful TBF, limit is just kept for backwards compatibility. + It is passed to the default pfifo qdisc - if the inner qdisc is + changed the limit is not effective anymore. 
+ +*/ + +/* Use scaled math to get 1/64 ns resolution */ +#define NSEC_SCALE 6 + +struct rlim_sched_data { + ktime_t next_send; /* next scheduled departure */ + u64 cost; /* nsec/byte * 64 */ + u32 limit; /* upper bound on fifo (packets) */ + + struct Qdisc *qdisc;/* Inner qdisc, default - bfifo queue */ + struct qdisc_watchdog watchdog; +}; + +static int rlim_enqueue(struct sk_buff *skb, struct Qdisc *sch) +{ + struct rlim_sched_data *q = qdisc_priv(sch); + int ret; + + ret = q->qdisc->enqueue(skb, q->qdisc); + if (ret) + sch->qstats.drops++; + else { + sch->q.qlen++; + sch->bstats.bytes += skb->len; + sch->bstats.packets++; + } + + return ret; +} + + +static u64 pkt_time(const struct rlim_sched_data *q, + const struct sk_buff *skb) +{ + return (q->cost * skb->len) >> NSEC_SCALE; +} + +static unsigned int rlim_drop(struct Qdisc *sch) +{ + struct rlim_sched_data *q = qdisc_priv(sch); + unsigned int len = 0; + + if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) { + sch->q.qlen--; + sch->qstats.drops++; + } + + return len; +} + +static struct sk_buff *rlim_dequeue(struct Qdisc *sch) +{ + struct rlim_sched_data *q = qdisc_priv(sch); + struct sk_buff *skb; + ktime_t now = ktime_get(); + + /* if haven't reached the correct time slot, start timer */ + if (now.tv64 < q->next_send.tv64) { + sch->flags |= TCQ_F_THROTTLED; + hrtimer_start(&q->watchdog.timer, q->next_send, + HRTIMER_MODE_ABS); + return NULL; + } + + skb = q-
Re: [PATCH 2/2] cxgb3 - Parity initialization for T3C adapters
Jeff Garzik wrote:
> Divy Le Ray wrote:
>> Jeff Garzik wrote:
>>> Divy Le Ray wrote:
>>>> From: Divy Le Ray <[EMAIL PROTECTED]>
>>>>
>>>> Add parity initialization for T3C adapters.
>>>>
>>>> Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
>>>> ---
>>>>
>>>>  drivers/net/cxgb3/adapter.h       |    1
>>>>  drivers/net/cxgb3/cxgb3_main.c    |   82
>>>>  drivers/net/cxgb3/cxgb3_offload.c |   15 ++
>>>>  drivers/net/cxgb3/regs.h          |  248 +
>>>>  drivers/net/cxgb3/sge.c           |   24 +++-
>>>>  drivers/net/cxgb3/t3_hw.c         |  131 +---
>>>>  6 files changed, 472 insertions(+), 29 deletions(-)
>>>
>>> dropped patches 2-3, did not apply
>>
>> Hi Jeff,
>>
>> I noticed that you applied the first one of this 3 patches series to the
>> #upstream-fixes branch.
>> These patches are intended to the #upstream (2.6.25) branch, as they are
>> built on top of the
>> last 10 patches committed - 9 from me, and the white space clean up
>> (thanks!).
>> May be this is the reason why they did not apply.
>
> Ah... you need to tell me these things. I looked for a kernel version
> in your messages but did not see one.

I had put it in the introduction mail, I should have added the kernel
version in the patch titles. I'll do from now on.

> Does the patch #1 need to be reverted for 2.6.24?

No, it can be applied to 2.6.24.
The 2 next patches seem to apply cleanly on #upstream when patch #1 is
popped out the patch stack.

>> On this topic, I have a question: how do I get to see all the netdev-2.6
>> branches ?
>
> git fetch -f $NETDEV_URL upstream:upstream
>
> copies the latest upstream branch from netdev-2.6.git, and stores it as
> your local upstream branch. You may do the same for #upstream-fixes too.

That made it. Thanks a lot!

Cheers,
Divy
Re: [PATCH 2/2] cxgb3 - Parity initialization for T3C adapters
Divy Le Ray wrote: Jeff Garzik wrote: Divy Le Ray wrote: From: Divy Le Ray <[EMAIL PROTECTED]> Add parity initialization for T3C adapters. Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]> --- drivers/net/cxgb3/adapter.h |1 drivers/net/cxgb3/cxgb3_main.c| 82 drivers/net/cxgb3/cxgb3_offload.c | 15 ++ drivers/net/cxgb3/regs.h | 248 + drivers/net/cxgb3/sge.c | 24 +++- drivers/net/cxgb3/t3_hw.c | 131 +--- 6 files changed, 472 insertions(+), 29 deletions(-) dropped patches 2-3, did not apply Hi Jeff, I noticed that you applied the first one of this 3 patches series to the #upstream-fixes branch. These patches are intended to the #upstream (2.6.25) branch, as they are built on top of the last 10 patches committed - 9 from me, and the white space clean up (thanks!). May be this is the reason why they did not apply. Ah... you need to tell me these things. I looked for a kernel version in your messages but did not see one. Does the patch #1 need to be reverted for 2.6.24? On this topic, I have a question: how do I get to see all the netdev-2.6 branches ? After cloning a free netdev-2.6 tree, 'git branch' shows only the master branch: bash-3.1$ git --version git version 1.5.3.rc4.29.g74276-dirty -bash-3.1$ stg clone git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git netdev-2.6-fresh Cloning "git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git" into "netdev-2.6-fresh"... Initialized empty Git repository in /opt/sources/netdev-2.6-fresh/.git/ remote: Generating pack... remote: Counting objects: 620879 Done counting 633562 objects. remote: Deltifying 633562 objects... remote: 100% (633562/633562) done Indexing 633562 objects... remote: Total 633562 (delta 517968), reused 594305 (delta 478716) 100% (633562/633562) done Resolving 517968 deltas... 100% (517968/517968) done Checking 23058 files out... 
100% (23058/23058) done done -bash-3.1$ cd netdev-2.6-fresh/ -bash-3.1$ git branch * master git fetch -f $NETDEV_URL upstream:upstream copies the latest upstream branch from netdev-2.6.git, and stores it as your local upstream branch. You may do the same for #upstream-fixes too. Jeff -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] sky2: RX lockup fix
I'm using a Marvell 88E8062 on a custom PPC64 blade and ran into RX
lockups while validating the sky2 driver. The receive MAC FIFO would
become stuck during testing with high traffic. One port of the 88E8062
would lock up, while the other port remained functional. Re-inserting
the sky2 module would not fix the problem - only a power cycle would.

I looked over Marvell's most recent sk98lin driver and it looks like
they had a "workaround" for the Yukon XL that the sky2 doesn't have yet.
The sk98lin driver disables the RX MAC FIFO flush feature for all
revisions of the Yukon XL.

According to skgeinit.c of the sk98lin driver, "Flushing must be enabled
(needed for ASF see dev. #4.29), but the flushing mask should be
disabled (see dev. #4.115)". Nice.

I implemented this same change in the sky2 driver and verified that the
RX lockup I was seeing was resolved.

Signed-off-by: Peter Tyser <[EMAIL PROTECTED]>
Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
Original patch reformatted to remove line wrap.

--- a/drivers/net/sky2.c	2007-12-06 09:39:12.0 -0800
+++ b/drivers/net/sky2.c	2007-12-06 09:54:14.0 -0800
@@ -821,8 +821,13 @@ static void sky2_mac_init(struct sky2_hw
 
 	sky2_write32(hw, SK_REG(port, RX_GMF_CTRL_T), rx_reg);
 
-	/* Flush Rx MAC FIFO on any flow control or error */
-	sky2_write16(hw, SK_REG(port, RX_GMF_FL_MSK), GMR_FS_ANY_ERR);
+	if (hw->chip_id == CHIP_ID_YUKON_XL) {
+		/* Hardware errata - clear flush mask */
+		sky2_write16(hw, SK_REG(port, RX_GMF_FL_MSK), 0);
+	} else {
+		/* Flush Rx MAC FIFO on any flow control or error */
+		sky2_write16(hw, SK_REG(port, RX_GMF_FL_MSK), GMR_FS_ANY_ERR);
+	}
 
 	/* Set threshold to 0xa (64 bytes) + 1 to workaround pause bug */
 	reg = RX_GMF_FL_THR_DEF + 1;
Re: [PATCH 2/2] cxgb3 - Parity initialization for T3C adapters
Jeff Garzik wrote: Divy Le Ray wrote: From: Divy Le Ray <[EMAIL PROTECTED]> Add parity initialization for T3C adapters. Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]> --- drivers/net/cxgb3/adapter.h |1 drivers/net/cxgb3/cxgb3_main.c| 82 drivers/net/cxgb3/cxgb3_offload.c | 15 ++ drivers/net/cxgb3/regs.h | 248 + drivers/net/cxgb3/sge.c | 24 +++- drivers/net/cxgb3/t3_hw.c | 131 +--- 6 files changed, 472 insertions(+), 29 deletions(-) dropped patches 2-3, did not apply Hi Jeff, I noticed that you applied the first one of this 3 patches series to the #upstream-fixes branch. These patches are intended to the #upstream (2.6.25) branch, as they are built on top of the last 10 patches committed - 9 from me, and the white space clean up (thanks!). May be this is the reason why they did not apply. On this topic, I have a question: how do I get to see all the netdev-2.6 branches ? After cloning a free netdev-2.6 tree, 'git branch' shows only the master branch: bash-3.1$ git --version git version 1.5.3.rc4.29.g74276-dirty -bash-3.1$ stg clone git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git netdev-2.6-fresh Cloning "git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git" into "netdev-2.6-fresh"... Initialized empty Git repository in /opt/sources/netdev-2.6-fresh/.git/ remote: Generating pack... remote: Counting objects: 620879 Done counting 633562 objects. remote: Deltifying 633562 objects... remote: 100% (633562/633562) done Indexing 633562 objects... remote: Total 633562 (delta 517968), reused 594305 (delta 478716) 100% (633562/633562) done Resolving 517968 deltas... 100% (517968/517968) done Checking 23058 files out... 100% (23058/23058) done done -bash-3.1$ cd netdev-2.6-fresh/ -bash-3.1$ git branch * master Cheers, Divy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] XFRM: RFC4303 compliant auditing
On Fri, 2007-12-07 at 16:06 -0500, Paul Moore wrote:
> On Friday 07 December 2007 3:52:31 pm Eric Paris wrote:
> > On Fri, 2007-12-07 at 14:57 -0500, Paul Moore wrote:
> > > NOTE: This really is an RFC patch, it compiles and boots but that is
> > > pretty much all I can promise at this point. I'm posting this patch to
> > > gather feedback from the audit crowd about the continued overloading of
> > > the AUDIT_MAC_IPSEC_EVENT message type - continue to use it or create a
> > > new audit message type? Of course any other comments people may have are
> > > always welcome.
> >
> > I'm all for continuing to use it, but I feel like the op= strings should
> > probably all get collected up in one place to ease maintenance in the
> > future, might not matter but it's nice to be able to look only on place
> > in the code to find all of the possible op=
>
> Agreed. I punted on doing anything here for two main reasons:
>
> 1. It makes sense to do this in the xfrm_audit_start() function which I
> couldn't use here without some overhaul ...
> 2. ... I didn't want to overhaul anything if I was going to end up using
> separate message types.
>
> If we decide to go with a single audit message type (kinda sounds like it)
> I'll fix this up in the next version.
>
> > The one advantage to multiple messages is the ability to exclude and not
> > audit certain things. How often will these extra messages actually pop
> > out of a system? Enough that people would likely still care about some
> > of them but decide they don't want others? I don't know this stuff, so
> > tell me how often would any of these show up?
>
> Bingo, this is the whole reason why I was wondering about a different message
> type. Currently only SAD and SPD changes are audited and only because they
> effect the security labels that are assigned to packets as they are
> imported/exported out of the system (look at the LSPP requirements for
> auditing the import and export of data). These new audit messages apply to
> individual packets and/or a particular SA and have nothing to do with
> security labels, rather they indicate error conditions found during normal
> IPsec processing. It would be difficult to think of all of the particular
> cases where these error conditions but in general I would say that these
> audit messages should not be common.
>

Yes, I agree. They should not happen often, especially compared to the
LSPP requirement of auditing whenever SA or SPD entries are added or
deleted, which are common events.

> The only reason for creating a separate audit message type for these
> packet/SA messages would be to meet a RFC requirement that states that the
> implementation MUST allow the administrator to enable and disable ESP
> auditing. Now, we can probably say we fulfill that requirement regardless,
> but more message types allow us greater granularity and flexibility ...
>

Also, there is a great possibility of additional messages. This is for
RFC 4303, which is ESP. There are also audit messages listed for
RFC 4301 (IPsec architecture) and RFC 4302 (AH) that may happen later.
[RFC] [PATCH] easier PBR for dynamic source tables (via multipath)
This is a first swat and not in final form. I hope folks here will help
vet my thinking on it.

This fills in a missed niche in policy routing support. It allows
multipath routes to select nexthop based on the source realm, inside the
routing decision step, immediately after RPF is performed. It moves RPF
before multipath selection. This would be for people wanting to do policy
routing based on a table injected by a dynamic routing protocol, e.g.
quagga, rather than static rules.

The existing methods for achieving this effect are all a bit tacky for
various reasons:

1) "iptables -m realm --realm X -j ROUTE" in FORWARD,mangle

   because ipt_ROUTE is not a well supported iptables target and has
   started to get dropped from mainstream distros. Maybe for lack of
   maintenance, but perhaps it is intentionally deprecated. (?)

2) "tc route from" ... "action mirred egress redirect"

   happens too late in the packet processing to do much else to the
   packet, like say edit the MAC addresses which remain what they were on
   the original output dev. Doing this is really an abuse of the queueing
   system and involves setting up qdiscs in weird ways when one may only
   want to route.

3) Userspace scripts to glue loading from a kernel routing table to a
   pre-routing ipset, iptables -j MARK, then "ip rule add fwmark"

   because the kernel then has to check the source address against two
   tables rather than one, and they could get quite large. Plus it's
   hackery.

This patch is a raw proof-of-concept I put together to get things working
just enough to ensure that nothing blows up when packets are routed this
way. As such it does a couple of distasteful things and has a couple
rough edges:

   Reuses the nh_weight field as the realm
   Does not allow normal load balancing to fully mix in
   ipv4 only
   forward only, no code for local/output route.
   probably will break ifndef CONFIG_NET_CLS_ROUTE

Were this general idea to be deemed worthy, and as long as limiting
sizeof(struct fib_nh) is not a major concern to any linux routing
application, I could work up a more thorough/cleaner patch allowing
statistical multipath and SAD policy-routing multipath to play nicely
together.

Especially needing comments on proper multipath RPF: The mainline code
only checks the selected path and if RPF fails it does not choose a
different one. From this I assumed it is OK to do RPF on any old nexthop,
and we just assume the user won't or can't put any PR rule in that would
gum that up. Otherwise both the mainline code and this code would have to
RPF multiple times, defeating the goal of good performance. (Not to
mention that could get extra confusing when you are using the source
realm to choose.)

Special attention to the spec_dest handling, which should be (?) OK since
this is forward-only.

Also to consider is what this means to multipath caching should that make
a comeback.

I've only tested this code lightly so far, just bouncing things around to
static arp maps on the same if.

After patching iproute2, just substitute "weight X" with "byrealm X" to
activate it. Probably you want to avoid realm 0. You should be able to
put catch-all nexthops in with "weight X" alongside the "byrealm" ones
but they do not interact statistically. Comments on that syntax also
welcome.

Sorry about the attachments, no real MUAs available here that won't
corrupt tabs.
diff -r -U2 linux-source-2.6.23-dsc/include/linux/rtnetlink.h linux-source-2.6.23-dsc-dsad/include/linux/rtnetlink.h
--- linux-source-2.6.23-dsc/include/linux/rtnetlink.h 2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/include/linux/rtnetlink.h 2007-12-06 20:23:25.0 -0500
@@ -294,4 +294,5 @@
 #define RTNH_F_PERVASIVE 2 /* Do recursive gateway lookup */
 #define RTNH_F_ONLINK 4 /* Gateway is forced on link*/
+#define RTNH_F_DSAD 8 /* Dynamic PBR (weight = source realm) */
 
 /* Macros to handle hexthops */
diff -r -U2 linux-source-2.6.23-dsc/include/net/ip_fib.h linux-source-2.6.23-dsc-dsad/include/net/ip_fib.h
--- linux-source-2.6.23-dsc/include/net/ip_fib.h 2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/include/net/ip_fib.h 2007-12-06 20:23:25.0 -0500
@@ -202,5 +202,6 @@
 extern int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
 			       struct net_device *dev, __be32 *spec_dst, u32 *itag);
-extern void fib_select_multipath(const struct flowi *flp, struct fib_result *res);
+extern void fib_select_multipath(const struct flowi *flp,
+				 struct fib_result *res, u32 itag);
 
 struct rtentry;
diff -r -U2 linux-source-2.6.23-dsc/net/ipv4/fib_semantics.c linux-source-2.6.23-dsc-dsad/net/ipv4/fib_semantics.c
--- linux-source-2.6.23-dsc/net/ipv4/fib_semantics.c 2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/net/ipv4/fib_semantics.c 2007-12-07 14:36:10.0 -0500
@@ -1164,5 +1
Re: [PATCH] s2io: fix inconsistent hardware VLAN tagging during driver init
Ramkrishna Vepa wrote:
Jeff,
This patch looks good. Please accept.
Ram

-----Original Message-----
From: Andy Gospodarek [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 06, 2007 11:57 AM
To: netdev@vger.kernel.org
Cc: [EMAIL PROTECTED]; Rastapur Santosh; Sivakumar Subramani; Sreenivasa Honnur
Subject: [PATCH] s2io: fix inconsistent hardware VLAN tagging during driver init

queued for my next patch run...
Re: [PATCH] XFRM: RFC4303 compliant auditing
On Friday 07 December 2007 3:52:31 pm Eric Paris wrote:
> On Fri, 2007-12-07 at 14:57 -0500, Paul Moore wrote:
> > NOTE: This really is an RFC patch, it compiles and boots but that is
> > pretty much all I can promise at this point. I'm posting this patch to
> > gather feedback from the audit crowd about the continued overloading of
> > the AUDIT_MAC_IPSEC_EVENT message type - continue to use it or create a
> > new audit message type? Of course any other comments people may have are
> > always welcome.
>
> I'm all for continuing to use it, but I feel like the op= strings should
> probably all get collected up in one place to ease maintenance in the
> future, might not matter but it's nice to be able to look in only one place
> in the code to find all of the possible op=

Agreed. I punted on doing anything here for two main reasons:

1. It makes sense to do this in the xfrm_audit_start() function which I couldn't use here without some overhaul ...
2. ... I didn't want to overhaul anything if I was going to end up using separate message types.

If we decide to go with a single audit message type (kinda sounds like it) I'll fix this up in the next version.

> The one advantage to multiple messages is the ability to exclude and not
> audit certain things. How often will these extra messages actually pop
> out of a system? Enough that people would likely still care about some
> of them but decide they don't want others? I don't know this stuff, so
> tell me how often would any of these show up?

Bingo, this is the whole reason why I was wondering about a different message type. Currently only SAD and SPD changes are audited and only because they affect the security labels that are assigned to packets as they are imported/exported out of the system (look at the LSPP requirements for auditing the import and export of data).

These new audit messages apply to individual packets and/or a particular SA and have nothing to do with security labels, rather they indicate error conditions found during normal IPsec processing. It would be difficult to think of all of the particular cases where these error conditions occur but in general I would say that these audit messages should not be common.

The only reason for creating a separate audit message type for these packet/SA messages would be to meet a RFC requirement that states that the implementation MUST allow the administrator to enable and disable ESP auditing. Now, we can probably say we fulfill that requirement regardless, but more message types allow us greater granularity and flexibility ...

-- 
paul moore
linux security @ hp
RE: [PATCH] s2io: fix inconsistent hardware VLAN tagging during driver init
Jeff,
This patch looks good. Please accept.
Ram

> -----Original Message-----
> From: Andy Gospodarek [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 06, 2007 11:57 AM
> To: netdev@vger.kernel.org
> Cc: [EMAIL PROTECTED]; Rastapur Santosh; Sivakumar Subramani;
> Sreenivasa Honnur
> Subject: [PATCH] s2io: fix inconsistent hardware VLAN tagging during
> driver init
>
>
> The s2io driver keeps a local variable around (vlan_strip_flag) to keep
> track of the current state of the hardware and whether or not it will
> strip VLAN tags on incoming packets. It seems as though the hardware
> default is to strip them, but that variable is not set correctly during
> initialization if the default setup is used. This check ensures
> vlan_strip_flag and the hardware setting are in sync.
>
> These variables were introduced by this patch:
>
> commit 926930b202d56c3dfb6aea0a0c6bfba2b87a8c03
> Author: Sivakumar Subramani <[EMAIL PROTECTED]>
> Date: Sat Feb 24 01:59:39 2007 -0500
>
> so this problem hasn't been around forever.
>
> Recent patches from Ramkrishna Vepa <[EMAIL PROTECTED]> removed this
> variable and would have worked around the problem, but they were not
> accepted.
>
> Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
>
> ---
>
>  s2io.c | 5 +
>  1 files changed, 5 insertions(+)
>
> diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
> index 8b9f0ea..08c08de 100644
> --- a/drivers/net/s2io.c
> +++ b/drivers/net/s2io.c
> @@ -2151,6 +2151,11 @@ static int start_nic(struct s2io_nic *nic)
>  		val64 &= ~RX_PA_CFG_STRIP_VLAN_TAG;
>  		writeq(val64, &bar0->rx_pa_cfg);
>  		vlan_strip_flag = 0;
> +	} else {
> +		val64 = readq(&bar0->rx_pa_cfg);
> +		val64 |= RX_PA_CFG_STRIP_VLAN_TAG;
> +		writeq(val64, &bar0->rx_pa_cfg);
> +		vlan_strip_flag = 1;
>  	}
>
>  	/*
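The idea behind the fix, programming the register and the software flag together on both branches so they cannot drift apart, can be sketched in user space roughly as follows. This is only an illustration under assumptions: struct fake_nic, set_vlan_strip() and the bit position are hypothetical stand-ins, and a plain struct member replaces the readq()/writeq() accesses the driver performs on bar0->rx_pa_cfg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only; the driver's real
 * RX_PA_CFG_STRIP_VLAN_TAG value is defined in its register header. */
#define STRIP_VLAN_TAG_BIT (UINT64_C(1) << 15)

/* Hypothetical model of the device: one register plus the software copy. */
struct fake_nic {
	uint64_t rx_pa_cfg;   /* stands in for the memory-mapped register */
	int vlan_strip_flag;  /* software view of the hardware state */
};

/* Read-modify-write the strip bit and record the same state in software,
 * so that both branches (strip on, strip off) leave the two in agreement. */
void set_vlan_strip(struct fake_nic *nic, int strip)
{
	uint64_t val;

	assert(nic != NULL);
	val = nic->rx_pa_cfg;              /* readq() in the driver */
	if (strip)
		val |= STRIP_VLAN_TAG_BIT;
	else
		val &= ~STRIP_VLAN_TAG_BIT;
	nic->rx_pa_cfg = val;              /* writeq() in the driver */
	nic->vlan_strip_flag = strip ? 1 : 0;
}

/* Returns 1 when the software flag matches the register bit. */
int vlan_state_in_sync(const struct fake_nic *nic)
{
	int hw = (nic->rx_pa_cfg & STRIP_VLAN_TAG_BIT) != 0;

	return hw == nic->vlan_strip_flag;
}
```

Only the strip bit is touched, so any other configuration bits already in the register survive the update.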
Re: [PATCH] XFRM: RFC4303 compliant auditing
On Fri, 2007-12-07 at 14:57 -0500, Paul Moore wrote:
> NOTE: This really is an RFC patch, it compiles and boots but that is pretty
> much all I can promise at this point. I'm posting this patch to gather
> feedback from the audit crowd about the continued overloading of
> the AUDIT_MAC_IPSEC_EVENT message type - continue to use it or create
> a new audit message type? Of course any other comments people may have
> are always welcome.

I'm all for continuing to use it, but I feel like the op= strings should probably all get collected up in one place to ease maintenance in the future, might not matter but it's nice to be able to look in only one place in the code to find all of the possible op=

The one advantage to multiple messages is the ability to exclude and not audit certain things. How often will these extra messages actually pop out of a system? Enough that people would likely still care about some of them but decide they don't want others? I don't know this stuff, so tell me how often would any of these show up?

-Eric

>
> This patch adds a number of new IPsec audit events to meet the auditing
> requirements of RFC4303. This includes audit hooks for the following events:
>
> * Could not find a valid SA [sections 2.1, 3.4.2]
>   . xfrm_audit_state_notfound()
>   . xfrm_audit_state_notfound_simple()
>
> * Sequence number overflow [section 3.3.3]
>   . xfrm_audit_state_replay_overflow()
>
> * Replayed packet [section 3.4.3]
>   . xfrm_audit_state_replay()
>
> * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
>   . xfrm_audit_state_icvfail()
>
> While RFC4303 deals only with ESP most of the changes in this patch apply to
> IPsec in general, i.e. both AH and ESP. The one case, integrity check
> failure, where ESP specific code had to be modified the same was done to the
> AH code for the sake of consistency.
> --- > > include/net/xfrm.h | 14 > net/ipv4/ah4.c |1 > net/ipv4/esp4.c|1 > net/ipv4/xfrm4_input.c |6 +- > net/ipv6/ah6.c |1 > net/ipv6/esp6.c|1 > net/ipv6/xfrm6_input.c | 10 ++- > net/xfrm/xfrm_output.c |2 + > net/xfrm/xfrm_state.c | 155 > ++-- > 9 files changed, 177 insertions(+), 14 deletions(-) > > diff --git a/include/net/xfrm.h b/include/net/xfrm.h > index c02e230..85ce8c1 100644 > --- a/include/net/xfrm.h > +++ b/include/net/xfrm.h > @@ -492,11 +492,22 @@ extern void xfrm_audit_state_add(struct xfrm_state *x, > int result, >u32 auid, u32 secid); > extern void xfrm_audit_state_delete(struct xfrm_state *x, int result, > u32 auid, u32 secid); > +extern void xfrm_audit_state_replay_overflow(struct xfrm_state *x, > + struct sk_buff *skb); > +extern void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 > family); > +extern void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family, > + __be32 net_spi, __be32 net_seq); > +extern void xfrm_audit_state_icvfail(struct xfrm_state *x, > + struct sk_buff *skb, u8 proto); > #else > #define xfrm_audit_policy_add(x, r, a, s)do { ; } while (0) > #define xfrm_audit_policy_delete(x, r, a, s) do { ; } while (0) > #define xfrm_audit_state_add(x, r, a, s) do { ; } while (0) > #define xfrm_audit_state_delete(x, r, a, s) do { ; } while (0) > +#define xfrm_audit_state_replay_overflow(x, s) do { ; } while (0) > +#define xfrm_audit_state_notfound_simple(s, f) do { ; } while (0) > +#define xfrm_audit_state_notfound(s, f, sp, sq) do { ; } while (0) > +#define xfrm_audit_state_icvfail(x, s, p)do { ; } while (0) > #endif /* CONFIG_AUDITSYSCALL */ > > static inline void xfrm_pol_hold(struct xfrm_policy *policy) > @@ -1045,7 +1056,8 @@ extern int xfrm_state_delete(struct xfrm_state *x); > extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info); > extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si); > extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si); > -extern int xfrm_replay_check(struct xfrm_state *x, 
__be32 seq); > +extern int xfrm_replay_check(struct xfrm_state *x, > + struct sk_buff *skb, __be32 seq); > extern void xfrm_replay_advance(struct xfrm_state *x, __be32 seq); > extern void xfrm_replay_notify(struct xfrm_state *x, int event); > extern int xfrm_state_mtu(struct xfrm_state *x, int mtu); > diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c > index 5fc346d..8eb19c9 100644 > --- a/net/ipv4/ah4.c > +++ b/net/ipv4/ah4.c > @@ -180,6 +180,7 @@ static int ah_input(struct xfrm_state *x, struct sk_buff > *skb) > err = -EINVAL; > if (memcmp(ahp->work_icv, auth_data, ahp->icv_trunc_len)) { > x->stats.integrity_failed++; > + xfrm_audit_state_icvfail(x, sk
Re: [PATCH] XFRM: assorted IPsec fixups
On Friday 07 December 2007 3:36:08 pm Eric Paris wrote:
> On Fri, 2007-12-07 at 12:11 -0500, Paul Moore wrote:
> > This patch fixes a number of small but potentially troublesome things in
> > the XFRM/IPsec code:
> >
> > * Use the 'audit_enabled' variable already in include/linux/audit.h
> >   Removed the need for extern declarations local to each XFRM audit
> >   function

{snip}

> although it does make me wonder why audit_log_start doesn't just check
> audit_enabled itself

/me shrugs ... I have no idea, I've just always followed the lead of what was already written, but now that you mention it - it doesn't make much sense. I suppose at some point we can go through and change all the 'audit_enabled' users, but I wonder if there is some point (?performance?) to having the callers check?

-- 
paul moore
linux security @ hp
Re: [PATCH] XFRM: assorted IPsec fixups
On Fri, 2007-12-07 at 12:11 -0500, Paul Moore wrote:
> This patch fixes a number of small but potentially troublesome things in the
> XFRM/IPsec code:
>
> * Use the 'audit_enabled' variable already in include/linux/audit.h
>   Removed the need for extern declarations local to each XFRM audit function
>
> * Convert 'sid' to 'secid'
>   The 'sid' name is specific to SELinux, 'secid' is the common naming
>   convention used by the kernel when referring to tokenized LSM labels
>
> * Convert address display to use standard NIP* macros
>   Similar to what was recently done with the SPD audit code, this also
>   includes the removal of some unnecessary memcpy() calls
>
> * Move common code to xfrm_audit_common_stateinfo()
>   Code consolidation from the "less is more" book on software development
>
> * Convert the SPI in audit records to host byte order
>   The current SPI values in the audit record are being displayed in
>   network byte order, probably not what was intended
>
> * Proper spacing around commas in function arguments
>   Minor style tweak since I was already touching the code
>
> Signed-off-by: Paul Moore <[EMAIL PROTECTED]>

Acked-by: Eric Paris <[EMAIL PROTECTED]>

although it does make me wonder why audit_log_start doesn't just check audit_enabled itself

Anyway, this patch looks good.
> --- > > include/linux/xfrm.h|2 + > include/net/xfrm.h | 18 ++-- > net/xfrm/xfrm_policy.c | 15 +- > net/xfrm/xfrm_state.c | 69 > +-- > security/selinux/xfrm.c | 20 +++--- > 5 files changed, 58 insertions(+), 66 deletions(-) > > diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h > index b58adc5..f75a337 100644 > --- a/include/linux/xfrm.h > +++ b/include/linux/xfrm.h > @@ -31,7 +31,7 @@ struct xfrm_sec_ctx { > __u8ctx_doi; > __u8ctx_alg; > __u16 ctx_len; > - __u32 ctx_sid; > + __u32 ctx_secid; > charctx_str[0]; > }; > > diff --git a/include/net/xfrm.h b/include/net/xfrm.h > index 58dfa82..c02e230 100644 > --- a/include/net/xfrm.h > +++ b/include/net/xfrm.h > @@ -462,7 +462,7 @@ struct xfrm_audit > }; > > #ifdef CONFIG_AUDITSYSCALL > -static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid) > +static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 secid) > { > struct audit_buffer *audit_buf = NULL; > char *secctx; > @@ -475,8 +475,8 @@ static inline struct audit_buffer *xfrm_audit_start(u32 > auid, u32 sid) > > audit_log_format(audit_buf, "auid=%u", auid); > > - if (sid != 0 && > - security_secid_to_secctx(sid, &secctx, &secctx_len) == 0) { > + if (secid != 0 && > + security_secid_to_secctx(secid, &secctx, &secctx_len) == 0) { > audit_log_format(audit_buf, " subj=%s", secctx); > security_release_secctx(secctx, secctx_len); > } else > @@ -485,13 +485,13 @@ static inline struct audit_buffer *xfrm_audit_start(u32 > auid, u32 sid) > } > > extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result, > - u32 auid, u32 sid); > + u32 auid, u32 secid); > extern void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, > - u32 auid, u32 sid); > + u32 auid, u32 secid); > extern void xfrm_audit_state_add(struct xfrm_state *x, int result, > - u32 auid, u32 sid); > + u32 auid, u32 secid); > extern void xfrm_audit_state_delete(struct xfrm_state *x, int result, > - u32 auid, u32 sid); > + u32 auid, u32 secid); > #else > #define 
xfrm_audit_policy_add(x, r, a, s)do { ; } while (0) > #define xfrm_audit_policy_delete(x, r, a, s) do { ; } while (0) > @@ -621,13 +621,13 @@ extern int xfrm_selector_match(struct xfrm_selector > *sel, struct flowi *fl, > > #ifdef CONFIG_SECURITY_NETWORK_XFRM > /* If neither has a context --> match > - * Otherwise, both must have a context and the sids, doi, alg must match > + * Otherwise, both must have a context and the secids, doi, alg must match > */ > static inline int xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct > xfrm_sec_ctx *s2) > { > return ((!s1 && !s2) || > (s1 && s2 && > - (s1->ctx_sid == s2->ctx_sid) && > + (s1->ctx_secid == s2->ctx_secid) && >(s1->ctx_doi == s2->ctx_doi) && >(s1->ctx_alg == s2->ctx_alg))); > } > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c > index b702bd8..75f25c4 100644 > --- a/net/xfrm/xfrm_policy.c > +++ b/net/xfrm/xfrm_policy.c > @@ -23,6 +23,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -2150,15 +2151,14 @@ static inline void > xfrm_audit_common_pol
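The "Convert the SPI in audit records to host byte order" item in the description above comes down to calling ntohl() on the SPI before it is formatted. A minimal user-space sketch of that step follows; format_spi() and the exact output format are illustrative assumptions, not the code in the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an SPI that arrives in network byte order.
 * Without the ntohl() call, a little-endian host would print the value
 * byte-swapped, which is the problem the patch description mentions. */
int format_spi(char *buf, size_t buflen, uint32_t net_spi)
{
	unsigned int spi;

	assert(buf != NULL && buflen > 0);
	spi = (unsigned int)ntohl(net_spi);   /* network -> host byte order */
	return snprintf(buf, buflen, "spi=%u(0x%x)", spi, spi);
}
```

Passing htonl(0x1234) to this helper yields the same text on little-endian and big-endian hosts, which is what makes the converted value useful in a log record.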
[git patches] net driver fixes
Nothing remarkable. Mainly bonding fixes and bringing ibm_newemac up to snuff.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus

to receive the following updates:

 Documentation/networking/bonding.txt | 29 -
 arch/powerpc/boot/dts/sequoia.dts | 5 ++
 drivers/net/Kconfig | 1 +
 drivers/net/bonding/bond_main.c | 116 +-
 drivers/net/bonding/bond_sysfs.c | 94 +---
 drivers/net/bonding/bonding.h | 4 +-
 drivers/net/cxgb3/regs.h | 27 -
 drivers/net/cxgb3/t3_hw.c | 6 +-
 drivers/net/cxgb3/xgmac.c | 44 +-
 drivers/net/e100.c | 6 +-
 drivers/net/e1000/e1000_ethtool.c | 2 +-
 drivers/net/e1000e/ethtool.c | 2 +-
 drivers/net/ibm_newemac/core.c | 56 +++-
 drivers/net/ibm_newemac/core.h | 11 +++-
 drivers/net/ibm_newemac/debug.c | 5 ++
 drivers/net/ibm_newemac/debug.h | 5 ++
 drivers/net/ibm_newemac/emac.h | 5 ++
 drivers/net/ibm_newemac/mal.c | 5 ++
 drivers/net/ibm_newemac/mal.h | 5 ++
 drivers/net/ibm_newemac/phy.c | 81 +++
 drivers/net/ibm_newemac/phy.h | 5 ++
 drivers/net/ibm_newemac/rgmii.c | 25 +---
 drivers/net/ibm_newemac/rgmii.h | 10 +++-
 drivers/net/ibm_newemac/tah.c | 8 ++-
 drivers/net/ibm_newemac/tah.h | 5 ++
 drivers/net/ibm_newemac/zmii.c | 9 +++-
 drivers/net/ibm_newemac/zmii.h | 5 ++
 drivers/net/s2io-regs.h | 1 +
 drivers/net/s2io.c | 16 +-
 include/linux/if_bonding.h | 3 +-
 30 files changed, 423 insertions(+), 173 deletions(-)

Auke Kok (1):
      e100: cleanup unneeded math

Benjamin Herrenschmidt (5):
      ibm_newemac: Fix ZMII refcounting bug
      ibm_newemac: Workaround reset timeout when no link
      ibm_newemac: Cleanup/Fix RGMII MDIO support detection
      ibm_newemac: Cleanup/fix support for STACR register variants
      ibm_newemac: Update file headers copyright notices

David Sterba (1):
      bonding: Fix time comparison

Divy Le Ray (1):
      cxgb3 - T3C support update

Eliezer Tamir (1):
      make bnx2x select ZLIB_INFLATE

Hugh Blemings (1):
      ibm_newemac: Skip EMACs that are marked unused by the firmware

Jay Vosburgh (2):
      bonding: Add new layer2+3 hash for xor/802.3ad modes
      bonding: Fix race at module unload

Roel Kluin (1):
      e1000: fix memcpy in e1000_get_strings

Sreenivasa Honnur (1):
      S2io: Check for register initialization completion before accesing device registers

Stefan Roese (2):
      ibm_newemac: Add BCM5248 and Marvell 88E1111 PHY support
      ibm_newemac: Add ET1011c PHY support

Valentine Barshak (3):
      ibm_newemac: Correct opb_bus_freq value
      ibm_newemac: Fix typo reading TAH channel info
      ibm_newemac: Call dev_set_drvdata() before tah_reset()

Wagner Ferenc (5):
      bonding: Remove trailing NULs from sysfs interface.
      bonding: Return nothing for not applicable values
      bonding: Purely cosmetic: rename a local variable
      bonding: Coding style: break line after the if condition
      bonding: Allow setting and querying xmit policy regardless of mode

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 1134062..6cc30e0 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -554,6 +554,30 @@ xmit_hash_policy
 		This algorithm is 802.3ad compliant.
 
+	layer2+3
+
+		This policy uses a combination of layer2 and layer3
+		protocol information to generate the hash.
+
+		Uses XOR of hardware MAC addresses and IP addresses to
+		generate the hash. The formula is
+
+		(((source IP XOR dest IP) AND 0x) XOR
+			( source MAC XOR destination MAC ))
+				modulo slave count
+
+		This algorithm will place all traffic to a particular
+		network peer on the same slave. For non-IP traffic,
+		the formula is the same as for the layer2 transmit
+		hash policy.
+
+		This policy is intended to provide a more balanced
+		distribution of traffic than layer2 alone, especially
+		in environments where a layer3 gateway device is
+		required to reach most destinations.
+
+		This algorithm is 802.3ad compliant.
+
 	layer3+4
 
 		This policy uses upper layer protocol information,
@@ -589,8 +613,9 @@ xmit_hash_policy
 		or may not tolerate this noncompliance.
 
 		The default value is layer2. This option was added in bonding
-version 2.6.3.
In earlier versio
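The layer2+3 formula quoted in the documentation hunk above can be written out as a small function. This is a sketch under stated assumptions rather than the driver's code: the mask constant is garbled in the quoted text, so 0xffff is assumed here; only the last octet of each MAC address is used for the "source MAC XOR destination MAC" term; and l23_hash() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the mask that is garbled in the quoted text is taken to be
 * 0xffff for the purposes of this sketch. */
#define L23_IP_MASK 0xffffu

/* Hypothetical helper implementing
 *   (((source IP XOR dest IP) AND mask) XOR (source MAC XOR dest MAC))
 *       modulo slave count
 * IP addresses are expected in host byte order. Only the last octet of
 * each MAC address is used, which is a simplifying assumption. */
unsigned int l23_hash(uint32_t src_ip, uint32_t dst_ip,
		      uint8_t src_mac_last, uint8_t dst_mac_last,
		      unsigned int slave_count)
{
	unsigned int ip_part;
	unsigned int mac_part;

	assert(slave_count > 0);
	ip_part = (unsigned int)((src_ip ^ dst_ip) & L23_IP_MASK);
	mac_part = (unsigned int)(src_mac_last ^ dst_mac_last);
	return (ip_part ^ mac_part) % slave_count;
}
```

Because XOR is symmetric, swapping source and destination gives the same slave index, so both directions of a conversation between two peers hash to the same slave.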
Re: [PATCH 2.6.24 1/1]S2io: Check for register initialization completion before accesing device registers
Sreenivasa Honnur wrote:
- Making sure register initialisation is complete before proceeding further. The driver must wait until initialization is complete before attempting to access any other device registers.

Signed-off-by: Surjit Reang <[EMAIL PROTECTED]>
Signed-off-by: Sreenivasa Honnur <[EMAIL PROTECTED]>

applied #upstream-fixes
Re: [PATCH 1/11] ibm_newemac: Add BCM5248 and Marvell 88E1111 PHY support
Benjamin Herrenschmidt wrote:
From: Stefan Roese <[EMAIL PROTECTED]>

This patch adds BCM5248 and Marvell 88E1111 PHY support to NEW EMAC driver. These PHY chips are used on PowerPC 440EPx boards. The PHY code is based on the previous work by Stefan Roese <[EMAIL PROTECTED]>

Signed-off-by: Stefan Roese <[EMAIL PROTECTED]>
Signed-off-by: Valentine Barshak <[EMAIL PROTECTED]>
Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---
 drivers/net/ibm_newemac/phy.c | 39 +++
 1 file changed, 39 insertions(+)

applied 1-11 #upstream-fixes
Re: [PATCH 1/2] e1000: fix memcpy in e1000_get_strings
Auke Kok wrote:
From: Roel Kluin <[EMAIL PROTECTED]>

drivers/net/e1000/e1000_ethtool.c:113:
#define E1000_TEST_LEN sizeof(e1000_gstrings_test) / ETH_GSTRING_LEN

drivers/net/e1000e/ethtool.c:106:
#define E1000_TEST_LEN sizeof(e1000_gstrings_test) / ETH_GSTRING_LEN

E1000_TEST_LEN*ETH_GSTRING_LEN will expand to
sizeof(e1000_gstrings_test) / (ETH_GSTRING_LEN * ETH_GSTRING_LEN)

A lack of parentheses around defines causes unexpected results due to operator precedences.

Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---
 drivers/net/e1000/e1000_ethtool.c | 2 +-
 drivers/net/e1000e/ethtool.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

applied 1-2 to #upstream-fixes
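The general hazard of an unparenthesized macro body can be shown with a small standalone sketch. The names below (gstrings_test, GSTRING_LEN, TEST_LEN_UNSAFE, TEST_LEN_SAFE) are hypothetical stand-ins rather than the driver's definitions, and a division is used as the surrounding expression because it shows the problem unambiguously; whether a particular use misbehaves depends on the operators next to the macro.

```c
#include <assert.h>
#include <stddef.h>

#define GSTRING_LEN 32

/* Hypothetical string table: 5 entries of GSTRING_LEN bytes, 160 bytes total. */
const char gstrings_test[5][GSTRING_LEN] = {
	"Register test", "Eeprom test", "Interrupt test",
	"Loopback test", "Link test",
};

/* Unparenthesized body: neighbouring operators bind into the expansion. */
#define TEST_LEN_UNSAFE sizeof(gstrings_test) / GSTRING_LEN
/* Parenthesized body: always evaluates to the element count. */
#define TEST_LEN_SAFE (sizeof(gstrings_test) / GSTRING_LEN)

/* Expands to total_bytes / sizeof(gstrings_test) / GSTRING_LEN, which is
 * evaluated left to right and is not total_bytes divided by the count. */
size_t bytes_per_test_unsafe(size_t total_bytes)
{
	return total_bytes / TEST_LEN_UNSAFE;
}

/* Expands to total_bytes / (sizeof(gstrings_test) / GSTRING_LEN). */
size_t bytes_per_test_safe(size_t total_bytes)
{
	assert(TEST_LEN_SAFE != 0);
	return total_bytes / TEST_LEN_SAFE;
}
```

With 320 bytes the parenthesized version divides by the 5 entries, while the unparenthesized one first divides by the 160-byte table size and then by 32. Avoiding the macro in the size computation altogether, for example by passing sizeof of the table directly, sidesteps the question.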
Re: [patch resend build-breakage] make bnx2x select ZLIB_INFLATE
Eliezer Tamir wrote:
The bnx2x module depends on the zlib_inflate functions. The build will fail if ZLIB_INFLATE has not been selected manually or by building another module that automatically selects it.

Modify BNX2X config option to 'select ZLIB_INFLATE' like BNX2 and others. This seems to fix it.

Signed-off-by: Lee Schermerhorn <[EMAIL PROTECTED]>
Acked-by: Eliezer Tamir <[EMAIL PROTECTED]>
---
 drivers/net/Kconfig | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 5bafb30..b9d7f5b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2594,6 +2594,7 @@ config TEHUTI
 config BNX2X
 	tristate "Broadcom NetXtremeII 10Gb support"
 	depends on PCI
+	select ZLIB_INFLATE
 	help

applied #upstream-fixes
Re: [PATCH 2/2] cxgb3 - Parity initialization for T3C adapters
Divy Le Ray wrote:
From: Divy Le Ray <[EMAIL PROTECTED]>

Add parity initialization for T3C adapters.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---
 drivers/net/cxgb3/adapter.h | 1
 drivers/net/cxgb3/cxgb3_main.c | 82
 drivers/net/cxgb3/cxgb3_offload.c | 15 ++
 drivers/net/cxgb3/regs.h | 248 +
 drivers/net/cxgb3/sge.c | 24 +++-
 drivers/net/cxgb3/t3_hw.c | 131 +---
 6 files changed, 472 insertions(+), 29 deletions(-)

dropped patches 2-3, did not apply
Re: [PATCH 1/8] bonding: Remove trailing NULs from sysfs interface.
Jay Vosburgh wrote:
From: Wagner Ferenc <[EMAIL PROTECTED]>

From: Wagner Ferenc <[EMAIL PROTECTED]>

Also remove trailing spaces from multivalued files. This fixes output like for example:

$ od -c /sys/class/net/bond0/bonding/slaves
000 e t h - l e f t e t h - r i g
020 h t \n \0
025

It mostly entails deleting '+1'-s after sprintf() calls: the return value of sprintf is the number of characters printed, without the closing NUL, ie. exactly what the sysfs interface requires. The three multivalue cases are different, because they also have to swallow back a trailing space.

Signed-off-by: Ferenc Wagner <[EMAIL PROTECTED]>
Acked-by: Jay Vosburgh <[EMAIL PROTECTED]>
---
 drivers/net/bonding/bond_sysfs.c | 66 +
 1 files changed, 30 insertions(+), 36 deletions(-)

applied 1-8 to #upstream-fixes

Your script is duplicating the "From: " line twice
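The two points in the description above, that the sprintf family returns the character count without the closing NUL and that multivalue output has to swallow its trailing space, can be sketched in user space as below. show_slaves() is a hypothetical stand-in for a sysfs show routine, not the bonding driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a multivalue sysfs show routine.
 * Writes "name1 name2 ...\n" into buf and returns the number of bytes
 * to report, which excludes the terminating NUL. */
int show_slaves(char *buf, size_t buflen, const char *const *names, int count)
{
	size_t len = 0;

	assert(buf != NULL && buflen >= 2);
	for (int i = 0; i < count; i++) {
		/* snprintf returns the length of the formatted text,
		 * not counting the NUL it appends. */
		int n = snprintf(buf + len, buflen - len, "%s ", names[i]);

		/* Stop on error or truncation, keeping room for the newline. */
		if (n < 0 || (size_t)n >= buflen - len - 1)
			break;
		len += (size_t)n;
	}
	if (len > 0)
		len--;          /* swallow the trailing space */
	buf[len++] = '\n';
	buf[len] = '\0';
	return (int)len;        /* no "+1": the NUL is not part of the output */
}
```

Returning the count with one added would hand the reader the NUL byte as well, which is the stray \0 visible in the od output above.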
Re: [PATCH 1/2] cxgb3 - T3C support update
Divy Le Ray wrote:
From: Divy Le Ray <[EMAIL PROTECTED]>

Update GPIO mapping for T3C.
Update xgmac for T3C support.
Fix typo in mtu table.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>

applied #upstream-fixes
[PATCH] XFRM: RFC4303 compliant auditing
NOTE: This really is an RFC patch, it compiles and boots but that is pretty much all I can promise at this point. I'm posting this patch to gather feedback from the audit crowd about the continued overloading of the AUDIT_MAC_IPSEC_EVENT message type - continue to use it or create a new audit message type? Of course any other comments people may have are always welcome.

This patch adds a number of new IPsec audit events to meet the auditing requirements of RFC4303. This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()

While RFC4303 deals only with ESP most of the changes in this patch apply to IPsec in general, i.e. both AH and ESP. The one case, integrity check failure, where ESP specific code had to be modified the same was done to the AH code for the sake of consistency.
--- include/net/xfrm.h | 14 net/ipv4/ah4.c |1 net/ipv4/esp4.c|1 net/ipv4/xfrm4_input.c |6 +- net/ipv6/ah6.c |1 net/ipv6/esp6.c|1 net/ipv6/xfrm6_input.c | 10 ++- net/xfrm/xfrm_output.c |2 + net/xfrm/xfrm_state.c | 155 ++-- 9 files changed, 177 insertions(+), 14 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index c02e230..85ce8c1 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -492,11 +492,22 @@ extern void xfrm_audit_state_add(struct xfrm_state *x, int result, u32 auid, u32 secid); extern void xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 secid); +extern void xfrm_audit_state_replay_overflow(struct xfrm_state *x, +struct sk_buff *skb); +extern void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 family); +extern void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family, + __be32 net_spi, __be32 net_seq); +extern void xfrm_audit_state_icvfail(struct xfrm_state *x, +struct sk_buff *skb, u8 proto); #else #define xfrm_audit_policy_add(x, r, a, s) do { ; } while (0) #define xfrm_audit_policy_delete(x, r, a, s) do { ; } while (0) #define xfrm_audit_state_add(x, r, a, s) do { ; } while (0) #define xfrm_audit_state_delete(x, r, a, s)do { ; } while (0) +#define xfrm_audit_state_replay_overflow(x, s) do { ; } while (0) +#define xfrm_audit_state_notfound_simple(s, f) do { ; } while (0) +#define xfrm_audit_state_notfound(s, f, sp, sq)do { ; } while (0) +#define xfrm_audit_state_icvfail(x, s, p) do { ; } while (0) #endif /* CONFIG_AUDITSYSCALL */ static inline void xfrm_pol_hold(struct xfrm_policy *policy) @@ -1045,7 +1056,8 @@ extern int xfrm_state_delete(struct xfrm_state *x); extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info); extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si); extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si); -extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq); +extern int xfrm_replay_check(struct xfrm_state *x, +struct sk_buff *skb, __be32 seq); 
extern void xfrm_replay_advance(struct xfrm_state *x, __be32 seq); extern void xfrm_replay_notify(struct xfrm_state *x, int event); extern int xfrm_state_mtu(struct xfrm_state *x, int mtu); diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index 5fc346d..8eb19c9 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -180,6 +180,7 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb) err = -EINVAL; if (memcmp(ahp->work_icv, auth_data, ahp->icv_trunc_len)) { x->stats.integrity_failed++; + xfrm_audit_state_icvfail(x, skb, IPPROTO_AH); goto out; } } diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index c31bccb..00ec285 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -183,6 +183,7 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb) if (unlikely(memcmp(esp->auth.work_icv, sum, alen))) { x->stats.integrity_failed++; + xfrm_audit_state_icvfail(x, skb, IPPROTO_ESP); goto out; } } diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c index 5e95c8a..6d7be5e 100644 --- a/net/ipv4/xfrm4_input.c +++ b/net/ipv4/xfrm4_input.c @@ -56,8 +56,10 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi, x = xfrm_state_lookup((x
[PATCH] [IPV6] XFRM: Fix auditing rt6i_flags; use RTF_xxx flags instead of RTCF_xxx.
RTCF_xxx flags (defined in include/linux/in_route.h) are available for IPv4 route (rtable) entries only. Use RTF_xxx flags instead, defined in include/linux/ipv6_route.h, for IPv6 route entries (rt6_info). Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> -- diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 82e27b8..b8e9eb4 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -233,7 +233,7 @@ __xfrm6_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int dst_prev->output = dst_prev->xfrm->outer_mode->afinfo->output; /* Sheit... I remember I did this right. Apparently, * it was magically lost, so this code needs audit */ - x->u.rt6.rt6i_flags= rt0->rt6i_flags&(RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL); + x->u.rt6.rt6i_flags= rt0->rt6i_flags&(RTF_ANYCAST|RTF_LOCAL); x->u.rt6.rt6i_metric = rt0->rt6i_metric; x->u.rt6.rt6i_node = rt0->rt6i_node; x->u.rt6.rt6i_gateway = rt0->rt6i_gateway; -- YOSHIFUJI Hideaki @ USAGI Project <[EMAIL PROTECTED]> GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
Re: [patch 06/22] NET: DM9000: Use kthread to probe MII status when device open
On Fri, Nov 23, 2007 at 08:38:51PM -0500, Jeff Garzik wrote: > seems like a delayed workqueue would be most appropriate for this. I like the fact that the use of kthread shows the user how much cpu time is being used by the execution of monitoring the phy. How badly do people object to using a kthread? -- Ben ([EMAIL PROTECTED], http://www.fluff.org/) 'a smiley only costs 4 bytes'
Re: [patch 22/22] NET: DM9000: Show the MAC address source after printing MAC
On Fri, Nov 23, 2007 at 08:43:04PM -0500, Jeff Garzik wrote: > ACK patches 16-22 Is reposting here ok to get these queued for the next kernel release, or are there people to CC: for this? -- Ben ([EMAIL PROTECTED], http://www.fluff.org/) 'a smiley only costs 4 bytes'
Re: [RFC] TCP illinois max rtt aging
On Fri, 7 Dec 2007, Ilpo Järvinen wrote: > On Fri, 7 Dec 2007, David Miller wrote: > > > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET) > > > > > I guess if you get a large cumulative ACK, the amount of processing is > > > still overwhelming (added DaveM if he has some idea how to combat it). > > > > > > Even a simple scenario (this isn't anything fancy at all, will occur all > > > the time): Just one loss => rest skbs grow one by one into a single > > > very large SACK block (and we do that efficiently for sure) => then the > > > fast retransmit gets delivered and a cumulative ACK for whole orig_window > > > arrives => clean_rtx_queue has to do a lot of processing. In this case we > > > could optimize RB-tree cleanup away (by just blanking it all) but still > > > getting rid of all those skbs is going to take a larger moment than I'd > > > like to see. > > > > > > That tree blanking could be extended to cover anything which ACK more > > > than > > > half of the tree by just replacing the root (and dealing with potential > > > recolorization of the root). > > > > Yes, it's the classic problem. But it ought to be at least > > partially masked when TSO is in use, because we'll only process > > a handful of SKBs. The more effectively TSO batches, the > > less work clean_rtx_queue() will do. > > No, that's not what is going to happen, TSO won't help at all > because one-by-one SACKs will fragment every single one of them > (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO > case, or am I missing something? Hmm... this could be solved, though, by postponing the fragmentation of a partially SACKed skb while the first SACK block can (and is likely to) still grow and remove the need for fragmentation. It has some implications for packet processing: it increases burstiness a bit, and tcp_max_burst kicks in too easily. -- i.
[PATCH] Use BUILD_BUG_ON in inet_timewait_sock.c checks
Make the INET_TWDR_TWKILL_SLOTS vs sizeof(twdr->thread_slots) check nicer. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index a60b99e..d43e787 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -194,16 +194,14 @@ out: EXPORT_SYMBOL_GPL(inet_twdr_hangman); -extern void twkill_slots_invalid(void); - void inet_twdr_twkill_work(struct work_struct *work) { struct inet_timewait_death_row *twdr = container_of(work, struct inet_timewait_death_row, twkill_work); int i; - if ((INET_TWDR_TWKILL_SLOTS - 1) > (sizeof(twdr->thread_slots) * 8)) - twkill_slots_invalid(); + BUILD_BUG_ON((INET_TWDR_TWKILL_SLOTS - 1) > + (sizeof(twdr->thread_slots) * 8)); while (twdr->thread_slots) { spin_lock_bh(&twdr->death_lock);
[PATCH] Use BUILD_BUG_ON for tcp_skb_cb size checking
sizeof(struct tcp_skb_cb) must not be greater than sizeof(skb->cb). This is already checked in net/ipv4/tcp.c, but the check can be done more gracefully with BUILD_BUG_ON. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 8e65182..c8bebd3 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2411,7 +2411,6 @@ void tcp_done(struct sock *sk) } EXPORT_SYMBOL_GPL(tcp_done); -extern void __skb_cb_too_small_for_tcp(int, int); extern struct tcp_congestion_ops tcp_reno; static __initdata unsigned long thash_entries; @@ -2430,9 +2429,7 @@ void __init tcp_init(void) unsigned long limit; int order, i, max_share; - if (sizeof(struct tcp_skb_cb) > sizeof(skb->cb)) - __skb_cb_too_small_for_tcp(sizeof(struct tcp_skb_cb), - sizeof(skb->cb)); + BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb)); tcp_hashinfo.bind_bucket_cachep = kmem_cache_create("tcp_bind_bucket",
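For readers who have not met the idiom that both BUILD_BUG_ON patches rely on, here is a minimal user-space sketch. DEMO_BUILD_BUG_ON, struct demo_skb and struct demo_tcp_cb are made-up names for illustration; the macro imitates the negative-array-size trick and is an assumption about the general technique, not a copy of the kernel header.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel macro: when cond is true the
 * array size becomes -1 and the build fails; when false it is a no-op. */
#define DEMO_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Made-up miniature of sk_buff with a fixed scratch area. */
struct demo_skb {
	char cb[48];
};

/* Made-up per-protocol private data that has to fit inside cb[]. */
struct demo_tcp_cb {
	unsigned int seq;
	unsigned int end_seq;
	unsigned char flags;
};

/* Returns the number of spare bytes left in cb[]. The size check is
 * resolved by the compiler, so it costs nothing at run time and needs
 * no undefined extern function to provoke a link error. */
size_t demo_cb_spare_bytes(void)
{
	DEMO_BUILD_BUG_ON(sizeof(struct demo_tcp_cb) >
			  sizeof(((struct demo_skb *)0)->cb));
	return sizeof(((struct demo_skb *)0)->cb) - sizeof(struct demo_tcp_cb);
}
```

Shrinking cb[] below sizeof(struct demo_tcp_cb) in this sketch should turn the mistake into a compile error at the offending line, which is the point of replacing the old link-time trick.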
[PATCH] XFRM: assorted IPsec fixups
This patch fixes a number of small but potentially troublesome things in the XFRM/IPsec code: * Use the 'audit_enabled' variable already in include/linux/audit.h Removes the need for extern declarations local to each XFRM audit function * Convert 'sid' to 'secid' The 'sid' name is specific to SELinux, 'secid' is the common naming convention used by the kernel when referring to tokenized LSM labels * Convert address display to use standard NIP* macros Similar to what was recently done with the SPD audit code, this also includes the removal of some unnecessary memcpy() calls * Move common code to xfrm_audit_common_stateinfo() Code consolidation from the "less is more" book on software development * Convert the SPI in audit records to host byte order The current SPI values in the audit record are being displayed in network byte order, probably not what was intended * Proper spacing around commas in function arguments Minor style tweak since I was already touching the code Signed-off-by: Paul Moore <[EMAIL PROTECTED]> --- include/linux/xfrm.h|2 + include/net/xfrm.h | 18 ++-- net/xfrm/xfrm_policy.c | 15 +- net/xfrm/xfrm_state.c | 69 +-- security/selinux/xfrm.c | 20 +++--- 5 files changed, 58 insertions(+), 66 deletions(-) diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h index b58adc5..f75a337 100644 --- a/include/linux/xfrm.h +++ b/include/linux/xfrm.h @@ -31,7 +31,7 @@ struct xfrm_sec_ctx { __u8ctx_doi; __u8ctx_alg; __u16 ctx_len; - __u32 ctx_sid; + __u32 ctx_secid; charctx_str[0]; }; diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 58dfa82..c02e230 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -462,7 +462,7 @@ struct xfrm_audit }; #ifdef CONFIG_AUDITSYSCALL -static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid) +static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 secid) { struct audit_buffer *audit_buf = NULL; char *secctx; @@ -475,8 +475,8 @@ static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 
sid) audit_log_format(audit_buf, "auid=%u", auid); - if (sid != 0 && - security_secid_to_secctx(sid, &secctx, &secctx_len) == 0) { + if (secid != 0 && + security_secid_to_secctx(secid, &secctx, &secctx_len) == 0) { audit_log_format(audit_buf, " subj=%s", secctx); security_release_secctx(secctx, secctx_len); } else @@ -485,13 +485,13 @@ static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid) } extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result, - u32 auid, u32 sid); + u32 auid, u32 secid); extern void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, - u32 auid, u32 sid); + u32 auid, u32 secid); extern void xfrm_audit_state_add(struct xfrm_state *x, int result, -u32 auid, u32 sid); +u32 auid, u32 secid); extern void xfrm_audit_state_delete(struct xfrm_state *x, int result, - u32 auid, u32 sid); + u32 auid, u32 secid); #else #define xfrm_audit_policy_add(x, r, a, s) do { ; } while (0) #define xfrm_audit_policy_delete(x, r, a, s) do { ; } while (0) @@ -621,13 +621,13 @@ extern int xfrm_selector_match(struct xfrm_selector *sel, struct flowi *fl, #ifdef CONFIG_SECURITY_NETWORK_XFRM /* If neither has a context --> match - * Otherwise, both must have a context and the sids, doi, alg must match + * Otherwise, both must have a context and the secids, doi, alg must match */ static inline int xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct xfrm_sec_ctx *s2) { return ((!s1 && !s2) || (s1 && s2 && -(s1->ctx_sid == s2->ctx_sid) && +(s1->ctx_secid == s2->ctx_secid) && (s1->ctx_doi == s2->ctx_doi) && (s1->ctx_alg == s2->ctx_alg))); } diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index b702bd8..75f25c4 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -2150,15 +2151,14 @@ static inline void xfrm_audit_common_policyinfo(struct xfrm_policy *xp, } } -void -xfrm_audit_policy_add(struct xfrm_policy *xp, int result, u32 auid, u32 sid) 
+void xfrm_audit_policy_add(struct xfrm_policy *xp, int result, + u32 auid, u32 secid) { struct audit_buffer *audit_buf; - extern int audit_enabled; if (audit_enabled == 0) return; - audit_buf = xfrm_aud
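On the "Convert the SPI in audit records to host byte order" item in the XFRM fixups changelog above: the part of the diff that does this is cut off here, so the following is only a user-space sketch of the idea. demo_format_spi and the "spi=%lu(0x%lx)" layout are assumptions made for illustration, not taken from the patch.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Format an SPI that is held in network byte order (as it appears on
 * the wire) for a log record: convert to host order first, then print.
 * Returns whatever snprintf() returns. */
int demo_format_spi(char *buf, size_t len, uint32_t spi_net)
{
	uint32_t spi_host = ntohl(spi_net);

	return snprintf(buf, len, "spi=%lu(0x%lx)",
			(unsigned long)spi_host, (unsigned long)spi_host);
}
```

Printing spi_net directly would give a byte-swapped number on little-endian machines, which is presumably the "probably not what was intended" behaviour the changelog mentions.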
Re: TCP event tracking via netlink...
On Thu, 6 Dec 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET) > > > On Wed, 5 Dec 2007, David Miller wrote: > > > > > I assume you're using something like carefully crafted printk's, > > > kprobes, or even ad-hoc statistic counters. That's what I used to do > > > :-) > > > > No, that's not at all what I do :-). I usually look time-seq graphs > > expect for the cases when I just find things out by reading code (or > > by just thinking of it). > > Can you briefly detail what graph tools and command lines > you are using? I have a tool called Sealion but it's behind an NDA (making it open source has been talked about for a long time but I have no idea why it hasn't happened yet). It's mostly tcl/tk code and is by no means nice or clean in design or quality (I'll leave the details of why I think so out of this discussion :-)). It produces SVGs. Usually I have the things I need in the standard sent+ACK+SACKs(+win) graph it produces. The result is quite similar to what tcptrace+xplot produces but the xplot UI is really horrible, IMHO. If I have to deal with tcpdump output only, it takes a considerable amount of time doing computations with bc to come up with the same understanding by just reading tcpdumps. > The last time I did graphing to analyze things, the tools > were hit-or-miss. Yeah, this is definitely true. The open source graphing tools I know of are really not that astonishing :-(. I've tried to look for better tools as well but with little success. > > Much of the info is available in tcpdump already, it's just hard to read > > without graphing it first because there are some many overlapping things > > to track in two-dimensional space. > > > > ...But yes, I have to admit that couple of problems come to my mind > > where having some variable from tcp_sock would have made the problem > > more obvious. 
> > The most important are the cwnd and ssthresh, which you could guess > using graphs but it is important to know on a packet to packet > basis why we might have sent a packet or not because this has > rippling effects down the rest of the RTT. A couple of points: in order to evaluate the validity of some action, one might need more than one packet from the history. The answer to why we have sent a packet is rather simple (excluding RTOs): cwnd > packets_in_flight and data was available. No, it's not at all complicated. Though I might be too biased toward non-application-limited cases, which make the formula even simpler because everything is basically ACK clocked. To really tell what caused changes in cwnd and/or packets_in_flight one usually needs some history or a more fine-grained approach; once per packet is way too wide a gap. It tells just what happened, not why, unless you're really familiar with the state machine and can make the right guess. > > Not sure what is the benefit of having distributions with it because > > those people hardly report problems anyway to here, they're just too > > happy with TCP performance unless we print something to their logs, > > which implies that we must setup a *_ON() condition :-(. > > That may be true, but if we could integrate the information with > tcpdumps, we could gather internal state using tools the user > already has available. It would definitely help if we could, but that of course depends on getting the reports in the first place. > Imagine if tcpdump printed out: > > 02:26:14.865805 IP $SRC > $DEST: . 11226:12686(1460) ack 0 win 108 > ss_thresh: 129 cwnd: 133 packets_out: 132 > > or something like that. How about this: 02:26:14.865805 IP $SRC > $DEST: . ack 11226 win 108 <...sack 1 {15606:18526} 17066:18526 0->S sacktag_one l0 s1 r0 f4 pc1 ... 11226:12686 clean_rtx_queue ... 11226:12686 0->L mark_head_lost l1 s1 r0 f4 pc1 ... 12686:14146 0->L mark_head_lost l2 s1 r0 f4 pc1 ... 
11226:12686 L->LRe retransmit_skb l2 s1 r1 f4 pc1 ... ...would make the bug in sack processing relatively obvious (yes, it has an intentional flaw in it, points for finding it :-))... That would be something I'd like to have right now. > But sometimes the algorithms are working as designed, it's just that > they provide poor pipe utilization and CWND analysis embedded inside > of a tcpdump would be one way to see that as well as determine the > flaw in the algorithm. Fair enough. > It is untested since I didn't write the userland app yet to see that > proper things get logged. Basically you could run a daemon that > writes per-connection traces into files based upon the incoming > netlink events. Later, using the binary pcap file and these traces, > you can piece together traces like the above using the timestamps > etc. to match up pcap packets to ones from the TCP logger. > > The userland tools could do analysis and print pre-cooked state diff > logs, like "this ACK raised CWND by one" or whatever else you wanted > to know. Obviously a collection of useful userland tools see
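The "why did we send a packet" rule from the reply above (cwnd > packets_in_flight and data was available, RTOs excluded) can be written down as a small sketch. struct demo_tcp_state and both function names are hypothetical; the in-flight formula is the usual packets_out minus SACKed and lost plus retransmitted, stated here as an assumption rather than quoted from the thread.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down view of the per-connection counters. */
struct demo_tcp_state {
	unsigned int snd_cwnd;     /* congestion window, in packets */
	unsigned int packets_out;  /* sent, not yet cumulatively ACKed */
	unsigned int sacked_out;   /* of those, SACKed by the receiver */
	unsigned int lost_out;     /* of those, marked lost */
	unsigned int retrans_out;  /* retransmissions still outstanding */
};

/* Packets believed to be in the network right now. */
unsigned int demo_packets_in_flight(const struct demo_tcp_state *tp)
{
	return tp->packets_out - (tp->sacked_out + tp->lost_out) +
	       tp->retrans_out;
}

/* The send condition described in the mail, ignoring RTO-driven sends. */
bool demo_may_send(const struct demo_tcp_state *tp, bool data_available)
{
	return data_available && tp->snd_cwnd > demo_packets_in_flight(tp);
}
```

With the numbers from the quoted tcpdump mock-up (cwnd 133, packets_out 132, nothing SACKed, lost or retransmitted) the sketch allows one more packet, which shows how little a per-packet snapshot says about why the counters got there.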
[PATCH net-2.6.25] Cleanup sysctl manipulations in devinet.c
This includes: * moving neigh_sysctl_(un)register calls inside devinet_sysctl_(un)register ones, as they are always called in pairs; * making __devinet_sysctl_unregister() to unregister the ipv4_devconf struct, while original devinet_sysctl_unregister() works with the in_device to handle both - devconf and neigh sysctls; * make stubs for CONFIG_SYSCTL=n case to get rid of in-code ifdefs. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 0b5f042..872883e 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -99,7 +99,14 @@ static void inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap, int destroy); #ifdef CONFIG_SYSCTL static void devinet_sysctl_register(struct in_device *idev); -static void devinet_sysctl_unregister(struct ipv4_devconf *p); +static void devinet_sysctl_unregister(struct in_device *idev); +#else +static inline void devinet_sysctl_register(struct in_device *idev) +{ +} +static inline void devinet_sysctl_unregister(struct in_device *idev) +{ +} #endif /* Locks all the inet devices. 
*/ @@ -163,17 +170,10 @@ static struct in_device *inetdev_init(struct net_device *dev) goto out_kfree; /* Reference in_dev->dev */ dev_hold(dev); -#ifdef CONFIG_SYSCTL - neigh_sysctl_register(dev, in_dev->arp_parms, NET_IPV4, - NET_IPV4_NEIGH, "ipv4", NULL, NULL); -#endif - /* Account for reference dev->ip_ptr (below) */ in_dev_hold(in_dev); -#ifdef CONFIG_SYSCTL devinet_sysctl_register(in_dev); -#endif ip_mc_init_dev(in_dev); if (dev->flags & IFF_UP) ip_mc_up(in_dev); @@ -212,15 +212,9 @@ static void inetdev_destroy(struct in_device *in_dev) inet_free_ifa(ifa); } -#ifdef CONFIG_SYSCTL - devinet_sysctl_unregister(&in_dev->cnf); -#endif - dev->ip_ptr = NULL; -#ifdef CONFIG_SYSCTL - neigh_sysctl_unregister(in_dev->arp_parms); -#endif + devinet_sysctl_unregister(in_dev); neigh_parms_release(&arp_tbl, in_dev->arp_parms); arp_ifdown(dev); @@ -1114,13 +1108,8 @@ static int inetdev_event(struct notifier_block *this, unsigned long event, */ inetdev_changename(dev, in_dev); -#ifdef CONFIG_SYSCTL - devinet_sysctl_unregister(&in_dev->cnf); - neigh_sysctl_unregister(in_dev->arp_parms); - neigh_sysctl_register(dev, in_dev->arp_parms, NET_IPV4, - NET_IPV4_NEIGH, "ipv4", NULL, NULL); + devinet_sysctl_unregister(in_dev); devinet_sysctl_register(in_dev); -#endif break; } out: @@ -1519,21 +1508,31 @@ out: return; } +static void __devinet_sysctl_unregister(struct ipv4_devconf *cnf) +{ + struct devinet_sysctl_table *t = cnf->sysctl; + + if (t == NULL) + return; + + cnf->sysctl = NULL; + unregister_sysctl_table(t->sysctl_header); + kfree(t->dev_name); + kfree(t); +} + static void devinet_sysctl_register(struct in_device *idev) { - return __devinet_sysctl_register(idev->dev->name, idev->dev->ifindex, + neigh_sysctl_register(idev->dev, idev->arp_parms, NET_IPV4, + NET_IPV4_NEIGH, "ipv4", NULL, NULL); + __devinet_sysctl_register(idev->dev->name, idev->dev->ifindex, &idev->cnf); } -static void devinet_sysctl_unregister(struct ipv4_devconf *p) +static void devinet_sysctl_unregister(struct 
in_device *idev) { - if (p->sysctl) { - struct devinet_sysctl_table *t = p->sysctl; - p->sysctl = NULL; - unregister_sysctl_table(t->sysctl_header); - kfree(t->dev_name); - kfree(t); - } + __devinet_sysctl_unregister(&idev->cnf); + neigh_sysctl_unregister(idev->arp_parms); } #endif -- 1.5.3.4
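The stub pattern this devinet.c cleanup uses to drop the in-code ifdefs can be shown outside the kernel. Everything below is a hypothetical sketch: DEMO_CONFIG_SYSCTL stands in for CONFIG_SYSCTL, and the two flags only record whether the neighbour and devconf tables would be registered.

```c
/* Build-time switch standing in for CONFIG_SYSCTL; delete this line to
 * compile the empty stubs instead. (Hypothetical name.) */
#define DEMO_CONFIG_SYSCTL 1

/* Hypothetical miniature of in_device. */
struct demo_in_device {
	int neigh_registered;
	int devconf_registered;
};

#ifdef DEMO_CONFIG_SYSCTL
/* Register both tables together, since callers always want the pair. */
static void demo_sysctl_register(struct demo_in_device *idev)
{
	idev->neigh_registered = 1;
	idev->devconf_registered = 1;
}

static void demo_sysctl_unregister(struct demo_in_device *idev)
{
	idev->devconf_registered = 0;
	idev->neigh_registered = 0;
}
#else
/* Empty inline stubs: callers compile unchanged and the calls vanish. */
static inline void demo_sysctl_register(struct demo_in_device *idev)
{
	(void)idev;
}

static inline void demo_sysctl_unregister(struct demo_in_device *idev)
{
	(void)idev;
}
#endif

/* Caller code needs no #ifdef of its own, e.g. on a device rename. */
void demo_rename(struct demo_in_device *idev)
{
	demo_sysctl_unregister(idev);
	demo_sysctl_register(idev);
}
```

The design choice is that the conditional lives in one place next to the declarations, so call sites such as the rename path read the same in both configurations.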
[PATCH net-2.6.25] Cleanup IN_DEV_MFORWARD macro
This is essentially IN_DEV_ANDCONF with proper arguments. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h index dd093ea..962a062 100644 --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -78,9 +78,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev) (max(IPV4_DEVCONF_ALL(attr), IN_DEV_CONF_GET((in_dev), attr))) #define IN_DEV_FORWARD(in_dev) IN_DEV_CONF_GET((in_dev), FORWARDING) -#define IN_DEV_MFORWARD(in_dev) (IPV4_DEVCONF_ALL(MC_FORWARDING) && \ -IPV4_DEVCONF((in_dev)->cnf, \ - MC_FORWARDING)) +#define IN_DEV_MFORWARD(in_dev)IN_DEV_ANDCONF((in_dev), MC_FORWARDING) #define IN_DEV_RPFILTER(in_dev)IN_DEV_ANDCONF((in_dev), RP_FILTER) #define IN_DEV_SOURCE_ROUTE(in_dev)IN_DEV_ANDCONF((in_dev), \ ACCEPT_SOURCE_ROUTE) -- 1.5.3.4
[PATCH 2.6.25 3/3] ipv4: last default route is a fib table property
ipv4: last default route is a fib table property Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]> Acked-by: Alexey Kuznetsov <[EMAIL PROTECTED]> --- include/net/ip_fib.h |1 + net/ipv4/fib_hash.c | 16 net/ipv4/fib_trie.c | 18 +- 3 files changed, 18 insertions(+), 17 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 690fb4d..d70b9b4 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -141,6 +141,7 @@ struct fib_table { struct hlist_node tb_hlist; u32 tb_id; unsignedtb_stamp; + int tb_default; int (*tb_lookup)(struct fib_table *tb, const struct flowi *flp, struct fib_result *res); int (*tb_insert)(struct fib_table *, struct fib_config *); int (*tb_delete)(struct fib_table *, struct fib_config *); diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c index a52b570..481de47 100644 --- a/net/ipv4/fib_hash.c +++ b/net/ipv4/fib_hash.c @@ -272,8 +272,6 @@ out: return err; } -static int fn_hash_last_dflt=-1; - static void fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib_result *res) { @@ -314,9 +312,9 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib if (next_fi != res->fi) break; } else if (!fib_detect_death(fi, order, &last_resort, -&last_idx, fn_hash_last_dflt)) { + &last_idx, tb->tb_default)) { fib_result_assign(res, fi); - fn_hash_last_dflt = order; + tb->tb_default = order; goto out; } fi = next_fi; @@ -325,19 +323,20 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib } if (order <= 0 || fi == NULL) { - fn_hash_last_dflt = -1; + tb->tb_default = -1; goto out; } - if (!fib_detect_death(fi, order, &last_resort, &last_idx, fn_hash_last_dflt)) { + if (!fib_detect_death(fi, order, &last_resort, &last_idx, + tb->tb_default)) { fib_result_assign(res, fi); - fn_hash_last_dflt = order; + tb->tb_default = order; goto out; } if (last_idx >= 0) fib_result_assign(res, last_resort); - fn_hash_last_dflt = last_idx; + tb->tb_default = last_idx; 
out: read_unlock(&fib_hash_lock); } @@ -772,6 +771,7 @@ struct fib_table * __init fib_hash_init(u32 id) return NULL; tb->tb_id = id; + tb->tb_default = -1; tb->tb_lookup = fn_hash_lookup; tb->tb_insert = fn_hash_insert; tb->tb_delete = fn_hash_delete; diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 29a06af..850165a 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1779,8 +1779,6 @@ static int fn_trie_flush(struct fib_table *tb) return found; } -static int trie_last_dflt = -1; - static void fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib_result *res) { @@ -1827,28 +1825,29 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib if (next_fi != res->fi) break; } else if (!fib_detect_death(fi, order, &last_resort, -&last_idx, trie_last_dflt)) { +&last_idx, tb->tb_default)) { fib_result_assign(res, fi); - trie_last_dflt = order; + tb->tb_default = order; goto out; } fi = next_fi; order++; } if (order <= 0 || fi == NULL) { - trie_last_dflt = -1; + tb->tb_default = -1; goto out; } - if (!fib_detect_death(fi, order, &last_resort, &last_idx, trie_last_dflt)) { + if (!fib_detect_death(fi, order, &last_resort, &last_idx, + tb->tb_default)) { fib_result_assign(res, fi); - trie_last_dflt = order; + tb->tb_default = order; goto out; } if (last_idx >= 0) fib_result_assign(res, last_resort); - trie_last_dflt = last_idx; - out:; + tb->tb_default = last_idx; +out: rcu_read_unlock(); } @@ -1975,6 +1974,7 @@ struct fib_table * __init fib_hash_init(u32 id) return NULL; tb->tb_id = id; + tb->tb
[PATCH 2.6.25 2/3] ipv4: unify assignment of fi to fib_result
ipv4: unify assignment of fi to fib_result Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]> Acked-by: Alexey Kuznetsov <[EMAIL PROTECTED]> --- net/ipv4/fib_hash.c | 19 --- net/ipv4/fib_lookup.h | 10 ++ net/ipv4/fib_trie.c | 19 --- 3 files changed, 18 insertions(+), 30 deletions(-) diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c index 76bb7fd..a52b570 100644 --- a/net/ipv4/fib_hash.c +++ b/net/ipv4/fib_hash.c @@ -315,10 +315,7 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib break; } else if (!fib_detect_death(fi, order, &last_resort, &last_idx, fn_hash_last_dflt)) { - if (res->fi) - fib_info_put(res->fi); - res->fi = fi; - atomic_inc(&fi->fib_clntref); + fib_result_assign(res, fi); fn_hash_last_dflt = order; goto out; } @@ -333,21 +330,13 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib } if (!fib_detect_death(fi, order, &last_resort, &last_idx, fn_hash_last_dflt)) { - if (res->fi) - fib_info_put(res->fi); - res->fi = fi; - atomic_inc(&fi->fib_clntref); + fib_result_assign(res, fi); fn_hash_last_dflt = order; goto out; } - if (last_idx >= 0) { - if (res->fi) - fib_info_put(res->fi); - res->fi = last_resort; - if (last_resort) - atomic_inc(&last_resort->fib_clntref); - } + if (last_idx >= 0) + fib_result_assign(res, last_resort); fn_hash_last_dflt = last_idx; out: read_unlock(&fib_hash_lock); diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h index 6c9dd42..26ee66d 100644 --- a/net/ipv4/fib_lookup.h +++ b/net/ipv4/fib_lookup.h @@ -38,4 +38,14 @@ extern int fib_detect_death(struct fib_info *fi, int order, struct fib_info **last_resort, int *last_idx, int dflt); +static inline void fib_result_assign(struct fib_result *res, +struct fib_info *fi) +{ + if (res->fi != NULL) + fib_info_put(res->fi); + res->fi = fi; + if (fi != NULL) + atomic_inc(&fi->fib_clntref); +} + #endif /* _FIB_LOOKUP_H */ diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 914a0d2..29a06af 100644 --- 
a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1828,10 +1828,7 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib break; } else if (!fib_detect_death(fi, order, &last_resort, &last_idx, trie_last_dflt)) { - if (res->fi) - fib_info_put(res->fi); - res->fi = fi; - atomic_inc(&fi->fib_clntref); + fib_result_assign(res, fi); trie_last_dflt = order; goto out; } @@ -1844,20 +1841,12 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib } if (!fib_detect_death(fi, order, &last_resort, &last_idx, trie_last_dflt)) { - if (res->fi) - fib_info_put(res->fi); - res->fi = fi; - atomic_inc(&fi->fib_clntref); + fib_result_assign(res, fi); trie_last_dflt = order; goto out; } - if (last_idx >= 0) { - if (res->fi) - fib_info_put(res->fi); - res->fi = last_resort; - if (last_resort) - atomic_inc(&last_resort->fib_clntref); - } + if (last_idx >= 0) + fib_result_assign(res, last_resort); trie_last_dflt = last_idx; out:; rcu_read_unlock(); -- 1.5.3.rc5
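To see what the new fib_result_assign() helper in patch 2/3 does to the reference counts, here is a user-space sketch with plain integers in place of atomics. The demo_ names are hypothetical, and demo_info_put only decrements, whereas the real fib_info_put presumably also releases the object when the count reaches zero.

```c
#include <stddef.h>

/* Hypothetical stand-ins for fib_info and fib_result. */
struct demo_info {
	int refcnt;
};

struct demo_result {
	struct demo_info *fi;
};

/* Drop one reference (the real helper would also free at zero). */
static void demo_info_put(struct demo_info *fi)
{
	fi->refcnt--;
}

/* Same shape as the helper in the patch: release the old pointer if
 * there is one, store the new one, take a reference if it is non-NULL. */
void demo_result_assign(struct demo_result *res, struct demo_info *fi)
{
	if (res->fi != NULL)
		demo_info_put(res->fi);
	res->fi = fi;
	if (fi != NULL)
		fi->refcnt++;
}
```

Note that the unified helper checks the new pointer for NULL in every case, while only the last_resort call sites did so in the open-coded versions it replaces.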
[PATCH 2.6.25 1/3] ipv4: no need pass pointer to a default into fib_detect_death
ipv4: no need pass pointer to a default into fib_detect_death Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]> Acked-by: Alexey Kuznetsov <[EMAIL PROTECTED]> --- net/ipv4/fib_hash.c |4 ++-- net/ipv4/fib_lookup.h|2 +- net/ipv4/fib_semantics.c |6 +++--- net/ipv4/fib_trie.c |4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c index 30ff657..76bb7fd 100644 --- a/net/ipv4/fib_hash.c +++ b/net/ipv4/fib_hash.c @@ -314,7 +314,7 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib if (next_fi != res->fi) break; } else if (!fib_detect_death(fi, order, &last_resort, -&last_idx, &fn_hash_last_dflt)) { +&last_idx, fn_hash_last_dflt)) { if (res->fi) fib_info_put(res->fi); res->fi = fi; @@ -332,7 +332,7 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib goto out; } - if (!fib_detect_death(fi, order, &last_resort, &last_idx, &fn_hash_last_dflt)) { + if (!fib_detect_death(fi, order, &last_resort, &last_idx, fn_hash_last_dflt)) { if (res->fi) fib_info_put(res->fi); res->fi = fi; diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h index eef9eec..6c9dd42 100644 --- a/net/ipv4/fib_lookup.h +++ b/net/ipv4/fib_lookup.h @@ -36,6 +36,6 @@ extern struct fib_alias *fib_find_alias(struct list_head *fah, u8 tos, u32 prio); extern int fib_detect_death(struct fib_info *fi, int order, struct fib_info **last_resort, - int *last_idx, int *dflt); + int *last_idx, int dflt); #endif /* _FIB_LOOKUP_H */ diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index ec9b0dd..bbd4a24 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -346,7 +346,7 @@ struct fib_alias *fib_find_alias(struct list_head *fah, u8 tos, u32 prio) } int fib_detect_death(struct fib_info *fi, int order, -struct fib_info **last_resort, int *last_idx, int *dflt) +struct fib_info **last_resort, int *last_idx, int dflt) { struct neighbour *n; int state = NUD_NONE; @@ -358,10 
+358,10 @@ int fib_detect_death(struct fib_info *fi, int order, } if (state==NUD_REACHABLE) return 0; - if ((state&NUD_VALID) && order != *dflt) + if ((state&NUD_VALID) && order != dflt) return 0; if ((state&NUD_VALID) || - (*last_idx<0 && order > *dflt)) { + (*last_idx<0 && order > dflt)) { *last_resort = fi; *last_idx = order; } diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 6385cca..914a0d2 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1827,7 +1827,7 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib if (next_fi != res->fi) break; } else if (!fib_detect_death(fi, order, &last_resort, -&last_idx, &trie_last_dflt)) { +&last_idx, trie_last_dflt)) { if (res->fi) fib_info_put(res->fi); res->fi = fi; @@ -1843,7 +1843,7 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib goto out; } - if (!fib_detect_death(fi, order, &last_resort, &last_idx, &trie_last_dflt)) { + if (!fib_detect_death(fi, order, &last_resort, &last_idx, trie_last_dflt)) { if (res->fi) fib_info_put(res->fi); res->fi = fi; -- 1.5.3.rc5
IPsec replay sequence number overflow behavior? (RFC4303 section 3.3.3)
Hello all,

As part of the IPv6 "gap analysis" that the Linux Foundation is currently doing I've been looking at the IPsec auditing requirements as defined in RFC4303 and I came across some odd behavior regarding SA sequence number overflows ... RFC4303 states the following:

   3.3.3. Sequence Number Generation

   The sender's counter is initialized to 0 when an SA is established. The sender increments the sequence number (or ESN) counter for this SA and inserts the low-order 32 bits of the value into the Sequence Number field. Thus, the first packet sent using a given SA will contain a sequence number of 1.

   If anti-replay is enabled (the default), the sender checks to ensure that the counter has not cycled before inserting the new value in the Sequence Number field. In other words, the sender MUST NOT send a packet on an SA if doing so would cause the sequence number to cycle. An attempt to transmit a packet that would result in sequence number overflow is an auditable event. The audit log entry for this event SHOULD include the SPI value, current date/time, Source Address, Destination Address, and (in IPv6) the cleartext Flow ID.

The related code in net/xfrm/xfrm_output.c:xfrm_output() looks like this:

	if (x->type->flags & XFRM_TYPE_REPLAY_PROT) {
		XFRM_SKB_CB(skb)->seq = ++x->replay.oseq;
		if (xfrm_aevent_is_on())
			xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
	}

Which doesn't appear to take into account sequence number overflow at all. Granted, it does send notifications to userspace but it doesn't do anything to prevent the packet from being sent if the sequence number wraps. I'm still a few years behind in my IPsec specifications so I could be missing something here (extended sequence numbers spring to mind and the kernel's curious mixing of 32bit and 64bit types for SA sequence number counters) but at first glance this appears to be a bug ... yes/no?

If it is a bug, I think the basic fix should be pretty simple, changing the above xfrm_output() code to the following:

	if (x->type->flags & XFRM_TYPE_REPLAY_PROT) {
		XFRM_SKB_CB(skb)->seq = ++x->replay.oseq;
	+	if (x->replay.oseq == 0)
	+		goto error;
		if (xfrm_aevent_is_on())
			xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
	}

-- 
paul moore
linux security @ hp
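The RFC requirement quoted in this message can be sketched in isolation. The following is a minimal user-space illustration, not the xfrm code: struct replay_state and next_oseq() are hypothetical names, and the choice of -EOVERFLOW as the error value is an assumption. It checks for the wrap before incrementing, so a saturated counter stays saturated.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-SA replay state; not the kernel's struct. */
struct replay_state {
	uint32_t oseq;	/* last sequence number sent, 0 for a fresh SA */
};

/*
 * Hand out the next outbound sequence number.
 * Returns 0 and stores the number in *seq, or -EOVERFLOW when sending
 * one more packet would make the 32-bit counter cycle.
 */
int next_oseq(struct replay_state *rs, uint32_t *seq)
{
	if (rs->oseq == UINT32_MAX)
		return -EOVERFLOW;	/* caller drops the packet and audits */

	*seq = ++rs->oseq;
	return 0;
}
```

One design point worth noting, as far as I can tell: an increment-then-test check such as the one proposed above leaves the counter at 0 after the first rejected packet, so the following packet would go out with sequence number 1 again unless the counter is also restored. Testing before the increment avoids that.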
Re: [patch 07/22] NET: DM9000: Use msleep() instead of udelay()
On Fri, Nov 23, 2007 at 08:39:45PM -0500, Jeff Garzik wrote:
> are you sure you cannot sleep during suspend?

Yes. This is not the first driver that has had this problem, see the sm501 as another example.

-- 
Ben ([EMAIL PROTECTED], http://www.fluff.org/)

  'a smiley only costs 4 bytes'
Re: [Patch] net/xfrm/xfrm_policy.c: Some small improvements
David Miller wrote:
> From: Richard Knutsson <[EMAIL PROTECTED]>
> Date: Thu, 06 Dec 2007 15:37:46 +0100
>
>> David Miller wrote:
>>> But this time I'll just let you know up front that I don't see much value in this patch. It is not a clear improvement to replace int's with bool's in my mind and the other changes are just whitespace changes.
>> Is it not an improvement to distinct booleans from actual values? Do you use integers for ASCII characters too? It can also avoid some potential bugs like the 'if (i == TRUE)'...
>> What is wrong with 'size_t' (since it is unsigned, compared to (some) 'int')?
>
> When you say "int found;" is there any doubt in your mind that this integer is going to hold a 1 or a 0 depending upon whether we "found" something?
>
> That's the problem I have with these kinds of patches, they do not increase clarity, it's just pure mindless edits.

But is there not a good thing if also the compiler knows + names are sometime not as clear as that one?

> In new code, fine, use booleans if you want.
>
> I would even accept that it helps to change to boolean for arguments to functions that are global in scope.
>
> But not for function local variables in cases like this.

Oh, I see your point now. Believed it to be yet another 'booleans is not C idiom'.

Sorry about the noise
Richard Knutsson
Re: [patch 0/3][IPV6]: remove ifdef in route6 init/fini functions
In article <[EMAIL PROTECTED]> (at Fri, 07 Dec 2007 14:13:25 +0100), Daniel Lezcano <[EMAIL PROTECTED]> says:

> The route6 init function is a little difficult to read because it contains
> a lot of ifdef. The patchset redefines the usual static inline functions when
> the code is to be disabled by configuration, so we can call the code without
> taking care of the config option in the init function.

Acked-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

--yoshfuji
[patch 1/3][IPV6]: create route6 proc init-fini functions
Make the proc creation/destruction to be a separate function. That allows to remove the #ifdef CONFIG_PROC_FS in the init/fini function and make them more readable. Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]> --- net/ipv6/route.c | 58 +-- 1 file changed, 40 insertions(+), 18 deletions(-) Index: net-2.6.25/net/ipv6/route.c === --- net-2.6.25.orig/net/ipv6/route.c +++ net-2.6.25/net/ipv6/route.c @@ -2353,6 +2353,40 @@ static const struct file_operations rt6_ .llseek = seq_lseek, .release = single_release, }; + +static int ipv6_route_proc_init(struct net *net) +{ + int ret = -ENOMEM; + if (!proc_net_fops_create(net, "ipv6_route", + 0, &ipv6_route_proc_fops)) + goto out; + + if (!proc_net_fops_create(net, "rt6_stats", + S_IRUGO, &rt6_stats_seq_fops)) + goto out_ipv6_route; + + ret = 0; +out: + return ret; +out_ipv6_route: + proc_net_remove(net, "ipv6_route"); + goto out; +} + +static void ipv6_route_proc_fini(struct net *net) +{ + proc_net_remove(net, "ipv6_route"); + proc_net_remove(net, "rt6_stats"); +} +#else +static inline int ipv6_route_proc_init(struct net *net) +{ + return 0; +} +static inline void ipv6_route_proc_fini(struct net *net) +{ + return ; +} #endif /* CONFIG_PROC_FS */ #ifdef CONFIG_SYSCTL @@ -2479,21 +2513,14 @@ int __init ip6_route_init(void) if (ret) goto out_kmem_cache; -#ifdef CONFIG_PROC_FS - ret = -ENOMEM; - if (!proc_net_fops_create(&init_net, "ipv6_route", - 0, &ipv6_route_proc_fops)) + ret = ipv6_route_proc_init(&init_net); + if (ret) goto out_fib6_init; - if (!proc_net_fops_create(&init_net, "rt6_stats", - S_IRUGO, &rt6_stats_seq_fops)) - goto out_proc_ipv6_route; -#endif - #ifdef CONFIG_XFRM ret = xfrm6_init(); if (ret) - goto out_proc_rt6_stats; + goto out_proc_init; #endif #ifdef CONFIG_IPV6_MULTIPLE_TABLES ret = fib6_rules_init(); @@ -2517,14 +2544,10 @@ xfrm6_init: #endif #ifdef CONFIG_XFRM xfrm6_fini(); -out_proc_rt6_stats: #endif -#ifdef CONFIG_PROC_FS - proc_net_remove(&init_net, "rt6_stats"); -out_proc_ipv6_route: - 
proc_net_remove(&init_net, "ipv6_route"); +out_proc_init: + ipv6_route_proc_fini(&init_net); out_fib6_init: -#endif rt6_ifdown(NULL); fib6_gc_cleanup(); out_kmem_cache: @@ -2537,8 +2560,7 @@ void ip6_route_cleanup(void) #ifdef CONFIG_IPV6_MULTIPLE_TABLES fib6_rules_cleanup(); #endif - proc_net_remove(&init_net, "ipv6_route"); - proc_net_remove(&init_net, "rt6_stats"); + ipv6_route_proc_fini(&init_net); #ifdef CONFIG_XFRM xfrm6_fini(); #endif -- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 0/3][IPV6]: remove ifdef in route6 init/fini functions
The route6 init function is a little difficult to read because it contains a lot of ifdef. The patchset redefines the usual static inline functions when the code is to be disabled by configuration, so we can call the code without taking care of the config option in the init function.
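The stub pattern this cover letter describes can be shown in a small stand-alone sketch. Everything here is hypothetical (CONFIG_DEMO_FEATURE, demo_feature_init(), core_init() and friends are illustration names, not symbols from the series); it only assumes the idea stated above: when a feature is configured out, empty static inline functions with the same signatures stand in for it, so the caller needs no #ifdef.

```c
/*
 * Hypothetical feature switch: build with -DCONFIG_DEMO_FEATURE to get
 * the real functions, or without it to get the empty stubs.
 */
#ifdef CONFIG_DEMO_FEATURE
static int demo_feature_users;

int demo_feature_init(void)
{
	demo_feature_users++;
	return 0;
}

void demo_feature_fini(void)
{
	demo_feature_users--;
}
#else
/* Feature compiled out: same signatures, no work, always succeeds. */
static inline int demo_feature_init(void)
{
	return 0;
}

static inline void demo_feature_fini(void)
{
}
#endif

int core_ready;

int core_init(void)
{
	core_ready = 1;
	return 0;
}

void core_fini(void)
{
	core_ready = 0;
}

/* The caller has no #ifdef: the stubs make both calls valid either way. */
int subsystem_init(void)
{
	int ret;

	ret = core_init();
	if (ret)
		goto out;

	ret = demo_feature_init();
	if (ret)
		goto out_core;
out:
	return ret;
out_core:
	core_fini();
	goto out;
}

void subsystem_cleanup(void)
{
	demo_feature_fini();
	core_fini();
}
```

Because the stubs are static inline and empty, a compiler would normally drop the calls entirely when the feature is disabled, so the readability gain should not cost anything at run time.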
[patch 2/3][IPV6]: remove ifdef in route6 for xfrm6
The following patch create the usual static inline functions to disable the xfrm6_init and xfrm6_fini function when XFRM is off. That's allow to remove some ifdef and make the code a little more clear. Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]> --- include/net/xfrm.h | 16 +--- net/ipv6/route.c |7 +-- 2 files changed, 14 insertions(+), 9 deletions(-) Index: net-2.6.25/include/net/xfrm.h === --- net-2.6.25.orig/include/net/xfrm.h +++ net-2.6.25/include/net/xfrm.h @@ -842,7 +842,6 @@ xfrm_state_addr_cmp(struct xfrm_tmpl *tm } #ifdef CONFIG_XFRM - extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb, unsigned short family); static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, unsigned short family) @@ -1066,12 +1065,23 @@ struct xfrm6_tunnel { extern void xfrm_init(void); extern void xfrm4_init(void); -extern int xfrm6_init(void); -extern void xfrm6_fini(void); extern void xfrm_state_init(void); extern void xfrm4_state_init(void); +#ifdef CONFIG_XFRM +extern int xfrm6_init(void); +extern void xfrm6_fini(void); extern int xfrm6_state_init(void); extern void xfrm6_state_fini(void); +#else +static inline int xfrm6_init(void) +{ + return 0; +} +static inline void xfrm6_fini(void) +{ + ; +} +#endif extern int xfrm_state_walk(u8 proto, int (*func)(struct xfrm_state *, int, void*), void *); extern struct xfrm_state *xfrm_state_alloc(void); Index: net-2.6.25/net/ipv6/route.c === --- net-2.6.25.orig/net/ipv6/route.c +++ net-2.6.25/net/ipv6/route.c @@ -2517,11 +2517,10 @@ int __init ip6_route_init(void) if (ret) goto out_fib6_init; -#ifdef CONFIG_XFRM ret = xfrm6_init(); if (ret) goto out_proc_init; -#endif + #ifdef CONFIG_IPV6_MULTIPLE_TABLES ret = fib6_rules_init(); if (ret) @@ -2542,9 +2541,7 @@ fib6_rules_init: fib6_rules_cleanup(); xfrm6_init: #endif -#ifdef CONFIG_XFRM xfrm6_fini(); -#endif out_proc_init: ipv6_route_proc_fini(&init_net); out_fib6_init: @@ -2561,9 +2558,7 @@ void ip6_route_cleanup(void) 
fib6_rules_cleanup(); #endif ipv6_route_proc_fini(&init_net); -#ifdef CONFIG_XFRM xfrm6_fini(); -#endif rt6_ifdown(NULL); fib6_gc_cleanup(); kmem_cache_destroy(ip6_dst_ops.kmem_cachep); -- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 3/3][IPV6]: route6 remove ifdef for fib_rules
The patch defines the usual static inline functions when the code is disabled for fib6_rules. That's allow to remove some ifdef in route.c file and make the code a little more clear. Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]> --- include/net/ip6_fib.h | 12 +++- net/ipv6/route.c |7 +-- 2 files changed, 12 insertions(+), 7 deletions(-) Index: net-2.6.25/include/net/ip6_fib.h === --- net-2.6.25.orig/include/net/ip6_fib.h +++ net-2.6.25/include/net/ip6_fib.h @@ -226,8 +226,18 @@ extern voidfib6_gc_cleanup(void); extern int fib6_init(void); +#ifdef CONFIG_IPV6_MULTIPLE_TABLES extern int fib6_rules_init(void); extern voidfib6_rules_cleanup(void); - +#else +static inline int fib6_rules_init(void) +{ + return 0; +} +static inline void fib6_rules_cleanup(void) +{ + return ; +} +#endif #endif #endif Index: net-2.6.25/net/ipv6/route.c === --- net-2.6.25.orig/net/ipv6/route.c +++ net-2.6.25/net/ipv6/route.c @@ -2521,11 +2521,10 @@ int __init ip6_route_init(void) if (ret) goto out_proc_init; -#ifdef CONFIG_IPV6_MULTIPLE_TABLES ret = fib6_rules_init(); if (ret) goto xfrm6_init; -#endif + ret = -ENOBUFS; if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL) || __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL) || @@ -2537,10 +2536,8 @@ out: return ret; fib6_rules_init: -#ifdef CONFIG_IPV6_MULTIPLE_TABLES fib6_rules_cleanup(); xfrm6_init: -#endif xfrm6_fini(); out_proc_init: ipv6_route_proc_fini(&init_net); @@ -2554,9 +2551,7 @@ out_kmem_cache: void ip6_route_cleanup(void) { -#ifdef CONFIG_IPV6_MULTIPLE_TABLES fib6_rules_cleanup(); -#endif ipv6_route_proc_fini(&init_net); xfrm6_fini(); rt6_ifdown(NULL); -- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc4-mm1
On Wed, 5 Dec 2007, David Miller wrote:

> From: Reuben Farrelly <[EMAIL PROTECTED]>
> Date: Thu, 06 Dec 2007 17:59:37 +1100
>
> > On 5/12/2007 4:17 PM, Andrew Morton wrote:
> > > - Lots of device IDs have been removed from the e1000 driver and moved over
> > >   to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E.
> >
> > This non fatal oops which I have just noticed may be related to this change then
> > - certainly looks networking related.
> >
> > WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
> > Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
> >
> > Call Trace:
> > [] tcp_fastretrans_alert+0x229/0xe63
> > [] tcp_ack+0xa3f/0x127d
> > [] tcp_rcv_established+0x55f/0x7f8
> > [] tcp_v4_do_rcv+0xdb/0x3a7
> > [] :nf_conntrack:nf_ct_deliver_cached_events+0x75/0x99
>
> No, it's from TCP assertions and changes added by Ilpo to the
> net-2.6.25 tree recently.

Yeah, this (very likely) due to the new SACK processing (in net-2.6.25). I'll look what could go wrong with fack_count calculations, most likely it's the reason (I've found earlier one out-of-place retransmission segment in one of my test case which already indicated that there's something incorrect with them but didn't have time to debug it yet).

Thanks for report. Some info about how easily you can reproduce & couple of sentences about the test case might be useful later on when evaluating the fix.

-- 
 i.
[PATCH net-2.6.25 3/3]sysctl: make sysctl_somaxconn per-namespace
Just move the variable on the struct net and adjust its usage. Others sysctls from sys.net.core table are more difficult to virtualize (i.e. make them per-namespace), but I'll look at them as well a bit later. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/include/linux/socket.h b/include/linux/socket.h index eb5bdd5..bd2b30a 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -24,7 +24,6 @@ struct __kernel_sockaddr_storage { #include/* pid_t*/ #include /* __user */ -extern int sysctl_somaxconn; #ifdef CONFIG_PROC_FS struct seq_file; extern void socket_seq_show(struct seq_file *seq); diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index d593611..b62e31f 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -39,6 +39,7 @@ struct net { /* core sysctls */ struct ctl_table_header *sysctl_core_hdr; + int sysctl_somaxconn; /* List of all packet sockets. */ rwlock_tpacket_sklist_lock; diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index dc4cf7d..130338f 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -127,7 +127,7 @@ static struct ctl_table net_core_table[] = { { .ctl_name = NET_CORE_SOMAXCONN, .procname = "somaxconn", - .data = &sysctl_somaxconn, + .data = &init_net.sysctl_somaxconn, .maxlen = sizeof(int), .mode = 0644, .proc_handler = &proc_dointvec @@ -161,6 +161,8 @@ static __net_init int sysctl_core_net_init(struct net *net) { struct ctl_table *tbl, *tmp; + net->sysctl_somaxconn = SOMAXCONN; + tbl = net_core_table; if (net != &init_net) { tbl = kmemdup(tbl, sizeof(net_core_table), GFP_KERNEL); diff --git a/net/socket.c b/net/socket.c index 9ebca5c..7651de0 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1365,17 +1365,17 @@ asmlinkage long sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen) * ready for listening. 
*/ -int sysctl_somaxconn __read_mostly = SOMAXCONN; - asmlinkage long sys_listen(int fd, int backlog) { struct socket *sock; int err, fput_needed; + int somaxconn; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (sock) { - if ((unsigned)backlog > sysctl_somaxconn) - backlog = sysctl_somaxconn; + somaxconn = sock->sk->sk_net->sysctl_somaxconn; + if ((unsigned)backlog > somaxconn) + backlog = somaxconn; err = security_socket_listen(sock, backlog); if (!err) -- 1.5.3.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
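The backlog clamp in sys_listen() relies on a small C detail that is easy to miss: casting the signed backlog to unsigned makes a negative value compare as a very large one, so it is clamped to the limit too. A minimal sketch of just that comparison follows; clamp_backlog() is a hypothetical helper, and the explicit cast on the limit only spells out the conversion the compiler would otherwise perform implicitly.

```c
/*
 * Clamp a listen() backlog to a per-namespace limit.
 * A negative backlog becomes a huge unsigned value and is therefore
 * replaced by the limit as well.
 */
int clamp_backlog(int backlog, int somaxconn)
{
	if ((unsigned)backlog > (unsigned)somaxconn)
		backlog = somaxconn;
	return backlog;
}
```

With a limit of 128, a request for 5 stays 5, while requests for 1000 or -1 both come back as 128.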
[PATCH net-2.6.25 2/3]sysctl: prepare core tables to point to netns variables
Some of ctl variables are going to be on the struct net. Here's the way to adjust the ->data pointer on the ctl_table-s to point on the right variable. Since some pointers still point on the global variables, I keep turning the write bits off on such tables. This looks to become a common procedure for net sysctls, so later parts of this code may migrate to some more generic place. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 57a7ead..dc4cf7d 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -167,8 +167,13 @@ static __net_init int sysctl_core_net_init(struct net *net) if (tbl == NULL) goto err_dup; - for (tmp = tbl; tmp->procname; tmp++) - tmp->mode &= ~0222; + for (tmp = tbl; tmp->procname; tmp++) { + if (tmp->data >= (void *)&init_net && + tmp->data < (void *)(&init_net + 1)) + tmp->data += (char *)net - (char *)&init_net; + else + tmp->mode &= ~0222; + } } net->sysctl_core_hdr = register_net_sysctl_table(net, -- 1.5.3.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
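The ->data adjustment in this patch is pointer rebasing: if an entry points somewhere inside the initial namespace object, the same byte offset is applied to the new namespace object; otherwise the entry still refers to a global and loses its write bits. A user-space sketch of that idea, under the assumption that both objects have the same type; struct ns, struct entry and rebase_entries() are hypothetical names, and the comparison is done on uintptr_t values to stay within portable C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-namespace object and table entry; not kernel types. */
struct ns {
	int somaxconn;
	int other;
};

struct entry {
	const char *name;	/* NULL terminates the table */
	void *data;
	unsigned mode;
};

/*
 * Re-point every entry that refers to a field of *init at the same
 * field of *ns; make every other entry read-only.
 */
void rebase_entries(struct entry *tbl, struct ns *init, struct ns *ns)
{
	uintptr_t lo = (uintptr_t)init;
	uintptr_t hi = (uintptr_t)(init + 1);

	for (struct entry *e = tbl; e->name; e++) {
		uintptr_t p = (uintptr_t)e->data;

		if (p >= lo && p < hi)
			e->data = (char *)ns + (p - lo);	/* same offset, new object */
		else
			e->mode &= ~0222u;			/* still global: strip write bits */
	}
}
```

The range test is what lets one table mix per-namespace and global variables without a per-entry flag.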
[PATCH net-2.6.25 1/3]sysctl: make the sys.net.core sysctls per-namespace
Making them per-namespace is required for the following two reasons: First, some ctl values have a per-namespace meaning. Second, making them writable from the sub-namespace is an isolation hole. So I introduce the pernet operations to create these tables. For init_net I use the existing statically declared tables, for sub-namespace they are duplicated and the write bits are removed from the mode. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index f97b2a4..d593611 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -37,6 +37,9 @@ struct net { struct sock *rtnl; /* rtnetlink socket */ + /* core sysctls */ + struct ctl_table_header *sysctl_core_hdr; + /* List of all packet sockets. */ rwlock_tpacket_sklist_lock; struct hlist_head packet_sklist; diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index e322713..57a7ead 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -151,18 +151,58 @@ static struct ctl_table net_core_table[] = { { .ctl_name = 0 } }; -static __initdata struct ctl_path net_core_path[] = { +static __net_initdata struct ctl_path net_core_path[] = { { .procname = "net", .ctl_name = CTL_NET, }, { .procname = "core", .ctl_name = NET_CORE, }, { }, }; -static __init int sysctl_core_init(void) +static __net_init int sysctl_core_net_init(struct net *net) { - struct ctl_table_header *hdr; + struct ctl_table *tbl, *tmp; + + tbl = net_core_table; + if (net != &init_net) { + tbl = kmemdup(tbl, sizeof(net_core_table), GFP_KERNEL); + if (tbl == NULL) + goto err_dup; + + for (tmp = tbl; tmp->procname; tmp++) + tmp->mode &= ~0222; + } + + net->sysctl_core_hdr = register_net_sysctl_table(net, + net_core_path, tbl); + if (net->sysctl_core_hdr == NULL) + goto err_reg; - hdr = register_sysctl_paths(net_core_path, net_core_table); - return hdr == NULL ? 
-ENOMEM : 0; + return 0; + +err_reg: + if (tbl != net_core_table) + kfree(tbl); +err_dup: + return -ENOMEM; +} + +static __net_exit void sysctl_core_net_exit(struct net *net) +{ + struct ctl_table *tbl; + + tbl = net->sysctl_core_hdr->ctl_table_arg; + unregister_net_sysctl_table(net->sysctl_core_hdr); + BUG_ON(tbl == net_core_table); + kfree(tbl); +} + +static __net_initdata struct pernet_operations sysctl_core_ops = { + .init = sysctl_core_net_init, + .exit = sysctl_core_net_exit, +}; + +static __init int sysctl_core_init(void) +{ + return register_pernet_subsys(&sysctl_core_ops); } __initcall(sysctl_core_init); -- 1.5.3.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] TCP illinois max rtt aging
On Fri, 7 Dec 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET) > > > I guess if you get a large cumulative ACK, the amount of processing is > > still overwhelming (added DaveM if he has some idea how to combat it). > > > > Even a simple scenario (this isn't anything fancy at all, will occur all > > the time): Just one loss => rest skbs grow one by one into a single > > very large SACK block (and we do that efficiently for sure) => then the > > fast retransmit gets delivered and a cumulative ACK for whole orig_window > > arrives => clean_rtx_queue has to do a lot of processing. In this case we > > could optimize RB-tree cleanup away (by just blanking it all) but still > > getting rid of all those skbs is going to take a larger moment than I'd > > like to see. > > > > That tree blanking could be extended to cover anything which ACK more than > > half of the tree by just replacing the root (and dealing with potential > > recolorization of the root). > > Yes, it's the classic problem. But it ought to be at least > partially masked when TSO is in use, because we'll only process > a handful of SKBs. The more effectively TSO batches, the > less work clean_rtx_queue() will do. No, that's not what is going to happen, TSO won't help at all because one-by-one SACKs will fragment every single one of them (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO case, or am I missing something? > Web100 just provides statistics and other kinds of connection data > to userspace, all the actual algorithm etc. modifications have been > merged upstream and yanked out of the web100 patch. I was looking > at it the other night and it's frankly totally uninteresting these > days :-) ...Thanks, I'll keep that in my mind when looking... :-) -- i.
Re: [RFC] TCP illinois max rtt aging
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)

> I guess if you get a large cumulative ACK, the amount of processing is
> still overwhelming (added DaveM if he has some idea how to combat it).
>
> Even a simple scenario (this isn't anything fancy at all, will occur all
> the time): Just one loss => rest skbs grow one by one into a single
> very large SACK block (and we do that efficiently for sure) => then the
> fast retransmit gets delivered and a cumulative ACK for whole orig_window
> arrives => clean_rtx_queue has to do a lot of processing. In this case we
> could optimize RB-tree cleanup away (by just blanking it all) but still
> getting rid of all those skbs is going to take a larger moment than I'd
> like to see.
>
> That tree blanking could be extended to cover anything which ACK more than
> half of the tree by just replacing the root (and dealing with potential
> recolorization of the root).

Yes, it's the classic problem. But it ought to be at least partially masked when TSO is in use, because we'll only process a handful of SKBs. The more effectively TSO batches, the less work clean_rtx_queue() will do.

When not doing TSO the behavior is super-stupid, we bump reference counts on the same page multiple times while running over the SKBs since consecutive SKBs cover data in different spans of the same page.

The core issue is that we have a poorly behaving data container, and therefore that's obviously what we need to change.

Conceptually what we probably need to do is separate the data maintenance from the SKB objects themselves. There is a blob that maintains the paged data state for everything in the retransmit queue. SKBs are built and get the page pointers but don't actually grab references to the pages, the blob does that and it keeps track of how many SKB references to each page there are, non-atomically.

The hardest part is dealing with the page lifetime issues. Unfortunately, when we trim the rtx queue, references to the clones can still exist in the driver output path. It's a difficult problem to overcome in fact, so in the end my suggestion above might not even be workable.

> No idea about what it could do, haven't yet looked web100, I was planning
> at some point of time...

Web100 just provides statistics and other kinds of connection data to userspace, all the actual algorithm etc. modifications have been merged upstream and yanked out of the web100 patch. I was looking at it the other night and it's frankly totally uninteresting these days :-)
Re: [PATCH] AF_RXRPC: Add a missing goto
From: David Howells <[EMAIL PROTECTED]>
Date: Fri, 07 Dec 2007 11:23:55 +

> Add a missing goto to error handling in the RXKAD security module for
> AF_RXRPC.
>
> Signed-off-by: David Howells <[EMAIL PROTECTED]>

Applied, thanks David.
[PATCH] AF_RXRPC: Add a missing goto
Add a missing goto to error handling in the RXKAD security module for AF_RXRPC.

Signed-off-by: David Howells <[EMAIL PROTECTED]>
---
 net/rxrpc/rxkad.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index e09a95a..8e69d69 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -1021,6 +1021,7 @@ static int rxkad_verify_response(struct rxrpc_connection *conn,
 
 	abort_code = RXKADINCONSISTENCY;
 	if (version != RXKAD_VERSION)
+		goto protocol_error;
 
 	abort_code = RXKADTICKETLEN;
 	if (ticket_len < 4 || ticket_len > MAXKRB5TICKETLEN)
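Why a single missing goto matters here: in C an if with no statement of its own takes the next statement as its body, so the version check ends up guarding the following assignment instead of rejecting the packet. The sketch below reproduces only that shape in user space. The constants and both functions are hypothetical; the length check is reduced to one bound and the real function does much more.

```c
/* Hypothetical abort codes and expected version, for illustration only. */
enum { ERR_VERSION = 1, ERR_TICKETLEN = 2 };
enum { EXPECTED_VERSION = 2 };

/* Shape before the fix: the version test swallows the next assignment. */
int check_buggy(int version, int ticket_len)
{
	int abort_code;

	abort_code = ERR_VERSION;
	if (version != EXPECTED_VERSION)
		abort_code = ERR_TICKETLEN;	/* became the body of the if */
	if (ticket_len < 4)
		goto protocol_error;
	return 0;

protocol_error:
	return abort_code;
}

/* Shape after the fix: a bad version is rejected immediately. */
int check_fixed(int version, int ticket_len)
{
	int abort_code;

	abort_code = ERR_VERSION;
	if (version != EXPECTED_VERSION)
		goto protocol_error;

	abort_code = ERR_TICKETLEN;
	if (ticket_len < 4)
		goto protocol_error;
	return 0;

protocol_error:
	return abort_code;
}
```

In this reduced form the buggy variant accepts a wrong version whenever the ticket length is fine, and reports the wrong abort code when a correct version arrives with a short ticket.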
Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage
Andrew Morton wrote:
> On Fri, 07 Dec 2007 04:51:37 + David Woodhouse <[EMAIL PROTECTED]> wrote:
>
>> On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
>>> Well I clearly goofed when I added the initial network namespace support
>>> for /proc/net. Currently things work but there are odd details visible
>>> to user space, even when we have a single network namespace.
>>>
>>> Since we do not cache proc_dir_entry dentries at the moment we can
>>> just modify ->lookup to return a different directory inode depending
>>> on the network namespace of the process looking at /proc/net, replacing
>>> the current technique of using a magic and fragile follow_link method.
>>>
>>> To accomplish that this patch:
>>> - introduces a shadow_proc method to allow different dentries to
>>>   be returned from proc_lookup.
>>> - Removes the old /proc/net follow_link magic
>>> - Fixes a weakness in our not caching of proc generic dentries.
>>>
>>> As shadow_proc uses a task struct to decided which dentry to return we
>>> can go back later and fix the proc generic caching without modifying any
>>> code that uses the shadow_proc method.
>>>
>>> Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
>>> ---
>>>  fs/proc/generic.c       |   12 ++-
>>>  fs/proc/proc_net.c      |   86 +++
>>>  include/linux/proc_fs.h |    3 ++
>>>  3 files changed, 19 insertions(+), 82 deletions(-)
>> (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
>>
>> This seems to have broken the use of /proc/bus/usb as a mountpoint. It
>> always appears empty now, whatever's supposed to be mounted there.
>>
>
> Yes. Denis and Eric are tossing around competing patches but afaik nobody
> is happy with any of them. Guys, could we get this sorted soonish please?
>

Andrew, I become too relaxed after receiving
"Tested-by: Giacomo Catenazzi <[EMAIL PROTECTED]>"

Eric, I believe that reverting an original behavior is better than your new one as
- you introduce search into the depth by calling have_submounts(dentry)
  during revalidation for all(!) /proc dentries
- your shadowing behavior will be broken if you'll mount something in the
  depth of shadowed tree (this can be done as a DoS attempt)

As a last minute call, may be it will be better to pin network namespace like a pid namespace during mount to avoid this crap at all?

Regards,
	Den
Re: [RFC] TCP illinois max rtt aging
On Thu, 6 Dec 2007, Lachlan Andrew wrote: > On 04/12/2007, Ilpo Järvinen <[EMAIL PROTECTED]> wrote: > > On Mon, 3 Dec 2007, Lachlan Andrew wrote: > > > > > > When SACK is active, the per-packet processing becomes more involved, > > > tracking the list of lost/SACKed packets. This causes a CPU spike > > > just after a loss, which increases the RTTs, at least in my > > > experience. > > > > I suspect that as long as old code was able to use hint, it wasn't doing > > that bad. But it was seriously lacking ability to take advantage of sack > > processing hint when e.g., a new hole appeared, or cumulative ACK arrived. > > > > ...Code available in net-2.6.25 might cure those. > > We had been using one of your earlier patches, and still had the > problem. I think you've cured the problem with SACK itself, but there > still seems to be something taking a lot of CPU while recovering from > the loss. I guess if you get a large cumulative ACK, the amount of processing is still overwhelming (added DaveM if he has some idea how to combat it). Even a simple scenario (this isn't anything fancy at all, will occur all the time): Just one loss => rest skbs grow one by one into a single very large SACK block (and we do that efficiently for sure) => then the fast retransmit gets delivered and a cumulative ACK for whole orig_window arrives => clean_rtx_queue has to do a lot of processing. In this case we could optimize RB-tree cleanup away (by just blanking it all) but still getting rid of all those skbs is going to take a larger moment than I'd like to see. That tree blanking could be extended to cover anything which ACK more than half of the tree by just replacing the root (and dealing with potential recolorization of the root). > It is possible that it was to do with web100 which we > have also been running, but I cut out most of the statistics from that > and still had problems. No idea about what it could do, haven't yet looked web100, I was planning at some point of time... -- i.
Re: [PATCH][VLAN] Merge tree equal tails in vlan_skb_recv
Pavel Emelyanov wrote:
> There are tree paths in it, that set the skb->proto and then
> perform common receive manipulations (basically call netif_rx()).
>
> I think, that we can make this code flow easier to understand
> by introducing the vlan_set_encap_proto() function (I hope the
> name is good) to setup the skb proto and merge the paths calling
> netif_rx() together.
>
> Surprisingly, but gcc detects this thing and merges these paths
> by itself, so this patch doesn't make the vlan module smaller.

I already have something similar queued, but your patch is a nice cleanup on top. I'll merge it into my tree and send it out after some testing, hopefully today.
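The three cases that the proposed vlan_set_encap_proto() distinguishes can be sketched as a pure classification function. This is a user-space illustration with hypothetical names (enum encap_kind, classify_encap()). It takes the encapsulated protocol field already converted to host byte order, needs at least two payload bytes, and assumes the IPX marker is the two bytes 0xFF 0xFF, a value the archived patch text does not preserve.

```c
#include <stdint.h>

/* Hypothetical result type; the kernel stores a protocol number instead. */
enum encap_kind {
	ENCAP_ETHERTYPE,	/* field is a real EtherType */
	ENCAP_RAW_802_3,	/* IPX directly over 802.3, no LLC header */
	ENCAP_LLC_802_2		/* ordinary 802.2 LLC frame */
};

/*
 * proto:   encapsulated protocol field, host byte order
 * payload: first bytes after the VLAN header (at least 2 bytes)
 */
enum encap_kind classify_encap(uint16_t proto, const unsigned char *payload)
{
	/* Values of 1536 and above are EtherTypes; smaller ones are lengths. */
	if (proto >= 1536)
		return ENCAP_ETHERTYPE;

	/* Assumed marker: 0xFFFF is not a valid 802.2 SSAP/DSAP pair. */
	if (payload[0] == 0xFF && payload[1] == 0xFF)
		return ENCAP_RAW_802_3;

	return ENCAP_LLC_802_2;
}
```

Keeping the classification separate from the receive path is what allows the three callers of netif_rx() to collapse into one, which is the point of the cleanup.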
Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage
On Fri, 07 Dec 2007 04:51:37 + David Woodhouse <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
> > Well I clearly goofed when I added the initial network namespace support
> > for /proc/net. Currently things work but there are odd details visible
> > to user space, even when we have a single network namespace.
> >
> > Since we do not cache proc_dir_entry dentries at the moment we can
> > just modify ->lookup to return a different directory inode depending
> > on the network namespace of the process looking at /proc/net, replacing
> > the current technique of using a magic and fragile follow_link method.
> >
> > To accomplish that this patch:
> > - introduces a shadow_proc method to allow different dentries to
> >   be returned from proc_lookup.
> > - Removes the old /proc/net follow_link magic
> > - Fixes a weakness in our not caching of proc generic dentries.
> >
> > As shadow_proc uses a task struct to decided which dentry to return we
> > can go back later and fix the proc generic caching without modifying any
> > code that uses the shadow_proc method.
> >
> > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
> > ---
> >  fs/proc/generic.c       |   12 ++-
> >  fs/proc/proc_net.c      |   86 +++
> >  include/linux/proc_fs.h |    3 ++
> >  3 files changed, 19 insertions(+), 82 deletions(-)
>
> (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
>
> This seems to have broken the use of /proc/bus/usb as a mountpoint. It
> always appears empty now, whatever's supposed to be mounted there.
>

Yes. Denis and Eric are tossing around competing patches but afaik nobody is happy with any of them. Guys, could we get this sorted soonish please?
[PATCH][VLAN] Merge three equal tails in vlan_skb_recv
There are three paths in it that set the skb->proto and then perform common receive manipulations (basically call netif_rx()).

I think that we can make this code flow easier to understand by introducing the vlan_set_encap_proto() function (I hope the name is good) to set up the skb proto and merge the paths calling netif_rx() together.

Surprisingly, gcc detects this and merges these paths by itself, so this patch doesn't make the vlan module smaller.

Fits both net-2.6 and net-2.6.25.

Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4f99bb8..11198c1 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -90,6 +90,40 @@ static inline struct sk_buff *vlan_check_reorder_header(struct sk_buff *skb)
 	return skb;
 }
 
+static inline void vlan_set_encap_proto(struct sk_buff *skb,
+		struct vlan_hdr *vhdr)
+{
+	__be16 proto;
+	unsigned char *rawp;
+
+	/*
+	 * Was a VLAN packet, grab the encapsulated protocol, which the layer
+	 * three protocols care about.
+	 */
+
+	proto = vhdr->h_vlan_encapsulated_proto;
+	if (ntohs(proto) >= 1536) {
+		skb->protocol = proto;
+		return;
+	}
+
+	rawp = skb->data;
+	if (*(unsigned short *)rawp == 0xFFFF)
+		/*
+		 * This is a magic hack to spot IPX packets. Older Novell
+		 * breaks the protocol design and runs IPX over 802.3 without
+		 * an 802.2 LLC layer. We look for FFFF which isn't a used
+		 * 802.2 SSAP/DSAP. This won't work for fault tolerant netware
+		 * but does for the rest.
+		 */
+		skb->protocol = htons(ETH_P_802_3);
+	else
+		/*
+		 * Real 802.2 LLC
+		 */
+		skb->protocol = htons(ETH_P_802_2);
+}
+
 /*
  * Determine the packet's protocol ID. The rule here is that we
  * assume 802.3 if the type field is short enough to be a length.
@@ -115,12 +149,10 @@ static inline struct sk_buff *vlan_check_reorder_header(struct sk_buff *skb)
 int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 		  struct packet_type* ptype, struct net_device *orig_dev)
 {
-	unsigned char *rawp = NULL;
 	struct vlan_hdr *vhdr;
 	unsigned short vid;
 	struct net_device_stats *stats;
 	unsigned short vlan_TCI;
-	__be16 proto;
 
 	if (dev->nd_net != &init_net) {
 		kfree_skb(skb);
@@ -236,70 +268,11 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 		break;
 	}
 
-	/* Was a VLAN packet, grab the encapsulated protocol, which the layer
-	 * three protocols care about.
-	 */
-	/* proto = get_unaligned(&vhdr->h_vlan_encapsulated_proto); */
-	proto = vhdr->h_vlan_encapsulated_proto;
-
-	skb->protocol = proto;
-	if (ntohs(proto) >= 1536) {
-		/* place it back on the queue to be handled by
-		 * true layer 3 protocols.
-		 */
-
-		/* See if we are configured to re-write the VLAN header
-		 * to make it look like ethernet...
-		 */
-		skb = vlan_check_reorder_header(skb);
-
-		/* Can be null if skb-clone fails when re-ordering */
-		if (skb) {
-			netif_rx(skb);
-		} else {
-			/* TODO: Add a more specific counter here. */
-			stats->rx_errors++;
-		}
-		rcu_read_unlock();
-		return 0;
-	}
-
-	rawp = skb->data;
-
-	/*
-	 * This is a magic hack to spot IPX packets. Older Novell breaks
-	 * the protocol design and runs IPX over 802.3 without an 802.2 LLC
-	 * layer. We look for FFFF which isn't a used 802.2 SSAP/DSAP. This
-	 * won't work for fault tolerant netware but does for the rest.
-	 */
-	if (*(unsigned short *)rawp == 0xFFFF) {
-		skb->protocol = htons(ETH_P_802_3);
-		/* place it back on the queue to be handled by true layer 3 protocols.
-		 */
-
-		/* See if we are configured to re-write the VLAN header
-		 * to make it look like ethernet...
-		 */
-		skb = vlan_check_reorder_header(skb);
-
-		/* Can be null if skb-clone fails when re-ordering */
-		if (skb) {
-			netif_rx(skb);
-		} else {
-			/* TODO: Add a more specific counter here. */
-			stats->rx_errors++;
-		}
-		rcu_read_unlock();
-		return 0;
-	}
-
-	/*
-	 * Real 802.2 LLC
-	 */
-	skb->protocol = htons(ETH_P_802_2);
 	/* place it back on the queue to be handled
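For readers who want to try the classification outside the kernel, here is a minimal userspace sketch of the decision that vlan_set_encap_proto() makes. The function name sketch_encap_proto() and the SKETCH_* constants are made up for illustration (the values mirror ETH_P_802_3 and ETH_P_802_2 from <linux/if_ether.h>), and the 0xFFFF marker is the value assumed for the raw-IPX check. It is not the kernel code, only the same three-way choice on plain integers.

```c
#include <arpa/inet.h>	/* ntohs(), htons() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the Ethertype values, so the sketch builds anywhere. */
#define SKETCH_ETH_P_802_3 0x0001	/* raw 802.3 frames (IPX hack) */
#define SKETCH_ETH_P_802_2 0x0004	/* 802.2 LLC frames */

/*
 * Hypothetical helper: given the encapsulated protocol field from the
 * VLAN header (network byte order) and a pointer to the payload that
 * follows it, return the protocol value (network byte order) that the
 * receive path would store in skb->protocol.
 */
static uint16_t sketch_encap_proto(uint16_t encap_proto_be,
				   const unsigned char *payload)
{
	uint16_t first_word;

	assert(payload != NULL);

	/* Values >= 1536 are real Ethertypes and are passed through. */
	if (ntohs(encap_proto_be) >= 1536)
		return encap_proto_be;

	/* Otherwise the field is a length: peek at the first two bytes. */
	memcpy(&first_word, payload, sizeof(first_word));
	if (first_word == 0xFFFF)	/* assumed raw-IPX marker */
		return htons(SKETCH_ETH_P_802_3);

	return htons(SKETCH_ETH_P_802_2);
}
```

The memcpy() is used instead of the pointer cast from the patch only to keep the sketch free of alignment assumptions in userspace.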
[PATCH net-2.6.25] Remove unused devconf macros
The SNMP_INC_STATS_OFFSET_BH is used only by ICMP6_INC_STATS_OFFSET_BH. The ICMP6_INC_STATS_OFFSET_BH is unused.

Can we drop them?

Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index a84f3f6..38df94b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -143,14 +143,6 @@ DECLARE_SNMP_STAT(struct icmpv6msg_mib, icmpv6msg_statistics);
 #define ICMP6_INC_STATS_BH(idev, field)	_DEVINC(icmpv6, _BH, idev, field)
 #define ICMP6_INC_STATS_USER(idev, field) _DEVINC(icmpv6, _USER, idev, field)
 
-#define ICMP6_INC_STATS_OFFSET_BH(idev, field, offset) ({ \
-	struct inet6_dev *_idev = idev; \
-	__typeof__(offset) _offset = (offset); \
-	if (likely(_idev != NULL)) \
-		SNMP_INC_STATS_OFFSET_BH(_idev->stats.icmpv6, field, _offset); \
-	SNMP_INC_STATS_OFFSET_BH(icmpv6_statistics, field, _offset); \
-})
-
 #define ICMP6MSGOUT_INC_STATS(idev, field) \
 	_DEVINC(icmpv6msg, , idev, field +256)
 #define ICMP6MSGOUT_INC_STATS_BH(idev, field) \
diff --git a/include/net/snmp.h b/include/net/snmp.h
index ea206bf..9c5793d 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -134,8 +134,6 @@ struct linux_mib {
 #define SNMP_INC_STATS_BH(mib, field) \
 	(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field]++)
-#define SNMP_INC_STATS_OFFSET_BH(mib, field, offset) \
-	(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field + (offset)]++)
 #define SNMP_INC_STATS_USER(mib, field) \
 	(per_cpu_ptr(mib[1], raw_smp_processor_id())->mibs[field]++)
 #define SNMP_INC_STATS(mib, field) \
-- 
1.5.3.4
Re: sockets affected by IPsec always block (2.6.23)
On Friday, 7 December 2007 04:20, David Miller wrote:
> If IPSEC takes a long time to resolve, and we don't block, the
> connect() can hard fail (we will just keep dropping the outgoing SYN
> packet send attempts, eventually hitting the retry limit) in cases
> where if we did block it would not fail (because we wouldn't send
> the first SYN until IPSEC resolved).

David - I'm aware of this; the discussion is about which behaviour is ok.

Let's go back to a real-life example. I've already established that the squid web proxy has a poll() based main loop doing nonblocking connects, maybe with multiple threads.

Situation: One user wants to access a web page that needs IPSEC. The SA takes 30 seconds to come up.

a) Non-blocking connect is respected:
SYN packets during the first 30 seconds will be dropped as you said. The connection can be completed on the next SYN retry (timeout in linux: 3 minutes). During this time, the 500 other users can continue to browse using the proxy.

b) Non-blocking connect is ignored during IPSEC resolving, as you advocate:
The connection for the one user can be completed immediately after IPSEC comes up. That's the pro. However, until then, the other 500 proxy users CANNOT ACCESS THE WEB because squid's threads are stuck in connect()s on sockets they configured not to block. If the IPSEC SA never resolves due to some network outage, squid will sleep forever, or until an admin configures it not to connect to the address in question and restarts it.

Don't you realize how broken this behaviour is? Can you give me ONE example of an application that works better with b), and why this outweighs the problems it creates for everybody else?

Even the DNS example you posted in <[EMAIL PROTECTED]> is wrong, because the second server will never be queried if the kernel puts the process into a coma while the IPSEC SA to the first server cannot be resolved.

Stefan
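The pattern a poll() based application relies on in case a) can be sketched roughly as follows. This is only an illustration of a non-blocking connect() completed through poll() and SO_ERROR, not squid's actual code; the names sketch_connect_start() and sketch_connect_finish() are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>	/* struct sockaddr_in, for callers */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Start a connect that must never put the caller to sleep.
 * Returns the socket fd, or -1 on immediate failure.
 */
static int sketch_connect_start(const struct sockaddr *addr, socklen_t len)
{
	int fd, flags;

	assert(addr != NULL);

	fd = socket(addr->sa_family, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd);
		return -1;
	}

	/* EINPROGRESS is the expected answer on a non-blocking socket. */
	if (connect(fd, addr, len) < 0 && errno != EINPROGRESS) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Wait up to timeout_ms for the connect to complete. Returns 0 when
 * connected, -1 on timeout, or a positive errno value on failure.
 * A real main loop would poll many fds at once instead of one.
 */
static int sketch_connect_finish(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int err = 0;
	socklen_t errlen = sizeof(err);
	int n = poll(&pfd, 1, timeout_ms);

	if (n == 0)
		return -1;
	if (n < 0)
		return errno;
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
		return errno;
	return err;
}
```

The point of the split is that the time between the two calls belongs to the application: it can keep serving other descriptors while the handshake (or, in this thread, the SA negotiation) is still pending.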
[PATCH][IPV4] Swap the ifa allocation with the "ipv4_devconf_setall" call
According to Herbert, ipv4_devconf_setall should be called only when the ifa is added to the device. However, a failed ifa allocation may leave things in an inconsistent state. Move the call to ipv4_devconf_setall after the ifa allocation.

Fits both net-2.6 (with offsets) and net-2.6.25 (cleanly).

Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0b5f042..1c3e20c 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -519,8 +519,6 @@ static struct in_ifaddr *rtm_to_ifaddr(struct nlmsghdr *nlh)
 		goto errout;
 	}
 
-	ipv4_devconf_setall(in_dev);
-
 	ifa = inet_alloc_ifa();
 	if (ifa == NULL) {
 		/*
@@ -531,6 +529,7 @@ static struct in_ifaddr *rtm_to_ifaddr(struct nlmsghdr *nlh)
 		goto errout;
 	}
 
+	ipv4_devconf_setall(in_dev);
 	in_dev_hold(in_dev);
 
 	if (tb[IFA_ADDRESS] == NULL)
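The ordering issue can be shown with a small standalone sketch: do the step that can fail (the allocation) first, and touch shared state only once it has succeeded. All names below (sketch_dev, sketch_ifa, sketch_build_ifa, sketch_alloc_fail) are hypothetical stand-ins, not kernel structures, and the allocator is passed in only so that the failure path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the per-device state touched by the real code. */
struct sketch_dev {
	int conf_all_set;	/* stands in for ipv4_devconf_setall() */
	int refcount;		/* stands in for in_dev_hold() */
};

struct sketch_ifa {
	struct sketch_dev *dev;
};

/* Allocator that always fails, to simulate memory pressure. */
static void *sketch_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Allocate first, commit afterwards: if the allocation fails,
 * the device is returned to the caller exactly as it was.
 */
static struct sketch_ifa *sketch_build_ifa(struct sketch_dev *dev,
					   void *(*alloc)(size_t))
{
	struct sketch_ifa *ifa;

	assert(dev != NULL);

	ifa = alloc(sizeof(*ifa));
	if (ifa == NULL)
		return NULL;	/* no device state was modified */

	dev->conf_all_set = 1;
	dev->refcount++;
	ifa->dev = dev;
	return ifa;
}
```

With the original ordering, the equivalent of conf_all_set would already be 1 when the allocation fails, which is the inconsistency the patch removes.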
Re: [PATCH 14/20] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 00:04:36 +0800

> A single list_head variable initialized with LIST_HEAD_INIT can almost
> always be replaced with a LIST_HEAD declaration; this shrinks the code
> and looks better.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Applied.
Re: [PATCH 13/20] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 00:01:26 +0800

> A single list_head variable initialized with LIST_HEAD_INIT can almost
> always be replaced with a LIST_HEAD declaration; this shrinks the code
> and looks better.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Applied.
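For reference, the two spellings this series swaps are equivalent. The sketch below re-declares the two macros in userspace, following their form in include/linux/list.h, so that the equivalence can be checked without a kernel tree; the variable names and sketch_list_empty() are made up for this example.

```c
/* Userspace copies of the kernel's list head type and macros. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) \
	struct list_head name = LIST_HEAD_INIT(name)

/* Old spelling: the variable name is written twice. */
static struct list_head sketch_old_style = LIST_HEAD_INIT(sketch_old_style);

/* New spelling: same object, declared and initialized in one go. */
static LIST_HEAD(sketch_new_style);

/* An empty list is one whose head points back at itself. */
static int sketch_list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Both variables end up as empty, self-referencing list heads; the second form is just shorter in the source.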
Re: [PATCH 2.6.25] net: move trie_local and trie_main into the proc iterator
From: "Denis V. Lunev" <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 18:00:12 +0300

> From: Eric W. Biederman <[EMAIL PROTECTED]>
>
> We only use these variables when displaying the trie in proc so
> place them into the iterator to make this explicit. We should
> probably do something smarter to handle the CONFIG_IP_MULTIPLE_TABLES
> case but at least this makes it clear that the silliness is limited
> to the display in /proc.
>
> Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

Applied.
Re: [PATCH 2.6.25] Remove ip_fib_local_table and ip_fib_main_table defines
From: "Denis V. Lunev" <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 17:58:25 +0300

> From: Eric W. Biederman <[EMAIL PROTECTED]>
>
> There are only 2 users and it doesn't hurt to call fib_get_table
> instead, and it makes it easier to make the fib network namespace
> aware.
>
> Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

Applied, thanks.
Re: [patch 6/6] ipv6 - route6/fib6 : dont panic a kmem_cache_create
From: Daniel Lezcano <[EMAIL PROTECTED]>
Date: Thu, 06 Dec 2007 14:53:35 +0100

> If kmem_cache_create() fails, the kernel will panic. That is acceptable
> if the system is booting, but if the ipv6 protocol is compiled as a module
> and it is loaded after the system has booted, do we want to panic instead
> of just failing to initialize the protocol?
>
> The init function now returns an error, and this error is checked during
> protocol initialization. So the ipv6 protocol will fail safely.
>
> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
> Acked-by: Benjamin Thery <[EMAIL PROTECTED]>

Also applied.
Re: [patch 5/6] ipv6 - make af_inet6 to check ip6_route_init return value
From: Daniel Lezcano <[EMAIL PROTECTED]>
Date: Thu, 06 Dec 2007 14:53:34 +0100

> The af_inet6 initialization function does not check the return code
> of the route initialization, so if something goes wrong, the protocol
> initialization will continue anyway.
> This patch takes into account the modifications made in the different
> route initialization subroutines: it checks the return value and
> makes the protocol initialization fail.
>
> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
> Acked-by: Benjamin Thery <[EMAIL PROTECTED]>

Applied, thanks!
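The general shape of what this series introduces, init routines that return an error code and a caller that checks it and unwinds, can be sketched as below. This is a generic illustration with made-up names (sketch_step_init, sketch_proto_init, and the sketch_fail_step knob used to force a failure), not the actual af_inet6 code.

```c
#include <errno.h>

/* Test knob (hypothetical): which step should fail; 0 means none. */
static int sketch_fail_step;

/* Bitmask of the steps that are currently initialized. */
static unsigned int sketch_live;

/* Each sub-init reports failure instead of panicking. */
static int sketch_step_init(int step)
{
	if (sketch_fail_step == step)
		return -ENOMEM;
	sketch_live |= 1u << (step - 1);
	return 0;
}

static void sketch_step_exit(int step)
{
	sketch_live &= ~(1u << (step - 1));
}

/*
 * Top-level init: check every return value and undo the steps that
 * already succeeded, in reverse order, before reporting the error.
 */
static int sketch_proto_init(void)
{
	int ret;

	ret = sketch_step_init(1);	/* e.g. FIB setup */
	if (ret)
		goto out;
	ret = sketch_step_init(2);	/* e.g. xfrm setup */
	if (ret)
		goto out_step1;
	ret = sketch_step_init(3);	/* e.g. rules setup */
	if (ret)
		goto out_step2;
	return 0;

out_step2:
	sketch_step_exit(2);
out_step1:
	sketch_step_exit(1);
out:
	return ret;
}
```

When such a protocol is built as a module, a non-zero return from the init function makes the module load fail, which is the behaviour the series prefers over a panic.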
Re: [patch 4/6] ipv6 - make ip6_route_init to return an error code
From: Daniel Lezcano <[EMAIL PROTECTED]>
Date: Thu, 06 Dec 2007 14:53:33 +0100

> The route initialization function does not return any value to notify if
> the initialization is successful or not. This patch checks all calls made
> for the initialization in order to return a value for the caller.
>
> Unfortunately, proc_net_fops_create will return a NULL pointer if
> CONFIG_PROC_FS is off, so we cannot check the return code without an
> ifdef CONFIG_PROC_FS block in the ip6_route_init function.
>
> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
> Acked-by: Benjamin Thery <[EMAIL PROTECTED]>

Applied, thanks.
Re: [patch 3/6] ipv6 - make fib6_rules_init to return an error code
From: Daniel Lezcano <[EMAIL PROTECTED]>
Date: Thu, 06 Dec 2007 14:53:32 +0100

> When the fib_rules initialization finishes, no return code is provided,
> so the caller has no way to know whether the initialization succeeded
> or failed. This patch fixes that.
>
> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
> Acked-by: Benjamin Thery <[EMAIL PROTECTED]>

Applied, thanks.
Re: [patch 2/6] ipv6 - make xfrm6_init to return an error code
From: Daniel Lezcano <[EMAIL PROTECTED]>
Date: Thu, 06 Dec 2007 14:53:31 +0100

> The xfrm initialization function does not return any error code, so
> if there is an error, the caller cannot be informed of it.
> This patch checks the return code of the different called functions
> in order to report a successful or failed initialization.
>
> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
> Acked-by: Benjamin Thery <[EMAIL PROTECTED]>

Applied, thanks.
Re: [patch 1/6] ipv6 - make fib6_init to return an error code
From: Daniel Lezcano <[EMAIL PROTECTED]>
Date: Thu, 06 Dec 2007 14:53:30 +0100

> If there is an error in the initialization function, nothing is reported
> to the caller. So I add a return value to be set by the init function.
>
> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
> Acked-by: Benjamin Thery <[EMAIL PROTECTED]>

Applied.

Please format your header subject lines as:

	[patch N/M] [IPV6]: Blah blah blah.

since this is what I edit them into anyway.

Thanks.
Re: [PATCH 2.6.25] multiple namespaces in the all dst_ifdown routines
From: "Denis V. Lunev" <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 15:17:46 +0300

> move dst entries to a namespace loopback to catch refcounting leaks.
>
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

Applied, thanks.