Re: [ANNOUNCE] ESFQ --> SFQ patches for Linux 2.6.24

2008-02-19 Thread Patrick McHardy

David Miller wrote:

From: "Brock Noland" <[EMAIL PROTECTED]>
Date: Sat, 9 Feb 2008 20:30:58 -0600


Is this going to be merged anytime soon?


If it gets submitted to the proper mailing list, it might.
'linux-net' is for user questions, it is not where the networking
developers hang out, 'netdev' is.

And you have to post patches for review, not URLs pointing to
the patches.  It has to be in the email, in an applicable form,
so people can review the thing properly.



Since SFQ is not exactly simple and I needed something like this
myself, I followed Paul's suggestion and added a new scheduler
(DRR) for this with more flexible limits.

I'll rediff against net-2.6.26 within the next few days and send
a final version for review (anyone interested is of course welcome
to review this version already :).
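
For readers who have not met deficit round robin before, here is a minimal user-space sketch of the quantum/deficit accounting the scheduler name refers to. It is not the code from the patch below: the names `drr_flow`, `drr_enqueue` and `drr_dequeue`, the fixed-size FIFO and the array of flows are all assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_PKTS 8

/* One flow: a small FIFO of packet lengths plus its DRR state. */
struct drr_flow {
    unsigned int pkts[MAX_PKTS]; /* queued packet lengths in bytes */
    size_t head, count;          /* FIFO read index and fill level */
    unsigned int quantum;        /* bytes credited per round, must be > 0 */
    long deficit;                /* bytes this flow may still send */
};

/* Queue a packet. A flow that was idle starts over with one quantum. */
static int drr_enqueue(struct drr_flow *f, unsigned int len)
{
    if (f->quantum == 0 || f->count == MAX_PKTS)
        return -1;
    if (f->count == 0)
        f->deficit = f->quantum;
    f->pkts[(f->head + f->count) % MAX_PKTS] = len;
    f->count++;
    return 0;
}

/*
 * Dequeue the next packet. *cur is the index of the flow being served.
 * Returns the index of the flow that sent and stores the packet length
 * in *len, or returns -1 when every flow is empty.
 */
static int drr_dequeue(struct drr_flow *flows, size_t nflows, size_t *cur,
                       unsigned int *len)
{
    size_t i, backlog = 0;

    for (i = 0; i < nflows; i++)
        backlog += flows[i].count;
    if (backlog == 0)
        return -1;

    for (;;) {
        struct drr_flow *f = &flows[*cur];

        if (f->count > 0) {
            unsigned int head_len = f->pkts[f->head];

            if ((long)head_len <= f->deficit) {
                /* Enough credit: send and stay on this flow. */
                f->deficit -= (long)head_len;
                f->head = (f->head + 1) % MAX_PKTS;
                f->count--;
                *len = head_len;
                return (int)*cur;
            }
            /* Not enough credit: top up and move to the next flow. */
            f->deficit += f->quantum;
        }
        *cur = (*cur + 1) % nflows;
    }
}
```

With two flows of quantum 500, one holding two 300-byte packets and the other a single 500-byte packet, the sketch sends 300 bytes from the first flow, then the 500-byte packet, then the remaining 300 bytes. Unused credit is carried over to the next round rather than lost.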

commit 13d0cc64d0f7fed945c357cf4ca43330c8f95ad2
Author: Patrick McHardy <[EMAIL PROTECTED]>
Date:   Mon Feb 18 22:21:55 2008 +0100

    [NET_SCHED]: Add DRR scheduler

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index dbb7ac3..2fca9c4 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -482,4 +482,20 @@ struct tc_netem_corrupt
 
 #define NETEM_DIST_SCALE   8192
 
+/* DRR */
+
+enum
+{
+   TCA_DRR_UNSPEC,
+   TCA_DRR_QUANTUM,
+   __TCA_DRR_MAX
+};
+
+#define TCA_DRR_MAX    (__TCA_DRR_MAX - 1)
+
+struct tc_drr_stats
+{
+   s32 deficit;
+};
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 82adfe6..7e1ab99 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -196,6 +196,9 @@ config NET_SCH_NETEM
 
  If unsure, say N.
 
+config NET_SCH_DRR
+   tristate "DRR scheduler"
+
 config NET_SCH_INGRESS
    tristate "Ingress Qdisc"
    depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 1d2b0f7..b055f74 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -28,6 +28,7 @@ obj-$(CONFIG_NET_SCH_TEQL)   += sch_teql.o
 obj-$(CONFIG_NET_SCH_PRIO) += sch_prio.o
 obj-$(CONFIG_NET_SCH_ATM)  += sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)    += sch_netem.o
+obj-$(CONFIG_NET_SCH_DRR)  += sch_drr.o
 obj-$(CONFIG_NET_CLS_U32)  += cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)   += cls_route.o
 obj-$(CONFIG_NET_CLS_FW)   += cls_fw.o
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
new file mode 100644
index 0000000..aa241b5
--- /dev/null
+++ b/net/sched/sch_drr.c
@@ -0,0 +1,534 @@
+/*
+ * net/sched/sch_drr.c Deficit Round Robin scheduler
+ *
+ * Copyright (c) 2008 Patrick McHardy <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct drr_class {
+   struct hlist_node           hlist;
+   u32                         classid;
+   unsigned int                refcnt;
+
+   struct gnet_stats_basic     bstats;
+   struct gnet_stats_queue     qstats;
+   struct gnet_stats_rate_est  rate_est;
+   struct list_head            alist;
+   struct Qdisc *              qdisc;
+
+   u32                         quantum;
+   s32                         deficit;
+};
+
+#define DRR_HSIZE  16
+
+struct drr_sched {
+   struct list_head    active;
+   struct tcf_proto *  filter_list;
+   unsigned int        filter_cnt;
+   struct hlist_head   clhash[DRR_HSIZE];
+   struct sk_buff *    requeue;
+};
+
+static unsigned int drr_hash(u32 h)
+{
+   h ^= h >> 8;
+   h ^= h >> 4;
+
+   return h & (DRR_HSIZE - 1);
+}
+
+static struct drr_class *drr_find_class(struct Qdisc *sch, u32 classid)
+{
+   struct drr_sched *q = qdisc_priv(sch);
+   struct drr_class *cl;
+   struct hlist_node *n;
+
+   hlist_for_each_entry(cl, n, &q->clhash[drr_hash(classid)], hlist) {
+       if (cl->classid == classid)
+           return cl;
+   }
+   return NULL;
+}
+
+static void drr_purge_queue(struct drr_class *cl)
+{
+   unsigned int len = cl->qdisc->q.qlen;
+
+   qdisc_reset(cl->qdisc);
+   qdisc_tree_decrease_qlen(cl->qdisc, len);
+}
+
+static const struct nla_policy drr_policy[TCA_DRR_MAX + 1] = {
+   [TCA_DRR_QUANTUM]   = { .type = NLA_U32 },
+};
+
+static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
+   struct nlattr **tca, unsigned long *arg)
+{
+   struct drr_sched *q = qdisc_priv(sch);
+   struct drr_class 

Re: conntrack doesn't always work when a bridge is used

2008-01-11 Thread Patrick McHardy

Damien Thébault wrote:

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index c1757c7..362fe89 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -285,12 +285,17 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
    skb->nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
 
    skb->dev = bridge_parent(skb->dev);
-   if (!skb->dev)
-       kfree_skb(skb);
-   else {
+   if (skb->dev) {
+       struct dst_entry *dst = skb->dst;
+
        nf_bridge_pull_encap_header(skb);
-       skb->dst->output(skb);
+
+       if (dst->hh)
+           return neigh_hh_output(dst->hh, skb);
+       else if (dst->neighbour)
+           return dst->neighbour->output(skb);
    }
+   kfree_skb(skb);
    return 0;
 }





I confirm that this patch solves the problem with this setup, thanks!


Thanks a lot for testing and providing all the data.


Does this mean that without this patch, DNAT doesn't work (correctly)
on a bridge?


DNAT itself works, but the incorrect POSTROUTING hook invocation
can break other things like packet mangling by NAT helpers.
-
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: conntrack doesn't always work when a bridge is used

2008-01-11 Thread Patrick McHardy

Patrick McHardy wrote:

Damien Thébault wrote:

On the router, I'm using this script :

ifconfig eth0 0.0.0.0 up
brctl addbr br0
brctl addif br0 eth0
ifconfig br0 192.168.1.70 up
ifconfig br0:0 192.168.2.70 up
iptables -t nat -A POSTROUTING -d 192.168.2.0/24 -j MASQUERADE
iptables -t nat -A PREROUTING -d 192.168.2.250 -j DNAT --to-destination 192.168.2.50




Thanks. It's the DNAT rule that's causing this: the bridge netfilter code
calls dst_output directly for bridged DNATed frames, causing these
hook invocations:

                PREROUTING
dst_output()    POSTROUTING
                FORWARD
                POSTROUTING


which is obviously broken. I'll see if I can come up with a fix for this.


It appears this has always been broken. Could you test this patch please?

The bridge code only calls dst_output to get a new destination MAC
address for the DNATed packet when the new destination is reachable
on the same bridge, so this patch simply hands the packet to the
neighbour output function without going through the IP stack.
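
To make the effect on the hook sequence concrete, the following toy model records which hooks a bridged, DNATed frame passes with and without the change. It is only a sketch of the behaviour described in this thread, not kernel code: `hook_trace`, `bridge_dnat_frame` and `count_hook` are hypothetical names, and the patched sequence (a single POSTROUTING) is inferred from the explanation above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TRACE 8

/* Ordered record of the hooks a frame has passed. */
struct hook_trace {
    const char *hooks[MAX_TRACE];
    size_t n;
};

static void trace_hook(struct hook_trace *t, const char *name)
{
    if (t->n < MAX_TRACE)
        t->hooks[t->n++] = name;
}

/* Walk one bridged, DNATed frame through the hooks. */
static void bridge_dnat_frame(struct hook_trace *t, int patched)
{
    trace_hook(t, "PREROUTING");
    if (!patched) {
        /* dst_output() enters the IP output path, which runs POSTROUTING. */
        trace_hook(t, "POSTROUTING");
    }
    /* Patched: neighbour output only resolves the MAC, no hook is run. */
    trace_hook(t, "FORWARD");
    trace_hook(t, "POSTROUTING");
}

/* Count how often a given hook appears in the trace. */
static size_t count_hook(const struct hook_trace *t, const char *name)
{
    size_t i, c = 0;

    for (i = 0; i < t->n; i++)
        if (strcmp(t->hooks[i], name) == 0)
            c++;
    return c;
}
```

In this model the unpatched path passes POSTROUTING twice and the patched path once, which is the difference the TRACE rules in this thread are meant to expose.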


diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index c1757c7..362fe89 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -285,12 +285,17 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
    skb->nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
 
    skb->dev = bridge_parent(skb->dev);
-   if (!skb->dev)
-       kfree_skb(skb);
-   else {
+   if (skb->dev) {
+       struct dst_entry *dst = skb->dst;
+
        nf_bridge_pull_encap_header(skb);
-       skb->dst->output(skb);
+
+       if (dst->hh)
+           return neigh_hh_output(dst->hh, skb);
+       else if (dst->neighbour)
+           return dst->neighbour->output(skb);
    }
+   kfree_skb(skb);
    return 0;
 }
 


Re: conntrack doesn't always work when a bridge is used

2008-01-11 Thread Patrick McHardy

Damien Thébault wrote:

On Jan 11, 2008 1:24 PM, Patrick McHardy <[EMAIL PROTECTED]> wrote:
  

No, this should work properly. I just tried to reproduce it,
but I only get a single POSTROUTING invocation. I tried with
real bridged traffic, traffic routed between two different
bridge devices and traffic routed between a bridge device
and a normal ethernet device, but everything seems to work
correctly.

Could you send me the commands you're using to configure
your setup and everything (routing, iptables, ...) that
could be related?




On the router, I'm using this script :

ifconfig eth0 0.0.0.0 up
brctl addbr br0
brctl addif br0 eth0
ifconfig br0 192.168.1.70 up
ifconfig br0:0 192.168.2.70 up
iptables -t nat -A POSTROUTING -d 192.168.2.0/24 -j MASQUERADE
iptables -t nat -A PREROUTING -d 192.168.2.250 -j DNAT --to-destination 192.168.2.50
modprobe nf_nat_ftp
echo 1 > /proc/sys/net/ipv4/ip_forward

And for logging :

modprobe ipt_LOG
iptables -t raw -A OUTPUT -p tcp --dport 21 -j TRACE
iptables -t raw -A OUTPUT -p tcp --sport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --dport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --sport 21 -j TRACE

I only have one interface (eth0), that's why I use br0 and br0:0, so
the wireshark captures show each packet twice, input on br0 and output
on br0:0 (or input on br0:0 and output on br0) when capturing on eth0.

On the ftp client/server :

ifconfig eth2 192.168.1.50
ifconfig eth2:0 192.168.2.50
ip route del 192.168.2.0/24
ip route add 192.168.2.0/24 dev eth2 via 192.168.1.70

And then I try to connect to 192.168.2.250; this will use the router
192.168.1.70 on eth2, will be DNATted to 192.168.2.50 and will come
back on eth2:0 on the ftp server.

Like the router captures, we have eth2 and eth2:0 together when
capturing on eth2.

This configuration will work fine, but if I run either of these on the
router, it will not work well anymore:

ifconfig br0:0 192.168.2.7 up

or

ifconfig br0:0 192.168.2.170 up

I don't think I'm using anything else.
  


Thanks. It's the DNAT rule that's causing this: the bridge netfilter code
calls dst_output directly for bridged DNATed frames, causing these
hook invocations:

                PREROUTING
dst_output()    POSTROUTING
                FORWARD
                POSTROUTING


which is obviously broken. I'll see if I can come up with a fix for this.



Re: conntrack doesn't always work when a bridge is used

2008-01-11 Thread Patrick McHardy

Damien Thébault wrote:

2008/1/2 Damien Thébault <[EMAIL PROTECTED]>:
  

On Dec 30, 2007 6:53 PM, Patrick McHardy <[EMAIL PROTECTED]> wrote:


Thanks. They still show the double POST_ROUTING effects (the retransmitted
\0a), but I can't figure out why this would be happening. Please add TRACE
rules in both directions for the FTP control traffic and post the output.
This will allow us to verify that we're indeed dealing with double hook
invocations and not some other bug:

modprobe ipt_LOG
iptables -t raw -A OUTPUT -p tcp --dport 21 -j TRACE
iptables -t raw -A OUTPUT -p tcp --sport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --dport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --sport 21 -j TRACE
  

I tried to use the patch I created earlier (the one adding the hooks
again). I said it worked, but it does not work every time.

By the way, Patrick, what do you think about this bug? Maybe I
shouldn't rely on bridges but it's a useful feature sometimes.
  


No, this should work properly. I just tried to reproduce it,
but I only get a single POSTROUTING invocation. I tried with
real bridged traffic, traffic routed between two different
bridge devices and traffic routed between a bridge device
and a normal ethernet device, but everything seems to work
correctly.

Could you send me the commands you're using to configure
your setup and everything (routing, iptables, ...) that
could be related?






Re: conntrack doesn't always work when a bridge is used

2007-12-30 Thread Patrick McHardy

Damien Thébault wrote:

On Dec 22, 2007 8:56 AM, Patrick McHardy <[EMAIL PROTECTED]> wrote:

Yes, the captures show the effects from the double POSTROUTING
invocation. Could you send me captures from the current net-2.6
tree?



Sure, here they are.
(I used David Miller's net-2.6.25 at 75fa3253609430f28da005da494ce5ad3b5c78a1 )
  


Thanks. They still show the double POST_ROUTING effects (the retransmitted
\0a), but I can't figure out why this would be happening. Please add TRACE
rules in both directions for the FTP control traffic and post the output.
This will allow us to verify that we're indeed dealing with double hook
invocations and not some other bug:

modprobe ipt_LOG
iptables -t raw -A OUTPUT -p tcp --dport 21 -j TRACE
iptables -t raw -A OUTPUT -p tcp --sport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --dport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --sport 21 -j TRACE

Thanks.




Re: conntrack doesn't always work when a bridge is used

2007-12-21 Thread Patrick McHardy

Damien Thébault wrote:

On Dec 20, 2007 12:25 PM, Patrick McHardy <[EMAIL PROTECTED]> wrote:

Thanks. Could you also post a tcpdump and enable conntrack logging
by doing "echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid"
and post the output of that, if any (you also need to load ipt_LOG
in case you're not using some other logging backend).



I captured three times. The first time ("bad1" files), the reply is
coming back, but the ftp client doesn't seem to handle it. The second
time ("bad2" files), there is a problem with sequence numbers. And
then the last time ("good" files), it's ok.

I had sequence number errors without the previous bridge patch, which
got merged into net-2.6. So I'll try again with the net-2.6 kernel.



Yes, the captures show the effects from the double POSTROUTING
invocation. Could you send me captures from the current net-2.6
tree?




Re: conntrack doesn't always work when a bridge is used

2007-12-20 Thread Patrick McHardy

Damien Thébault wrote:

On Dec 20, 2007 12:07 PM, Patrick McHardy <[EMAIL PROTECTED]> wrote:

Don't worry. I was just wondering because I asked for the output
of the *non-working* case :) Please post that and I'll look into it.



The fact is that this was the output of the non working case, they are similar.
I'm attaching the four files I just made, with both the working and
the non-working case.



Thanks. Could you also post a tcpdump and enable conntrack logging
by doing "echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid"
and post the output of that, if any (you also need to load ipt_LOG
in case you're not using some other logging backend).


Re: conntrack doesn't always work when a bridge is used

2007-12-20 Thread Patrick McHardy

Damien Thébault wrote:

Yes, when I'm using ip addresses with the same length, the conntrack
-E output is similar, and it's working.
But if I change the router's "wan"-side ip address to be longer or
shorter than the client's ip address, then it's non-working again.

I don't think it's something in the configuration: the results are
present on two different computers, one being an x86 little-endian
Debian laptop where I did the bisect, the other being an ARM XScale
big-endian board with a custom distro (nothing funny here, just
kernel, drivers, busybox and some utilities).

Well, I'm sorry, I don't want to bother anyone, but those are really
the results I'm seeing.



Don't worry. I was just wondering because I asked for the output
of the *non-working* case :) Please post that and I'll look into it.


Re: conntrack doesn't always work when a bridge is used

2007-12-20 Thread Patrick McHardy

Damien Thébault wrote:

On Dec 19, 2007 8:03 PM, Patrick McHardy <[EMAIL PROTECTED]> wrote:

Could you capture the conntrack events of the non-working
case with (run in parallel):

conntrack -E
conntrack -E expect



Sure, here it is :


That actually looks like it works properly.

New control connection:


[NEW] tcp  6 120 SYN_SENT src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 [UNREPLIED] src=192.168.2.50 dst=192.168.2.70
sport=21 dport=45090
 [UPDATE] tcp  6 60 SYN_RECV src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090
 [UPDATE] tcp  6 432000 ESTABLISHED src=192.168.1.5
dst=192.168.2.250 sport=45090 dport=21 src=192.168.2.50
dst=192.168.2.70 sport=21 dport=45090 [ASSURED]


New expectation for data connection:

> conntrack -E expect :
>
> 300 proto=6 src=192.168.2.50 dst=192.168.2.70 sport=0 dport=33344

New data connection matching expectation, both source and
destination properly NATed:


[NEW] tcp  6 120 SYN_SENT src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 [UNREPLIED] src=192.168.1.5 dst=192.168.2.250
sport=33344 dport=20
 [UPDATE] tcp  6 60 SYN_RECV src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
dport=20
 [UPDATE] tcp  6 432000 ESTABLISHED src=192.168.2.50
dst=192.168.2.70 sport=20 dport=33344 src=192.168.1.5
dst=192.168.2.250 sport=33344 dport=20 [ASSURED]
 [UPDATE] tcp  6 120 FIN_WAIT src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
dport=20 [ASSURED]
 [UPDATE] tcp  6 60 CLOSE_WAIT src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
dport=20 [ASSURED]
 [UPDATE] tcp  6 10 CLOSE src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
dport=20 [ASSURED]


Data connection closed


 [UPDATE] tcp  6 120 FIN_WAIT src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
 [UPDATE] tcp  6 60 CLOSE_WAIT src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
 [UPDATE] tcp  6 30 LAST_ACK src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
 [UPDATE] tcp  6 120 TIME_WAIT src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
 [UPDATE] tcp  6 10 CLOSE src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]


Control connection closed


[DESTROY] tcp  6 src=192.168.2.50 dst=192.168.2.70 sport=20
dport=33344 packets=4 bytes=559 src=192.168.1.5 dst=192.168.2.250
sport=33344 dport=20 packets=4 bytes=216



[DESTROY] tcp  6 src=192.168.1.5 dst=192.168.2.250 sport=45090
dport=21 packets=17 bytes=916 src=192.168.2.50 dst=192.168.2.70
sport=21 dport=45090 packets=12 bytes=1162


Both connections destroyed
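
The relationship between the two tuples in each event line can be written down as a small sketch: apply the DNAT and MASQUERADE rewrites to the original tuple, then swap source and destination to get the tuple expected for replies. The struct and function names below are hypothetical and this is not conntrack's own code; the addresses and ports in the usage note are taken from the control connection above.

```c
#include <string.h>

/* One direction of a connection as printed by "conntrack -E". */
struct tuple {
    const char *src, *dst;
    unsigned short sport, dport;
};

/*
 * Apply destination NAT and source NAT to the original tuple, then
 * invert it to get the tuple expected for reply packets.
 * A NULL address means "leave unchanged".
 */
static struct tuple expected_reply(struct tuple orig, const char *dnat_to,
                                   const char *snat_to)
{
    struct tuple nat = orig, reply;

    if (dnat_to)
        nat.dst = dnat_to;
    if (snat_to)
        nat.src = snat_to;

    reply.src = nat.dst;
    reply.dst = nat.src;
    reply.sport = nat.dport;
    reply.dport = nat.sport;
    return reply;
}

/* Compare two tuples field by field. */
static int tuple_equal(struct tuple a, struct tuple b)
{
    return strcmp(a.src, b.src) == 0 && strcmp(a.dst, b.dst) == 0 &&
           a.sport == b.sport && a.dport == b.dport;
}
```

Feeding in the original tuple 192.168.1.5:45090 -> 192.168.2.250:21 with DNAT to 192.168.2.50 and masquerading to 192.168.2.70 gives 192.168.2.50:21 -> 192.168.2.70:45090, the reply tuple shown in the [NEW] event.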


Re: conntrack doesn't always work when a bridge is used

2007-12-19 Thread Patrick McHardy

Damien Thébault wrote:

Hello,

I sent the quoted mail to linux-net with my problem yesterday, but I
did a git bisect today and I got the following output :


2bf540b73ed5b304e84bb4d4c390d49d1cfa0ef8 is first bad commit
commit 2bf540b73ed5b304e84bb4d4c390d49d1cfa0ef8
Author: Patrick McHardy <[EMAIL PROTECTED]>
Date:   Wed Dec 13 16:54:25 2006 -0800

[NETFILTER]: bridge-netfilter: remove deferred hooks

Remove the deferred hooks and all related code as scheduled in
feature-removal-schedule.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

:040000 040000 c49ea947455937566b6129991dde5e86f2453aae 6611736ce5c0fcde7627494b66b9ea94e37ea42e M  Documentation
:040000 040000 d0dd0700fe68f98b52687be3a0c31d73f7b15b81 f8ddf15a0389c5f5b7f2c11d7d0db039a660e1d5 M  include
:040000 040000 dafccf7ff8657be9adca6b28dbd365cdd6c01ca5 3eeb1cb4b16cc5cb698ab559b47ea6b0991d4d3a M  net


With exactly this version, the behaviour is similar to what I see
with 2.6.23 and 2.6.24-rc5 (see below).



Could you capture the conntrack events of the non-working
case with (run in parallel):

conntrack -E
conntrack -E expect


Re: [RFD] iptables: mangle table obsoletes filter table

2007-10-12 Thread Patrick McHardy

Al Boldi wrote:

Patrick McHardy wrote:
  

The netlink-based iptables successor I'm currently working on allows
tables to be created dynamically with user-specified priorities and
"built-in" chains. The only built-in tables will be those that need extra
processing (mangle/nat). So it should be possible to set up tables
basically any way you desire.



Wow!  How soon can we expect this to surface on mainline?


I can't tell at this point, there's still too much work to do
for a realistic estimate. I'll post patches to netfilter-devel
as soon as it's good enough for some real testing.



Re: [RFD] iptables: mangle table obsoletes filter table

2007-10-12 Thread Patrick McHardy
Jan Engelhardt wrote:
> On Oct 12 2007 15:48, Patrick McHardy wrote:
> 
>>The netlink based iptables successor I'm currently working on allows to
>>dynamically create tables with user-specified priorities and "built-in"
>>chains. The only built-in tables will be those that need extra
>>processing (mangle/nat). So it should be possible to set up tables
>>basically any way you desire.
> 
> 
> Will ebtables move a bit closer to iptables?


I didn't get to that part yet, but yes, that's one of the goals.


Re: [RFD] iptables: mangle table obsoletes filter table

2007-10-12 Thread Patrick McHardy
Jan Engelhardt wrote:
> On Oct 12 2007 16:30, Al Boldi wrote:

>>>>With the existence of the mangle table, how useful is the filter table?
>>>
>>>A similar discussion was back in March 2007.
>>>http://marc.info/?l=netfilter-devel&m=117394977210823&w=2
>>>http://marc.info/?l=netfilter-devel&m=117400063907706&w=2
>>>
>>>in the end, my proposal was something like
>>>http://jengelh.hopto.org/GFX0/nf_proposal2.svg
>>
>>Any chance you could publish this as something readable like text/html?
> 
> 
> Like, image/png?
> http://jengelh.hopto.org/GFX0/nf_proposal2.png


The netlink-based iptables successor I'm currently working on allows
tables to be created dynamically with user-specified priorities and
"built-in" chains. The only built-in tables will be those that need extra
processing (mangle/nat). So it should be possible to set up tables
basically any way you desire.





Re: [RFD] iptables: mangle table obsoletes filter table

2007-10-11 Thread Patrick McHardy
Please send mails discussing netfilter to netfilter-devel.

Al Boldi wrote:
> With the existence of the mangle table, how useful is the filter table?
> 
> Other than requiring the REJECT target to be ported to the mangle table, is 
> the filter table faster than the mangle table?

There are some minor differences in ordering (mangle comes before
DNAT, filter afterwards), but for most rulesets that's completely
irrelevant. The only difference that really matters is that mangle
performs rerouting in LOCAL_OUT for packets that had their routing
key changed, so it's really a superset of the filter table. If you
want to use REJECT in the mangle table, you just need to remove the
restriction to filter; it works fine. I would prefer to also remove
the restriction of MARK, CONNMARK etc. to mangle; they're used for
more than just routing today, so that restriction also doesn't make
much sense. Patches for this are welcome.
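
The ordering mentioned above follows from the hook priorities each table registers with. A minimal sketch, using the values of NF_IP_PRI_MANGLE, NF_IP_PRI_NAT_DST and NF_IP_PRI_FILTER from the kernel's include/linux/netfilter_ipv4.h; the `table_hook` struct and the helper names are made up for illustration, and the comparison only applies where the tables share a hook, such as LOCAL_OUT.

```c
#include <stdlib.h>

/* Priorities as defined in include/linux/netfilter_ipv4.h. */
enum {
    PRI_MANGLE  = -150, /* NF_IP_PRI_MANGLE */
    PRI_NAT_DST = -100, /* NF_IP_PRI_NAT_DST, where DNAT happens */
    PRI_FILTER  = 0     /* NF_IP_PRI_FILTER */
};

/* Hypothetical record pairing a table name with its hook priority. */
struct table_hook {
    const char *table;
    int priority;
};

/* qsort comparator: lower priority values run first. */
static int by_priority(const void *a, const void *b)
{
    const struct table_hook *x = a, *y = b;

    return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Put the tables in the order they would run on a shared hook. */
static void sort_by_priority(struct table_hook *hooks, size_t n)
{
    qsort(hooks, n, sizeof(*hooks), by_priority);
}
```

Sorting filter, nat and mangle this way yields mangle, then DNAT, then filter, matching the order given in the message.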

> If not, then shouldn't the filter table be obsoleted to avoid confusion?

That would probably confuse people. Just don't use it if you don't
need to.


Re: [A BUG?]: routing with multiple default routes

2007-07-30 Thread Patrick McHardy
Bill Davidsen wrote:
> Patrick McHardy wrote:
> 
>>> So is the "src" portion of my table intended to set the source IP, or
>>> did I misread the doc? And is that a bug or a feature?
>>> 
>>
>>
>>
>> Source address selection is done before the first packet is generated,
>> so the marks can't affect it. They just cause rerouting of the packet,
>> but don't change the source address afterwards.
>>   
> 
> So what is the source specification used for then? If it doesn't set the
> source IP, (and it doesn't, I need an SNAT rule), and it doesn't force
> that source IP out the designated NIC, (doesn't do that, that's why I
> came up with all the MARK rules), what exactly does it do? Or is it just
> for documentation?


With routing by fwmark, it's entirely useless. The only way to make
use of it would be to support setting a per-socket nfmark value.


Re: Hotplug and Multipath routes = lost route

2007-07-25 Thread Patrick McHardy
Dâniel Fraga wrote:
>   I have 2 cable modems on a server (Linux 2.6.22). I use
> multipath, so the route is something like this:
> 
> default 
> nexthop via 201.6.102.1  dev eth1 weight 256
> nexthop via 201.6.107.1  dev eth2 weight 128
> 
>   The first one (eth1) has a higher priority, then when it goes
> down, I can mark the interface eth1 down and Linux automatically
> detects the "dead" gateway and change the route to the second one.
> 
>   Ok. The problem is that when one of the modems goes down, and
> as they use the cdc_ether module to communicate via USB, the *entire*
> route is erased because one of the devices doesn't exist anymore. 
> 
>   It's not a problem with hotplug, since it's correct to remove
> the device and the route that would go through it. But it would be nice
> if the kernel just removed the specific "nexthop" which uses the
> inactive device instead of removing the entire default route.
> 
>   Is there a way to tell the kernel to do that? Or to not remove
> the route at all and just mark the "nexthop" with the inactive device
> as dead and wait for it to come back alive?


Simple solution, use fallback routes:

default
nexthop via dev eth1 ...
nexthop via dev eth2 ...
default nexthop via dev eth1 ..
default nexthop via dev eth2 ..

Removing either interface will cause the multipath route and
one of the other default routes to be removed and the remaining
one can take over.
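
The fallback idea can be modelled in a few lines: every route that has a nexthop on the removed device disappears, and the lookup falls through to whichever default route is left. This is a toy model, not the kernel's FIB code. The `route` struct, the helper names and the use of distinct metrics to rank the three default routes are assumptions, since the message does not say how the routes are ordered.

```c
#include <stddef.h>

#define MAX_HOPS 2

/* Toy default route; distinct metrics are an assumption. */
struct route {
    int metric;        /* lower wins */
    int alive;         /* 0 once the route has been removed */
    size_t nhops;      /* number of nexthops in use */
    int dev[MAX_HOPS]; /* interface index of each nexthop */
};

/* Unregistering a device drops every route with a nexthop on it. */
static void device_removed(struct route *rt, size_t n, int dev)
{
    size_t i, h;

    for (i = 0; i < n; i++)
        for (h = 0; h < rt[i].nhops; h++)
            if (rt[i].dev[h] == dev)
                rt[i].alive = 0;
}

/* Index of the surviving default route with the lowest metric, or -1. */
static int best_default(const struct route *rt, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++)
        if (rt[i].alive && (best < 0 || rt[i].metric < rt[best].metric))
            best = (int)i;
    return best;
}
```

With a multipath route over eth1 and eth2 plus one single-nexthop default per interface, removing eth1 takes out the multipath route and the eth1 fallback, and the lookup then returns the eth2 default.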


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Patrick McHardy
Al Boldi wrote:
> Patrick McHardy wrote:
> 
>>Al Boldi wrote:
>>
>>>Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead
>>>of NF_CONNTRACK.
>>
>>I guess so, and that would have to select NF_CONNTRACK.
> 
> 
> Should I resend, or can you take care of it?


Please resend after testing.


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Patrick McHardy
Al Boldi wrote:
> Patrick McHardy wrote:
> 
>>Al Boldi wrote:
>>
>>>Patrick McHardy wrote:
>>>
>>>>But I vaguely recall having tried this myself and it broke somewhere,
>>>>maybe it was because of the NF_CONNTRACK_ENABLED option, I can't
>>>>recall anymore. Al, if this also works without removal of
>>>>NF_CONNTRACK_ENABLED, please resend without that part.
>>>
>>>It doesn't.  But how about this, if you really can't live without
>>>NF_CONNTRACK_ENBLED:
>>>
>>>==
>>>--- Kconfig.old  2007-07-09 06:38:52.0 +0300
>>>+++ Kconfig  2007-07-24 20:24:27.0 +0300
>>>@@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
>>>   and is also scheduled to replace the old syslog-based ipt_LOG
>>>   and ip6t_LOG modules.
>>>
>>>-# Rename this to NF_CONNTRACK in a 2.6.25
>>>-config NF_CONNTRACK_ENABLED
>>>+config NF_CONNTRACK
>>> tristate "Netfilter connection tracking support"
>>> help
>>>   Connection tracking keeps a record of what packets have passed
>>>@@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED
>>>
>>>   To compile it as a module, choose M here.  If unsure, say N.
>>>
>>>-config NF_CONNTRACK
>>>+config NF_CONNTRACK_ENABLED
>>> tristate
>>>-default NF_CONNTRACK_ENABLED
>>>+default NF_CONNTRACK
>>>
>>> config NF_CT_ACCT
>>> bool "Connection tracking flow accounting"
>>
>>That defeats the only purpose why we kept it.
> 
> 
> I'm not sure how this would defeat the only purpose.  Isn't the purpose of 
> this to alias NF_CONNTRACK_ENABLED to NF_CONNTRACK?  And as such would yield 
> the same result.


The purpose is to avoid forcing people a second time to reconfigure
the conntrack options since we've completed nf_conntrack and removed
ip_conntrack. Previously NF_CONNTRACK was a bool (selecting the new
implementation) and NF_CONNTRACK_ENABLED specified whether to build
either nf_conntrack or ip_conntrack modular/static/not at all. So
old configs only have the information whether to build modular in
NF_CONNTRACK_ENABLED, but NF_CONNTRACK is what actually controls it.
With your change, old configs will still build nf_conntrack properly,
but they will always choose static linking.
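
The argument can be spelled out with a toy model of an old configuration as described above: NF_CONNTRACK_ENABLED carries the n/m/y choice, while NF_CONNTRACK was only a bool. The types and function names below are hypothetical and this is not how Kconfig itself evaluates symbols; it is only meant to show why the swapped layout loses the "m".

```c
/* Kconfig tristate values. */
enum tristate { TRI_N, TRI_M, TRI_Y };

/* What an old .config records, per the description above. */
struct old_config {
    enum tristate nf_conntrack_enabled; /* n/m/y chosen by the user */
    int nf_conntrack;                   /* old bool: 1 = "y", 0 = "n" */
};

/* Current layout: hidden NF_CONNTRACK defaults to NF_CONNTRACK_ENABLED. */
static enum tristate build_mode_current(const struct old_config *c)
{
    return c->nf_conntrack_enabled;
}

/*
 * Swapped layout: NF_CONNTRACK becomes the prompt symbol and takes the
 * value stored under that name, which in an old config is only y or n.
 */
static enum tristate build_mode_swapped(const struct old_config *c)
{
    return c->nf_conntrack ? TRI_Y : TRI_N;
}
```

For an old config with NF_CONNTRACK_ENABLED=m and NF_CONNTRACK=y, the current layout still builds a module, while the swapped layout ends up with a static build.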

> Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead of 
> NF_CONNTRACK.

I guess so, and that would have to select NF_CONNTRACK.



Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Patrick McHardy
Al Boldi wrote:
> Patrick McHardy wrote:
> 
>>But I vaguely recall having tried this myself and it broke somewhere,
>>maybe it was because of the NF_CONNTRACK_ENABLED option, I can't
>>recall anymore. Al, if this also works without removal of
>>NF_CONNTRACK_ENABLED, please resend without that part.
> 
> 
> It doesn't.  But how about this, if you really can't live without 
> NF_CONNTRACK_ENBLED:
> 
> ==
> --- Kconfig.old   2007-07-09 06:38:52.0 +0300
> +++ Kconfig   2007-07-24 20:24:27.0 +0300
> @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
> and is also scheduled to replace the old syslog-based ipt_LOG
> and ip6t_LOG modules.
>  
> -# Rename this to NF_CONNTRACK in a 2.6.25
> -config NF_CONNTRACK_ENABLED
> +config NF_CONNTRACK
>   tristate "Netfilter connection tracking support"
>   help
> Connection tracking keeps a record of what packets have passed
> @@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED
>  
> To compile it as a module, choose M here.  If unsure, say N.
>  
> -config NF_CONNTRACK
> +config NF_CONNTRACK_ENABLED
>   tristate
> - default NF_CONNTRACK_ENABLED
> + default NF_CONNTRACK
>  
>  config NF_CT_ACCT
>   bool "Connection tracking flow accounting"


That defeats the only reason we kept it. How about we change this
once we remove it, in 2.6.25?


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Patrick McHardy
Sam Ravnborg wrote:
> On Tue, Jul 24, 2007 at 08:36:33AM +0300, Al Boldi wrote:
> 
>>Replaces NF_CONNTRACK_ENABLED with NF_CONNTRACK and selects it for 
>>NF_CONNTRACK_IPV4 and NF_CONNTRACK_IPV6
>>
>>This exposes IPv4/6 connection tracking options for easier Kconfig setup.
>>
>>Signed-off-by: Al Boldi <[EMAIL PROTECTED]>
>>Cc: David Miller <[EMAIL PROTECTED]>
>>Cc: Sam Ravnborg <[EMAIL PROTECTED]>
>>Cc: Andrew Morton <[EMAIL PROTECTED]>
>>---
>>--- a/net/netfilter/Kconfig   2007-07-09 06:38:52.0 +0300
>>+++ b/net/netfilter/Kconfig   2007-07-24 08:28:06.0 +0300
>>@@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
>>and is also scheduled to replace the old syslog-based ipt_LOG
>>and ip6t_LOG modules.
>> 
>>-# Rename this to NF_CONNTRACK in a 2.6.25
>>-config NF_CONNTRACK_ENABLED
>>+config NF_CONNTRACK


We kept this mainly for an easier upgrade. As the comment states, it
should go in 2.6.25, at which time all people having reconfigured
their kernel at least once since ip_conntrack was removed will have
the NF_CONNTRACK option set to the same value as NF_CONNTRACK_ENABLED.

>>--- a/net/ipv4/netfilter/Kconfig  2007-07-09 06:38:50.0 +0300
>>+++ b/net/ipv4/netfilter/Kconfig  2007-07-24 08:27:39.0 +0300
>>@@ -7,7 +7,7 @@ menu "IP: Netfilter Configuration"
>> 
>> config NF_CONNTRACK_IPV4
>>  tristate "IPv4 connection tracking support (required for NAT)"
>>- depends on NF_CONNTRACK
>>+ select NF_CONNTRACK
>>  ---help---
>>Connection tracking keeps a record of what packets have passed
>>through your machine, in order to figure out how they are related
>>--- a/net/ipv6/netfilter/Kconfig  2007-07-09 06:38:51.0 +0300
>>+++ b/net/ipv6/netfilter/Kconfig  2007-07-24 08:27:54.0 +0300
>>@@ -7,7 +7,8 @@ menu "IPv6: Netfilter Configuration (EXP
>> 
>> config NF_CONNTRACK_IPV6
>>  tristate "IPv6 connection tracking support (EXPERIMENTAL)"
>>- depends on INET && IPV6 && EXPERIMENTAL && NF_CONNTRACK
>>+ depends on INET && IPV6 && EXPERIMENTAL
>>+ select NF_CONNTRACK
>>  ---help---
>>Connection tracking keeps a record of what packets have passed
>>through your machine, in order to figure out how they are related
>>
> 
> This change looks wrong.
> Due to the reverse nature of "select" kconfig cannot fulfill the dependencies
> of selected symbols. So as a rule of thumb select should only select
> symbols with no menu and no dependencies to avoid some of the
> problems that have popped up during the last months.


In this case it looks OK since the dependencies of IPv4 connection
tracking (besides NF_CONNTRACK) are a superset of those of
nf_conntrack.

But I vaguely recall having tried this myself and it broke somewhere,
maybe it was because of the NF_CONNTRACK_ENABLED option, I can't
recall anymore. Al, if this also works without removal of
NF_CONNTRACK_ENABLED, please resend without that part.


Re: IPR2 + Netfilter: stateful _routing_ on inbound DNAT, in dual-homed setup?

2007-07-20 Thread Patrick McHardy
Frantisek Rysanek wrote:
> I know that Netfilter can do seamless stateful filtering of traffic 
> returning back through NAT. If I set up two uplinks with a NAT 
> "horizon split" on each of them, it shouldn't be a problem to route 
> traffic to either interface by merely modifying the default route 
> (for manual fail-over), or even by using multiple default routes with 
> IPR2 per-flow balancing mechanisms - and I won't create a routing 
> loop, as my public outbound source address will always belong to the 
> respective ISP, courtesy of the twin NAT outside's.  
> 
> Now what about *inbound* traffic? Suppose I've got a web server in 
> the DMZ. I'm wondering about possible fail-over setups with the two 
> ISP uplinks. I could set up two DNAT rules in Netfilter's 
> PREROUTING chain, one rule for each outside interface, both of them 
> pointing to the internal IP address of my web server. This would work 
> for the inbound packets, but how would the FW box deal with the 
> returning outbound traffic? I know that the Netfilter NAT can observe 
> the stateful information for filtering, but will IPR2 be able to 
> observe that information for *routing*? Not likely, I'd say. Never 
> heard of stateful *routing*. The necessary kernel guts could actually 
> be quite similar to the existing IPR2 per-flow balancing stuff, but I 
> doubt that this (dual-path stateful routing on NAT return traffic) 
> would work somehow seamlessly, out of the box, in the current 
> incarnation of IPR2+Netfilter... Obviously I can do without it, but 
> it would be a nice final touch :-)  
> 
> Any ideas are welcome :-)


You probably want CONNMARK combined with routing by fwmark.
That allows you to deal with NAT properly.
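
A minimal sketch of that suggestion, assuming hypothetical interface names, mark values, routing table numbers and gateway addresses (none of them come from the original post). New connections arriving on an uplink get a connection mark, replies coming back from the DMZ get that mark restored onto the packet, and an `ip rule` sends marked packets to a table whose default route points at the same uplink. The functions only print the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) the per-uplink rules.
# Hypothetical arguments: $1 uplink interface, $2 mark, $3 table number, $4 gateway
uplink_rules() {
    wan=$1; mark=$2; table=$3; gw=$4
    cat <<EOF
iptables -t mangle -A PREROUTING -i $wan -m state --state NEW -j CONNMARK --set-mark $mark
ip rule add fwmark $mark table $table
ip route add default via $gw dev $wan table $table
EOF
}

# Print the rule that copies the connection mark back onto reply packets
# entering from the DMZ side. Hypothetical argument: $1 DMZ interface
restore_rule() {
    echo "iptables -t mangle -A PREROUTING -i $1 -j CONNMARK --restore-mark"
}

# One uplink_rules call per ISP, each with its own mark and table.
uplink_rules eth1 1 101 192.0.2.1
uplink_rules eth2 2 102 198.51.100.1
restore_rule eth0
```

The restore happens in the mangle PREROUTING chain so that the mark is already on the packet when the routing decision is made.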



Re: [A BUG?]: routing with multiple default routes

2007-07-10 Thread Patrick McHardy
Bill Davidsen wrote:
> Still working on this problem, I have found what appears to be a bug.
> The documentation seems to indicate that in a route definition if I have
> "src x.x.x.x" it defines the outgoing IP address. I'm cautiously going
> to say that doesn't seem to be the case.
> 
> I have these rules:
> 
>firewall2:root> ip rule show
>0:  from all lookup local
>32765:  from all fwmark 0x1 lookup mail.in
>32766:  from all lookup main
>32767:  from all lookup 253
> 
> which should (do) send marked packets to the ruleset "mail.in" (historic
> name, not related to incoming mail). And I have these routes defined:
> 
>firewall2:root> ip route show
>192.168.1.0/24 dev eth2  proto kernel  scope link  src 192.168.1.47
>64.65.253.0/24 dev eth1  scope link
>192.168.12.0/24 dev eth0  scope link
>169.254.0.0/16 dev eth1  scope link
>127.0.0.0/8 dev lo  scope link
>default via 192.168.1.1 dev eth2  src 192.168.1.47  metric 1
>firewall2:root> ip route show table mail.in
>default via 64.65.253.1 dev eth1  src 64.65.253.246
> 
> And if I run my multi-NIC tcpdump, I see that packets which are not
> marked go out eth2, and those which ARE marked do in fact go out eth1 as
> they should... but with the source IP of the default route, rather than
> that specified in the mail.in definition. If I add
> 
> iptables -A POSTROUTING -t nat -o eth1 -m mark --mark 1 -j SNAT
>--to 64.65.253.246
> 
> to the nat table, all of a sudden everything works. Note, this is the
> simple two ISP case, running internal to my site, not the nasty one I
> described originally (below).
> 
> So is the "src" portion of my table intended to set the source IP, or
> did I misread the doc? And is that a bug or a feature?


Source address selection is done before the first packet is generated,
so the marks can't affect it. They just cause rerouting of the packet,
but don't change the source address afterwards.



Re: [PATCH] Conntrack SIP Problem

2007-06-18 Thread Patrick McHardy
Jerome Borsboom wrote:
> Signed-off-by: Jerome Borsboom <[EMAIL PROTECTED]>

Applied, thanks Jerome.


Re: [PATCH] Conntrack SIP Problem

2007-06-18 Thread Patrick McHardy
Jerome Borsboom wrote:
> This is a CC of a patch from my discussion on linux-net mailinglist
> which may be also appropriate here.


It wasn't CCed so I've added linux-net since you've also posted
the patch there.

> Below is a patch that I had to include on top of Herbert Xu's recent
> nat-sip patch to get my SIP setup working:
> 
> [NETFILTER]: sip: Fix RTP address NAT
> 
> My setup is a Fritzbox SIP-client behind a NAT-firewall that talks to a
> server on the internet. The first chunk of the patch was not necessary
> to get the setup working, but I think it is more correct to include it.
> The idea behind it is that DNAT of the RTP session is only necessary
> if the SIP session has been SNATed. The second chunk adds some SIP
> messages that must be processed as they contain SDP information in my case.


Thanks. They both look OK, but for 2.6.22 I'm only going to add
the new message types. I'll queue the first chunk for 2.6.23.
Can you please send me a Signed-off-by: line for your patch?



Re: [BUG] Conntrack SIP Problem

2007-06-14 Thread Patrick McHardy
[CCed netfilter-devel]

Herbert Xu wrote:
> We should correlate the RTP addresses in the SIP packets and setup
> a correct expectation once both are received.


This is a bit of a problem because if we want to NAT the port,
we can't guarantee that it's still unused when we finally set
up the expectation. We'd need a way to reserve ports to make
this work reliably.



Re: [BUG] Conntrack SIP Problem

2007-06-12 Thread Patrick McHardy
Jerome Borsboom wrote:
> I am not the maintainer of the NAT-SIP module. In the future, you should
> post similar requests to the linux-net mailinglist where it can be
> picked up by more skilled people. I have cross-posted my reply there too.


Actually netfilter-devel is the correct list for these kinds of
problems. Current -stable and -rc should work fine, please report
any problems you're still seeing.



Re: [PATCH] tbf scheduler: TSO support

2007-05-11 Thread Patrick McHardy
Hirokazu Takahashi wrote:
> I think the concept of TBF is quite good, but the userspace tools have
> become so old that they don't fit a Gb ethernet environment.
> The tools should be updated to handle much faster networks and
> GbE jumbo frames. I agree with you on this point.
> 
> On the other hand, handling TSO packet should be a kernel issue.
> A TSO packet is logically just a multiple segmented packet
> including several ordinary packets. This feature should be kept
> invisible from userspace.


Putting aside this question, this cannot work properly without userspace
since the rate table is only built for the given MTU. Your patch tries
to work around that by summing up the result of L2T in q->max_size
steps, which gives incorrect results with MPU or overhead settings.

I think we can only do two things without userspace support:

- disable TSO (not a good idea)
- split the skb into q->max_size chunks on dequeue

The latter would allow people to take full advantage of TSO with a properly
configured TBF, but it would still at least work with a too small mtu
value.



Re: [PATCH] tbf scheduler: TSO support

2007-05-10 Thread Patrick McHardy
David Miller wrote:
> From: Patrick McHardy <[EMAIL PROTECTED]>
> Date: Thu, 10 May 2007 14:56:39 +0200
> 
>>I don't see why this is needed, the correct way to use TBF with TSO
>>is to specify a larger MTU value, in which case it won't drop TSO
>>packets.
> 
> 
> Why should a user have to know anything in the world about TSO in
> order to configure TBF properly?  I don't think they should have
> to at all.


The user shouldn't necessarily, but userspace should.
The way I see it the MTU is a fundamental parameter for TBF
(the peakrate bucket size) and just because userspace picks
a bad default (2000) this is no reason to change the
implementation to something that is not really TBF anymore
and even affects non-TSO packets _and_ TSO packets even
when the MTU is chosen large enough (granted, the first
point is an implementation detail). The much better solution
would be to let userspace pick an appropriate default value
and still let the user override it.


Re: [PATCH] tbf scheduler: TSO support

2007-05-10 Thread Patrick McHardy
Hirokazu Takahashi wrote:
> TBF --- Simple Token Bucket Filter --- packet scheduler doesn't
> work correctly with TSO, in that it slows down sending packets.
> TSO packets will be discarded since their size can be larger than
> the scheduler expects. But this won't cause serious problems
> because the retransmitted packets can pass.
> 
> So I made the scheduler let TSO packets pass:
> 
>  - tbf_enqueue() accepts packets of any size if the netdevice
>    has TSO ability.
> 
>  - tbf_dequeue() can handle packets whose size is larger than
>    the bucket, which keeps tokens.
>    Any packet, which may be a TSO packet, can be sent if the bucket is
>    full of tokens. This may make the number of tokens in
>    the bucket negative, which is a kind of debt.
>    But we don't have to mind that, because the bucket will be refilled
>    with tokens in a short time and the count will turn positive
>    again.
> 
> I'm not sure if this approach is the best. I appreciate any comments.


I don't see why this is needed, the correct way to use TBF with TSO
is to specify a larger MTU value, in which case it won't drop TSO
packets.
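
As a rough illustration of that advice, the sketch below builds a `tc` command line with an explicit `mtu` large enough for a 64 KB TSO frame. The device name, rate, burst and latency values are placeholders rather than values from this thread, and the burst is chosen larger than the mtu on the assumption that the bucket must be able to hold one full-sized packet. The function only prints the command; it does not configure anything.

```shell
#!/bin/sh
# Build (but do not run) a TBF setup command with an explicit mtu.
# $1 device, $2 rate, $3 burst, $4 mtu in bytes; the values used below are placeholders.
tbf_cmd() {
    echo "tc qdisc add dev $1 root tbf rate $2 burst $3 latency 50ms mtu $4"
}

# 65536 bytes is meant to cover a full TSO frame instead of relying on the default mtu.
tbf_cmd eth0 100mbit 128kb 65536
```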



Re: iptables, 2.6.20.3 and state match

2007-03-27 Thread Patrick McHardy
David CHANIAL wrote:
> Hi,
> 
>   I'm trying to run this command:
> 
> iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
> 
>   but it always tells me:
> 
> iptables: Invalid argument
> 
>   I think I have checked all the necessary options in menuconfig,
> but you can find my .config here to check:
> 
> http://rafb.net/p/zToOyK10.html

You're missing CONFIG_NF_CONNTRACK_IPV4.
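
A quick way to check a kernel configuration for this kind of omission, as a sketch: the helper name `missing_opts` is made up for illustration, and the list of symbols to pass is up to the reader (only CONFIG_NF_CONNTRACK_IPV4 is named in this thread). It prints every given symbol that is not set to `y` or `m` in the config file.

```shell
#!/bin/sh
# Print each CONFIG_ symbol that is not built in (y) or modular (m).
# $1 is the path to a kernel .config; the remaining arguments are symbol names.
missing_opts() {
    config=$1
    shift
    for opt in "$@"; do
        grep -Eq "^${opt}"'=(y|m)$' "$config" || echo "$opt"
    done
}

# Example (assumes a .config in the current directory):
#   missing_opts .config CONFIG_NF_CONNTRACK CONFIG_NF_CONNTRACK_IPV4
```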


Re: problem with active ftp masqarading in kernel newer than 2.6.10-rc1

2005-04-22 Thread Patrick McHardy
Mateusz wrote:
> PORT 10,0,4,32,11,121
> 200 PORT command successful.
> STOR P4020553.JPG
> 500: " OR P4020553.JPG not understood."
> It seems that the first two letters of this command were cut off, and the
> server gets "OR" instead of the "STOR" command.

This patch should fix it.

Regards
Patrick
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/04/22 00:52:27+02:00 [EMAIL PROTECTED] 
#   [NETFILTER]: Fix NAT sequence number adjustment
#   
#   The NAT changes in 2.6.11 changed the position where helpers
#   are called and perform packet mangling. Before 2.6.11, a NAT
#   helper was called before the packet was NATed and had its
#   sequence number adjusted. Since 2.6.11, the helpers get packets
#   with already adjusted sequence numbers.
#   
#   This breaks sequence number adjustment, adjust_tcp_sequence()
#   needs the original sequence number to determine whether
#   a packet was a retransmission and to store it for further
#   corrections. It can't be reconstructed without more information
#   than available, so this patch restores the old order by
#   calling helpers from a new conntrack hook two priorities
#   below ip_conntrack_confirm() and adjusting the sequence number
#   from a new NAT hook one priority below ip_conntrack_confirm().
#   
#   Tracked down by Phil Oester <[EMAIL PROTECTED]>
#   
#   Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
# 
# net/ipv4/netfilter/ip_nat_standalone.c
#   2005/04/22 00:52:19+02:00 [EMAIL PROTECTED] +53 -1
#   [NETFILTER]: Fix NAT sequence number adjustment
# 
# net/ipv4/netfilter/ip_nat_core.c
#   2005/04/22 00:52:19+02:00 [EMAIL PROTECTED] +0 -9
#   [NETFILTER]: Fix NAT sequence number adjustment
# 
# net/ipv4/netfilter/ip_conntrack_standalone.c
#   2005/04/22 00:52:19+02:00 [EMAIL PROTECTED] +45 -6
#   [NETFILTER]: Fix NAT sequence number adjustment
# 
# include/linux/netfilter_ipv4.h
#   2005/04/22 00:52:19+02:00 [EMAIL PROTECTED] +3 -0
#   [NETFILTER]: Fix NAT sequence number adjustment
#   
#   The NAT changes in 2.6.11 changed the position where helpers
#   are called a

Re: [ANNOUNCE] iproute2 release

2005-03-12 Thread Patrick McHardy
Stephen Hemminger wrote:
> Minor update release for iproute2 is available.

It's missing this patch to use correct values for HZ. If you have
doubts, please say so, I'm not going to resend a fourth time. To
demonstrate the problem:

$ while true; do date; ip -s route get 172.16.195.200; sleep 1; done
Sat Mar 12 15:55:13 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133437sec users 2 used 18 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:14 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 19 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:15 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 20 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:16 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 21 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:17 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 22 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:18 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 23 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:19 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 24 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:20 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 25 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:21 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 26 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:22 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 27 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:23 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133436sec users 1 used 28 mtu 1500 advmss 1460 
hoplimit 64
Sat Mar 12 15:55:24 CET 2005
172.16.195.200 dev eth0  src 172.16.1.123
cache  expires 2133435sec users 1 used 29 mtu 1500 advmss 1460 
hoplimit 64
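
The arithmetic behind the problem, as a sketch: the expiry is reported as a tick count, and dividing it by the wrong tick rate scales the result. The tick count and the rate of 1000 below are made-up illustration values; `getconf CLK_TCK` is the usual way to query USER_HZ from a shell. A rate that is ten times too high would be consistent with the counter above dropping by one only about every ten seconds.

```shell
#!/bin/sh
# Convert a tick count to whole seconds for a given tick rate.
# $1 tick count, $2 ticks per second
ticks_to_sec() {
    echo $(( $1 / $2 ))
}

# 30000 ticks at a rate of 100 per second are 300 seconds ...
ticks_to_sec 30000 100
# ... but dividing the same count by an assumed rate of 1000 reports 30.
ticks_to_sec 30000 1000

# USER_HZ as seen by userspace on this machine (commonly 100).
getconf CLK_TCK
```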
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/01/29 10:59:53+01:00 [EMAIL PROTECTED] 
#   Use USER_HZ where necessary
# 
# BitKeeper/etc/logging_ok
#   2005/01/29 10:59:51+01:00 [EMAIL PROTECTED] +1 -0
#   Logging to [EMAIL PROTECTED] accepted
# 
# tc/tc_util.c
#   2005/01/29 10:59:48+01:00 [EMAIL PROTECTED] +1 -1
#   Use USER_HZ where necessary
# 
# lib/utils.c
#   2005/01/29 10:59:48+01:00 [EMAIL PROTECTED] +7 -0
#   Use USER_HZ where necessary
# 
# ip/iproute.c
#   2005/01/29 10:59:48+01:00 [EMAIL PROTECTED] +3 -3
#   Use USER_HZ where necessary
# 
# include/utils.h
#   2005/01/29 10:59:48+01:00 [EMAIL PROTECTED] +10 -0
#   Use USER_HZ where necessary
# 
diff -Nru a/include/utils.h b/include/utils.h
--- a/include/utils.h   2005-03-12 15:49:04 +01:00
+++ b/include/utils.h   2005-03-12 15:49:04 +01:00
@@ -113,4 +113,14 @@
return __iproute2_hz_internal;
 }
 
+extern int __iproute2_user_hz_internal;
+extern int __get_user_hz(void);
+
+static __inline__ int get_user_hz(void)
+{
+   if (__iproute2_user_hz_internal == 0)
+   __iproute2_user_hz_internal = __get_user_hz();
+   return __iproute2_user_hz_internal;
+}
+
 #endif /* __UTILS_H__ */
diff -Nru a/ip/iproute.c b/ip/iproute.c
--- a/ip/iproute.c  2005-03-12 15:49:04 +01:00
+++ b/ip/iproute.c  2005-03-12 15:49:04 +01:00
@@ -412,7 +412,7 @@
struct rta_cacheinfo *ci = RTA_DATA(tb[RTA_CACHEINFO]);
static int hz;
if (!hz)
-   hz = get_hz();
+   hz = get_user_hz();
if (ci->rta_expires != 0)
fprintf(fp, " expires %dsec", 
ci->rta_expires/hz);
if (ci->rta_error != 0)
@@ -439,7 +439,7 @@
if ((r->rtm_flags & RTM_F_CLONED) || (ci && ci->rta_expires)) {
static int hz;
if (!hz)
-   hz = get_hz();
+   hz = get_user_hz();
if (r->rtm_flags & RTM_F_CLONED)
fprintf(fp, "%scache ", _SL_);
if (ci->rta_expires)
@@ -491,7 +491,7 @@
if (i-2 < sizeof(mx_names)/sizeof(char*))
fprintf(fp, " %s", mx_names[i-2]);
else
-   fprintf(fp, " metric%d", i);
+   fprintf(fp, " metric %d", i);
if (mxlock & (1install/hz);
if (tm->lastuse != 0)


RE: BUG: Unintended (?) XFRM bypass

2005-02-22 Thread Patrick McHardy
On Tue, 22 Feb 2005, DuBuisson, Thomas wrote:
> Summary of important items:
> ip policy routing used.
> Packet matching policy not getting encrypted
> Kernel version 2.6.9
> ipsec-tools version 0.4rc1
> Using IPsec tunneling
>
> I'm more than happy to try suggestions or provide misc details.

I can't reproduce it with 2.6.11-rc4. Please try the latest kernel
to see if the problem persists.

Regards
Patrick


Re: BUG: Unintended (?) XFRM bypass

2005-02-22 Thread Patrick McHardy
DuBuisson, Thomas wrote:
> Please CC me on all responses.
>
> The XFRM framework seems to be bypassed by the use of advanced routing.
> I have run the following test:
> Network:A <---> B <-> C
> where the IP of 'B' on network AB is j (eth0)
> and the IP of 'B' on network BC is k (eth1)
> Kernel 2.6.x: Be sure to have: Advanced Routing->Policy Routing compiled in
> your kernel.
> A) Set up IPsec ESP tunnels between computers A and B (both IP addresses k
> and j)
> B) Send packets to 'A' from 'B' with IP 'k'.
> Do this with: ip route add A src k dev eth0
> C) Observe that these packets are unencrypted.

Works correctly here. Which kernel are you using? Please post your full
configuration (policies, routes, firewall rules) so we can see what's
different with your setup.

Regards
Patrick


Re: IPSec - Strange routing problem

2005-02-10 Thread Patrick McHardy
Tom Eastep wrote:
> Patrick,
>
> Are you aware of a way to install forward policies in a Roadwarrior
> scenario (racoon instance configured with "generate-policy on; passive
> on;")? I'm only able to generate in and out policies so the tunnel
> endpoint fails to forward traffic from the tunnel.

I sent a working but unfinished patch to ipsec-tools-devel some time ago,
hoping someone would pick it up and integrate it properly. I don't know
what happened to it.

Regards
Patrick


Re: IPSec - Strange routing problem

2005-02-09 Thread Patrick McHardy
Joshua Schmidlkofer wrote:
> I am setting up an IPSec tunnel (ESP) between two networks.  I can 
> ping the private side of the routers, from the routers, however, I 
> can't get anything else through.  It's very strange.  tcpdump shows 
> traffic going in the correct direction, but nothing gets out the 
> internal network.
> 
>  ## [Begin local tunnel: 10.1.1.0/24 ] 
>  spdadd 2.2.2.2 10.1.1.0/24  any -P in  ipsec
> esp/tunnel/2.2.2.2-5.5.5.5/require;
>  spdadd 10.1.1.0/24  2.2.2.2 any -P out ipsec
> esp/tunnel/5.5.5.5-2.2.2.2/require;
>  ## [End  ] 

You need to install forward policies. You can duplicate the input policy
and replace "-P in" with "-P fwd" or use setkey from a current ipsec-tools
release, which does this automatically.

Regards
Patrick
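
The duplication described in the reply can be sketched as a small filter. The helper name `fwd_from_in` is made up for illustration, and it assumes each policy sits on a single line (the policies quoted above are wrapped across two lines, so they would need to be joined first). It reads `spdadd` lines on stdin and prints a `-P fwd` copy of every `-P in` policy, leaving the originals untouched.

```shell
#!/bin/sh
# Print a "-P fwd" copy of every single-line "-P in" policy read from stdin.
# Lines without "-P in" (for example the "-P out" policies) produce no output.
fwd_from_in() {
    sed -n 's/ -P in / -P fwd /p'
}

# The input policy from the quoted configuration, joined onto one line.
printf '%s\n' 'spdadd 2.2.2.2 10.1.1.0/24 any -P in ipsec esp/tunnel/2.2.2.2-5.5.5.5/require;' | fwd_from_in
```

The output could then be appended to the existing policy file, or piped to `setkey -c`, which reads commands from standard input.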