question: frag_max_size not checked in ip_finish_output

2018-11-20 Thread Wenxin Wang
Dear developers,

It seems that with defragmentation enabled,
`ip_finish_output` doesn't honor `IPCB(skb)->frag_max_size`,
while `ip6_finish_output` checks `IP6CB(skb)->frag_max_size`.
(Sorry for reposting: I found that I needed to subscribe
to the mailing list, and I have also added the results of my experiment.)

The relevant code is here:
https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_output.c#L310
https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_output.c#L151

As far as I know, `frag_max_size` prevents the forwarding routine
from sending packets longer than the maximum fragment received.
I'm wondering why `ip_finish_output` doesn't do the same as
`ip6_finish_output`, especially when `ip_fragment` and `ip_do_fragment`,
called (indirectly) by `ip_finish_output` itself, both cap the output mtu
by this `frag_max_size`.
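
To make the comparison concrete, here is a minimal user-space sketch of the
two decisions as described above. It is a simplified model, not the kernel
code: `struct pkt_model` and both function names are hypothetical, the
non-zero guard on `frag_max_size` is an assumption of the sketch, and GSO,
the DF bit and other flags are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the fields discussed above;
 * this is not the kernel's struct sk_buff. */
struct pkt_model {
    unsigned int len;           /* models skb->len */
    unsigned int frag_max_size; /* models IPCB/IP6CB(skb)->frag_max_size;
                                 * 0 is assumed to mean "never reassembled" */
};

/* IPv4-style decision as described above: only the mtu is compared. */
static bool v4_needs_fragment(const struct pkt_model *p, unsigned int mtu)
{
    assert(mtu > 0);
    return p->len > mtu;
}

/* IPv6-style decision as described above: the mtu check, plus a check
 * against the largest fragment seen during reassembly. */
static bool v6_needs_fragment(const struct pkt_model *p, unsigned int mtu)
{
    assert(mtu > 0);
    return p->len > mtu ||
           (p->frag_max_size != 0 && p->len > p->frag_max_size);
}
```

For a reassembled packet with `len` 1500 and `frag_max_size` 1280 leaving
through a 1500 mtu link, the IPv4-style check says "do not fragment" while
the IPv6-style check says "fragment".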

I did some experiments with two connected machines: one acting as a router
with IPv4/IPv6 NAT and an mtu of 1500, the other as a client behind the NAT
with an mtu of 1280. Since NAT was enabled on the router, it performs
defragmentation. Using `traceroute`, the client sent UDP packets of length
1500, which the client itself fragmented. I captured packets
on the ingress and egress ports of the NAT router, and here's the output:

--- IPv4 with NAT (and defrag)
ingress:
09:26:39.030436 IP 192.168.1.2.44654 > 223.5.5.5.33434: UDP, bad length 1472 > 1248
09:26:39.030451 IP 192.168.1.2 > 223.5.5.5: ip-proto-17

egress:
09:26:39.030543 IP 202.38.101.2.58599 > 223.5.5.5.33437: UDP, length 1472

--- IPv6 with NAT (and defrag)
ingress:
09:18:26.947246 IP6 fdff::2 > 2001:250:3::1: frag (0|1232) 57242 > 33434: UDP, bad length 1452 > 1224
09:18:26.947262 IP6 fdff::2 > 2001:250:3::1: frag (1232|228)

egress:
09:18:26.947362 IP6 2001:da8:c4c6:2::2 > 2001:250:3::1: frag (0|1232) 49616 > 33437: UDP, bad length 1452 > 1224
09:18:26.947365 IP6 2001:da8:c4c6:2::2 > 2001:250:3::1: frag (1232|228)
-

It can be seen that, with defragmentation, IPv6 keeps the outgoing fragments
within `frag_max_size`, while IPv4 doesn't. I understand that IPv6 routers
are not allowed to meddle with fragmentation, while IPv4 routers can at least
further fragment packets; but judging from the behavior of `ip_fragment`,
I think the IPv4 code is also trying to honor `frag_max_size`, but doesn't
check it when deciding whether to fragment.
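
The gap can be sketched as follows. This is only an illustration in user
space: `capped_mtu` and `needs_fragment_with_cap` are hypothetical names, the
cap mirrors what `ip_do_fragment` is described as doing above, and the
treatment of 0 as "not reassembled" is an assumption.

```c
#include <assert.h>

/* Hypothetical helper: the mtu a fragmenter would use if it caps the
 * output mtu by frag_max_size (0 is assumed to mean "not reassembled"). */
static unsigned int capped_mtu(unsigned int mtu, unsigned int frag_max_size)
{
    assert(mtu > 0);
    if (frag_max_size != 0 && frag_max_size < mtu)
        return frag_max_size;
    return mtu;
}

/* Decision that compares the packet length against the capped mtu
 * instead of the raw one. This is the check being asked about, not
 * what ip_finish_output is observed to do. */
static int needs_fragment_with_cap(unsigned int len, unsigned int mtu,
                                   unsigned int frag_max_size)
{
    return len > capped_mtu(mtu, frag_max_size);
}
```

With the numbers from the IPv4 capture (a 1500-byte packet, an egress mtu of
1500, and `frag_max_size` taken as at most the client's 1280 mtu, the exact
value being an assumption), the raw comparison 1500 > 1500 is false, which
matches the unfragmented egress packet, while the capped comparison would
trigger fragmentation.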

Many thanks in advance!
If I'm sending this to the wrong person or the wrong mailing list, please let
me know. This is my first time asking Linux developers a question, so sorry
for the disturbance.

Thank you for making Linux great ;)
Sincerely,
Wenxin Wang

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


question: frag_max_size not checked in ip_finish_output

2018-11-20 Thread Wenxin Wang
Dear developers,
I'm trying to understand the difference in behavior between `ip_finish_output`
and `ip6_finish_output` when deciding whether to fragment.

`ip_finish_output` calls `ip_fragment` when `skb->len` exceeds the
destination mtu.
In addition to this mtu check, `ip6_finish_output` also checks whether
`skb->len > IP6CB(skb)->frag_max_size`.

The relevant code is here:
https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_output.c#L310
https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_output.c#L151

As far as I know, `frag_max_size` prevents the forwarding routine from sending
packets longer than the maximum fragment received after defragmentation.
I'm wondering why `ip_finish_output` doesn't similarly check
`IPCB(skb)->frag_max_size`, especially when `ip_fragment` and
`ip_do_fragment`, called (indirectly) by `ip_finish_output`,
both cap the output mtu by this `frag_max_size`.

Many thanks in advance!
If I'm sending this to the wrong person or the wrong mailing list, please let
me know. This is my first time asking Linux developers a question, so sorry
for the disturbance. Currently I'm not subscribed to any mailing list, but I
will subscribe if necessary.

Thank you for making Linux great ;)
Sincerely,
Wenxin Wang
