TPROXY and original dest address question
Hi,

I found some time to get back to my transparent proxy support for Netfilter. I posted a patch about 2 months ago which implemented a TPROXY target in its own tproxy table. It was able to redirect TCP sessions to a local socket, but it was missing a way to query the original destination address.

At the developers' workshop I agreed with Rusty that the destination address should be stored with the socket as soon as the connection is established.

So here's how it would work:

- the TPROXY target redirects a session

- the original destination address/port number is stored in the IPCB() part of the skb

- as soon as the socket is created, this address/port number is copied into sk->tp_pinfo.af_tcp (struct tcp_opt). This would happen in tcp_v4_hnd_req()

- this information is queried by the application using a getsockopt call to fetch the original destination address; the getsockopt can be implemented by registering an nf_sockopt_ops

I'd like to have the core members' advice: is this a good way? Harald?

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
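The four steps above can be modelled in plain userspace C. This is only a sketch of the data flow, not kernel code: `struct pkt_cb`, `struct pkt`, `struct conn_sock` and the three functions are hypothetical stand-ins for IPCB(), the skb, the TCP socket state and the getsockopt handler.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-packet control block (IPCB()). */
struct pkt_cb {
    uint32_t origdstaddr;   /* original destination address */
    uint16_t origdstport;   /* original destination port */
};

/* Hypothetical packet: the destination the stack sees, plus the cb. */
struct pkt {
    uint32_t daddr;
    uint16_t dport;
    struct pkt_cb cb;
};

/* Hypothetical stand-in for the per-socket TCP state. */
struct conn_sock {
    uint32_t origdstaddr;
    uint16_t origdstport;
    int has_origdst;
};

/* Steps 1 and 2: redirect to a local endpoint and remember the old one. */
static void tproxy_redirect(struct pkt *p, uint32_t laddr, uint16_t lport)
{
    p->cb.origdstaddr = p->daddr;
    p->cb.origdstport = p->dport;
    p->daddr = laddr;
    p->dport = lport;
}

/* Step 3: when the socket is created, copy the saved endpoint into it. */
static void sock_inherit_origdst(struct conn_sock *sk, const struct pkt *p)
{
    sk->origdstaddr = p->cb.origdstaddr;
    sk->origdstport = p->cb.origdstport;
    sk->has_origdst = 1;
}

/* Step 4: what a getsockopt handler would return; 0 on success, -1 if unset. */
static int sock_get_origdst(const struct conn_sock *sk,
                            uint32_t *addr, uint16_t *port)
{
    if (!sk->has_origdst)
        return -1;
    *addr = sk->origdstaddr;
    *port = sk->origdstport;
    return 0;
}
```

The point of copying at socket creation is that the skb is short-lived, while the application may ask for the address at any time during the connection.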
Re: TPROXY
Hi,

On Thu, Mar 14, 2002 at 06:05:43PM +0100, Jean-Michel Hemstedt wrote:
> - is there any update regarding TPROXY since 13/Feb/2002?

Not yet, sorry.

> - is TPROXY intended to replace 'slessdir' and 'IP_INTERCEPT'?

Yes.

> - will it be included in the kernel someday (which version?)?

As soon as it's ready, but it all depends on the coreteam.

> - does it provide the definitive patch for nonlocal binding?

Not yet, but it will support this.

> - are there examples on how to use it? (apart from the comments in the diff)?

Again, not yet. The release I made was very much a work in progress: it showed some signs of working, but it was not complete.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: TPROXY
On Wed, Mar 20, 2002 at 12:12:24AM +0100, Henrik Nordstrom wrote:
> > 2) the 50080/50443 applications rely on TPROXY framework and uses
> > nonlocal_bind.
>
> Except that nonlocal_bind do not yet work in TPROXY, does it?

Not yet.

> > > Zorp supports HTTPS, but it doesn't encapsulate it into CONNECT.
> > > It simply decrypts ongoing traffic, checks HTTP within it, and
> > > sends it on reencrypted. But for this to work you'd need to run
> > > Zorp on your firewall (where it was meant to run)
>
> At the cost of totally invalidating SSL in terms of proxying.
>
> - Client can no longer verify the authenticity of the origin server
> further than the proxy.
> - Servers can no longer authenticate or verify the client.
>
> Typical man-in-the-middle scenario.
>
> I assume we are talking about what is nominated by the IETF WREC
> group as "surrogate" servers rather than proxies here.. If not then
> decrypting proxied SSL traffic is a serious breach of security.

Tunnelling SSL through firewalls _is_ a more serious breach of security. It is a full-speed covert channel. IRC and ICQ clients began to use such holes in the firewall to send IRC/ICQ traffic.

Of course a proxy sitting between the client and the server means that peer certificates cannot be verified on the other peer. On the server side the firewall can perform this verification (and show a trusted certificate to the client).

Providing a client certificate to the server is not very common. If it is required, a tunnel can be opened to that _specific_ server, and nothing else.

So using a real decrypting HTTPS proxy for general https traffic, and opening holes to specific destinations, is definitely more secure than a simple 'pass-through' hole in the firewall.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: TPROXY and original dest address question
Hi,

I intend to develop real transparent proxy support for the 2.4 kernel series (not a backport of the original 2.2 code).

Since it affects the network core in a few places, I asked the question below on netfilter-devel and they directed me here. Could you please comment on it?

For reference, the implementation tries to touch the networking code as little as possible, so it rewrites destination addresses before they enter the networking core. It's a simple, stateless DNAT.

On Wed, Mar 27, 2002 at 08:59:01AM +0100, Harald Welte wrote:
> On Tue, Mar 26, 2002 at 04:21:04PM +0100, Balazs Scheidler wrote:
> > Hi,
> >
> > I found some time to get back to my transparent proxy support for Netfilter.
>
> cool. We'd really like to see this getting forward.
>
> > - TPROXY target redirects a session
> >
> > - the original destination address/port number is stored in the IPCB() part
> > of the skb
> >
> > - as soon as the socket is created this address/port number is copied into
> > sk->tp_pinfo.af_tcp (struct tcp_opt) This would happen in tcp_v4_hnd_req()
> >
> > - this information is queried by the application using a getsockopt call to
> > fetch the original destination address, the getsockopt can be implemented
> > by registering an nf_sockopt_ops
> >
> > I'd like to have the core-members advice, is this a good way? Harald?
>
> This looks fine to me, but I'm not as much into the sockets code as others
> are.
>
> If you want to make it really correct, I'd send that Mail to
> the [EMAIL PROTECTED] Mailinglist.
>
> David Miller, Andi Kleen and Alexey Kuznetsov (the networking gods) are hanging
> out on that list, so you might get some comments related the 'abuse' of
> tp_pinfo.af_tcp and IPCB() from them.
>
> Based on their reaction you will see if there is a need to change something
> or if they would like something like this in the kernel.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: TPROXY
On Wed, Mar 27, 2002 at 10:15:56AM +0100, Henrik Nordstrom wrote:
> On Tuesdayen den 26 March 2002 16.33, Balazs Scheidler wrote:
>
> > Providing a client certificate to the server is not very common, if it is
> > required a tunnel can be opened to that _specific_ server, and nothing
> > else.
> >
> > So using a real decrypting HTTPS proxy for general https traffic, and
> > opening holes to specific destinations is definitely more secure than a
> > simple 'pass-through' hole in the firewall.
>
> You missed the point here. Using a decryption HTTPS proxy invalidates both
> the use of client certificates AND the use of server certificates, which
> makes the use of SSL somewhat pointless. Further, unless the proxy runs its
> own CA trusted by the browsers then the users will always be warned that the
> server certificate is invalid when using such proxy.

I think you missed the point here. Of course the firewall verifies the server's certificate using its own trusted list of CAs.

The user is not capable of deciding whether a certificate presented to him really belongs to the given server. They simply press 'continue' without thinking that the server they are communicating with is fake.

Of course if you AND your users know what the hell a certificate is, they can decide, but I think you are a minority.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: TPROXY and original dest address question
On Wed, Mar 27, 2002 at 02:56:56PM +0100, Henrik Nordstrom wrote:
> Please don't forget UDP.

I won't. I definitely want UDP as well.

> For UDP you need to save the original destination, and then implement a
> control message extension for sending this to userspace together with the
> packet in response to a recvmsg() call, or hack the kernel to return the
> original destination in IP_PKTINFO (would be the most natural I think).
>
> See earlier post from me for a lengthy discussion on how one can do this in
> the current NAT scheme. If you save the address in the actual skb then this
> becomes even easier. In case of IP_PKTINFO only two lines..
>
> Reviewing the existing sockopt options the following seems like the correct
> calls:
>
> * For TCP, return the original destination in getsockopt(SOL_IP,
> IP_PKTOPTIONS...)
>
> * For UDP, return the original destination in the IP_PKTINFO recvmsg control
> message, and if possible, use the same to allow the application to control
> the source address when sending packets using sendmsg().

OK. I originally wanted to have separate getsockopt calls, but it's better to use already established ones. The only possible problem is that I need to touch the networking core, which I want to avoid.

> What I do not quite get is how TPROXY is supposed to handle return traffic,
> fragmented packets or ICMP, if you are doing stateless NAT.

It currently doesn't handle any of them. Fragmentation can be solved by defragmenting incoming packets (they are destined to the local IP stack anyway). ICMP can be handled in the prerouting hook by looking up possible transparent proxy entries.

> Also, who is responsible for making sure the application protocol is NAT:ed
> properly in TPROXY. For example FTP PASV. Is it the kernel, or is it the
> userspace proxy responsibility to get the correct (foreign) IP address in
> such case? And what about "related" connections such as an FTP data channel?

Of course the proxy itself.

How it currently works in Zorp (with kernel 2.2):

* the FTP command channel is redirected to the proxy
* when a PASV command is sent, a non-local bind is performed to bind to the server's IP & random port
* the PASV reply is rewritten to contain information about the allocated port
* the data channel is established when the client connects to the socket the firewall allocated
* the connection to the server is then established by the proxy

> Sorry if I am making things overly complex here..
>
> In my view (as an application developer, not netfilter hacker) the problems
> with the standard netfilter approach are:
>
> 1. Cannot easily support non-local bind, to allow the userspace proxy
> application to masquerade as the client
>
> 2. Cannot get the original destination of a redirected UDP packet in an easy
> manner (might be possible by parsing /proc/net/ip_conntrack and guess which
> is the correct "connection"...)
>
> 3. conntrack adds yet another state table, with a bunch of new DOS
> conditions one must worry about..

conntrack will not be involved in TPROXY, though I want them to interoperate.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
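The PASV rewriting step in the Zorp list above comes down to formatting a 227 reply that advertises the address and port the proxy has bound. A minimal sketch follows; `format_pasv_reply` is a made-up helper name, not Zorp code, and the wording between "227" and the parenthesis is only the commonly seen one.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format an FTP 227 reply advertising addr:port (both in host byte order).
 * The six numbers are the four address bytes followed by the high and low
 * byte of the port. Returns the number of characters written, or -1 if
 * buf is too small.
 */
static int format_pasv_reply(char *buf, size_t len, uint32_t addr, uint16_t port)
{
    int n = snprintf(buf, len,
                     "227 Entering Passive Mode (%u,%u,%u,%u,%u,%u)\r\n",
                     (unsigned)((addr >> 24) & 0xff),
                     (unsigned)((addr >> 16) & 0xff),
                     (unsigned)((addr >> 8) & 0xff),
                     (unsigned)(addr & 0xff),
                     (unsigned)(port >> 8),
                     (unsigned)(port & 0xff));

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

With the server's address 192.168.96.32 and an allocated port of 50080, the client is told to connect to `(192,168,96,32,195,160)`, which the firewall then intercepts on the non-locally bound socket.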
Re: TPROXY
On Wed, Mar 27, 2002 at 04:17:53PM +0100, Jean-Michel Hemstedt wrote:
> > > On Tuesdayen den 26 March 2002 16.33, Balazs Scheidler wrote:
> > The user is not capable of deciding whether a certificate presented to him
> > really belongs to the given server. They simply press 'continue' without
> > thinking that the server they are communicating with is fake.
> >
> > Of course if you AND your users know what the hell a certificate is, they
> > can decide but I think you are a minority.
> >
>
> We are far from TPROXY, but here is my point of view:
>
> - HTTPS decrypting proxy is an (mitma) alternative if you want
> to block all "CONNECT" operations in your proxy. But it sounds
> like an abuse protection against inside users. And unfortunately,
> for the user itself, as mentioned above, it will block services
> such as home banking as well.

* If you allow HTTPS transparently, CONNECT is not invoked.
* If you use a non-transparent HTTP proxy, the client requests a CONNECT from the proxy, which in turn connects to the web server, opening a hole in your firewall.

You have three options:

1) enable SSL traffic without being able to verify its contents (Nimda through SSL anyone?)
2) disable SSL completely
3) use a decrypting SSL proxy with content verification

> - If your proxy allows "CONNECT" requests, then virtually anything
> can pass through it, and HTTPS decrypting proxy does not make sense.

Why? I attach a decrypting HTTPS proxy when a CONNECT request is encountered, as follows:

* Nontransparent HTTP proxy receives a CONNECT www.homebank.hu:443 HTTP/1.0 request
* The HTTP proxy stacks in an SSL proxy which receives the datastream after CONNECT
* The SSL proxy decrypts traffic and stacks in an HTTP proxy again:

[nontransparent HTTP proxy]
            |
[decrypting SSL proxy invoked after CONNECT]
            |
[stacked transparent HTTP proxy]

The above scenario is completely doable with Zorp.

> Then, if you are really concerned by insider attacks, what about a
> session/tunnel timer which could be a possible (ugly) protection
> against wormhole kinds of attacks, without invalidating ssl?

IMHO it's not about insider attacks, it's about incompetent clients who start trojan horses, get viruses and accept certificates without even knowing what that means.

Decrypting on the firewall is not invalidating SSL. SSL is authentication + integrity protection + encrypted traffic. Authentication is performed by the firewall, integrity protection is performed, and the whole traffic is encrypted.

Authentication is moved from the client computer to the firewall, which checks it more strictly than most clients do. And the firewall accepts a certificate based on its policy. No user should override this.

Of course when moving the certificate authentication is not an option (because client certificates are used, which are stored on a hardware token), you can still use a 'hole', but this can be limited to a few addresses only.

btw: I think this discussion is off-topic on netfilter-devel, so we might continue our discussion in private.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: TPROXY and original dest address question
> > It doesn't handle currently any of them. Fragmentation can be solved by
> > defragmenting incoming packets. (they are destined to the local ip stack
> > anyway)
>
> Defragmentation is definitely needed for this thing to be used in production.
> For experimentation conntrack can be used to defragment..

In my previous attempts to forward port the transparent proxy features of 2.2, I simply used ip_defrag(skb), which returned non-NULL once a full packet had been reassembled.

> > ICMP can be handled in the prerouting hook looking up possible transparent
> > proxy entries.
>
> Where is the "possible transparent proxy entries" defined? Internally in
> TPROXY, or in the host IP stack socket table?

In TPROXY.

> I guess this would be the rule table telling what should be diverted by
> TPROXY, which from my understanding would be your iptables ruleset...

No. I have

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
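Whether a packet needs to go through reassembly at all can be read from the fragment field of the IP header: a packet is a fragment if the "more fragments" bit is set or the fragment offset is non-zero. A small userspace sketch of that test, with the two masks defined locally under made-up names (`FRAG_MF`, `FRAG_OFFMASK`) so the example does not depend on kernel headers:

```c
#include <arpa/inet.h>
#include <stdint.h>

#define FRAG_MF      0x2000  /* "more fragments" flag */
#define FRAG_OFFMASK 0x1fff  /* 13-bit fragment offset */

/*
 * frag_off is the raw 16-bit field from the IP header, still in network
 * byte order. Returns 1 for any fragment (first, middle or last), 0 for
 * a complete packet. The "don't fragment" bit alone does not count.
 */
static int ip_is_fragment(uint16_t frag_off)
{
    return (frag_off & htons(FRAG_MF | FRAG_OFFMASK)) != 0;
}
```

Only packets for which this returns 1 need to be handed to the reassembly code; everything else can go straight to translation.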
[PATCH] transparent proxying, #2 release
Hi,

I have prepared a work-in-progress patch showing which direction I'm heading with my transparent proxy patches.

Here is a summary of changes:

1) It is now possible to query where a connection was destined. It uses the method Henrik Nordstrom suggested: I defined the IP_ORIGADDRS control message (it can be enabled using a setsockopt call, and queried using IP_PKTOPTIONS)

2) I also added support for fragmented packets. I didn't test it though, comments on this are welcome. I'm doing this in my PREROUTING hook:

+	if (ip->frag_off & htons(IP_MF|IP_OFFSET)) {
+		*pskb = ip_defrag(*pskb);
+		if (*pskb == NULL)
+			return NF_STOLEN;
+	}

3) I wrote a small program which shows how to use the currently implemented features. It can be started from inetd (because that was the easiest way)

Comments, as always, are welcome.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1

diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/include/linux/in.h linux-2.4.17-TPROXY-ng/include/linux/in.h
--- linux-2.4.17-vanilla/include/linux/in.h	Mon Nov  5 21:42:13 2001
+++ linux-2.4.17-TPROXY-ng/include/linux/in.h	Wed Mar 27 08:54:22 2002
@@ -67,6 +67,7 @@
 #define IP_RECVTOS	13
 #define IP_MTU		14
 #define IP_FREEBIND	15
+#define IP_ORIGADDRS	16
 
 /* BSD compatibility */
 #define IP_RECVRETOPTS	IP_RETOPTS
@@ -107,6 +108,14 @@
 	struct in_addr	ipi_spec_dst;
 	struct in_addr	ipi_addr;
 };
+
+struct in_origaddrs {
+	struct in_addr ioa_srcaddr;
+	struct in_addr ioa_dstaddr;
+	unsigned short int ioa_srcport;
+	unsigned short int ioa_dstport;
+};
+
 
 /* Structure describing an Internet (IP) socket address. */
 #define __SOCK_SIZE__	16		/* sizeof(struct sockaddr)	*/
diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/include/linux/netfilter_ipv4/ipt_TPROXY.h linux-2.4.17-TPROXY-ng/include/linux/netfilter_ipv4/ipt_TPROXY.h
--- linux-2.4.17-vanilla/include/linux/netfilter_ipv4/ipt_TPROXY.h	Thu Jan  1 01:00:00 1970
+++ linux-2.4.17-TPROXY-ng/include/linux/netfilter_ipv4/ipt_TPROXY.h	Wed Feb 13 09:29:34 2002
@@ -0,0 +1,15 @@
+#ifndef _IPT_TPROXY_H_target
+#define _IPT_TPROXY_H_target
+
+struct ipt_tproxy_target_info {
+	u_int16_t redir_port;
+	/* unsigned long fwmark; */
+};
+
+struct ipt_tproxy_user_info {
+	int changed;
+	u_int16_t redir_port;
+	unsigned long fwmark;
+};
+
+#endif /*_IPT_TPROXY_H_target*/
diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/include/net/ip.h linux-2.4.17-TPROXY-ng/include/net/ip.h
--- linux-2.4.17-vanilla/include/net/ip.h	Mon Nov  5 21:43:09 2001
+++ linux-2.4.17-TPROXY-ng/include/net/ip.h	Wed Mar 27 08:55:07 2002
@@ -46,6 +46,12 @@
 #define IPSKB_MASQUERADED	1
 #define IPSKB_TRANSLATED	2
 #define IPSKB_FORWARDED		4
+
+#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
+	u32 origdstaddr;
+	u16 origdstport;
+#endif
+
 };
 
 struct ipcm_cookie
diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/include/net/sock.h linux-2.4.17-TPROXY-ng/include/net/sock.h
--- linux-2.4.17-vanilla/include/net/sock.h	Thu Mar 28 02:18:47 2002
+++ linux-2.4.17-TPROXY-ng/include/net/sock.h	Thu Mar 28 05:19:41 2002
@@ -418,6 +418,11 @@
 	int			linger2;
 
 	unsigned long last_synq_overflow;
+
+#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
+	u32 origdstaddr;
+	u16 origdstport;
+#endif
 };
 
 
diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/net/ipv4/ip_sockglue.c linux-2.4.17-TPROXY-ng/net/ipv4/ip_sockglue.c
--- linux-2.4.17-vanilla/net/ipv4/ip_sockglue.c	Wed Oct 31 00:08:12 2001
+++ linux-2.4.17-TPROXY-ng/net/ipv4/ip_sockglue.c	Thu Mar 28 03:14:58 2002
@@ -48,6 +48,7 @@
 #define IP_CMSG_TOS		4
 #define IP_CMSG_RECVOPTS	8
 #define IP_CMSG_RETOPTS		16
+#define IP_CMSG_ORIGADDRS	32
 
 /*
  *	SOL_IP control messages.
@@ -107,6 +108,20 @@
 	put_cmsg(msg, SOL_IP, IP_RETOPTS, opt->optlen, opt->__data);
 }
 
+#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
+
+void ip_cmsg_recv_origaddrs(struct msghdr *msg, struct sk_buff *skb)
+{
+	struct in_origaddrs ioa;
+
+	ioa.ioa_srcaddr.s_addr = 0;
+	ioa.ioa_srcport = 0;
+	ioa.ioa_dstaddr.s_addr = IPCB(skb)->origdstaddr;
+	ioa.ioa_dstport = IPCB(skb)->origdstport;
+	put_cmsg(msg, SOL_IP, IP_ORIGADDRS, sizeof(ioa), &ioa);
+}
+
+#endif
 
 void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb)
 {
@@ -135,6 +150,13 @@
 	if (flags & 1)
 		ip_cmsg_recv_retopts(msg, skb);
+
+#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
+	if ((flags>>=1) == 0)
+		return;
+	if (flags
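On the application side, a control message like the one `ip_cmsg_recv_origaddrs()` emits would be picked out of the ancillary data with the standard CMSG macros. The sketch below assumes the patch's layout: `TPROXY_IP_ORIGADDRS` (value 16) and `struct tproxy_origaddrs` are local copies of `IP_ORIGADDRS` and `struct in_origaddrs` from the diff, renamed because no mainline header provides them and the number 16 is specific to this patch. `find_origaddrs` is a made-up helper name.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Value taken from the patch above; it is NOT a mainline constant. */
#define TPROXY_IP_ORIGADDRS 16

/* Mirrors struct in_origaddrs from the patch. */
struct tproxy_origaddrs {
    struct in_addr ioa_srcaddr;
    struct in_addr ioa_dstaddr;
    unsigned short ioa_srcport;
    unsigned short ioa_dstport;
};

/*
 * Scan the control messages attached to msg for the original addresses.
 * Returns 1 and fills *out if one is found, 0 otherwise.
 */
static int find_origaddrs(struct msghdr *msg, struct tproxy_origaddrs *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP &&
            c->cmsg_type == TPROXY_IP_ORIGADDRS &&
            c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* CMSG_DATA may be unaligned for the struct, so copy it out. */
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

The same loop works whether the buffer was filled by recvmsg() for a UDP datagram or by getsockopt(IP_PKTOPTIONS) for a TCP socket, since both hand back a sequence of cmsghdr records.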
Re: TPROXY and original dest address question
On Thu, Mar 28, 2002 at 04:02:46PM +0100, Henrik Nordstrom wrote:
> Balazs Scheidler wrote:
>
> > > Where is the "possible transparent proxy entries" defined? Internally in
> > > TPROXY, or in the host IP stack socket table?
> >
> > in TPROXY.
> >
> > > I guess this would be the rule table telling what should be diverted by
> > > TPROXY, which from my understanding would be your iptables ruleset...
> >
> > No. I have
>
> You have what? Seems to be part of the message missing here..??

Yes, sorry.

There's a translation table in TPROXY, independent from the tproxy iptables table.

The rules are in the iptables table called 'tproxy', which contains one transparent proxy rule for each service needed. As a connection is established, a new entry is added to the translation table with: remote addr/remote port, original dest/original port, local dest/local port.

Then both the prerouting and the local output hooks translate the packet flow according to the translation table.

In a sense this table is similar to the conntrack tables, with the exception that its primary focus is to associate packet endpoints with local sockets, identified by their own IP/port pair. Thus the connection between a redirected session and a local socket is not the socket layer but this translation table, and therefore no packet with a foreign IP address enters the networking core.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
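The translation table described above can be sketched as an array of three-endpoint entries with one lookup per direction. This is a toy model under stated assumptions: the type and function names are invented for illustration, and a linear scan stands in for whatever hashing the real module uses.

```c
#include <stddef.h>
#include <stdint.h>

struct endpoint {
    uint32_t addr;
    uint16_t port;
};

/* One translation entry with the three pairs named in the message. */
struct tp_entry {
    struct endpoint remote;   /* the client */
    struct endpoint orig;     /* where the client was connecting to */
    struct endpoint local;    /* the local socket that took over */
};

static int ep_eq(struct endpoint a, struct endpoint b)
{
    return a.addr == b.addr && a.port == b.port;
}

/* PREROUTING: a packet remote -> orig is rewritten to remote -> local. */
static int tp_translate_in(const struct tp_entry *tab, size_t n,
                           struct endpoint src, struct endpoint *dst)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (ep_eq(tab[i].remote, src) && ep_eq(tab[i].orig, *dst)) {
            *dst = tab[i].local;
            return 1;
        }
    }
    return 0;   /* not ours, leave the packet alone */
}

/* LOCAL_OUT: a reply local -> remote is rewritten to orig -> remote. */
static int tp_translate_out(const struct tp_entry *tab, size_t n,
                            struct endpoint *src, struct endpoint dst)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (ep_eq(tab[i].local, *src) && ep_eq(tab[i].remote, dst)) {
            *src = tab[i].orig;
            return 1;
        }
    }
    return 0;
}
```

Because the rewrite happens before the packet reaches the stack and is undone after it leaves, the socket layer only ever sees the local endpoint.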
Re: TPROXY and original dest address question
On Thu, Mar 28, 2002 at 04:14:13PM +0100, Henrik Nordstrom wrote:
> Is TPROXY a stand-alone netfilter module, not an iptables target?
>
> I thought it was an iptables target, but from your answer it seems like it
> should be a netfilter module on its own..

It became an iptables module on its own. The reasoning was discussed at the developers' meeting in Enschede.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: TPROXY and original dest address question
On Thu, Mar 28, 2002 at 04:39:51PM +0100, Henrik Nordstrom wrote:
> Thanks. Explains it quite well.
>
> So there is yet another state table involved here.
>
> Now I am a little confused. What exactly is it that makes this new state table
> better suited for the job than conntrack?

Because we don't do full TCP tracking, and our NAT is quite limited (only DNAT, and only to the local IP stack). In addition, entries never time out of the table.

A new entry is added to this table when

1) a TPROXY destination is encountered
2) a socket is 'bound' to a foreign address (either for listening or for connecting)

An entry is removed from this table when

1) the socket associated with the entry is destroyed (iff a socket is associated with an entry)
2) a TCP RST is returned by the stack (happens only when a socket is not yet associated)

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
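The two removal rules above can be written down as a tiny state model. It is a sketch only: `struct tp_slot` and both function names are hypothetical, and a plain pointer stands in for the kernel's socket reference.

```c
#include <stddef.h>

/* Hypothetical table slot: sk stays NULL until a socket is attached. */
struct tp_slot {
    int in_use;
    const void *sk;
};

/* Removal rule 1: drop every entry tied to the socket being destroyed. */
static void tp_on_socket_destroy(struct tp_slot *tab, size_t n, const void *sk)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tab[i].in_use && tab[i].sk != NULL && tab[i].sk == sk)
            tab[i].in_use = 0;
    }
}

/*
 * Removal rule 2: an RST sent by the local stack removes the entry only
 * while no socket is associated yet. Returns 1 if the entry was removed.
 */
static int tp_on_rst_from_stack(struct tp_slot *e)
{
    if (e->in_use && e->sk == NULL) {
        e->in_use = 0;
        return 1;
    }
    return 0;
}
```

Ignoring the RST once a socket is attached is what keeps a stray reset from tearing down the translation for a live session; from then on only socket destruction removes the entry.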
Re: TPROXY and original dest address question
On Thu, Mar 28, 2002 at 05:55:25PM +0100, Henrik Nordstrom wrote:
> > an entry is removed from this table when
> >
> > 1) the socket associated with the entry is destroyed (iff a socket is
> >    associated with an entry)
>
> Ok. So there is integration between the tproxy table and the host IP stack
> somehow, to keep the TPROXY table in sync with the host IP stack. Nice. Kind
> of missing in conntrack..

Until now the only binding between the socket and the entry was the address of the local socket. This might have to be changed due to the problem you describe below.

> > 2) when a TCP rst is returned by the stack (happens only when a socket is
> >    not yet associated)
>
> Why this? And doesn't it allow for an easy DOS on TPROXY sessions?
>
> You should not be processing RST unless you are also processing TCP windows.
> Not all RST packets resets "the" session.
>
> Ah, I think I understand now. You only do this when there isn't yet a socket
> in the host IP stack. In such case it is needed.

Yes. An alternative would be a hook in the kernel which is called when no socket is found for an incoming SYN. This is an ugly hack though.

> Sounds like it could be made to work for TCP.
>
> UDP is a bit different though.. but there isn't that big need of any
> connection table there, except for ICMP processing.

For UDP I only want to do half-NAT, which means that it would be possible to send a UDP frame with a custom IP address, and to receive one destined to somebody else.

ICMP processing is needed when we send a UDP frame (with a foreign source address) and the destination host returns an ICMP error to the sender.

In Linux 2.2 we do the following as a transparent proxy with UDP traffic:

* we have a sender socket, which we use to send packets with specified source AND destination address
* prior to sending a packet we create, bind and connect a socket to the destination we are sending packets to (this socket receives ICMP errors)

A similar approach is doable:

* the source address is specified in a control message of sendmsg()
* this doesn't create a translation entry in TPROXY
* a separate socket is created, and a setsockopt is issued, which places socket related information into the translation table
* when an ICMP error is received, the second socket is found, and the ICMP error is rewritten accordingly
* if no specific socket is found, it is forwarded (and dropped on the forward chain)

> Hmm.. regarding ICMP. How do you plan on handle ICMP from the host stack
> without TCP window tracking?
>
> Problem: There may be multiple sessions from the same client IP,PORT to the
> same PORT on multiple servers, and after NAT there isn't sufficient
> information to distinguish these by the addressing alone.
>
> 10.0.1.4:52346 -> 192.168.96.32:80
> 10.0.1.4:52346 -> 192.168.84.253:80
> 10.0.1.4:52346 -> 176.16.48.52:80
>
> The problem is much more evident if you look at UDP traffic, but exists for
> TCP as well. For TCP you can easily see this if there is multiple clients
> behind a NAT gateway (for example netfilter SNAT).
>
> Hmm.. this problem probably also applies to the de-NAT:ing of traffic, but
> there you can probably get by by querying the socket for the real source
> address (original destination address).

Hmm... this is a real problem, but I only see it occurring in our case when we redirect a session and the responses must be de-NATed. In this case however skb->sk is unique (except maybe for the SYN-ACK, because the socket is not created until the three-way handshake is fully completed)

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
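Henrik's three-session example can be checked mechanically. In the sketch below all three sessions are redirected to the same local endpoint; looking an entry up by the post-NAT pair (remote, local) then matches all three, while including the original destination matches exactly one. The struct and function names are invented for the illustration, and the local endpoint 127.0.0.1:50080 is an arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>

struct ep {
    uint32_t addr;
    uint16_t port;
};

struct session {
    struct ep remote;   /* client */
    struct ep orig;     /* original destination */
    struct ep local;    /* local proxy endpoint after redirection */
};

static int same_ep(struct ep a, struct ep b)
{
    return a.addr == b.addr && a.port == b.port;
}

/* How many sessions fit what is visible on the wire after DNAT. */
static size_t matches_after_nat(const struct session *s, size_t n,
                                struct ep remote, struct ep local)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++)
        if (same_ep(s[i].remote, remote) && same_ep(s[i].local, local))
            hits++;
    return hits;
}

/* Same lookup, but also keyed on the original destination. */
static size_t matches_with_orig(const struct session *s, size_t n,
                                struct ep remote, struct ep local,
                                struct ep orig)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++)
        if (same_ep(s[i].remote, remote) && same_ep(s[i].local, local) &&
            same_ep(s[i].orig, orig))
            hits++;
    return hits;
}
```

A reply leaving the local stack carries only the (local, remote) pair, so de-NATing it needs some extra discriminator, which is why the reply above falls back on skb->sk.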
TPROXY, picking up the source address for UDP
Hi,

I ran into another problem while figuring out how to set the outgoing source address for UDP frames.

In Linux 2.2, UDP sending with a specified source address worked on a frame-by-frame basis, and I would like to keep this behaviour. An aux message would be used in sendmsg() to specify the outgoing source address of a packet.

My problem is that it is quite difficult to change the source IP of the skb based on an aux message. In the kernel it works as follows:

1) udp_sendmsg is called
2) which in turn calls ip_cmsg_send, which sets up an ipcm_cookie struct
3) this struct is then passed to ip_build_xmit(), which sets up the skb based on ipcm

My problem is that to add a new cmsg, I'd need to modify the ipcm_cookie struct, as it is the only connection between the skb and sendmsg(). In addition, ip_build_xmit() must also be changed, as this is the function which processes the messages.

An alternative would be to add a translation entry describing the required change to the TPROXY translation table. The problem with this is that adding the entry, sending a single frame, and removing the entry doesn't seem very atomic to me. (The only possibility here would be a flag on the translation entry saying that the entry is applied only once, but this might cause problems on SMP, as two processes might be issuing sendmsg() calls at the same time.)

Opinions?

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
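For comparison, this is what a per-packet source address request looks like from userspace with the existing IP_PKTINFO control message, whose `ipi_spec_dst` field is used as the local source address when passed to sendmsg(). It is a sketch of the calling convention only, not of the TPROXY patches: `set_udp_source` is a made-up helper, and as far as I know the stock kernel accepts only addresses configured on the host here, so sending from a foreign address is exactly the part that still needs kernel support.

```c
#define _GNU_SOURCE          /* exposes struct in_pktinfo on glibc */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Attach one IP_PKTINFO control message to msg, asking for src as the
 * source address of this datagram only. cbuf must be suitably aligned
 * and at least CMSG_SPACE(sizeof(struct in_pktinfo)) bytes long.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int set_udp_source(struct msghdr *msg, void *cbuf, size_t cbuflen,
                          struct in_addr src)
{
    struct in_pktinfo pi;
    struct cmsghdr *c;

    if (cbuflen < CMSG_SPACE(sizeof(pi)))
        return -1;

    memset(cbuf, 0, CMSG_SPACE(sizeof(pi)));
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(pi));

    c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = IPPROTO_IP;
    c->cmsg_type = IP_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof(pi));

    memset(&pi, 0, sizeof(pi));
    pi.ipi_spec_dst = src;   /* requested source address */
    memcpy(CMSG_DATA(c), &pi, sizeof(pi));
    return 0;
}
```

Because the request travels with the individual sendmsg() call, two processes sending at the same time cannot interfere with each other, which is the atomicity property the translation-entry alternative lacks.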
TPROXY-ng-03
Hi,

I've released the latest version of my TPROXY patches, available at

http://www.balabit.hu/en/downloads/tproxy/

The changes include:

* handle fragmented packets (also for LOCAL_OUT)
* handle parallel sessions with non-unique address tuples (using a cookie assigned to sessions)

I've sent this patch to Andi Kleen to get feedback from the core network developers, as it changes some parts of core TCP.

TODO:

* need to delete entries of the translation table when a session ends (this again needs some support from the core)

Comments welcome,

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
TPROXY-04
Hi,

I have released my latest transparent proxy patches, now with most functions in place. They are available at

http://www.balabit.hu/en/downloads/tproxy (the link at the bottom)

I've also uploaded some sample programs, which perform the following:

* listen on a foreign address
* connect using a foreign address as a source
* query original destination of redirected connections

Todo:

* clean it up (it needs some cleanup!)
* further tests for UDP functions
* convince core developers to accept my patches against the core (or suggest alternative implementations)

Comments, tests, flames welcome.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: Howto Change packet route destination from a NF_IP_LOCAL_OUT hook
On Mon, Apr 22, 2002 at 11:00:56PM -0500, Peter Caldes wrote:
> Hopefully somebody here can help me. I'm not that familiar with the detailed
> Linux Networking stack.
>
> I have a box which acts as a gateway and need a way for (multiple) user (root)
> level applications to insert IP packets into the IP stack and somehow bypass
> normal routing which is based on the destination IP addr of the packet.
>
> I want the application(s) to specify the next hop the packet takes without
> modifying the IP packet itself, so that the packet can be directed/forwarded
> to a particular router based on the application parameters. The real reason
> is that IP addresses in the same subnet might reside behind different
> routers. (ie. 1.2.3.1 is behind RouterA, 1.2.3.2 is behind RouterB). The
> application knows which router to use.
>
> I've been able to do this under AIX V4.2.1 (with some kernel extensions) using
> RAW IP sockets and then specifying a source route option on the socket with
> setsockopt(). The kernel mod checks the socket options and if it sees a
> source-route option, it computes a route to the first ip address in the
> source-route list instead of the ip destaddr.
>
> Now I need to do something similar with Linux.
>
> It seems I can register a NF_IP_LOCAL_OUT hook, but I don't know how to
> mangle skb->dst.
> I also assume that when the NF_IP_LOCAL_OUT hook is called, I can scan the
> socket options to do something similar.

You don't need kernel modules under Linux. Simply put an fwmark on the packets using an iptables rule, and then use policy routing to route the packets based on that value.

ip rule add fwmark 100 lookup 100
ip route add default via x.x.x.x table 100

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
another netfilter ICMP bug
Hi,

I've encountered another ICMP translation problem in netfilter. This time it occurs when a process initiates a connection and it is translated on the same host.

How to reproduce:

Box A             --  Box B
192.168.131.124       192.168.131.1
                      Routes back 10.0.0.0/24 using
                      192.168.131.124 as gateway

iptables -t nat -A POSTROUTING -p tcp -s 192.168.131.124 --sport \
        -j SNAT --to-source 10.0.0.1

and

nc -s 192.168.131.124 -p 192.168.131.1 80

The connection works as expected if Box B accepts connections on port 80, but not if I cause Box B to send an ICMP port unreachable back (boxb was using ipchains in my case, hence the ipchains command line):

boxb# ipchains -s 10.0.0.0/24 -d 0/0 80 -j REJECT

The source address within the ICMP port unreachable is not rewritten, as the following LOG output shows (to trigger the LOG output I added another rule to INPUT: iptables -A INPUT -p icmp -j LOG):

IN=eth0 OUT= MAC=00:50:56:bb:83:25:00:50:bf:0b:f6:2f:08:00 \
   SRC=192.168.131.1 DST=192.168.131.124 LEN=88 TOS=0x00 \
   PREC=0xC0 TTL=255 ID=26730 PROTO=ICMP TYPE=3 CODE=3 \
   [SRC=10.0.0.1 DST=192.168.131.1 LEN=60 TOS=0x00 PREC=0x00
    TTL=64 ID=53526 DF PROTO=TCP SPT= DPT=80 WINDOW=5840 RES=0x00
    SYN URGP=0 ]

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
oops when conntrack entry times out
Hi,

I've run into a problem which causes an oops during ip_nat_cleanup_conntrack().

I call ip_nat_setup_info() from my PREROUTING hook (right after conntrack, and before nat). Everything works correctly and NAT is applied in both directions. The oops occurs exactly when the conntrack entry times out (I was watching /proc/net/ip_conntrack).

The backtrace shows that a NULL pointer is dereferenced in ip_nat_cleanup_conntrack() at this line:

	LIST_DELETE(&bysource[hash_by_src(&conn->tuplehash[IP_CT_DIR_ORIGINAL]
					  .tuple.src,
					  conn->tuplehash[IP_CT_DIR_ORIGINAL]
					  .tuple.dst.protonum)],
		    &info->bysource);

It seems that either info->bysource->prev or info->bysource->next is NULL. Anyone with an idea why this might happen? The same code works if I call ip_nat_setup_info() from POSTROUTING.

I can't see the difference between simple DNAT (which works) and my TPROXY DNAT (which works but oopses).

Anyone with an idea?

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: tproxy using conntrack/nat?
On Wed, May 22, 2002 at 12:29:54PM +0200, Harald Welte wrote: > On Tue, May 21, 2002 at 03:30:24PM +0200, Balazs Scheidler wrote: > > This help from the core (through the notifier and an identifier cookie) is > > ugly. As I think about more it is very ugly. Just grep for cookie and > > notifier in my patches. > > yes, and this is most likely to cause problems with integration of your patch > into the mainstream kernel. in addition to this, it is not a robust solution. > > > As it seems conntrack/NAT provides all the necessary features, the problem > > is that sometimes it provides too much: > > * it also rewrites parts of the packet data (PORT and PASV translation > > within FTP for example) > > well, you don't need to load the ip_conntrack/nat_ftp.o modules in case you > are running a transparent FTP proxy. > > Or do you think people want to kernel-NAT (incl. helpers) some ftp-traffic, > and put some other ftp-traffic through the transparent FTP proxy? I don't > think this is a very realistic scenario, as it leads to uncertainty anyway. Think of mass transfer of data. You have two semi-trusted security zones with high bandwidth requirements and an internet zone which is completely untrusted and has lower bandwidth requirements. You use an FTP proxy towards the internet, and FTP NAT between the trusted zones. Of course there are other scenarios as well. I simply don't want to lose the rich set of features netfilter already provides; I want to add to them. > > > * the original source/destination address cannot be atomically found for UDP > > packets > > > > The second seems to be easy to solve, though it needs some changes in the > > core. The first one is more tough though. > > Ok, if you say so :) Your changes are welcome, as long as they aren't too > intrusive. I'm thinking about some mechanism for hooking into control message processing, to be able to pass aux messages using sendmsg and recvmsg.
Or instead of hooks, store the rewritten addresses in the skb, and add the aux messages to the core. I did not say it will be easy to convince DaveM to include those patches, only that it's easy to implement. :) > > > Ideally I don't want to lose the ability to NAT and TPROXY at the same time > > (of course different sessions) > > > To avoid application level rewriting of packet data one would not load the > > necessary NAT helper, but it is not doable in my case, as one session should > > be NATed while the other tproxied. > > yes, but within a single protocol? Such a setup would cause me headaches. > either you want to have a transparent FTP proxy [for security reasons], or > you don't. But mixing the two doesn't sound like a nice strategy, > especially since the kernel nat doesn't provide you with session logs, ... see my previous explanation. > > > As I know it's currently not possible to exclude sessions from helper > > processing. TProxy only needs to NAT the encapsulating TCP session, and > > nothing else. > > > > Currently ip_nat_setup_info() assigns the helper to a given connection: > > > > unsigned int > > ip_nat_setup_info(struct ip_conntrack *conntrack, > > const struct ip_nat_multi_range *mr, > > unsigned int hooknum) > > > > I see two possibilities: > > > > * add a new argument to ip_nat_setup_info() to avoid helpers > > seems reasonable. it's only three arguments currently, having four wouldn't > be a problem. add a 'int flags' argument and define a flag for > 'BYPASS_HELPER'. ok, but isn't this API change too intrusive for other parts of netfilter? ip_nat_setup_info() is referenced 11 times in an unpatched 2.4.18. A third solution would be to add a new NFCT_ flag. Do you still prefer the flags argument? > This however wouldn't bypass the conntrack helper [which > could already say INVALID because a packet doesn't match the layer5+ state > of the connection, see for example the PPTP helper].
Don't forget that we have two conntrack entries if traffic flows through a transparent proxy, and conntrack processing is done prior to NAT rewriting. Please tell me if I'm wrong, but conntrack sees an unmodified PORT command assigned to a session with an unmodified destination address. > > > * reset conntrack->helper to NULL once ip_nat_setup_info returns (this might > > cause races though) > > ugly. ok, it was only an idea. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: another netfilter ICMP bug
On Thu, May 23, 2002 at 07:03:52PM +0200, Harald Welte wrote: > On Thu, May 23, 2002 at 10:18:23AM +0200, Balazs Scheidler wrote: > > Hi, > > > > I've encountered another ICMP translation problem in netfilter. This time it > > occurs when a process initiates a connection and it is translated on the > > same host. > > are you sure this problem persists, even after applying the icmp nat fix? yes, I've forgotten to mention that I first applied the patch, and the problem persisted. (btw: a plain .diff file for the mentioned fix would make it much easier to apply the patch, I had to cut & paste it from the .html) -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: another netfilter ICMP bug
On Thu, May 23, 2002 at 07:03:52PM +0200, Harald Welte wrote: > On Thu, May 23, 2002 at 10:18:23AM +0200, Balazs Scheidler wrote: > > Hi, > > > > I've encountered another ICMP translation problem in netfilter. This time it > > occurs when a process initiates a connection and it is translated on the > > same host. > > are you sure this problem persists, even after applying the icmp nat fix? so, here's my third attempt at tproxy support, this time using conntrack/nat to do most of the work. It is still missing some parts, especially: * a way to get the rewritten UDP destination address (in fact I did nothing in this area, TCP works because of SO_ORIGDSTADDR) * a way to get notified when a socket is closed (if a program registers itself, the registration remains there unless explicitly removed) Otherwise everything seems to work: * connecting from a foreign address * listening on a foreign address * redirecting sessions (this is untested, but should work, at least with the good old REDIRECT target) I still have a separate table, though it is not absolutely necessary. The user interface might change, as I have not yet thought about typical usage scenarios. I touched NAT in a few places, so it might collide with newnat; I was using a vanilla 2.4.18 kernel for my development. Tips, ideas, flames welcome, PS: I don't plan to rewrite the whole thing a fourth time :), I'm quite satisfied with the way it currently works. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1 diff -urN --exclude-from=kernel-exclude linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_nat.h linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_nat.h --- linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_nat.h Thu Apr 26 00:00:28 2001 +++ linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_nat.h Thu May 23 11:59:38 +2002 @@ -111,10 +111,13 @@ struct ip_nat_seq seq[IP_CT_DIR_MAX]; }; +#define IP_NAT_BYPASS_HELPERS 0x0001 + /* Set up the info structure to map into this range.
*/ extern unsigned int ip_nat_setup_info(struct ip_conntrack *conntrack, const struct ip_nat_multi_range *mr, - unsigned int hooknum); + unsigned int hooknum, + int flags); /* Is this tuple already taken? (not by us)*/ extern int ip_nat_used_tuple(const struct ip_conntrack_tuple *tuple, diff -urN --exclude-from=kernel-exclude linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_nat_core.h linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_nat_core.h --- linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_nat_core.h Mon Dec 11 22:31:32 2000 +++ linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_nat_core.hThu May 23 +12:00:24 2002 @@ -26,6 +26,11 @@ extern void place_in_hashes(struct ip_conntrack *conntrack, struct ip_nat_info *info); +void ip_nat_update_hashes(struct ip_conntrack *conntrack, + struct ip_nat_info *info, + int initialized); + + /* Built-in protocols. */ extern struct ip_nat_protocol ip_nat_protocol_tcp; extern struct ip_nat_protocol ip_nat_protocol_udp; diff -urN --exclude-from=kernel-exclude linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_tproxy.h linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_tproxy.h --- linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_tproxy.h Thu Jan 1 01:00:00 1970 +++ linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_tproxy.h Thu May 23 +11:14:55 2002 @@ -0,0 +1,25 @@ +#ifndef _IP_TPROXY_H +#define _IP_TPROXY_H + +#include + +/* + * used in setsockopt(SOL_IP, IP_TPROXY) should not collide + * with values in + */ +#define IP_TPROXY 16 + +/* bitfields in in_tproxy.itp_flags */ +#define ITP_CONNECT 0x0001 +#define ITP_LISTEN 0x0002 +#define ITP_ONCE 0x0001 +#define ITP_REMOVE 0x0002 + +/* structure passed to setsockopt(SOL_IP, IP_TPROXY) */ +struct in_tproxy { + u_int32_t itp_flags; + u_int32_t itp_faddr; + u_int16_t itp_fport; +}; + +#endif diff -urN --exclude-from=kernel-exclude linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ipt_TPROXY.h 
linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ipt_TPROXY.h --- linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ipt_TPROXY.h Thu Jan 1 01:00:00 1970 +++ linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ipt_TPROXY.h Wed May 22 +02:57:23 2002 @@ -0,0 +1,15 @@ +#ifndef _IPT_TPROXY_H_target +#define _IPT_TPROXY_H_target + +struct ipt_tproxy_target_info { + u_int16_t redir_port; + /* unsigned long fwmark; */ +}; + +struct ipt_tproxy_user_info { + int changed; + u_int16_t redir_port; + unsigned long fwmark; +}; + +#endif /*_IPT_TPROXY_H_target*/ diff -urN --exclude-from=kernel-exclude linux-2.4.18-vani
addendum to ICMP translation problem
Hi, Last week I reported an ICMP translation problem which occurs if the connection is initiated by a local process. I have now investigated the problem further; it doesn't occur: * if the NAT box is a gateway, and the connection is initiated on another box. * if the connection is not initiated, but accepted. As SNAT happens at NF_IP_POST_ROUTING, reply translation will be performed at NF_IP_PRE_ROUTING. The following DEBUG output shows what happens (enabled DEBUGP at the top of ip_nat_core.c): icmp reply translation, ct=c3617480, hooknum=0, ctinfo=4 icmp_reply_translation: translating error c396f260 hook 0 dir REPLY, num_manips=2 icmp_reply: manip 0 dir ORIG hook 4 icmp_reply: manip 1 dir REPLY hook 0 icmp_reply: outer DST -> 192.168.131.124 It seems the inner manip is not called, as it is registered to hook 4 (POST_ROUTING, ORIG). As POST_ROUTING will never be called in the ORIG-inal direction for this packet, the inner packet is never translated. I see two ways of fixing the issue: * fix icmp_reply_translation() to perform all of its translation at the same time (both the inner and the outer header) * register a NAT hook at LOCAL_IN, and perform translation of packets registered at (POST_ROUTING, ORIG) The first option seems to be doable; the second is a big change, though it seems to be cleaner. Opinions? -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: addendum to ICMP translation problem [PATCH]
On Mon, May 27, 2002 at 12:32:32PM +0200, Balazs Scheidler wrote: > As SNAT happens at NF_IP_POST_ROUTING, reply translation will be performed > at NF_IP_PRE_ROUTING. The following DEBUG output shows what happens (enabled > DEBUGP at the top of ip_nat_core.c): > > icmp reply translation, ct=c3617480, hooknum=0, ctinfo=4 > icmp_reply_translation: translating error c396f260 hook 0 dir REPLY, num_manips=2 > icmp_reply: manip 0 dir ORIG hook 4 > icmp_reply: manip 1 dir REPLY hook 0 > icmp_reply: outer DST -> 192.168.131.124 > > As it seems the inner manip is not called, as it is registered to hook 4 > (POST_ROUTING, ORIG) > > As POST_ROUTING will never be called in ORIG-inal direction for this packet, > the inner packet is never translated. I was wrong here. The same manip is applied at different hooks (once at PRE_ROUTING and once at POST_ROUTING). > I see two ways of fixing the issue: > * fix icmp_reply_translation() to perform all of its translation at the same > time (both the inner and the outer header) > * register a NAT hook at LOCAL_IN, and perform translation of packets > registered at (POST_ROUTING, ORIG) > > The first option seems to be doable, the second is a big change, though > seems to be cleaner. I implemented option #1, and the patch is below. However, I'm not 100% sure that I'm free to translate the inner packet at PRE_ROUTING time (there must have been a reason why it was performed at POST_ROUTING time). Functionality-wise the patch seems to work all right. --- ip_nat_core.c.old Mon May 27 04:53:09 2002 +++ ip_nat_core.c Mon May 27 05:00:23 2002 @@ -843,7 +843,7 @@ packet, except it was never src/dst reversed, so where we would normally apply a dst manip, we apply a src, and vice versa. */ - if (info->manips[i].hooknum == opposite_hook[hooknum]) { + if (info->manips[i].hooknum == hooknum) { DEBUGP("icmp_reply: inner %s -> %u.%u.%u.%u %u\n", info->manips[i].maniptype == IP_NAT_MANIP_SRC ?
"DST" : "SRC", @@ -854,9 +854,9 @@ &info->manips[i].manip, !info->manips[i].maniptype, &skb->nfcache); - /* Outer packet needs to have IP header NATed like - it's a reply. */ - } else if (info->manips[i].hooknum == hooknum) { + /* Outer packet needs to have IP header NATed like + it's a reply. */ + /* Use mapping to map outer packet: 0 give no per-proto mapping */ DEBUGP("icmp_reply: outer %s -> %u.%u.%u.%u\n", -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
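Editor's sketch: the hook-matching rule discussed in this message can be modeled in userspace to see why the inner header is skipped for locally delivered ICMP errors. This is a simplified model under stated assumptions, not the kernel code: the names `manip_model`, `opposite_hook_model` and `icmp_reply_model` are hypothetical, only the PRE_ROUTING/POST_ROUTING pair of the opposite-hook table is modeled, and the "skip manips of the other direction" filter is assumed from the debug output rather than taken from the posted diff.

```c
#include <assert.h>
#include <stddef.h>

/* IPv4 netfilter hook numbers, matching the "hook 0" / "hook 4" debug lines. */
enum { PRE_ROUTING = 0, LOCAL_IN = 1, FORWARD = 2, LOCAL_OUT = 3, POST_ROUTING = 4 };
enum { DIR_ORIG = 0, DIR_REPLY = 1 };

/* Hypothetical, reduced stand-in for one entry of info->manips[]. */
struct manip_model {
	int direction;
	int hooknum;
};

struct xlat_count {
	int inner;  /* translations applied to the quoted (inner) header */
	int outer;  /* translations applied to the ICMP packet's own IP header */
};

/* Assumed shape of opposite_hook[]: only the PRE/POST pair is modeled. */
static int opposite_hook_model(int hooknum)
{
	if (hooknum == PRE_ROUTING)
		return POST_ROUTING;
	if (hooknum == POST_ROUTING)
		return PRE_ROUTING;
	return -1;
}

/* Count which translations would run for an ICMP error seen at `hooknum`. */
static struct xlat_count icmp_reply_model(const struct manip_model *m, size_t n,
					  int hooknum, int dir, int patched)
{
	struct xlat_count c = { 0, 0 };
	size_t i;

	assert(m != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* Assumption: manips registered for the other direction are skipped. */
		if (m[i].direction != dir)
			continue;
		if (patched) {
			/* Patched rule: inner and outer are both handled at this hook. */
			if (m[i].hooknum == hooknum) {
				c.inner++;
				c.outer++;
			}
		} else if (m[i].hooknum == opposite_hook_model(hooknum)) {
			c.inner++;
		} else if (m[i].hooknum == hooknum) {
			c.outer++;
		}
	}
	return c;
}
```

With the two manips from the debug output (ORIG at hook 4, REPLY at hook 0) and an ICMP error arriving at hook 0, the unpatched rule in this model translates only the outer header, while the patched rule translates both. On a forwarding box the unpatched rule would pick up the inner header later at hook 4, which a locally delivered packet never reaches.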
my last NAT fix unseen?
Hi, I posted a patch for the ICMP NAT problem I encountered, and there has been no reply. It was posted on 27th May under the topic "Re: addendum to ICMP translation problem [PATCH]". Could you please review it? -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: [RFC] Re: another netfilter ICMP bug
On Thu, May 30, 2002 at 12:16:22PM +0200, Jozsef Kadlecsik wrote: > On 30 May 2002, Andras Kis-Szabo wrote: > > > > don't forget that ICMP error messages only quote the first 64 bytes of the > > > original packet. Adding up IP and TCP headers (both 20 bytes without > > > options), you only have 24 bytes of original payload. This might be somewhat > > > more in UDP though due to its shorter header. > > > > > > A full length PORT command is 28 bytes, though a common scenario fits into > > > 24 bytes. > > > > > > I see two solutions: > > > * truncate the packet, and remove the payload area of deNATed ICMP messages, > > > if the inner header is either TCP or UDP (because in this case we _KNOW_ > > > what is header and what is payload) > > > * don't use packet filtering if separating the two zones is so important > > > > > > The first one could also be implemented using an ICMPTRIM target in your > > > mangle table, which could also trim ICMP echo request/reply payloads. (which > > > can easily be used to tunnel a whole IP stack through a firewall) > > > > Ok, I didn't know the IPv4-ICMP RFC. I just sent a special packet with > > TCP payload and I got back the payload. It was only a first check. > > (In IPv6-ICMP the length-limit is ~1298 bytes, ...) > > Sidenote: ICMPTRIP could not be used to trim ICMP echo requests/replies: > > "The data received in the echo message must be returned in the echo > reply message." Ok, that's true. Those packets are to be dropped then. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: my last NAT fix unseen?
On Thu, May 30, 2002 at 03:24:01PM +0200, Harald Welte wrote: > On Thu, May 30, 2002 at 10:39:06AM +0200, Balazs Scheidler wrote: > > Hi, > > > > I've posted a patch against the ICMP NAT problem I encountered, and there > > was no reply. It was under the topic "Re: addendum to ICMP translation > > problem [PATCH]" it was posted 27th May. > > > > Could you please review it? > > oh my god. Please. I will review your patch ASAP, as will do every other > coreteam member. > > It's just like not everybody has the time to immediately look into every > detail of something as complex as the ICMP reply translation. > > SCNR. Ok, sorry. I thought the question I raised in the message alongside the patch was trivial to answer for somebody who wrote the code. And since I have no general overview of the netfilter code, I don't know how my patch affects other parts of netfilter. I'll wait patiently. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
[PATCH] linux tproxy support
Hi, I've released a new version of the Linux transparent proxy patch. It is available at: http://www.balabit.hu/en/downloads/tproxy/ or http://www.balabit.hu/downloads/tproxy/linux-2.4/cttproxy-2.4.18-02.tar.gz It features: * test programs for listening on/connecting from foreign addresses (TCP) * a kernel patch against vanilla 2.4.18 (it includes my last ICMP translation fix) I've included the README file, which outlines its use, below. TODO: * when the socket is closed, the entry assigned to the socket should be deleted. Sadly the only solution is to patch the core to notify tproxy about this event, so the assigned entry can be deleted. * receiving UDP packets on a foreign address should work, but sending from a foreign address doesn't work, as it also needs heavy patching in the kernel. README: How does it work? - Within the tproxy module in the kernel there's a table describing the relationship between local sockets and non-local IP address/port pairs. A local socket is referenced by its local IP/port, therefore all sockets to be used for transparent proxy purposes must be bound to a local IP before anything else can be done. To connect from, or listen on, a foreign address, an entry must be added to this table. To add a translation table entry, create a socket (bind it to a local interface), and call the setsockopt IP_TPROXY_ASSIGN at level SOL_IP with a structure describing the nonlocal address (struct in_tproxy). If this setsockopt succeeds, specify what you want to do with the given socket by calling IP_TPROXY_FLAGS with a combination of the bits in in_tproxy.h: /* bitfields in IP_TPROXY_FLAGS */ #define ITP_CONNECT 0x0001 #define ITP_LISTEN 0x0002 #define ITP_ONCE 0x0001 ITP_CONNECT means you want to initiate a connection, ITP_LISTEN means you want to accept connections on the foreign address specified in IP_TPROXY_ASSIGN. ITP_ONCE means that this translation is to be performed only once, and then it should be removed from the table atomically.
You usually want to specify ITP_ONCE with ITP_CONNECT, and may specify ITP_ONCE for a listening socket when only one connection is to be accepted (an FTP data connection, for example). -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
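Editor's sketch: the two-step sequence described in the README (assign a foreign address, then set flags) might look roughly like this from userspace. Several things here are assumptions and not taken from the release: the `struct in_tproxy` layout is copied from the earlier ip_tproxy.h hunk in this thread and may differ in the -02 tarball, the address and port are assumed to be in network byte order, the flags argument is assumed to be an `unsigned int`, and the helper names `tproxy_fill` and `tproxy_assign` are hypothetical. The numeric values of IP_TPROXY_ASSIGN and IP_TPROXY_FLAGS are not given in the README, so the caller passes them in. These socket options only exist on a kernel carrying the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Layout as posted in the earlier ip_tproxy.h hunk (may differ in the release). */
struct in_tproxy {
	uint32_t itp_flags;
	uint32_t itp_faddr;
	uint16_t itp_fport;
};

/* Fill in the foreign address/port; network byte order is an assumption. */
static int tproxy_fill(struct in_tproxy *itp, const char *faddr, uint16_t fport)
{
	struct in_addr a;

	assert(itp != NULL && faddr != NULL);
	if (inet_pton(AF_INET, faddr, &a) != 1)
		return -1;
	memset(itp, 0, sizeof(*itp));
	itp->itp_faddr = a.s_addr;
	itp->itp_fport = htons(fport);
	return 0;
}

/*
 * Step 1: assign the foreign address to an already bound socket.
 * Step 2: tell the module what to do with it (connect/listen/once bits).
 * assign_opt and flags_opt stand for IP_TPROXY_ASSIGN and IP_TPROXY_FLAGS.
 * IPPROTO_IP has the same value as SOL_IP on Linux.
 */
static int tproxy_assign(int fd, int assign_opt, int flags_opt,
			 const struct in_tproxy *itp, unsigned int flags)
{
	if (setsockopt(fd, IPPROTO_IP, assign_opt, itp, sizeof(*itp)) < 0)
		return -1;
	return setsockopt(fd, IPPROTO_IP, flags_opt, &flags, sizeof(flags));
}
```

The socket must be bound to a local address before `tproxy_assign()` is called, since the README says the table is keyed by the local IP/port.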
[RFC] TCP core changes needed for transparent proxying
Hi, Sorry to disturb you again with my transparent proxy efforts, but finally, after my third complete reorganization, things seem to work fine without _any_ core TCP changes. I have a couple of problems though, which all involve core TCP code patches, so I would like some advice on the preferred way. 1. notification about destroyed sockets I definitely need a notification when a socket is closed. Possible solutions: a) create a notifier in the inet_sock_release() function, where my tproxy module registers itself. b) call a netfilter specific function when CONFIG_NETFILTER is defined, in a way similar to how setsockopts are delegated to netfilter. I like the second option a bit more, as putting notifiers here and there is IMHO ugly. Other parts of netfilter might need such a feature too, as netfilter modules might assign state to sockets (through setsockopt/getsockopt) which needs to be freed when the socket is closed. 2. receiving the rewritten original address for datagram based protocols (UDP) As looking up a table when a packet is received is not atomic (the way it needs to be done when using simple NAT), I was thinking about attaching the original address information to the packet itself, which can be queried via an aux message with recvmsg(). As it is not possible to hook into aux message processing, I did this with a patch to ip_sockglue.c. The important parts of my patch are at the end of this message. I tried to be as general as possible, and made the NAT framework save the original addresses in IPCB(skb), which are returned in an IP_ORIGADDRS auxiliary message when recvmsg() is called on a socket with the IP_RECVORIGADDRS setsockopt enabled. Is this solution ok for you? 3. specifying the outgoing source address for datagram based protocols (UDP) A similar problem applies to sending as well. To be atomic, I need to specify the outgoing source address at sendmsg() time using an aux message.
The problem is again that it is difficult to hook into aux message processing, and the skb is not created until ip_build_xmit() time, therefore the skb cannot be used to store this information unless ip_build_xmit() itself is patched. Any idea to resolve this issue? And now my current patch against ip_sockglue.c diff -urN --exclude-from kernel-exclude linux-2.4.18-vanilla/net/ipv4/ip_sockglue.c linux-2.4.18-cttproxy/net/ipv4/ip_sockglue.c --- linux-2.4.18-vanilla/net/ipv4/ip_sockglue.c Wed Oct 31 00:08:12 2001 +++ linux-2.4.18-cttproxy/net/ipv4/ip_sockglue.cFri May 24 02:44:44 2002@@ +-48,6 +48,7 @@ #define IP_CMSG_TOS4 #define IP_CMSG_RECVOPTS 8 #define IP_CMSG_RETOPTS16 +#define IP_CMSG_ORIGADDRS 32 /* * SOL_IP control messages. @@ -107,6 +108,20 @@ put_cmsg(msg, SOL_IP, IP_RETOPTS, opt->optlen, opt->__data); } +#if defined(CONFIG_IP_NF_NAT_NEEDED) + +void ip_cmsg_recv_origaddrs(struct msghdr *msg, struct sk_buff *skb) +{ +struct in_origaddrs ioa; + +ioa.ioa_srcaddr.s_addr = IPCB(skb)->orig_srcaddr; +ioa.ioa_srcport = IPCB(skb)->orig_srcport; +ioa.ioa_dstaddr.s_addr = IPCB(skb)->orig_dstaddr; +ioa.ioa_dstport = IPCB(skb)->orig_dstport; +put_cmsg(msg, SOL_IP, IP_ORIGADDRS, sizeof(ioa), &ioa); +} + +#endif void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb) { @@ -135,6 +150,12 @@ if (flags & 1) ip_cmsg_recv_retopts(msg, skb); + if ((flags>>=1) == 0) + return; +#if defined(CONFIG_IP_NF_NAT_NEEDED) + if (flags & 1) + ip_cmsg_recv_origaddrs(msg, skb); +#endif } int ip_cmsg_send(struct msghdr *msg, struct ipcm_cookie *ipc) @@ -167,6 +188,19 @@ ipc->addr = info->ipi_spec_dst.s_addr; break; } +#if defined(CONFIG_IP_NF_NAT_NEEDED) + case IP_ORIGADDRS: +{ +struct in_origaddrs *ioa; + +if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct in_origaddrs))) +return -EINVAL; +ioa = (struct in_origaddrs *) CMSG_DATA(cmsg); + +/* FIXME: where to store addresses so NAT might pick it up? 
*/ + break; + } +#endif default: return -EINVAL; } -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
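Editor's sketch: on the receiving side, an application would walk the control messages returned by recvmsg() and pick out the proposed IP_ORIGADDRS message. The walk itself uses the standard CMSG_FIRSTHDR/CMSG_NXTHDR/CMSG_DATA macros. What is assumed: the field types and ordering of `struct in_origaddrs` (only the field names appear in the patch), and the numeric value of IP_ORIGADDRS, which is not shown in the posted hunk and is therefore passed in by the caller. The function name `find_origaddrs` is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Field names follow the patch; types and ordering are assumptions. */
struct in_origaddrs {
	struct in_addr ioa_srcaddr;
	struct in_addr ioa_dstaddr;
	uint16_t ioa_srcport;
	uint16_t ioa_dstport;
};

/*
 * Scan the ancillary data of a message filled in by recvmsg() for an
 * IP-level control message of type `origaddrs_type` (the value of
 * IP_ORIGADDRS, supplied by the caller). Returns 0 and copies the
 * payload on success, -1 if the message is absent or has the wrong size.
 */
static int find_origaddrs(struct msghdr *msg, int origaddrs_type,
			  struct in_origaddrs *out)
{
	struct cmsghdr *c;

	assert(msg != NULL && out != NULL);
	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		/* IPPROTO_IP has the same value as SOL_IP on Linux. */
		if (c->cmsg_level != IPPROTO_IP || c->cmsg_type != origaddrs_type)
			continue;
		if (c->cmsg_len != CMSG_LEN(sizeof(*out)))
			return -1;
		memcpy(out, CMSG_DATA(c), sizeof(*out));
		return 0;
	}
	return -1;
}
```

The payload is copied with memcpy() rather than read through a cast pointer, because CMSG_DATA() does not guarantee alignment suitable for the structure.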
[RFC] matching tproxied packets
Hi, Suppose you have a TCP session which is transparently redirected to a local proxy. With the current state of the tproxy framework one needs to add two rules to iptables: - one to the tproxy table to actually redirect a session - one to the filter table to let the NATed traffic enter the local stack (in INPUT) I'd like to make tproxies easier to administer, so I'm thinking about a simple way of matching tproxied packets, which can then be ACCEPTed in the INPUT chain. Possible solutions: * use a new state (called TPROXY), which would be applied to all TPROXYed packets (might interact badly with nat/conntrack). * have the tproxy framework mark all packets with an fwmark, and let the packets in based on the value of fwmark * have a separate match (called tproxy), which matches tproxied sessions based on some value stored in the associated conntrack entry Which one do you prefer? -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: [RFC] matching tproxied packets
On Tue, Jun 04, 2002 at 05:14:47PM +0200, Henrik Nordstrom wrote: > Balazs Scheidler wrote: > > > * use a new state (called TPROXY), which would be applied to all TPROXYed > > packets (might interact badly with nat/conntrack). > > It will in no doubt interact badly with connection tracking (and therefore > NAT). ok. > > > * have the tproxy framework mark all packets with an fwmark, and let the > > packets in based on the value of fwmark > > Will interact badly with fwmark based routing. of course the mark value would be controlled by the user, and not assigned automatically. > > * have a separate match (called tproxy), which matches tproxied sessions > > based on some value stored in the associated conntrack entry > > Defenitely my preference, but I might be biased as I make heavy use of > connection tracking and fwmark based routing in combination. This was my conclusion as well. So I'll go for this solution. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
[RFC] unidirectional NAT
Hi, First of all, thanks for the feedback on my tproxy patches. It generally works well for TCP based connections; what I'm working on now is proper support for UDP. The problem with datagram based protocols is that connection tracking (at least in my case involving Zorp) and address translation are done by the userspace proxy. The only features a UDP proxy needs are the following: * being able to receive frames originally destined elsewhere (the REDIRECT case) * being able to receive frames from an arbitrary host, originally destined to another arbitrary host (the foreign address listen case) * being able to send frames using an arbitrary source IP address, to an arbitrary host (the foreign connect case) I use the NAT framework to redirect packets to the local stack, but as a side effect NAT translates replies as well. Now I don't want reply translation :), hence the subject unidirectional NAT, which would mean translating packets in only one direction. (to be honest, the best would be to translate a single packet only) I'm thinking about three possibilities: * yet another flag to ip_nat_setup_info() to set up a single manip only. * free the state associated with UDP packets after the translation has been applied. * instead of setting up a NAT translation, call manip_pkt() directly somehow Ideas? -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: [RFC] unidirectional NAT
On Wed, Jun 05, 2002 at 11:48:49AM +0200, Jozsef Kadlecsik wrote: > On Wed, 5 Jun 2002, Balazs Scheidler wrote: > > * yet another flag to ip_nat_setup_info() to set up a single manip only. > > * free the state associated to UDP packets after the translation was applied. > > * instead of setting up a NAT translation, call manip_pkt() directly somehow > > I'd combine the third with the new table I wrote about some months ago > (working name for the table is 'raw' instead of 'notrack' or 'select'). > The proposed new target for the table is 'NOTRACK' so that the selected > packet would be skipped by conntrack and NAT as well. If I understand your > problem correctly, a target 'NONAT' could then be easily added and you > could call manip_pkt as you wish. Let me think a bit about it. For UDP packets I don't really need conntrack sessions, I only need to translate single packets, but I'd like to avoid messing with IP and UDP header translation myself. So NOTRACK is good for me; I don't need NONAT since I don't need conntrack either. The question is how you mark an skb to avoid tracking? (one idea was to use a flag in nfct, is that still the plan?) Is your patch available somewhere? -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: [RFC] matching tproxied packets
On Wed, Jun 05, 2002 at 08:53:25AM +0200, Jozsef Kadlecsik wrote: > On Tue, 4 Jun 2002, Balazs Scheidler wrote: > > Possible solutions: > > > > * use a new state (called TPROXY), which would be applied to all TPROXYed > > packets (might interact badly with nat/conntrack). > > * have the tproxy framework mark all packets with an fwmark, and let the > > packets in based on the value of fwmark > > * have a separate match (called tproxy), which matches tproxied sessions > > based on some value stored in the associated conntrack entry > > > > which one do you prefer? > > The latter seems to me the best solution. ok, should I simply add a field somewhere in struct ip_conntrack, or is there a bitfield I can add a flag to? Looking at the struct I can't see a place general enough, so I could either add a new field just to hold a single bit, or a general "flags" field which can be used by other matches later. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: [RFC] matching tproxied packets
On Wed, Jun 05, 2002 at 01:02:44PM +0200, Jozsef Kadlecsik wrote: > On Wed, 5 Jun 2002, Balazs Scheidler wrote: > > > ok, should I simply add fields somewhere in struct ip_conntrack, or there's > > a bitfield I can add a flag to? > > There is no such bitfield you could use at the moment. > > > Looking at the struct I can't see a place general enough, so I can add a new > > field just to hold a single bit, or a general "flags" field, which can be > > used by other matches later. > > This is a good question. Probably it is better to add a (general) 'flags' > field. But I have no idea for what else we could use it :-) As I added a flags argument to ip_nat_setup_info() (currently with a single bit specifying that NAT helpers are to be bypassed), these flags could be stored in ct->nat.flags -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
[RFC] handling closed sockets from netfilter
Hi, Yet another request for comments. I promise I'll stop with these soon. :) As I've mentioned, I need a notification when a socket is closed. As there are other parts of netfilter which might assign state to sockets through setsockopt/getsockopt, I wanted to make the close callback as general as possible. I'm thinking about putting a close() function pointer into struct nf_sockopt_ops, and calling it from nf_sock_release (a new function), which is called from inet_release() when CONFIG_NETFILTER is defined. This way any part of netfilter can register a close callback. (even conntrack might use it to deregister local sockets from the conntrack table) Opinions? -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
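Editor's sketch: the proposal above amounts to an optional callback in a registered ops structure, invoked for every registered module when a socket goes away. A minimal userspace model of that pattern, assuming nothing about the real nf_sockopt_ops layout: `sockopt_ops_model`, `register_sockopt_model`, `sock_release_model` and the example callback are all hypothetical names, and the locking the kernel would need around the list is left out.

```c
#include <assert.h>
#include <stddef.h>

struct sock;  /* opaque stand-in for the kernel's struct sock */

/* Hypothetical reduced ops structure: only the proposed callback is modeled. */
struct sockopt_ops_model {
	struct sockopt_ops_model *next;
	void (*close)(struct sock *sk);  /* optional; NULL if the module has no per-socket state */
};

static struct sockopt_ops_model *ops_list;

/* Modules register their ops once, as they would for setsockopt/getsockopt. */
static void register_sockopt_model(struct sockopt_ops_model *ops)
{
	assert(ops != NULL);
	ops->next = ops_list;
	ops_list = ops;
}

/* Stand-in for the proposed nf_sock_release(): notify every interested module. */
static void sock_release_model(struct sock *sk)
{
	struct sockopt_ops_model *o;

	for (o = ops_list; o != NULL; o = o->next)
		if (o->close != NULL)
			o->close(sk);
}

/* Example module callback: a real one would drop the table entry tied to sk. */
static int example_closed;

static void example_close(struct sock *sk)
{
	(void)sk;
	example_closed++;
}
```

Making the callback optional keeps existing users of the ops structure unchanged: they leave the pointer NULL and are skipped during the walk.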
Re: [RFC] handling closed sockets from netfilter
On Thu, Jun 06, 2002 at 10:31:21AM +0100, Andy Whitcroft wrote: > > Somewhat arbritrary, but perhaps calling the callback release would match > its meaning and the calling graph better. ok, callback renamed to release. My implementation is now in place, and I'm compiling the kernel right now. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: [RFC] unidirectional NAT
On Wed, Jun 05, 2002 at 12:11:32PM +0200, Jozsef Kadlecsik wrote: > On Wed, 5 Jun 2002, Balazs Scheidler wrote: > > > Let me think a bit about it. For UDP packets I don't really need > > conntracking sessions, I only need to translate single packets, but I'd like > > to avoid messing with IP and UDP header translation myself. > > > > So NOTRACK is good for me, I don't need NONAT since I don't need conntrack > > either. The question is how you mark an skb to avoid tracking? (an idea was > > to use a flag in nfct, is it still true?) > > No, I'll go with Rusty's solution: a dummy conntrack entry is used. > > > Is you patch available somewhere? > > Not yet, but real soon I'll post it :-). Can you send me what you have available? I'd like to close my transparency project, and so I'd be willing to contribute to the conntrack exemption project :) -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
netfilter on solaris?
Hi, It is a strange idea I know, but I'd be interested in the opinion of the core netfilter developers on porting the whole netfilter subsystem to Solaris. Apart from the technical issues, would there be any problems? Does the GPL allow this kind of usage? (it would be implemented as a module) Technically, the most difficult tasks are removing the dependency on Linux-specific structures like sk_buff (Solaris has a chain of mblk_t's), locking (it's more or less done using macros), routing differences, and I suppose many things I don't see right now. ps: don't tell me to use ipfilter (which works on Solaris), it's awful -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: netfilter on solaris?
On Fri, Jun 14, 2002 at 12:32:55PM +0200, Jozsef Kadlecsik wrote: > On Fri, 14 Jun 2002, Balazs Scheidler wrote: > > > It is a strange idea I know, but I'd be interested in what the opinion of > > the core netfilter developers is on porting the whole netfilter subsystem to > > Solaris? > > You must have plenty of time. I envy you! :-) I don't. I simply need to run Zorp on Solaris. btw: I've found some proxy functionality in the Solaris core kernel while looking at the source. Should I turn to DaveM now pointing at this? :) (he was the one who said: I don't care about transparent proxying as long as it does _NOT_ touch the TCP core) > > > Apart from the technical issues, would there be any problems? Does the GPL > > allow this kind of usage? (it would be implemented as a module) > > If the module is GPLed, then I don't see problems here but I'm not a > layer. I assume you meant an s/layer/lawyer/ here. As far as I know, it is allowed to write GPLd modules for Photoshop (the gimp plugin issue), so it must be allowed to use GPLd modules in a proprietary kernel. > > But how do you imagine the porting so that the maintenance would not > become a nightmare? Of course I'd want to provide system independence using some headers which would make it work on both Linux and Solaris, so it could be incorporated into standard Netfilter as well. So including headers would be changed from: #include #include #include #include #include #include #include #include ... To: #include "os.h" #include etc. And maybe references to sk_buff * and skb related functions would be changed to inline functions or macros. It's a huge amount of work, I assume, but ipfilter's code is _very_ disappointing. > > > Technically, the most difficult tasks are to remove the dependency on Linux > > like sk_buff (Solaris has a chain of mblk_t's), locking (it's more or less > > done using macros), routing differences and I suppose many things I don't > > see right now. > > Challenging. 
But wouldn't it be more straightforward to run Linux on that > SPARC machine? And there are still plenty to do on 32/64bit issues in > netfilter... The bad thing is that it's not a single computer I want to use. I want to create a product that runs on Suns, and it's generally not a good practice to dump Solaris and use Linux instead. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: netfilter on solaris?
On Sat, Jun 15, 2002 at 02:55:30PM +0200, Harald Welte wrote: > On Fri, Jun 14, 2002 at 12:47:07PM +0200, Balazs Scheidler wrote: > As long as I am one of the maintainers of netfilter/iptables, I am not > going to do any extra hassle in order to support different operating systems. > This includes using weird different types instead of sk_buff. Linux kernel > hackers expect to see linux code, not something abstract using only typedefs > and macros all over the place. ok, it was only an idea. I thought netfilter would only benefit from the effort. the technical problems would be overwhelming anyway. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: netfilter on solaris?
On Sat, Jun 15, 2002 at 02:52:12PM +0200, Harald Welte wrote: > On Fri, Jun 14, 2002 at 12:05:40PM +0200, Balazs Scheidler wrote: > > Hi, > > > > It is a strange idea I know, but I'd be interested in what the opinion of > > the core netfilter developers is on porting the whole netfilter subsystem to > > Solaris? > > After my netfilter presentation at linuxtag, somebody was asking me exactly > this question. And your answer was? > > Apart from the technical issues, would there be any problems? Does the GPL > > allow this kind of usage? (it would be implemented as a module) > > This is basically the same question like binary-only kernel modules. I think it is more similar to the case of Gimp plugins under Photoshop. > > Having netfilter within a different kernel, is technically the same: > GPL'd netfilter/iptables code is called from a binary-only kernel. By binary-only, do you mean proprietary, or something non-GPL'd? Solaris kernel source _is_ available, though it doesn't use the GPL. > Thus, my conclusion would be: Without the explicit permission of the authors, > it is legally not possible to use netfilter/iptables within a proprietary > OS kernel. But of course, I am not a lawyer. > > I for myself are somewhat undecided, but I tend to share Rusty's view: > I haven't ever given permission for using my code by binary only kernel > modules. All new code I'm working on will export GPLONLY to make sure > about that. I think that while disallowing binary-only modules restricts the vendors who release proprietary software relying on free software, disallowing netfilter on proprietary platforms restricts users who would like to use free software on their platform. Even the GPL distinguishes between proprietary system libraries and non-system libraries. While it is a possibility to dump the native OS, and replace it with Linux, most non-x86 platforms work with their native OS best (be it HP-UX, Solaris or Tru64). This argument is worthless anyway. 
The task is tedious, and without support of the original developers it would die immediately. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: performance issues (nat / conntrack)
On Tue, Jun 25, 2002 at 04:17:54PM +0200, Jozsef Kadlecsik wrote: > On Tue, 25 Jun 2002, Jean-Michel Hemstedt wrote: > > > The book-keeping overhead is at least doubled compared to the > > > conntrack-only case - this explains pretty well the results you got. > > > > what do you mean by 'book-keeping' ? > > Does NAT do a lookup even if there are no rules? > > I have to write again: even if there are no any rules, NULL > mapping happens and new connections must be put into both nat hashes. This should not explain the performance degradation others found. If no rules are found in the table, the conntrack entry is added to the NAT hashes. (place_in_hashes() function), this involves adding the entry to two linked lists (changes two pointers per list), and then calling do_bindings() which does nothing (num_manips == 0) except for calling helpers, which should be none, if helper modules are not loaded. Adding entries to the NAT hashes doesn't involve memory allocation (NAT info is stored in ip_conntrack), therefore I don't see the reason for the 50% performance decrease. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
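To illustrate the point made in the message above (inserting a connection into the NAT hashes is only pointer manipulation, with no allocation), here is a minimal user-space sketch. It is not the kernel code: every `toy_` name, the table size and the hash function are assumptions made for the example, and the list helper is only loosely modeled on the kernel's `list_head`.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modeled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Link node right after head; only pointers are rewritten, nothing is allocated. */
static void list_add_head(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

static size_t list_len(const struct list_head *head)
{
    size_t n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

#define TOY_HTABLE_SIZE 16 /* assumed size, for illustration only */

/* Hypothetical connection entry: the list nodes are embedded in it,
 * so placing it in the NAT hashes needs no extra memory. */
struct toy_conntrack {
    unsigned int src, dst;
    struct list_head by_source;
    struct list_head by_proto;
};

/* Two hash tables; only the chain heads exist up front. */
struct toy_nat_tables {
    struct list_head by_source[TOY_HTABLE_SIZE];
    struct list_head by_proto[TOY_HTABLE_SIZE];
};

static void toy_tables_init(struct toy_nat_tables *t)
{
    for (size_t i = 0; i < TOY_HTABLE_SIZE; i++) {
        list_init(&t->by_source[i]);
        list_init(&t->by_proto[i]);
    }
}

/* Sketch of what the message describes for place_in_hashes():
 * add the same entry to one chain in each of the two tables. */
static void toy_place_in_hashes(struct toy_nat_tables *t, struct toy_conntrack *ct)
{
    assert(t != NULL && ct != NULL);
    list_add_head(&ct->by_source, &t->by_source[ct->src % TOY_HTABLE_SIZE]);
    list_add_head(&ct->by_proto, &t->by_proto[ct->dst % TOY_HTABLE_SIZE]);
}
```

The sketch leaves out locking entirely, so it says nothing about contention costs on SMP systems.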
Re: performance issues (nat / conntrack)
On Tue, Jun 25, 2002 at 04:53:33PM +0200, Jozsef Kadlecsik wrote: > On Tue, 25 Jun 2002, Jean-Michel Hemstedt wrote: > > PS: could anybody redo similar tests so that we can compare the results > > and stop killing the messenger, please? ;o) > > Sorry if I look harsh, it's not my intention at all. We were simply over > almost exactly the same arguments several times. And those resulted > neither pinpointing real flaws in the system, nor better algorithms. No, only the head pointers for the hashes are preallocated. The conntrack structures themselves are allocated by the slab allocator: kmem_cache_alloc() is called in init_conntrack(), which initializes a single conntrack entry. So the initial memory allocations for conntrack and nat are: conntrack: htable_size * 8 (8 is sizeof(list_head)); nat: 2 * htable_size * 8 -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
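The arithmetic in the message above can be written out as a small sketch. The 8-byte `list_head` is the figure given in the message (two 4-byte pointers on a 32-bit machine); the function names and the table size of 4096 used in the usage note are assumptions chosen only to make the numbers concrete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sizeof(struct list_head) as stated in the message: two 4-byte pointers. */
#define LIST_HEAD_BYTES 8u

/* Bytes preallocated for the conntrack hash: one list head per bucket. */
static size_t conntrack_prealloc_bytes(size_t htable_size)
{
    assert(htable_size <= SIZE_MAX / (2u * LIST_HEAD_BYTES)); /* overflow guard */
    return htable_size * LIST_HEAD_BYTES;
}

/* Bytes preallocated for NAT: two hash tables of the same size. */
static size_t nat_prealloc_bytes(size_t htable_size)
{
    return 2u * conntrack_prealloc_bytes(htable_size);
}
```

With an assumed `htable_size` of 4096, this gives 32768 bytes for conntrack and 65536 bytes for NAT, which is small next to the per-connection entries that the slab allocator hands out later.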
Re: performance issues (nat / conntrack)
On Tue, Jun 25, 2002 at 09:06:47PM +0200, Harald Welte wrote: > On Tue, Jun 25, 2002 at 05:13:02PM +0200, Balazs Scheidler wrote: > > On Tue, Jun 25, 2002 at 04:17:54PM +0200, Jozsef Kadlecsik wrote: > > > On Tue, 25 Jun 2002, Jean-Michel Hemstedt wrote: > > > > what do you mean by 'book-keeping' ? > > > > Does NAT do a lookup even if there are no rules? > > > > > > I have to write again: even if there are no any rules, NULL > > > mapping happens and new connections must be put into both nat hashes. > > > > This should not explain the performance degradation others found. If no > > rules are found in the table, the conntrack entry is added to the NAT > > hashes. (place_in_hashes() function), this involves adding the entry to two > > linked lists (changes two pointers per list), and then calling do_bindings() > > which does nothing (num_manips == 0) except for calling helpers, which > > should be none, if helper modules are not loaded. > > > > Adding entries to the NAT hashes doesn't involve memory allocation (NAT info > > is stored in ip_conntrack), therefore I don't see the reason for the 50% > > performance decrease. > > think about the lock contention on SMP system. The 'null binding' > approach for nat (and for example, that nat helpers are called for > connections with 'null binding') is a poor design. > > I've recently did some testing which try to avoid the null binding, but > as I'm not entirely sure they don't break something else I haven't been > releasing them yet. The original test machine used to gather performance information was not SMP: " here is my test bed: tested target: -kernel 2.4.18 + non_local_bind + small conntrack timeouts... -PIII~500MHz, RAM=256MB -2*100Mb/s NIC " -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
NAT, not doing route_me_harder?
Hi,

I was wondering what the reason is for NAT not rerouting modified packets? If anything important is modified by a mangle rule that affects routing, the routing decision is automatically redone as this code fragment shows:

    ret = ipt_do_table(pskb, hook, in, out, &packet_mangler, NULL);
    /* Reroute for ANY change. */
    if (ret != NF_DROP && ret != NF_STOLEN && ret != NF_QUEUE
        && ((*pskb)->nh.iph->saddr != saddr
            || (*pskb)->nh.iph->daddr != daddr
            || (*pskb)->nfmark != nfmark
            || (*pskb)->nh.iph->tos != tos))
        return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;

NAT doesn't do anything like this. So if an SNAT rule changes the source address in POSTROUTING, the routing tables are not looked up again, and source-address-dependent policy routing rules are not applied.

It might not be best to change this by default, but it could be implemented by a match, e.g.

    iptables -t nat -A POSTROUTING -p tcp -d 0/0 --dport 25 -m reroute -j SNAT --to-source 1.2.3.4

-m reroute would flag the packet as one which needs rerouting (using for example a flag in nfcache). Packets flagged as such would be rerouted after do_bindings() is called.

Opinions?

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: NAT, not doing route_me_harder?
> This is done only in the OUTPUT chain, and only because the TCP kernel has
> already routed locally originating packets before they first hit netfilter.
>
> > NAT doesn't do anything like this. So given an SNAT rule changes the source
> > address in POSTROUTING, the routing tables are not looked up again, so
> > source address dependant policy routing rules are not applied.
>
> It sure does, in the same spot as mangle, which only is when there is a
> destination nat transformations applied to a locally originated packet.
>
> in ip_nat_local_fn():
>     ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
>     if (ret != NF_DROP && ret != NF_STOLEN
>         && ((*pskb)->nh.iph->saddr != saddr
>             || (*pskb)->nh.iph->daddr != daddr))
>         return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
>
> For the other cases (including mangle), all the transformations that are
> assumed to affect routing is done in PREROUTING. SNAT is not among them.
>
> There is a number of ways to route SNAT:ed packets differently if needed. The
> method I use is usually to use the nfmark of mangle PREROUTING or OUTPUT in
> combination with SNAT in POSTROUTING. Mangle marks the packet telling that
> this should be NAT:ed according to policy X, this nfmark is then used in
> routing to route the packet in the correct direction and by nat POSTROUTING
> to apply the correct NAT rule.

But what happens when you initiate a connection on the host running netfilter, so that there is no PREROUTING chain?

Scenario: I have a default route to gateway A on interface a, but I want my SMTP traffic to leave the box on a different interface b with a different gateway B. Of course this means a different source address is to be assigned to the outgoing connection. Assume it is not possible to set the bind address of the MTA (or setting it affects mail delivery in other directions).

I have source-based routing, which makes packets go in the correct direction based on their source address. If I'm doing SNAT in POSTROUTING, the routing decision is not redone, so the packet leaves with the specified source address, but on the wrong interface.

I think I now understand: have my packets marked in local OUTPUT, route based on that mark, and SNAT based on the mark. Is this the way you suggested?

Hmm.. this sounds reasonable from the programmer's perspective, but is difficult to maintain from the user's: it needs two rules.

    iptables -t mangle -A OUTPUT -p tcp ! -s -d 0/0 --dport 25 -j MARK --set-mark 100
    iptables -t nat -A POSTROUTING -m mark --mark 100 -j SNAT --to-source

instead of:

    iptables -t nat -A POSTROUTING -p tcp ! -s -d 0/0 --dport 25 -m reroute -j SNAT --to-source

Hmm... as I think some more, the 2nd case might not even be possible, as the nat rule is triggered only once during a session, which would mean that the SYN would be routed correctly, but the packets following it would not. Solution: instead of nfcache, the flag could be stored in ip_conntrack.

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: NAT, not doing route_me_harder?
On Wed, Jun 26, 2002 at 12:04:23PM +0200, Henrik Nordstrom wrote:
> Balazs Scheidler wrote:
> > I think I now understand, have my packets marked in local OUTPUT, route
> > based on that mark, and SNAT based on the marks. Is this the way you
> > suggested? Hmm.. this sounds reasonable on the programmer's perspective,
> > but is difficult to maintain from the user's: it needs two rules.
>
> Yes, it requires three custom rules rather than two (there is also the routing
> policy rule)
>
> Having NAT reroute all packets due to source nat transformations would be a
> significant performance impact only to support the corner cases where it is
> handy..

Why? The rerouting would be triggered only if the user requests it, so the normal path would not be affected. And as routing decisions are heavily cached, it is said (I think it was Harald who said that) that routing decisions are not expensive.

It would add a simple bit-test in the normal path, and a second routing decision if explicitly requested: something like this in ip_nat_fn(), after do_bindings is called:

    saddr = (*pskb)->nh.iph->saddr;
    daddr = (*pskb)->nh.iph->daddr;
    ret = do_bindings(ct, ctinfo, info, hooknum, pskb);
    if (ret != NF_DROP && ret != NF_STOLEN && (ct->flags & IP_NAT_REROUTE)) {
        if (((*pskb)->nh.iph->saddr != saddr || (*pskb)->nh.iph->daddr != daddr))
            ret = (ip_route_me_harder(pskb) == 0) ? ret : NF_DROP;
    }
    return ret;

This could also be extended with the local output case, so ip_nat_fn() and ip_nat_local_fn() could be merged. (the if condition would become:

    if (ret != NF_DROP && ret != NF_STOLEN
        && (hooknum == NF_IP_LOCAL_OUT || ct->flags & IP_NAT_REROUTE)) {
        ...
    }

)

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
Re: [PATCH}: Make MARK target terminate (resend)
On Sat, Jun 29, 2002 at 12:36:36PM +0200, Henrik Nordstrom wrote: > On Saturday 29 June 2002 11.46, Patrick McHardy wrote: > So the question to the Netfilter core team is if it would be OK to add > a new option and "module class" to the userspace tools, and have the > existing IPT_CONTINUE targets dual-register as both a target and a > match. I can try to whip something together if this is seen as > something acceptable. Should be fully backwards/forward compatible > with existing rulesets with only a minimal amount of code > duplication. The only compatibility issue is that if you make use of the > new feature then you cannot go back to an older userspace or kernel.. I for one would second a feature like this. I see a good number of places where it could be used (the long-standing missing -l option is one example) -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1