TPROXY and original dest address question

2002-03-26 Thread Balazs Scheidler

Hi,

I found some time to get back to my transparent proxy support for Netfilter.
I posted a patch about 2 months ago which implemented a TPROXY target in its
own tproxy table. It was able to redirect TCP sessions to a local socket,
but was missing a way to query the original destination address.

At the developers' workshop I agreed with Rusty that the destination address
should be stored with the socket as soon as the connection is
established. So here's how it would work:

- TPROXY target redirects a session

- the original destination address/port number is stored in the IPCB() part
  of the skb

- as soon as the socket is created this address/port number is copied into
  sk->tp_pinfo.af_tcp (struct tcp_opt). This would happen in tcp_v4_hnd_req()

- the application queries this information using a getsockopt call to
  fetch the original destination address; the getsockopt can be implemented
  by registering an nf_sockopt_ops
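
As an illustration only, the four steps above can be modelled in plain
user-space C. Nothing below is kernel code: `struct pkt_cb`, `struct pkt`,
`struct conn` and the three functions are hypothetical stand-ins for
IPCB(skb), the skb, the per-socket TCP state and the getsockopt handler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-packet control block, IPCB(skb). */
struct pkt_cb {
    uint32_t orig_daddr;   /* original destination address */
    uint16_t orig_dport;   /* original destination port */
};

/* Hypothetical stand-in for a packet: destination fields plus control block. */
struct pkt {
    uint32_t daddr;
    uint16_t dport;
    struct pkt_cb cb;
};

/* Hypothetical stand-in for the per-socket state (tp_pinfo.af_tcp). */
struct conn {
    uint32_t orig_daddr;
    uint16_t orig_dport;
};

/* Steps 1 and 2: redirect to a local endpoint, remembering the original one. */
static void tproxy_redirect(struct pkt *p, uint32_t local_addr, uint16_t local_port)
{
    p->cb.orig_daddr = p->daddr;
    p->cb.orig_dport = p->dport;
    p->daddr = local_addr;
    p->dport = local_port;
}

/* Step 3: when the socket is created, copy the saved address into it. */
static void conn_init_from_pkt(struct conn *c, const struct pkt *p)
{
    c->orig_daddr = p->cb.orig_daddr;
    c->orig_dport = p->cb.orig_dport;
}

/* Step 4: getsockopt-like query. Returns 0 on success, -1 if buf is too small. */
static int conn_get_origdst(const struct conn *c, void *buf, size_t *len)
{
    struct pkt_cb out;

    memset(&out, 0, sizeof(out));
    out.orig_daddr = c->orig_daddr;
    out.orig_dport = c->orig_dport;
    if (*len < sizeof(out))
        return -1;
    memcpy(buf, &out, sizeof(out));
    *len = sizeof(out);
    return 0;
}
```

The point of the model is only the data flow: the address travels with the
packet until a socket exists, and from then on it lives with the socket.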

I'd like to have the core team members' advice: is this a good approach? Harald?

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: TPROXY

2002-03-26 Thread Balazs Scheidler

Hi,

On Thu, Mar 14, 2002 at 06:05:43PM +0100, Jean-Michel Hemstedt wrote:
> - is there any update regarding TPROXY since 13/Feb/2002?

not yet, sorry

> - is TPROXY intended to replace 'slessdir' and 'IP_INTERCEPT'?

yes.

> - will it be included in the kernel someday (which version?)?

as soon as it's ready, but it all depends on the coreteam.

> - does it provide the definitive patch for nonlocal binding?

not yet, but it will support this.

> - are there examples on how to use it? (apart from the comments in the diff)?

again, not yet. The release I made was purely a work in progress: it
showed some signs of working, but it was not complete.

-- 
Bazsi




Re: TPROXY

2002-03-26 Thread Balazs Scheidler

On Wed, Mar 20, 2002 at 12:12:24AM +0100, Henrik Nordstrom wrote:
> > 2) the 50080/50443 applications rely on TPROXY framework and uses
> > nonlocal_bind.
> 
> Except that nonlocal_bind does not yet work in TPROXY, does it?

not yet.

> > > Zorp supports HTTPS, but it doesn't encapsulate it into CONNECT.
> > > It simply decrypts ongoing traffic, checks HTTP within it, and
> > > sends it on reencrypted. But for this to work you'd need to run
> > > Zorp on your firewall (where it was meant to run)
> 
> At the cost of totally invalidating SSL in terms of proxying.
> 
>   - Client can no longer verify the authenticity of the origin server 
> further than the proxy.
>   - Servers can no longer authenticate or verify the client.
> 
> Typical man-in-the-middle scenario.
> 
> I assume we are talking about what is nominated by the IETF WREC 
> group as "surrogate" servers rather than proxies here.. If not then 
> decrypting proxied SSL traffic is a serious breach of security.

Tunnelling SSL through firewalls _is_ a more serious breach of security.
It is a full-speed covert channel. IRC and ICQ clients began to use such
holes in the firewall to send IRC/ICQ traffic.

Of course a proxy sitting between the client and the server means that peer
certificates cannot be verified by the other peer. On the server side the
firewall can perform this verification (and show a trusted certificate to
the client).

Providing a client certificate to the server is not very common. If it is
required, a tunnel can be opened to that _specific_ server, and nothing else.

So using a real decrypting HTTPS proxy for general https traffic, and
opening holes to specific destinations is definitely more secure than a
simple 'pass-through' hole in the firewall.

-- 
Bazsi




Re: TPROXY and original dest address question

2002-03-27 Thread Balazs Scheidler

Hi,

I intend to develop real transparent proxy support for the 2.4 kernel
series (not a backport of the original 2.2 code).

Since it affects the network core in a few places, I asked the question below
on netfilter-devel, and they directed me here.

Could you please comment on it?

For reference, the implementation tries to touch the networking code as
little as possible, so it rewrites destination addresses before they enter
the networking core. It's a simple, stateless DNAT.

On Wed, Mar 27, 2002 at 08:59:01AM +0100, Harald Welte wrote:
> On Tue, Mar 26, 2002 at 04:21:04PM +0100, Balazs Scheidler wrote:
> > Hi,
> > 
> > I found some time to get back to my transparent proxy support for Netfilter.
> 
> cool.  We'd really like to see this getting forward.
>  
> > - TPROXY target redirects a session
> > 
> > - the original destination address/port number is stored in the IPCB() part
> >   of the skb
> > 
> > - as soon as the socket is created this address/port number is copied into
> >   sk->tp_pinfo.af_tcp (struct tcp_opt) This would happen in tcp_v4_hnd_req()
> > 
> > - this information is queried by the application using a getsockopt call to
> >   fetch the original destination address, the getsockopt can be implemented
> >   by registering an nf_sockopt_ops
> > 
> > I'd like to have the core-members advice, is this a good way? Harald?
> 
> This looks fine to me, but I'm not as much into the sockets code as others
> are.
> 
> If you want to make it really correct, I'd send that Mail to
> the [EMAIL PROTECTED] Mailinglist.
> 
> David Miller, Andi Kleen and Alexey Kuznetsov (the networking gods) are hanging
> out on that list, so you might get some comments related to the 'abuse' of
> tp_pinfo.af_tcp and IPCB() from them.
> 
> Based on their reaction you will see if there is a need to change something
> or if they would like something like this in the kernel.

-- 
Bazsi




Re: TPROXY

2002-03-27 Thread Balazs Scheidler

On Wed, Mar 27, 2002 at 10:15:56AM +0100, Henrik Nordstrom wrote:
> On Tuesdayen den 26 March 2002 16.33, Balazs Scheidler wrote:
> 
> > Providing a client certificate to the server is not very common, if it is
> > required a tunnel can be opened to that _specific_ server, and nothing
> > else.
> >
> > So using a real decrypting HTTPS proxy for general https traffic, and
> > opening holes to specific destinations is definitely more secure than a
> > simple 'pass-through' hole in the firewall.
> 
> You missed the point here. Using a decryption HTTPS proxy invalidates both 
> the use of client certificates AND the use of server certificates, which 
> makes the use of SSL somewhat pointless. Further, unless the proxy runs its 
> own CA trusted by the browsers then the users will always be warned that the 
> server certificate is invalid when using such proxy.

I think you missed the point here. Of course the firewall verifies the
server's certificate using its own trusted list of CAs.

The user is not capable of deciding whether a certificate presented to him
really belongs to the given server. They simply press 'continue' without
thinking that the server they are communicating with is fake.

Of course if you AND your users know what the hell a certificate is, they
can decide, but I think you are in the minority.

-- 
Bazsi




Re: TPROXY and original dest address question

2002-03-27 Thread Balazs Scheidler

On Wed, Mar 27, 2002 at 02:56:56PM +0100, Henrik Nordstrom wrote:
> Please don't forget UDP.

I won't. I definitely want UDP as well.

> 
> For UDP you need to save the original destination, and then implement a 
> control message extension for sending this to userspace together with the 
> packet in response to a recvmsg() call, or hack the kernel to return the 
> original destination in IP_PKTINFO (would be the most natural I think).
> 
> See earlier post from me for a lengthy discussion on how one can do this in 
> the current NAT scheme. If you save the address in the actual skb then this 
> becomes even easier. In case of IP_PKTINFO only two lines..
> 
> Reviewing the existing sockopt options the following seems like the correct 
> calls:
> 
>  * For TCP, return the original destination in getsockopt(SOL_IP, 
> IP_PKTOPTIONS...)
> 
>  * For UDP, return the original destination in the IP_PKTINFO recvmsg control 
> message, and if possible, use the same to allow the application to control 
> the source address when sending packets using sendmsg().

OK. I originally wanted to have separate getsockopt calls, but it's better
to use already established ones. The only possible problem is that I would
need to touch the networking core, which I want to avoid.

> What I do not quite get is how TPROXY is supposed to handle return traffic, 
> fragmented packets or ICMP, if you are doing stateless NAT.

It currently doesn't handle any of them. Fragmentation can be solved by
defragmenting incoming packets. (they are destined to the local ip stack
anyway)

ICMP can be handled in the prerouting hook looking up possible transparent
proxy entries.

> Also, who is responsible for making sure the application protocol is NAT:ed 
> properly in TPROXY. For example FTP PASV. Is it the kernel, or is it the 
> userspace proxy responsibility to get the correct (foreign) IP address in 
> such case? And what about "related" connections such as an FTP data channel?

Of course the proxy itself. How it currently works in Zorp (with kernel
2.2):

* the FTP command channel is redirected to the proxy
* when a PASV command is sent, a non-local bind is performed to bind to the
  server's IP & random port
* the PASV reply is rewritten to contain information about the allocated
  port 
* the data channel is established when the client connects to the socket the
  firewall allocated
* the connection to the server is then established by the proxy
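
The PASV rewriting step above can be sketched as follows. This is not Zorp's
code: `parse_pasv_reply` and `format_pasv_reply` are hypothetical helper
names, and the only thing taken as given is the RFC 959 convention that a
227 reply carries the address and port as six decimal bytes,
(h1,h2,h3,h4,p1,p2). Addresses and ports are in host byte order here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Extract address and port from a server's 227 reply.
 * Returns 0 on success, -1 if no well-formed (h1,h2,h3,h4,p1,p2) is found. */
static int parse_pasv_reply(const char *line, uint32_t *addr, uint16_t *port)
{
    unsigned h1, h2, h3, h4, p1, p2;
    const char *open = strchr(line, '(');

    if (open == NULL ||
        sscanf(open, "(%u,%u,%u,%u,%u,%u)", &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return -1;
    if (h1 > 255 || h2 > 255 || h3 > 255 || h4 > 255 || p1 > 255 || p2 > 255)
        return -1;
    *addr = (h1 << 24) | (h2 << 16) | (h3 << 8) | h4;
    *port = (uint16_t)((p1 << 8) | p2);
    return 0;
}

/* Build the reply sent to the client, advertising the endpoint the proxy
 * allocated. Returns the reply length, or -1 if buf is too small. */
static int format_pasv_reply(char *buf, size_t buflen, uint32_t addr, uint16_t port)
{
    int n = snprintf(buf, buflen,
                     "227 Entering Passive Mode (%u,%u,%u,%u,%u,%u)\r\n",
                     (unsigned)((addr >> 24) & 0xff),
                     (unsigned)((addr >> 16) & 0xff),
                     (unsigned)((addr >> 8) & 0xff),
                     (unsigned)(addr & 0xff),
                     (unsigned)(port >> 8),
                     (unsigned)(port & 0xff));

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

A proxy would parse the server's reply to learn where to connect, then send
the client a freshly formatted reply naming the socket it bound itself.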

> 
> Sorry if I am making things overly complex here..
> 
> In my view (as an application developer, not netfilter hacker) the problems 
> with the standard netfilter approach are:
> 
>  1. Cannot easily support non-local bind, to allow the userspace proxy 
> application to masquerade as the client
> 
>  2. Cannot get the original destination of a redirected UDP packet in an easy 
> manner (might be possible by parsing /proc/net/ip_conntrack and guess which 
> is the correct "connection"...)
> 
>  3. conntrack adds yet another state table, with a bunch of new DOS 
> conditions one must worry about..

conntrack will not be involved in TPROXY, though I want them to
interoperate.

-- 
Bazsi




Re: TPROXY

2002-03-27 Thread Balazs Scheidler

On Wed, Mar 27, 2002 at 04:17:53PM +0100, Jean-Michel Hemstedt wrote:
> > > On Tuesdayen den 26 March 2002 16.33, Balazs Scheidler wrote:
> > The user is not capable of deciding whether a certificate presented to him
> > really belongs to the given server. They simply press 'continue' without
> > thinking that the server they are communicating with is fake.
> >
> > Of course if you AND your users know what the hell a certificate is, they
> > can decide but I think you are a minority.
> >
> 
> We are far from TPROXY, but here is my point of view:
> 
> - HTTPS decrypting proxy is an (mitma) alternative if you want
>   to block all "CONNECT" operations in your proxy. But it sounds
>   like an abuse protection against inside users. And unfortunately,
>   for the user itself, as mentioned above, it will block services
>   such as home banking as well.

* If you allow HTTPS transparently, CONNECT is not invoked.
* If you use a non-transparent HTTP proxy, the client requests a CONNECT from
  the proxy which in turn connects to the web server opening a hole in your
  firewall.

You have three options:
1) enable SSL traffic without being able to verify its contents (Nimda
   through SSL anyone?)
2) disable SSL completely
3) use a decrypting SSL proxy with content verification

> 
> - If your proxy allows "CONNECT" requests, then virtually anything
>   can pass through it, and HTTPS decrypting proxy does not make sense.

why? I attach a decrypting HTTPS proxy when a CONNECT request is
encountered, as follows:

* Nontransparent HTTP proxy receives a CONNECT www.homebank.hu:443 HTTP/1.0 request
* Http proxy stacks in an SSL proxy which receives the datastream after CONNECT
* The SSL proxy decrypts traffic and stacks in a HTTP proxy again:

[nontransparent HTTP proxy]
|
[decrypting SSL proxy invoked after CONNECT]
|
[stacked transparent HTTP proxy]

The above scenario is completely doable with Zorp.

> Then, if you are really concerned by insider attacks, what about a
> session/tunnel timer which could be a possible (ugly) protection
> against wormhole kinds of attacks, without invalidating ssl?

IMHO it's not about insider attacks, it's about incompetent clients who start
trojan horses, get viruses and accept certificates without even knowing what
that means.

Decrypting on the firewall is not invalidating SSL. SSL is
authentication+integrity protection+crypted traffic. Authentication is
performed by the firewall, integrity protection is performed and the whole
traffic is crypted. Authentication is moved from the client computer to the
firewall, which checks it more strictly than most clients do.

And the firewall accepts a certificate based on its policy. No user should
override this.

Of course when moving the certificate authentication is not an option
(because client certificates are used, which are stored on a hardware
token), you can still use a 'hole', but this can be limited to a few
addresses only.

btw: I think this discussion is off-topic on netfilter-devel, so we might
continue our discussion in private.

-- 
Bazsi




Re: TPROXY and original dest address question

2002-03-28 Thread Balazs Scheidler

> > It doesn't handle currently any of them. Fragmentation can be solved by
> > defragmenting incoming packets. (they are destined to the local ip stack
> > anyway)
> 
> Defragmentation is definitely needed for this thing to be used in production. 
> For experimentation conntrack can be used to defragment..

In my previous attempts to forward-port the transparent proxy features of
2.2, I simply used ip_defrag(skb), which returned non-NULL once a complete
datagram had been reassembled.
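
For readers unfamiliar with the check that decides whether a packet needs
ip_defrag() at all, here is a small user-space sketch. `ip_is_fragment` and
the `FRAG_*` constants are names made up for this example; the values mirror
the "more fragments" bit and the 13-bit offset mask of the IPv4 header.

```c
#include <stdint.h>
#include <arpa/inet.h>

#define FRAG_MF     0x2000u  /* "more fragments" flag in the frag_off field */
#define FRAG_OFFSET 0x1fffu  /* mask for the 13-bit fragment offset */

/* frag_off_net is the frag_off header field as found on the wire
 * (network byte order). A packet is a fragment if more fragments follow
 * or if it does not start at offset zero. */
static int ip_is_fragment(uint16_t frag_off_net)
{
    return (frag_off_net & htons(FRAG_MF | FRAG_OFFSET)) != 0;
}
```

Note that the "don't fragment" bit (0x4000) is deliberately outside the mask,
so an ordinary unfragmented packet with DF set is not sent to reassembly.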

> > ICMP can be handled in the prerouting hook looking up possible transparent
> > proxy entries.
> 
> Where is the "possible transparent proxy entries" defined? Internally in 
> TPROXY, or in the host IP stack socket table?

in TPROXY.

> I guess this would be the rule table telling what should be diverted by 
> TPROXY, which from my understanding would be your iptables ruleset...

No. I have 

-- 
Bazsi




[PATCH] transparent proxying, #2 release

2002-03-28 Thread Balazs Scheidler

Hi,

I have prepared a work-in-progress patch showing the direction I'm
heading in with my transparent proxy patches. Here is a summary of changes:

1) It is now possible to query where a connection was destined. It is using
   the method Henrik Nordstrom suggested:

   I defined the IP_ORIGADDRS control message (can be enabled using a
   setsockopt call, and queried using IP_PKTOPTIONS)

2) I also added support for fragmented packets. I didn't test it though,
   comments on this are welcome. I'm doing this in my PREROUTING hook:

+	if (ip->frag_off & htons(IP_MF|IP_OFFSET)) {
+		*pskb = ip_defrag(*pskb);
+		if (*pskb == NULL)
+			return NF_STOLEN;
+	}


3) I wrote a small program which shows how to use the currently implemented
   features. It can be started from inetd (because that was the easiest way)
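
As a hedged illustration of item 1, the following user-space sketch shows how
an application might pick the IP_ORIGADDRS control message out of a control
buffer. `find_origaddrs` is a hypothetical helper written for this example.
`IP_ORIGADDRS` (value 16) and `struct in_origaddrs` are copied from the patch
in this message and exist only on a kernel carrying it, so they are defined
locally here and must not be used against an unpatched kernel.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef SOL_IP
#define SOL_IP IPPROTO_IP       /* fallback for libcs that do not define it */
#endif
#ifndef IP_ORIGADDRS
#define IP_ORIGADDRS 16         /* assumption: value taken from this patch */
#endif

/* Copied from the patch; not part of any standard header. */
struct in_origaddrs {
    struct in_addr ioa_srcaddr;
    struct in_addr ioa_dstaddr;
    unsigned short int ioa_srcport;
    unsigned short int ioa_dstport;
};

/* Walk the control messages attached to msg. Returns 0 and fills *out when
 * an IP_ORIGADDRS message of sufficient size is present, -1 otherwise. */
static int find_origaddrs(struct msghdr *msg, struct in_origaddrs *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_IP &&
            c->cmsg_type == IP_ORIGADDRS &&
            c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1;
}
```

With the patch applied, the buffer would presumably come from a getsockopt()
call with IP_PKTOPTIONS after the option has been enabled with setsockopt();
the application would point msg_control and msg_controllen at that buffer
before calling the helper.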

Comments, as always, are welcome.

-- 
Bazsi


diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/include/linux/in.h linux-2.4.17-TPROXY-ng/include/linux/in.h
--- linux-2.4.17-vanilla/include/linux/in.h	Mon Nov  5 21:42:13 2001
+++ linux-2.4.17-TPROXY-ng/include/linux/in.h	Wed Mar 27 08:54:22 2002
@@ -67,6 +67,7 @@
 #define IP_RECVTOS	13
 #define IP_MTU		14
 #define IP_FREEBIND	15
+#define IP_ORIGADDRS	16
 
 /* BSD compatibility */
 #define IP_RECVRETOPTS	IP_RETOPTS
@@ -107,6 +108,14 @@
 	struct in_addr	ipi_spec_dst;
 	struct in_addr	ipi_addr;
 };
+
+struct in_origaddrs {
+	struct in_addr ioa_srcaddr;
+	struct in_addr ioa_dstaddr;
+	unsigned short int ioa_srcport;
+	unsigned short int ioa_dstport;
+};
+
 
 /* Structure describing an Internet (IP) socket address. */
 #define __SOCK_SIZE__	16		/* sizeof(struct sockaddr)	*/
diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/include/linux/netfilter_ipv4/ipt_TPROXY.h linux-2.4.17-TPROXY-ng/include/linux/netfilter_ipv4/ipt_TPROXY.h
--- linux-2.4.17-vanilla/include/linux/netfilter_ipv4/ipt_TPROXY.h	Thu Jan  1 01:00:00 1970
+++ linux-2.4.17-TPROXY-ng/include/linux/netfilter_ipv4/ipt_TPROXY.h	Wed Feb 13 09:29:34 2002
@@ -0,0 +1,15 @@
+#ifndef _IPT_TPROXY_H_target
+#define _IPT_TPROXY_H_target
+
+struct ipt_tproxy_target_info {
+   u_int16_t redir_port;
+   /* unsigned long fwmark; */
+};
+
+struct ipt_tproxy_user_info {
+   int changed;
+   u_int16_t redir_port;
+   unsigned long fwmark;
+};
+
+#endif /*_IPT_TPROXY_H_target*/
diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/include/net/ip.h linux-2.4.17-TPROXY-ng/include/net/ip.h
--- linux-2.4.17-vanilla/include/net/ip.h	Mon Nov  5 21:43:09 2001
+++ linux-2.4.17-TPROXY-ng/include/net/ip.h	Wed Mar 27 08:55:07 2002
@@ -46,6 +46,12 @@
 #define IPSKB_MASQUERADED	1
 #define IPSKB_TRANSLATED	2
 #define IPSKB_FORWARDED		4
+
+#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
+   u32 origdstaddr;
+   u16 origdstport;
+#endif
+
 };
 
 struct ipcm_cookie
diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/include/net/sock.h linux-2.4.17-TPROXY-ng/include/net/sock.h
--- linux-2.4.17-vanilla/include/net/sock.h	Thu Mar 28 02:18:47 2002
+++ linux-2.4.17-TPROXY-ng/include/net/sock.h	Thu Mar 28 05:19:41 2002
@@ -418,6 +418,11 @@
 	int linger2;
 
 	unsigned long last_synq_overflow;
+
+#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
+   u32 origdstaddr;
+   u16 origdstport;
+#endif
 };
 

diff -urN --exclude-from kernel-exclude linux-2.4.17-vanilla/net/ipv4/ip_sockglue.c linux-2.4.17-TPROXY-ng/net/ipv4/ip_sockglue.c
--- linux-2.4.17-vanilla/net/ipv4/ip_sockglue.c	Wed Oct 31 00:08:12 2001
+++ linux-2.4.17-TPROXY-ng/net/ipv4/ip_sockglue.c	Thu Mar 28 03:14:58 2002
@@ -48,6 +48,7 @@
 #define IP_CMSG_TOS		4
 #define IP_CMSG_RECVOPTS	8
 #define IP_CMSG_RETOPTS		16
+#define IP_CMSG_ORIGADDRS	32
 
 /*
  * SOL_IP control messages.
@@ -107,6 +108,20 @@
 	put_cmsg(msg, SOL_IP, IP_RETOPTS, opt->optlen, opt->__data);
 }
 
+#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
+
+void ip_cmsg_recv_origaddrs(struct msghdr *msg, struct sk_buff *skb)
+{
+	struct in_origaddrs ioa;
+
+	ioa.ioa_srcaddr.s_addr = 0;
+	ioa.ioa_srcport = 0;
+	ioa.ioa_dstaddr.s_addr = IPCB(skb)->origdstaddr;
+	ioa.ioa_dstport = IPCB(skb)->origdstport;
+	put_cmsg(msg, SOL_IP, IP_ORIGADDRS, sizeof(ioa), &ioa);
+}
+
+#endif
 
 void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb)
 {
@@ -135,6 +150,13 @@
 
 	if (flags & 1)
 		ip_cmsg_recv_retopts(msg, skb);
+
+#if defined(CONFIG_IP_NF_TPROXY) || defined(CONFIG_IP_NF_TPROXY_MODULE)
+	if ((flags>>=1) == 0)
+		return;
+   if (flags 

Re: TPROXY and original dest address question

2002-03-28 Thread Balazs Scheidler

On Thu, Mar 28, 2002 at 04:02:46PM +0100, Henrik Nordstrom wrote:
> Balazs Scheidler wrote:
> 
> > > Where is the "possible transparent proxy entries" defined? Internally in
> > > TPROXY, or in the host IP stack socket table?
> >
> > in TPROXY.
> >
> > > I guess this would be the rule table telling what should be diverted by
> > > TPROXY, which from my understanding would be your iptables ruleset...
> >
> > No. I have
> 
> You have what? Seems to be part of the message missing here..??

Yes, sorry. There's a translation table in TPROXY independent from the
tproxy iptables table. 

The rules are in the iptables table called 'tproxy', which contains one
transparent proxy rule for each service needed.

As a connection is established, a new entry is added to the translation
table with: remote addr/remote port, original dest/original port, local
dest/local port.

Then both the prerouting and the local output hooks perform translation of
the packet flow according to the translation table.

In a sense this table is similar to the conntrack tables, with the exception
that its primary focus is to associate packet endpoints with local sockets,
identified by their own IP/port pair.

Thus the connection between a redirected session and a local socket is not
the socket layer but this translation table, so no packet with a
foreign IP address enters the networking core.
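
A minimal user-space model of such a translation table might look like the
sketch below. The struct layout and the names `tp_entry`, `tp_prerouting`
and `tp_local_out` are assumptions made for illustration, not the data
structures of the actual patch; a linear scan stands in for whatever lookup
the real table uses.

```c
#include <stddef.h>
#include <stdint.h>

struct endpoint {
    uint32_t addr;
    uint16_t port;
};

/* One translation entry, holding the three endpoints named in the text. */
struct tp_entry {
    struct endpoint remote;  /* the client */
    struct endpoint orig;    /* where the client believes it is connecting */
    struct endpoint local;   /* the local proxy socket that really answers */
};

/* Source and destination of a packet. */
struct flow {
    struct endpoint src;
    struct endpoint dst;
};

static int ep_eq(struct endpoint a, struct endpoint b)
{
    return a.addr == b.addr && a.port == b.port;
}

/* PREROUTING direction: client -> original destination becomes
 * client -> local socket. Returns 1 if the packet was translated. */
static int tp_prerouting(const struct tp_entry *tbl, size_t n, struct flow *f)
{
    for (size_t i = 0; i < n; i++) {
        if (ep_eq(tbl[i].remote, f->src) && ep_eq(tbl[i].orig, f->dst)) {
            f->dst = tbl[i].local;
            return 1;
        }
    }
    return 0;
}

/* LOCAL_OUT direction: local socket -> client becomes
 * original destination -> client. Returns 1 if the packet was translated. */
static int tp_local_out(const struct tp_entry *tbl, size_t n, struct flow *f)
{
    for (size_t i = 0; i < n; i++) {
        if (ep_eq(tbl[i].local, f->src) && ep_eq(tbl[i].remote, f->dst)) {
            f->src = tbl[i].orig;
            return 1;
        }
    }
    return 0;
}
```

Because both rewrites happen in the hooks, the stack in between only ever
sees packets addressed to or from the local endpoint.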

-- 
Bazsi




Re: TPROXY and original dest address question

2002-03-28 Thread Balazs Scheidler

On Thu, Mar 28, 2002 at 04:14:13PM +0100, Henrik Nordstrom wrote:
> Is TPROXY a stand-alone netfilter module, not an iptables target?
> 
> I thought it was an iptables target, but from your answer it seems like it 
> should be a netfilter module on its own..

It became an iptables module on its own. The reasoning was discussed at the
developers' meeting in Enschede.

-- 
Bazsi




Re: TPROXY and original dest address question

2002-03-28 Thread Balazs Scheidler

On Thu, Mar 28, 2002 at 04:39:51PM +0100, Henrik Nordstrom wrote:
> Thanks. Explains it quite well.
> 
> So there is yet another state table involved here.
> 
> Now I am a little confused. What exactly is it that makes this new state table 
> better suited for the job than conntrack?

Because we don't do full TCP tracking, and our NAT is quite limited (only
DNAT, and only to the local IP stack). In addition, entries are not timed
out of the table.

a new entry is added to this table when 

1) a TPROXY destination is encountered
2) when a socket is 'bound' to a foreign address (either for listening or
   connecting)

an entry is removed from this table when

1) the socket associated with the entry is destroyed (iff a socket is
   associated with an entry)
2) when a TCP rst is returned by the stack (happens only when a socket is
   not yet associated)
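
The two removal rules can be written down as a small sketch. `struct tp_slot`
and both function names are hypothetical; the socket is represented by an
opaque pointer that stays NULL until a socket has been attached to the entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table slot: in_use marks a live entry, sk is the associated
 * socket (NULL while no socket has been attached yet). */
struct tp_slot {
    bool in_use;
    const void *sk;
};

/* Rule 1: drop the entry when its associated socket is destroyed.
 * Entries without a socket, or owned by another socket, are left alone. */
static bool tp_remove_on_sock_destroy(struct tp_slot *s, const void *sk)
{
    if (s->in_use && s->sk != NULL && s->sk == sk) {
        s->in_use = false;
        s->sk = NULL;
        return true;
    }
    return false;
}

/* Rule 2: drop the entry when the stack answers with a TCP RST, but only
 * while no socket is associated with it yet. */
static bool tp_remove_on_rst(struct tp_slot *s)
{
    if (s->in_use && s->sk == NULL) {
        s->in_use = false;
        return true;
    }
    return false;
}
```

The two rules are mutually exclusive by construction: an entry is cleaned up
by the RST path before a socket exists and by the socket's destruction
afterwards.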

-- 
Bazsi




Re: TPROXY and original dest address question

2002-03-29 Thread Balazs Scheidler

On Thu, Mar 28, 2002 at 05:55:25PM +0100, Henrik Nordstrom wrote:
> > an entry is removed from this table when
> >
> > 1) the socket associated with the entry is destroyed (iff a socket is
> >    associated with an entry)
> 
> Ok. So there is integration between the tproxy table and the host IP stack 
> somehow, to keep the TPROXY table in sync with the host IP stack. Nice. Kind 
> of missing in conntrack..

Until now the only binding between the socket and the entry was the address
of the local socket.

This might have to be changed due to the problem you described below.

> > 2) when a TCP rst is returned by the stack (happens only when a socket is
> >    not yet associated)
> 
> Why this? And doesn't it allow for an easy DOS on TPROXY sessions?
> 
> You should not be processing RST unless you are also processing TCP windows. 
> Not all RST packets resets "the" session.
> 
> Ah, I think I understand now. You only do this when there isn't yet a socket 
> in the host IP stack. In such case it is needed.

Yes. An alternative would be a hook in the kernel which is called when no
socket is found for an incoming SYN. This is an ugly hack though.

> Sounds like it could be made to work  for TCP.
> 
> UDP is a bit different though.. but there isn't that big need of any 
> connection table there, except for ICMP processing.

For UDP I only want to do half-NAT, which means that it would be possible to
send a UDP frame with a custom IP address, and receive one destined to
somebody else. ICMP processing is needed when we send an UDP frame (with
foreign source address), and the destination host returns an ICMP error to
the sender.

In Linux 2.2 we do the following as a transparent proxy with UDP traffic:
* we have a sender socket, which we use to send packets with specified
  source AND destination address
* prior to sending a packet we create, bind and connect a socket to a
  destination we are sending packets to (this socket receives ICMP errors)

A similar approach is doable:
* the source address is specified in a control message of sendmsg()
* this doesn't create a translation entry in TPROXY
* a separate socket is created, and a setsockopt is issued, which places
  socket related information into the translation table
* when an ICMP error is received, the second socket is found, and ICMP error
  is rewritten accordingly
* if no specific socket is found, it is forwarded (and dropped on the
  forward chain)

> 
> Hmm.. regarding ICMP. How do you plan on handle ICMP from the host stack 
> without TCP window tracking?
> 
> Problem: There may be multiple sessions from the same client IP,PORT to the 
> same PORT on multiple servers, and after NAT there isn't sufficient 
> information to distinguish these by the addressing alone.
> 
>   10.0.1.4:52346 ->  192.168.96.32:80
>   10.0.1.4:52346 ->  192.168.84.253:80
>   10.0.1.4:52346 ->  176.16.48.52:80
> 
> The problem is much more evident if you look at UDP traffic, but exists for 
> TCP as well. For TCP you can easily see this if there is multiple clients 
> behind a NAT gateway (for example netfilter SNAT).
> 
> Hmm.. this problem probably also applies to the de-NAT:ing of traffic, but 
> there you can probably get by by querying the socket for the real source 
> address (original destination address).

Hmm... this is a real problem, but I only see it occurring in our case
when we redirect a session, and responses must be de-NATed. In this case,
however, skb->sk is unique (except maybe for the SYN-ACK, because the socket
is not created until the three-way handshake is fully completed).
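
The ambiguity Henrik describes can be reproduced in a few lines. In this
sketch `struct tuple`, `dnat_to_local` and `tuple_eq` are illustrative names
only: once the destination is rewritten to a single local endpoint, sessions
that differed only in their original destination become indistinguishable by
address alone.

```c
#include <stdbool.h>
#include <stdint.h>

struct tuple {
    uint32_t saddr;
    uint16_t sport;
    uint32_t daddr;
    uint16_t dport;
};

/* Stateless DNAT to a local endpoint: only the destination is rewritten. */
static struct tuple dnat_to_local(struct tuple t, uint32_t laddr, uint16_t lport)
{
    t.daddr = laddr;
    t.dport = lport;
    return t;
}

static bool tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->saddr == b->saddr && a->sport == b->sport &&
           a->daddr == b->daddr && a->dport == b->dport;
}
```

Applied to the first two flows in the quoted example (10.0.1.4:52346 to
192.168.96.32:80 and to 192.168.84.253:80), the tuples differ before
translation and compare equal after it, which is why something beyond the
address tuple, such as the owning socket, is needed to tell them apart.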

-- 
Bazsi




TPROXY, picking up the source address for UDP

2002-03-29 Thread Balazs Scheidler

Hi,

I ran into another problem while figuring out how to set the outgoing
source address for UDP frames.

In Linux 2.2 UDP sending with specified source address worked on a frame by
frame basis, and I would like to keep this behaviour. 

An aux message would be used in sendmsg() to specify the outgoing source
address of a packet. My problem is that it is quite difficult to change the
source ip of the skb based on an aux message.

In the kernel it works as follows:

1) udp_sendmsg is called
2) which in turn calls ip_cmsg_send, which sets up an ipcm_cookie struct
3) this struct is then passed to ip_build_xmit(), which sets up the skb
   based on ipcm

My problem is that to create a new cmsg, I'd need to modify the
ipcm_cookie struct, as it is the only connection between the skb and
sendmsg(). In addition, ip_build_xmit() must also be changed, as this is
the function which processes messages.

An alternative would be to add a translation entry describing the required
change to the TPROXY translation table. The problem with this is that adding
the entry, sending a single frame, and removing the entry doesn't seem to be
very atomic to me. (The only possibility here would be to create a flag
on the translation entry, saying that this entry is applied only
once, but this might cause problems on SMP, as two processes might be
issuing sendmsg() calls at the same time.)

Opinions?

-- 
Bazsi




TPROXY-ng-03

2002-04-11 Thread Balazs Scheidler

Hi,

I've released the latest version of my TPROXY patches, available at

http://www.balabit.hu/en/downloads/tproxy/

The changes include:
* handle fragmented packets (also for LOCAL_OUT)
* handle parallel sessions with non-unique address tuples (using a cookie
  assigned to sessions)

I've sent this patch to Andi Kleen to have feedback from the core network
developers, as it changes some parts in core TCP.

TODO:
* need to delete entries of the translation table when a session ends (this
  again needs some support from the core)

Comments welcome,

-- 
Bazsi




TPROXY-04

2002-04-18 Thread Balazs Scheidler

Hi,

I have released my latest transparent proxy patches, now with most functions
in place. It's available at

http://www.balabit.hu/en/downloads/tproxy (the link at the bottom)

I've also uploaded some sample programs, which perform the following:
* listen on a foreign address
* connect using a foreign address as a source
* query original destination of redirected connections

Todo:
* clean it up (it needs some cleanup!)
* further tests for UDP functions
* convince core developers to accept my patches against the core (or suggest
  alternative implementations)

Comments, tests, flames welcome.

-- 
Bazsi




Re: Howto Change packet route destination from a NF_IP_LOCAL_OUT hook

2002-04-24 Thread Balazs Scheidler

On Mon, Apr 22, 2002 at 11:00:56PM -0500, Peter Caldes wrote:
> Hopefully somebody here can help me. I'm not that familiar with the detailed
> Linux Networking stack.
> 
> I have a box which acts as a gateway and need a way for (multiple) user (root)
> level applications to insert IP packets into the IP stack and somehow bypass
> normal routing which is based on the destination IP addr of the packet.
> 
> I want the application(s) to specify the next hop the packet takes without
> modifying the IP packet itself, so that the packet can be directed/forwarded
> to a particular router based on the application parameters. The real reason
> is that IP addresses in the same subnet might reside behind different
> routers. (ie. 1.2.3.1 is behind RouterA, 1.2.3.2 is behind RouterB). The
> application knows which router to use.
> 
> I've been able to do this under AIX V4.2.1(with some kernel extensions) using
> RAW IP sockets and then specifying a source route option on the socket with
> setsockopt(). The kernel mod checks the socket options and if it sees a
> source-route option, it computes a route to the first ip address in the
> source-route list instead of the ip destaddr.
> 
> Now I need to do something similar with Linux.
> 
> It seems I can register a NF_IP_LOCAL_OUT hook, but I don't know how to
> mangle skb->dst. I also assume that when the NF_IP_LOCAL_OUT hook is called,
> I can scan the socket options to do something similar.

You don't need kernel modules under Linux. Simply put an fwmark on the
packets using an iptables rule, and then use policy routing to route the
packets based on that value.

ip rule add fwmark 100 table 100
ip route add default via x.x.x.x table 100

-- 
Bazsi




another netfilter ICMP bug

2002-05-23 Thread Balazs Scheidler

Hi,

I've encountered another ICMP translation problem in netfilter. This time it
occurs when a process initiates a connection and it is translated on the
same host.

How to reproduce:

Box A ------------------------ Box B
192.168.131.124                192.168.131.1
                               Routes back 10.0.0.0/24 using
                               192.168.131.124 as gateway

iptables -t nat -A POSTROUTING -p tcp -s 192.168.131.124 --sport  \
 -j SNAT --to-source 10.0.0.1

and

nc -s 192.168.131.124 -p  192.168.131.1 80

The connection works as expected if Box B accepts connections on port 80,
but if I cause Box B to send an ICMP port unreachable back:

(boxb was using ipchains in my case therefore the ipchains command line)
boxb# ipchains -s 10.0.0.0/24 -d 0/0 80 -j REJECT

The source address within the ICMP port unreachable is not rewritten as the
following LOG output shows. (to trigger the LOG output I added another rule
to INPUT: iptables -A INPUT -p icmp -j LOG):

IN=eth0 OUT= MAC=00:50:56:bb:83:25:00:50:bf:0b:f6:2f:08:00 \
SRC=192.168.131.1 DST=192.168.131.124 LEN=88 TOS=0x00 \
PREC=0xC0 TTL=255 ID=26730 PROTO=ICMP TYPE=3 CODE=3 \
[SRC=10.0.0.1 DST=192.168.131.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=53526 DF \
PROTO=TCP SPT= DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0 ]


-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




oops when conntrack entry times out

2002-05-23 Thread Balazs Scheidler

Hi,

I've run into a problem which causes an oops in ip_nat_cleanup_conntrack().

I call ip_nat_setup_info() from my PREROUTING hook (right after conntrack,
and before nat). Everything works correctly and NAT is applied in both
directions. The oops occurs exactly when the conntrack entry times out (I
was watching /proc/net/ip_conntrack).

The backtrace shows that a NULL pointer is dereferenced in
ip_nat_cleanup_conntrack() at this line:


LIST_DELETE(&bysource[hash_by_src(&conn->tuplehash[IP_CT_DIR_ORIGINAL]
                                  .tuple.src,
                                  conn->tuplehash[IP_CT_DIR_ORIGINAL]
                                  .tuple.dst.protonum)],
            &info->bysource);

It seems that either info->bysource->prev or info->bysource->next is NULL.

Anyone with an idea why this might happen? The same code works if I call
ip_nat_setup_info() from POSTROUTING.

I can't see the difference between simple DNAT (which works) and my TPROXY
DNAT, which works but oopses.

Anyone with an idea?
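
The failure mode itself can be illustrated outside the kernel. The sketch
below is a minimal userspace doubly linked list, not the kernel's LIST_DELETE
or its list implementation; the names node, list_init, list_add_head and
list_del_checked are made up for illustration. It does not diagnose the cause
of the oops, it only shows why unlinking an entry whose prev/next pointers are
still NULL would dereference a NULL pointer, and how a guarded delete would
catch that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; prev/next stay NULL until the node is linked. */
struct node {
    struct node *prev, *next;
};

/* Initialise an empty circular list head. */
void list_init(struct node *head)
{
    head->prev = head->next = head;
}

/* Link n directly after head. */
void list_add_head(struct node *head, struct node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Unlink n. An unguarded delete would run n->prev->next = n->next
 * unconditionally, which faults when n was never linked.
 * Returns 0 on success, -1 if n is not on any list.
 */
int list_del_checked(struct node *n)
{
    if (n->prev == NULL || n->next == NULL)
        return -1;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    return 0;
}
```

A second delete of the same node, or a delete of a node that was never added,
returns -1 here instead of crashing.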

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: tproxy using conntrack/nat?

2002-05-23 Thread Balazs Scheidler

On Wed, May 22, 2002 at 12:29:54PM +0200, Harald Welte wrote:
> On Tue, May 21, 2002 at 03:30:24PM +0200, Balazs Scheidler wrote:
> > This help from the core (through the notifier and an identifier cookie) is
> > ugly. As I think about more it is very ugly. Just grep for cookie and
> > notifier in my patches.
> 
> yes, and this is most likely to cause problems with integration of your patch
> into the mainstream kernel.

in addition to this, it is not a robust solution.

> 
> > As it seems conntrack/NAT provides all the necessary features, the problem
> > is that sometimes it provides too much:
> > * it also rewrites parts of the packet data (PORT and PASV translation
> >   within FTP for example)
> 
> well, you don't need to load the ip_conntrack/nat_ftp.o modules in case you
> are running a transparent FTP proxy.  
> 
> Or do you think people want to kernel-NAT (incl. helpers) some ftp-traffic,
> and put some other ftp-traffic through the transparent FTP proxy?  I don't
> think this is a very realistic scenario, as it leads to uncertainty anyway.

Think of mass transfer of data. You have two semi-trusted security zones with
high bandwidth requirements and an internet zone which is completely
untrusted and has lower bandwidth requirements.

You use an FTP proxy to the internet, and FTP NAT between the trusted zones.
Of course there are other scenarios as well. I simply don't want to lose the
rich set of features netfilter already provides, I want to add to them.

> 
> > * the original source/destination address cannot be atomically found for UDP
> >   packets
> > 
> > The second seems to be easy to solve, though it needs some changes in the
> > core. The first one is more tough though. 
> 
> Ok, if you say so :) Your changes are welcome, as long as they aren't too
> intrusive.

I'm thinking about some mechanism of hooking into control message processing
to be able to send aux messages using sendmsg and recvmsg.

Or instead of hooks, store rewritten addresses in the skb, and add the aux
messages to the core.

I did not say it will be easy to convince DaveM to include those
patches, only that it's easy to implement. :)

> 
> > Ideally I don't want to lose the ability to NAT and TPROXY at the same time
> > (of course different sessions)
> 
> > To avoid application level rewriting of packet data one would not load the
> > necessary NAT helper, but it is not doable in my case, as one session should
> > be NATed while the other tproxied.
> 
> yes, but within a single protocol?  Such a setup would cause me headaches.
> either you want to have a transparent FTP proxy [for security reasons], or
> you don't.  But mixing the two doesn't sound like a nice strategy, 
> especially since the kernel nat doesn't provide you with session logs, ...

see my previous explanation.

> 
> > As I know it's currently not possible to exclude sessions from helper
> > processing. TProxy only needs to NAT the encapsulating TCP session, and
> > nothing else.
> > 
> > Currently ip_nat_setup_info() assigns the helper to a given connection:
> > 
> > unsigned int
> > ip_nat_setup_info(struct ip_conntrack *conntrack,
> >   const struct ip_nat_multi_range *mr,
> >   unsigned int hooknum)
> > 
> > I see two possibilities:
> > 
> > * add a new argument to ip_nat_setup_info() to avoid helpers
> 
> seems reasonable. it's only three arguments currently, having four wouldn't
> be a problem.  add a 'int flags' argument and define a flag for
> 'BYPASS_HELPER'.  

OK, but isn't this API change too intrusive to other netfilter parts?
ip_nat_setup_info() is referenced 11 times in an unpatched 2.4.18.

A third solution would be to add a new NFCT_ flag. Do you still prefer the
flags argument?
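
For reference, the flags variant could look roughly like the sketch below.
This is a userspace model and not the netfilter code: struct conn, struct
helper and setup_info are hypothetical stand-ins, the flag name BYPASS_HELPER
is taken from the suggestion quoted above, and its value is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define BYPASS_HELPER 0x0001    /* name from the quoted suggestion, value assumed */

/* Hypothetical stand-in for a NAT helper (e.g. the FTP one). */
struct helper {
    const char *name;
};

/* Hypothetical stand-in for the per-connection NAT state. */
struct conn {
    unsigned int nat_flags;        /* remembers how the mapping was set up */
    const struct helper *helper;   /* NULL means no payload rewriting */
};

/*
 * Model of a setup call with an extra 'flags' argument: the helper that
 * would normally be attached is skipped when BYPASS_HELPER is set.
 */
void setup_info(struct conn *c, const struct helper *found, unsigned int flags)
{
    c->nat_flags = flags;
    c->helper = (flags & BYPASS_HELPER) ? NULL : found;
}
```

With this shape a plainly NATed FTP session keeps its helper, while a
tproxied one set up with BYPASS_HELPER only has the encapsulating TCP session
translated.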

>  This however wouldn't bypass the conntrack helper [which
> could already say INVALID because a packet doesn't match the layer5+ state 
> of the connection, see for example the PPTP helper].

Don't forget that we have two conntrack entries if traffic flows through
a transparent proxy, and conntrack processing is done prior to NAT
rewriting. 

Please tell me if I'm wrong, but CONNTRACK sees an unmodified PORT command
assigned to a session with an unmodified destination address.

> 
> > * reset conntrack->helper to NULL once ip_nat_setup_info returns (this might
> >   cause races though)
> 
> ugly. 

ok, it was only an idea.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: another netfilter ICMP bug

2002-05-23 Thread Balazs Scheidler

On Thu, May 23, 2002 at 07:03:52PM +0200, Harald Welte wrote:
> On Thu, May 23, 2002 at 10:18:23AM +0200, Balazs Scheidler wrote:
> > Hi,
> > 
> > I've encountered another ICMP translation problem in netfilter. This time it
> > occurs when a process initiates a connection and it is translated on the
> > same host.
> 
> are you sure this problem persists, even after applying the icmp nat fix?

yes, I've forgotten to mention that I first applied the patch, and the
problem persisted.

(btw: a plain .diff file for the mentioned fix would make it much easier to
apply the patch, I had to cut & paste it from the .html)

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: another netfilter ICMP bug

2002-05-23 Thread Balazs Scheidler

On Thu, May 23, 2002 at 07:03:52PM +0200, Harald Welte wrote:
> On Thu, May 23, 2002 at 10:18:23AM +0200, Balazs Scheidler wrote:
> > Hi,
> > 
> > I've encountered another ICMP translation problem in netfilter. This time it
> > occurs when a process initiates a connection and it is translated on the
> > same host.
> 
> are you sure this problem persists, even after applying the icmp nat fix?

so, here's my third attempt at tproxy support, this time it is using
conntrack/nat to do most of the work. It is still missing some parts,
especially:

* a way to get rewritten UDP destination address (in fact I did nothing in
  this area, TCP works because of SO_ORIGDSTADDR)
* a way to get notified when a socket is closed (if a program registers
  itself, the registration remains there unless explicitly removed)
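
For the TCP case, a proxy can ask conntrack for the pre-NAT destination of an
accepted connection. In mainline kernels the option is named SO_ORIGINAL_DST
(level SOL_IP, value 80 in linux/netfilter_ipv4.h). The sketch below is a
minimal wrapper around it; the function name get_original_dst is made up, and
the fallback defines are only there so the example builds without the kernel
header.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_IP
#define SOL_IP IPPROTO_IP
#endif
#ifndef SO_ORIGINAL_DST
#define SO_ORIGINAL_DST 80      /* as defined in linux/netfilter_ipv4.h */
#endif

/*
 * Fetch the original (pre-NAT) destination of a connected TCP socket.
 * Returns 0 and fills *dst on success, -1 on failure (errno is set by
 * getsockopt, e.g. when the socket is not a NATed connection).
 */
int get_original_dst(int fd, struct sockaddr_in *dst)
{
    socklen_t len = sizeof(*dst);

    memset(dst, 0, sizeof(*dst));
    if (getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, dst, &len) < 0)
        return -1;
    return 0;
}
```

A proxy would call this on the descriptor returned by accept(). There is no
equivalent for a single UDP datagram, which is the gap described in the first
item above.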

Otherwise everything seems to work:
* connecting from foreign address
* listening on foreign address
* redirecting sessions (this is untested, but should work, at least with the
  good old REDIRECT target)

I still have a separate table, though it is not absolutely necessary.
The user interface might change as I was not thinking about typical usage
scenarios.

I touched NAT in a few places, so it might collide with newnat. I was using a
vanilla 2.4.18 kernel for development.

Tips, ideas, flames welcome,

PS: I don't plan to rewrite the whole thing a fourth time :), I'm quite
satisfied with the way it currently works.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1


diff -urN --exclude-from=kernel-exclude linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_nat.h linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_nat.h
--- linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_nat.h  Thu Apr 26 00:00:28 2001
+++ linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_nat.h Thu May 23 11:59:38 2002
@@ -111,10 +111,13 @@
        struct ip_nat_seq seq[IP_CT_DIR_MAX];
 };
 
+#define IP_NAT_BYPASS_HELPERS  0x0001
+
 /* Set up the info structure to map into this range. */
 extern unsigned int ip_nat_setup_info(struct ip_conntrack *conntrack,
                                      const struct ip_nat_multi_range *mr,
-                                     unsigned int hooknum);
+                                     unsigned int hooknum,
+                                     int flags);
 
 /* Is this tuple already taken? (not by us)*/
 extern int ip_nat_used_tuple(const struct ip_conntrack_tuple *tuple,
diff -urN --exclude-from=kernel-exclude linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_nat_core.h linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_nat_core.h
--- linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_nat_core.h     Mon Dec 11 22:31:32 2000
+++ linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_nat_core.h    Thu May 23 12:00:24 2002
@@ -26,6 +26,11 @@
 extern void place_in_hashes(struct ip_conntrack *conntrack,
                            struct ip_nat_info *info);
 
+void ip_nat_update_hashes(struct ip_conntrack *conntrack,
+                         struct ip_nat_info *info,
+                         int initialized);
+
+
 /* Built-in protocols. */
 extern struct ip_nat_protocol ip_nat_protocol_tcp;
 extern struct ip_nat_protocol ip_nat_protocol_udp;
diff -urN --exclude-from=kernel-exclude linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_tproxy.h linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_tproxy.h
--- linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ip_tproxy.h       Thu Jan  1 01:00:00 1970
+++ linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ip_tproxy.h      Thu May 23 11:14:55 2002
@@ -0,0 +1,25 @@
+#ifndef _IP_TPROXY_H
+#define _IP_TPROXY_H
+
+#include 
+
+/* 
+ * used in setsockopt(SOL_IP, IP_TPROXY) should not collide 
+ * with values in  
+ */
+#define IP_TPROXY 16
+
+/* bitfields in in_tproxy.itp_flags */
+#define ITP_CONNECT  0x0001
+#define ITP_LISTEN   0x0002
+#define ITP_ONCE 0x0001
+#define ITP_REMOVE   0x0002
+
+/* structure passed to setsockopt(SOL_IP, IP_TPROXY) */
+struct in_tproxy {
+   u_int32_t itp_flags;
+   u_int32_t itp_faddr;
+   u_int16_t itp_fport;
+};
+
+#endif
diff -urN --exclude-from=kernel-exclude linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ipt_TPROXY.h linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ipt_TPROXY.h
--- linux-2.4.18-vanilla/include/linux/netfilter_ipv4/ipt_TPROXY.h      Thu Jan  1 01:00:00 1970
+++ linux-2.4.18-cttproxy/include/linux/netfilter_ipv4/ipt_TPROXY.h     Wed May 22 02:57:23 2002
@@ -0,0 +1,15 @@
+#ifndef _IPT_TPROXY_H_target
+#define _IPT_TPROXY_H_target
+
+struct ipt_tproxy_target_info {
+   u_int16_t redir_port;
+   /* unsigned long fwmark; */
+};
+
+struct ipt_tproxy_user_info {
+   int changed;
+   u_int16_t redir_port;
+   unsigned long fwmark;
+};
+
+#endif /*_IPT_TPROXY_H_target*/
diff -urN --exclude-from=kernel-exclude 
linux-2.4.18-vani

addendum to ICMP translation problem

2002-05-27 Thread Balazs Scheidler

Hi,

Last week I reported an ICMP translation problem, which occurs if the
connection is initiated by a local process.

I have now investigated the problem further. It doesn't occur:
* if the NAT box is a gateway, and the connection is initiated on another
  box.
* if the connection is not initiated, but accepted

As SNAT happens at NF_IP_POST_ROUTING, reply translation will be performed
at NF_IP_PRE_ROUTING. The following DEBUG output shows what happens (enabled
DEBUGP at the top of ip_nat_core.c):

icmp reply translation, ct=c3617480, hooknum=0, ctinfo=4
icmp_reply_translation: translating error c396f260 hook 0 dir REPLY, num_manips=2
icmp_reply: manip 0 dir ORIG hook 4
icmp_reply: manip 1 dir REPLY hook 0
icmp_reply: outer DST -> 192.168.131.124

As it seems the inner manip is not called, as it is registered to hook 4
(POST_ROUTING, ORIG)

As POST_ROUTING will never be called in ORIG-inal direction for this packet,
the inner packet is never translated. 

I see two ways of fixing the issue:
* fix icmp_reply_translation() to perform all of its translation at the same
  time (both the inner and the outer header)
* register a NAT hook at LOCAL_IN, and perform translation of packets
  registered at (POST_ROUTING, ORIG)

The first option seems to be doable, the second is a big change, though
seems to be cleaner.

Opinions?

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: addendum to ICMP translation problem [PATCH]

2002-05-27 Thread Balazs Scheidler

On Mon, May 27, 2002 at 12:32:32PM +0200, Balazs Scheidler wrote:
> As SNAT happens at NF_IP_POST_ROUTING, reply translation will be performed
> at NF_IP_PRE_ROUTING. The following DEBUG output shows what happens (enabled
> DEBUGP at the top of ip_nat_core.c):
> 
> icmp reply translation, ct=c3617480, hooknum=0, ctinfo=4
> icmp_reply_translation: translating error c396f260 hook 0 dir REPLY, num_manips=2
> icmp_reply: manip 0 dir ORIG hook 4
> icmp_reply: manip 1 dir REPLY hook 0
> icmp_reply: outer DST -> 192.168.131.124
> 
> As it seems the inner manip is not called, as it is registered to hook 4
> (POST_ROUTING, ORIG)
> 
> As POST_ROUTING will never be called in ORIG-inal direction for this packet,
> the inner packet is never translated. 

I was wrong here. The same manip is applied at different hooks (once at
PRE_ROUTING and once at POST_ROUTING).

> I see two ways of fixing the issue:
> * fix icmp_reply_translation() to perform all of its translation at the same
>   time (both the inner and the outer header)
> * register a NAT hook at LOCAL_IN, and perform translation of packets
>   registered at (POST_ROUTING, ORIG)
> 
> The first option seems to be doable, the second is a big change, though
> seems to be cleaner.

I implemented option #1, and the patch is below. However I'm not 100% sure
that I'm free to translate the inner packet at PREROUTING time. (there must
have been some reason why it was performed at POST_ROUTING time)

Functionality-wise the patch seems to work all right.
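
The hook matching discussed in this thread can be modelled in userspace to
see what changes. The sketch below is a simplified model under stated
assumptions, not the code in ip_nat_core.c: the hook numbers are the standard
NF_IP_* values, the opposite_hook[] contents for PRE/POST routing are assumed,
and the per-manip direction filter is inferred from the debug output (manip 0,
dir ORIG, is never applied to the REPLY-direction error). The function name
icmp_translate and the XLAT_* bits are made up for illustration.

```c
#include <assert.h>

enum { PRE_ROUTING = 0, LOCAL_IN = 1, FORWARD = 2, LOCAL_OUT = 3,
       POST_ROUTING = 4, NUM_HOOKS = 5 };
enum { DIR_ORIG = 0, DIR_REPLY = 1 };
enum { XLAT_INNER = 1, XLAT_OUTER = 2 };    /* result bits, illustrative */

struct manip {
    int direction;
    unsigned int hooknum;
};

/* Assumed mapping; only the PRE_ROUTING/POST_ROUTING pair matters here. */
static const unsigned int opposite_hook[NUM_HOOKS] = {
    [PRE_ROUTING]  = POST_ROUTING,
    [POST_ROUTING] = PRE_ROUTING,
};

/*
 * Report which parts of an ICMP error get translated at 'hooknum'.
 * patched == 0 models the original test (inner header at the opposite
 * hook, outer header at the manip's own hook); patched != 0 models the
 * change in this mail (both at the manip's own hook).
 */
int icmp_translate(const struct manip *m, int n, unsigned int hooknum,
                   int dir, int patched)
{
    int done = 0, i;

    for (i = 0; i < n; i++) {
        if (m[i].direction != dir)
            continue;
        if (patched) {
            if (m[i].hooknum == hooknum)
                done |= XLAT_INNER | XLAT_OUTER;
        } else if (m[i].hooknum == opposite_hook[hooknum]) {
            done |= XLAT_INNER;
        } else if (m[i].hooknum == hooknum) {
            done |= XLAT_OUTER;
        }
    }
    return done;
}
```

With the two manips from the debug output, (ORIG, hook 4) and (REPLY, hook 0),
the unpatched model translates only the outer header at PRE_ROUTING and would
translate the inner one at POST_ROUTING, which a locally delivered packet
never traverses. The patched model does both at PRE_ROUTING.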

--- ip_nat_core.c.old   Mon May 27 04:53:09 2002
+++ ip_nat_core.c   Mon May 27 05:00:23 2002
@@ -843,7 +843,7 @@
                   packet, except it was never src/dst reversed, so
                   where we would normally apply a dst manip, we apply
                   a src, and vice versa. */
-               if (info->manips[i].hooknum == opposite_hook[hooknum]) {
+               if (info->manips[i].hooknum == hooknum) {
                        DEBUGP("icmp_reply: inner %s -> %u.%u.%u.%u %u\n",
                               info->manips[i].maniptype == IP_NAT_MANIP_SRC
                               ? "DST" : "SRC",
@@ -854,9 +854,9 @@
                                  &info->manips[i].manip,
                                  !info->manips[i].maniptype,
                                  &skb->nfcache);
-               /* Outer packet needs to have IP header NATed like
-                  it's a reply. */
-               } else if (info->manips[i].hooknum == hooknum) {
+                       /* Outer packet needs to have IP header NATed like
+                          it's a reply. */
+
                        /* Use mapping to map outer packet: 0 give no
                           per-proto mapping */
                        DEBUGP("icmp_reply: outer %s -> %u.%u.%u.%u\n",


-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




my last NAT fix unseen?

2002-05-30 Thread Balazs Scheidler

Hi,

I've posted a patch against the ICMP NAT problem I encountered, and there
was no reply. It was under the topic "Re: addendum to ICMP translation
problem [PATCH]" it was posted 27th May.

Could you please review it?

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: [RFC] Re: another netfilter ICMP bug

2002-05-30 Thread Balazs Scheidler

On Thu, May 30, 2002 at 12:16:22PM +0200, Jozsef Kadlecsik wrote:
> On 30 May 2002, Andras Kis-Szabo wrote:
> 
> > > don't forget that ICMP error messages only quote the first 64 bytes of the
> > > original packet. Adding up IP and TCP headers (both 20 bytes without
> > > options), you only have 24 bytes of original payload. This might be somewhat
> > > more in UDP though due to its shorter header.
> > >
> > > A full length PORT command is 28 bytes, though a common scenario fits into
> > > 24 bytes.
> > >
> > > I see two solutions:
> > > * truncate the packet, and remove the payload area of deNATed ICMP messages,
> > >   if the inner header is either TCP or UDP (because in this case we _KNOW_
> > >   what is header and what is payload)
> > > * don't use packet filtering if separating the two zones is so important
> > >
> > > The first one could also be implemented using an ICMPTRIM target in your
> > > mangle table, which could also trim ICMP echo request/reply payloads. (which
> > > can easily be used to tunnel a whole IP stack through a firewall)
> >
> > Ok, I didn't know the IPv4-ICMP RFC. I just sent a special packet with
> > TCP payload and I got back the payload. It was only a first check.
> > (In IPv6-ICMP the length-limit is ~1298 bytes, ...)
> 
> Sidenote: ICMPTRIM could not be used to trim ICMP echo requests/replies:
> 
> "The data received in the echo message must be returned in the echo
> reply message."

Ok, that's true. Those packets are to be dropped then.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: my last NAT fix unseen?

2002-05-30 Thread Balazs Scheidler

On Thu, May 30, 2002 at 03:24:01PM +0200, Harald Welte wrote:
> On Thu, May 30, 2002 at 10:39:06AM +0200, Balazs Scheidler wrote:
> > Hi,
> > 
> > I've posted a patch against the ICMP NAT problem I encountered, and there
> > was no reply. It was under the topic "Re: addendum to ICMP translation
> > problem [PATCH]" it was posted 27th May.
> > 
> > Could you please review it?
> 
> oh my god. Please. I will review your patch ASAP, as will do every other
> coreteam member.
> 
> It's just like not everybody has the time to immediately look into every
> detail of something as complex as the ICMP reply translation.
> 
> SCNR.

Ok sorry, I thought the question I raised in the message beside the patch was
trivial to answer for somebody who wrote the code. And since I have no
general overview on netfilter code, I don't know how my patch affects other
parts of netfilter.

I'll wait patiently.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




[PATCH] linux tproxy support

2002-06-03 Thread Balazs Scheidler

Hi,

I've released a new version of the Linux transparent proxy patch. It is
available at:

http://www.balabit.hu/en/downloads/tproxy/

or 

http://www.balabit.hu/downloads/tproxy/linux-2.4/cttproxy-2.4.18-02.tar.gz

It features:
* test programs for listening on/connecting from foreign addresses (TCP)
* a kernel patch against vanilla 2.4.18
  (it includes my last ICMP translation fix)

I've included the README file, which outlines its use below.

TODO:
* when the socket is closed, the entry assigned to the socket should be
  deleted. Sadly the only solution is to patch the core to notify tproxy
  about this event, so the assigned entry can be deleted.
* receiving UDP packets on a foreign address should work, but sending from
  foreign address doesn't work, as it also needs heavy patching in the
  kernel.

README:

How it works?
-------------

Within the tproxy module in the kernel there's a table describing the
relationship between local sockets and non-local IP address/port pairs. A
local socket is referenced by its local IP/port, therefore all sockets to be
used for transparent proxy purposes must be bound to a local IP before
anything else can be done.

To connect from or listen on a foreign address, an entry must be added to
this table.

To add a translation table entry, create a socket (bind it to a local
interface), and call the setsockopt IP_TPROXY_ASSIGN at level SOL_IP with a
structure describing the nonlocal address (struct in_tproxy).

If this setsockopt succeeds, specify what you want to do with the given
socket, by calling IP_TPROXY_FLAGS with the combination of the bits in
in_tproxy.h:

/* bitfields in IP_TPROXY_FLAGS */
#define ITP_CONNECT  0x0001
#define ITP_LISTEN   0x0002
#define ITP_ONCE 0x0001

ITP_CONNECT means you want to initiate a connection, ITP_LISTEN means you
want to accept connections on the foreign address specified in
IP_TPROXY_ASSIGN.

ITP_ONCE means that this translation is to be performed only once, and then
it should be removed from the table atomically. You usually want to specify
ITP_ONCE with ITP_CONNECT, and may specify ITP_ONCE for listening socket
when only one connection is to be accepted. (FTP data connection for
example)
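
As an illustration of the first step, the sketch below fills in the structure
that is passed to IP_TPROXY_ASSIGN. It is only a sketch under assumptions: the
struct layout is copied from the ip_tproxy.h posted earlier on this list and
may differ in this release, the helper name tproxy_fill is made up, and
storing the address and port in network byte order is assumed. The setsockopt
calls themselves need the patched kernel and its header, so they are only
described in comments.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Layout as in the earlier ip_tproxy.h posting; assumed unchanged. */
struct in_tproxy {
    uint32_t itp_flags;
    uint32_t itp_faddr;     /* foreign (nonlocal) IPv4 address */
    uint16_t itp_fport;     /* foreign port */
};

/*
 * Describe the foreign address for a tproxy table entry.
 * Returns 0 on success, -1 if 'addr' is not a valid dotted-quad address.
 */
int tproxy_fill(struct in_tproxy *itp, const char *addr, uint16_t port)
{
    struct in_addr a;

    if (inet_pton(AF_INET, addr, &a) != 1)
        return -1;
    memset(itp, 0, sizeof(*itp));
    itp->itp_faddr = a.s_addr;      /* network byte order (assumed) */
    itp->itp_fport = htons(port);   /* network byte order (assumed) */
    return 0;
}

/*
 * Intended use, following the README:
 *   1. socket() and bind() to a local address
 *   2. setsockopt(fd, SOL_IP, IP_TPROXY_ASSIGN, &itp, sizeof(itp))
 *   3. setsockopt(fd, SOL_IP, IP_TPROXY_FLAGS, ...) with ITP_CONNECT or
 *      ITP_LISTEN, optionally combined with ITP_ONCE
 *   4. connect() or listen()/accept() as usual
 */
```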

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




[RFC] TCP core changes needed for transparent proxying

2002-06-04 Thread Balazs Scheidler

Hi,

Sorry to disturb you again with my transparent proxy efforts, but finally
after my third complete reorganization, things seem to work fine, without
_any_ core TCP changes. 

I have a couple of problems though, which all involve core TCP code patches,
and so I would like some advice on the preferred way.

1. notification about destroyed sockets

I definitely need a notification when a socket is closed. Possible
solutions:
  a) create a notifier in inet_sock_release() function, where my tproxy
 module registers itself.
  b) call a netfilter specific function when CONFIG_NETFILTER is defined
 in a way similar to how setsockopts are delegated to netfilter.

I like the second option a bit more, as putting notifiers here and there is
IMHO ugly. Other parts in netfilter might need such a feature too, as
netfilter modules might assign state to sockets (through
setsockopt/getsockopt) which needs to be freed when the socket is closed.

2. receiving rewritten original address for datagram based protocols (UDP)

As looking up a table when a packet is received is not atomic (the way it
needs to be done when using simple NAT), I was thinking about attaching the
original address information to the packet itself, which can be queried via
an aux message with recvmsg(). As it is not possible to hook into aux
message processing, I did this with a patch to ip_sockglue.c. The important
parts of my patch are at the end of this message.

I tried to be as general as possible, and made the NAT framework save
original addresses in IPCB(skb), which is returned in an IP_ORIGADDRS
auxiliary message when recvmsg() is called on a socket with IP_RECVORIGADDRS
setsockopt enabled.

Is this solution ok for you?
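
On the application side, reading such an aux message would be an ordinary
control-message walk after recvmsg(). The sketch below shows that walk under
assumptions: the field names of struct in_origaddrs follow the patch at the
end of this mail, but the field order and types are guessed, and since the
numeric value of IP_ORIGADDRS is not fixed here it is passed in as a
parameter. The function name extract_origaddrs is made up.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_IP
#define SOL_IP IPPROTO_IP
#endif

/* Field names from the patch below; order and widths are assumptions. */
struct in_origaddrs {
    struct in_addr ioa_srcaddr;
    struct in_addr ioa_dstaddr;
    uint16_t ioa_srcport;
    uint16_t ioa_dstport;
};

/*
 * Search the control messages of a received msghdr for a SOL_IP message
 * of type 'cmsg_type' carrying a struct in_origaddrs.
 * Returns 0 and copies the payload to *out if found, -1 otherwise.
 */
int extract_origaddrs(struct msghdr *msg, int cmsg_type,
                      struct in_origaddrs *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_IP && c->cmsg_type == cmsg_type &&
            c->cmsg_len == CMSG_LEN(sizeof(*out))) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1;
}
```

Because the addresses travel with the datagram, the proxy gets them
atomically with the payload instead of doing a separate table lookup.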

3. specifying outgoing source address for datagram based protocols (UDP)

A similar problem applies to sending as well. To be atomic, I need to
specify the outgoing source address at sendmsg() time using an aux message.
The problem is again that it is difficult to hook into aux message
processing, and the skb is not created until ip_build_xmit() time, therefore
the skb cannot be used to store this information unless ip_build_xmit()
itself is patched.

Any idea to resolve this issue?

And now my current patch against ip_sockglue.c

diff -urN --exclude-from kernel-exclude linux-2.4.18-vanilla/net/ipv4/ip_sockglue.c linux-2.4.18-cttproxy/net/ipv4/ip_sockglue.c
--- linux-2.4.18-vanilla/net/ipv4/ip_sockglue.c Wed Oct 31 00:08:12 2001
+++ linux-2.4.18-cttproxy/net/ipv4/ip_sockglue.c        Fri May 24 02:44:44 2002
@@ -48,6 +48,7 @@
 #define IP_CMSG_TOS             4
 #define IP_CMSG_RECVOPTS        8
 #define IP_CMSG_RETOPTS         16
+#define IP_CMSG_ORIGADDRS       32
 
 /*
  * SOL_IP control messages.
@@ -107,6 +108,20 @@
        put_cmsg(msg, SOL_IP, IP_RETOPTS, opt->optlen, opt->__data);
 }
 
+#if defined(CONFIG_IP_NF_NAT_NEEDED)
+
+void ip_cmsg_recv_origaddrs(struct msghdr *msg, struct sk_buff *skb)
+{
+        struct in_origaddrs ioa;
+
+        ioa.ioa_srcaddr.s_addr = IPCB(skb)->orig_srcaddr;
+        ioa.ioa_srcport = IPCB(skb)->orig_srcport;
+        ioa.ioa_dstaddr.s_addr = IPCB(skb)->orig_dstaddr;
+        ioa.ioa_dstport = IPCB(skb)->orig_dstport;
+        put_cmsg(msg, SOL_IP, IP_ORIGADDRS, sizeof(ioa), &ioa);
+}
+
+#endif
 
 void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb)
 {
@@ -135,6 +150,12 @@
 
        if (flags & 1)
                ip_cmsg_recv_retopts(msg, skb);
+       if ((flags>>=1) == 0)
+               return;
+#if defined(CONFIG_IP_NF_NAT_NEEDED)
+       if (flags & 1)
+               ip_cmsg_recv_origaddrs(msg, skb);
+#endif
 }
 
 int ip_cmsg_send(struct msghdr *msg, struct ipcm_cookie *ipc)
@@ -167,6 +188,19 @@
                        ipc->addr = info->ipi_spec_dst.s_addr;
                        break;
                }
+#if defined(CONFIG_IP_NF_NAT_NEEDED)
+               case IP_ORIGADDRS:
+               {
+                       struct in_origaddrs *ioa;
+
+                       if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct in_origaddrs)))
+                               return -EINVAL;
+                       ioa = (struct in_origaddrs *) CMSG_DATA(cmsg);
+
+                       /* FIXME: where to store addresses so NAT might pick it up? */
+                       break;
+               }
+#endif
                default:
                        return -EINVAL;
                }


-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




[RFC] matching tproxied packets

2002-06-04 Thread Balazs Scheidler

Hi,

Suppose you have a TCP session, which is transparently redirected to a local
proxy. With the current state of the tproxy framework one needs to add two
rules to iptables:

- one to the tproxy table to actually redirect a session
- one to the filter table to let the NATed traffic enter the local stack (in
  INPUT)

I'd like to make tproxies easier to administer, so I'm thinking about a
simple way of matching tproxied packets, which can be ACCEPTed from the
INPUT chain.

Possible solutions:

* use a new state (called TPROXY), which would be applied to all TPROXYed
  packets (might interact badly with nat/conntrack).
* have the tproxy framework mark all packets with an fwmark, and let the
  packets in based on the value of fwmark
* have a separate match (called tproxy), which matches tproxied sessions
  based on some value stored in the associated conntrack entry

which one do you prefer?

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: [RFC] matching tproxied packets

2002-06-04 Thread Balazs Scheidler

On Tue, Jun 04, 2002 at 05:14:47PM +0200, Henrik Nordstrom wrote:
> Balazs Scheidler wrote:
> 
> > * use a new state (called TPROXY), which would be applied to all TPROXYed
> >   packets (might interact badly with nat/conntrack).
> 
> It will in no doubt interact badly with connection tracking (and therefore 
> NAT).

ok.

> 
> > * have the tproxy framework mark all packets with an fwmark, and let the
> >   packets in based on the value of fwmark
> 
> Will interact badly with fwmark based routing.

of course the mark value would be controlled by the user, and not assigned
automatically.

> > * have a separate match (called tproxy), which matches tproxied sessions
> >   based on some value stored in the associated conntrack entry
> 
> Definitely my preference, but I might be biased as I make heavy use of 
> connection tracking and fwmark based routing in combination.

This was my conclusion as well. So I'll go for this solution.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




[RFC] unidirectional NAT

2002-06-05 Thread Balazs Scheidler

Hi,

First of all, thanks for the feedback on my tproxy patches. It generally
works well for TCP based connections, what I'm up to now is proper support
for UDP.

The problem with datagram based protocols is that connection tracking (at
least in my case involving Zorp) and address translation are done by the
userspace proxy.

The only features a UDP proxy needs are the following:
* being able to receive frames originally destined elsewhere (the REDIRECT
  case)
* being able to receive frames from an arbitrary host, originally destined
  to another arbitrary host (the foreign address listen case)
* being able to send frames using an arbitrary source IP/address, to an
  arbitrary host (the foreign connect case)

I use the NAT framework to redirect packets to the local stack, but as a
side effect NAT translates replies as well. 

Now I don't want reply translation :), hence the subject, unidirectional
NAT, which would mean translating packets in one direction only. (to be
honest the best would be to translate a single packet only)

I'm thinking about three possibilities:

* yet another flag to ip_nat_setup_info() to set up a single manip only. 
* free the state associated to UDP packets after the translation was applied.
* instead of setting up a NAT translation, call manip_pkt() directly somehow

Ideas?

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: [RFC] unidirectional NAT

2002-06-05 Thread Balazs Scheidler

On Wed, Jun 05, 2002 at 11:48:49AM +0200, Jozsef Kadlecsik wrote:
> On Wed, 5 Jun 2002, Balazs Scheidler wrote:
> > * yet another flag to ip_nat_setup_info() to set up a single manip only.
> > * free the state associated to UDP packets after the translation was applied.
> > * instead of setting up a NAT translation, call manip_pkt() directly somehow
> 
> I'd combine the third with the new table I wrote about some months ago
> (working name for the table is 'raw' instead of 'notrack' or 'select').
> The proposed new target for the table is 'NOTRACK' so that the selected
> packet would be skipped by conntrack and NAT as well. If I understand your
> problem correctly, a target 'NONAT' could then be easily added and you
> could call manip_pkt as you wish.

Let me think a bit about it. For UDP packets I don't really need
conntracking sessions, I only need to translate single packets, but I'd like
to avoid messing with IP and UDP header translation myself.

So NOTRACK is good for me, I don't need NONAT since I don't need conntrack
either. The question is how you mark an skb to avoid tracking? (an idea was
to use a flag in nfct, is it still true?)

Is your patch available somewhere?

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: [RFC] matching tproxied packets

2002-06-05 Thread Balazs Scheidler

On Wed, Jun 05, 2002 at 08:53:25AM +0200, Jozsef Kadlecsik wrote:
> On Tue, 4 Jun 2002, Balazs Scheidler wrote:
> > Possible solutions:
> >
> > * use a new state (called TPROXY), which would be applied to all TPROXYed
> >   packets (might interact badly with nat/conntrack).
> > * have the tproxy framework mark all packets with an fwmark, and let the
> >   packets in based on the value of fwmark
> > * have a separate match (called tproxy), which matches tproxied sessions
> >   based on some value stored in the associated conntrack entry
> >
> > which one do you prefer?
> 
> The latter seems to me the best solution.

ok, should I simply add fields somewhere in struct ip_conntrack, or there's
a bitfield I can add a flag to? 

Looking at the struct I can't see a place general enough, so I can add a new
field just to hold a single bit, or a general "flags" field, which can be
used by other matches later.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: [RFC] matching tproxied packets

2002-06-05 Thread Balazs Scheidler

On Wed, Jun 05, 2002 at 01:02:44PM +0200, Jozsef Kadlecsik wrote:
> On Wed, 5 Jun 2002, Balazs Scheidler wrote:
> 
> > ok, should I simply add fields somewhere in struct ip_conntrack, or there's
> > a bitfield I can add a flag to?
> 
> There is no such bitfield you could use at the moment.
> 
> > Looking at the struct I can't see a place general enough, so I can add a new
> > field just to hold a single bit, or a general "flags" field, which can be
> > used by other matches later.
> 
> This is a good question. Probably it is better to add a (general) 'flags'
> field. But I have no idea for what else we could use it :-)

As I added a flags argument to ip_nat_setup_info() (currently with a single
bit specifying that NAT helpers are to be bypassed), these flags could be
stored in ct->nat.flags.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




[RFC] handling closed sockets from netfilter

2002-06-06 Thread Balazs Scheidler

Hi,

Yet another request for comments. I promise I'll stop with these soon. :)

So I've mentioned that I need a notification when a socket is closed. As
there are other parts in netfilter which might assign state to sockets
through setsockopt/getsockopt, I wanted to make the close callback as
general as possible.

I'm thinking about putting a close() function pointer into struct
nf_sockopt_ops, and calling it from nf_sock_release() (a new function), which
would be called from inet_release() when CONFIG_NETFILTER is defined.

This way any parts of netfilter might register a close callback. (even
conntrack might use it to deregister local sockets from the conntrack table)
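
A rough userspace sketch of that dispatch, assuming a simplified registry.
The types sock_stub and sockopt_ops_stub and the functions register_ops(),
sock_release_all() and example_close() are hypothetical stand-ins, not the
kernel's struct sock, struct nf_sockopt_ops or any existing function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct sock. */
struct sock_stub {
    int id;
};

/* Reduced stand-in for struct nf_sockopt_ops with the proposed callback. */
struct sockopt_ops_stub {
    struct sockopt_ops_stub *next;
    void (*close)(struct sock_stub *sk);  /* optional, may be NULL */
};

static struct sockopt_ops_stub *ops_list;

/* Add an ops entry to the front of the registry. */
static void register_ops(struct sockopt_ops_stub *ops)
{
    assert(ops != NULL);
    ops->next = ops_list;
    ops_list = ops;
}

/* Counterpart of the proposed nf_sock_release(): notify every registered
 * entry that has a callback. Returns how many callbacks were invoked. */
static int sock_release_all(struct sock_stub *sk)
{
    int called = 0;

    for (struct sockopt_ops_stub *o = ops_list; o != NULL; o = o->next) {
        if (o->close != NULL) {
            o->close(sk);
            called++;
        }
    }
    return called;
}

/* Example callback: remember which socket was closed last. */
static int last_closed_id = -1;

static void example_close(struct sock_stub *sk)
{
    last_closed_id = sk->id;
}
```

Entries that leave the pointer NULL are skipped, so existing users of the
structure would not need to change.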

Opinions?

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: [RFC] handling closed sockets from netfilter

2002-06-06 Thread Balazs Scheidler

On Thu, Jun 06, 2002 at 10:31:21AM +0100, Andy Whitcroft wrote:
> 
> Somewhat arbitrary, but perhaps calling the callback release would match
> its meaning and the calling graph better.

ok, callback renamed to release. My implementation is now in place, and I'm
compiling the kernel right now.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: [RFC] unidirectional NAT

2002-06-12 Thread Balazs Scheidler

On Wed, Jun 05, 2002 at 12:11:32PM +0200, Jozsef Kadlecsik wrote:
> On Wed, 5 Jun 2002, Balazs Scheidler wrote:
> 
> > Let me think a bit about it. For UDP packets I don't really need
> > conntracking sessions, I only need to translate single packets, but I'd like
> > to avoid messing with IP and UDP header translation myself.
> >
> > So NOTRACK is good for me, I don't need NONAT since I don't need conntrack
> > either. The question is how you mark an skb to avoid tracking? (an idea was
> > to use a flag in nfct, is it still true?)
> 
> No, I'll go with Rusty's solution: a dummy conntrack entry is used.
> 
> > Is your patch available somewhere?
> 
> Not yet, but real soon I'll post it :-).

Can you send me what you have available? I'd like to close my transparency
project, and so I'd be willing to contribute to the conntrack exemption
project :)

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




netfilter on solaris?

2002-06-14 Thread Balazs Scheidler

Hi,

It is a strange idea I know, but I'd be interested in what the opinion of
the core netfilter developers is on porting the whole netfilter subsystem to
Solaris?

Apart from the technical issues, would there be any problems? Does the GPL
allow this kind of usage? (it would be implemented as a module)

Technically, the most difficult tasks are removing the dependency on
Linux-specific structures like sk_buff (Solaris has a chain of mblk_t's),
locking (it's more or less done using macros), routing differences, and I
suppose many things I don't see right now.

ps: don't tell me to use ipfilter (which works on Solaris), it's awful

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: netfilter on solaris?

2002-06-14 Thread Balazs Scheidler

On Fri, Jun 14, 2002 at 12:32:55PM +0200, Jozsef Kadlecsik wrote:
> On Fri, 14 Jun 2002, Balazs Scheidler wrote:
> 
> > It is a strange idea I know, but I'd be interested in what the opinion of
> > the core netfilter developers is on porting the whole netfilter subsystem to
> > Solaris?
> 
> You must have plenty of time. I envy you! :-)

I don't. I simply need to run Zorp on Solaris.

btw: I've found some proxy functionality in the Solaris core kernel while
looking at the source. Should I turn to DaveM now pointing at this? :) (he
was the one who said: I don't care about transparent proxying as long as it
does _NOT_ touch the TCP core)

> 
> > Apart from the technical issues, would there be any problems? Does the GPL
> > allow this kind of usage? (it would be implemented as a module)
> 
> If the module is GPLed, then I don't see problems here but I'm not a
> layer.

I assume you meant an s/layer/lawyer/ here. As far as I know it is allowed to
write GPLd modules for Photoshop (the gimp plugin issue), so it must be allowed
to use GPLd modules in a proprietary kernel.

> 
> But how do you imagine the porting so that the maintenance would not
> become a nightmare?

Of course I'd want to provide system independence using some headers which
would make it work on both Linux and Solaris, so it could be incorporated into
standard Netfilter as well.

So including headers would be changed from:

#include 
#include 
#include 


#include 
#include 
#include 
#include 
#include 
...

To:

#include "os.h"
#include  

etc.

And maybe references to sk_buff * and skb related functions would be changed
to inline functions or macros. It's a huge work I assume, but ipfilter's
code is _very_ disappointing.
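
A small sketch of what such inline accessors might look like for a chained
buffer, written as plain userspace C. Everything here is an assumption made
for illustration: chain_buf only imitates the shape of an mblk_t chain, and
os_chain_len() and os_chain_copy() are hypothetical names, not part of any
existing os.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment chain, loosely shaped like a chain of mblk_t's. */
struct chain_buf {
    const unsigned char *data;
    size_t len;
    struct chain_buf *next;
};

/* Total payload length across all fragments. */
static inline size_t os_chain_len(const struct chain_buf *b)
{
    size_t n = 0;

    for (; b != NULL; b = b->next)
        n += b->len;
    return n;
}

/* Copy up to count bytes starting at offset off, crossing fragment
 * boundaries as needed. Returns the number of bytes actually copied. */
static inline size_t os_chain_copy(const struct chain_buf *b, size_t off,
                                   unsigned char *dst, size_t count)
{
    size_t copied = 0;

    assert(dst != NULL);
    for (; b != NULL && copied < count; b = b->next) {
        if (off >= b->len) {          /* offset lies past this fragment */
            off -= b->len;
            continue;
        }
        size_t take = b->len - off;
        if (take > count - copied)
            take = count - copied;
        memcpy(dst + copied, b->data + off, take);
        copied += take;
        off = 0;                      /* later fragments start at byte 0 */
    }
    return copied;
}
```

Callers written against accessors like these never touch the buffer layout
directly, which is the property the port would depend on.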

> 
> > Technically, the most difficult tasks are to remove the dependency on Linux
> > like sk_buff (Solaris has a chain of mblk_t's), locking (it's more or less
> > done using macros), routing differences and I suppose many things I don't
> > see right now.
> 
> Challenging. But wouldn't it be more straightforward to run Linux on that
> SPARC machine? And there are still plenty to do on 32/64bit issues in
> netfilter...

The bad thing is that it's not a single computer I want to use. I want to
create a product that runs on Suns, and it's generally not a good practice
to dump Solaris and use Linux instead.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: netfilter on solaris?

2002-06-15 Thread Balazs Scheidler

On Sat, Jun 15, 2002 at 02:55:30PM +0200, Harald Welte wrote:
> On Fri, Jun 14, 2002 at 12:47:07PM +0200, Balazs Scheidler wrote:

> As long as I am one of the maintainers of netfilter/iptables, I am not 
> going to do any extra hassle in order to support different operating systems.
> This includes using weird different types instead of sk_buff.  Linux kernel
> hackers expect to see linux code, not something abstract using only typedefs
> and macros all over the place.

ok, it was only an idea. I thought netfilter would only benefit from the effort.
The technical problems would be overwhelming anyway.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: netfilter on solaris?

2002-06-15 Thread Balazs Scheidler

On Sat, Jun 15, 2002 at 02:52:12PM +0200, Harald Welte wrote:
> On Fri, Jun 14, 2002 at 12:05:40PM +0200, Balazs Scheidler wrote:
> > Hi,
> > 
> > It is a strange idea I know, but I'd be interested in what the opinion of
> > the core netfilter developers is on porting the whole netfilter subsystem to
> > Solaris?
> 
> After my netfilter presentation at linuxtag, somebody was asking me exactly
> this question.

And your answer was?

> > Apart from the technical issues, would there be any problems? Does the GPL
> > allow this kind of usage? (it would be implemented as a module)
> 
> This is basically the same question like binary-only kernel modules. 

I think it is more similar to the case of Gimp plugins under Photoshop.

> 
> Having netfilter within a different kernel, is technically the same: 
> GPL'd netfilter/iptables code is called from a binary-only kernel.  

By binary-only, do you mean proprietary, or anything non-GPLd? The Solaris
kernel source _is_ available, though it doesn't use the GPL.

> Thus, my conclusion would be: Without the explicit permission of the authors,
> it is legally not possible to use netfilter/iptables within a proprietary
> OS kernel.  But of course, I am not a lawyer.
> 
> I for myself am somewhat undecided, but I tend to share Rusty's view:
> I haven't ever given permission for using my code by binary only kernel
> modules.  All new code I'm working on will export GPLONLY to make sure
> about that.

I think that while disallowing binary-only modules restricts the vendors who
release proprietary software relying on free software, disallowing netfilter
on proprietary platforms restricts users who would like to use free software
on their platform. Even the GPL distinguishes between proprietary system
libraries and non-system libraries.

While it is a possibility to dump the native OS, and replace it with Linux,
most non-x86 platforms work with their native OS best (be it HP-UX, Solaris
or Tru64).

This argument is worthless anyway. The task is tedious, and without support
of the original developers it would die immediately.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: performance issues (nat / conntrack)

2002-06-25 Thread Balazs Scheidler

On Tue, Jun 25, 2002 at 04:17:54PM +0200, Jozsef Kadlecsik wrote:
> On Tue, 25 Jun 2002, Jean-Michel Hemstedt wrote:
> > > The book-keeping overhead is at least doubled compared to the
> > > conntrack-only case - this explains pretty well the results you got.
> >
> > what do you mean by 'book-keeping' ?
> > Does NAT do a lookup even if there are no rules?
> 
> I have to write again: even if there are no rules at all, NULL
> mapping happens and new connections must be put into both nat hashes.

This does not explain the performance degradation others found. If no
rules are found in the table, the conntrack entry is added to the NAT
hashes (the place_in_hashes() function). This involves adding the entry to
two linked lists (changing two pointers per list), and then calling
do_bindings(), which does nothing (num_manips == 0) except call helpers, of
which there should be none if helper modules are not loaded.

Adding entries to the NAT hashes doesn't involve memory allocation (NAT info
is stored in ip_conntrack), therefore I don't see the reason for the 50%
performance decrease.
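
To illustrate why this needs no allocation, here is a userspace sketch with
the list nodes embedded in the entry itself. The names ct_stub, by_source,
by_ipsproto and place_in_both() are illustrative stand-ins and are not claimed
to match the kernel structures field for field.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, in the style of the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Link node directly after head; only pointer writes, no allocation. */
static void list_prepend(struct list_node *node, struct list_node *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Stand-in for a conntrack entry carrying its own NAT hash link nodes. */
struct ct_stub {
    struct list_node by_source;
    struct list_node by_ipsproto;
};

/* Put one entry into two hash buckets, as described above. */
static void place_in_both(struct ct_stub *ct, struct list_node *src_bucket,
                          struct list_node *proto_bucket)
{
    assert(ct != NULL);
    list_prepend(&ct->by_source, src_bucket);
    list_prepend(&ct->by_ipsproto, proto_bucket);
}
```

Because the nodes live inside the entry, the cost per new connection is a
handful of pointer stores plus whatever locking protects the buckets.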

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: performance issues (nat / conntrack)

2002-06-25 Thread Balazs Scheidler

On Tue, Jun 25, 2002 at 04:53:33PM +0200, Jozsef Kadlecsik wrote:
> On Tue, 25 Jun 2002, Jean-Michel Hemstedt wrote:
> > PS: could anybody redo similar tests so that we can compare the results
> > and stop killing the messenger, please? ;o)
> 
> Sorry if I look harsh, it's not my intention at all. We have simply been over
> almost exactly the same arguments several times. And those resulted in
> neither pinpointing real flaws in the system, nor better algorithms.

No, only the head pointers for the hashes are preallocated. The conntrack
structures themselves are allocated by the slab allocator: kmem_cache_alloc()
is called in init_conntrack(), which initializes a single conntrack entry.

So the initial memory allocations for conntrack and nat are

conntrack: htable_size * 8 

(8 is sizeof(struct list_head) on a 32-bit machine: two 4-byte pointers)

nat: 2 * htable_size * 8
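
The same arithmetic as a small C sketch. The helper names and the bucket_head
type are hypothetical, and the head size is passed in explicitly because it
depends on the pointer width of the target.

```c
#include <assert.h>
#include <stddef.h>

/* Two-pointer bucket head, mirroring the shape of struct list_head. */
struct bucket_head {
    struct bucket_head *next, *prev;
};

/* Bytes preallocated for the conntrack hash: one head per bucket. */
static size_t conntrack_table_bytes(size_t htable_size, size_t head_size)
{
    assert(head_size > 0);
    return htable_size * head_size;
}

/* Bytes preallocated for NAT: two hashes with the same bucket count. */
static size_t nat_table_bytes(size_t htable_size, size_t head_size)
{
    return 2 * conntrack_table_bytes(htable_size, head_size);
}
```

With an assumed htable_size of 8192 and 8-byte heads, this gives 65536 bytes
(64 KiB) for conntrack and 131072 bytes (128 KiB) for NAT.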

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: performance issues (nat / conntrack)

2002-06-26 Thread Balazs Scheidler

On Tue, Jun 25, 2002 at 09:06:47PM +0200, Harald Welte wrote:
> On Tue, Jun 25, 2002 at 05:13:02PM +0200, Balazs Scheidler wrote:
> > On Tue, Jun 25, 2002 at 04:17:54PM +0200, Jozsef Kadlecsik wrote:
> > > On Tue, 25 Jun 2002, Jean-Michel Hemstedt wrote:
> > > > what do you mean by 'book-keeping' ?
> > > > Does NAT do a lookup even if there are no rules?
> > > 
> > > I have to write again: even if there are no rules at all, NULL
> > > mapping happens and new connections must be put into both nat hashes.
> > 
> > This should not explain the performance degradation others found. If no
> > rules are found in the table, the conntrack entry is added to the NAT
> > hashes. (place_in_hashes() function), this involves adding the entry to two
> > linked lists (changes two pointers per list), and then calling do_bindings()
> > which does nothing (num_manips == 0) except for calling helpers, which
> > should be none, if helper modules are not loaded.
> > 
> > Adding entries to the NAT hashes doesn't involve memory allocation (NAT info
> > is stored in ip_conntrack), therefore I don't see the reason for the 50%
> > performance decrease.
> 
> think about the lock contention on SMP system. The 'null binding'
> approach for nat (and for example, that nat helpers are called for
> connections with 'null binding') is a poor design.  
> 
> I've recently done some tests which try to avoid the null binding, but 
> as I'm not entirely sure they don't break something else I haven't
> released them yet.

The original test machine used to gather performance information was not
SMP:

"
here is my test bed:

tested target:
 -kernel 2.4.18 + non_local_bind + small conntrack timeouts...
 -PIII~500MHz, RAM=256MB
 -2*100Mb/s NIC

"

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




NAT, not doing route_me_harder?

2002-06-26 Thread Balazs Scheidler

Hi,

I was wondering what the reason is for NAT not rerouting modified packets?

If anything important is modified by a mangle rule that affects routing, the
routing decision is automatically redone as this code fragment shows:

ret = ipt_do_table(pskb, hook, in, out, &packet_mangler, NULL);
/* Reroute for ANY change. */
if (ret != NF_DROP && ret != NF_STOLEN && ret != NF_QUEUE
    && ((*pskb)->nh.iph->saddr != saddr
        || (*pskb)->nh.iph->daddr != daddr
        || (*pskb)->nfmark != nfmark
        || (*pskb)->nh.iph->tos != tos))
        return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;

NAT doesn't do anything like this. So if an SNAT rule changes the source
address in POSTROUTING, the routing tables are not looked up again, and
source address dependent policy routing rules are not applied.

It might not be the best to change this by default, but it could be
implemented by a match, e.g.

iptables -t nat -A POSTROUTING -p tcp -d 0/0 --dport 25 -m reroute -j SNAT --to-source 1.2.3.4

-m reroute would flag the packet as one which needs rerouting (using for
example a flag in nfcache). Packets flagged as such would be rerouted after
do_bindings() is called.

Opinions?

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: NAT, not doing route_me_harder?

2002-06-26 Thread Balazs Scheidler

> This is done only in the OUTPUT chain, and only because the TCP kernel has 
> already routed locally originating packets before they first hit netfilter.
> 
> > NAT doesn't do anything like this. So given an SNAT rule changes the source
> > address in POSTROUTING, the routing tables are not looked up again, so
> > source address dependant policy routing rules are not applied.
> 
> It sure does, in the same spot as mangle, which is only when a
> destination NAT transformation is applied to a locally originated packet.
> 
> in ip_nat_local_fn():
> ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
> if (ret != NF_DROP && ret != NF_STOLEN
> && ((*pskb)->nh.iph->saddr != saddr
> || (*pskb)->nh.iph->daddr != daddr))
> return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
> 
> For the other cases (including mangle), all the transformations that are 
> assumed to affect routing are done in PREROUTING. SNAT is not among them.
> 
> There is a number of ways to route SNAT:ed packets differently if needed. The 
> method I use is usually to use the nfmark of mangle PREROUTING or OUTPUT in 
> combination with SNAT in POSTROUTING. Mangle marks the packet telling that 
> this should be NAT:ed according to policy X, this nfmark is then used in 
> routing to route the packet in the correct direction and by nat POSTROUTING 
> to apply the correct NAT rule.

But what happens when you initiate a connection on the host running
netfilter, thus you have no PREROUTING chain?

Scenario:

I have a default route to gateway A on interface a, but I want my SMTP
traffic to leave the box on a different interface b with a different gateway
B. Of course this means a different source address is to be assigned to the
outgoing connection. Assume it is not possible to set the bind address of
the MTA (or setting it affects mail delivery in other directions).

I have source based routing, that makes packets go to the correct direction
based on their source address.

If I'm doing SNAT in POSTROUTING, the routing decision is not redone, thus
it leaves with the specified source address, but on the wrong interface.

I think I now understand: have my packets marked in local OUTPUT, route
based on that mark, and SNAT based on the mark. Is this the way you
suggested? Hmm.. this sounds reasonable from the programmer's perspective, but
is difficult to maintain from the user's: it needs two rules.

iptables -t mangle -A OUTPUT -p tcp ! -s  -d 0/0 --dport 25 -j MARK --set-mark 100
iptables -t nat -A POSTROUTING -m mark --mark 100 -j SNAT --to-source

instead of:

iptables -t nat -A POSTROUTING -p tcp ! -s  -d 0/0 --dport 25 -m reroute -j SNAT --to-source

Hmm... as I think some more, the 2nd case might not even be possible, as the
nat rule is triggered only once during a session, which would mean that the
SYN would be routed correctly, but the packets following it would not.
Solution: instead of nfcache, the flag could be stored in ip_conntrack.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: NAT, not doing route_me_harder?

2002-06-26 Thread Balazs Scheidler

On Wed, Jun 26, 2002 at 12:04:23PM +0200, Henrik Nordstrom wrote:
> Balazs Scheidler wrote:
> > I think I now understand, have my packets marked in local OUTPUT, route
> > based on that mark, and SNAT based on the marks. Is this the way you
> > suggested? Hmm.. this sounds reasonable on the programmer's perspective,
> > but is difficult to maintain from the user's: it needs two rules.
> 
> Yes, it requires three custom rules rather than two (there is also the routing 
> policy rule)
> 
> Having NAT reroute all packets due to source nat transformations would be a 
> significant performance impact only to support the corner cases where it is 
> handy..

Why? The rerouting would be triggered only if the user requests it, so the
normal path would not be affected. And as routing decisions are heavily
cached, it is said (I think it was Harald who said that) that routing
decisions are not expensive. It would add a simple bit-test in the normal
path, and a second routing decision if explicitly requested.

Something like this in ip_nat_fn(), after do_bindings() is called:

saddr = (*pskb)->nh.iph->saddr;
daddr = (*pskb)->nh.iph->daddr;

ret = do_bindings(ct, ctinfo, info, hooknum, pskb);
if (ret != NF_DROP && ret != NF_STOLEN && (ct->flags & IP_NAT_REROUTE)) {
        if ((*pskb)->nh.iph->saddr != saddr || (*pskb)->nh.iph->daddr != daddr)
                ret = (ip_route_me_harder(pskb) == 0) ? ret : NF_DROP;
}
return ret;

This could also be extended with the local output case, so ip_nat_fn() and
ip_nat_local_fn() could be merged. The if condition would become:

if (ret != NF_DROP && ret != NF_STOLEN &&
    (hooknum == NF_IP_LOCAL_OUT || (ct->flags & IP_NAT_REROUTE))) {
        ...
}
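
The merged decision can be sketched as a pure function in userspace C. This
is only an illustration of the proposal: the enum values, FLAG_REROUTE (a
stand-in for the proposed IP_NAT_REROUTE bit), struct addrs and
needs_reroute() are hypothetical names, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum verdict { V_DROP, V_ACCEPT, V_STOLEN };      /* stand-ins for NF_* verdicts */
enum hook { HOOK_LOCAL_OUT, HOOK_POST_ROUTING };  /* stand-ins for NF_IP_* hooks */

#define FLAG_REROUTE 0x1u  /* stand-in for the proposed IP_NAT_REROUTE bit */

/* Addresses sampled before and after the NAT bindings were applied. */
struct addrs {
    uint32_t saddr, daddr;
};

/* True when the packet should go through a second routing lookup. */
static bool needs_reroute(enum verdict v, enum hook h, unsigned int flags,
                          struct addrs before, struct addrs after)
{
    /* Dropped or stolen packets are no longer ours to route. */
    if (v == V_DROP || v == V_STOLEN)
        return false;

    /* Local output always qualifies; other hooks only when requested. */
    if (h != HOOK_LOCAL_OUT && !(flags & FLAG_REROUTE))
        return false;

    /* Reroute only if NAT actually changed an address. */
    return before.saddr != after.saddr || before.daddr != after.daddr;
}
```

The normal path pays for one flag test and two comparisons; the routing
lookup itself happens only when the function returns true.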


-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1




Re: [PATCH]: Make MARK target terminate (resend)

2002-07-01 Thread Balazs Scheidler

On Sat, Jun 29, 2002 at 12:36:36PM +0200, Henrik Nordstrom wrote:
> On Saturday 29 June 2002 11.46, Patrick McHardy wrote:
> So the question to the Netfilter core team is if it would be OK to add 
> a new option and "module class" to the userspace tools, and have the 
> existing IPT_CONTINUE targets dual-register as both a target and a 
> match. I can try to whip something together if this is seen as 
> something acceptable. Should be fully backwards/forward compatible 
> with existing rulesets with only a minimal amount of code 
> duplication. The only compatibility issue is that if you make use of the 
> new feature then you cannot go back to an older userspace or kernel..

I for one would second a feature like this. I see a good number of places
where it could be used (the long standing missing -l option is one example)

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1