On Mon, 2010-08-02 at 12:03 -0600, Alex Rousskov wrote: 
> On 08/01/2010 05:47 PM, Andrew Beverley wrote:
> > Please find attached the first version of the netfilter mark patch. I've
> > not yet tested it extensively, but would welcome some initial feedback
> > or comments.

Thanks for the prompt reply. Please find attached an updated version of
the patch, which includes fixes to all the feedback below. It remains
not-extensively tested, but is attached for further comments.

> * It would be nice to get a proposed commit message describing the 
> changes as a patch preamble. Among other things, this will allow 
> reviewers to understand the overall scope and intent of your work.

Sorry about that, this is new to me. As you've probably gathered, the
aim of this patch is to implement the existing TOS functionality, but as
netfilter marking.

> * comm_set_mark() has a very generic name. Many things can be "marked", 
> for many reasons. Can you come up with a better, more specific one?

Renamed to comm_set_nfmark.

> * comm_set_mark() is not documented but is a part of the public Comm API.
> 

Now documented in the source code. Is that the only place to document?

> * getNfctMark() definition #ifdef conditions are inconsistent with its 
> declaration and use #ifdefs.

Fixed.

> * getNfctMark() is static but does not start with a capital letter.

Fixed.

> * getNfctMark() might belong to fde rather than FwdState because it does 
> not use FwdState. I am not sure about this one, but it may indicate a 
> general design problem -- a callback with no connection to the caller.

I thought about moving to fde, but I feel it sits better in FwdState as
although it is not called directly within it, it is called as a result
of other code in there.

getNfctMark() has been renamed to GetNfMarkCallback

> * Please document who calls getNfctMark() and when.
> 

Comments added.

> * If getNfctMark() is an async callback that will be called some time 
> after the nfct_callback_register returns, how do you know that clientFde 
> is still a valid pointer _and_ is pointing to the same connection 
> information?
> 
> * If getNfctMark() is an async callback that will be called at some 
> random time after the nfct_callback_register returns, is it safe to use 
> debugs()?

I'm pretty sure it's synchronous: if I add a 3 second delay in the
callback, then the rest of the code isn't executed until the callback
has finished, and the conntrack information is still found.

> * Please use Doxygen /** or /// comments for method descriptions.
> 

Fixed.

> * Please declare local variables when you first use them, if possible. 
> For example,

Fixed.

> * Please add a comment why ct = nfct_new() is not leaking memory despite 
> no matching delete/free OR fix the leak.

nfct_destory() added. Thanks for pointing this out.

> * upstreamMark member documentation should be fixed. What does that 
> member store/mean? I understand that you were copying the documentation 
> bug from upstreamTOS, but I hope we do not have to replicate that.

upstreamMark and upstreamTOS fixed.

> * Please break out new code into a new FwdState method(s) instead of 
> making FwdState::dispatch longer and uglier.

I have added 2 new methods: getUpstreamTOS() and getUpstreamNfMark()

> * Same comment applies to src/client_side_reply.cc code, including the 
> old QOS code already there. It should be extracted from regular methods 
> as they are getting difficult to follow, especially with all the ifdefs.

I have added tosLocalMissValue() and nfmarkLocalMissValue()

> * The new code implements a non-critical feature because errors do not 
> terminate transactions. Yet, most errors are reported at level 1 to 
> cache.log. We often have to modify the code to remove excessive 
> cache.log pollution because it hurts busy proxies. Do you need to use 
> level-1 reporting? Of every error? Or perhaps the code should just 
> increment some stats counters?

I have moved most of the general errors to level 2.

> * Is there a way to defer most support checks to runtime (like 
> comm_set_mark does it), to minimize the use of #ifdefs in the core code? 
> For example, can we use one #ifdef variable for both USE_QOS_NFMARK and 
> USE_ZPH_QOS code, in most contexts? They are intermixed in the code in a 
> complex ways that I find difficult to follow.

I have simplified this and removed as many as possible. I have added
isTosActive() and isMarkActive() as suggested by Amos, and used these
instead. Some of the code relies on the libnetfilter_conntrack
libraries, so I have had to keep that inside #ifdefs.

This leads me on to my first question: why not just remove USE_ZPH_QOS
and the compilation option --enable-zph-qos? With the attached patch,
the code only runs when specified in squid.conf, and it has no other
dependencies. The ZPH kernel patch part can remain in the _SQUID_LINUX
tests.

> * The USE_QOS_NFMARK and USE_ZPH_QOS naming seems inconsistent.
> 

I have renamed USE_ZPH_QOS to USE_QOS_TOS. However, see my question
above.

> The above review does not answer your questions below, and I am sorry 
> for that. I hope Amos or others do better. I agree with many Amos' 
> comments, and I especially encourage you to reduce and simplify #ifs and 
> #ifdefs if possible, following Amos' hints.

Hopefully the code is a lot clearer now, especially with the #defs
removed. Replies to Amos' comments will follow shortly.

> > - The existing TOS patch cannot be disabled at runtime. As such, this
> > mark patch cannot be either. Would it be preferable to only enable them
> > both if the qos_flows config option is present? This would also have the
> > advantage of being able to add CAP_NET_ADMIN as appropriate at runtime.

They are now both enabled at runtime.

> > - I have used sscanf instead of strtoul for both TOS and MARK in
> > QosConfig.cc (sscanf doesn't auto-detect the format of unsigned long
> > int). As a result, the tos variable could be changed to type char, which
> > is what it should be in my opinion. Should I do this?

If we can move to strtoul, I would like to change 'tos' to char
throughout. Currently it is possible to set it to invalid values in
squid.conf, which then cause problems with dumpConfigLine.

Question number 2: what is stubQosConfig.cc? Does that also need
updating for this patch?

Many thanks,

Andy


=== modified file 'configure.in'
--- configure.in	2010-08-01 06:29:48 +0000
+++ configure.in	2010-08-05 20:20:50 +0000
@@ -1146,6 +1146,34 @@
 fi
 AC_SUBST(SSLLIB)
 
+dnl Allow user to specify libnetfilter_conntrack (needed for QOS netfilter marking)
+AC_ARG_WITH(netfilter-conntrack,
+  AS_HELP_STRING([--with-netfilter-conntrack=PATH],
+                 [Compile with the Netfilter conntrack libraries. The path to
+                  the development libraries and headers
+                  installation can be specified if outside of the
+                  system standard directories]), [ 
+case "$with_netfilter_conntrack" in
+  no)
+    : # Nothing special to do here
+    ;;
+  yes)
+    AC_CHECK_LIB([netfilter_conntrack], [nfct_query],,
+                  AC_MSG_ERROR([libnetfilter-conntrack library not found. Needed for netfilter-conntrack support]),
+                  [-lnetfilter_conntrack])
+    AC_CHECK_HEADERS([libnetfilter_conntrack/libnetfilter_conntrack.h \
+                      libnetfilter_conntrack/libnetfilter_conntrack_tcp.h])
+    ;;
+  *)  
+    if test ! -d $withval ; then
+      AC_MSG_ERROR([--with-netfilter-conntrack path does not point to a directory])
+    fi
+    LDFLAGS="-L$with_netfilter_conntrack/lib $LDFLAGS"
+    CPPFLAGS="-I$with_netfilter_conntrack/include $CPPFLAGS"
+    with_netfilter_conntrack=yes
+    ;;
+  esac
+])
 
 AC_ARG_ENABLE(forw-via-db,
   AS_HELP_STRING([--enable-forw-via-db],[Enable Forw/Via database]), [
@@ -2057,10 +2085,19 @@
 SQUID_YESNO([$enableval],
             [unrecognized argument to --enable-zph-qos: $enableval])
 ])
-SQUID_DEFINE_BOOL(USE_ZPH_QOS,${enable_zph_qos:=no},
+SQUID_DEFINE_BOOL(USE_QOS_TOS,${enable_zph_qos:=no},
           [Enable Zero Penalty Hit QOS. When set, Squid will alter the
            TOS field of HIT responses to help policing network traffic])
 AC_MSG_NOTICE([ZPH QOS enabled: $enable_zph_qos])
+if test "$enable_zph_qos" = "yes" ; then
+    if test "$with_netfilter_conntrack" = "yes" ; then
+        AC_MSG_NOTICE([QOS netfilter marking enabled: $with_netfilter_conntrack])
+        SQUID_DEFINE_BOOL(USE_QOS_NFMARK,$with_netfilter_conntrack,
+                      [Enable support for QOS netfilter packet marking])
+    else
+        AC_MSG_WARN([--with-netfilter-conntrack not enabled. QOS features will not support Netfilter marking.])
+    fi
+fi
 
 dnl --with-maxfd present for compatibility with Squid-2.
 dnl undocumented in ./configure --help  to encourage using the Squid-3 directive.

=== modified file 'doc/release-notes/release-3.2.sgml'
--- doc/release-notes/release-3.2.sgml	2010-08-02 13:55:59 +0000
+++ doc/release-notes/release-3.2.sgml	2010-08-05 19:13:42 +0000
@@ -396,6 +396,15 @@
 	<p>Please check and update your squid.conf to use the text <em>none</em> for no limit instead of the old 0 (zero).
 	<p>All users upgrading need to be aware that from Squid-3.3 setting this option to 0 (zero) will mean zero bytes of memory get pooled.
 
+	<tag>qos_flows</tag>
+	<p>New options <em>mark</em> and <em>tos</em>
+	<p><em>tos</em> retains the original QOS functionality of the IP header TOS field.
+	<p><em>mark</em> offers the same functionality, but with a netfilter mark value.
+	<p>These options should be placed immediately after qos_flows.
+	<p>The <em>tos</em> value is optional in order to maintain backwards compatability.
+	<p>Netfilter marking requires libnetfilter_conntrack, which must be included during compilation using --with-netfilter-conntrack.
+	<p>The preserve-miss functionality is available with the <em>mark</em> option and requires no kernel patching.
+
 	<tag>windows_ipaddrchangemonitor</tag>
 	<p>Now only available to be set in Windows builds.
 
@@ -472,6 +481,9 @@
 	   Currently one demo helper <em>fake</em> is provided in shell and C++ forms to demonstrate
 	   the helper protocol usage and provide exemplar code.
 
+	<tag>--with-netfiler-conntrack</tag>
+	<p>Includes the libnetfilter_conntrack library, required for the new qos_flows option <em>mark</em>.
+
 </descrip>
 
 <sect1>Changes to existing options<label id="modifiedoptions">

=== modified file 'src/cf.data.pre'
--- src/cf.data.pre	2010-08-02 19:46:51 +0000
+++ src/cf.data.pre	2010-08-07 06:52:16 +0000
@@ -1527,23 +1527,28 @@
 
 NAME: qos_flows
 TYPE: QosConfig
-IFDEF: USE_ZPH_QOS
+IFDEF: USE_QOS_TOS
 DEFAULT: none
 LOC: Ip::Qos::TheConfig
 DOC_START
 	Allows you to select a TOS/DSCP value to mark outgoing
-	connections with, based on where the reply was sourced.
+	connections with, based on where the reply was sourced.	For
+	platforms using netfilter, allows you to set a netfilter mark
+	value instead of, or in addition to, a TOS value.
 
 	TOS values really only have local significance - so you should
 	know what you're specifying. For more information, see RFC2474,
 	RFC2475, and RFC3260.
 
 	The TOS/DSCP byte must be exactly that - octet value 0x00-0xFF.
-	Note that in practice often only values up to 0x3F are usable
-	as the two highest bits have been redefined for use by ECN
-	(RFC3168).
-
-	This setting is configured by setting the source TOS values:
+	Note that in practice often only values up to 0x3F are usable as
+	the two highest bits have been redefined for use by ECN (RFC3168).
+
+	Mark values can be any unsigned integer value (normally up to 0xFFFFFFFF)
+
+	This setting is configured by setting the following values:
+
+	tos|mark                Whether to set TOS or netfilter mark values
 
 	local-hit=0xFF		Value to mark local cache hits.
 
@@ -1551,23 +1556,31 @@
 
 	parent-hit=0xFF		Value to mark hits from parent peers.
 
-
-	NOTE: 'miss' preserve feature is only possible on Linux at this time.
-
-	For the following to work correctly, you will need to patch your
-	linux kernel with the TOS preserving ZPH patch.
-	The kernel patch can be downloaded from http://zph.bratcheda.org
+	The TOS varient of the following features are only possible on Linux
+	and require your kernel to be patched with the TOS preserving ZPH
+	patch, available from http://zph.bratcheda.org
+	No patch is needed to preserve the netfilter mark, which will work
+	with all varients of netfilter.
 
 	disable-preserve-miss
-		By default, the existing TOS value of the response coming
-		from the remote server will be retained and masked with
-		miss-mark. This option disables that feature.
+		This option disables the preservation of the TOS or netfilter
+		mark. By default, the existing TOS or netfilter mark value of
+		the response coming from the remote server will be retained
+		and masked with miss-mark.
+		NOTE: in the case of a netfilter mark, the mark must be set on
+		the connection (using the CONNMARK target) not on the packet
+		(MARK target).
 
 	miss-mask=0xFF
-		Allows you to mask certain bits in the TOS received from the
-		remote server, before copying the value to the TOS sent
-		towards clients.
-		Default: 0xFF (TOS from server is not changed).
+		Allows you to mask certain bits in the TOS or mark value
+		received from the remote server, before copying the value to
+		the TOS sent towards clients.
+		Default for tos: 0xFF (TOS from server is not changed).
+		Default for mark: 0xFFFFFFFF (mark from server is not changed).
+
+	All of these features require the --enable-zph-qos compilation flag.
+	Netfilter marking also requires the libnetfilter_conntrack libraries
+	(--with-netfilter-conntrack) and libcap 2.09+ (--with-libcap)
 
 DOC_END
 

=== modified file 'src/client_side_reply.cc'
--- src/client_side_reply.cc	2010-04-17 10:38:50 +0000
+++ src/client_side_reply.cc	2010-08-06 18:54:13 +0000
@@ -1667,12 +1667,17 @@
         /* guarantee nothing has been sent yet! */
         assert(http->out.size == 0);
         assert(http->out.offset == 0);
-#if USE_ZPH_QOS
-        if (Ip::Qos::TheConfig.tos_local_hit) {
-            debugs(33, 2, "ZPH Local hit, TOS=" << Ip::Qos::TheConfig.tos_local_hit);
+
+        if (Ip::Qos::TheConfig.tos_local_hit && Ip::Qos::TheConfig.isTosActive()) {
+            debugs(33, 2, "QOS Local hit, TOS=" << Ip::Qos::TheConfig.tos_local_hit);
             comm_set_tos(http->getConn()->fd, Ip::Qos::TheConfig.tos_local_hit);
         }
-#endif /* USE_ZPH_QOS */
+
+        if (Ip::Qos::TheConfig.mark_local_hit && Ip::Qos::TheConfig.isMarkActive()) {
+            debugs(33, 2, "QOS Local hit, Mark=" << Ip::Qos::TheConfig.mark_local_hit);
+            comm_set_nfmark(http->getConn()->fd, Ip::Qos::TheConfig.mark_local_hit);
+        }
+
         localTempBuffer.offset = reqofs;
         localTempBuffer.length = getNextNode()->readBuffer.length;
         localTempBuffer.data = getNextNode()->readBuffer.data;
@@ -1710,6 +1715,50 @@
     dlinkAdd(http, &http->active, &ClientActiveRequests);
 }
 
+/**
+ * Function to return the appropriate TOS value to set on packets when
+ * items have not been retrieved from local cache. Called by
+ * clientReplyContext::sendMoreData if QOS is enabled for TOS.
+ */
+int
+clientReplyContext::tosLocalMissValue(int fd)
+{
+    int tos = 0;
+    if (Ip::Qos::TheConfig.tos_sibling_hit && http->request->hier.code==SIBLING_HIT ) {
+        tos = Ip::Qos::TheConfig.tos_sibling_hit;
+        debugs(33, 2, "QOS: Sibling Peer hit with hier.code=" << http->request->hier.code << ", TOS=" << tos);
+    } else if (Ip::Qos::TheConfig.tos_parent_hit && http->request->hier.code==PARENT_HIT) {
+        tos = Ip::Qos::TheConfig.tos_parent_hit;
+        debugs(33, 2, "QOS: Parent Peer hit with hier.code=" << http->request->hier.code << ", TOS=" << tos);
+    } else if (Ip::Qos::TheConfig.preserve_miss_tos && Ip::Qos::TheConfig.preserve_miss_tos_mask) {
+        tos = fd_table[fd].upstreamTOS & Ip::Qos::TheConfig.preserve_miss_tos_mask;
+        debugs(33, 2, "QOS: Preserving TOS on miss, TOS="<<tos);
+    }
+    return tos;
+}
+
+/**
+ * Function to return the appropriate netfilter mark value to set on
+ * packets when items have not been retrieved from local cache. Called
+ * by clientReplyContext::sendMoreData if QOS is enabled for NF mark.
+ */
+int
+clientReplyContext::nfmarkLocalMissValue(int fd)
+{
+    unsigned int mark = 0;
+    if (Ip::Qos::TheConfig.mark_sibling_hit && http->request->hier.code==SIBLING_HIT ) {
+        mark = Ip::Qos::TheConfig.mark_sibling_hit;
+        debugs(33, 2, "QOS: Sibling Peer hit with hier.code=" << http->request->hier.code << ", Mark=" << mark);
+    } else if (Ip::Qos::TheConfig.mark_parent_hit && http->request->hier.code==PARENT_HIT) {
+        mark = Ip::Qos::TheConfig.mark_parent_hit;
+        debugs(33, 2, "QOS: Parent Peer hit with hier.code=" << http->request->hier.code << ", Mark=" << mark);
+    } else if (Ip::Qos::TheConfig.preserve_miss_mark) {
+        mark = fd_table[fd].upstreamMark & Ip::Qos::TheConfig.preserve_miss_mark_mask;
+        debugs(33, 2, "QOS: Preserving mark on miss, Mark="<<mark);
+    }
+    return mark;
+}
+
 bool
 clientReplyContext::errorInStream(StoreIOBuffer const &result, size_t const &sizeToProcess)const
 {
@@ -1951,23 +2000,17 @@
         body_buf = buf;
     }
 
-#if USE_ZPH_QOS
     if (reqofs==0 && !logTypeIsATcpHit(http->logType)) {
         assert(fd >= 0); // the beginning of this method implies fd may be -1
-        int tos = 0;
-        if (Ip::Qos::TheConfig.tos_sibling_hit && http->request->hier.code==SIBLING_HIT ) {
-            tos = Ip::Qos::TheConfig.tos_sibling_hit;
-            debugs(33, 2, "ZPH: Sibling Peer hit with hier.code=" << http->request->hier.code << ", TOS=" << tos);
-        } else if (Ip::Qos::TheConfig.tos_parent_hit && http->request->hier.code==PARENT_HIT) {
-            tos = Ip::Qos::TheConfig.tos_parent_hit;
-            debugs(33, 2, "ZPH: Parent Peer hit with hier.code=" << http->request->hier.code << ", TOS=" << tos);
-        } else if (Ip::Qos::TheConfig.preserve_miss_tos && Ip::Qos::TheConfig.preserve_miss_tos_mask) {
-            tos = fd_table[fd].upstreamTOS & Ip::Qos::TheConfig.preserve_miss_tos_mask;
-            debugs(33, 2, "ZPH: Preserving TOS on miss, TOS="<<tos);
-        }
-        comm_set_tos(fd,tos);
+        if (Ip::Qos::TheConfig.isTosActive()) {
+            int tos = tosLocalMissValue(fd);
+            comm_set_tos(fd,tos);
+        }
+        if (Ip::Qos::TheConfig.isMarkActive()) {
+            unsigned int mark = nfmarkLocalMissValue(fd);
+            comm_set_nfmark(fd,mark);
+        }
     }
-#endif
 
     /* We've got the final data to start pushing... */
     flags.storelogiccomplete = 1;

=== modified file 'src/client_side_reply.h'
--- src/client_side_reply.h	2010-05-13 06:20:23 +0000
+++ src/client_side_reply.h	2010-08-05 13:39:16 +0000
@@ -115,6 +115,8 @@
     clientStreamNode *getNextNode() const;
     void makeThisHead();
     bool errorInStream(StoreIOBuffer const &result, size_t const &sizeToProcess)const ;
+    int tosLocalMissValue(int fd);
+    int nfmarkLocalMissValue(int fd);
     void sendStreamError(StoreIOBuffer const &result);
     void pushStreamData(StoreIOBuffer const &result, char *source);
     clientStreamNode * next() const;

=== modified file 'src/comm.cc'
--- src/comm.cc	2010-07-28 12:39:35 +0000
+++ src/comm.cc	2010-08-05 17:53:55 +0000
@@ -620,33 +620,58 @@
     return anErrno == ENFILE || anErrno == EMFILE;
 }
 
+/**
+ * Function to set the TOS value of packets. Sets the value on the socket
+ * which then gets copied to the packets. Called from client_side_reply.cc
+ */
 int
 comm_set_tos(int fd, int tos)
 {
 #ifdef IP_TOS
     int x = setsockopt(fd, IPPROTO_IP, IP_TOS, (char *) &tos, sizeof(int));
     if (x < 0)
-        debugs(50, 1, "comm_set_tos: setsockopt(IP_TOS) on FD " << fd << ": " << xstrerror());
+        debugs(50, 2, "comm_set_tos: setsockopt(IP_TOS) on FD " << fd << ": " << xstrerror());
     return x;
 #else
-    debugs(50, 0, "WARNING: setsockopt(IP_TOS) not supported on this platform");
+    debugs(50, DBG_IMPORTANT, "WARNING: setsockopt(IP_TOS) not supported on this platform");
     return -1;
 #endif
 }
 
+/**
+ * IPV6 of the above function
+ */
 void
 comm_set_v6only(int fd, int tos)
 {
 #ifdef IPV6_V6ONLY
     if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, (char *) &tos, sizeof(int)) < 0) {
-        debugs(50, 1, "comm_open: setsockopt(IPV6_V6ONLY) " << (tos?"ON":"OFF") << " for FD " << fd << ": " << xstrerror());
+        debugs(50, 2, "comm_open: setsockopt(IPV6_V6ONLY) " << (tos?"ON":"OFF") << " for FD " << fd << ": " << xstrerror());
     }
 #else
-    debugs(50, 0, "WARNING: comm_open: setsockopt(IPV6_V6ONLY) not supported on this platform");
+    debugs(50, DBG_IMPORTANT, "WARNING: comm_open: setsockopt(IPV6_V6ONLY) not supported on this platform");
 #endif /* sockopt */
 }
 
 /**
+ * Function to set the netfilter mark value of packets. Sets the value on the
+ * socket which then gets copied to the packets. Called from client_side_reply.cc
+ */
+int
+comm_set_nfmark(int fd, unsigned int mark)
+{
+#ifdef SO_MARK
+    int x = setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(unsigned int));
+    if (x < 0)
+        debugs(50, 2, "comm_set_nfmark: setsockopt(SO_MARK) on FD " << fd << ": " << xstrerror());
+    return x;
+#else
+    debugs(50, DBG_IMPORTANT, "WARNING: setsockopt(SO_MARK) not supported on this platform");
+    return -1;
+#endif
+}
+
+/**
  * Set the socket IP_TRANSPARENT option for Linux TPROXY v4 support.
  */
 void

=== modified file 'src/comm.h'
--- src/comm.h	2010-07-06 23:09:44 +0000
+++ src/comm.h	2010-08-05 13:38:08 +0000
@@ -77,6 +77,7 @@
 SQUIDCEXTERN int comm_openex(int, int, Ip::Address &, int, unsigned char TOS, const char *);
 SQUIDCEXTERN u_short comm_local_port(int fd);
 SQUIDCEXTERN int comm_set_tos(int fd, int tos);
+SQUIDCEXTERN int comm_set_nfmark(int fd, unsigned int mark);
 
 SQUIDCEXTERN void commSetSelect(int, unsigned int, PF *, void *, time_t);
 SQUIDCEXTERN void commResetSelect(int);

=== modified file 'src/fde.h'
--- src/fde.h	2010-07-06 23:09:44 +0000
+++ src/fde.h	2010-08-07 07:23:43 +0000
@@ -109,9 +109,10 @@
         long handle;
     } win32;
 #endif
-#if USE_ZPH_QOS
-    unsigned char upstreamTOS;			/* see FwdState::dispatch()  */
-#endif
+    unsigned char upstreamTOS;                  /* Stores the TOS flags of the packets
+                                                        from the remote server. See FwdState::dispatch()  */
+    unsigned int upstreamMark;                  /* Stores the Netfilter mark value of the connection
+                                                        to the remote server. See FwdState::dispatch()  */
 
 private:
     /** Clear the fde class back to NULL equivalent. */

=== modified file 'src/forward.cc'
--- src/forward.cc	2010-08-01 04:33:31 +0000
+++ src/forward.cc	2010-08-07 07:04:45 +0000
@@ -42,6 +42,7 @@
 #include "hier_code.h"
 #include "HttpReply.h"
 #include "HttpRequest.h"
+#include "ip/QosConfig.h"
 #include "MemObject.h"
 #include "pconn.h"
 #include "SquidTime.h"
@@ -965,6 +966,26 @@
     self = NULL;	// refcounted
 }
 
+#if USE_QOS_NFMARK
+/**
+ * Callback function to mark connection once it's been found.
+ * This function is called by the libnetfilter_conntrack
+ * libraries, during nfct_query in FwdState::getUpstreamNfMark.
+ * nfct_callback_register is used to register this function.
+*/
+int
+FwdState::GetNfMarkCallback(enum nf_conntrack_msg_type type,
+              struct nf_conntrack *ct,
+              void *data)
+{
+        fde *clientFde = (fde *)data;
+        clientFde->upstreamMark = nfct_get_attr_u32(ct, ATTR_MARK);
+        debugs(17, 3, "QOS: Retrieved connection mark value: " << clientFde->upstreamMark);
+
+        return NFCT_CB_CONTINUE;
+}
+#endif
+
 void
 FwdState::dispatch()
 {
@@ -990,44 +1011,26 @@
 
     netdbPingSite(request->GetHost());
 
-#if USE_ZPH_QOS && defined(_SQUID_LINUX_)
-    /* Bug 2537: This part of ZPH only applies to patched Linux kernels. */
-
-    /* Retrieves remote server TOS value, and stores it as part of the
+    /* Retrieves remote server TOS or MARK value, and stores it as part of the
      * original client request FD object. It is later used to forward
-     * remote server's TOS in the response to the client in case of a MISS.
+     * remote server's TOS/MARK in the response to the client in case of a MISS.
      */
-    fde * clientFde = &fd_table[client_fd];
-    if (clientFde) {
-        int tos = 1;
-        int tos_len = sizeof(tos);
-        clientFde->upstreamTOS = 0;
-        if (setsockopt(server_fd,SOL_IP,IP_RECVTOS,&tos,tos_len)==0) {
-            unsigned char buf[512];
-            int len = 512;
-            if (getsockopt(server_fd,SOL_IP,IP_PKTOPTIONS,buf,(socklen_t*)&len) == 0) {
-                /* Parse the PKTOPTIONS structure to locate the TOS data message
-                 * prepared in the kernel by the ZPH incoming TCP TOS preserving
-                 * patch.
-                 */
-                unsigned char * pbuf = buf;
-                while (pbuf-buf < len) {
-                    struct cmsghdr *o = (struct cmsghdr*)pbuf;
-                    if (o->cmsg_len<=0)
-                        break;
+    if (Ip::Qos::TheConfig.isMarkActive()) {
+        fde * clientFde = &fd_table[client_fd];
+        fde * servFde = &fd_table[server_fd];
+        if (clientFde && servFde) {
+            /* Get the netfilter mark for the connection */
+            FwdState::getUpstreamNfMark(clientFde, servFde);
+        }
+    }
 
-                    if (o->cmsg_level == SOL_IP && o->cmsg_type == IP_TOS) {
-                        int *tmp = (int*)CMSG_DATA(o);
-                        clientFde->upstreamTOS = (unsigned char)*tmp;
-                        break;
-                    }
-                    pbuf += CMSG_LEN(o->cmsg_len);
-                }
-            } else {
-                debugs(33, 1, "ZPH: error in getsockopt(IP_PKTOPTIONS) on FD "<<server_fd<<" "<<xstrerror());
-            }
-        } else {
-            debugs(33, 1, "ZPH: error in setsockopt(IP_RECVTOS) on FD "<<server_fd<<" "<<xstrerror());
+#if defined(_SQUID_LINUX_)
+    /* Bug 2537: The TOS forward part of QOS only applies to patched Linux kernels. */
+    if (Ip::Qos::TheConfig.isTosActive()) {
+        fde * clientFde = &fd_table[client_fd];
+        if (clientFde) {
+            /* Get the TOS value for the packet */
+            FwdState::getUpstreamTOS(clientFde);
         }
     }
 #endif
@@ -1318,6 +1321,116 @@
     hierarchyNote(&request->hier, fs->code, nextHop);
 }
 
+/**
+ * Function to retrieve the TOS value of the inbound packet.
+ * Called by FwdState::dispatch if QOS options are enabled.
+ */
+void
+FwdState::getUpstreamTOS(fde *clientFde)
+{
+#if USE_QOS_TOS
+    int tos = 1;
+    int tos_len = sizeof(tos);
+    clientFde->upstreamTOS = 0;
+    if (setsockopt(server_fd,SOL_IP,IP_RECVTOS,&tos,tos_len)==0) {
+        unsigned char buf[512];
+        int len = 512;
+        if (getsockopt(server_fd,SOL_IP,IP_PKTOPTIONS,buf,(socklen_t*)&len) == 0) {
+            /* Parse the PKTOPTIONS structure to locate the TOS data message
+             * prepared in the kernel by the ZPH incoming TCP TOS preserving
+             * patch.
+             */
+            unsigned char * pbuf = buf;
+            while (pbuf-buf < len) {
+                struct cmsghdr *o = (struct cmsghdr*)pbuf;
+                if (o->cmsg_len<=0)
+                    break;
+
+                if (o->cmsg_level == SOL_IP && o->cmsg_type == IP_TOS) {
+                    int *tmp = (int*)CMSG_DATA(o);
+                    clientFde->upstreamTOS = (unsigned char)*tmp;
+                    break;
+                }
+                pbuf += CMSG_LEN(o->cmsg_len);
+            }
+        } else {
+            debugs(33, 1, "QOS: error in getsockopt(IP_PKTOPTIONS) on FD "<<server_fd<<" "<<xstrerror());
+        }
+    } else {
+        debugs(33, 1, "QOS: error in setsockopt(IP_RECVTOS) on FD "<<server_fd<<" "<<xstrerror());
+    }
+#endif
+}
+
+/**
+ * Function to retrieve the netfilter mark value of the connection
+ * to the upstream server. Called by FwdState::dispatch if QOS
+ * options are enabled.
+ */
+void FwdState::getUpstreamNfMark(fde *clientFde, fde *servFde)
+{
+#if USE_QOS_NFMARK
+    /* Allocate a new conntrack */
+    if (struct nf_conntrack *ct = nfct_new()) {
+
+        /* Prepare data needed to find the connection in the conntrack table.
+         * We need the local and remote IP address, and the local and remote
+         * port numbers.
+         */
+        
+        Ip::Address serv_fde_local_conn;
+        struct addrinfo *addr = NULL;
+        serv_fde_local_conn.InitAddrInfo(addr);
+        getsockname(server_fd, addr->ai_addr, &(addr->ai_addrlen));
+        serv_fde_local_conn = *addr;
+        serv_fde_local_conn.GetAddrInfo(addr);
+
+        unsigned short serv_fde_local_port = ((struct sockaddr_in*)addr->ai_addr)->sin_port;
+        struct in6_addr serv_fde_local_ip6;
+        struct in_addr serv_fde_local_ip;
+
+        if (Ip::EnableIpv6 && serv_fde_local_conn.IsIPv6()) {
+            serv_fde_local_ip6 = ((struct sockaddr_in6*)addr->ai_addr)->sin6_addr;
+            nfct_set_attr_u8(ct, ATTR_L3PROTO, AF_INET6);
+            struct in6_addr serv_fde_remote_ip6;
+            inet_pton(AF_INET6,servFde->ipaddr,(struct in6_addr*)&serv_fde_remote_ip6);
+            nfct_set_attr(ct, ATTR_IPV6_DST, serv_fde_remote_ip6.s6_addr);
+            nfct_set_attr(ct, ATTR_IPV6_SRC, serv_fde_local_ip6.s6_addr);
+        } else {
+            serv_fde_local_ip = ((struct sockaddr_in*)addr->ai_addr)->sin_addr;
+            nfct_set_attr_u8(ct, ATTR_L3PROTO, AF_INET);
+            nfct_set_attr_u32(ct, ATTR_IPV4_DST, inet_addr(servFde->ipaddr));
+            nfct_set_attr_u32(ct, ATTR_IPV4_SRC, serv_fde_local_ip.s_addr);
+        }
+
+        nfct_set_attr_u8(ct, ATTR_L4PROTO, IPPROTO_TCP);   
+        nfct_set_attr_u16(ct, ATTR_PORT_DST, htons(servFde->remote_port));   
+        nfct_set_attr_u16(ct, ATTR_PORT_SRC, serv_fde_local_port);
+
+        /* Open a handle to the conntrack */
+        if (struct nfct_handle *h = nfct_open(CONNTRACK, 0)) {
+            /* Register the callback. The callback function will record the mark value. */
+            nfct_callback_register(h, NFCT_T_ALL, GetNfMarkCallback, (void *)clientFde);
+            /* Query the conntrack table using the data previously set */
+            int x = nfct_query(h, NFCT_Q_GET, ct);
+            if (x == -1) {
+                debugs(17, 2, "Failed to retrieve connection mark: (" << x << ") " << strerror(errno)
+                  << " (Destination " << servFde->ipaddr << ":" << servFde->remote_port
+                  << ", source " << serv_fde_local_conn << ")" );
+            }
+            
+            nfct_close(h);
+        } else {
+            debugs(17, 2, "Failed to open handle on conntrack");
+        }
+        serv_fde_local_conn.FreeAddrInfo(addr);
+        nfct_destroy(ct);
+
+    } else {
+        debugs(17, 2, "Failed to allocate new conntrack");
+    }
+#endif
+}
 
 /**** PRIVATE NON-MEMBER FUNCTIONS ********************************************/
 

=== modified file 'src/forward.h'
--- src/forward.h	2010-05-02 19:32:42 +0000
+++ src/forward.h	2010-08-07 07:21:17 +0000
@@ -9,6 +9,10 @@
 #include "comm.h"
 #include "hier_code.h"
 #include "ip/Address.h"
+#if HAVE_LIBNETFILTER_CONNTRACK_LIBNETFILTER_CONNTRACK_H
+#include <libnetfilter_conntrack/libnetfilter_conntrack.h>
+#include <libnetfilter_conntrack/libnetfilter_conntrack_tcp.h>
+#endif
 
 class FwdServer
 {
@@ -28,6 +32,9 @@
     static void fwdStart(int fd, StoreEntry *, HttpRequest *);
     void startComplete(FwdServer *);
     void startFail();
+#if USE_QOS_NFMARK
+    static int GetNfMarkCallback(enum nf_conntrack_msg_type type, struct nf_conntrack *ct, void *clientFde);
+#endif
     void fail(ErrorState *err);
     void unregister(int fd);
     void complete();
@@ -45,6 +52,7 @@
     void dispatch();
     void pconnPush(int fd, const peer *_peer, const HttpRequest *req, const char *domain, Ip::Address &client_addr);
 
+
     bool dontRetry() { return flags.dont_retry; }
 
     void dontRetry(bool val) { flags.dont_retry = val; }
@@ -65,6 +73,8 @@
     void completed();
     void retryOrBail();
     static void RegisterWithCacheManager(void);
+    void getUpstreamTOS(fde *clientFde);
+    void getUpstreamNfMark(fde *clientFde, fde *servFde);
 
 #if WIP_FWD_LOG
 

=== modified file 'src/ip/QosConfig.cc'
--- src/ip/QosConfig.cc	2010-04-17 02:29:04 +0000
+++ src/ip/QosConfig.cc	2010-08-07 07:00:32 +0000
@@ -1,7 +1,4 @@
 #include "squid.h"
-
-#if USE_ZPH_QOS
-
 #include "QosConfig.h"
 
 Ip::Qos::QosConfig Ip::Qos::TheConfig;
@@ -10,8 +7,13 @@
         tos_local_hit(0),
         tos_sibling_hit(0),
         tos_parent_hit(0),
-        preserve_miss_tos(1),
-        preserve_miss_tos_mask(255)
+        preserve_miss_tos(0),
+        preserve_miss_tos_mask(255),
+        mark_local_hit(0),
+        mark_sibling_hit(0),
+        mark_parent_hit(0),
+        preserve_miss_mark(0),
+        preserve_miss_mark_mask(0xFFFFFFFF)
 {
     ;
 }
@@ -19,61 +21,163 @@
 void
 Ip::Qos::QosConfig::parseConfigLine()
 {
-    // %i honors 0 and 0x prefixes, which are important for things like umask
+    // strtoul honors 0 and 0x prefixes, which are important for things like umask
     /* parse options ... */
     char *token;
+    /* These are set as appropriate and then used to check whether the initial loop has been done */
+    bool mark = false;
+    bool tos = false;
+    /* Assume preserve is true. We don't set at initialisation as this affects isTosActive().
+       We have to do this now, as we may never match the 'tos' parameter below */
+#if !USE_QOS_TOS
+    debugs(3, DBG_CRITICAL, "ERROR: Invalid option 'qos_flows'. QOS features not enabled in this build");
+    self_destruct();
+#endif
+
     while ( (token = strtok(NULL, w_space)) ) {
 
+        // Work out TOS or mark. Default to TOS for backwards compatibility
+        if (!(mark || tos)) {
+            if (strncmp(token, "mark",4) == 0) {
+#if USE_QOS_NFMARK
+                mark = true;
+                // Assume preserve is true. We don't set at initialisation as this affects isMarkActive()
+                preserve_miss_mark = 1;
+#else
+                debugs(3, DBG_CRITICAL, "ERROR: Invalid parameter 'mark' in qos_flows option. "
+                                            << "Netfilter marking not enabled in this build");
+                self_destruct();
+#endif
+            } else if (strncmp(token, "tos",3) == 0) {
+                preserve_miss_tos = 1;
+                tos = true;
+            } else {
+                preserve_miss_tos = 1;
+                tos = true;
+            }
+        }
+
         if (strncmp(token, "local-hit=",10) == 0) {
-            sscanf(&token[10], "%i", &tos_local_hit);
+            if (mark) {
+                mark_local_hit = (unsigned int)strtoul(&token[10], NULL, 0);
+            } else {
+                tos_local_hit = (int)strtoul(&token[10], NULL, 0);
+            }
         } else if (strncmp(token, "sibling-hit=",12) == 0) {
-            sscanf(&token[12], "%i", &tos_sibling_hit);
+            if (mark) {
+                mark_sibling_hit = (unsigned int)strtoul(&token[12], NULL, 0);
+            } else {
+                tos_sibling_hit = (int)strtoul(&token[12], NULL, 0);
+            }
         } else if (strncmp(token, "parent-hit=",11) == 0) {
-            sscanf(&token[11], "%i", &tos_parent_hit);
+            if (mark) {
+                mark_parent_hit = (unsigned int)strtoul(&token[11], NULL, 0);
+            } else {
+                tos_parent_hit = (int)strtoul(&token[11], NULL, 0);
+            }
         } else if (strcmp(token, "disable-preserve-miss") == 0) {
-            preserve_miss_tos = 0;
-            preserve_miss_tos_mask = 0;
-        } else if (preserve_miss_tos && strncmp(token, "miss-mask=",10) == 0) {
-            sscanf(&token[10], "%i", &preserve_miss_tos_mask);
+            if (preserve_miss_tos_mask!=0xFF || preserve_miss_mark_mask!=0xFFFFFFFF) {
+                debugs(3, DBG_CRITICAL, "ERROR: miss-mask feature cannot be set with disable-preserve-miss");
+            }
+            if (mark) {
+                preserve_miss_mark = 0;
+                preserve_miss_mark_mask = 0;
+            } else {
+                preserve_miss_tos = 0;
+                preserve_miss_tos_mask = 0;
+            }
+        } else if (strncmp(token, "miss-mask=",10) == 0) {
+            if (mark && preserve_miss_mark) {
+                preserve_miss_mark_mask = (unsigned int)strtoul(&token[10], NULL, 0);
+            } else if (preserve_miss_tos) {
+                preserve_miss_tos_mask = (int)strtoul(&token[10], NULL, 0);
+            } else {
+                debugs(3, DBG_CRITICAL, "ERROR: miss-mask feature cannot be set with disable-preserve-miss");
+            }
         }
     }
 }
 
+
 /**
  * NOTE: Due to the low-level nature of the library these
  * objects are part of the dump function must be self-contained.
- * which means no StoreEntry refrences. Just a basic char* buffer.
+ * which means no StoreEntry references. Just a basic char* buffer.
  */
 void
-Ip::Qos::QosConfig::dumpConfigLine(char *entry, const char *name) const
+Ip::Qos::QosConfig::dumpConfigLine(char *entry, const char *name)
 {
     char *p = entry;
-    snprintf(p, 10, "%s", name); // strlen("qos_flows ");
-    p += strlen(name);
-
-    if (tos_local_hit >0) {
-        snprintf(p, 15, " local-hit=%2x", tos_local_hit);
-        p += 15;
-    }
-
-    if (tos_sibling_hit >0) {
-        snprintf(p, 17, " sibling-hit=%2x", tos_sibling_hit);
-        p += 17;
-    }
-    if (tos_parent_hit >0) {
-        snprintf(p, 16, " parent-hit=%2x", tos_parent_hit);
-        p += 16;
-    }
-    if (preserve_miss_tos != 0) {
-        snprintf(p, 22, " disable-preserve-miss");
-        p += 22;
-    }
-    if (preserve_miss_tos && preserve_miss_tos_mask != 0) {
-        snprintf(p, 15, " miss-mask=%2x", preserve_miss_tos_mask);
-        p += 15;
-    }
-    snprintf(p, 1, "\n");
-//    p += 1;
-}
-
-#endif /* USE_ZPH_QOS */
+    if (isTosActive()) {
+
+        p += snprintf(p, 11, "%s", name); // strlen("qos_flows ");
+        p += snprintf(p, 4, "%s", "tos");
+
+        if (tos_local_hit >0) {
+            p += snprintf(p, 16, " local-hit=0x%02X", tos_local_hit);
+        }
+        if (tos_sibling_hit >0) {
+            p += snprintf(p, 18, " sibling-hit=0x%02X", tos_sibling_hit);
+        }
+        if (tos_parent_hit >0) {
+            p += snprintf(p, 17, " parent-hit=0x%02X", tos_parent_hit);
+        }
+        if (preserve_miss_tos == 0) {
+            p += snprintf(p, 23, " disable-preserve-miss");
+        }
+        if (preserve_miss_tos && preserve_miss_tos_mask != 0) {
+            p += snprintf(p, 16, " miss-mask=0x%02X", preserve_miss_tos_mask);
+        }
+        p += snprintf(p, 2, "\n");
+    }
+    
+    if (isMarkActive()) {
+        p += snprintf(p, 11, "%s", name); // strlen("qos_flows ");
+        p += snprintf(p, 5, "%s", "mark");
+
+        if (mark_local_hit >0) {
+            p += snprintf(p, 22, " local-hit=0x%02X", mark_local_hit);
+        }
+        if (mark_sibling_hit >0) {
+            p += snprintf(p, 24, " sibling-hit=0x%02X", mark_sibling_hit);
+        }
+        if (mark_parent_hit >0) {
+            p += snprintf(p, 23, " parent-hit=0x%02X", mark_parent_hit);
+        }
+        if (preserve_miss_mark == 0) {
+            p += snprintf(p, 23, " disable-preserve-miss");
+        }
+        if (preserve_miss_mark && preserve_miss_mark_mask != 0) {
+            p += snprintf(p, 22, " miss-mask=0x%02X", preserve_miss_mark_mask);
+        }
+        p += snprintf(p, 2, "\n");
+    }
+}
+
+/**
+ * Returns true or false depending on whether we should carry out the
+ * QOS functions for the TOS flags
+ */
+bool
+Ip::Qos::QosConfig::isTosActive()
+{
+    if (tos_local_hit || tos_sibling_hit || tos_parent_hit || preserve_miss_tos) {
+        return true;
+    } else {
+        return false;
+    }
+}
+
+/**
+ * Returns true or false depending on whether we should carry out the
+ * QOS functions for netfilter marks
+ */
+bool
+Ip::Qos::QosConfig::isMarkActive()
+{
+    if (mark_local_hit || mark_sibling_hit || mark_parent_hit || preserve_miss_mark) {
+        return true;
+    } else {
+        return false;
+    }
+}

=== modified file 'src/ip/QosConfig.h'
--- src/ip/QosConfig.h	2010-04-18 00:13:00 +0000
+++ src/ip/QosConfig.h	2010-08-06 17:13:48 +0000
@@ -3,8 +3,6 @@
 
 #include "config.h"
 
-#if USE_ZPH_QOS
-
 namespace Ip
 {
 
@@ -19,13 +17,20 @@
     int tos_parent_hit;
     int preserve_miss_tos;
     int preserve_miss_tos_mask;
+    unsigned int mark_local_hit;
+    unsigned int mark_sibling_hit;
+    unsigned int mark_parent_hit;
+    int preserve_miss_mark;
+    unsigned int preserve_miss_mark_mask;
 
 public:
     QosConfig();
     ~QosConfig() {};
 
     void parseConfigLine();
-    void dumpConfigLine(char *entry, const char *name) const;
+    bool isTosActive();
+    bool isMarkActive();
+    void dumpConfigLine(char *entry, const char *name);
 };
 
 extern QosConfig TheConfig;
@@ -42,5 +47,4 @@
 }; // namespace Qos
 }; // namespace Ip
 
-#endif /* USE_ZPH_QOS */
 #endif /* SQUID_QOSCONFIG_H */

=== modified file 'src/ip/stubQosConfig.cc'
--- src/ip/stubQosConfig.cc	2010-04-25 07:07:14 +0000
+++ src/ip/stubQosConfig.cc	2010-08-05 15:02:00 +0000
@@ -1,6 +1,6 @@
 #include "squid.h"
 
-#if USE_ZPH_QOS
+#if USE_QOS_TOS
 
 #include "ip/QosConfig.h"
 #include "Store.h"
@@ -44,4 +44,4 @@
     ; /* Not needed in stub */
 }
 
-#endif /* USE_ZPH_QOS */
+#endif /* USE_QOS_TOS */

=== modified file 'src/tools.cc'
--- src/tools.cc	2010-07-25 08:10:12 +0000
+++ src/tools.cc	2010-08-06 18:38:15 +0000
@@ -43,6 +43,7 @@
 #include "ProtoPort.h"
 #include "SquidMath.h"
 #include "SquidTime.h"
+#include "ip/QosConfig.h"
 #include "ipc/Kids.h"
 #include "ipc/Coordinator.h"
 #include "SwapDir.h"
@@ -1308,7 +1309,7 @@
         cap_value_t cap_list[10];
         cap_list[ncaps++] = CAP_NET_BIND_SERVICE;
 
-        if (Ip::Interceptor.TransparentActive()) {
+        if (Ip::Interceptor.TransparentActive() || Ip::Qos::TheConfig.isMarkActive()) {
             cap_list[ncaps++] = CAP_NET_ADMIN;
         }
 

Reply via email to