Re: [PATCH 1/1 linux-next] netfilter: conntrack: fix kmemleak false positive

2016-09-22 Thread Florian Westphal
Fabian Frederick  wrote:
> Hello Florian,
> 
>         First problem is solved: table gets cleared 3 minutes earlier
> but I still have kmemleak before running the following:
> 
> echo scan > /sys/kernel/debug/kmemleak
> cat /sys/kernel/debug/kmemleak
> Nothing
> echo scan > /sys/kernel/debug/kmemleak
> cat /sys/kernel/debug/kmemleak
> -> rsyslogd
> 
> I talked about false positive because everything is cleared later.

Hmm, I fear this is a real bug and not false positive.

Should be possible to confirm this via slabinfo:

grep nf_conntrack /proc/slabinfo

The active objects should match the conntrack count.
(conntrack -C, or wc -l < /proc/).

> > > unreferenced object 0x88003b0e6600 (size 248):
> > >   comm "rsyslogd", pid 1595, jiffies 4294741312 (age 7.343s)
> > >   ...
> > >   backtrace:
> > >     [] kmemleak_alloc+0x23/0x40
> > >     [] kmem_cache_alloc+0xd9/0x180
> > >     [] __nf_conntrack_alloc.isra.50+0x48/0x170
> > >     [] nf_conntrack_in+0x3a2/0x5f0
> > >     [] ipv4_conntrack_local+0x40/0x50
> > >     [] nf_iterate+0x5d/0x70
> > >     [] nf_hook_slow+0x5f/0xb0
> > >     [] __ip_local_out+0xad/0xe0
> > >     [] ip_local_out+0x17/0x40
> > >     [] ip_send_skb+0x14/0x40
> > >     [] udp_send_skb+0x91/0x260
> > >     [] udp_sendmsg+0x2f5/0x950
> > >     [] inet_sendmsg+0x60/0x90
> > >     [] sock_sendmsg+0x33/0x40
> > >     [] SYSC_sendto+0xee/0x160
> > >     [] SyS_sendto+0x9/0x10

Hmm, so we leak when allocating conntrack for outgoing packet.
Do you do any filtering (DROP) in output/postrouting?

> > > (248 bytes being an nf_conn structure)
> > >
> > > Those structures being cleared in gc_worker() later on we can't talk
> > > about unreferenced object so this patch uses kmemleak_not_leak() to
> > > prevent those warnings.
> >
> > If thats the case, why is kmemleak complaining? Are you sure this
> > is a false positive?

Looks like a real bug to me, but I don't see anything obvious so far.
I'll look at this again tomorrow.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] netfilter: Create revision 2 of xt_hashlimit to support higher pps rates

2016-09-22 Thread Vishwanath Pai
Thanks for pointing this out, I will reorder the fields to:

struct hashlimit_cfg2 {
__u64 avg;/* Average secs between packets * scale */
__u64 burst;
__u32 mode; /* bitmask of XT_HASHLIMIT_HASH_* */

This should fix the hole and avoid padding.

-Vishwanath

On 09/22/2016 12:53 PM, Jan Engelhardt wrote:
> On Thursday 2016-09-22 18:43, Vishwanath Pai wrote:
>> >+struct hashlimit_cfg2 {
>> >+   __u32 mode;   /* bitmask of XT_HASHLIMIT_HASH_* */
>> >+   __u64 avg;/* Average secs between packets * scale */
>> >+   __u64 burst;  /* Period multiplier for upper limit. */
> This would have different sizes between i386 and x86_64,
> necessiting additional compat functions. It should be padded
> or reordered instead.
> 

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf] netfilter: nf_tables: Ensure u8 attributes are loaded from u32 within the bounds

2016-09-22 Thread Laura Garcia
On Thu, Sep 22, 2016 at 09:16:07AM -0700, Eric Dumazet wrote:
> On Thu, 2016-09-22 at 16:58 +0200, Pablo Neira Ayuso wrote:
> > attributes")
> > 
> > Always use 12 bytes commit-ids. 4da449a is too short, given the number
> > of changes we're getting in the kernel tree, this may become ambiguous
> > at some point so it won't be unique.
> > 
> > You can achieve this via: git log --oneline --abbrev=12
> 
> and Documentation/SubmittingPatches has these tips :
> 
> 
> You should also be sure to use at least the first twelve characters of the
> SHA-1 ID.  The kernel repository holds a *lot* of objects, making
> collisions with shorter IDs a real possibility.  Bear in mind that, even if
> there is no collision with your six-character ID now, that condition may
> change five years from now.
> 
> If your patch fixes a bug in a specific commit, e.g. you found an issue using
> git-bisect, please use the 'Fixes:' tag with the first 12 characters of the
> SHA-1 ID, and the one line summary.  For example:
> 
> Fixes: e21d2170f366 ("video: remove unnecessary 
> platform_set_drvdata()")
> 
> The following git-config settings can be used to add a pretty format for
> outputting the above style in the git log or git show commands
> 
> [core]
> abbrev = 12
> [pretty]
> fixes = Fixes: %h (\"%s\")
> 
> 

Duly noted, thanks.

In this case, it was not referred to a fix but an extension of the
older patch to other files with the same problem.


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf] netfilter: nf_tables: Ensure u8 attributes are loaded from u32 within the bounds

2016-09-22 Thread Laura Garcia
On Thu, Sep 22, 2016 at 04:58:36PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Sep 14, 2016 at 03:00:02PM +0200, Laura Garcia Liebana wrote:
> > Check storage of u32 netlink attributes in smaller resources. This
> > validation is usually required when the u32 netlink attributes are being
> > stored in a private structure size of u8 in the kernel.
> 
> Applied with changes, no need to resend. If I break anything, just
> follow up on top.
> 
> I have rewritten this description and the documentation on the code a bit.
> 
> More changes:
> 
> > 4da449a ("netfilter: nft_exthdr: Add size check on u8 nft_exthdr
> > attributes")
> 
> Always use 12 bytes commit-ids. 4da449a is too short, given the number
> of changes we're getting in the kernel tree, this may become ambiguous
> at some point so it won't be unique.
> 
> You can achieve this via: git log --oneline --abbrev=12
> 
> More comments below.
> 
> > Suggested-by: Pablo Neira Ayuso 
> > Signed-off-by: Laura Garcia Liebana 
> > ---
> >  include/net/netfilter/nf_tables.h |  2 ++
> >  net/netfilter/nf_tables_api.c | 26 ++
> >  net/netfilter/nft_bitwise.c   | 10 +-
> >  net/netfilter/nft_byteorder.c | 17 +++--
> >  net/netfilter/nft_cmp.c   |  6 +-
> >  net/netfilter/nft_exthdr.c| 19 ---
> >  net/netfilter/nft_immediate.c |  4 
> >  7 files changed, 73 insertions(+), 11 deletions(-)
> > 
> > diff --git a/include/net/netfilter/nf_tables.h 
> > b/include/net/netfilter/nf_tables.h
> > index f2f1339..608130f 100644
> > --- a/include/net/netfilter/nf_tables.h
> > +++ b/include/net/netfilter/nf_tables.h
> > @@ -127,6 +127,8 @@ static inline enum nft_registers nft_type_to_reg(enum 
> > nft_data_types type)
> > return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1 * 
> > NFT_REG_SIZE / NFT_REG32_SIZE;
> >  }
> >  
> > +unsigned int nft_parse_u32_check(const struct nlattr *attr, int maxlen,
> > +u32 *dest);
> 
> I have renamed maxlen to max, so this fits in one line.
> 
> >  unsigned int nft_parse_register(const struct nlattr *attr);
> >  int nft_dump_register(struct sk_buff *skb, unsigned int attr, unsigned int 
> > reg);
> >  
> > diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> > index 7e1c876..d7b000f 100644
> > --- a/net/netfilter/nf_tables_api.c
> > +++ b/net/netfilter/nf_tables_api.c
> > @@ -4343,6 +4343,32 @@ static int nf_tables_check_loops(const struct 
> > nft_ctx *ctx,
> >  }
> >  
> >  /**
> > + * nft_parse_u32_check - parse a u32 value to store the value into a
> > + *   smaller resource.
> > + *
> > + * @attr: netlink attribute
> > + * @maxlen: maximum value to be stored in dest
> > + * @dest: pointer to the resource
> > + *
> > + * Parse and store a given u32 value into a resource. Returns an error
> > + * ERANGE if the value will overload the maxlen, otherwise a 0 will be
> > + * returned and the value is stored into dest.
> 
> I've rewritten this a bit.
> 
> > + */
> > +unsigned int nft_parse_u32_check(const struct nlattr *attr, int maxlen,
> 
> I have rename maxlen to max. Probably maxval would have been better,
> anyway.
> 
> > +u32 *dest)
> > +{
> > +   int reg;
> 
> Variable names are important, they provide context when reading the
> code. Look, from the 'reg' name it seems we're fetching a register.
> However, we are obtaining a value, so I renamed this to val. Try to
> select the right name next time.
> 
> > +
> > +   reg = ntohl(nla_get_be32(attr));
> > +   if (reg > maxlen)
> > +   return -ERANGE;
> > +
> > +   *dest = reg;
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(nft_parse_u32_check);
> > +
> > +/**
> >   * nft_parse_register - parse a register value from a netlink attribute
> >   *
> >   * @attr: netlink attribute
> > diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c
> > index d71cc18..e7f23a1 100644
> > --- a/net/netfilter/nft_bitwise.c
> > +++ b/net/netfilter/nft_bitwise.c
> > @@ -14,6 +14,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -52,6 +53,7 @@ static int nft_bitwise_init(const struct nft_ctx *ctx,
> >  {
> > struct nft_bitwise *priv = nft_expr_priv(expr);
> > struct nft_data_desc d1, d2;
> > +   u32 len;
> > int err;
> >  
> > if (tb[NFTA_BITWISE_SREG] == NULL ||
> > @@ -61,7 +63,13 @@ static int nft_bitwise_init(const struct nft_ctx *ctx,
> > tb[NFTA_BITWISE_XOR] == NULL)
> > return -EINVAL;
> >  
> > -   priv->len  = ntohl(nla_get_be32(tb[NFTA_BITWISE_LEN]));
> > +   err  = nft_parse_u32_check(tb[NFTA_BITWISE_LEN], U8_MAX,
> > +  );
>
> 
> This line break was unnecessary. We can have lines 80-chars long, and
> for some reason you broken the line before the limit. I have repaired
> this.

Re: [PATCH 1/1 linux-next] netfilter: conntrack: fix kmemleak false positive

2016-09-22 Thread Fabian Frederick


> On 21 September 2016 at 23:02 Florian Westphal  wrote:
>
>
> Fabian Frederick  wrote:
> > Since commit f330a7fdbe16
> > ("netfilter: conntrack: get rid of conntrack timer")
> >
> > closed connections remain longer in /proc/net/nf_conntrack
> >
> > Running current kernel; just after boot:
> > cat /proc/net/nf_conntrack | wc -l = 5
> > 4 minutes required to clean up the table.
>
> We should reap the stale entries while iterating, just like
> we do for ctnetlink interface.
>
> Can you try this patch?
>
> diff --git a/net/netfilter/nf_conntrack_standalone.c
> b/net/netfilter/nf_conntrack_standalone.c
> --- a/net/netfilter/nf_conntrack_standalone.c
> +++ b/net/netfilter/nf_conntrack_standalone.c
> @@ -212,6 +212,11 @@ static int ct_seq_show(struct seq_file *s, void *v)
>       if (unlikely(!atomic_inc_not_zero(>ct_general.use)))
>               return 0;
> 
> +     if (nf_ct_should_gc(ct)) {
> +             nf_ct_kill(ct);
> +             goto release;
> +     }
> +
>       /* we only want to print DIR_ORIGINAL */
>       if (NF_CT_DIRECTION(hash))
>               goto release;
>

Hello Florian,

        First problem is solved: table gets cleared 3 minutes earlier
but I still have kmemleak before running the following:

echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
Nothing
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
-> rsyslogd

I talked about false positive because everything is cleared later.

Note that problem appears only in a VM which is really slow due to
ksoftirqd eating lot of CPU for an unknown reason. Maybe you should test
somewhere else before applying.

Regards,
Fabian

> > Going back to kernel version before commit above there are
> > no connections after some seconds.
> >
> > Referring to the commit changelog this was an expected behaviour but
> > it results in temporary kmemleak reports:
>
> I don't see kmemleak complaints on my test vm, I'm reluctant to
> turn it off.
>
> Can you explain why we see such 'false positive'?
>
> The conntracks should still be referenced, as they
> are in main table.
>
> > unreferenced object 0x88003b0e6600 (size 248):
> >   comm "rsyslogd", pid 1595, jiffies 4294741312 (age 7.343s)
> >   ...
> >   backtrace:
> >     [] kmemleak_alloc+0x23/0x40
> >     [] kmem_cache_alloc+0xd9/0x180
> >     [] __nf_conntrack_alloc.isra.50+0x48/0x170
> >     [] nf_conntrack_in+0x3a2/0x5f0
> >     [] ipv4_conntrack_local+0x40/0x50
> >     [] nf_iterate+0x5d/0x70
> >     [] nf_hook_slow+0x5f/0xb0
> >     [] __ip_local_out+0xad/0xe0
> >     [] ip_local_out+0x17/0x40
> >     [] ip_send_skb+0x14/0x40
> >     [] udp_send_skb+0x91/0x260
> >     [] udp_sendmsg+0x2f5/0x950
> >     [] inet_sendmsg+0x60/0x90
> >     [] sock_sendmsg+0x33/0x40
> >     [] SYSC_sendto+0xee/0x160
> >     [] SyS_sendto+0x9/0x10
> >
> > (248 bytes being an nf_conn structure)
> >
> > Those structures being cleared in gc_worker() later on we can't talk
> > about unreferenced object so this patch uses kmemleak_not_leak() to
> > prevent those warnings.
>
> If thats the case, why is kmemleak complaining? Are you sure this
> is a false positive?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] netfilter: Create revision 2 of xt_hashlimit to support higher pps rates

2016-09-22 Thread Jan Engelhardt

On Thursday 2016-09-22 18:43, Vishwanath Pai wrote:
>+struct hashlimit_cfg2 {
>+  __u32 mode;   /* bitmask of XT_HASHLIMIT_HASH_* */
>+  __u64 avg;/* Average secs between packets * scale */
>+  __u64 burst;  /* Period multiplier for upper limit. */

This would have different sizes between i386 and x86_64,
necessiting additional compat functions. It should be padded
or reordered instead.

>+
>+  /* user specified */
>+  __u32 size; /* how many buckets */
>+  __u32 max;  /* max number of entries */
>+  __u32 gc_interval;  /* gc interval */
>+  __u32 expire;   /* when do entries expire? */
>+
>+  __u8 srcmask, dstmask;
>+};
>+
>+struct xt_hashlimit_mtinfo2 {
>+  char name[NAME_MAX];
>+  struct hashlimit_cfg2 cfg;
>+
>+  /* Used internally by the kernel */
>+  struct xt_hashlimit_htable *hinfo __attribute__((aligned(8)));
>+};

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] netfilter: Create revision 2 of xt_hashlimit to support higher pps rates

2016-09-22 Thread Vishwanath Pai
V2:
Removed the call to BUG() in cfg_copy, we return -EINVAL
instead and all calls to cfg_copy check for this

V3:
change "revision" in the call to cfg_copy inside htable_create to 2
previously this would pass down revision from the function parameter
this is wrong since *cfg here is always hashlimit_cfg2

There is no change in userspace patch so V2 will work

--

netfilter: Create revision 2 of xt_hashlimit to support higher pps rates

Create a new revision for the hashlimit iptables extension module. Rev 2
will support higher pps of upto 1 million, Version 1 supports only 10k.

To support this we have to increase the size of the variables avg and
burst in hashlimit_cfg to 64-bit. Create two new structs hashlimit_cfg2
and xt_hashlimit_mtinfo2 and also create newer versions of all the
functions for match, checkentry and destroy.

Some of the functions like hashlimit_mt, hashlimit_mt_check etc are very
similar in both rev1 and rev2 with only minor changes, so I have split
those functions and moved all the common code to a *_common function.

Signed-off-by: Vishwanath Pai 
Signed-off-by: Joshua Hunt 
---
 include/uapi/linux/netfilter/xt_hashlimit.h |  23 ++
 net/netfilter/xt_hashlimit.c| 330 ++--
 2 files changed, 285 insertions(+), 68 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_hashlimit.h 
b/include/uapi/linux/netfilter/xt_hashlimit.h
index ea8c1c0..be5d2e1 100644
--- a/include/uapi/linux/netfilter/xt_hashlimit.h
+++ b/include/uapi/linux/netfilter/xt_hashlimit.h
@@ -6,6 +6,7 @@
 
 /* timings are in milliseconds. */
 #define XT_HASHLIMIT_SCALE_v1 1
+#define XT_HASHLIMIT_SCALE 100llu
 /* 1/10,000 sec period => max of 10,000/sec.  Min rate is then 429490
  * seconds, or one packet every 59 hours.
  */
@@ -63,6 +64,20 @@ struct hashlimit_cfg1 {
__u8 srcmask, dstmask;
 };
 
+struct hashlimit_cfg2 {
+   __u32 mode;   /* bitmask of XT_HASHLIMIT_HASH_* */
+   __u64 avg;/* Average secs between packets * scale */
+   __u64 burst;  /* Period multiplier for upper limit. */
+
+   /* user specified */
+   __u32 size; /* how many buckets */
+   __u32 max;  /* max number of entries */
+   __u32 gc_interval;  /* gc interval */
+   __u32 expire;   /* when do entries expire? */
+
+   __u8 srcmask, dstmask;
+};
+
 struct xt_hashlimit_mtinfo1 {
char name[IFNAMSIZ];
struct hashlimit_cfg1 cfg;
@@ -71,4 +86,12 @@ struct xt_hashlimit_mtinfo1 {
struct xt_hashlimit_htable *hinfo __attribute__((aligned(8)));
 };
 
+struct xt_hashlimit_mtinfo2 {
+   char name[NAME_MAX];
+   struct hashlimit_cfg2 cfg;
+
+   /* Used internally by the kernel */
+   struct xt_hashlimit_htable *hinfo __attribute__((aligned(8)));
+};
+
 #endif /* _UAPI_XT_HASHLIMIT_H */
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 6b0ad93..64579f3 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -57,6 +57,7 @@ static inline struct hashlimit_net *hashlimit_pernet(struct 
net *net)
 
 /* need to declare this at the top */
 static const struct file_operations dl_file_ops_v1;
+static const struct file_operations dl_file_ops;
 
 /* hash table crap */
 struct dsthash_dst {
@@ -86,8 +87,8 @@ struct dsthash_ent {
unsigned long expires;  /* precalculated expiry time */
struct {
unsigned long prev; /* last modification */
-   u_int32_t credit;
-   u_int32_t credit_cap, cost;
+   u_int64_t credit;
+   u_int64_t credit_cap, cost;
} rateinfo;
struct rcu_head rcu;
 };
@@ -98,7 +99,7 @@ struct xt_hashlimit_htable {
u_int8_t family;
bool rnd_initialized;
 
-   struct hashlimit_cfg1 cfg;  /* config */
+   struct hashlimit_cfg2 cfg;  /* config */
 
/* used internally */
spinlock_t lock;/* lock for list_head */
@@ -114,6 +115,30 @@ struct xt_hashlimit_htable {
struct hlist_head hash[0];  /* hashtable itself */
 };
 
+static int
+cfg_copy(struct hashlimit_cfg2 *to, void *from, int revision)
+{
+   if (revision == 1) {
+   struct hashlimit_cfg1 *cfg = (struct hashlimit_cfg1 *)from;
+
+   to->mode = cfg->mode;
+   to->avg = cfg->avg;
+   to->burst = cfg->burst;
+   to->size = cfg->size;
+   to->max = cfg->max;
+   to->gc_interval = cfg->gc_interval;
+   to->expire = cfg->expire;
+   to->srcmask = cfg->srcmask;
+   to->dstmask = cfg->dstmask;
+   } else if (revision == 2) {
+   memcpy(to, from, sizeof(struct hashlimit_cfg2));
+   } else {
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static DEFINE_MUTEX(hashlimit_mutex);  /* protects htables list */
 static struct 

[PATCH v3 1/2] netfilter: Prepare xt_hashlimit.c for revision 2

2016-09-22 Thread Vishwanath Pai
I am planning to add a revision 2 for the hashlimit xtables module to
support higher packets per second rates. This patch renames all the
functions and variables related to revision 1 by adding _v1 at the
end of the names.

Signed-off-by: Vishwanath Pai 
Signed-off-by: Joshua Hunt 
---
 include/uapi/linux/netfilter/xt_hashlimit.h |  2 +-
 net/netfilter/xt_hashlimit.c| 61 +++--
 2 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_hashlimit.h 
b/include/uapi/linux/netfilter/xt_hashlimit.h
index 6db9037..ea8c1c0 100644
--- a/include/uapi/linux/netfilter/xt_hashlimit.h
+++ b/include/uapi/linux/netfilter/xt_hashlimit.h
@@ -5,7 +5,7 @@
 #include 
 
 /* timings are in milliseconds. */
-#define XT_HASHLIMIT_SCALE 1
+#define XT_HASHLIMIT_SCALE_v1 1
 /* 1/10,000 sec period => max of 10,000/sec.  Min rate is then 429490
  * seconds, or one packet every 59 hours.
  */
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 1786968..6b0ad93 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -56,7 +56,7 @@ static inline struct hashlimit_net *hashlimit_pernet(struct 
net *net)
 }
 
 /* need to declare this at the top */
-static const struct file_operations dl_file_ops;
+static const struct file_operations dl_file_ops_v1;
 
 /* hash table crap */
 struct dsthash_dst {
@@ -215,8 +215,8 @@ dsthash_free(struct xt_hashlimit_htable *ht, struct 
dsthash_ent *ent)
 }
 static void htable_gc(struct work_struct *work);
 
-static int htable_create(struct net *net, struct xt_hashlimit_mtinfo1 *minfo,
-u_int8_t family)
+static int htable_create_v1(struct net *net, struct xt_hashlimit_mtinfo1 
*minfo,
+   u_int8_t family)
 {
struct hashlimit_net *hashlimit_net = hashlimit_pernet(net);
struct xt_hashlimit_htable *hinfo;
@@ -265,7 +265,7 @@ static int htable_create(struct net *net, struct 
xt_hashlimit_mtinfo1 *minfo,
hinfo->pde = proc_create_data(minfo->name, 0,
(family == NFPROTO_IPV4) ?
hashlimit_net->ipt_hashlimit : hashlimit_net->ip6t_hashlimit,
-   _file_ops, hinfo);
+   _file_ops_v1, hinfo);
if (hinfo->pde == NULL) {
kfree(hinfo->name);
vfree(hinfo);
@@ -398,7 +398,7 @@ static void htable_put(struct xt_hashlimit_htable *hinfo)
(slowest userspace tool allows), which means
CREDITS_PER_JIFFY*HZ*60*60*24 < 2^32 ie.
 */
-#define MAX_CPJ (0x / (HZ*60*60*24))
+#define MAX_CPJ_v1 (0x / (HZ*60*60*24))
 
 /* Repeated shift and or gives us all 1s, final shift and add 1 gives
  * us the power of 2 below the theoretical max, so GCC simply does a
@@ -410,7 +410,7 @@ static void htable_put(struct xt_hashlimit_htable *hinfo)
 #define _POW2_BELOW32(x) (_POW2_BELOW16(x)|_POW2_BELOW16((x)>>16))
 #define POW2_BELOW32(x) ((_POW2_BELOW32(x)>>1) + 1)
 
-#define CREDITS_PER_JIFFY POW2_BELOW32(MAX_CPJ)
+#define CREDITS_PER_JIFFY_v1 POW2_BELOW32(MAX_CPJ_v1)
 
 /* in byte mode, the lowest possible rate is one packet/second.
  * credit_cap is used as a counter that tells us how many times we can
@@ -428,11 +428,12 @@ static u32 xt_hashlimit_len_to_chunks(u32 len)
 static u32 user2credits(u32 user)
 {
/* If multiplying would overflow... */
-   if (user > 0x / (HZ*CREDITS_PER_JIFFY))
+   if (user > 0x / (HZ*CREDITS_PER_JIFFY_v1))
/* Divide first. */
-   return (user / XT_HASHLIMIT_SCALE) * HZ * CREDITS_PER_JIFFY;
+   return (user / XT_HASHLIMIT_SCALE_v1) *\
+   HZ * CREDITS_PER_JIFFY_v1;
 
-   return (user * HZ * CREDITS_PER_JIFFY) / XT_HASHLIMIT_SCALE;
+   return (user * HZ * CREDITS_PER_JIFFY_v1) / XT_HASHLIMIT_SCALE_v1;
 }
 
 static u32 user2credits_byte(u32 user)
@@ -461,7 +462,7 @@ static void rateinfo_recalc(struct dsthash_ent *dh, 
unsigned long now, u32 mode)
return;
}
} else {
-   dh->rateinfo.credit += delta * CREDITS_PER_JIFFY;
+   dh->rateinfo.credit += delta * CREDITS_PER_JIFFY_v1;
cap = dh->rateinfo.credit_cap;
}
if (dh->rateinfo.credit > cap)
@@ -603,7 +604,7 @@ static u32 hashlimit_byte_cost(unsigned int len, struct 
dsthash_ent *dh)
 }
 
 static bool
-hashlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
+hashlimit_mt_v1(const struct sk_buff *skb, struct xt_action_param *par)
 {
const struct xt_hashlimit_mtinfo1 *info = par->matchinfo;
struct xt_hashlimit_htable *hinfo = info->hinfo;
@@ -660,7 +661,7 @@ hashlimit_mt(const struct sk_buff *skb, struct 
xt_action_param *par)
return false;
 }
 
-static int hashlimit_mt_check(const struct xt_mtchk_param *par)
+static int hashlimit_mt_check_v1(const struct xt_mtchk_param *par)
 

Re: [PATCH nf] netfilter: nf_tables: Ensure u8 attributes are loaded from u32 within the bounds

2016-09-22 Thread Eric Dumazet
On Thu, 2016-09-22 at 16:58 +0200, Pablo Neira Ayuso wrote:
> attributes")
> 
> Always use 12 bytes commit-ids. 4da449a is too short, given the number
> of changes we're getting in the kernel tree, this may become ambiguous
> at some point so it won't be unique.
> 
> You can achieve this via: git log --oneline --abbrev=12

and Documentation/SubmittingPatches has these tips :


You should also be sure to use at least the first twelve characters of the
SHA-1 ID.  The kernel repository holds a *lot* of objects, making
collisions with shorter IDs a real possibility.  Bear in mind that, even if
there is no collision with your six-character ID now, that condition may
change five years from now.

If your patch fixes a bug in a specific commit, e.g. you found an issue using
git-bisect, please use the 'Fixes:' tag with the first 12 characters of the
SHA-1 ID, and the one line summary.  For example:

Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")

The following git-config settings can be used to add a pretty format for
outputting the above style in the git log or git show commands

[core]
abbrev = 12
[pretty]
fixes = Fixes: %h (\"%s\")


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf-next] netfilter: xt_helper: Use sizeof(variable) instead of literal number

2016-09-22 Thread Pablo Neira Ayuso
On Tue, Sep 20, 2016 at 10:31:04AM +0800, f...@ikuai8.com wrote:
> From: Gao Feng 
> 
> It's better to use sizeof(info->name)-1 as index to force set the string
> tail instead of literal number '29'.

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf-next] netfilter: nf_tables: check tprot_set first when we use xt.thoff

2016-09-22 Thread Pablo Neira Ayuso
On Sat, Sep 17, 2016 at 02:31:20PM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> pkt->xt.thoff is not always set properly, but we use it without any check.
> For payload expr, it will cause wrong results. For nftrace, we may notify
> the wrong network or transport header to the user space, furthermore,
> input the following nft rules, warning message will be printed out:
>   # nft add rule arp filter output meta nftrace set 1
> 
>   WARNING: CPU: 0 PID: 13428 at net/netfilter/nf_tables_trace.c:263
>   nft_trace_notify+0x4a3/0x5e0 [nf_tables]
>   Call Trace:
>   [] dump_stack+0x63/0x85
>   [] __warn+0xcb/0xf0
>   [] warn_slowpath_null+0x1d/0x20
>   [] nft_trace_notify+0x4a3/0x5e0 [nf_tables]
>   [ ... ]
>   [] nft_do_chain_arp+0x78/0x90 [nf_tables_arp]
>   [] nf_iterate+0x62/0x80
>   [] nf_hook_slow+0x73/0xd0
>   [] arp_xmit+0x8f/0xb0
>   [ ... ]
>   [] arp_solicit+0x106/0x2c0
> 
> So before we use pkt->xt.thoff, check the tprot_set first.

Applied, thanks a lot.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf-next] netfilter: nf_queue: improve queue range support for bridge family

2016-09-22 Thread Pablo Neira Ayuso
On Thu, Sep 15, 2016 at 08:50:16PM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> After commit ac2863445686 ("netfilter: bridge: add nf_afinfo to enable
> queuing to userspace"), we can queue packets to the user space in bridge
> family. But when the user specify the queue range, packets will be only
> delivered to the first queue num. Because in nfqueue_hash, we only support
> ipv4 and ipv6 family. Now add support for bridge family too.

Applied, thanks for finishing this!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nft] tests: py: add more test cases for queue expr

2016-09-22 Thread Pablo Neira Ayuso
On Thu, Sep 15, 2016 at 12:02:09AM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> It's necessary to cover more test cases, for example, large queue
> range 1-65535, error queue number 65536.
> 
> Also add a space before tailing square brackets, this is updated to
> keep consistent with other expr.

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf-next V2] netfilter: nft_queue: add _SREG_QNUM attr to select the queue number

2016-09-22 Thread Pablo Neira Ayuso
On Wed, Sep 14, 2016 at 11:41:46PM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> Currently, the user can specify the queue numbers by _QUEUE_NUM and
> _QUEUE_TOTAL attributes, this is enough in most situations.
> 
> But acctually, it is not very flexible, for example:
>   tcp dport 80 mapped to queue0
>   tcp dport 81 mapped to queue1
>   tcp dport 82 mapped to queue2
> In order to do this thing, we must add 3 nft rules, and more
> mapping meant more rules ...
> 
> So take one register to select the queue number, then we can add one
> simple rule to mapping queues, maybe like this:
>   queue num tcp dport map { 80:0, 81:1, 82:2 ... }
> 
> Florian Westphal also proposed wider usage scenarios:
>   queue num jhash ip saddr . ip daddr mod ...
>   queue num meta cpu ...
>   queue num meta mark ...
> 
> The last point is how to load a queue number from sreg, although we can
> use *(u16*)>data[reg] to load the queue number, just like nat expr
> to load its l4port do.
> 
> But we will cooperate with hash expr, meta cpu, meta mark expr and so on.
> They all store the result to u32 type, so cast it to u16 pointer and
> dereference it will generate wrong result in the big endian system.
> 
> So just keep it simple, we treat queue number as u32 type, although u16
> type is already enough.

Applied, thanks Liping.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf] netfilter: nf_tables: Ensure u8 attributes are loaded from u32 within the bounds

2016-09-22 Thread Pablo Neira Ayuso
On Thu, Sep 22, 2016 at 04:58:36PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Sep 14, 2016 at 03:00:02PM +0200, Laura Garcia Liebana wrote:
> > Check storage of u32 netlink attributes in smaller resources. This
> > validation is usually required when the u32 netlink attributes are being
> > stored in a private structure size of u8 in the kernel.
> 
> Applied with changes, no need to resend. If I break anything, just
> follow up on top.

And for the record:

netfilter: nf_tables: validate maximum value of u32 netlink attributes

I have renamed the titled to this, slightly shorter.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nft] src: support ct l3proto/protocol without direction syntax

2016-09-22 Thread Liping Zhang
From: Liping Zhang 

Acctually, ct l3proto and ct protocol are unrelated to direction, so
it's unnecessary that we must specify dir if we want to use them.

Now add support that we can match ct l3proto/protocol without direction:
  # nft add rule filter input ct l3proto ipv4
  # nft add rule filter output ct protocol 17

Note: existing syntax is still preserved, so "ct reply l3proto ipv6"
is still fine.

Signed-off-by: Liping Zhang 
---
 src/parser_bison.y   | 2 ++
 tests/py/ip/ct.t | 8 
 tests/py/ip/ct.t.payload | 8 
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/parser_bison.y b/src/parser_bison.y
index aac10dc..36dbc8d 100644
--- a/src/parser_bison.y
+++ b/src/parser_bison.y
@@ -2513,6 +2513,8 @@ ct_key:   STATE   { $$ = 
NFT_CT_STATE; }
|   EXPIRATION  { $$ = NFT_CT_EXPIRATION; }
|   HELPER  { $$ = NFT_CT_HELPER; }
|   LABEL   { $$ = NFT_CT_LABELS; }
+   |   L3PROTOCOL  { $$ = NFT_CT_L3PROTOCOL; }
+   |   PROTOCOL{ $$ = NFT_CT_PROTOCOL; }
|   ct_key_counters
;
 ct_key_dir :   SADDR   { $$ = NFT_CT_SRC; }
diff --git a/tests/py/ip/ct.t b/tests/py/ip/ct.t
index 65f5d92..d0f16c5 100644
--- a/tests/py/ip/ct.t
+++ b/tests/py/ip/ct.t
@@ -13,11 +13,11 @@ ct reply saddr 192.168.1.0/24;ok
 ct original daddr 192.168.1.0/24;ok
 ct reply daddr 192.168.1.0/24;ok
 
-ct original l3proto ipv4;ok
-ct reply l3proto foobar;fail
+ct l3proto ipv4;ok
+ct l3proto foobar;fail
 
-ct original protocol 6 ct original proto-dst 22;ok
-ct original protocol 17 ct reply proto-src 53;ok
+ct protocol 6 ct original proto-dst 22;ok
+ct original protocol 17 ct reply proto-src 53;ok;ct protocol 17 ct reply 
proto-src 53
 
 # wrong address family
 ct reply daddr dead::beef;fail
diff --git a/tests/py/ip/ct.t.payload b/tests/py/ip/ct.t.payload
index 0449b07..56633a2 100644
--- a/tests/py/ip/ct.t.payload
+++ b/tests/py/ip/ct.t.payload
@@ -42,14 +42,14 @@ ip test-ip4 output
   [ bitwise reg 1 = (reg=1 & 0x00ff ) ^ 0x ]
   [ cmp eq reg 1 0x0001a8c0 ]
 
-# ct original l3proto ipv4
+# ct l3proto ipv4
 ip test-ip4 output
-  [ ct load l3protocol => reg 1 , dir original ]
+  [ ct load l3protocol => reg 1 ]
   [ cmp eq reg 1 0x0002 ]
 
-# ct original protocol 6 ct original proto-dst 22
+# ct protocol 6 ct original proto-dst 22
 ip test-ip4 output
-  [ ct load protocol => reg 1 , dir original ]
+  [ ct load protocol => reg 1 ]
   [ cmp eq reg 1 0x0006 ]
   [ ct load proto_dst => reg 1 , dir original ]
   [ cmp eq reg 1 0x1600 ]
-- 
2.5.5


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 libnftnl] expr: numgen: add number generation offset

2016-09-22 Thread Pablo Neira Ayuso
On Tue, Sep 13, 2016 at 01:50:41PM +0200, Laura Garcia Liebana wrote:
> Add support to pass through an offset value to the counter
> initialization. With this feature, the sysadmin is able to apply a value
> to be added to the generated number.
> 
> Example:
> 
>   meta mark set numgen inc mod 2 offset 100
> 
> This will generate marks with series 100, 101, 100, 101, ...

Also applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH conntrack-tools] Link nfct and helper modules with `-z lazy`

2016-09-22 Thread Pablo Neira Ayuso
On Sun, Sep 11, 2016 at 01:54:19PM -0700, Kevin Cernekee wrote:
> Some distributions, such as Gentoo and Chrome OS, try to link all
> programs with `-z now` as a security hardening measure.  This breaks
> nfct, because nfct cannot satisfy all of the helper modules' symbols.
> Therefore nfct implicitly depends on lazy binding.
> 
> Have autoconf probe the linker to see if `-z lazy` works, and if so,
> use it to link nfct and the helpers.
> 
> conntrackd itself is unaffected, and should still work with `-z now`.

Applied, thanks for your patience.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf v5] netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack

2016-09-22 Thread fgao
From: Gao Feng 

It is valid that the TCP RST packet which does not set ack flag, and bytes
of ack number are zero. But current seqadj codes would adjust the "0" ack
to invalid ack number. Actually seqadj need to check the ack flag before
adjust it for these RST packets.

The following is my test case

client is 10.26.98.245, and add one iptable rule:
iptables  -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2:
--connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with
tcp-reset
This iptables rule could generate on TCP RST without ack flag.

server:10.172.135.55
Enable the synproxy with seqadjust by the following iptables rules
iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345
-m tcp --syn -j CT --notrack

iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack
--ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7
--mss 1460
iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack
--ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT

The following is my test result.

1. packet trace on client
root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829,
win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7],
length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266,
ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884,
nop,wscale 7], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229,
options [nop,nop,TS val 452367885 ecr 15643479], length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226,
options [nop,nop,TS val 15643479 ecr 452367885], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830,
win 0, length 0

2. seqadj log on server
[62873.867319] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.867644] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.869040] Adjusting sequence number from 3695959830->3695959830,
ack from 0->55618628

To summarize, it is clear that the seqadj codes adjust the 0 ack when receive
one TCP RST packet without ack.

Signed-off-by: Gao Feng 
---
 v5: Use goto to decrease the patch size
 v4: Don't invoke nf_ct_sack_adjust when no ack flag
 v3: Add the reproduce steps and packet trace
 v2: Regenerate because the first patch is removed
 v1: Initial patch

 net/netfilter/nf_conntrack_seqadj.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_conntrack_seqadj.c 
b/net/netfilter/nf_conntrack_seqadj.c
index dff0f0c..08d0640 100644
--- a/net/netfilter/nf_conntrack_seqadj.c
+++ b/net/netfilter/nf_conntrack_seqadj.c
@@ -169,7 +169,7 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
s32 seqoff, ackoff;
struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
struct nf_ct_seqadj *this_way, *other_way;
-   int res;
+   int res = 1;
 
this_way  = >seq[dir];
other_way = >seq[!dir];
@@ -184,27 +184,31 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
else
seqoff = this_way->offset_before;
 
+   newseq = htonl(ntohl(tcph->seq) + seqoff);
+   inet_proto_csum_replace4(>check, skb, tcph->seq, newseq, false);
+   pr_debug("Adjusting sequence number from %u->%u\n",
+ntohl(tcph->seq), ntohl(newseq));
+   tcph->seq = newseq;
+
+   if (unlikely(!tcph->ack))
+   goto out;
+
if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
  other_way->correction_pos))
ackoff = other_way->offset_after;
else
ackoff = other_way->offset_before;
 
-   newseq = htonl(ntohl(tcph->seq) + seqoff);
newack = htonl(ntohl(tcph->ack_seq) - ackoff);
-
-   inet_proto_csum_replace4(>check, skb, tcph->seq, newseq, false);
inet_proto_csum_replace4(>check, skb, tcph->ack_seq, newack,
 false);
-
-   pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
+   pr_debug("Adjusting ack number from %u->%u, ack from %u->%u\n",
 ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
 ntohl(newack));
-
-   tcph->seq = newseq;
tcph->ack_seq = newack;
 
res = nf_ct_sack_adjust(skb, protoff, tcph, ct, ctinfo);
+out:
spin_unlock_bh(>lock);
 
return res;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf v4] netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack

2016-09-22 Thread fgao
From: Gao Feng 

It is valid that the TCP RST packet which does not set ack flag, and bytes
of ack number are zero. But current seqadj codes would adjust the "0" ack
to invalid ack number. Actually seqadj need to check the ack flag before
adjust it for these RST packets.

The following is my test case

client is 10.26.98.245, and add one iptable rule:
iptables  -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2:
--connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with
tcp-reset
This iptables rule could generate on TCP RST without ack flag.

server:10.172.135.55
Enable the synproxy with seqadjust by the following iptables rules
iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345
-m tcp --syn -j CT --notrack

iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack
--ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7
--mss 1460
iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack
--ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT

The following is my test result.

1. packet trace on client
root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829,
win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7],
length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266,
ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884,
nop,wscale 7], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229,
options [nop,nop,TS val 452367885 ecr 15643479], length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226,
options [nop,nop,TS val 15643479 ecr 452367885], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830,
win 0, length 0

2. seqadj log on server
[62873.867319] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.867644] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.869040] Adjusting sequence number from 3695959830->3695959830,
ack from 0->55618628

To summarize, it is clear that the seqadj codes adjust the 0 ack when receive
one TCP RST packet without ack.

Signed-off-by: Gao Feng 
---
 v4: Don't invoke nf_ct_sack_adjust when no ack flag
 v3: Add the reproduce steps and packet trace
 v2: Regenerate because the first patch is removed
 v1: Initial patch

 net/netfilter/nf_conntrack_seqadj.c | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/nf_conntrack_seqadj.c 
b/net/netfilter/nf_conntrack_seqadj.c
index dff0f0c..80ab429 100644
--- a/net/netfilter/nf_conntrack_seqadj.c
+++ b/net/netfilter/nf_conntrack_seqadj.c
@@ -169,7 +169,7 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
s32 seqoff, ackoff;
struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
struct nf_ct_seqadj *this_way, *other_way;
-   int res;
+   int res = 1;
 
this_way  = >seq[dir];
other_way = >seq[!dir];
@@ -184,27 +184,30 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
else
seqoff = this_way->offset_before;
 
-   if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
- other_way->correction_pos))
-   ackoff = other_way->offset_after;
-   else
-   ackoff = other_way->offset_before;
-
newseq = htonl(ntohl(tcph->seq) + seqoff);
-   newack = htonl(ntohl(tcph->ack_seq) - ackoff);
-
inet_proto_csum_replace4(>check, skb, tcph->seq, newseq, false);
-   inet_proto_csum_replace4(>check, skb, tcph->ack_seq, newack,
-false);
+   pr_debug("Adjusting sequence number from %u->%u\n",
+ntohl(tcph->seq), ntohl(newseq));
+   tcph->seq = newseq;
 
-   pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
-ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
-ntohl(newack));
+   if (likely(tcph->ack)) {
+   if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
+ other_way->correction_pos))
+   ackoff = other_way->offset_after;
+   else
+   ackoff = other_way->offset_before;
 
-   tcph->seq = newseq;
-   tcph->ack_seq = newack;
+   newack = htonl(ntohl(tcph->ack_seq) - ackoff);
+   inet_proto_csum_replace4(>check, skb, tcph->ack_seq,
+newack, false);
+   pr_debug("Adjusting ack number from %u->%u, ack from %u->%u\n",
+ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),