Re: [PATCH v3] datapath: Avoid using stack larger than 1024.
Thanks for your work. I sent v3 for net-next and added the "Acked-by". If it is
applied, I will backport it to d...@openvswitch.org.

On Fri, Jun 30, 2017 at 1:45 AM, Pravin Shelar wrote:
> On Wed, Jun 28, 2017 at 6:38 PM, Tonghao Zhang wrote:
>> When compiling OvS-master on the 4.4.0-81 kernel, there is a warning:
>>
>>     CC [M]  /root/ovs/datapath/linux/datapath.o
>>     /root/ovs/datapath/linux/datapath.c: In function 'ovs_flow_cmd_set':
>>     /root/ovs/datapath/linux/datapath.c:1221:1: warning: the frame size
>>     of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>
>> This patch factors out the match-init and action-copy code to avoid the
>> "-Wframe-larger-than=1024" warning. Because the mask is only used to get
>> the actions, we add a new function for that step to save some stack space.
>>
>> Signed-off-by: Tonghao Zhang
>> ---
> Looks good, but you have to submit the patch against net-next for the
> upstream OVS module first.
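The fix pattern discussed here, confining a large temporary to a helper so its stack usage is charged to a short-lived frame, can be sketched in plain C. All names and the 512-byte size below are illustrative, not the actual OVS datapath code:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a large on-stack object such as a flow mask
 * (512 bytes is illustrative, not OVS's real struct size). */
struct big_mask {
    unsigned char bytes[512];
};

/* noinline (GCC/Clang) keeps this helper's frame separate from the
 * caller's, so the big_mask occupies stack only during this call. */
__attribute__((noinline))
static int get_flow_actions(int key, int *actions)
{
    struct big_mask mask;             /* large temporary, confined here */

    memset(&mask, key, sizeof(mask));
    *actions = mask.bytes[0] + 1;     /* "derive" the actions from the mask */
    return 0;
}

/* The caller's frame now holds only small locals, instead of the
 * match + mask + actions that pushed the original function past
 * the -Wframe-larger-than=1024 limit. */
static int flow_cmd_set(int key)
{
    int actions = 0;

    if (get_flow_actions(key, &actions))
        return -1;
    return actions;
}
```

With the mask confined to get_flow_actions(), its bytes no longer count against flow_cmd_set()'s frame, which is the effect the patch relies on.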
Re: [PATCH RFC 0/2] kproxy: Kernel Proxy
On Thu, Jun 29, 2017 at 04:43:28PM -0700, Tom Herbert wrote:
> On Thu, Jun 29, 2017 at 1:58 PM, Willy Tarreau wrote:
> > On Thu, Jun 29, 2017 at 01:40:26PM -0700, Tom Herbert wrote:
> >> > In fact that's not much what I observe in the field. In practice,
> >> > large data streams are cheaply relayed using splice(); I could
> >> > achieve 60 Gbps of HTTP forwarding via HAProxy on a 4-core Xeon
> >> > 2 years ago. And when you use SSL, the cost of the copy to/from
> >> > the kernel is small compared to all the crypto operations
> >> > surrounding it.
> >>
> >> Right, getting rid of the extra crypto operations and so-called "SSL
> >> inspection" is the ultimate goal this is going towards.
> >
> > Yep, but in order to take decisions at L7 you need to decapsulate SSL.
>
> Decapsulate or decrypt? There's a big difference... :-) I am aiming to
> just have to decapsulate.

Sorry, but what distinction do you make? For me "decapsulate" means
"extract the next-level layer", and for SSL that means you need to decrypt.

> >> Performance is relevant because we potentially want security applied
> >> to every message in every communication in a containerized data
> >> center. Putting a userspace hop in the datapath of every packet is
> >> known to be problematic, not just for the performance hit but also
> >> because it increases the attack surface on users' privacy.
> >
> > While I totally agree on the performance hit when inspecting each
> > packet, I fail to see the relation with users' privacy. In fact under
> > some circumstances it can even be the opposite. For example, using
> > something like kTLS for a TCP/HTTP proxy can result in cleartext being
> > observable in strace, while it's not visible when TLS is terminated in
> > userland because all you see are openssl's read()/write() operations.
> > Maybe you have specific attacks in mind?
>
> No, just the normal problem of making yet one more tool systematically
> have access to user data.

OK.
> >> > Regarding kernel-side protocol parsing, there's an unfortunate
> >> > trend of moving more and more protocols to userland due to these
> >> > protocols evolving very quickly. At least you'll want to find a way
> >> > to provide these parsers from userspace, which will inevitably come
> >> > with its own set of problems or limitations :-/
> >>
> >> That's why everything is going BPF now ;-)
> >
> > Yes, I knew you were going to suggest this :-) I'm still prudent about
> > it, to be honest. I don't think it would be that easy to implement an
> > HPACK encoder/decoder using BPF. And even regarding just plain HTTP
> > parsing, certain very small operations in haproxy's parser can quickly
> > result in a 10% performance degradation when improperly optimized
> > (i.e. changing a "likely", altering branch prediction, or cache walk
> > patterns when using arrays to evaluate character classes faster). But
> > for general usage I indeed think it should be OK.
>
> HTTP might qualify as a special case, and I believe there's already
> been some work to put HTTP in the kernel by Alexander Krizhanovsky and
> others. In this case maybe an HTTP parser could be a front end before BPF.

It could indeed be an option. We've seen this with Tux in the past.

> Although, it's pretty clear we'll need regex in BPF if we want to use
> it with HTTP.

I think so as well. And some loop-like operations (foreach or stuff like
this, so that they remain bounded).

Willy
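The splice()-based forwarding Willy describes can be sketched as a minimal Linux-only toy: the payload moves fd to pipe to fd inside the kernel and never transits a userspace buffer. Two temporary files stand in for sockets here; this is an assumed illustration, not HAProxy's actual code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd through a pipe using
 * splice(2), so the payload is never copied into user space. */
static ssize_t relay(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;
    while (len > 0) {
        ssize_t in = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (in <= 0)
            break;                              /* EOF or error */
        ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in,
                             SPLICE_F_MOVE);
        if (out < 0) {
            total = -1;
            break;
        }
        total += out;
        len -= (size_t)in;
    }
    close(p[0]);
    close(p[1]);
    return total;
}

/* Demo: send a message between two temp files through the relay and
 * verify it arrived intact. Returns 1 on success. */
static int demo(void)
{
    const char msg[] = "spliced without a userspace copy";
    char back[sizeof(msg)] = {0};
    FILE *src = tmpfile(), *dst = tmpfile();

    if (!src || !dst)
        return 0;
    fwrite(msg, 1, sizeof(msg), src);
    rewind(src);                  /* flush and reposition before splicing */
    if (relay(fileno(src), fileno(dst), sizeof(msg)) != (ssize_t)sizeof(msg))
        return 0;
    rewind(dst);
    if (fread(back, 1, sizeof(back), dst) != sizeof(msg))
        return 0;
    fclose(src);
    fclose(dst);
    return memcmp(back, msg, sizeof(msg)) == 0;
}
```

splice(2) requires one end of each transfer to be a pipe, hence the intermediate pipe; with sockets on both sides the same loop gives the zero-copy relay behaviour discussed above.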
Re: [PATCH net-next] cxgb4: Add PTP Hardware Clock (PHC) support (fwd)
The complete context isn't shown, but it seems likely that there is a
goto out_free under line 1207, with no unlock on adap->ptp_lock.

julia

-- Forwarded message --
Date: Fri, 30 Jun 2017 11:54:23 +0800
From: kbuild test robot <fengguang...@intel.com>
To: kbu...@01.org
Cc: Julia Lawall <julia.law...@lip6.fr>
Subject: Re: [PATCH net-next] cxgb4: Add PTP Hardware Clock (PHC) support

Hi Atul,

[auto build test WARNING on net-next/master]

url: https://github.com/0day-ci/linux/commits/Atul-Gupta/cxgb4-Add-PTP-Hardware-Clock-PHC-support/20170629-200758
:: branch date: 16 hours ago
:: commit date: 16 hours ago

>> drivers/net/ethernet/chelsio/cxgb4/sge.c:1190:2-8: preceding lock on line 1204

git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 159226c60ceb77858018f6c31d17a575b3679b8a
vim +1190 drivers/net/ethernet/chelsio/cxgb4/sge.c

fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1184  /*
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1185   * The chip min packet length is 10 octets but play safe and reject
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1186   * anything shorter than an Ethernet header.
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1187   */
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1188      if (unlikely(skb->len < ETH_HLEN)) {
a7525198a drivers/net/ethernet/chelsio/cxgb4/sge.c Eric W. Biederman    2014-03-15  1189  out_free:   dev_kfree_skb_any(skb);
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01 @1190          return NETDEV_TX_OK;
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1191      }
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1192
637d3e997 drivers/net/ethernet/chelsio/cxgb4/sge.c Hariprasad Shenai    2015-05-05  1193      /* Discard the packet if the length is greater than mtu */
637d3e997 drivers/net/ethernet/chelsio/cxgb4/sge.c Hariprasad Shenai    2015-05-05  1194      max_pkt_len = ETH_HLEN + dev->mtu;
8d09e6b8b drivers/net/ethernet/chelsio/cxgb4/sge.c Hariprasad Shenai    2016-07-28  1195      if (skb_vlan_tagged(skb))
637d3e997 drivers/net/ethernet/chelsio/cxgb4/sge.c Hariprasad Shenai    2015-05-05  1196          max_pkt_len += VLAN_HLEN;
637d3e997 drivers/net/ethernet/chelsio/cxgb4/sge.c Hariprasad Shenai    2015-05-05  1197      if (!skb_shinfo(skb)->gso_size && (unlikely(skb->len > max_pkt_len)))
637d3e997 drivers/net/ethernet/chelsio/cxgb4/sge.c Hariprasad Shenai    2015-05-05  1198          goto out_free;
637d3e997 drivers/net/ethernet/chelsio/cxgb4/sge.c Hariprasad Shenai    2015-05-05  1199
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1200      pi = netdev_priv(dev);
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1201      adap = pi->adapter;
fd3a47900 drivers/net/cxgb4/sge.c                  Dimitris Michailidis 2010-04-01  1202      qidx = skb_get_queue_mapping(skb);
159226c60 drivers/net/ethernet/chelsio/cxgb4/sge.c Atul Gupta           2017-06-28  1203      if (ptp_enabled) {
159226c60 drivers/net/ethernet/chelsio/cxgb4/sge.c Atul Gupta           2017-06-28 @1204          spin_lock(&adap->ptp_lock);
159226c60 drivers/net/ethernet/chelsio/cxgb4/sge.c Atul Gupta           2017-06-28  1205          if (!(adap->ptp_tx_skb)) {
159226c60 drivers/net/ethernet/chelsio/cxgb4/sge.c Atul Gupta           2017-06-28  1206              skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
159226c60 drivers/net/ethernet/chelsio/cxgb4/sge.c Atul Gupta           2017-06-28  1207              adap->ptp_tx_skb = skb_get(skb);

:: The code at line 1190 was first introduced by commit
:: fd3a47900b6f9fa72a4074ecb630f9dae62f1a95 cxgb4: Add packet queues and packet DMA code
:: TO: Dimitris Michailidis <d...@chelsio.com>
:: CC: David S. Miller <da...@davemloft.net>

---
0-DAY kernel test infrastructure              Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                 Intel Corporation
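The robot's warning is the classic locked-region/goto mismatch: an error path inside the PTP branch reaches the shared out_free label while ptp_lock is still held. A user-space reconstruction of the fixed shape (a pthread mutex stands in for the spinlock and counters for the skb bookkeeping; this is a hypothetical sketch, not the cxgb4 code):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t ptp_lock = PTHREAD_MUTEX_INITIALIZER;
static int tx_busy;   /* stands in for adap->ptp_tx_skb being set */
static int freed;     /* counts dev_kfree_skb_any() stand-in calls */

#define NETDEV_TX_OK 0

/* Shape of the corrected transmit path: every branch that leaves the
 * locked region must drop ptp_lock before jumping to out_free. */
static int xmit(int ptp_enabled)
{
    if (ptp_enabled) {
        pthread_mutex_lock(&ptp_lock);
        if (tx_busy) {
            pthread_mutex_unlock(&ptp_lock); /* the unlock the robot found missing */
            goto out_free;
        }
        tx_busy = 1;                         /* adap->ptp_tx_skb = skb_get(skb) */
        pthread_mutex_unlock(&ptp_lock);
    }
    return NETDEV_TX_OK;

out_free:
    freed++;                                 /* dev_kfree_skb_any(skb) */
    return NETDEV_TX_OK;
}
```

Both the drop path and the success path return NETDEV_TX_OK; the property the coccinelle check enforces is that the lock is acquirable again after either path.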
[PATCH net-next 09/11] sctp: remove the typedef sctp_data_chunk_t
This patch is to remove the typedef sctp_data_chunk_t, and replace with
struct sctp_data_chunk in the places where it's using this typedef.

Signed-off-by: Xin Long
---
 include/linux/sctp.h         | 4 ++--
 include/net/sctp/constants.h | 2 +-
 include/net/sctp/sm.h        | 2 +-
 net/sctp/output.c            | 4 ++--
 net/sctp/sm_statefuns.c      | 6 +++---
 net/sctp/ulpqueue.c          | 2 +-
 6 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 55d84c1..91c888f 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -235,10 +235,10 @@ struct sctp_datahdr {
 	__u8 payload[0];
 };
 
-typedef struct sctp_data_chunk {
+struct sctp_data_chunk {
 	struct sctp_chunkhdr chunk_hdr;
 	struct sctp_datahdr data_hdr;
-} sctp_data_chunk_t;
+};
 
 /* DATA Chuck Specific Flags */
 enum {
diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 02e867b..9b18044 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -152,7 +152,7 @@ SCTP_SUBTYPE_CONSTRUCTOR(PRIMITIVE, sctp_event_primitive_t, primitive)
 /* Calculate the actual data size in a data chunk */
 #define SCTP_DATA_SNDSIZE(c) ((int)((unsigned long)(c->chunk_end) \
 				- (unsigned long)(c->chunk_hdr) \
-				- sizeof(sctp_data_chunk_t)))
+				- sizeof(struct sctp_data_chunk)))
 
 /* Internal error codes */
 typedef enum {
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 245eb22..860f378 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -347,7 +347,7 @@ static inline __u16 sctp_data_size(struct sctp_chunk *chunk)
 	__u16 size;
 
 	size = ntohs(chunk->chunk_hdr->length);
-	size -= sizeof(sctp_data_chunk_t);
+	size -= sizeof(struct sctp_data_chunk);
 
 	return size;
 }
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 89cee14..977ab5d 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -723,8 +723,8 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
 	/* Check whether this chunk and all the rest of pending data will fit
 	 * or delay in hopes of bundling a full sized packet.
 	 */
-	if (chunk->skb->len + q->out_qlen >
-	    transport->pathmtu - packet->overhead - sizeof(sctp_data_chunk_t) - 4)
+	if (chunk->skb->len + q->out_qlen > transport->pathmtu -
+	    packet->overhead - sizeof(struct sctp_data_chunk) - 4)
 		/* Enough data queued to fill a packet */
 		return SCTP_XMIT_OK;
 
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 1ba9a9b..212fe76 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -2990,7 +2990,7 @@ sctp_disposition_t sctp_sf_eat_data_6_2(struct net *net,
 		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
 	}
 
-	if (!sctp_chunk_length_valid(chunk, sizeof(sctp_data_chunk_t)))
+	if (!sctp_chunk_length_valid(chunk, sizeof(struct sctp_data_chunk)))
 		return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
 						  commands);
 
@@ -3109,7 +3109,7 @@ sctp_disposition_t sctp_sf_eat_data_fast_4_4(struct net *net,
 		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
 	}
 
-	if (!sctp_chunk_length_valid(chunk, sizeof(sctp_data_chunk_t)))
+	if (!sctp_chunk_length_valid(chunk, sizeof(struct sctp_data_chunk)))
 		return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
 						  commands);
 
@@ -6262,7 +6262,7 @@ static int sctp_eat_data(const struct sctp_association *asoc,
 	 * Actually, allow a little bit of overflow (up to a MTU).
 	 */
 	datalen = ntohs(chunk->chunk_hdr->length);
-	datalen -= sizeof(sctp_data_chunk_t);
+	datalen -= sizeof(struct sctp_data_chunk);
 
 	deliver = SCTP_CMD_CHUNK_ULP;
 
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index 25f7e41..0225d62 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -1090,7 +1090,7 @@ void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
 
 	if (chunk) {
 		needed = ntohs(chunk->chunk_hdr->length);
-		needed -= sizeof(sctp_data_chunk_t);
+		needed -= sizeof(struct sctp_data_chunk);
 	} else
 		needed = SCTP_DEFAULT_MAXWINDOW;
-- 
2.1.0
[PATCH net-next 08/11] sctp: remove the typedef sctp_datahdr_t
This patch is to remove the typedef sctp_datahdr_t, and replace with
struct sctp_datahdr in the places where it's using this typedef. It is
also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long
---
 include/linux/sctp.h    |  6 +++---
 net/sctp/sm_statefuns.c | 13 ++++++++-----
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index d5c0dda..55d84c1 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -227,17 +227,17 @@ enum { SCTP_PARAM_ACTION_MASK = cpu_to_be16(0xc000), };
 
 /* RFC 2960 Section 3.3.1 Payload Data (DATA) (0) */
 
-typedef struct sctp_datahdr {
+struct sctp_datahdr {
 	__be32 tsn;
 	__be16 stream;
 	__be16 ssn;
 	__be32 ppid;
 	__u8 payload[0];
-} sctp_datahdr_t;
+};
 
 typedef struct sctp_data_chunk {
 	struct sctp_chunkhdr chunk_hdr;
-	sctp_datahdr_t data_hdr;
+	struct sctp_datahdr data_hdr;
 } sctp_data_chunk_t;
 
 /* DATA Chuck Specific Flags */
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 0a01c68..1ba9a9b 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3010,7 +3010,8 @@ sctp_disposition_t sctp_sf_eat_data_6_2(struct net *net,
 		return SCTP_DISPOSITION_ABORT;
 	case SCTP_IERROR_PROTO_VIOLATION:
 		return sctp_sf_abort_violation(net, ep, asoc, chunk, commands,
-			(u8 *)chunk->subh.data_hdr, sizeof(sctp_datahdr_t));
+					       (u8 *)chunk->subh.data_hdr,
+					       sizeof(struct sctp_datahdr));
 	default:
 		BUG();
 	}
@@ -3124,7 +3125,8 @@ sctp_disposition_t sctp_sf_eat_data_fast_4_4(struct net *net,
 		return SCTP_DISPOSITION_ABORT;
 	case SCTP_IERROR_PROTO_VIOLATION:
 		return sctp_sf_abort_violation(net, ep, asoc, chunk, commands,
-			(u8 *)chunk->subh.data_hdr, sizeof(sctp_datahdr_t));
+					       (u8 *)chunk->subh.data_hdr,
+					       sizeof(struct sctp_datahdr));
 	default:
 		BUG();
 	}
@@ -6197,7 +6199,7 @@ static int sctp_eat_data(const struct sctp_association *asoc,
 			 struct sctp_chunk *chunk,
 			 sctp_cmd_seq_t *commands)
 {
-	sctp_datahdr_t *data_hdr;
+	struct sctp_datahdr *data_hdr;
 	struct sctp_chunk *err;
 	size_t datalen;
 	sctp_verb_t deliver;
@@ -6210,8 +6212,9 @@ static int sctp_eat_data(const struct sctp_association *asoc,
 	u16 sid;
 	u8 ordered = 0;
 
-	data_hdr = chunk->subh.data_hdr = (sctp_datahdr_t *)chunk->skb->data;
-	skb_pull(chunk->skb, sizeof(sctp_datahdr_t));
+	data_hdr = (struct sctp_datahdr *)chunk->skb->data;
+	chunk->subh.data_hdr = data_hdr;
+	skb_pull(chunk->skb, sizeof(*data_hdr));
 
 	tsn = ntohl(data_hdr->tsn);
 	pr_debug("%s: TSN 0x%x\n", __func__, tsn);
-- 
2.1.0
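The sizeof(*data_hdr) change in sctp_eat_data() is the interesting part of this patch: once the pointer variable carries the type, a later type change cannot silently desynchronize the skb_pull() length. A user-space sketch of the same parse, re-declaring the fixed part of the on-wire DATA header from RFC 4960 Section 3.3.1 (assumed mirror, not kernel code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* User-space mirror of the kernel's struct sctp_datahdr
 * (RFC 4960 Section 3.3.1, fixed 12-byte part). */
struct sctp_datahdr {
    uint32_t tsn;
    uint16_t stream;
    uint16_t ssn;
    uint32_t ppid;
};

/* Pull the DATA header off a buffer the way sctp_eat_data() does:
 * advance by sizeof(*hdr), never by a hard-coded type name. */
static const uint8_t *pull_data_hdr(const uint8_t *buf, uint32_t *tsn)
{
    struct sctp_datahdr hdr;

    memcpy(&hdr, buf, sizeof(hdr));   /* copy to avoid misaligned access */
    *tsn = ntohl(hdr.tsn);
    return buf + sizeof(hdr);         /* skb_pull(skb, sizeof(*data_hdr)) */
}

/* Demo: parse the TSN out of a hand-built DATA header. */
static uint32_t demo_tsn(void)
{
    uint8_t buf[12] = {0};
    uint32_t be = htonl(0x01020304);
    uint32_t tsn = 0;

    memcpy(buf, &be, sizeof(be));
    pull_data_hdr(buf, &tsn);
    return tsn;
}
```

The three uint32/two uint16 fields pack to exactly 12 bytes with no padding, matching the wire format.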
[PATCH net-next 11/11] sctp: remove the typedef sctp_init_chunk_t
This patch is to remove the typedef sctp_init_chunk_t, and replace with
struct sctp_init_chunk in the places where it's using this typedef.

Signed-off-by: Xin Long
---
 include/linux/sctp.h       |  6 +++---
 include/net/sctp/command.h |  4 ++--
 include/net/sctp/structs.h |  4 ++--
 net/sctp/input.c           |  4 ++--
 net/sctp/sm_make_chunk.c   |  6 +++---
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/sm_statefuns.c    | 28 ++++++++++++++--------------
 7 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 5624195..99e8664 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -266,10 +266,10 @@ struct sctp_inithdr {
 	__u8 params[0];
 };
 
-typedef struct sctp_init_chunk {
+struct sctp_init_chunk {
 	struct sctp_chunkhdr chunk_hdr;
 	struct sctp_inithdr init_hdr;
-} sctp_init_chunk_t;
+};
 
 
 /* Section 3.3.2.1. IPv4 Address Parameter (5) */
@@ -341,7 +341,7 @@ typedef struct sctp_hmac_algo_param {
  * The INIT ACK chunk is used to acknowledge the initiation of an SCTP
  * association.
  */
-typedef sctp_init_chunk_t sctp_initack_chunk_t;
+typedef struct sctp_init_chunk sctp_initack_chunk_t;
 
 /* Section 3.3.3.1 State Cookie (7) */
 typedef struct sctp_cookie_param {
diff --git a/include/net/sctp/command.h b/include/net/sctp/command.h
index d4a20d0..d4679e7 100644
--- a/include/net/sctp/command.h
+++ b/include/net/sctp/command.h
@@ -132,7 +132,7 @@ typedef union {
 	struct sctp_association *asoc;
 	struct sctp_transport *transport;
 	struct sctp_bind_addr *bp;
-	sctp_init_chunk_t *init;
+	struct sctp_init_chunk *init;
 	struct sctp_ulpevent *ulpevent;
 	struct sctp_packet *packet;
 	sctp_sackhdr_t *sackh;
@@ -173,7 +173,7 @@ SCTP_ARG_CONSTRUCTOR(CHUNK,	struct sctp_chunk *, chunk)
 SCTP_ARG_CONSTRUCTOR(ASOC,	struct sctp_association *, asoc)
 SCTP_ARG_CONSTRUCTOR(TRANSPORT,	struct sctp_transport *, transport)
 SCTP_ARG_CONSTRUCTOR(BA,	struct sctp_bind_addr *, bp)
-SCTP_ARG_CONSTRUCTOR(PEER_INIT,	sctp_init_chunk_t *, init)
+SCTP_ARG_CONSTRUCTOR(PEER_INIT,	struct sctp_init_chunk *, init)
 SCTP_ARG_CONSTRUCTOR(ULPEVENT,	struct sctp_ulpevent *, ulpevent)
 SCTP_ARG_CONSTRUCTOR(PACKET,	struct sctp_packet *, packet)
 SCTP_ARG_CONSTRUCTOR(SACKH,	sctp_sackhdr_t *, sackh)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 2393d2e..07c11fe 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1298,11 +1298,11 @@ int sctp_has_association(struct net *net, const union sctp_addr *laddr,
 
 int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep,
 		     const struct sctp_association *asoc,
-		     enum sctp_cid cid, sctp_init_chunk_t *peer_init,
+		     enum sctp_cid cid, struct sctp_init_chunk *peer_init,
 		     struct sctp_chunk *chunk, struct sctp_chunk **err_chunk);
 int sctp_process_init(struct sctp_association *, struct sctp_chunk *chunk,
 		      const union sctp_addr *peer,
-		      sctp_init_chunk_t *init, gfp_t gfp);
+		      struct sctp_init_chunk *init, gfp_t gfp);
 
 __u32 sctp_generate_tag(const struct sctp_endpoint *);
 __u32 sctp_generate_tsn(const struct sctp_endpoint *);
diff --git a/net/sctp/input.c b/net/sctp/input.c
index a9994c4..41eb2ec 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -1051,7 +1051,7 @@ static struct sctp_association *__sctp_rcv_init_lookup(struct net *net,
 	union sctp_addr *paddr = &addr;
 	struct sctphdr *sh = sctp_hdr(skb);
 	union sctp_params params;
-	sctp_init_chunk_t *init;
+	struct sctp_init_chunk *init;
 	struct sctp_af *af;
 
 	/*
@@ -1070,7 +1070,7 @@ static struct sctp_association *__sctp_rcv_init_lookup(struct net *net,
 	/* Find the start of the TLVs and the end of the chunk.  This is
 	 * the region we search for address parameters.
 	 */
-	init = (sctp_init_chunk_t *)skb->data;
+	init = (struct sctp_init_chunk *)skb->data;
 
 	/* Walk the parameters looking for embedded addresses. */
 	sctp_walk_params(params, init, init_hdr.params) {
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 8b9ca10..3af4dd0 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2242,8 +2242,8 @@ static sctp_ierror_t sctp_verify_param(struct net *net,
 /* Verify the INIT packet before we process it. */
 int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep,
 		     const struct sctp_association *asoc, enum sctp_cid cid,
-		     sctp_init_chunk_t *peer_init, struct sctp_chunk *chunk,
-		     struct sctp_chunk **errp)
+		     struct sctp_init_chunk *peer_init,
+		     struct sctp_chunk *chunk, struct sctp_chunk **errp)
 {
 	union
[PATCH net-next 10/11] sctp: remove the typedef sctp_inithdr_t
This patch is to remove the typedef sctp_inithdr_t, and replace with
struct sctp_inithdr in the places where it's using this typedef.

Signed-off-by: Xin Long
---
 include/linux/sctp.h                    |  6 +++---
 net/netfilter/nf_conntrack_proto_sctp.c |  4 ++--
 net/sctp/sm_make_chunk.c                |  4 ++--
 net/sctp/sm_statefuns.c                 | 12 ++++++------
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 91c888f..5624195 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -257,18 +257,18 @@ enum { SCTP_DATA_FRAG_MASK = 0x03, };
  * This chunk is used to initiate a SCTP association between two
  * endpoints.
  */
-typedef struct sctp_inithdr {
+struct sctp_inithdr {
 	__be32 init_tag;
 	__be32 a_rwnd;
 	__be16 num_outbound_streams;
 	__be16 num_inbound_streams;
 	__be32 initial_tsn;
 	__u8 params[0];
-} sctp_inithdr_t;
+};
 
 typedef struct sctp_init_chunk {
 	struct sctp_chunkhdr chunk_hdr;
-	sctp_inithdr_t init_hdr;
+	struct sctp_inithdr init_hdr;
 } sctp_init_chunk_t;
 
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index b841a8a..31c6c8e 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -395,7 +395,7 @@ static int sctp_packet(struct nf_conn *ct,
 		/* If it is an INIT or an INIT ACK note down the vtag */
 		if (sch->type == SCTP_CID_INIT ||
 		    sch->type == SCTP_CID_INIT_ACK) {
-			sctp_inithdr_t _inithdr, *ih;
+			struct sctp_inithdr _inithdr, *ih;
 
 			ih = skb_header_pointer(skb, offset + sizeof(_sch),
 						sizeof(_inithdr), &_inithdr);
@@ -471,7 +471,7 @@ static bool sctp_new(struct nf_conn *ct, const struct sk_buff *skb,
 		/* Copy the vtag into the state info */
 		if (sch->type == SCTP_CID_INIT) {
-			sctp_inithdr_t _inithdr, *ih;
+			struct sctp_inithdr _inithdr, *ih;
 			/* Sec 8.5.1 (A) */
 			if (sh->vtag)
 				return false;
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 3ed2108..8b9ca10 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -217,7 +217,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
 {
 	struct net *net = sock_net(asoc->base.sk);
 	struct sctp_endpoint *ep = asoc->ep;
-	sctp_inithdr_t init;
+	struct sctp_inithdr init;
 	union sctp_params addrs;
 	size_t chunksize;
 	struct sctp_chunk *retval = NULL;
@@ -385,7 +385,7 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
 				      const struct sctp_chunk *chunk,
 				      gfp_t gfp, int unkparam_len)
 {
-	sctp_inithdr_t initack;
+	struct sctp_inithdr initack;
 	struct sctp_chunk *retval;
 	union sctp_params addrs;
 	struct sctp_sock *sp;
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 212fe76..71b6e3f 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -389,10 +389,10 @@ sctp_disposition_t sctp_sf_do_5_1B_init(struct net *net,
 	}
 
 	/* Grab the INIT header. */
-	chunk->subh.init_hdr = (sctp_inithdr_t *)chunk->skb->data;
+	chunk->subh.init_hdr = (struct sctp_inithdr *)chunk->skb->data;
 
 	/* Tag the variable length parameters. */
-	chunk->param_hdr.v = skb_pull(chunk->skb, sizeof(sctp_inithdr_t));
+	chunk->param_hdr.v = skb_pull(chunk->skb, sizeof(struct sctp_inithdr));
 
 	new_asoc = sctp_make_temp_asoc(ep, chunk, GFP_ATOMIC);
 	if (!new_asoc)
@@ -522,7 +522,7 @@ sctp_disposition_t sctp_sf_do_5_1C_ack(struct net *net,
 		return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
 						  commands);
 	/* Grab the INIT header. */
-	chunk->subh.init_hdr = (sctp_inithdr_t *) chunk->skb->data;
+	chunk->subh.init_hdr = (struct sctp_inithdr *)chunk->skb->data;
 
 	/* Verify the INIT chunk before processing it. */
 	err_chunk = NULL;
@@ -576,7 +576,7 @@ sctp_disposition_t sctp_sf_do_5_1C_ack(struct net *net,
 	/* Tag the variable length parameters.  Note that we never
 	 * convert the parameters in an INIT chunk.
 	 */
-	chunk->param_hdr.v = skb_pull(chunk->skb, sizeof(sctp_inithdr_t));
+	chunk->param_hdr.v = skb_pull(chunk->skb, sizeof(struct sctp_inithdr));
 
 	initchunk = (sctp_init_chunk_t *) chunk->chunk_hdr;
 
@@ -1454,10 +1454,10 @@ static sctp_disposition_t sctp_sf_do_unexpected_init(
 		return sctp_sf_violation_chunklen(net, ep, asoc,
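The __u8 params[0] member these INIT headers end with is the hook for TLV iteration, which is what sctp_walk_params() does after the skb_pull() above. A simplified user-space sketch of walking length-prefixed SCTP parameters (assumed helper, not the kernel macro; lengths include the 4-byte header and are padded to a 4-byte boundary per RFC 4960):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Mirror of struct sctp_paramhdr: every variable-length parameter
 * starts with a big-endian type and a length covering the header. */
struct paramhdr {
    uint16_t type;
    uint16_t length;
};

/* Count the parameters in a TLV region, advancing by the padded
 * length; stop on anything malformed, as the kernel walker does. */
static int count_params(const uint8_t *p, size_t len)
{
    int n = 0;

    while (len >= sizeof(struct paramhdr)) {
        struct paramhdr h;
        memcpy(&h, p, sizeof(h));
        size_t plen = ntohs(h.length);
        if (plen < sizeof(h) || plen > len)
            break;                        /* truncated or bogus length */
        n++;
        plen = (plen + 3) & ~(size_t)3;   /* pad to 4-byte boundary */
        if (plen >= len)
            break;
        p += plen;
        len -= plen;
    }
    return n;
}

/* Demo: two parameters, the second one needing padding. */
static int demo_walk(void)
{
    uint8_t buf[16] = {0};
    struct paramhdr h1 = { htons(9),  htons(8) }; /* Cookie Preservative */
    struct paramhdr h2 = { htons(12), htons(6) }; /* Supported Addr Types */

    memcpy(buf, &h1, sizeof(h1));
    memcpy(buf + 8, &h2, sizeof(h2));
    return count_params(buf, sizeof(buf));
}
```

The padding step matters: the second parameter declares 6 bytes but the next TLV starts on the following 4-byte boundary.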
[PATCH net-next 06/11] sctp: remove the typedef sctp_param_t
This patch is to remove the typedef sctp_param_t, and replace with
enum sctp_param in the places where it's using this typedef. It is
also to remove the useless declaration sctp_addip_addr_config and fix
the lack of parameter names in some other functions' declarations.

Signed-off-by: Xin Long
---
 include/linux/sctp.h     |  4 ++--
 include/net/sctp/sm.h    | 14 ++++++--------
 net/sctp/sm_make_chunk.c |  2 +-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 142bb6a..5eecc0f 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -167,7 +167,7 @@ struct sctp_paramhdr {
 	__be16 length;
 };
 
-typedef enum {
+enum sctp_param {
 
 	/* RFC 2960 Section 3.3.5 */
 	SCTP_PARAM_HEARTBEAT_INFO		= cpu_to_be16(1),
@@ -207,7 +207,7 @@ typedef enum {
 	SCTP_PARAM_RESET_RESPONSE		= cpu_to_be16(0x0010),
 	SCTP_PARAM_RESET_ADD_OUT_STREAMS	= cpu_to_be16(0x0011),
 	SCTP_PARAM_RESET_ADD_IN_STREAMS		= cpu_to_be16(0x0012),
-} sctp_param_t; /* enum */
+}; /* enum */
 
 
 /* RFC 2960 Section 3.2.1
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 47113f2..245eb22 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -325,19 +325,17 @@ void sctp_generate_heartbeat_event(unsigned long peer);
 void sctp_generate_reconf_event(unsigned long peer);
 void sctp_generate_proto_unreach_event(unsigned long peer);
 
-void sctp_ootb_pkt_free(struct sctp_packet *);
+void sctp_ootb_pkt_free(struct sctp_packet *packet);
 
-struct sctp_association *sctp_unpack_cookie(const struct sctp_endpoint *,
-					    const struct sctp_association *,
-					    struct sctp_chunk *,
+struct sctp_association *sctp_unpack_cookie(const struct sctp_endpoint *ep,
+					    const struct sctp_association *asoc,
+					    struct sctp_chunk *chunk,
 					    gfp_t gfp, int *err,
 					    struct sctp_chunk **err_chk_p);
-int sctp_addip_addr_config(struct sctp_association *, sctp_param_t,
-			   struct sockaddr_storage*, int);
 
 /* 3rd level prototypes */
-__u32 sctp_generate_tag(const struct sctp_endpoint *);
-__u32 sctp_generate_tsn(const struct sctp_endpoint *);
+__u32 sctp_generate_tag(const struct sctp_endpoint *ep);
+__u32 sctp_generate_tsn(const struct sctp_endpoint *ep);
 
 /* Extern declarations for major data structures.  */
 extern sctp_timer_event_t *sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES];
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 9f9d40c..3ed2108 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1882,7 +1882,7 @@ struct __sctp_missing {
  * Report a missing mandatory parameter.
  */
 static int sctp_process_missing_param(const struct sctp_association *asoc,
-				      sctp_param_t paramtype,
+				      enum sctp_param paramtype,
 				      struct sctp_chunk *chunk,
 				      struct sctp_chunk **errp)
 {
-- 
2.1.0
[PATCH net-next 05/11] sctp: remove the typedef sctp_paramhdr_t
This patch is to remove the typedef sctp_paramhdr_t, and replace with
struct sctp_paramhdr in the places where it's using this typedef. It is
also to fix some indents and use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long
---
 include/linux/sctp.h       | 44 ++++++++++++++++++++--------------------
 include/net/sctp/sctp.h    |  2 +-
 include/net/sctp/structs.h |  5 +++--
 net/sctp/associola.c       |  6 +++---
 net/sctp/auth.c            | 22 +++++++++++-----------
 net/sctp/endpointola.c     |  7 ++++---
 net/sctp/sm_make_chunk.c   | 47 +++++++++++++++++++++---------------------
 net/sctp/sm_statefuns.c    |  6 +++---
 net/sctp/socket.c          |  7 ++++---
 net/sctp/stream.c          |  4 ++--
 10 files changed, 79 insertions(+), 71 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index ffdccb4..142bb6a 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -162,10 +162,10 @@ enum { SCTP_CHUNK_FLAG_T = 0x01 };
  * Section 3.2.1 Optional/Variable-length Parmaeter Format.
  */
 
-typedef struct sctp_paramhdr {
+struct sctp_paramhdr {
 	__be16 type;
 	__be16 length;
-} sctp_paramhdr_t;
+};
 
 typedef enum {
 
@@ -274,37 +274,37 @@ typedef struct sctp_init_chunk {
 
 /* Section 3.3.2.1. IPv4 Address Parameter (5) */
 typedef struct sctp_ipv4addr_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	struct in_addr  addr;
 } sctp_ipv4addr_param_t;
 
 /* Section 3.3.2.1. IPv6 Address Parameter (6) */
 typedef struct sctp_ipv6addr_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	struct in6_addr addr;
 } sctp_ipv6addr_param_t;
 
 /* Section 3.3.2.1 Cookie Preservative (9) */
 typedef struct sctp_cookie_preserve_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	__be32 lifespan_increment;
 } sctp_cookie_preserve_param_t;
 
 /* Section 3.3.2.1 Host Name Address (11) */
 typedef struct sctp_hostname_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	uint8_t hostname[0];
 } sctp_hostname_param_t;
 
 /* Section 3.3.2.1 Supported Address Types (12) */
 typedef struct sctp_supported_addrs_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	__be16 types[0];
 } sctp_supported_addrs_param_t;
 
 /* Appendix A. ECN Capable (32768) */
 typedef struct sctp_ecn_capable_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 } sctp_ecn_capable_param_t;
 
 /* ADDIP Section 3.2.6 Adaptation Layer Indication */
@@ -321,19 +321,19 @@ typedef struct sctp_supported_ext_param {
 
 /* AUTH Section 3.1 Random */
 typedef struct sctp_random_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	__u8 random_val[0];
 } sctp_random_param_t;
 
 /* AUTH Section 3.2 Chunk List */
 typedef struct sctp_chunks_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	__u8 chunks[0];
 } sctp_chunks_param_t;
 
 /* AUTH Section 3.3 HMAC Algorithm */
 typedef struct sctp_hmac_algo_param {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	__be16 hmac_ids[0];
 } sctp_hmac_algo_param_t;
 
@@ -345,14 +345,14 @@ typedef sctp_init_chunk_t sctp_initack_chunk_t;
 
 /* Section 3.3.3.1 State Cookie (7) */
 typedef struct sctp_cookie_param {
-	sctp_paramhdr_t p;
+	struct sctp_paramhdr p;
 	__u8 body[0];
 } sctp_cookie_param_t;
 
 /* Section 3.3.3.1 Unrecognized Parameters (8) */
 typedef struct sctp_unrecognized_param {
-	sctp_paramhdr_t param_hdr;
-	sctp_paramhdr_t unrecognized;
+	struct sctp_paramhdr param_hdr;
+	struct sctp_paramhdr unrecognized;
 } sctp_unrecognized_param_t;
 
 
@@ -399,7 +399,7 @@ typedef struct sctp_sack_chunk {
  */
 
 typedef struct sctp_heartbeathdr {
-	sctp_paramhdr_t info;
+	struct sctp_paramhdr info;
 } sctp_heartbeathdr_t;
 
 typedef struct sctp_heartbeat_chunk {
@@ -639,7 +639,7 @@ struct sctp_fwdtsn_chunk {
  * report status of ASCONF processing.
  */
 typedef struct sctp_addip_param {
-	sctp_paramhdr_t	param_hdr;
+	struct sctp_paramhdr	param_hdr;
 	__be32		crr_id;
 } sctp_addip_param_t;
 
@@ -724,7 +724,7 @@ struct sctp_reconf_chunk {
 };
 
 struct sctp_strreset_outreq {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	__u32 request_seq;
 	__u32 response_seq;
 	__u32 send_reset_at_tsn;
@@ -732,18 +732,18 @@ struct sctp_strreset_outreq {
 };
 
 struct sctp_strreset_inreq {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	__u32 request_seq;
 	__u16 list_of_streams[0];
 };
 
 struct sctp_strreset_tsnreq {
-	sctp_paramhdr_t param_hdr;
+	struct sctp_paramhdr param_hdr;
 	__u32 request_seq;
 };
 
 struct
[PATCH net-next 07/11] sctp: remove the typedef sctp_param_action_t
Remove this typedef; there are no places even using it.

Signed-off-by: Xin Long
---
 include/linux/sctp.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 5eecc0f..d5c0dda 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -216,12 +216,12 @@ enum sctp_param {
  * not recognize the Parameter Type.
  *
  */
-typedef enum {
+enum {
 	SCTP_PARAM_ACTION_DISCARD	= cpu_to_be16(0x0000),
 	SCTP_PARAM_ACTION_DISCARD_ERR	= cpu_to_be16(0x4000),
 	SCTP_PARAM_ACTION_SKIP		= cpu_to_be16(0x8000),
 	SCTP_PARAM_ACTION_SKIP_ERR	= cpu_to_be16(0xc000),
-} sctp_param_action_t;
+};
 
 enum { SCTP_PARAM_ACTION_MASK = cpu_to_be16(0xc000), };
 
-- 
2.1.0
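The constants this hunk leaves behind are deliberately stored big-endian (cpu_to_be16 in the kernel, htons in user space): the top two bits of an SCTP parameter type encode how a receiver handles an unrecognized parameter (RFC 4960 Section 3.2.1), so a network-byte-order mask can be applied directly to the on-wire field without byte-swapping. A small user-space illustration (hypothetical helper, not kernel code):

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Network-byte-order action constants, mirroring the kernel enum. */
#define SCTP_PARAM_ACTION_DISCARD      htons(0x0000)
#define SCTP_PARAM_ACTION_DISCARD_ERR  htons(0x4000)
#define SCTP_PARAM_ACTION_SKIP         htons(0x8000)
#define SCTP_PARAM_ACTION_SKIP_ERR     htons(0xc000)
#define SCTP_PARAM_ACTION_MASK         htons(0xc000)

/* Classify an on-wire parameter type as-is: both the mask and the
 * field are in network byte order, so no conversion is needed. */
static uint16_t param_action(uint16_t be_param_type)
{
    return be_param_type & SCTP_PARAM_ACTION_MASK;
}
```

This is why the series keeps the be16 storage even while dropping the typedef names: the values compare directly against packet bytes.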
[PATCH net-next 01/11] sctp: remove the typedef sctp_sctphdr_t
This patch is to remove the typedef sctp_sctphdr_t, and replace with
struct sctphdr in the places where it's using this typedef. It is also
to fix some indents and use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long
---
 include/linux/sctp.h                    |  4 ++--
 net/netfilter/ipset/ip_set_getport.c    |  4 ++--
 net/netfilter/ipvs/ip_vs_core.c         |  6 +++---
 net/netfilter/ipvs/ip_vs_proto_sctp.c   | 15 +++++++--------
 net/netfilter/nf_conntrack_proto_sctp.c |  2 +-
 net/netfilter/nf_nat_proto_sctp.c       |  2 +-
 net/netfilter/xt_sctp.c                 | 16 ++++++++--------
 7 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 7a4804c..85540ec 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -57,12 +57,12 @@
 #include
 
 /* Section 3.1.  SCTP Common Header Format */
-typedef struct sctphdr {
+struct sctphdr {
 	__be16 source;
 	__be16 dest;
 	__be32 vtag;
 	__le32 checksum;
-} sctp_sctphdr_t;
+};
 
 static inline struct sctphdr *sctp_hdr(const struct sk_buff *skb)
 {
diff --git a/net/netfilter/ipset/ip_set_getport.c b/net/netfilter/ipset/ip_set_getport.c
index 42c3e3b..3f09cdb 100644
--- a/net/netfilter/ipset/ip_set_getport.c
+++ b/net/netfilter/ipset/ip_set_getport.c
@@ -38,8 +38,8 @@ get_port(const struct sk_buff *skb, int protocol, unsigned int protooff,
 		break;
 	}
 	case IPPROTO_SCTP: {
-		sctp_sctphdr_t _sh;
-		const sctp_sctphdr_t *sh;
+		struct sctphdr _sh;
+		const struct sctphdr *sh;
 
 		sh = skb_header_pointer(skb, protooff, sizeof(_sh), &_sh);
 		if (!sh)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index ad99c1c..6f39af9 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1038,8 +1038,8 @@ static int ip_vs_out_icmp_v6(struct netns_ipvs *ipvs, struct sk_buff *skb,
 static inline int is_sctp_abort(const struct sk_buff *skb, int nh_len)
 {
 	sctp_chunkhdr_t *sch, schunk;
-	sch = skb_header_pointer(skb, nh_len + sizeof(sctp_sctphdr_t),
-				 sizeof(schunk), &schunk);
+	sch = skb_header_pointer(skb, nh_len + sizeof(struct sctphdr),
+				 sizeof(schunk), &schunk);
 	if (sch == NULL)
 		return 0;
 	if (sch->type == SCTP_CID_ABORT)
@@ -1072,7 +1072,7 @@ static inline bool is_new_conn(const struct sk_buff *skb,
 	case IPPROTO_SCTP: {
 		sctp_chunkhdr_t *sch, schunk;
 
-		sch = skb_header_pointer(skb, iph->len + sizeof(sctp_sctphdr_t),
+		sch = skb_header_pointer(skb, iph->len + sizeof(struct sctphdr),
 					 sizeof(schunk), &schunk);
 		if (sch == NULL)
 			return false;
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 56f8e4b..6b38cad 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -16,15 +16,14 @@ sctp_conn_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb,
 {
 	struct ip_vs_service *svc;
 	sctp_chunkhdr_t _schunkh, *sch;
-	sctp_sctphdr_t *sh, _sctph;
+	struct sctphdr *sh, _sctph;
 	__be16 _ports[2], *ports = NULL;
 
 	if (likely(!ip_vs_iph_icmp(iph))) {
 		sh = skb_header_pointer(skb, iph->len, sizeof(_sctph), &_sctph);
 		if (sh) {
-			sch = skb_header_pointer(
-				skb, iph->len + sizeof(sctp_sctphdr_t),
-				sizeof(_schunkh), &_schunkh);
+			sch = skb_header_pointer(skb, iph->len + sizeof(_sctph),
+						 sizeof(_schunkh), &_schunkh);
 			if (sch && (sch->type == SCTP_CID_INIT ||
 				    sysctl_sloppy_sctp(ipvs)))
 				ports = &sh->source;
@@ -77,7 +76,7 @@ sctp_conn_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb,
 	return 1;
 }
 
-static void sctp_nat_csum(struct sk_buff *skb, sctp_sctphdr_t *sctph,
+static void sctp_nat_csum(struct sk_buff *skb, struct sctphdr *sctph,
 			  unsigned int sctphoff)
 {
 	sctph->checksum = sctp_compute_cksum(skb, sctphoff);
@@ -88,7 +87,7 @@ static int
 sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		  struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
-	sctp_sctphdr_t *sctph;
+	struct sctphdr *sctph;
 	unsigned int sctphoff = iph->len;
 	bool payload_csum = false;
 
@@ -135,7 +134,7 @@ static int
 sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		  struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
-	sctp_sctphdr_t *sctph;
+	struct
[PATCH net-next 04/11] sctp: remove the typedef sctp_cid_action_t
Remove this typedef; nothing even uses it. Signed-off-by: Xin Long
---
 include/linux/sctp.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 6d7b884..ffdccb4 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -117,12 +117,12 @@ enum sctp_cid {
  * the action that must be taken if the processing endpoint does not
  * recognize the Chunk Type.
  */
-typedef enum {
+enum {
 	SCTP_CID_ACTION_DISCARD		= 0x00,
 	SCTP_CID_ACTION_DISCARD_ERR	= 0x40,
 	SCTP_CID_ACTION_SKIP		= 0x80,
 	SCTP_CID_ACTION_SKIP_ERR	= 0xc0,
-} sctp_cid_action_t;
+};
 
 enum { SCTP_CID_ACTION_MASK = 0xc0, };
-- 
2.1.0
[PATCH net-next 03/11] sctp: remove the typedef sctp_cid_t
This patch is to remove the typedef sctp_cid_t, and replace with enum sctp_cid in the places where it's using this typedef. Signed-off-by: Xin Long --- include/linux/sctp.h | 4 ++-- include/net/sctp/auth.h | 6 -- include/net/sctp/constants.h | 4 ++-- include/net/sctp/structs.h | 2 +- net/sctp/auth.c | 6 +++--- net/sctp/sm_make_chunk.c | 4 ++-- net/sctp/sm_statetable.c | 4 ++-- 7 files changed, 16 insertions(+), 14 deletions(-) diff --git a/include/linux/sctp.h b/include/linux/sctp.h index 9ad5b9e..6d7b884 100644 --- a/include/linux/sctp.h +++ b/include/linux/sctp.h @@ -82,7 +82,7 @@ struct sctp_chunkhdr { * Value field. It takes a value from 0 to 254. The value of 255 is * reserved for future use as an extension field. */ -typedef enum { +enum sctp_cid { SCTP_CID_DATA = 0, SCTP_CID_INIT = 1, SCTP_CID_INIT_ACK = 2, @@ -109,7 +109,7 @@ typedef enum { SCTP_CID_ASCONF = 0xC1, SCTP_CID_ASCONF_ACK = 0x80, SCTP_CID_RECONF = 0x82, -} sctp_cid_t; /* enum */ +}; /* enum */ /* Section 3.2 diff --git a/include/net/sctp/auth.h b/include/net/sctp/auth.h index 9b9fb12..171244b 100644 --- a/include/net/sctp/auth.h +++ b/include/net/sctp/auth.h @@ -97,8 +97,10 @@ void sctp_auth_asoc_set_default_hmac(struct sctp_association *asoc, struct sctp_hmac_algo_param *hmacs); int sctp_auth_asoc_verify_hmac_id(const struct sctp_association *asoc, __be16 hmac_id); -int sctp_auth_send_cid(sctp_cid_t chunk, const struct sctp_association *asoc); -int sctp_auth_recv_cid(sctp_cid_t chunk, const struct sctp_association *asoc); +int sctp_auth_send_cid(enum sctp_cid chunk, + const struct sctp_association *asoc); +int sctp_auth_recv_cid(enum sctp_cid chunk, + const struct sctp_association *asoc); void sctp_auth_calculate_hmac(const struct sctp_association *asoc, struct sk_buff *skb, struct sctp_auth_chunk *auth, gfp_t gfp); diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index b07a745..02e867b 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@
-130,7 +130,7 @@ typedef enum { */ typedef union { - sctp_cid_t chunk; + enum sctp_cid chunk; sctp_event_timeout_t timeout; sctp_event_other_t other; sctp_event_primitive_t primitive; @@ -141,7 +141,7 @@ static inline sctp_subtype_t\ SCTP_ST_## _name (_type _arg) \ { sctp_subtype_t _retval; _retval._elt = _arg; return _retval; } -SCTP_SUBTYPE_CONSTRUCTOR(CHUNK,sctp_cid_t, chunk) +SCTP_SUBTYPE_CONSTRUCTOR(CHUNK,enum sctp_cid, chunk) SCTP_SUBTYPE_CONSTRUCTOR(TIMEOUT, sctp_event_timeout_t, timeout) SCTP_SUBTYPE_CONSTRUCTOR(OTHER,sctp_event_other_t, other) SCTP_SUBTYPE_CONSTRUCTOR(PRIMITIVE,sctp_event_primitive_t, primitive) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index e26763b..9e9605e 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1297,7 +1297,7 @@ int sctp_has_association(struct net *net, const union sctp_addr *laddr, int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep, const struct sctp_association *asoc, -sctp_cid_t, sctp_init_chunk_t *peer_init, +enum sctp_cid cid, sctp_init_chunk_t *peer_init, struct sctp_chunk *chunk, struct sctp_chunk **err_chunk); int sctp_process_init(struct sctp_association *, struct sctp_chunk *chunk, const union sctp_addr *peer, diff --git a/net/sctp/auth.c b/net/sctp/auth.c index f99d485..7171dd3 100644 --- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -632,7 +632,7 @@ void sctp_auth_asoc_set_default_hmac(struct sctp_association *asoc, /* Check to see if the given chunk is supposed to be authenticated */ -static int __sctp_auth_cid(sctp_cid_t chunk, struct sctp_chunks_param *param) +static int __sctp_auth_cid(enum sctp_cid chunk, struct sctp_chunks_param *param) { unsigned short len; int found = 0; @@ -668,7 +668,7 @@ static int __sctp_auth_cid(sctp_cid_t chunk, struct sctp_chunks_param *param) } /* Check if peer requested that this chunk is authenticated */ -int sctp_auth_send_cid(sctp_cid_t chunk, const struct sctp_association *asoc) +int sctp_auth_send_cid(enum 
sctp_cid chunk, const struct sctp_association *asoc) { if (!asoc) return 0; @@ -680,7 +680,7 @@ int sctp_auth_send_cid(sctp_cid_t chunk, const struct sctp_association *asoc) } /* Check if we requested that peer authenticate this chunk. */ -int sctp_auth_recv_cid(sctp_cid_t chunk, const
[PATCH net-next 02/11] sctp: remove the typedef sctp_chunkhdr_t
This patch is to remove the typedef sctp_chunkhdr_t, and replace with struct sctp_chunkhdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type)., especially in sctp_new. Signed-off-by: Xin Long--- include/linux/sctp.h| 34 - include/net/sctp/sctp.h | 2 +- net/netfilter/ipvs/ip_vs_core.c | 4 +- net/netfilter/ipvs/ip_vs_proto_sctp.c | 6 +-- net/netfilter/nf_conntrack_proto_sctp.c | 29 +++--- net/netfilter/xt_sctp.c | 4 +- net/sctp/input.c| 20 +- net/sctp/inqueue.c | 15 net/sctp/sm_make_chunk.c| 17 + net/sctp/sm_sideeffect.c| 5 ++- net/sctp/sm_statefuns.c | 67 + net/sctp/ulpevent.c | 2 +- 12 files changed, 102 insertions(+), 103 deletions(-) diff --git a/include/linux/sctp.h b/include/linux/sctp.h index 85540ec..9ad5b9e 100644 --- a/include/linux/sctp.h +++ b/include/linux/sctp.h @@ -70,11 +70,11 @@ static inline struct sctphdr *sctp_hdr(const struct sk_buff *skb) } /* Section 3.2. Chunk Field Descriptions. */ -typedef struct sctp_chunkhdr { +struct sctp_chunkhdr { __u8 type; __u8 flags; __be16 length; -} sctp_chunkhdr_t; +}; /* Section 3.2. Chunk Type Values. 
@@ -236,8 +236,8 @@ typedef struct sctp_datahdr { } sctp_datahdr_t; typedef struct sctp_data_chunk { -sctp_chunkhdr_t chunk_hdr; -sctp_datahdr_t data_hdr; + struct sctp_chunkhdr chunk_hdr; + sctp_datahdr_t data_hdr; } sctp_data_chunk_t; /* DATA Chuck Specific Flags */ @@ -267,7 +267,7 @@ typedef struct sctp_inithdr { } sctp_inithdr_t; typedef struct sctp_init_chunk { - sctp_chunkhdr_t chunk_hdr; + struct sctp_chunkhdr chunk_hdr; sctp_inithdr_t init_hdr; } sctp_init_chunk_t; @@ -386,7 +386,7 @@ typedef struct sctp_sackhdr { } sctp_sackhdr_t; typedef struct sctp_sack_chunk { - sctp_chunkhdr_t chunk_hdr; + struct sctp_chunkhdr chunk_hdr; sctp_sackhdr_t sack_hdr; } sctp_sack_chunk_t; @@ -403,7 +403,7 @@ typedef struct sctp_heartbeathdr { } sctp_heartbeathdr_t; typedef struct sctp_heartbeat_chunk { - sctp_chunkhdr_t chunk_hdr; + struct sctp_chunkhdr chunk_hdr; sctp_heartbeathdr_t hb_hdr; } sctp_heartbeat_chunk_t; @@ -413,7 +413,7 @@ typedef struct sctp_heartbeat_chunk { * chunk descriptor. */ typedef struct sctp_abort_chunk { -sctp_chunkhdr_t uh; + struct sctp_chunkhdr uh; } sctp_abort_chunk_t; @@ -425,8 +425,8 @@ typedef struct sctp_shutdownhdr { } sctp_shutdownhdr_t; struct sctp_shutdown_chunk_t { -sctp_chunkhdr_tchunk_hdr; -sctp_shutdownhdr_t shutdown_hdr; + struct sctp_chunkhdr chunk_hdr; + sctp_shutdownhdr_t shutdown_hdr; }; /* RFC 2960. 
Section 3.3.10 Operation Error (ERROR) (9) */ @@ -438,8 +438,8 @@ typedef struct sctp_errhdr { } sctp_errhdr_t; typedef struct sctp_operr_chunk { -sctp_chunkhdr_t chunk_hdr; - sctp_errhdr_t err_hdr; + struct sctp_chunkhdr chunk_hdr; + sctp_errhdr_t err_hdr; } sctp_operr_chunk_t; /* RFC 2960 3.3.10 - Operation Error @@ -528,7 +528,7 @@ typedef struct sctp_ecnehdr { } sctp_ecnehdr_t; typedef struct sctp_ecne_chunk { - sctp_chunkhdr_t chunk_hdr; + struct sctp_chunkhdr chunk_hdr; sctp_ecnehdr_t ence_hdr; } sctp_ecne_chunk_t; @@ -540,7 +540,7 @@ typedef struct sctp_cwrhdr { } sctp_cwrhdr_t; typedef struct sctp_cwr_chunk { - sctp_chunkhdr_t chunk_hdr; + struct sctp_chunkhdr chunk_hdr; sctp_cwrhdr_t cwr_hdr; } sctp_cwr_chunk_t; @@ -649,7 +649,7 @@ typedef struct sctp_addiphdr { } sctp_addiphdr_t; typedef struct sctp_addip_chunk { - sctp_chunkhdr_t chunk_hdr; + struct sctp_chunkhdr chunk_hdr; sctp_addiphdr_t addip_hdr; } sctp_addip_chunk_t; @@ -709,7 +709,7 @@ typedef struct sctp_authhdr { } sctp_authhdr_t; typedef struct sctp_auth_chunk { - sctp_chunkhdr_t chunk_hdr; + struct sctp_chunkhdr chunk_hdr; sctp_authhdr_t auth_hdr; } sctp_auth_chunk_t; @@ -719,7 +719,7 @@ struct sctp_infox { }; struct sctp_reconf_chunk { - sctp_chunkhdr_t chunk_hdr; + struct sctp_chunkhdr chunk_hdr; __u8 params[0]; }; diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index 069582e..d756bd0 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -478,7 +478,7 @@ _sctp_walk_errors((err), (chunk_hdr), ntohs((chunk_hdr)->length)) #define _sctp_walk_errors(err, chunk_hdr, end)\ for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \ - sizeof(sctp_chunkhdr_t));\ + sizeof(struct sctp_chunkhdr));\ (void *)err <= (void *)chunk_hdr + end - ntohs(err->length)
[PATCH net-next 00/11] sctp: remove typedefs from structures part 1
As we know, typedefs are discouraged in the kernel; checkpatch.pl even warns about them. sctp currently uses them for many structures, and all of these typedefs should be removed. As the first part, this patchset removes them for 11 basic structures in linux/sctp.h. It also fixes some indentation. No code logic is changed by these patches; they are pure cleanup.

Xin Long (11):
  sctp: remove the typedef sctp_sctphdr_t
  sctp: remove the typedef sctp_chunkhdr_t
  sctp: remove the typedef sctp_cid_t
  sctp: remove the typedef sctp_cid_action_t
  sctp: remove the typedef sctp_paramhdr_t
  sctp: remove the typedef sctp_param_t
  sctp: remove the typedef sctp_param_action_t
  sctp: remove the typedef sctp_datahdr_t
  sctp: remove the typedef sctp_data_chunk_t
  sctp: remove the typedef sctp_inithdr_t
  sctp: remove the typedef sctp_init_chunk_t

 include/linux/sctp.h                    | 118 ++--
 include/net/sctp/auth.h                 |   6 +-
 include/net/sctp/command.h              |   4 +-
 include/net/sctp/constants.h            |   6 +-
 include/net/sctp/sctp.h                 |   4 +-
 include/net/sctp/sm.h                   |  16 ++--
 include/net/sctp/structs.h              |   9 ++-
 net/netfilter/ipset/ip_set_getport.c    |   4 +-
 net/netfilter/ipvs/ip_vs_core.c         |  10 +--
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |  21 +++--
 net/netfilter/nf_conntrack_proto_sctp.c |  33 
 net/netfilter/nf_nat_proto_sctp.c       |   2 +-
 net/netfilter/xt_sctp.c                 |  20 ++---
 net/sctp/associola.c                    |   6 +-
 net/sctp/auth.c                         |  28 ---
 net/sctp/endpointola.c                  |   7 +-
 net/sctp/input.c                        |  24 +++---
 net/sctp/inqueue.c                      |  15 ++--
 net/sctp/output.c                       |   4 +-
 net/sctp/sm_make_chunk.c                |  80 +--
 net/sctp/sm_sideeffect.c                |   7 +-
 net/sctp/sm_statefuns.c                 | 132 
 net/sctp/sm_statetable.c                |   4 +-
 net/sctp/socket.c                       |   7 +-
 net/sctp/stream.c                       |   4 +-
 net/sctp/ulpevent.c                     |   2 +-
 net/sctp/ulpqueue.c                     |   2 +-
 27 files changed, 292 insertions(+), 283 deletions(-)
-- 
2.1.0
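The mechanical pattern this series applies everywhere can be seen in miniature below. This is a hedged userspace sketch: uint16_t/uint32_t stand in for the kernel's __be16/__be32/__le32, and parse_common_header() is an invented caller, not code from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Before: typedef struct sctphdr { ... } sctp_sctphdr_t;
 * After:  a plain tagged struct, spelled "struct sctphdr" at each use. */
struct sctphdr {
	uint16_t source;
	uint16_t dest;
	uint32_t vtag;
	uint32_t checksum;
};

/* Callers now name the struct directly; the series also prefers
 * sizeof(variable) over sizeof(type), as below. */
static int parse_common_header(const void *buf, struct sctphdr *out)
{
	memcpy(out, buf, sizeof(*out));	/* sizeof(*out), not sizeof(struct sctphdr) */
	return (int)sizeof(*out);
}
```

The point of the cleanup is purely stylistic: the type's identity is unchanged, so every hunk in the series is a textual substitution.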
[PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
The __vmalloc function takes a gfp_mask parameter with the allocation flags; however, it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The pages are allocated with the specified gfp flags, but the pagetables are always allocated with GFP_KERNEL. This allocation can cause unexpected recursion into the filesystem or I/O subsystem.

It is not practical to extend the page table allocation routines with gfp flags, because that would require modifying architecture-specific code in all architectures. However, a process can temporarily request that all its allocations are done with GFP_NOFS or GFP_NOIO with the functions memalloc_nofs_save and memalloc_noio_save.

This patch makes the vmalloc code use memalloc_nofs_save or memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c, fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/ and fs/nfs/blocklayout/extent_tree.c, where __vmalloc is used with the GFP_NOFS flag. The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c by removing explicit calls to memalloc_nofs_save and memalloc_noio_save before the call to __vmalloc.
Signed-off-by: Mikulas Patocka--- drivers/md/dm-bufio.c | 24 +--- drivers/md/dm-ioctl.c |6 +- fs/xfs/kmem.c | 14 -- mm/util.c |6 +++--- mm/vmalloc.c | 18 +- 5 files changed, 22 insertions(+), 46 deletions(-) Index: linux-2.6/mm/vmalloc.c === --- linux-2.6.orig/mm/vmalloc.c +++ linux-2.6/mm/vmalloc.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct unsigned int nr_pages, array_size, i; const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN; + unsigned noio_flag; + int r; nr_pages = get_vm_area_size(area) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct cond_resched(); } - if (map_vm_area(area, prot, pages)) + if (unlikely(!(gfp_mask & __GFP_IO))) + noio_flag = memalloc_noio_save(); + else if (unlikely(!(gfp_mask & __GFP_FS))) + noio_flag = memalloc_nofs_save(); + + r = map_vm_area(area, prot, pages); + + if (unlikely(!(gfp_mask & __GFP_IO))) + memalloc_noio_restore(noio_flag); + else if (unlikely(!(gfp_mask & __GFP_FS))) + memalloc_nofs_restore(noio_flag); + + if (unlikely(r)) goto fail; + return area->addr; fail: Index: linux-2.6/mm/util.c === --- linux-2.6.orig/mm/util.c +++ linux-2.6/mm/util.c @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f void *ret; /* -* vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables) -* so the given set of flags has to be compatible. +* vmalloc uses blocking allocations for some internal allocations +* (e.g page tables) so the given set of flags has to be compatible. 
*/ - WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); + WARN_ON_ONCE(!gfpflags_allow_blocking(flags)); /* * We want to attempt a large physically contiguous block first because Index: linux-2.6/drivers/md/dm-bufio.c === --- linux-2.6.orig/drivers/md/dm-bufio.c +++ linux-2.6/drivers/md/dm-bufio.c @@ -386,9 +386,6 @@ static void __cache_size_refresh(void) static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask, enum data_mode *data_mode) { - unsigned noio_flag; - void *ptr; - if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) { *data_mode = DATA_MODE_SLAB; return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask); @@ -402,26 +399,7 @@ static void *alloc_buffer_data(struct dm } *data_mode = DATA_MODE_VMALLOC; - - /* -* __vmalloc allocates the data pages and auxiliary structures with -* gfp_flags that were specified, but pagetables are always allocated -* with GFP_KERNEL, no matter what was specified as gfp_mask. -* -* Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that -* all allocations done by this process (including pagetables) are done -* as if GFP_NOIO was specified. -*/ - - if (gfp_mask & __GFP_NORETRY) - noio_flag = memalloc_noio_save(); - - ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL); - - if (gfp_mask &
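The dm-bufio hunk above removes an open-coded version of exactly this save/restore dance. The pattern itself can be modeled in userspace; the task-flag mechanics below are a simplified stand-in for the kernel's memalloc_noio_save()/memalloc_noio_restore() helpers, not the real implementation.

```c
#include <assert.h>

/* Model of the per-task flag: while it is set, every nested allocation
 * (page tables included) must behave as if GFP_NOIO were passed. */
#define PF_MEMALLOC_NOIO 0x1

static unsigned int current_flags;	/* stand-in for current->flags */

/* Set the flag and return its previous state so scopes can nest. */
static unsigned int memalloc_noio_save(void)
{
	unsigned int old = current_flags & PF_MEMALLOC_NOIO;

	current_flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Restore only the saved bit; an enclosing NOIO scope stays in force. */
static void memalloc_noio_restore(unsigned int old)
{
	current_flags = (current_flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Stand-in for map_vm_area(): reports whether I/O would be allowed. */
static int nested_alloc_allows_io(void)
{
	return !(current_flags & PF_MEMALLOC_NOIO);
}
```

The save/restore (rather than set/clear) shape is what makes the patch safe when __vmalloc is itself called from a section that already dropped __GFP_IO.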
Re: kernel (master) build failure w. !CONFIG_NET_RX_BUSY_POLL
On Thu, 2017-06-29 at 15:35 -0400, David Miller wrote:
> From: Mike Galbraith
> Date: Wed, 28 Jun 2017 10:01:00 +0200
> 
> > Greetings network wizards,
> > 
> > The latest RT explicitly disables CONFIG_NET_RX_BUSY_POLL, thus
> > uncovering $subject. Below is what I did about it.
> > 
> > -Mike
> > 
> > net: Move napi_hash_add/del() inside CONFIG_NET_RX_BUSY_POLL
> > 
> > Since 545cd5e5ec54 ("net: Busy polling should ignore sender CPUs"),
> > kernel build fails when CONFIG_NET_RX_BUSY_POLL is disabled. Move
> > napi_hash_add/del() accordingly.
> > 
> > Banged-upon-by: Mike Galbraith
> 
> First, this is in no way a standard signoff or ack tag.

It's not intended to be; it's intended to stress that this is not a submission, it's a local fix for a problem that network folks will likely fix up differently... but who knows, showing what I did about it after taking a look could perhaps save a wee bit of labor, so... Hohum, so much for good intentions. I thought I was clearly showing that separation of the polling business was not as complete as other changes led me to believe it is intended to be.

> Second, you must provide the complete set of build failure details
> so that anyone just reading your posting can understand and fully
> review your patch without having to go anywhere else for the
> pertinent information.

Here's the naked failure instead.

  CC      net/core/dev.o
net/core/dev.c: In function 'napi_hash_add':
net/core/dev.c:5315:43: error: 'MIN_NAPI_ID' undeclared (first use in this function)
   if (unlikely(++napi_gen_id < MIN_NAPI_ID))
                                ^
net/core/dev.c:5315:43: note: each undeclared identifier is reported only once for each function it appears in
scripts/Makefile.build:302: recipe for target 'net/core/dev.o' failed
make[1]: *** [net/core/dev.o] Error 1
Makefile:1663: recipe for target 'net/core/dev.o' failed
make: *** [net/core/dev.o] Error 2

-Mike
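For context on the line the compiler flags: MIN_NAPI_ID reserves the low NAPI ids so busy-poll code can tell valid ids apart, and 545cd5e5ec54 defines it under the busy-poll header, which is why napi_hash_add() stops building once CONFIG_NET_RX_BUSY_POLL is off. A userspace sketch of that wrap-around logic follows; the constant's value here is a placeholder, not the kernel's (upstream derives it from NR_CPUS).

```c
#include <assert.h>

/* Placeholder for the reserved-id threshold from include/net/busy_poll.h. */
#define MIN_NAPI_ID 16U

static unsigned int napi_gen_id = (unsigned int)-1;	/* force a wrap */

/* Simplified id generator from napi_hash_add(): after the counter wraps,
 * skip the reserved [0, MIN_NAPI_ID) range. */
static unsigned int next_napi_id(void)
{
	if (++napi_gen_id < MIN_NAPI_ID)
		napi_gen_id = MIN_NAPI_ID;
	return napi_gen_id;
}
```

Nothing in this logic needs busy polling, which is Mike's point: only the constant lives behind the config option, so either the constant or the hash add/del pair has to move.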
Re: [PATCH] mm: convert three more cases to kvmalloc
On Thu, 29 Jun 2017, Michal Hocko wrote:

> On Wed 28-06-17 23:24:10, Mikulas Patocka wrote:
> [...]
> > From: Mikulas Patocka
> >
> > The patch a7c3e901 ("mm: introduce kv[mz]alloc helpers") converted a lot
> > of kernel code to kvmalloc. This patch converts three more forgotten
> > cases.
>
> Thanks! I have two remarks below but other than that feel free to add
>
> > Signed-off-by: Mikulas Patocka
>
> Acked-by: Michal Hocko
> [...]
> > Index: linux-2.6/kernel/bpf/syscall.c
> > ===
> > --- linux-2.6.orig/kernel/bpf/syscall.c
> > +++ linux-2.6/kernel/bpf/syscall.c
> > @@ -58,16 +58,7 @@ void *bpf_map_area_alloc(size_t size)
> >  	 * trigger under memory pressure as we really just want to
> >  	 * fail instead.
> >  	 */
> > -	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> > -	void *area;
> > -
> > -	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> > -		area = kmalloc(size, GFP_USER | flags);
> > -		if (area != NULL)
> > -			return area;
> > -	}
> > -
> > -	return __vmalloc(size, GFP_KERNEL | flags, PAGE_KERNEL);
> > +	return kvmalloc(size, GFP_USER | __GFP_NOWARN | __GFP_NORETRY |
> > 			__GFP_ZERO);
>
> kvzalloc without additional flags would be more appropriate.
> __GFP_NORETRY is explicitly documented as non-supported

How is __GFP_NORETRY non-supported?

> and NOWARN wouldn't be applied everywhere in the vmalloc path.

__GFP_NORETRY and __GFP_NOWARN wouldn't be applied in the page-table allocation, and they would be applied in the page allocation - that seems acceptable.

But the problem here is that if the system is under memory stress, __GFP_NORETRY allocations would randomly fail (they would fail, for example, if there's plenty of free swap space and the system is busy swapping), and that would make the BPF creation code randomly fail.

BPF maintainers, please explain: how are you dealing with the random memory allocation failures? Is there some other code in the BPF stack that retries the failed allocations?
> > }
> >
> > void bpf_map_area_free(void *area)

> > Index: linux-2.6/kernel/cgroup/cgroup-v1.c
> > ===
> > --- linux-2.6.orig/kernel/cgroup/cgroup-v1.c
> > +++ linux-2.6/kernel/cgroup/cgroup-v1.c
> > @@ -184,15 +184,10 @@ struct cgroup_pidlist {
> >  /*
> >   * The following two functions "fix" the issue where there are more pids
> >   * than kmalloc will give memory for; in such cases, we use vmalloc/vfree.
> > - * TODO: replace with a kernel-wide solution to this problem
> >   */
> > -#define PIDLIST_TOO_LARGE(c) ((c) * sizeof(pid_t) > (PAGE_SIZE * 2))
> >  static void *pidlist_allocate(int count)
> >  {
> > -	if (PIDLIST_TOO_LARGE(count))
> > -		return vmalloc(count * sizeof(pid_t));
> > -	else
> > -		return kmalloc(count * sizeof(pid_t), GFP_KERNEL);
> > +	return kvmalloc(count * sizeof(pid_t), GFP_KERNEL);
> >  }
>
> I would rather use kvmalloc_array to have an overflow protection as
> well.

Yes.

Mikulas

> >  static void pidlist_free(void *p)
>
> --
> Michal Hocko
> SUSE Labs
>
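Michal's kvmalloc_array suggestion is about overflow protection: `count * sizeof(pid_t)` can wrap and silently allocate a too-small buffer, while the array helper refuses the multiplication instead. A userspace sketch of the check it performs (the real helper lives in the kernel's mm headers; alloc_array_checked is an invented name):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of the given size, failing cleanly if n * size
 * would overflow size_t instead of allocating a truncated buffer. */
static void *alloc_array_checked(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */
	return malloc(n * size);
}
```

With the check in the helper, every converted call site like pidlist_allocate() gets the protection for free rather than each open-coding it.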
Re: RFC: bring UP 'lo' by default after creating new netns
"Mahesh Bandewar (महेश बंडेवार)" writes:

> Creation of new network namespace is almost always followed up by
> bringing up the loopback device.
>
>   ip netns add foo
>   ip -netns foo link set lo up
>
> I'm not sure if there are any consequences of bringing the device UP
> at the creation of network-namespace.

Hard coded in net/core/net_namespace.c:copy_net_ns is definitely the wrong place in the code for something like this. If this lives anywhere it should live in drivers/net/loopback.c, or possibly in net/core/dev.c:net_dev_init.

If we want this, we want to match what we do when we set up the primary network namespace, just so that there are no unneeded surprises with network namespaces.

Eric
Re: [PATCH v1 1/2] dt-binding: ptp: add bindings document for dte based ptp clock
Hi Rob, On 17-06-29 03:40 PM, Rob Herring wrote: On Tue, Jun 27, 2017 at 12:10 AM, Scott Brandenwrote: Hi Rob/Florian, Thanks for input but still don't see any need for SoC specific compatible stings. IP revision specific yes. On 17-06-22 06:04 PM, Florian Fainelli wrote: On 06/22/2017 05:42 PM, Scott Branden wrote: On 17-06-21 08:19 PM, Rob Herring wrote: On Tue, Jun 20, 2017 at 3:48 PM, Scott Branden wrote: Hi Rob, On 17-06-18 07:04 AM, Rob Herring wrote: On Mon, Jun 12, 2017 at 01:26:00PM -0700, Arun Parameswaran wrote: Add device tree binding documentation for the Broadcom DTE PTP clock driver. Signed-off-by: Arun Parameswaran --- Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt | 13 + 1 file changed, 13 insertions(+) create mode 100644 Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt diff --git a/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt b/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt new file mode 100644 index 000..07590bc --- /dev/null +++ b/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt @@ -0,0 +1,13 @@ +* Broadcom Digital Timing Engine(DTE) based PTP clock driver Bindings describe h/w, not drivers. + +Required properties: +- compatible: should be "brcm,ptp-dte" Looks too generic. You need SoC specific compatible strings. Rob, could you please help me understand the use of adding SoC specific compatible strings. I still don't get it. It's my understanding that the SoC compatibility string is to future proof against bugs/incompatibilities between different versions of the hardware block due to integration issues or any other reason. You can then compare in your driver because the strings were already used in the dtb. That would make sense if you can't already differentiate what SoC you are running on. But the SoC is already specified in the root of the device tree in the compatible string? Why can't you just use of_machine_is_compatible inside your driver when needed? 
Use of of_machine_is_compatible in drivers will result in the same mess we had with machine_is_X defines pre-DT. It practically guarantees that you must update the driver for a new SoC (with fallback compatibles you don't). Plus the matching logic for of_machine_is_compatible is open coded logic in every driver which is worse IMO than having a standard match table. I don't understand what you mean by fallback compatible then. Let's say I have 3 SoCs that each contain the same ip block. You want us to add a fallback compatibility per SoC, is that correct? I think Rob meant a fallback compatibility for the particular block. E.g: brcm,iproc-ptp is the fallback compatible string, but in your SoC-specific DTS, you would have at least: compatible = "brcm,cygnus-ptp", "brcm,iproc-ptp"; Where cygnus-ptp is more specific than iproc-ptp Then, if there is a workaround discovered in a particular SoC the driver can be updated in the future without changing the dtb. Then, the block gets added to a 4th SoC. You want us to another new compatibility string for the new SoC? If the new SoC has a bug then the driver has to be updated whether it is in uses the fallback compatible or machine_is_compatible string. There is no difference in amount of code added to a driver when a new SoC is introduced into the system that has bugs that need to be handled by the driver. The difference is in your recommendation we need to go through all the drivers used by the new SoC and add fallback compatibility strings. Not really, the fallback is what the driver should be matching by default (hence the name fallback) and if and only if you need to have a SoC-specific behavior in your driver (because of bugs, or slight differences) would you be matching this SoC-specific compatible string to capture that and key the driver behavior based off that. Then, we have to modify all the devicetree documentation for all the drivers. 
Then, we have to ensure that all dts files populate this new fallback string (even if it is unused). We don't see the benefit in doing any of that. Using machine_is_compatible and having less compatibility strings to deal appears much cleaner and more foolproof for all situations. When you introduce a new SoC, you would update all the bindings for the devices (not drivers) that are susceptible to be used by this SoC. If all goes well, your SoC has little bugs, and your drivers don't even need to see an update because they are already matching the fallback compatible string. That's what I understand from the suggestion but I may be totally off rails here. In this particular case, the IP is integrated within and outside a family of SoCs (the exact same block is in Capri for example). The only fallback compatablity string that makes sense is "brcm, dte-ptp-v1". When we integrate a new version of this IP we would add "brcm, dte-ptp-v2". Unless you
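Florian's fallback scheme can be made concrete with a toy matcher. This is a deliberately simplified stand-in for the kernel's of_match_node() (which additionally scores matches); the compatible strings are the ones from the thread, and the "new SoC" table entry is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A DT node lists compatibles most-specific first; a driver advertises
 * the compatibles it handles. Matching prefers the node's earliest
 * (most specific) string that the driver knows about. */
static const char *match_compatible(const char *node[], int n_node,
				    const char *table[], int n_tbl)
{
	for (int i = 0; i < n_node; i++)
		for (int j = 0; j < n_tbl; j++)
			if (strcmp(node[i], table[j]) == 0)
				return node[i];
	return NULL;	/* no driver binds */
}
```

The payoff: a DTS carrying `compatible = "brcm,cygnus-ptp", "brcm,iproc-ptp";` binds to a driver that only lists the fallback, so a new SoC reusing the block needs no driver change; only when a SoC-specific quirk appears does the driver add the more specific string and key its behavior off which entry matched.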
linux-next: manual merge of the net-next tree with Linus' tree
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/rocker/rocker_ofdpa.c

between commit:

  acb4b7df48b5 ("rocker: move dereference before free")

from Linus' tree and commit:

  00fc0c51e35b ("rocker: Change world_ops API and implementation to be switchdev independant")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non-trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts.

--
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/rocker/rocker_ofdpa.c
index a9ce82d3e9cf,bd0e3f157e9e..
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@@ -1505,10 -1409,10 +1409,10 @@@ static int ofdpa_port_ipv4_nh(struct of
  		*index = entry->index;
  		resolved = false;
  	} else if (removing) {
- 		ofdpa_neigh_del(found);
  		*index = found->index;
 -		ofdpa_neigh_del(trans, found);
++		ofdpa_neigh_del(found);
  	} else if (updating) {
 -		ofdpa_neigh_update(found, trans, NULL, false);
 +		ofdpa_neigh_update(found, NULL, false);
  		resolved = !is_zero_ether_addr(found->eth_dst);
  		*index = found->index;
  	} else {
Re: [PATCH RFC 0/2] kproxy: Kernel Proxy
On 29 June 2017 at 16:21, Tom Herbert wrote:
> I think the main part of that discussion was around stream parser
> which is needed for message delineation. For a 1:1 proxy, KCM is
> probably overkill (the whole KCM data path and lock becomes
> superfluous). Also, there's no concept of creating a whole message
> before routing it; in the 1:1 case we should let the message pass
> through once it's cleared by the filter (this is the strparser change
> I referred to). As I mentioned, for L7 load balancing we would want a
> multiplexor, probably also M:N, but the structure is different since
> there are still no user-facing sockets; they're all TCP for instance.
> IMO, the 1:1 proxy case is compelling to solve in itself...

I see. I was definitely thinking m:n. We should definitely evaluate whether it makes sense to have a specific 1:1 implementation if we need m:n anyway. For L7 LB, m:n seems obvious, as a particular L4 connection may act as a transport for multiple requests bidirectionally. KCM looks like a good starting point for that.

When I talked about enqueueing entire messages, the main concern is to buffer up the payload after the TLS handshake to the point where a forwarding decision can be made. I would definitely not advocate buffering entire messages before starting to forward.
[PATCH net-next v3] datapath: Avoid using stack larger than 1024.
From: Tonghao Zhang

When compiling OvS-master on 4.4.0-81 kernel, there is a warning:

  CC [M]  /root/ovs/datapath/linux/datapath.o
  /root/ovs/datapath/linux/datapath.c: In function 'ovs_flow_cmd_set':
  /root/ovs/datapath/linux/datapath.c:1221:1: warning: the frame size of
  1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]

This patch factors out match-init and action-copy to avoid the "Wframe-larger-than=1024" warning. Because the mask is only used to get actions, we add a new function to save some stack space.

Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
---
 net/openvswitch/datapath.c | 81 ++---
 1 file changed, 58 insertions(+), 23 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c85029c..fb9f114 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1100,6 +1100,58 @@ static struct sw_flow_actions *get_flow_actions(struct net *net,
 	return acts;
 }
 
+/* Factor out match-init and action-copy to avoid
+ * "Wframe-larger-than=1024" warning. Because mask is only
+ * used to get actions, we add a new function to save some
+ * stack space.
+ *
+ * If there are no key and action attrs, we return 0
+ * directly. In the case, the caller will also not use the
+ * match as before. If there is action attr, we try to get
+ * actions and save them to *acts. Before returning from
+ * the function, we reset the match->mask pointer. Because
+ * we should not return match object with dangling reference
+ * to mask.
+ */
+static int ovs_nla_init_match_and_action(struct net *net,
+					 struct sw_flow_match *match,
+					 struct sw_flow_key *key,
+					 struct nlattr **a,
+					 struct sw_flow_actions **acts,
+					 bool log)
+{
+	struct sw_flow_mask mask;
+	int error = 0;
+
+	if (a[OVS_FLOW_ATTR_KEY]) {
+		ovs_match_init(match, key, true, &mask);
+		error = ovs_nla_get_match(net, match, a[OVS_FLOW_ATTR_KEY],
+					  a[OVS_FLOW_ATTR_MASK], log);
+		if (error)
+			goto error;
+	}
+
+	if (a[OVS_FLOW_ATTR_ACTIONS]) {
+		if (!a[OVS_FLOW_ATTR_KEY]) {
+			OVS_NLERR(log,
+				  "Flow key attribute not present in set flow.");
+			return -EINVAL;
+		}
+
+		*acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], key,
+					 &mask, log);
+		if (IS_ERR(*acts)) {
+			error = PTR_ERR(*acts);
+			goto error;
+		}
+	}
+
+	/* On success, error is 0. */
+error:
+	match->mask = NULL;
+	return error;
+}
+
 static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 {
 	struct net *net = sock_net(skb->sk);
@@ -1107,7 +1159,6 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	struct ovs_header *ovs_header = info->userhdr;
 	struct sw_flow_key key;
 	struct sw_flow *flow;
-	struct sw_flow_mask mask;
 	struct sk_buff *reply = NULL;
 	struct datapath *dp;
 	struct sw_flow_actions *old_acts = NULL, *acts = NULL;
@@ -1119,34 +1170,18 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	bool ufid_present;
 
 	ufid_present = ovs_nla_get_ufid(&sfid, a[OVS_FLOW_ATTR_UFID], log);
-	if (a[OVS_FLOW_ATTR_KEY]) {
-		ovs_match_init(&match, &key, true, &mask);
-		error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
-					  a[OVS_FLOW_ATTR_MASK], log);
-	} else if (!ufid_present) {
+	if (!a[OVS_FLOW_ATTR_KEY] && !ufid_present) {
 		OVS_NLERR(log,
 			  "Flow set message rejected, Key attribute missing.");
-		error = -EINVAL;
+		return -EINVAL;
 	}
+
+	error = ovs_nla_init_match_and_action(net, &match, &key, a,
+					      &acts, log);
 	if (error)
 		goto error;
 
-	/* Validate actions. */
-	if (a[OVS_FLOW_ATTR_ACTIONS]) {
-		if (!a[OVS_FLOW_ATTR_KEY]) {
-			OVS_NLERR(log,
-				  "Flow key attribute not present in set flow.");
-			error = -EINVAL;
-			goto error;
-		}
-
-		acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], &key,
-					&mask, log);
-		if (IS_ERR(acts)) {
-			error = PTR_ERR(acts);
-			goto error;
-
Re: [PATCH RFC 25/26] tile: Remove spin_unlock_wait() arch-specific definitions
On Thu, Jun 29, 2017 at 05:10:41PM -0700, Linus Torvalds wrote:
> On Thu, Jun 29, 2017 at 5:06 PM, Linus Torvalds wrote:
> >
> > Please don't make this one commit for every architecture.
> >
> > Once something gets removed, it gets removed. There's no point in
> > "remove it from architecture X". If there are no more users, we're
> > done with it, and making it be 25 patches with the same commit message
> > instead of just one doesn't help anybody.
>
> Just to clarify: I think the actual *users* are worth doing one by
> one, particularly if there are user-specific explanations of what that
> particular code wanted, and why spin_unlock_wait() doesn't really
> help.

Got it, and I did merge -only- the arch-specific definition removals into one commit. Should I also merge the core-code definition removals into that same commit, or is it OK to remove the core-code definitions with one commit and the arch-specific definitions with another? (My guess is that you would prefer I removed -all- definitions with one commit, including the core-kernel definitions, but at this point I figure I should just ask.)

> And I think that you actually have those per-user insights by now,
> after looking at the long thread.

One Acked-by thus far, so some progress!

> So I'm not saying "do one patch for the whole series". One patch per
> removal of use is fine - in fact preferred.

Got it. It allows the developers and maintainers to tell me where my analysis is wrong, for one thing. ;-)

> But once there are no actual more users, just remove all the
> architecture definitions in one go, because explaining the same thing
> several times doesn't actually help anything.
>
> In fact, *if* we end up ever resurrecting that thing, it's good if we
> can resurrect it in one go. Then we can resurrect the one or two users
> that turned out to matter after all and could come up with why some
> particular ordering was ok too.

Understood!

							Thanx, Paul
Re: [PATCH RFC 25/26] tile: Remove spin_unlock_wait() arch-specific definitions
On Thu, Jun 29, 2017 at 05:09:56PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 29, 2017 at 05:06:16PM -0700, Linus Torvalds wrote:
> > On Thu, Jun 29, 2017 at 5:01 PM, Paul E. McKenney wrote:
> > > There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> > > and it appears that all callers could do just as well with a lock/unlock
> > > pair. This commit therefore removes the underlying arch-specific
> > > arch_spin_unlock_wait().
> >
> > Please don't make this one commit for every architecture.
> >
> > Once something gets removed, it gets removed. There's no point in
> > "remove it from architecture X". If there are no more users, we're
> > done with it, and making it be 25 patches with the same commit message
> > instead of just one doesn't help anybody.
>
> Apologies, I will merge them.

I suppose that I could have sent the merged patch... Please see below.

							Thanx, Paul

commit e81efa4219e9b07e476448572aafe8f9c4ad28c8
Author: Paul E. McKenney
Date:   Thu Jun 29 15:53:02 2017 -0700

    arch: Remove spin_unlock_wait() arch-specific definitions

    There is no agreed-upon definition of spin_unlock_wait()'s semantics,
    and it appears that all callers could do just as well with a lock/unlock
    pair. This commit therefore removes the underlying arch-specific
    arch_spin_unlock_wait() for all architectures providing them.

    Signed-off-by: Paul E. McKenney
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: Alan Stern
    Cc: Andrea Parri
    Cc: Linus Torvalds

diff --git a/arch/alpha/include/asm/spinlock.h b/arch/alpha/include/asm/spinlock.h
index a40b9fc0c6c3..718ac0b64adf 100644
--- a/arch/alpha/include/asm/spinlock.h
+++ b/arch/alpha/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_lock_flags(lock, flags)   arch_spin_lock(lock)
 #define arch_spin_is_locked(x)	((x)->lock != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
 	return lock.lock == 0;
diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 233d5ffe6ec7..a325e6a36523 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_is_locked(x)	((x)->slock != __ARCH_SPIN_LOCK_UNLOCKED__)
 #define arch_spin_lock_flags(lock, flags)	arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #ifdef CONFIG_ARC_HAS_LLSC
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 4bec45442072..c030143c18c6 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -52,22 +52,6 @@ static inline void dsb_sev(void)
  * memory.
  */
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u16 owner = READ_ONCE(lock->tickets.owner);
-
-	for (;;) {
-		arch_spinlock_t tmp = READ_ONCE(*lock);
-
-		if (tmp.tickets.owner == tmp.tickets.next ||
-		    tmp.tickets.owner != owner)
-			break;
-
-		wfe();
-	}
-	smp_acquire__after_ctrl_dep();
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cae331d553f8..f445bd7f2b9f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -26,58 +26,6 @@
  * The memory barriers are implicit with the load-acquire and store-release
  * instructions.
  */
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	unsigned int tmp;
-	arch_spinlock_t lockval;
-	u32 owner;
-
-	/*
-	 * Ensure prior spin_lock operations to other locks have completed
-	 * on this CPU before we test whether "lock" is locked.
-	 */
-	smp_mb();
-	owner = READ_ONCE(lock->owner) << 16;
-
-	asm volatile(
-"	sevl\n"
-"1:	wfe\n"
-"2:	ldaxr	%w0, %2\n"
-	/* Is the lock free? */
-"	eor	%w1, %w0, %w0, ror #16\n"
-"	cbz	%w1, 3f\n"
-	/* Lock taken -- has there been a subsequent unlock->lock transition? */
-"	eor	%w1, %w3, %w0, lsl #16\n"
-"	cbz	%w1, 1b\n"
-	/*
-	 * The owner has been updated, so there was an unlock->lock
-	 * transition that we missed. That
[PATCH RFC 05/26] exit: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in do_exit() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because the lock is a per-task lock, and this is happening only at task-exit time.

Signed-off-by: Paul E. McKenney
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 kernel/exit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 516acdb0e0ec..1a976e47ddd1 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -832,7 +832,8 @@ void __noreturn do_exit(long code)
 	 * Ensure that we must observe the pi_state in exit_mm() ->
 	 * mm_release() -> exit_pi_state_list().
 	 */
-	raw_spin_unlock_wait(&tsk->pi_lock);
+	raw_spin_lock(&tsk->pi_lock);
+	raw_spin_unlock(&tsk->pi_lock);
 
 	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
-- 
2.5.2
[PATCH RFC 17/26] metag: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: James Hogan
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/metag/include/asm/spinlock.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/metag/include/asm/spinlock.h b/arch/metag/include/asm/spinlock.h
index c0c7a22be1ae..ddf7fe5708a6 100644
--- a/arch/metag/include/asm/spinlock.h
+++ b/arch/metag/include/asm/spinlock.h
@@ -15,11 +15,6 @@
  * locked.
  */
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 #define	arch_read_lock_flags(lock, flags) arch_read_lock(lock)
-- 
2.5.2
[PATCH RFC 10/26] arc: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: Vineet Gupta
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/arc/include/asm/spinlock.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 233d5ffe6ec7..a325e6a36523 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_is_locked(x)	((x)->slock != __ARCH_SPIN_LOCK_UNLOCKED__)
 #define arch_spin_lock_flags(lock, flags)	arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #ifdef CONFIG_ARC_HAS_LLSC
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
-- 
2.5.2
[PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in task_work_run() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because calls to the other side of the race, task_work_cancel(), should be rare.

Signed-off-by: Paul E. McKenney
Cc: Oleg Nesterov
Cc: Andrew Morton
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 kernel/task_work.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index d513051fcca2..b9b428832229 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -109,7 +109,8 @@ void task_work_run(void)
 		 * the first entry == work, cmpxchg(task_works) should
 		 * fail, but it can play with *work and other entries.
 		 */
-		raw_spin_unlock_wait(&task->pi_lock);
+		raw_spin_lock(&task->pi_lock);
+		raw_spin_unlock(&task->pi_lock);
 
 		do {
 			next = work->next;
-- 
2.5.2
[PATCH RFC 16/26] m32r: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/m32r/include/asm/spinlock.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/m32r/include/asm/spinlock.h b/arch/m32r/include/asm/spinlock.h
index 323c7fc953cd..a56825592b90 100644
--- a/arch/m32r/include/asm/spinlock.h
+++ b/arch/m32r/include/asm/spinlock.h
@@ -30,11 +30,6 @@
 #define arch_spin_is_locked(x)	(*(volatile int *)(&(x)->slock) <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, VAL > 0);
-}
-
 /**
  * arch_spin_trylock - Try spin lock and return a result
  * @lock: Pointer to the lock variable
-- 
2.5.2
[PATCH RFC 25/26] tile: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: Chris Metcalf
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/tile/include/asm/spinlock_32.h |  2 --
 arch/tile/include/asm/spinlock_64.h |  2 --
 arch/tile/lib/spinlock_32.c         | 23 ---
 arch/tile/lib/spinlock_64.c         | 22 --
 4 files changed, 49 deletions(-)

diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index b14b1ba5bf9c..cba8ba9b8da6 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -64,8 +64,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->current_ticket = old_ticket + TICKET_QUANTUM;
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index b9718fb4e74a..9a2c2d605752 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -58,8 +58,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__insn_fetchadd4(&lock->lock, 1U << __ARCH_SPIN_CURRENT_SHIFT);
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 void arch_spin_lock_slow(arch_spinlock_t *lock, u32 val);
 
 /* Grab the "next" ticket number and bump it atomically.
diff --git a/arch/tile/lib/spinlock_32.c b/arch/tile/lib/spinlock_32.c
index 076c6cc43113..db9333f2447c 100644
--- a/arch/tile/lib/spinlock_32.c
+++ b/arch/tile/lib/spinlock_32.c
@@ -62,29 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	int curr = READ_ONCE(lock->current_ticket);
-	int next = READ_ONCE(lock->next_ticket);
-
-	/* Return immediately if unlocked. */
-	if (next == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (READ_ONCE(lock->current_ticket) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
-
 /*
  * The low byte is always reserved to be the marker for a "tns" operation
  * since the low bit is set to "1" by a tns. The next seven bits are
diff --git a/arch/tile/lib/spinlock_64.c b/arch/tile/lib/spinlock_64.c
index a4b5b2cbce93..de414c22892f 100644
--- a/arch/tile/lib/spinlock_64.c
+++ b/arch/tile/lib/spinlock_64.c
@@ -62,28 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	u32 val = READ_ONCE(lock->lock);
-	u32 curr = arch_spin_current(val);
-
-	/* Return immediately if unlocked. */
-	if (arch_spin_next(val) == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
 
 /*
  * If the read lock fails due to a writer, we retry periodically
-- 
2.5.2
[PATCH RFC 03/26] sched: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in do_task_dead() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because the lock is this task's ->pi_lock, and this is called only after the task exits.

Signed-off-by: Paul E. McKenney
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e91138fcde86..6dea3d9728c8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3461,7 +3461,8 @@ void __noreturn do_task_dead(void)
 	 * is held by try_to_wake_up()
 	 */
 	smp_mb();
-	raw_spin_unlock_wait(&current->pi_lock);
+	raw_spin_lock(&current->pi_lock);
+	raw_spin_unlock(&current->pi_lock);
 
 	/* Causes final put_task_struct in finish_task_switch(): */
 	__set_current_state(TASK_DEAD);
-- 
2.5.2
Re: [PATCH RFC 25/26] tile: Remove spin_unlock_wait() arch-specific definitions
On Thu, Jun 29, 2017 at 5:06 PM, Linus Torvalds wrote:
>
> Please don't make this one commit for every architecture.
>
> Once something gets removed, it gets removed. There's no point in
> "remove it from architecture X". If there are no more users, we're
> done with it, and making it be 25 patches with the same commit message
> instead of just one doesn't help anybody.

Just to clarify: I think the actual *users* are worth doing one by one, particularly if there are user-specific explanations of what that particular code wanted, and why spin_unlock_wait() doesn't really help.

And I think that you actually have those per-user insights by now, after looking at the long thread.

So I'm not saying "do one patch for the whole series". One patch per removal of use is fine - in fact preferred.

But once there are no actual more users, just remove all the architecture definitions in one go, because explaining the same thing several times doesn't actually help anything.

In fact, *if* we end up ever resurrecting that thing, it's good if we can resurrect it in one go. Then we can resurrect the one or two users that turned out to matter after all and could come up with why some particular ordering was ok too.

                 Linus
Re: [PATCH RFC 25/26] tile: Remove spin_unlock_wait() arch-specific definitions
On Thu, Jun 29, 2017 at 05:06:16PM -0700, Linus Torvalds wrote:
> On Thu, Jun 29, 2017 at 5:01 PM, Paul E. McKenney wrote:
> > There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> > and it appears that all callers could do just as well with a lock/unlock
> > pair. This commit therefore removes the underlying arch-specific
> > arch_spin_unlock_wait().
>
> Please don't make this one commit for every architecture.
>
> Once something gets removed, it gets removed. There's no point in
> "remove it from architecture X". If there are no more users, we're
> done with it, and making it be 25 patches with the same commit message
> instead of just one doesn't help anybody.

Apologies, I will merge them.

							Thanx, Paul
[PATCH RFC 04/26] completion: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in completion_done() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because the lock is held only briefly, while the wakeup completes.

Signed-off-by: Paul E. McKenney
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 kernel/sched/completion.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 53f9558fa925..d4b89a59629f 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -307,14 +307,9 @@ bool completion_done(struct completion *x)
 	 * If ->done, we need to wait for complete() to release ->wait.lock
 	 * otherwise we can end up freeing the completion before complete()
 	 * is done referencing it.
-	 *
-	 * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
-	 * the loads of ->done and ->wait.lock such that we cannot observe
-	 * the lock before complete() acquires it while observing the ->done
-	 * after it's acquired the lock.
 	 */
-	smp_rmb();
-	spin_unlock_wait(&x->wait.lock);
+	spin_lock(&x->wait.lock);
+	spin_unlock(&x->wait.lock);
 	return true;
 }
 EXPORT_SYMBOL(completion_done);
-- 
2.5.2
[PATCH RFC 18/26] mips: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: Ralf Baechle
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/mips/include/asm/spinlock.h | 16 
 1 file changed, 16 deletions(-)

diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index a8df44d60607..81b4945031ee 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -50,22 +50,6 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u16 owner = READ_ONCE(lock->h.serving_now);
-	smp_rmb();
-	for (;;) {
-		arch_spinlock_t tmp = READ_ONCE(*lock);
-
-		if (tmp.h.serving_now == tmp.h.ticket ||
-		    tmp.h.serving_now != owner)
-			break;
-
-		cpu_relax();
-	}
-	smp_acquire__after_ctrl_dep();
-}
-
 static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
 	u32 counters = ACCESS_ONCE(lock->lock);
-- 
2.5.2
[PATCH RFC 13/26] blackfin: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: Steven Miao
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/blackfin/include/asm/spinlock.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/blackfin/include/asm/spinlock.h b/arch/blackfin/include/asm/spinlock.h
index c58f4a83ed6f..f6431439d15d 100644
--- a/arch/blackfin/include/asm/spinlock.h
+++ b/arch/blackfin/include/asm/spinlock.h
@@ -48,11 +48,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__raw_spin_unlock_asm(&lock->lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline int arch_read_can_lock(arch_rwlock_t *rw)
 {
 	return __raw_uncached_fetch_asm(&rw->lock) > 0;
-- 
2.5.2
[PATCH RFC 07/26] drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore eliminates the spin_unlock_wait() call and associated else-clause and hoists the then-clause's lock and unlock out of the "if" statement. This should be safe from a performance perspective because according to Tejun there should be few if any drivers that don't set their own error handler.

Signed-off-by: Paul E. McKenney
Acked-by: Tejun Heo
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 drivers/ata/libata-eh.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index ef68232b5222..779f6f18c1f4 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -645,12 +645,11 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 	 * completions are honored. A scmd is determined to have
 	 * timed out iff its associated qc is active and not failed.
 	 */
+	spin_lock_irqsave(ap->lock, flags);
 	if (ap->ops->error_handler) {
 		struct scsi_cmnd *scmd, *tmp;
 		int nr_timedout = 0;
 
-		spin_lock_irqsave(ap->lock, flags);
-
 		/* This must occur under the ap->lock as we don't want
		   a polled recovery to race the real interrupt handler
@@ -700,12 +699,11 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 		if (nr_timedout)
 			__ata_port_freeze(ap);
 
-		spin_unlock_irqrestore(ap->lock, flags);
-
 		/* initialize eh_tries */
 		ap->eh_tries = ATA_EH_MAX_TRIES;
-	} else
-		spin_unlock_wait(ap->lock);
+	}
+	spin_unlock_irqrestore(ap->lock, flags);
 }
 EXPORT_SYMBOL(ata_scsi_cmd_error_handler);
-- 
2.5.2
Re: [PATCH RFC 25/26] tile: Remove spin_unlock_wait() arch-specific definitions
On Thu, Jun 29, 2017 at 5:01 PM, Paul E. McKenney wrote:
> There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> and it appears that all callers could do just as well with a lock/unlock
> pair. This commit therefore removes the underlying arch-specific
> arch_spin_unlock_wait().

Please don't make this one commit for every architecture.

Once something gets removed, it gets removed. There's no point in "remove it from architecture X". If there are no more users, we're done with it, and making it be 25 patches with the same commit message instead of just one doesn't help anybody.

                 Linus
[PATCH RFC 15/26] ia64: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/ia64/include/asm/spinlock.h | 21 -
 1 file changed, 21 deletions(-)

diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index ca9e76149a4a..df2c121164b8 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -76,22 +76,6 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	ACCESS_ONCE(*p) = (tmp + 2) & ~1;
 }
 
-static __always_inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	int	*p = (int *)&lock->lock, ticket;
-
-	ia64_invala();
-
-	for (;;) {
-		asm volatile ("ld4.c.nc %0=[%1]" : "=r"(ticket) : "r"(p) : "memory");
-		if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
-			return;
-		cpu_relax();
-	}
-
-	smp_acquire__after_ctrl_dep();
-}
-
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
 	long tmp = ACCESS_ONCE(lock->lock);
@@ -143,11 +127,6 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 	arch_spin_lock(lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	__ticket_spin_unlock_wait(lock);
-}
-
 #define arch_read_can_lock(rw)		(*(volatile int *)(rw) >= 0)
 #define arch_write_can_lock(rw)	(*(volatile int *)(rw) == 0)
-- 
2.5.2
[PATCH RFC 14/26] hexagon: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: Richard Kuo
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/hexagon/include/asm/spinlock.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/hexagon/include/asm/spinlock.h b/arch/hexagon/include/asm/spinlock.h
index a1c55788c5d6..53a8d5885887 100644
--- a/arch/hexagon/include/asm/spinlock.h
+++ b/arch/hexagon/include/asm/spinlock.h
@@ -179,11 +179,6 @@ static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock)
  */
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_is_locked(x) ((x)->lock != 0)
 
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
-- 
2.5.2
[PATCH RFC 12/26] arm64: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait().

Signed-off-by: Paul E. McKenney
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Alan Stern
Cc: Andrea Parri
Cc: Linus Torvalds
---
 arch/arm64/include/asm/spinlock.h | 58 ---
 1 file changed, 5 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cae331d553f8..f445bd7f2b9f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -26,58 +26,6 @@
  * The memory barriers are implicit with the load-acquire and store-release
  * instructions.
  */
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	unsigned int tmp;
-	arch_spinlock_t lockval;
-	u32 owner;
-
-	/*
-	 * Ensure prior spin_lock operations to other locks have completed
-	 * on this CPU before we test whether "lock" is locked.
-	 */
-	smp_mb();
-	owner = READ_ONCE(lock->owner) << 16;
-
-	asm volatile(
-"	sevl\n"
-"1:	wfe\n"
-"2:	ldaxr	%w0, %2\n"
-	/* Is the lock free? */
-"	eor	%w1, %w0, %w0, ror #16\n"
-"	cbz	%w1, 3f\n"
-	/* Lock taken -- has there been a subsequent unlock->lock transition? */
-"	eor	%w1, %w3, %w0, lsl #16\n"
-"	cbz	%w1, 1b\n"
-	/*
-	 * The owner has been updated, so there was an unlock->lock
-	 * transition that we missed. That means we can rely on the
-	 * store-release of the unlock operation paired with the
-	 * load-acquire of the lock operation to publish any of our
-	 * previous stores to the new lock owner and therefore don't
-	 * need to bother with the writeback below.
-	 */
-"	b	4f\n"
-"3:\n"
-	/*
-	 * Serialise against any concurrent lockers by writing back the
-	 * unlocked lock value
-	 */
-	ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-"	stxr	%w1, %w0, %2\n"
-	__nops(2),
-	/* LSE atomics */
-"	mov	%w1, %w0\n"
-"	cas	%w0, %w0, %2\n"
-"	eor	%w1, %w1, %w0\n")
-	/* Somebody else wrote to the lock, GOTO 10 and reload the value */
-"	cbnz	%w1, 2b\n"
-"4:"
-	: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
-	: "r" (owner)
-	: "memory");
-}
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
@@ -176,7 +124,11 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	smp_mb(); /* See arch_spin_unlock_wait */
+	/*
+	 * Ensure prior spin_lock operations to other locks have completed
+	 * on this CPU before we test whether "lock" is locked.
+	 */
+	smp_mb(); /* ^^^ */
 	return !arch_spin_value_unlocked(READ_ONCE(*lock));
 }
-- 
2.5.2
[PATCH RFC 19/26] mn10300: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: David Howells Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- arch/mn10300/include/asm/spinlock.h | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/mn10300/include/asm/spinlock.h b/arch/mn10300/include/asm/spinlock.h index 9c7b8f7942d8..fe413b41df6c 100644 --- a/arch/mn10300/include/asm/spinlock.h +++ b/arch/mn10300/include/asm/spinlock.h @@ -26,11 +26,6 @@ #define arch_spin_is_locked(x) (*(volatile signed char *)(&(x)->slock) != 0) -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - smp_cond_load_acquire(&lock->slock, !VAL); -} - static inline void arch_spin_unlock(arch_spinlock_t *lock) { asm volatile( -- 2.5.2
[PATCH RFC 22/26] s390: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- arch/s390/include/asm/spinlock.h | 7 --- 1 file changed, 7 deletions(-) diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h index f7838ecd83c6..217ee5210c32 100644 --- a/arch/s390/include/asm/spinlock.h +++ b/arch/s390/include/asm/spinlock.h @@ -98,13 +98,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lp) : "cc", "memory"); } -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - while (arch_spin_is_locked(lock)) - arch_spin_relax(lock); - smp_acquire__after_ctrl_dep(); -} - /* * Read-write spinlocks, allowing multiple readers * but only one writer. -- 2.5.2
[PATCH RFC 21/26] powerpc: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- arch/powerpc/include/asm/spinlock.h | 33 - 1 file changed, 33 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 8c1b913de6d7..d256e448ea49 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -170,39 +170,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock) lock->slock = 0; } -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - arch_spinlock_t lock_val; - - smp_mb(); - - /* -* Atomically load and store back the lock value (unchanged). This -* ensures that our observation of the lock value is ordered with -* respect to other lock operations. -*/ - __asm__ __volatile__( -"1:" PPC_LWARX(%0, 0, %2, 0) "\n" -" stwcx. %0, 0, %2\n" -" bne- 1b\n" - : "=&r" (lock_val), "+m" (*lock) - : "r" (lock) - : "cr0", "xer"); - - if (arch_spin_value_unlocked(lock_val)) - goto out; - - while (lock->slock) { - HMT_low(); - if (SHARED_PROCESSOR) - __spin_yield(lock); - } - HMT_medium(); - -out: - smp_mb(); -} - /* * Read-write spinlocks, allowing multiple readers * but only one writer. -- 2.5.2
[PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() calls in nf_conntrack_lock() and nf_conntrack_all_lock() with spin_lock() followed immediately by spin_unlock(). These functions do not appear to be invoked on any fastpaths. Signed-off-by: Paul E. McKenney Cc: Pablo Neira Ayuso Cc: Jozsef Kadlecsik Cc: Florian Westphal Cc: "David S. Miller" Cc: Cc: Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- net/netfilter/nf_conntrack_core.c | 26 -- 1 file changed, 8 insertions(+), 18 deletions(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index e847dbaa0c6b..9f997859d160 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -99,15 +99,11 @@ void nf_conntrack_lock(spinlock_t *lock) __acquires(lock) spin_lock(lock); while (unlikely(nf_conntrack_locks_all)) { spin_unlock(lock); - - /* -* Order the 'nf_conntrack_locks_all' load vs. the -* spin_unlock_wait() loads below, to ensure -* that 'nf_conntrack_locks_all_lock' is indeed held: -*/ - smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */ - spin_unlock_wait(&nf_conntrack_locks_all_lock); + /* Wait for nf_conntrack_locks_all_lock holder to release ... */ + spin_lock(&nf_conntrack_locks_all_lock); + spin_unlock(&nf_conntrack_locks_all_lock); spin_lock(lock); + /* ... and retry. 
*/ } } EXPORT_SYMBOL_GPL(nf_conntrack_lock); @@ -150,17 +146,11 @@ static void nf_conntrack_all_lock(void) spin_lock(&nf_conntrack_locks_all_lock); nf_conntrack_locks_all = true; - - /* -* Order the above store of 'nf_conntrack_locks_all' against -* the spin_unlock_wait() loads below, such that if -* nf_conntrack_lock() observes 'nf_conntrack_locks_all' -* we must observe nf_conntrack_locks[] held: -*/ - smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */ - for (i = 0; i < CONNTRACK_LOCKS; i++) { - spin_unlock_wait(&nf_conntrack_locks[i]); + /* Wait for any current holder to release lock. */ + spin_lock(&nf_conntrack_locks[i]); + spin_unlock(&nf_conntrack_locks[i]); + /* Next acquisition will see nf_conntrack_locks_all == true. */ } } -- 2.5.2
[PATCH RFC 24/26] sparc: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: "David S. Miller" Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- arch/sparc/include/asm/spinlock_32.h | 5 - arch/sparc/include/asm/spinlock_64.h | 5 - 2 files changed, 10 deletions(-) diff --git a/arch/sparc/include/asm/spinlock_32.h b/arch/sparc/include/asm/spinlock_32.h index 8011e79f59c9..67345b2dc408 100644 --- a/arch/sparc/include/asm/spinlock_32.h +++ b/arch/sparc/include/asm/spinlock_32.h @@ -14,11 +14,6 @@ #define arch_spin_is_locked(lock) (*((volatile unsigned char *)(lock)) != 0) -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - smp_cond_load_acquire(&lock->lock, !VAL); -} - static inline void arch_spin_lock(arch_spinlock_t *lock) { __asm__ __volatile__( diff --git a/arch/sparc/include/asm/spinlock_64.h b/arch/sparc/include/asm/spinlock_64.h index 07c9f2e9bf57..923d57f9b79d 100644 --- a/arch/sparc/include/asm/spinlock_64.h +++ b/arch/sparc/include/asm/spinlock_64.h @@ -26,11 +26,6 @@ #define arch_spin_is_locked(lp) ((lp)->lock != 0) -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - smp_cond_load_acquire(&lock->lock, !VAL); -} - static inline void arch_spin_lock(arch_spinlock_t *lock) { unsigned long tmp; -- 2.5.2
[PATCH RFC 23/26] sh: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: Yoshinori Sato Cc: Rich Felker Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- arch/sh/include/asm/spinlock-cas.h | 5 - arch/sh/include/asm/spinlock-llsc.h | 5 - 2 files changed, 10 deletions(-) diff --git a/arch/sh/include/asm/spinlock-cas.h b/arch/sh/include/asm/spinlock-cas.h index c46e8cc7b515..5ed7dbbd94ff 100644 --- a/arch/sh/include/asm/spinlock-cas.h +++ b/arch/sh/include/asm/spinlock-cas.h @@ -29,11 +29,6 @@ static inline unsigned __sl_cas(volatile unsigned *p, unsigned old, unsigned new #define arch_spin_is_locked(x) ((x)->lock <= 0) #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock) -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - smp_cond_load_acquire(&lock->lock, VAL > 0); -} - static inline void arch_spin_lock(arch_spinlock_t *lock) { while (!__sl_cas(&lock->lock, 1, 0)); diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h index cec78143fa83..f77263aae760 100644 --- a/arch/sh/include/asm/spinlock-llsc.h +++ b/arch/sh/include/asm/spinlock-llsc.h @@ -21,11 +21,6 @@ #define arch_spin_is_locked(x) ((x)->lock <= 0) #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock) -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - smp_cond_load_acquire(&lock->lock, VAL > 0); -} - /* * Simple spin lock operations. There are two variants, one clears IRQ's * on the local processor, one does not. -- 2.5.2
[PATCH RFC 26/26] xtensa: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: Chris Zankel Cc: Max Filippov Cc: --- arch/xtensa/include/asm/spinlock.h | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/xtensa/include/asm/spinlock.h b/arch/xtensa/include/asm/spinlock.h index a36221cf6363..3bb49681ee24 100644 --- a/arch/xtensa/include/asm/spinlock.h +++ b/arch/xtensa/include/asm/spinlock.h @@ -33,11 +33,6 @@ #define arch_spin_is_locked(x) ((x)->slock != 0) -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - smp_cond_load_acquire(&lock->slock, !VAL); -} - #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock) static inline void arch_spin_lock(arch_spinlock_t *lock) -- 2.5.2
[PATCH RFC 20/26] parisc: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- arch/parisc/include/asm/spinlock.h | 7 --- 1 file changed, 7 deletions(-) diff --git a/arch/parisc/include/asm/spinlock.h b/arch/parisc/include/asm/spinlock.h index e32936cd7f10..55bfe4affca3 100644 --- a/arch/parisc/include/asm/spinlock.h +++ b/arch/parisc/include/asm/spinlock.h @@ -14,13 +14,6 @@ static inline int arch_spin_is_locked(arch_spinlock_t *x) #define arch_spin_lock(lock) arch_spin_lock_flags(lock, 0) -static inline void arch_spin_unlock_wait(arch_spinlock_t *x) -{ - volatile unsigned int *a = __ldcw_align(x); - - smp_cond_load_acquire(a, VAL); -} - static inline void arch_spin_lock_flags(arch_spinlock_t *x, unsigned long flags) { -- 2.5.2
[PATCH RFC 09/26] alpha: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- arch/alpha/include/asm/spinlock.h | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/alpha/include/asm/spinlock.h b/arch/alpha/include/asm/spinlock.h index a40b9fc0c6c3..718ac0b64adf 100644 --- a/arch/alpha/include/asm/spinlock.h +++ b/arch/alpha/include/asm/spinlock.h @@ -16,11 +16,6 @@ #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock) #define arch_spin_is_locked(x) ((x)->lock != 0) -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - smp_cond_load_acquire(&lock->lock, !VAL); -} - static inline int arch_spin_value_unlocked(arch_spinlock_t lock) { return lock.lock == 0; -- 2.5.2
[PATCH RFC 06/26] ipc: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in exit_sem() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because exit_sem() is rarely invoked in production. Signed-off-by: Paul E. McKenney Cc: Andrew Morton Cc: Davidlohr Bueso Cc: Manfred Spraul Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- ipc/sem.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/ipc/sem.c b/ipc/sem.c index 947dc2348271..e88d0749a929 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -2096,7 +2096,8 @@ void exit_sem(struct task_struct *tsk) * possibility where we exit while freeary() didn't * finish unlocking sem_undo_list. */ - spin_unlock_wait(&ulp->lock); + spin_lock(&ulp->lock); + spin_unlock(&ulp->lock); rcu_read_unlock(); break; } -- 2.5.2
[PATCH RFC 11/26] arm: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait(). Signed-off-by: Paul E. McKenney Cc: Russell King Cc: Cc: Will Deacon Cc: Peter Zijlstra Cc: Alan Stern Cc: Andrea Parri Cc: Linus Torvalds --- arch/arm/include/asm/spinlock.h | 16 1 file changed, 16 deletions(-) diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h index 4bec45442072..c030143c18c6 100644 --- a/arch/arm/include/asm/spinlock.h +++ b/arch/arm/include/asm/spinlock.h @@ -52,22 +52,6 @@ static inline void dsb_sev(void) * memory. */ -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) -{ - u16 owner = READ_ONCE(lock->tickets.owner); - - for (;;) { - arch_spinlock_t tmp = READ_ONCE(*lock); - - if (tmp.tickets.owner == tmp.tickets.next || - tmp.tickets.owner != owner) - break; - - wfe(); - } - smp_acquire__after_ctrl_dep(); -} - #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock) static inline void arch_spin_lock(arch_spinlock_t *lock) -- 2.5.2
[PATCH RFC 0/26] Remove spin_unlock_wait()
Hello! There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This series therefore removes spin_unlock_wait() and changes its users to instead use a lock/unlock pair. The commits are as follows, in three groups: 1-7.Change uses of spin_unlock_wait() and raw_spin_unlock_wait() to instead use a spin_lock/spin_unlock pair. These may be applied in any order, but must be applied before any later commits in this series. The commit logs state why I believe that these commits won't noticeably degrade performance. 8. Remove core-kernel definitions for spin_unlock_wait() and raw_spin_unlock_wait(). 9-26. Remove arch-specific definitions of arch_spin_unlock_wait(). Thoughts? Thanx, Paul arch/alpha/include/asm/spinlock.h|5 - arch/arc/include/asm/spinlock.h |5 - arch/arm/include/asm/spinlock.h | 16 arch/arm64/include/asm/spinlock.h| 58 + arch/blackfin/include/asm/spinlock.h |5 - arch/hexagon/include/asm/spinlock.h |5 - arch/ia64/include/asm/spinlock.h | 21 -- arch/m32r/include/asm/spinlock.h |5 - arch/metag/include/asm/spinlock.h|5 - arch/mips/include/asm/spinlock.h | 16 arch/mn10300/include/asm/spinlock.h |5 - arch/parisc/include/asm/spinlock.h |7 -- arch/powerpc/include/asm/spinlock.h | 33 - arch/s390/include/asm/spinlock.h |7 -- arch/sh/include/asm/spinlock-cas.h |5 - arch/sh/include/asm/spinlock-llsc.h |5 - arch/sparc/include/asm/spinlock_32.h |5 - arch/sparc/include/asm/spinlock_64.h |5 - arch/tile/include/asm/spinlock_32.h |2 arch/tile/include/asm/spinlock_64.h |2 arch/tile/lib/spinlock_32.c | 23 -- arch/tile/lib/spinlock_64.c | 22 -- arch/xtensa/include/asm/spinlock.h |5 - drivers/ata/libata-eh.c |8 -- include/asm-generic/qspinlock.h | 14 include/linux/spinlock.h | 31 - include/linux/spinlock_up.h |6 - ipc/sem.c|3 kernel/exit.c|3 kernel/locking/qspinlock.c | 117 --- kernel/sched/completion.c|9 -- kernel/sched/core.c |3 kernel/task_work.c |3 net/netfilter/nf_conntrack_core.c| 26 
++- 34 files changed, 26 insertions(+), 464 deletions(-)
RFC: bring UP 'lo' by default after creating new netns
Creation of new network namespace is almost always followed up by bringing up the loopback device. ip netns add foo ip -netns foo link set lo up I'm not sure if there are any consequences of bringing the device UP at the creation of network-namespace. thanks, --mahesh.. diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 2178db8e47cd..ac0e86c9a17f 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -428,6 +428,11 @@ struct net *copy_net_ns(unsigned long flags, net_drop_ns(net); return ERR_PTR(rv); } + /* Set the loopback device UP */ + rtnl_lock(); + dev_open(net->loopback_dev); + rtnl_unlock(); + return net; }
Re: [PATCH RFC 0/2] kproxy: Kernel Proxy
On Thu, Jun 29, 2017 at 1:58 PM, Willy Tarreauwrote: > On Thu, Jun 29, 2017 at 01:40:26PM -0700, Tom Herbert wrote: >> > In fact that's not much what I observe in field. In practice, large >> > data streams are cheaply relayed using splice(), I could achieve >> > 60 Gbps of HTTP forwarding via HAProxy on a 4-core xeon 2 years ago. >> > And when you use SSL, the cost of the copy to/from kernel is small >> > compared to all the crypto operations surrounding this. >> > >> Right, getting rid of the extra crypto operations and so called "SSL >> inspection" is the ultimate goal this is going towards. > > Yep but in order to take decisions at L7 you need to decapsulate SSL. > Decapsulate or decrypt? There's a big difference... :-) I'm am aiming to just have to decapsulate. >> HTTP is only one use case. The are other interesting use cases such as >> those in container security where the application protocol might be >> something like simple RPC. > > OK that indeed makes sense in such environments. > >> Performance is relevant because we >> potentially want security applied to every message in every >> communication in a containerized data center. Putting the userspace >> hop in the datapath of every packet is know to be problematic, not >> just for the performance hit also because it increases the attack >> surface on users' privacy. > > While I totally agree on the performance hit when inspecting each packet, > I fail to see the relation with users' privacy. In fact under some > circumstances it can even be the opposite. For example, using something > like kTLS for a TCP/HTTP proxy can result in cleartext being observable > in strace while it's not visible when TLS is terminated in userland because > all you see are openssl's read()/write() operations. Maybe you have specific > attacks in mind ? > No, just the normal problem of making yet one more tool systematically have access to user data. 
>> > Regarding kernel-side protocol parsing, there's an unfortunate trend >> > at moving more and more protocols to userland due to these protocols >> > evolving very quickly. At least you'll want to find a way to provide >> > these parsers from userspace, which will inevitably come with its set >> > of problems or limitations :-/ >> > >> That's why everything is going BPF now ;-) > > Yes, I knew you were going to suggest this :-) I'm still prudent on it > to be honnest. I don't think it would be that easy to implement an HPACK > encoder/decoder using BPF. And even regarding just plain HTTP parsing, > certain very small operations in haproxy's parser can quickly result in > a 10% performance degradation when improperly optimized (ie: changing a > "likely", altering branch prediction, or cache walk patterns when using > arrays to evaluate character classes faster). But for general usage I > indeed think it should be OK. > HTTP might qualify as a special case, and I believe there's already been some work to put http in kernel by Alexander Krizhanovsky and others. In this case maybe http parse could be front end before BPF. Although, pretty clear we'll need regex in BPF if we want use it with http. Tom
Re: [PATCH RFC 0/2] kproxy: Kernel Proxy
On Thu, Jun 29, 2017 at 3:04 PM, Thomas Grafwrote: > Hi Tom > > On 29 June 2017 at 11:27, Tom Herbert wrote: >> This is raw, minimally tested, and error hanlding needs work. Posting >> as RFC to get feedback on the design... >> >> Sidecar proxies are becoming quite popular on server as a means to >> perform layer 7 processing on application data as it is sent. Such >> sidecars are used for SSL proxies, application firewalls, and L7 >> load balancers. While these proxies provide nice functionality, >> their performance is obviously terrible since all the data needs >> to take an extra hop though userspace. > Hi Thomas, > I really appreciate this work. It would have been nice to at least > attribute me in some way as this is exactly what I presented at > Netconf 2017 [0]. > Sure, will do that! > I'm also wondering why this is not built on top of KCM which you > suggested to use when we discussed this. > I think the main part of that discussion was around stream parser which is needed for message delineation. For a 1:1 proxy, KCM is probably overkill (the whole KCM data path and lock becomes superfluous). Also, there's no concept of creating a whole message before routing it, in the 1:1 case we should let the message pass through once it's cleared by the filter (this is the strparser change I referred to). As I mentioned, for L7 load balancing we would want a multiplexor probably also M:N, but the structure is different since there's still no user facing sockets, they're all TCP for instance. IMO, the 1:1 proxy case is compelling to solve in itself... Tom > [0] > https://docs.google.com/presentation/d/1dwSKSBGpUHD3WO5xxzZWj8awV_-xL-oYhvqQMOBhhtk/edit#slide=id.g203aae413f_0_0
Re: CAN-FD Transceiver Limitations
On 06/29/2017 05:36 PM, Kurt Van Dijck wrote: mcan@0 { ... fixed-transceiver { max-canfd-speed = <2000> }; ... }; > > Since when would a transceiver support different speeds for CAN & CANFD? When I say CAN I'm referring to CAN 2.0 specification which mentioned speeds upto 1 Mbit/s. While CAN FD supports higher bitrates. > No transceivers were available, but they are now. > I see no datalink problem applying 2MBit for regular CAN with apropriate > physical layer, and CAN does not predefine the physical layer > (advise != define). > > IMHO, > fixed-transceiver { > max-arbitration-speed = <200> > max-data-speed = <400> > }; > is way better to describe the hardware. > Regular CAN chips would not consider max-data-speed... What is arbitration speed? Also if I understand you correctly then I agree drivers for traditional CAN wouldn't care about this subnode. Although it may be helpful for max-data-speed to become max-canfd-speed or something along those lines. Just so the property's purpose is clear. > > Kind regards, > Kurt >
[PATCH 03/29] netfilter: dup: resolve warnings about missing prototypes
From: stephen hemmingerMissing include file causes: net/netfilter/nf_dup_netdev.c:26:6: warning: no previous prototype for ‘nf_fwd_netdev_egress’ [-Wmissing-prototypes] net/netfilter/nf_dup_netdev.c:40:6: warning: no previous prototype for ‘nf_dup_netdev_egress’ [-Wmissing-prototypes] Signed-off-by: Stephen Hemminger Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_dup_netdev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/netfilter/nf_dup_netdev.c b/net/netfilter/nf_dup_netdev.c index c9d7f95768ab..f4a566e67213 100644 --- a/net/netfilter/nf_dup_netdev.c +++ b/net/netfilter/nf_dup_netdev.c @@ -13,6 +13,7 @@ #include #include #include +#include static void nf_do_netdev_egress(struct sk_buff *skb, struct net_device *dev) { -- 2.1.4
Re: [PATCH] sctp: Add peeloff-flags socket option
On Thu, Jun 29, 2017 at 02:13:40PM -0400, Neil Horman wrote: > diff --git a/fs/file.c b/fs/file.c > index 1c2972e..a4521da 100644 > --- a/fs/file.c > +++ b/fs/file.c > @@ -807,6 +807,7 @@ void set_close_on_exec(unsigned int fd, int flag) > __clear_close_on_exec(fd, fdt); > spin_unlock(&files->file_lock); > } > +EXPORT_SYMBOL(set_close_on_exec); Huh? > + set_close_on_exec(peeloff->sd, !!(flags & SOCK_CLOEXEC)); Why hadn't you set it when that descriptor had been allocated? That retval = get_unused_fd_flags(0); could just pass O_CLOEXEC and you'd get it set from the very beginning...
[PATCH 01/29] netfilter: ctnetlink: delete extra spaces
From: linzhang This patch cleans up extra spaces. Signed-off-by: linzhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 9799a50bc604..f08604dd1a59 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -636,11 +636,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item) if (events & (1 << IPCT_DESTROY)) { type = IPCTNL_MSG_CT_DELETE; group = NFNLGRP_CONNTRACK_DESTROY; - } else if (events & ((1 << IPCT_NEW) | (1 << IPCT_RELATED))) { + } else if (events & ((1 << IPCT_NEW) | (1 << IPCT_RELATED))) { type = IPCTNL_MSG_CT_NEW; flags = NLM_F_CREATE|NLM_F_EXCL; group = NFNLGRP_CONNTRACK_NEW; - } else if (events) { + } else if (events) { type = IPCTNL_MSG_CT_NEW; group = NFNLGRP_CONNTRACK_UPDATE; } else -- 2.1.4
[PATCH 04/29] netfilter: nft_rt: make local functions static
From: stephen hemmingerResolves warnings: net/netfilter/nft_rt.c:26:6: warning: no previous prototype for ‘nft_rt_get_eval’ [-Wmissing-prototypes] net/netfilter/nft_rt.c:75:5: warning: no previous prototype for ‘nft_rt_get_init’ [-Wmissing-prototypes] net/netfilter/nft_rt.c:106:5: warning: no previous prototype for ‘nft_rt_get_dump’ [-Wmissing-prototypes] Signed-off-by: Stephen Hemminger Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_rt.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/net/netfilter/nft_rt.c b/net/netfilter/nft_rt.c index d3eb640bc784..c7383d8f88d0 100644 --- a/net/netfilter/nft_rt.c +++ b/net/netfilter/nft_rt.c @@ -23,9 +23,9 @@ struct nft_rt { enum nft_registers dreg:8; }; -void nft_rt_get_eval(const struct nft_expr *expr, -struct nft_regs *regs, -const struct nft_pktinfo *pkt) +static void nft_rt_get_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) { const struct nft_rt *priv = nft_expr_priv(expr); const struct sk_buff *skb = pkt->skb; @@ -72,9 +72,9 @@ const struct nla_policy nft_rt_policy[NFTA_RT_MAX + 1] = { [NFTA_RT_KEY] = { .type = NLA_U32 }, }; -int nft_rt_get_init(const struct nft_ctx *ctx, - const struct nft_expr *expr, - const struct nlattr * const tb[]) +static int nft_rt_get_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) { struct nft_rt *priv = nft_expr_priv(expr); unsigned int len; @@ -103,8 +103,8 @@ int nft_rt_get_init(const struct nft_ctx *ctx, NFT_DATA_VALUE, len); } -int nft_rt_get_dump(struct sk_buff *skb, - const struct nft_expr *expr) +static int nft_rt_get_dump(struct sk_buff *skb, + const struct nft_expr *expr) { const struct nft_rt *priv = nft_expr_priv(expr); -- 2.1.4
Re: [PATCH net-next v3 0/9] introduce flower offload capabilities
On Thu, 29 Jun 2017 22:08:10 +0200, Simon Horman wrote: > Hi, > > this series adds flower offload to the NFP driver. It builds on recent > work to add representor and a skeleton flower app - now the app does what > its name says. > > In general the approach taken is to allow some flows within > the universe of possible flower matches and tc actions to be offloaded. > It is planned that this support will grow over time but the support > offered by this patch-set seems to be a reasonable starting point. > > Key Changes since v2: > * Revised flow structures to simplify setup/teardown and locking of stats > * Addressed other code-change review of v2 > > Other review questions regarding v2 have been answered on netdev. Looks OK, thanks Simon and Pieter! Reviewed-by: Jakub Kicinski
[PATCH 18/29] netfilter: nft_set_hash: add lookup variant for fixed size hashtable
This patch provides a faster variant of the lookup function for 2 and 4 byte keys. Optimizing the one byte case is not worthwhile, as the set backend selection will always select the bitmap set type for such case. Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_set_hash.c | 60 ++-- 1 file changed, 58 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c index b2eab94362d6..0fa01d772c5e 100644 --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -409,6 +409,38 @@ static bool nft_hash_lookup(const struct net *net, const struct nft_set *set, return false; } +/* nft_hash_select_ops() makes sure key size can be either 2 or 4 bytes. */ +static inline u32 nft_hash_key(const u32 *key, u32 klen) +{ + if (klen == 4) + return *key; + + return *(u16 *)key; +} + +static bool nft_hash_lookup_fast(const struct net *net, +const struct nft_set *set, +const u32 *key, const struct nft_set_ext **ext) +{ + struct nft_hash *priv = nft_set_priv(set); + u8 genmask = nft_genmask_cur(net); + const struct nft_hash_elem *he; + u32 hash, k1, k2; + + k1 = nft_hash_key(key, set->klen); + hash = jhash_1word(k1, priv->seed); + hash = reciprocal_scale(hash, priv->buckets); + hlist_for_each_entry_rcu(he, &priv->table[hash], node) { + k2 = nft_hash_key(nft_set_ext_key(&he->ext)->data, set->klen); + if (k1 == k2 && + nft_set_elem_active(&he->ext, genmask)) { + *ext = &he->ext; + return true; + } + } + return false; +} + static int nft_hash_insert(const struct net *net, const struct nft_set *set, const struct nft_set_elem *elem, struct nft_set_ext **ext) @@ -588,12 +620,36 @@ static struct nft_set_ops nft_hash_ops __read_mostly = { .features = NFT_SET_MAP | NFT_SET_OBJECT, }; +static struct nft_set_ops nft_hash_fast_ops __read_mostly = { + .type = &nft_hash_type, + .privsize = nft_hash_privsize, + .elemsize = offsetof(struct nft_hash_elem, ext), + .estimate = nft_hash_estimate, + .init = nft_hash_init, + .destroy = nft_hash_destroy, + .insert = 
nft_hash_insert, + .activate = nft_hash_activate, + .deactivate = nft_hash_deactivate, + .flush = nft_hash_flush, + .remove = nft_hash_remove, + .lookup = nft_hash_lookup_fast, + .walk = nft_hash_walk, + .features = NFT_SET_MAP | NFT_SET_OBJECT, +}; + static const struct nft_set_ops * nft_hash_select_ops(const struct nft_ctx *ctx, const struct nft_set_desc *desc, u32 flags) { - if (desc->size) - return &nft_hash_ops; + if (desc->size) { + switch (desc->klen) { + case 2: + case 4: + return &nft_hash_fast_ops; + default: + return &nft_hash_ops; + } + } return &nft_rhash_ops; } -- 2.1.4
[PATCH 07/29] netfilter: conntrack: add nf_ct_iterate_destroy
From: Florian Westphalsledgehammer to be used on module unload (to remove affected conntracks from all namespaces). It will also flag all unconfirmed conntracks as dying, i.e. they will not be committed to main table. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_conntrack.h | 4 ++ net/netfilter/nf_conntrack_core.c| 87 ++-- 2 files changed, 78 insertions(+), 13 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index f21180ea4558..48407569585d 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -229,6 +229,10 @@ void nf_ct_iterate_cleanup_net(struct net *net, int (*iter)(struct nf_conn *i, void *data), void *data, u32 portid, int report); +/* also set unconfirmed conntracks as dying. Only use in module exit path. */ +void nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), + void *data); + struct nf_conntrack_zone; void nf_conntrack_free(struct nf_conn *ct); diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 08733685d732..7ecee79c78b8 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1586,7 +1586,7 @@ static void nf_conntrack_attach(struct sk_buff *nskb, const struct sk_buff *skb) /* Bring out ya dead! 
 */
 static struct nf_conn *
-get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
+get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
 		void *data, unsigned int *bucket)
 {
 	struct nf_conntrack_tuple_hash *h;
@@ -1603,8 +1603,7 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
-			if (net_eq(nf_ct_net(ct), net) &&
-			    iter(ct, data))
+			if (iter(ct, data))
 				goto found;
 		}
 	}
@@ -1621,6 +1620,39 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 	return ct;
 }
 
+static void nf_ct_iterate_cleanup(int (*iter)(struct nf_conn *i, void *data),
+				  void *data, u32 portid, int report)
+{
+	struct nf_conn *ct;
+	unsigned int bucket = 0;
+
+	might_sleep();
+
+	while ((ct = get_next_corpse(iter, data, &bucket)) != NULL) {
+		/* Time to push up daises... */
+
+		nf_ct_delete(ct, portid, report);
+		nf_ct_put(ct);
+		cond_resched();
+	}
+}
+
+struct iter_data {
+	int (*iter)(struct nf_conn *i, void *data);
+	void *data;
+	struct net *net;
+};
+
+static int iter_net_only(struct nf_conn *i, void *data)
+{
+	struct iter_data *d = data;
+
+	if (!net_eq(d->net, nf_ct_net(i)))
+		return 0;
+
+	return d->iter(i, d->data);
+}
+
 static void __nf_ct_unconfirmed_destroy(struct net *net)
 {
@@ -1653,8 +1685,7 @@ void nf_ct_iterate_cleanup_net(struct net *net,
 			       int (*iter)(struct nf_conn *i, void *data),
 			       void *data, u32 portid, int report)
 {
-	struct nf_conn *ct;
-	unsigned int bucket = 0;
+	struct iter_data d;
 
 	might_sleep();
 
@@ -1663,21 +1694,51 @@ void nf_ct_iterate_cleanup_net(struct net *net,
 	__nf_ct_unconfirmed_destroy(net);
 
+	d.iter = iter;
+	d.data = data;
+	d.net = net;
+
 	synchronize_net();
 
-	while ((ct = get_next_corpse(net, iter, data, &bucket)) != NULL) {
-		/* Time to push up daises...
 */
+	nf_ct_iterate_cleanup(iter_net_only, &d, portid, report);
+}
+EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup_net);
 
-		nf_ct_delete(ct, portid, report);
-		nf_ct_put(ct);
-		cond_resched();
+/**
+ * nf_ct_iterate_destroy - destroy unconfirmed conntracks and iterate table
+ * @iter: callback to invoke for each conntrack
+ * @data: data to pass to @iter
+ *
+ * Like nf_ct_iterate_cleanup, but first marks conntracks on the
+ * unconfirmed list as dying (so they will not be inserted into
+ * main table).
+ */
+void
+nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data)
+{
+	struct net *net;
+
+	rtnl_lock();
+	for_each_net(net) {
+		if (atomic_read(&net->ct.count) == 0)
+			continue;
+		__nf_ct_unconfirmed_destroy(net);
+	}
+	rtnl_unlock();
+
+	/* a conntrack could have been unlinked from unconfirmed list
[PATCH 08/29] netfilter: conntrack: restart iteration on resize
From: Florian Westphal

We could miss some conntracks when a resize occurs in parallel. Avoid this by sampling the generation seqcount and doing a restart if needed.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_conntrack_core.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7ecee79c78b8..c3bd9b086dcc 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1623,17 +1623,25 @@ get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
 static void nf_ct_iterate_cleanup(int (*iter)(struct nf_conn *i, void *data),
 				  void *data, u32 portid, int report)
 {
+	unsigned int bucket = 0, sequence;
 	struct nf_conn *ct;
-	unsigned int bucket = 0;
 
 	might_sleep();
 
-	while ((ct = get_next_corpse(iter, data, &bucket)) != NULL) {
-		/* Time to push up daises... */
+	for (;;) {
+		sequence = read_seqcount_begin(&nf_conntrack_generation);
 
-		nf_ct_delete(ct, portid, report);
-		nf_ct_put(ct);
-		cond_resched();
+		while ((ct = get_next_corpse(iter, data, &bucket)) != NULL) {
+			/* Time to push up daises... */
+
+			nf_ct_delete(ct, portid, report);
+			nf_ct_put(ct);
+			cond_resched();
+		}
+
+		if (!read_seqcount_retry(&nf_conntrack_generation, sequence))
+			break;
+		bucket = 0;
 	}
 }
-- 
2.1.4
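The retry idiom used here (sample a generation count, iterate, restart from bucket 0 if the count moved) can be sketched in user space. All names below are made up for illustration, and the "seqcount" is reduced to a plain counter with none of the kernel's memory barriers:

```c
/* Toy generation counter: a resizer bumps it, so any change observed
 * between begin and retry means the table may have been reshuffled. */
static unsigned int generation;
static int passes;

static unsigned int read_begin(void) { return generation; }
static int read_retry(unsigned int snap) { return generation != snap; }

/* Walk all buckets; restart from bucket 0 whenever a resize raced us. */
static int iterate_all(int nbuckets, int (*visit)(int bucket, void *data),
		       void *data)
{
	unsigned int seq;
	int bucket, visited;

	do {
		seq = read_begin();
		visited = 0;
		for (bucket = 0; bucket < nbuckets; bucket++)
			visited += visit(bucket, data);
	} while (read_retry(seq));	/* resize raced us: start over */

	return visited;
}

/* Test callback: simulate a concurrent resize during the first pass. */
static int count_visit(int bucket, void *data)
{
	(void)data;
	if (passes == 0 && bucket == 2) {
		generation++;
		passes = 1;
	}
	return 1;
}
```

The cost of a spurious restart is bounded: deletions already performed are not repeated (the entries are gone), so the loop converges once no resize intervenes.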
[PATCH 05/29] netfilter: conntrack: rename nf_ct_iterate_cleanup
From: Florian Westphal

There are several places where we needlessly call nf_ct_iterate_cleanup; we should instead iterate the full table once at module unload time. This is a leftover from back when the conntrack table was duplicated per net namespace. So rename nf_ct_iterate_cleanup to nf_ct_iterate_cleanup_net. A later patch will then add a non-net variant.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
---
 include/net/netfilter/nf_conntrack.h        |  6 +++---
 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c |  4 ++--
 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c | 10 +-
 net/netfilter/nf_conntrack_core.c           | 10 +-
 net/netfilter/nf_conntrack_netlink.c        |  4 ++--
 net/netfilter/nf_conntrack_proto.c          |  4 ++--
 net/netfilter/nf_nat_core.c                 |  6 +++---
 7 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 8ece3612d0cd..f21180ea4558 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -225,9 +225,9 @@ extern s32 (*nf_ct_nat_offset)(const struct nf_conn *ct,
 			       u32 seq);
 
 /* Iterate over all conntracks: if iter returns true, it's deleted.
*/ -void nf_ct_iterate_cleanup(struct net *net, - int (*iter)(struct nf_conn *i, void *data), - void *data, u32 portid, int report); +void nf_ct_iterate_cleanup_net(struct net *net, + int (*iter)(struct nf_conn *i, void *data), + void *data, u32 portid, int report); struct nf_conntrack_zone; diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c index dc1dea15c1b4..f39037fca923 100644 --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c @@ -98,8 +98,8 @@ static int masq_device_event(struct notifier_block *this, */ NF_CT_ASSERT(dev->ifindex != 0); - nf_ct_iterate_cleanup(net, device_cmp, - (void *)(long)dev->ifindex, 0, 0); + nf_ct_iterate_cleanup_net(net, device_cmp, + (void *)(long)dev->ifindex, 0, 0); } return NOTIFY_DONE; diff --git a/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c b/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c index 2297c9f073ba..d7b679037bae 100644 --- a/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c +++ b/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c @@ -75,8 +75,8 @@ static int masq_device_event(struct notifier_block *this, struct net *net = dev_net(dev); if (event == NETDEV_DOWN) - nf_ct_iterate_cleanup(net, device_cmp, - (void *)(long)dev->ifindex, 0, 0); + nf_ct_iterate_cleanup_net(net, device_cmp, + (void *)(long)dev->ifindex, 0, 0); return NOTIFY_DONE; } @@ -99,7 +99,7 @@ static void iterate_cleanup_work(struct work_struct *work) w = container_of(work, struct masq_dev_work, work); index = w->ifindex; - nf_ct_iterate_cleanup(w->net, device_cmp, (void *)index, 0, 0); + nf_ct_iterate_cleanup_net(w->net, device_cmp, (void *)index, 0, 0); put_net(w->net); kfree(w); @@ -110,12 +110,12 @@ static void iterate_cleanup_work(struct work_struct *work) /* ipv6 inet notifier is an atomic notifier, i.e. we cannot * schedule. 
* - * Unfortunately, nf_ct_iterate_cleanup can run for a long + * Unfortunately, nf_ct_iterate_cleanup_net can run for a long * time if there are lots of conntracks and the system * handles high softirq load, so it frequently calls cond_resched * while iterating the conntrack table. * - * So we defer nf_ct_iterate_cleanup walk to the system workqueue. + * So we defer nf_ct_iterate_cleanup_net walk to the system workqueue. * * As we can have 'a lot' of inet_events (depending on amount * of ipv6 addresses being deleted), we also need to add an upper diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index e847dbaa0c6b..2730f9df33b7 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1634,9 +1634,9 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data), return ct; } -void nf_ct_iterate_cleanup(struct net *net, - int (*iter)(struct nf_conn *i, void *data), - void *data, u32 portid, int report) +void nf_ct_iterate_cleanup_net(struct net *net, + int (*iter)(struct nf_conn *i, void *data), + void *data, u32 portid, int report) { struct nf_conn *ct; unsigned int bucket = 0; @@ -1654,7
[PATCH 16/29] netfilter: nf_tables: allow large allocations for new sets
The new fixed size hashtable backend implementation may result in a large array of buckets that would spew splats from mm. Update this code to fall back on vmalloc in case the memory allocation order is too costly.

Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_tables_api.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 2969016d8cad..bc8f03a53734 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -3054,10 +3055,11 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
 	if (ops->privsize != NULL)
 		size = ops->privsize(nla, &desc);
 
-	err = -ENOMEM;
-	set = kzalloc(sizeof(*set) + size + udlen, GFP_KERNEL);
-	if (set == NULL)
+	set = kvzalloc(sizeof(*set) + size + udlen, GFP_KERNEL);
+	if (!set) {
+		err = -ENOMEM;
 		goto err1;
+	}
 
 	nla_strlcpy(name, nla[NFTA_SET_NAME], sizeof(set->name));
 	err = nf_tables_set_alloc_name(&ctx, set, name);
@@ -3100,7 +3102,7 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
 err3:
 	ops->destroy(set);
 err2:
-	kfree(set);
+	kvfree(set);
 err1:
 	module_put(ops->type->owner);
 	return err;
@@ -3110,7 +3112,7 @@ static void nft_set_destroy(struct nft_set *set)
 {
 	set->ops->destroy(set);
 	module_put(set->ops->type->owner);
-	kfree(set);
+	kvfree(set);
 }
 
 static void nf_tables_set_destroy(const struct nft_ctx *ctx, struct nft_set *set)
-- 
2.1.4
[PATCH 00/29] Netfilter updates for net-next
Hi David,

The following patchset contains Netfilter updates for your net-next tree. This batch contains connection tracking updates for the cleanup iteration path, patches from Florian Westphal:

1) Skip unconfirmed conntracks in nf_ct_iterate_cleanup_net(), just set the dying bit to let the CPU release them.

2) Add nf_ct_iterate_destroy() to be used on module removal, to kill conntracks from all namespaces.

3) Restart iteration on hashtable resizing, since both may occur at the same time.

4) Use the new nf_ct_iterate_destroy() to remove conntracks with NAT mappings on module removal.

5) Use nf_ct_iterate_destroy() to remove conntrack entries on helper module removal, from Liping Zhang.

6) Use nf_ct_iterate_cleanup_net() to remove the timeout extension if the user requests this, also from Liping.

7) Add net_ns_barrier() and use it from the FTP helper, to make sure no concurrent namespace removal happens while the helper module is being removed.

8) Use NFPROTO_MAX in the layer 3 conntrack protocol array, to reduce module size. Same thing in nf_tables.

Updates for the nf_tables infrastructure:

9) Prepare usage of the extended ACK reporting infrastructure for nf_tables.

10) Remove unnecessary forward declaration in the nf_tables hash set.

11) Skip set size estimation if the number of elements is not specified.

12) Changes to accommodate a (faster) unresizable hash set implementation, for anonymous sets and dynamically sized fixed sets with no timeouts.

13) Faster lookup function for unresizable hash tables with 2 and 4 byte keys.

And, finally, a bunch of assorted small updates and cleanups:

14) Do not hold a reference to netdev from ipt_CLUSTERIP; instead subscribe to device events and look up the index from the packet path. This fixes an issue that has been present since the very beginning, patch from Xin Long.

15) Use nf_register_net_hook() in ipt_CLUSTERIP, from Florian Westphal.

16) Use ebt_invalid_target() whenever possible in the ebtables tree, from Gao Feng.
17) Calm down a compilation warning in the nf_dup infrastructure, patch from stephen hemminger.

18) Statify functions in the nftables rt expression, also from stephen.

19) Update the Makefile to use the canonical method to specify nf_tables-objs, from Jike Song.

20) Use nf_conntrack_helpers_register() in amanda and H323.

21) Space cleanup for ctnetlink, from linzhang.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks!

The following changes since commit 417ccf6b5bc3f1a390505d5ef65ec17f10e8b29a:

  net: make struct request_sock_ops::obj_size unsigned (2017-05-23 11:13:19 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git HEAD

for you to fetch changes up to 04ba724b659c6808b0ca31528121bdb2f2807e00:

  netfilter: nfnetlink: extended ACK reporting (2017-06-19 19:38:24 +0200)

Florian Westphal (10):
      netfilter: ipt_CLUSTERIP: switch to nf_register_net_hook
      netfilter: conntrack: rename nf_ct_iterate_cleanup
      netfilter: conntrack: don't call iter for non-confirmed conntracks
      netfilter: conntrack: add nf_ct_iterate_destroy
      netfilter: conntrack: restart iteration on resize
      netfilter: nat: destroy nat mappings on module exit path only
      netfilter: move table iteration out of netns exit paths
      netns: add and use net_ns_barrier
      netfilter: conntrack: use NFPROTO_MAX to size array
      netfilter: nf_tables: reduce chain type table size

Gao Feng (1):
      netfilter: ebt: Use new helper ebt_invalid_target to check target

Jike Song (1):
      netfilter, kbuild: use canonical method to specify objs.
Liping Zhang (3): netfilter: nf_ct_helper: use nf_ct_iterate_destroy to unlink helper objs netfilter: cttimeout: use nf_ct_iterate_cleanup_net to unlink timeout objs netfilter: use nf_conntrack_helpers_register when possible Pablo Neira Ayuso (10): netfilter: nft_set_hash: unnecessary forward declaration netfilter: nf_tables: no size estimation if number of set elements is unknown netfilter: nft_set_hash: use nft_rhash prefix for resizable set backend netfilter: nf_tables: select set backend flavour depending on description netfilter: nf_tables: pass set description to ->privsize netfilter: nft_set_hash: add nft_hash_buckets() netfilter: nf_tables: allow large allocations for new sets netfilter: nft_set_hash: add non-resizable hashtable implementation netfilter: nft_set_hash: add lookup variant for fixed size hashtable netfilter: nfnetlink: extended ACK reporting Xin Long (1): netfilter: ipt_CLUSTERIP: do not hold dev linzhang (1): netfilter: ctnetlink: delete extra spaces stephen hemminger (2): netfilter: dup: resolve warnings about missing prototypes
[PATCH 17/29] netfilter: nft_set_hash: add non-resizable hashtable implementation
This patch adds a simple non-resizable hashtable implementation. If the user specifies the set size, then this new faster hashtable flavour is selected. Signed-off-by: Pablo Neira Ayuso--- net/netfilter/nft_set_hash.c | 210 +-- 1 file changed, 202 insertions(+), 8 deletions(-) diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c index 466cb7092dfa..b2eab94362d6 100644 --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -371,14 +371,181 @@ static u32 nft_hash_buckets(u32 size) static bool nft_rhash_estimate(const struct nft_set_desc *desc, u32 features, struct nft_set_estimate *est) { - if (desc->size) - est->size = sizeof(struct nft_rhash) + - nft_hash_buckets(desc->size) * - sizeof(struct nft_rhash_elem *) + - desc->size * sizeof(struct nft_rhash_elem); - else - est->size = ~0; + est->size = ~0; + est->lookup = NFT_SET_CLASS_O_1; + est->space = NFT_SET_CLASS_O_N; + + return true; +} + +struct nft_hash { + u32 seed; + u32 buckets; + struct hlist_head table[]; +}; + +struct nft_hash_elem { + struct hlist_node node; + struct nft_set_ext ext; +}; + +static bool nft_hash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key, const struct nft_set_ext **ext) +{ + struct nft_hash *priv = nft_set_priv(set); + u8 genmask = nft_genmask_cur(net); + const struct nft_hash_elem *he; + u32 hash; + + hash = jhash(key, set->klen, priv->seed); + hash = reciprocal_scale(hash, priv->buckets); + hlist_for_each_entry_rcu(he, >table[hash], node) { + if (!memcmp(nft_set_ext_key(>ext), key, set->klen) && + nft_set_elem_active(>ext, genmask)) { + *ext = >ext; + return true; + } + } + return false; +} + +static int nft_hash_insert(const struct net *net, const struct nft_set *set, + const struct nft_set_elem *elem, + struct nft_set_ext **ext) +{ + struct nft_hash_elem *this = elem->priv, *he; + struct nft_hash *priv = nft_set_priv(set); + u8 genmask = nft_genmask_next(net); + u32 hash; + + hash = jhash(nft_set_ext_key(>ext), 
set->klen, priv->seed); + hash = reciprocal_scale(hash, priv->buckets); + hlist_for_each_entry(he, >table[hash], node) { + if (!memcmp(nft_set_ext_key(>ext), + nft_set_ext_key(>ext), set->klen) && + nft_set_elem_active(>ext, genmask)) { + *ext = >ext; + return -EEXIST; + } + } + hlist_add_head_rcu(>node, >table[hash]); + return 0; +} + +static void nft_hash_activate(const struct net *net, const struct nft_set *set, + const struct nft_set_elem *elem) +{ + struct nft_hash_elem *he = elem->priv; + + nft_set_elem_change_active(net, set, >ext); +} + +static bool nft_hash_flush(const struct net *net, + const struct nft_set *set, void *priv) +{ + struct nft_hash_elem *he = priv; + + nft_set_elem_change_active(net, set, >ext); + return true; +} + +static void *nft_hash_deactivate(const struct net *net, +const struct nft_set *set, +const struct nft_set_elem *elem) +{ + struct nft_hash *priv = nft_set_priv(set); + struct nft_hash_elem *this = elem->priv, *he; + u8 genmask = nft_genmask_next(net); + u32 hash; + + hash = jhash(nft_set_ext_key(>ext), set->klen, priv->seed); + hash = reciprocal_scale(hash, priv->buckets); + hlist_for_each_entry(he, >table[hash], node) { + if (!memcmp(nft_set_ext_key(>ext), >key.val, + set->klen) || + nft_set_elem_active(>ext, genmask)) { + nft_set_elem_change_active(net, set, >ext); + return he; + } + } + return NULL; +} + +static void nft_hash_remove(const struct net *net, + const struct nft_set *set, + const struct nft_set_elem *elem) +{ + struct nft_hash_elem *he = elem->priv; + + hlist_del_rcu(>node); +} + +static void nft_hash_walk(const struct nft_ctx *ctx, struct nft_set *set, + struct nft_set_iter *iter) +{ + struct nft_hash *priv = nft_set_priv(set); + struct nft_hash_elem *he; + struct nft_set_elem elem; + int i; + + for (i = 0; i <
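For readers unfamiliar with the backend being added in this patch, a minimal user-space analogue of a fixed-size chained hash set might look like the sketch below: singly linked buckets, a duplicate check on insert (as with the kernel's -EEXIST), and lookup by key compare. The hash mix is an arbitrary stand-in for jhash + reciprocal_scale, and all names are illustrative:

```c
#include <stdint.h>
#include <stdlib.h>

#define BUCKETS 8		/* fixed at creation time, never resized */

struct helem {
	struct helem *next;
	uint32_t key;
	int val;
};

struct htable {
	struct helem *table[BUCKETS];
	uint32_t seed;
};

/* Stand-in for jhash + reciprocal_scale: any decent mixer works here. */
static unsigned int hash_key(uint32_t key, uint32_t seed)
{
	uint64_t h = (key ^ seed) * 0x9e3779b97f4a7c15ull;
	return (unsigned int)((h >> 32) % BUCKETS);
}

/* Insert fails with -1 (think -EEXIST) when the key is already present. */
static int ht_insert(struct htable *ht, uint32_t key, int val)
{
	unsigned int b = hash_key(key, ht->seed);
	struct helem *he;

	for (he = ht->table[b]; he; he = he->next)
		if (he->key == key)
			return -1;

	he = malloc(sizeof(*he));
	he->key = key;
	he->val = val;
	he->next = ht->table[b];
	ht->table[b] = he;
	return 0;
}

static int ht_lookup(const struct htable *ht, uint32_t key, int *val)
{
	unsigned int b = hash_key(key, ht->seed);
	const struct helem *he;

	for (he = ht->table[b]; he; he = he->next)
		if (he->key == key) {
			*val = he->val;
			return 1;
		}
	return 0;
}
```

Because the bucket array never moves, readers need no resize handling at all, which is exactly what makes this flavour cheaper than the rhashtable-backed one.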
[PATCH 12/29] netfilter: nft_set_hash: use nft_rhash prefix for resizable set backend
This patch prepares the introduction of a non-resizable hashtable implementation that is significantly faster. Signed-off-by: Pablo Neira Ayuso--- net/netfilter/nft_set_hash.c | 212 +-- 1 file changed, 106 insertions(+), 106 deletions(-) diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c index 1f1cc33895fd..7c21e3da0d88 100644 --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -22,43 +22,43 @@ #include /* We target a hash table size of 4, element hint is 75% of final size */ -#define NFT_HASH_ELEMENT_HINT 3 +#define NFT_RHASH_ELEMENT_HINT 3 -struct nft_hash { +struct nft_rhash { struct rhashtable ht; struct delayed_work gc_work; }; -struct nft_hash_elem { +struct nft_rhash_elem { struct rhash_head node; struct nft_set_ext ext; }; -struct nft_hash_cmp_arg { +struct nft_rhash_cmp_arg { const struct nft_set*set; const u32 *key; u8 genmask; }; -static inline u32 nft_hash_key(const void *data, u32 len, u32 seed) +static inline u32 nft_rhash_key(const void *data, u32 len, u32 seed) { - const struct nft_hash_cmp_arg *arg = data; + const struct nft_rhash_cmp_arg *arg = data; return jhash(arg->key, len, seed); } -static inline u32 nft_hash_obj(const void *data, u32 len, u32 seed) +static inline u32 nft_rhash_obj(const void *data, u32 len, u32 seed) { - const struct nft_hash_elem *he = data; + const struct nft_rhash_elem *he = data; return jhash(nft_set_ext_key(>ext), len, seed); } -static inline int nft_hash_cmp(struct rhashtable_compare_arg *arg, - const void *ptr) +static inline int nft_rhash_cmp(struct rhashtable_compare_arg *arg, + const void *ptr) { - const struct nft_hash_cmp_arg *x = arg->key; - const struct nft_hash_elem *he = ptr; + const struct nft_rhash_cmp_arg *x = arg->key; + const struct nft_rhash_elem *he = ptr; if (memcmp(nft_set_ext_key(>ext), x->key, x->set->klen)) return 1; @@ -69,49 +69,49 @@ static inline int nft_hash_cmp(struct rhashtable_compare_arg *arg, return 0; } -static const struct rhashtable_params 
nft_hash_params = { - .head_offset= offsetof(struct nft_hash_elem, node), - .hashfn = nft_hash_key, - .obj_hashfn = nft_hash_obj, - .obj_cmpfn = nft_hash_cmp, +static const struct rhashtable_params nft_rhash_params = { + .head_offset= offsetof(struct nft_rhash_elem, node), + .hashfn = nft_rhash_key, + .obj_hashfn = nft_rhash_obj, + .obj_cmpfn = nft_rhash_cmp, .automatic_shrinking= true, }; -static bool nft_hash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +static bool nft_rhash_lookup(const struct net *net, const struct nft_set *set, +const u32 *key, const struct nft_set_ext **ext) { - struct nft_hash *priv = nft_set_priv(set); - const struct nft_hash_elem *he; - struct nft_hash_cmp_arg arg = { + struct nft_rhash *priv = nft_set_priv(set); + const struct nft_rhash_elem *he; + struct nft_rhash_cmp_arg arg = { .genmask = nft_genmask_cur(net), .set = set, .key = key, }; - he = rhashtable_lookup_fast(>ht, , nft_hash_params); + he = rhashtable_lookup_fast(>ht, , nft_rhash_params); if (he != NULL) *ext = >ext; return !!he; } -static bool nft_hash_update(struct nft_set *set, const u32 *key, - void *(*new)(struct nft_set *, -const struct nft_expr *, -struct nft_regs *regs), - const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_set_ext **ext) +static bool nft_rhash_update(struct nft_set *set, const u32 *key, +void *(*new)(struct nft_set *, + const struct nft_expr *, + struct nft_regs *regs), +const struct nft_expr *expr, +struct nft_regs *regs, +const struct nft_set_ext **ext) { - struct nft_hash *priv = nft_set_priv(set); - struct nft_hash_elem *he, *prev; - struct nft_hash_cmp_arg arg = { + struct nft_rhash *priv = nft_set_priv(set); + struct nft_rhash_elem *he, *prev; + struct
[PATCH 14/29] netfilter: nf_tables: pass set description to ->privsize
The new non-resizable hashtable variant needs this to calculate the size of the bucket array. Signed-off-by: Pablo Neira Ayuso--- include/net/netfilter/nf_tables.h | 3 ++- net/netfilter/nf_tables_api.c | 2 +- net/netfilter/nft_set_bitmap.c| 3 ++- net/netfilter/nft_set_hash.c | 3 ++- net/netfilter/nft_set_rbtree.c| 3 ++- 5 files changed, 9 insertions(+), 5 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index f27012098846..bd5be0d691d5 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -351,7 +351,8 @@ struct nft_set_ops { struct nft_set *set, struct nft_set_iter *iter); - unsigned int(*privsize)(const struct nlattr * const nla[]); + unsigned int(*privsize)(const struct nlattr * const nla[], + const struct nft_set_desc *desc); bool(*estimate)(const struct nft_set_desc *desc, u32 features, struct nft_set_estimate *est); diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index c0b2b19607e1..2969016d8cad 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3052,7 +3052,7 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk, size = 0; if (ops->privsize != NULL) - size = ops->privsize(nla); + size = ops->privsize(nla, ); err = -ENOMEM; set = kzalloc(sizeof(*set) + size + udlen, GFP_KERNEL); diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c index 87d17691278f..734989c40579 100644 --- a/net/netfilter/nft_set_bitmap.c +++ b/net/netfilter/nft_set_bitmap.c @@ -236,7 +236,8 @@ static inline u32 nft_bitmap_total_size(u32 klen) return sizeof(struct nft_bitmap) + nft_bitmap_size(klen); } -static unsigned int nft_bitmap_privsize(const struct nlattr * const nla[]) +static unsigned int nft_bitmap_privsize(const struct nlattr * const nla[], + const struct nft_set_desc *desc) { u32 klen = ntohl(nla_get_be32(nla[NFTA_SET_KEY_LEN])); diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c 
index 4ba0717408d9..455a11ce8cd0 100644 --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -321,7 +321,8 @@ static void nft_rhash_gc(struct work_struct *work) nft_set_gc_interval(set)); } -static unsigned int nft_rhash_privsize(const struct nlattr * const nla[]) +static unsigned int nft_rhash_privsize(const struct nlattr * const nla[], + const struct nft_set_desc *desc) { return sizeof(struct nft_rhash); } diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index 29d41d378339..491e805d3ca2 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -251,7 +251,8 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx, read_unlock_bh(>lock); } -static unsigned int nft_rbtree_privsize(const struct nlattr * const nla[]) +static unsigned int nft_rbtree_privsize(const struct nlattr * const nla[], + const struct nft_set_desc *desc) { return sizeof(struct nft_rbtree); } -- 2.1.4
[PATCH 20/29] netfilter: cttimeout: use nf_ct_iterate_cleanup_net to unlink timeout objs
From: Liping Zhang

Similar to nf_conntrack_helper, we can use nf_ct_iterate_cleanup_net to remove this copy & paste code.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nfnetlink_cttimeout.c | 39 +
 1 file changed, 5 insertions(+), 34 deletions(-)

diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index a3e7bb54d96a..49638b03ccc9 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -287,49 +287,20 @@ static int cttimeout_get_timeout(struct net *net, struct sock *ctnl,
 	return ret;
 }
 
-static void untimeout(struct nf_conntrack_tuple_hash *i,
-		      struct ctnl_timeout *timeout)
+static int untimeout(struct nf_conn *ct, void *timeout)
 {
-	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(i);
 	struct nf_conn_timeout *timeout_ext = nf_ct_timeout_find(ct);
 
 	if (timeout_ext && (!timeout || timeout_ext->timeout == timeout))
 		RCU_INIT_POINTER(timeout_ext->timeout, NULL);
+
+	/* We are not intended to delete this conntrack.
 */
+	return 0;
 }
 
 static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
 {
-	struct nf_conntrack_tuple_hash *h;
-	const struct hlist_nulls_node *nn;
-	unsigned int last_hsize;
-	spinlock_t *lock;
-	int i, cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
-
-		spin_lock_bh(&pcpu->lock);
-		hlist_nulls_for_each_entry(h, nn, &pcpu->unconfirmed, hnnode)
-			untimeout(h, timeout);
-		spin_unlock_bh(&pcpu->lock);
-	}
-
-	local_bh_disable();
-restart:
-	last_hsize = nf_conntrack_htable_size;
-	for (i = 0; i < last_hsize; i++) {
-		lock = &nf_conntrack_locks[i % CONNTRACK_LOCKS];
-		nf_conntrack_lock(lock);
-		if (last_hsize != nf_conntrack_htable_size) {
-			spin_unlock(lock);
-			goto restart;
-		}
-
-		hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
-			untimeout(h, timeout);
-		spin_unlock(lock);
-	}
-	local_bh_enable();
+	nf_ct_iterate_cleanup_net(net, untimeout, timeout, 0, 0);
 }
 
 /* try to delete object, fail if it is still in use. */
-- 
2.1.4
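The conversion works because of the iterator contract of nf_ct_iterate_cleanup_net(): the callback returns nonzero to have the entry deleted and zero to keep it, so untimeout() can detach timeouts without removing any conntrack. A toy version of that contract, with hypothetical types standing in for the kernel's:

```c
#include <stddef.h>

struct conn {
	int timeout_id;		/* 0 means "no timeout attached" */
	int dead;
};

/* Generic walk: a nonzero return from the iterator means "delete this
 * one", mirroring nf_ct_iterate_cleanup_net()'s contract. */
static int iterate(struct conn *tbl, size_t n,
		   int (*iter)(struct conn *c, void *data), void *data)
{
	size_t i;
	int deleted = 0;

	for (i = 0; i < n; i++)
		if (iter(&tbl[i], data)) {
			tbl[i].dead = 1;
			deleted++;
		}
	return deleted;
}

/* untimeout-style callback: detach matching timeouts, delete nothing. */
static int untimeout(struct conn *c, void *timeout)
{
	int id = *(int *)timeout;

	if (c->timeout_id == id)
		c->timeout_id = 0;
	return 0;	/* we are not intended to delete this conntrack */
}
```

Collapsing the per-cpu unconfirmed-list walk and the resize-aware table walk into one shared iterator is what lets both this patch and the helper-unregister patch drop their near-identical locking code.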
[PATCH 15/29] netfilter: nft_set_hash: add nft_hash_buckets()
Add an nft_hash_buckets() helper function to calculate the number of hashtable buckets based on the number of elements. This function can be reused from the follow-up patch that adds non-resizable hashtables.

Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nft_set_hash.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 455a11ce8cd0..466cb7092dfa 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -363,12 +363,17 @@ static void nft_rhash_destroy(const struct nft_set *set)
 				    (void *)set);
 }
 
+static u32 nft_hash_buckets(u32 size)
+{
+	return roundup_pow_of_two(size * 4 / 3);
+}
+
 static bool nft_rhash_estimate(const struct nft_set_desc *desc, u32 features,
 			       struct nft_set_estimate *est)
 {
 	if (desc->size)
 		est->size = sizeof(struct nft_rhash) +
-			    roundup_pow_of_two(desc->size * 4 / 3) *
+			    nft_hash_buckets(desc->size) *
 			    sizeof(struct nft_rhash_elem *) +
 			    desc->size * sizeof(struct nft_rhash_elem);
 	else
-- 
2.1.4
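The sizing rule is easy to check by hand: the element hint is 75% of the final size (see NFT_HASH_ELEMENT_HINT earlier in the series), so the bucket count is size * 4 / 3 rounded up to a power of two. A stand-alone sketch, with roundup_pow_of_two() reimplemented since we are outside the kernel tree:

```c
#include <stdint.h>

/* Smallest power of two >= x, as the kernel's roundup_pow_of_two(). */
static uint32_t roundup_pow_of_two(uint32_t x)
{
	uint32_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/* Same sizing rule as nft_hash_buckets(): the element hint is 75% of
 * the final table size, so scale by 4/3 before rounding up. */
static uint32_t nft_hash_buckets(uint32_t size)
{
	return roundup_pow_of_two(size * 4 / 3);
}
```

The power-of-two constraint matters for the fast lookup patch later in the series: with buckets a power of two, reciprocal_scale() distributes hashes evenly without a division.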
[PATCH 19/29] netfilter: nf_ct_helper: use nf_ct_iterate_destroy to unlink helper objs
From: Liping ZhangWhen we unlink the helper objects, we will iterate the nf_conntrack_hash, iterate the unconfirmed list, handle the hash resize situation, etc. Actually this logic is same as the nf_ct_iterate_destroy, so we can use it to remove these copy & paste code. Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_helper.c | 50 +++-- 1 file changed, 4 insertions(+), 46 deletions(-) diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 7f6100ca63be..9129bb3b5153 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -285,16 +285,16 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); /* appropriate ct lock protecting must be taken by caller */ -static inline int unhelp(struct nf_conntrack_tuple_hash *i, -const struct nf_conntrack_helper *me) +static int unhelp(struct nf_conn *ct, void *me) { - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(i); struct nf_conn_help *help = nfct_help(ct); if (help && rcu_dereference_raw(help->helper) == me) { nf_conntrack_event(IPCT_HELPER, ct); RCU_INIT_POINTER(help->helper, NULL); } + + /* We are not intended to delete this conntrack. */ return 0; } @@ -437,33 +437,10 @@ int nf_conntrack_helper_register(struct nf_conntrack_helper *me) } EXPORT_SYMBOL_GPL(nf_conntrack_helper_register); -static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me, -struct net *net) -{ - struct nf_conntrack_tuple_hash *h; - const struct hlist_nulls_node *nn; - int cpu; - - /* Get rid of expecteds, set helpers to NULL. 
*/ - for_each_possible_cpu(cpu) { - struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu); - - spin_lock_bh(>lock); - hlist_nulls_for_each_entry(h, nn, >unconfirmed, hnnode) - unhelp(h, me); - spin_unlock_bh(>lock); - } -} - void nf_conntrack_helper_unregister(struct nf_conntrack_helper *me) { - struct nf_conntrack_tuple_hash *h; struct nf_conntrack_expect *exp; const struct hlist_node *next; - const struct hlist_nulls_node *nn; - unsigned int last_hsize; - spinlock_t *lock; - struct net *net; unsigned int i; mutex_lock(_ct_helper_mutex); @@ -491,26 +468,7 @@ void nf_conntrack_helper_unregister(struct nf_conntrack_helper *me) } spin_unlock_bh(_conntrack_expect_lock); - rtnl_lock(); - for_each_net(net) - __nf_conntrack_helper_unregister(me, net); - rtnl_unlock(); - - local_bh_disable(); -restart: - last_hsize = nf_conntrack_htable_size; - for (i = 0; i < last_hsize; i++) { - lock = _conntrack_locks[i % CONNTRACK_LOCKS]; - nf_conntrack_lock(lock); - if (last_hsize != nf_conntrack_htable_size) { - spin_unlock(lock); - goto restart; - } - hlist_nulls_for_each_entry(h, nn, _conntrack_hash[i], hnnode) - unhelp(h, me); - spin_unlock(lock); - } - local_bh_enable(); + nf_ct_iterate_destroy(unhelp, me); } EXPORT_SYMBOL_GPL(nf_conntrack_helper_unregister); -- 2.1.4
[PATCH 13/29] netfilter: nf_tables: select set backend flavour depending on description
This patch adds the infrastructure to support several implementations of the same set type. This selection will be based on the set description and the features available for this set. This allow us to select set backend implementation that will result in better performance numbers. Signed-off-by: Pablo Neira Ayuso--- include/net/netfilter/nf_tables.h | 26 + net/netfilter/nf_tables_api.c | 59 --- net/netfilter/nft_set_bitmap.c| 10 +-- net/netfilter/nft_set_hash.c | 10 +-- net/netfilter/nft_set_rbtree.c| 10 +-- 5 files changed, 80 insertions(+), 35 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 8a8bab8d7b15..f27012098846 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -281,6 +281,23 @@ struct nft_set_estimate { enum nft_set_class space; }; +/** + * struct nft_set_type - nf_tables set type + * + * @select_ops: function to select nft_set_ops + * @ops: default ops, used when no select_ops functions is present + * @list: used internally + * @owner: module reference + */ +struct nft_set_type { + const struct nft_set_ops*(*select_ops)(const struct nft_ctx *, + const struct nft_set_desc *desc, + u32 flags); + const struct nft_set_ops*ops; + struct list_headlist; + struct module *owner; +}; + struct nft_set_ext; struct nft_expr; @@ -297,8 +314,6 @@ struct nft_expr; * @privsize: function to return size of set private data * @init: initialize private data of new set instance * @destroy: destroy private data of set instance - * @list: nf_tables_set_ops list node - * @owner: module reference * @elemsize: element private size * @features: features supported by the implementation */ @@ -345,14 +360,13 @@ struct nft_set_ops { const struct nlattr * const nla[]); void(*destroy)(const struct nft_set *set); - struct list_headlist; - struct module *owner; unsigned intelemsize; u32 features; + const struct nft_set_type *type; }; -int nft_register_set(struct nft_set_ops *ops); -void 
nft_unregister_set(struct nft_set_ops *ops); +int nft_register_set(struct nft_set_type *type); +void nft_unregister_set(struct nft_set_type *type); /** * struct nft_set - nf_tables set instance diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index da314be0c048..c0b2b19607e1 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2377,64 +2377,77 @@ static int nf_tables_delrule(struct net *net, struct sock *nlsk, * Sets */ -static LIST_HEAD(nf_tables_set_ops); +static LIST_HEAD(nf_tables_set_types); -int nft_register_set(struct nft_set_ops *ops) +int nft_register_set(struct nft_set_type *type) { nfnl_lock(NFNL_SUBSYS_NFTABLES); - list_add_tail_rcu(>list, _tables_set_ops); + list_add_tail_rcu(>list, _tables_set_types); nfnl_unlock(NFNL_SUBSYS_NFTABLES); return 0; } EXPORT_SYMBOL_GPL(nft_register_set); -void nft_unregister_set(struct nft_set_ops *ops) +void nft_unregister_set(struct nft_set_type *type) { nfnl_lock(NFNL_SUBSYS_NFTABLES); - list_del_rcu(>list); + list_del_rcu(>list); nfnl_unlock(NFNL_SUBSYS_NFTABLES); } EXPORT_SYMBOL_GPL(nft_unregister_set); +#define NFT_SET_FEATURES (NFT_SET_INTERVAL | NFT_SET_MAP | \ +NFT_SET_TIMEOUT | NFT_SET_OBJECT) + +static bool nft_set_ops_candidate(const struct nft_set_ops *ops, u32 flags) +{ + return (flags & ops->features) == (flags & NFT_SET_FEATURES); +} + /* * Select a set implementation based on the data characteristics and the * given policy. The total memory use might not be known if no size is * given, in that case the amount of memory per element is used. 
*/ static const struct nft_set_ops * -nft_select_set_ops(const struct nlattr * const nla[], +nft_select_set_ops(const struct nft_ctx *ctx, + const struct nlattr * const nla[], const struct nft_set_desc *desc, enum nft_set_policies policy) { const struct nft_set_ops *ops, *bops; struct nft_set_estimate est, best; - u32 features; + const struct nft_set_type *type; + u32 flags = 0; #ifdef CONFIG_MODULES - if (list_empty(_tables_set_ops)) { + if (list_empty(_tables_set_types)) { nfnl_unlock(NFNL_SUBSYS_NFTABLES); request_module("nft-set");
[PATCH 26/29] netfilter: use nf_conntrack_helpers_register when possible
From: Liping Zhangamanda_helper, nf_conntrack_helper_ras and nf_conntrack_helper_q931 are all arrays, so we can use nf_conntrack_helpers_register to register the ct helper, this will help us to eliminate some "goto errX" statements. Also introduce h323_helper_init/exit helper function to register the ct helpers, this is prepared for the followup patch, which will add net namespace support for ct helper. Signed-off-by: Liping Zhang Acked-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_amanda.c| 12 +++ net/netfilter/nf_conntrack_h323_main.c | 63 +++--- 2 files changed, 40 insertions(+), 35 deletions(-) diff --git a/net/netfilter/nf_conntrack_amanda.c b/net/netfilter/nf_conntrack_amanda.c index 03d2ccffa9fa..20edd589fe06 100644 --- a/net/netfilter/nf_conntrack_amanda.c +++ b/net/netfilter/nf_conntrack_amanda.c @@ -197,8 +197,8 @@ static void __exit nf_conntrack_amanda_fini(void) { int i; - nf_conntrack_helper_unregister(_helper[0]); - nf_conntrack_helper_unregister(_helper[1]); + nf_conntrack_helpers_unregister(amanda_helper, + ARRAY_SIZE(amanda_helper)); for (i = 0; i < ARRAY_SIZE(search); i++) textsearch_destroy(search[i].ts); } @@ -218,16 +218,12 @@ static int __init nf_conntrack_amanda_init(void) goto err1; } } - ret = nf_conntrack_helper_register(_helper[0]); + ret = nf_conntrack_helpers_register(amanda_helper, + ARRAY_SIZE(amanda_helper)); if (ret < 0) goto err1; - ret = nf_conntrack_helper_register(_helper[1]); - if (ret < 0) - goto err2; return 0; -err2: - nf_conntrack_helper_unregister(_helper[0]); err1: while (--i >= 0) textsearch_destroy(search[i].ts); diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c index 3bcdc718484e..f71f0d2558fd 100644 --- a/net/netfilter/nf_conntrack_h323_main.c +++ b/net/netfilter/nf_conntrack_h323_main.c @@ -1815,14 +1815,44 @@ static struct nf_conntrack_helper nf_conntrack_helper_ras[] __read_mostly = { }, }; +static int __init h323_helper_init(void) +{ 
+ int ret; + + ret = nf_conntrack_helper_register(_conntrack_helper_h245); + if (ret < 0) + return ret; + ret = nf_conntrack_helpers_register(nf_conntrack_helper_q931, + ARRAY_SIZE(nf_conntrack_helper_q931)); + if (ret < 0) + goto err1; + ret = nf_conntrack_helpers_register(nf_conntrack_helper_ras, + ARRAY_SIZE(nf_conntrack_helper_ras)); + if (ret < 0) + goto err2; + + return 0; +err2: + nf_conntrack_helpers_unregister(nf_conntrack_helper_q931, + ARRAY_SIZE(nf_conntrack_helper_q931)); +err1: + nf_conntrack_helper_unregister(_conntrack_helper_h245); + return ret; +} + +static void __exit h323_helper_exit(void) +{ + nf_conntrack_helpers_unregister(nf_conntrack_helper_ras, + ARRAY_SIZE(nf_conntrack_helper_ras)); + nf_conntrack_helpers_unregister(nf_conntrack_helper_q931, + ARRAY_SIZE(nf_conntrack_helper_q931)); + nf_conntrack_helper_unregister(_conntrack_helper_h245); +} + // static void __exit nf_conntrack_h323_fini(void) { - nf_conntrack_helper_unregister(_conntrack_helper_ras[1]); - nf_conntrack_helper_unregister(_conntrack_helper_ras[0]); - nf_conntrack_helper_unregister(_conntrack_helper_q931[1]); - nf_conntrack_helper_unregister(_conntrack_helper_q931[0]); - nf_conntrack_helper_unregister(_conntrack_helper_h245); + h323_helper_exit(); kfree(h323_buffer); pr_debug("nf_ct_h323: fini\n"); } @@ -1837,32 +1867,11 @@ static int __init nf_conntrack_h323_init(void) h323_buffer = kmalloc(65536, GFP_KERNEL); if (!h323_buffer) return -ENOMEM; - ret = nf_conntrack_helper_register(_conntrack_helper_h245); + ret = h323_helper_init(); if (ret < 0) goto err1; - ret = nf_conntrack_helper_register(_conntrack_helper_q931[0]); - if (ret < 0) - goto err2; - ret = nf_conntrack_helper_register(_conntrack_helper_q931[1]); - if (ret < 0) - goto err3; - ret = nf_conntrack_helper_register(_conntrack_helper_ras[0]); - if (ret < 0) - goto err4; - ret = nf_conntrack_helper_register(_conntrack_helper_ras[1]); - if (ret < 0) - goto err5;
[PATCH 06/29] netfilter: conntrack: don't call iter for non-confirmed conntracks
From: Florian Westphal

nf_ct_iterate_cleanup_net currently calls the iter() callback also for conntracks on the unconfirmed list, but this is unsafe. Accesses to nf_conn itself are fine, but some users access the extension area in the iter() callback, and that only works reliably for confirmed conntracks (ct->ext can be reallocated at any time for an unconfirmed conntrack). The second issue is that there is a short window where a conntrack entry is neither on the list nor in the table: to confirm an entry, it is first removed from the unconfirmed list, then inserted into the table. Fix this by iterating the unconfirmed list first and marking all entries as dying, then waiting for the RCU grace period. This makes sure all entries that were about to be confirmed either are in the main table, or will be dropped soon.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_conntrack_core.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 2730f9df33b7..08733685d732 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1592,7 +1592,6 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data), struct nf_conntrack_tuple_hash *h; struct nf_conn *ct; struct hlist_nulls_node *n; - int cpu; spinlock_t *lockp; for (; *bucket < nf_conntrack_htable_size; (*bucket)++) { @@ -1614,24 +1613,40 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data), cond_resched(); } + return NULL; +found: + atomic_inc(&ct->ct_general.use); + spin_unlock(lockp); + local_bh_enable(); + return ct; +} + +static void +__nf_ct_unconfirmed_destroy(struct net *net) +{ + int cpu; + for_each_possible_cpu(cpu) { - struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu); + struct nf_conntrack_tuple_hash *h; + struct hlist_nulls_node *n; + struct ct_pcpu *pcpu; + + pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
spin_lock_bh(>lock); hlist_nulls_for_each_entry(h, n, >unconfirmed, hnnode) { + struct nf_conn *ct; + ct = nf_ct_tuplehash_to_ctrack(h); - if (iter(ct, data)) - set_bit(IPS_DYING_BIT, >status); + + /* we cannot call iter() on unconfirmed list, the +* owning cpu can reallocate ct->ext at any time. +*/ + set_bit(IPS_DYING_BIT, >status); } spin_unlock_bh(>lock); cond_resched(); } - return NULL; -found: - atomic_inc(>ct_general.use); - spin_unlock(lockp); - local_bh_enable(); - return ct; } void nf_ct_iterate_cleanup_net(struct net *net, @@ -1646,6 +1661,10 @@ void nf_ct_iterate_cleanup_net(struct net *net, if (atomic_read(>ct.count) == 0) return; + __nf_ct_unconfirmed_destroy(net); + + synchronize_net(); + while ((ct = get_next_corpse(net, iter, data, )) != NULL) { /* Time to push up daises... */ -- 2.1.4
[PATCH 23/29] netns: add and use net_ns_barrier
From: Florian WestphalQuoting Joe Stringer: If a user loads nf_conntrack_ftp, sends FTP traffic through a network namespace, destroys that namespace then unloads the FTP helper module, then the kernel will crash. Events that lead to the crash: 1. conntrack is created with ftp helper in netns x 2. This netns is destroyed 3. netns destruction is scheduled 4. netns destruction wq starts, removes netns from global list 5. ftp helper is unloaded, which resets all helpers of the conntracks via for_each_net() but because netns is already gone from list the for_each_net() loop doesn't include it, therefore all of these conntracks are unaffected. 6. helper module unload finishes 7. netns wq invokes destructor for rmmod'ed helper CC: "Eric W. Biederman" Reported-by: Joe Stringer Signed-off-by: Florian Westphal Acked-by: David S. Miller Acked-by: "Eric W. Biederman" Signed-off-by: Pablo Neira Ayuso --- include/net/net_namespace.h | 3 +++ net/core/net_namespace.c | 17 + net/netfilter/nf_conntrack_core.c | 9 + 3 files changed, 29 insertions(+) diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index fe80bb48ab1f..a24a57593202 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -158,6 +158,7 @@ extern struct net init_net; struct net *copy_net_ns(unsigned long flags, struct user_namespace *user_ns, struct net *old_net); +void net_ns_barrier(void); #else /* CONFIG_NET_NS */ #include #include @@ -168,6 +169,8 @@ static inline struct net *copy_net_ns(unsigned long flags, return ERR_PTR(-EINVAL); return old_net; } + +static inline void net_ns_barrier(void) {} #endif /* CONFIG_NET_NS */ diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 1934efd4a9d4..1f15abb1d733 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -482,6 +482,23 @@ static void cleanup_net(struct work_struct *work) net_drop_ns(net); } } + +/** + * net_ns_barrier - wait until concurrent net_cleanup_work is done + * + * cleanup_net runs 
from work queue and will first remove namespaces + * from the global list, then run net exit functions. + * + * Call this in module exit path to make sure that all netns + * ->exit ops have been invoked before the function is removed. + */ +void net_ns_barrier(void) +{ + mutex_lock(_mutex); + mutex_unlock(_mutex); +} +EXPORT_SYMBOL(net_ns_barrier); + static DECLARE_WORK(net_cleanup_work, cleanup_net); void __put_net(struct net *net) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index c3bd9b086dcc..9979f46c81dc 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1720,6 +1720,8 @@ EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup_net); * Like nf_ct_iterate_cleanup, but first marks conntracks on the * unconfirmed list as dying (so they will not be inserted into * main table). + * + * Can only be called in module exit path. */ void nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data) @@ -1734,6 +1736,13 @@ nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data) } rtnl_unlock(); + /* Need to wait for netns cleanup worker to finish, if its +* running -- it might have deleted a net namespace from +* the global list, so our __nf_ct_unconfirmed_destroy() might +* not have affected all namespaces. +*/ + net_ns_barrier(); + /* a conntrack could have been unlinked from unconfirmed list * before we grabbed pcpu lock in __nf_ct_unconfirmed_destroy(). * This makes sure its inserted into conntrack table. -- 2.1.4
[PATCH 09/29] netfilter: nat: destroy nat mappings on module exit path only
From: Florian WestphalWe don't need pernetns cleanup anymore. If the netns is being destroyed, conntrack netns exit will kill all entries in this namespace, and neither conntrack hash table nor bysource hash are per namespace. For the rmmod case, we have to make sure we remove all entries from the nat bysource table, so call the new nf_ct_iterate_destroy in module exit path. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_nat_core.c | 37 + 1 file changed, 5 insertions(+), 32 deletions(-) diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index daf5b22c07f8..d26cc2f864e6 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -582,12 +582,8 @@ static void nf_nat_l4proto_clean(u8 l3proto, u8 l4proto) .l3proto = l3proto, .l4proto = l4proto, }; - struct net *net; - rtnl_lock(); - for_each_net(net) - nf_ct_iterate_cleanup_net(net, nf_nat_proto_remove, , 0, 0); - rtnl_unlock(); + nf_ct_iterate_destroy(nf_nat_proto_remove, ); } static void nf_nat_l3proto_clean(u8 l3proto) @@ -595,13 +591,8 @@ static void nf_nat_l3proto_clean(u8 l3proto) struct nf_nat_proto_clean clean = { .l3proto = l3proto, }; - struct net *net; - rtnl_lock(); - - for_each_net(net) - nf_ct_iterate_cleanup_net(net, nf_nat_proto_remove, , 0, 0); - rtnl_unlock(); + nf_ct_iterate_destroy(nf_nat_proto_remove, ); } /* Protocol registration. 
*/ @@ -822,17 +813,6 @@ nfnetlink_parse_nat_setup(struct nf_conn *ct, } #endif -static void __net_exit nf_nat_net_exit(struct net *net) -{ - struct nf_nat_proto_clean clean = {}; - - nf_ct_iterate_cleanup_net(net, nf_nat_proto_clean, , 0, 0); -} - -static struct pernet_operations nf_nat_net_ops = { - .exit = nf_nat_net_exit, -}; - static struct nf_ct_helper_expectfn follow_master_nat = { .name = "nat-follow-master", .expectfn = nf_nat_follow_master, @@ -853,10 +833,6 @@ static int __init nf_nat_init(void) return ret; } - ret = register_pernet_subsys(_nat_net_ops); - if (ret < 0) - goto cleanup_extend; - nf_ct_helper_expectfn_register(_master_nat); BUG_ON(nfnetlink_parse_nat_setup_hook != NULL); @@ -867,18 +843,15 @@ static int __init nf_nat_init(void) RCU_INIT_POINTER(nf_nat_decode_session_hook, __nf_nat_decode_session); #endif return 0; - - cleanup_extend: - rhltable_destroy(_nat_bysource_table); - nf_ct_extend_unregister(_extend); - return ret; } static void __exit nf_nat_cleanup(void) { + struct nf_nat_proto_clean clean = {}; unsigned int i; - unregister_pernet_subsys(_nat_net_ops); + nf_ct_iterate_destroy(nf_nat_proto_clean, ); + nf_ct_extend_unregister(_extend); nf_ct_helper_expectfn_unregister(_master_nat); RCU_INIT_POINTER(nfnetlink_parse_nat_setup_hook, NULL); -- 2.1.4
[PATCH 24/29] netfilter: ebt: Use new helper ebt_invalid_target to check target
From: Gao FengUse the new helper function ebt_invalid_target instead of the old macro INVALID_TARGET and other duplicated codes to enhance the readability. Signed-off-by: Gao Feng Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter_bridge/ebtables.h | 2 -- net/bridge/netfilter/ebt_dnat.c | 2 +- net/bridge/netfilter/ebt_mark.c | 2 +- net/bridge/netfilter/ebt_redirect.c | 2 +- net/bridge/netfilter/ebt_snat.c | 2 +- 5 files changed, 4 insertions(+), 6 deletions(-) diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h index e0cbf17af780..2c2a5514b0df 100644 --- a/include/linux/netfilter_bridge/ebtables.h +++ b/include/linux/netfilter_bridge/ebtables.h @@ -122,8 +122,6 @@ extern unsigned int ebt_do_table(struct sk_buff *skb, #define BASE_CHAIN (par->hook_mask & (1 << NF_BR_NUMHOOKS)) /* Clear the bit in the hook mask that tells if the rule is on a base chain */ #define CLEAR_BASE_CHAIN_BIT (par->hook_mask &= ~(1 << NF_BR_NUMHOOKS)) -/* True if the target is not a standard target */ -#define INVALID_TARGET (info->target < -NUM_STANDARD_TARGETS || info->target >= 0) static inline bool ebt_invalid_target(int target) { diff --git a/net/bridge/netfilter/ebt_dnat.c b/net/bridge/netfilter/ebt_dnat.c index e0bb624c3845..dfc86a0199da 100644 --- a/net/bridge/netfilter/ebt_dnat.c +++ b/net/bridge/netfilter/ebt_dnat.c @@ -61,7 +61,7 @@ static int ebt_dnat_tg_check(const struct xt_tgchk_param *par) (strcmp(par->table, "broute") != 0 || hook_mask & ~(1 << NF_BR_BROUTING))) return -EINVAL; - if (INVALID_TARGET) + if (ebt_invalid_target(info->target)) return -EINVAL; return 0; } diff --git a/net/bridge/netfilter/ebt_mark.c b/net/bridge/netfilter/ebt_mark.c index 66697cbd0a8b..19f0f9592d32 100644 --- a/net/bridge/netfilter/ebt_mark.c +++ b/net/bridge/netfilter/ebt_mark.c @@ -44,7 +44,7 @@ static int ebt_mark_tg_check(const struct xt_tgchk_param *par) tmp = info->target | ~EBT_VERDICT_BITS; if (BASE_CHAIN && tmp == EBT_RETURN) return 
-EINVAL; - if (tmp < -NUM_STANDARD_TARGETS || tmp >= 0) + if (ebt_invalid_target(tmp)) return -EINVAL; tmp = info->target & ~EBT_VERDICT_BITS; if (tmp != MARK_SET_VALUE && tmp != MARK_OR_VALUE && diff --git a/net/bridge/netfilter/ebt_redirect.c b/net/bridge/netfilter/ebt_redirect.c index 8d2a85e0594e..a7223eaf490b 100644 --- a/net/bridge/netfilter/ebt_redirect.c +++ b/net/bridge/netfilter/ebt_redirect.c @@ -47,7 +47,7 @@ static int ebt_redirect_tg_check(const struct xt_tgchk_param *par) (strcmp(par->table, "broute") != 0 || hook_mask & ~(1 << NF_BR_BROUTING))) return -EINVAL; - if (INVALID_TARGET) + if (ebt_invalid_target(info->target)) return -EINVAL; return 0; } diff --git a/net/bridge/netfilter/ebt_snat.c b/net/bridge/netfilter/ebt_snat.c index e56ccd060d26..11cf9e9e9222 100644 --- a/net/bridge/netfilter/ebt_snat.c +++ b/net/bridge/netfilter/ebt_snat.c @@ -51,7 +51,7 @@ static int ebt_snat_tg_check(const struct xt_tgchk_param *par) if (BASE_CHAIN && tmp == EBT_RETURN) return -EINVAL; - if (tmp < -NUM_STANDARD_TARGETS || tmp >= 0) + if (ebt_invalid_target(tmp)) return -EINVAL; tmp = info->target | EBT_VERDICT_BITS; if ((tmp & ~NAT_ARP_BIT) != ~NAT_ARP_BIT) -- 2.1.4
[PATCH 25/29] netfilter, kbuild: use canonical method to specify objs.
From: Jike Song

Should use ":=" instead of "+=".

Signed-off-by: Jike Song
Signed-off-by: Pablo Neira Ayuso
--- net/netfilter/Makefile | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index c9b78e7b342f..913380919301 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -70,10 +70,9 @@ obj-$(CONFIG_NETFILTER_SYNPROXY) += nf_synproxy_core.o obj-$(CONFIG_NF_DUP_NETDEV)+= nf_dup_netdev.o # nf_tables -nf_tables-objs += nf_tables_core.o nf_tables_api.o nf_tables_trace.o -nf_tables-objs += nft_immediate.o nft_cmp.o nft_range.o -nf_tables-objs += nft_bitwise.o nft_byteorder.o nft_payload.o -nf_tables-objs += nft_lookup.o nft_dynset.o +nf_tables-objs := nf_tables_core.o nf_tables_api.o nf_tables_trace.o \ + nft_immediate.o nft_cmp.o nft_range.o nft_bitwise.o \ + nft_byteorder.o nft_payload.o nft_lookup.o nft_dynset.o obj-$(CONFIG_NF_TABLES)+= nf_tables.o obj-$(CONFIG_NF_TABLES_INET) += nf_tables_inet.o -- 2.1.4
[PATCH 29/29] netfilter: nfnetlink: extended ACK reporting
Pass down struct netlink_ext_ack as parameter to all of our nfnetlink subsystem callbacks, so we can work on follow up patches to provide finer grain error reporting using the new infrastructure that 2d4bc93368f5 ("netlink: extended ACK reporting") provides. No functional change, just pass down this new object to callbacks. Signed-off-by: Pablo Neira Ayuso--- include/linux/netfilter/nfnetlink.h | 10 +++--- net/netfilter/ipset/ip_set_core.c| 39 net/netfilter/nf_conntrack_netlink.c | 39 net/netfilter/nf_tables_api.c| 59 net/netfilter/nfnetlink.c| 21 + net/netfilter/nfnetlink_acct.c | 9 -- net/netfilter/nfnetlink_cthelper.c | 9 -- net/netfilter/nfnetlink_cttimeout.c | 15 ++--- net/netfilter/nfnetlink_log.c| 6 ++-- net/netfilter/nfnetlink_queue.c | 12 +--- net/netfilter/nft_compat.c | 3 +- net/netfilter/xt_osf.c | 6 ++-- 12 files changed, 152 insertions(+), 76 deletions(-) diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h index 996711d8a7b4..41d04e9d088a 100644 --- a/include/linux/netfilter/nfnetlink.h +++ b/include/linux/netfilter/nfnetlink.h @@ -1,7 +1,6 @@ #ifndef _NFNETLINK_H #define _NFNETLINK_H - #include #include #include @@ -10,13 +9,16 @@ struct nfnl_callback { int (*call)(struct net *net, struct sock *nl, struct sk_buff *skb, const struct nlmsghdr *nlh, - const struct nlattr * const cda[]); + const struct nlattr * const cda[], + struct netlink_ext_ack *extack); int (*call_rcu)(struct net *net, struct sock *nl, struct sk_buff *skb, const struct nlmsghdr *nlh, - const struct nlattr * const cda[]); + const struct nlattr * const cda[], + struct netlink_ext_ack *extack); int (*call_batch)(struct net *net, struct sock *nl, struct sk_buff *skb, const struct nlmsghdr *nlh, - const struct nlattr * const cda[]); + const struct nlattr * const cda[], + struct netlink_ext_ack *extack); const struct nla_policy *policy;/* netlink attribute policy */ const u_int16_t attr_count; /* number of nlattr's */ }; diff --git 
a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index ba6a5516dc7c..e495b5e484b1 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -841,14 +841,16 @@ find_free_id(struct ip_set_net *inst, const char *name, ip_set_id_t *index, static int ip_set_none(struct net *net, struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, - const struct nlattr * const attr[]) + const struct nlattr * const attr[], + struct netlink_ext_ack *extack) { return -EOPNOTSUPP; } static int ip_set_create(struct net *net, struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, -const struct nlattr * const attr[]) +const struct nlattr * const attr[], +struct netlink_ext_ack *extack) { struct ip_set_net *inst = ip_set_pernet(net); struct ip_set *set, *clash = NULL; @@ -989,7 +991,8 @@ ip_set_destroy_set(struct ip_set *set) static int ip_set_destroy(struct net *net, struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, - const struct nlattr * const attr[]) + const struct nlattr * const attr[], + struct netlink_ext_ack *extack) { struct ip_set_net *inst = ip_set_pernet(net); struct ip_set *s; @@ -1067,7 +1070,8 @@ ip_set_flush_set(struct ip_set *set) static int ip_set_flush(struct net *net, struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, - const struct nlattr * const attr[]) + const struct nlattr * const attr[], + struct netlink_ext_ack *extack) { struct ip_set_net *inst = ip_set_pernet(net); struct ip_set *s; @@ -1106,7 +1110,8 @@ ip_set_setname2_policy[IPSET_ATTR_CMD_MAX + 1] = { static int ip_set_rename(struct net *net, struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, -const struct nlattr * const attr[]) +const struct nlattr * const attr[], +struct netlink_ext_ack *extack) { struct ip_set_net *inst = ip_set_pernet(net); struct ip_set *set, *s; @@
[PATCH 28/29] netfilter: nf_tables: reduce chain type table size
From: Florian Westphal

        text   data   bss     dec    hex  filename
old:  151590   2240  1152  154982  25d66  net/netfilter/nf_tables_api.o
new:  151666   2240   416  154322  25ad2  net/netfilter/nf_tables_api.o

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
--- net/netfilter/nf_tables_api.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index bc8f03a53734..5f3339978f6b 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -387,7 +387,7 @@ static inline u64 nf_tables_alloc_handle(struct nft_table *table) return ++table->hgenerator; } -static const struct nf_chain_type *chain_type[AF_MAX][NFT_CHAIN_T_MAX]; +static const struct nf_chain_type *chain_type[NFPROTO_NUMPROTO][NFT_CHAIN_T_MAX]; static const struct nf_chain_type * __nf_tables_chain_type_lookup(int family, const struct nlattr *nla) @@ -870,6 +870,9 @@ int nft_register_chain_type(const struct nf_chain_type *ctype) { int err = 0; + if (WARN_ON(ctype->family >= NFPROTO_NUMPROTO)) + return -EINVAL; + nfnl_lock(NFNL_SUBSYS_NFTABLES); if (chain_type[ctype->family][ctype->type] != NULL) { err = -EBUSY; -- 2.1.4
[PATCH 22/29] netfilter: move table iteration out of netns exit paths
From: Florian WestphalWe only need to iterate & remove in case of module removal; for netns destruction all conntracks will be removed anyway. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_proto.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c index b7d01f27d463..6a36623e897c 100644 --- a/net/netfilter/nf_conntrack_proto.c +++ b/net/netfilter/nf_conntrack_proto.c @@ -265,6 +265,8 @@ void nf_ct_l3proto_unregister(struct nf_conntrack_l3proto *proto) mutex_unlock(_ct_proto_mutex); synchronize_rcu(); + /* Remove all contrack entries for this protocol */ + nf_ct_iterate_destroy(kill_l3proto, proto); } EXPORT_SYMBOL_GPL(nf_ct_l3proto_unregister); @@ -280,9 +282,6 @@ void nf_ct_l3proto_pernet_unregister(struct net *net, */ if (proto->net_ns_put) proto->net_ns_put(net); - - /* Remove all contrack entries for this protocol */ - nf_ct_iterate_cleanup_net(net, kill_l3proto, proto, 0, 0); } EXPORT_SYMBOL_GPL(nf_ct_l3proto_pernet_unregister); @@ -421,17 +420,23 @@ int nf_ct_l4proto_pernet_register_one(struct net *net, } EXPORT_SYMBOL_GPL(nf_ct_l4proto_pernet_register_one); -void nf_ct_l4proto_unregister_one(struct nf_conntrack_l4proto *l4proto) +static void __nf_ct_l4proto_unregister_one(struct nf_conntrack_l4proto *l4proto) + { BUG_ON(l4proto->l3proto >= PF_MAX); - mutex_lock(_ct_proto_mutex); BUG_ON(rcu_dereference_protected( nf_ct_protos[l4proto->l3proto][l4proto->l4proto], lockdep_is_held(_ct_proto_mutex) ) != l4proto); rcu_assign_pointer(nf_ct_protos[l4proto->l3proto][l4proto->l4proto], _conntrack_l4proto_generic); +} + +void nf_ct_l4proto_unregister_one(struct nf_conntrack_l4proto *l4proto) +{ + mutex_lock(_ct_proto_mutex); + __nf_ct_l4proto_unregister_one(l4proto); mutex_unlock(_ct_proto_mutex); synchronize_rcu(); @@ -448,9 +453,6 @@ void nf_ct_l4proto_pernet_unregister_one(struct net *net, pn->users--; 
nf_ct_l4proto_unregister_sysctl(net, pn, l4proto); - - /* Remove all contrack entries for this protocol */ - nf_ct_iterate_cleanup_net(net, kill_l4proto, l4proto, 0, 0); } EXPORT_SYMBOL_GPL(nf_ct_l4proto_pernet_unregister_one); @@ -500,8 +502,14 @@ EXPORT_SYMBOL_GPL(nf_ct_l4proto_pernet_register); void nf_ct_l4proto_unregister(struct nf_conntrack_l4proto *l4proto[], unsigned int num_proto) { + mutex_lock(_ct_proto_mutex); while (num_proto-- != 0) - nf_ct_l4proto_unregister_one(l4proto[num_proto]); + __nf_ct_l4proto_unregister_one(l4proto[num_proto]); + mutex_unlock(_ct_proto_mutex); + + synchronize_net(); + /* Remove all contrack entries for this protocol */ + nf_ct_iterate_destroy(kill_l4proto, l4proto); } EXPORT_SYMBOL_GPL(nf_ct_l4proto_unregister); -- 2.1.4
[PATCH 21/29] netfilter: ipt_CLUSTERIP: do not hold dev
From: Xin LongIt's a terrible thing to hold dev in iptables target. When the dev is being removed, unregister_netdevice has to wait for the dev to become free. dmesg will keep logging the err: kernel:unregister_netdevice: waiting for veth0_in to become free. \ Usage count = 1 until iptables rules with this target are removed manually. The worse thing is when deleting a netns, a virtual nic will be deleted instead of reset to init_net in default_device_ops exit/exit_batch. As it is earlier than to flush the iptables rules in iptable_filter_net_ops exit, unregister_netdevice will block to wait for the nic to become free. As unregister_netdevice is actually waiting for iptables rules flushing while iptables rules have to be flushed after unregister_netdevice. This 'dead lock' will cause unregister_netdevice to block there forever. As the netns is not available to operate at that moment, iptables rules can not even be flushed manually either. The reproducer can be: # ip netns add test # ip link add veth0_in type veth peer name veth0_out # ip link set veth0_in netns test # ip netns exec test ip link set lo up # ip netns exec test ip link set veth0_in up # ip netns exec test iptables -I INPUT -d 1.2.3.4 -i veth0_in -j \ CLUSTERIP --new --clustermac 89:d4:47:eb:9a:fa --total-nodes 3 \ --local-node 1 --hashmode sourceip-sourceport # ip netns del test This issue can be triggered by all virtual nics with ipt_CLUSTERIP. This patch is to fix it by not holding dev in ipt_CLUSTERIP, but saving the dev->ifindex instead of the dev. As Pablo Neira Ayuso's suggestion, it will refresh c->ifindex and dev's mc by registering a netdevice notifier, just as what xt_TEE does. So it removes the old codes updating dev's mc, and also no need to initialize c->ifindex with dev->ifindex. But as one config can be shared by more than one targets, and the netdev notifier is per config, not per target. It couldn't get e->ip.iniface in the notifier handler. 
So e->ip.iniface has to be saved into config. Note that for backwards compatibility, this patch doesn't remove the codes checking if the dev exists before creating a config. v1->v2: - As Pablo Neira Ayuso's suggestion, register a netdevice notifier to manage c->ifindex and dev's mc. Reported-by: Jianlin Shi Signed-off-by: Xin Long Signed-off-by: Pablo Neira Ayuso --- net/ipv4/netfilter/ipt_CLUSTERIP.c | 101 +++-- 1 file changed, 73 insertions(+), 28 deletions(-) diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c index f30bee8e407b..7d72decb80f9 100644 --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c @@ -47,7 +47,7 @@ struct clusterip_config { __be32 clusterip; /* the IP address */ u_int8_t clustermac[ETH_ALEN]; /* the MAC address */ - struct net_device *dev; /* device */ + int ifindex;/* device ifindex */ u_int16_t num_total_nodes; /* total number of nodes */ unsigned long local_nodes; /* node number array */ @@ -57,6 +57,9 @@ struct clusterip_config { enum clusterip_hashmode hash_mode; /* which hashing mode */ u_int32_t hash_initval; /* hash initialization */ struct rcu_head rcu; + + char ifname[IFNAMSIZ]; /* device ifname */ + struct notifier_block notifier; /* refresh c->ifindex in it */ }; #ifdef CONFIG_PROC_FS @@ -98,9 +101,8 @@ clusterip_config_put(struct clusterip_config *c) * entry(rule) is removed, remove the config from lists, but don't free it * yet, since proc-files could still be holding references */ static inline void -clusterip_config_entry_put(struct clusterip_config *c) +clusterip_config_entry_put(struct net *net, struct clusterip_config *c) { - struct net *net = dev_net(c->dev); struct clusterip_net *cn = net_generic(net, clusterip_net_id); local_bh_disable(); @@ -109,8 +111,7 @@ clusterip_config_entry_put(struct clusterip_config *c) spin_unlock(>lock); local_bh_enable(); - dev_mc_del(c->dev, c->clustermac); - dev_put(c->dev); + unregister_netdevice_notifier(>notifier); /* In case 
anyone still accesses the file, the open/close * functions are also incrementing the refcount on their own, @@ -170,19 +171,55 @@ clusterip_config_init_nodelist(struct clusterip_config *c, set_bit(i->local_nodes[n] - 1, >local_nodes); } -static struct clusterip_config * -clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip, - struct net_device *dev) +static int +clusterip_netdev_event(struct notifier_block *this, unsigned long event,
[PATCH 27/29] netfilter: conntrack: use NFPROTO_MAX to size array
From: Florian Westphal

We don't support anything larger than NFPROTO_MAX, so we can shrink this a bit:

      text    data     dec     hex filename
old:  8259    1096    9355    248b net/netfilter/nf_conntrack_proto.o
new:  8259     624    8883    22b3 net/netfilter/nf_conntrack_proto.o

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
---
 include/net/netfilter/nf_conntrack_l3proto.h |  4 ++--
 net/netfilter/nf_conntrack_proto.c           | 18 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_l3proto.h b/include/net/netfilter/nf_conntrack_l3proto.h
index e01559b4d781..6d14b36e3a49 100644
--- a/include/net/netfilter/nf_conntrack_l3proto.h
+++ b/include/net/netfilter/nf_conntrack_l3proto.h
@@ -71,7 +71,7 @@ struct nf_conntrack_l3proto {
 	struct module *me;
 };

-extern struct nf_conntrack_l3proto __rcu *nf_ct_l3protos[AF_MAX];
+extern struct nf_conntrack_l3proto __rcu *nf_ct_l3protos[NFPROTO_NUMPROTO];

 #ifdef CONFIG_SYSCTL
 /* Protocol pernet registration. */
@@ -100,7 +100,7 @@ extern struct nf_conntrack_l3proto nf_conntrack_l3proto_generic;
 static inline struct nf_conntrack_l3proto *
 __nf_ct_l3proto_find(u_int16_t l3proto)
 {
-	if (unlikely(l3proto >= AF_MAX))
+	if (unlikely(l3proto >= NFPROTO_NUMPROTO))
 		return &nf_conntrack_l3proto_generic;
 	return rcu_dereference(nf_ct_l3protos[l3proto]);
 }
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index 6a36623e897c..1dcad229c3cc 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -28,8 +28,8 @@
 #include
 #include

-static struct nf_conntrack_l4proto __rcu **nf_ct_protos[PF_MAX] __read_mostly;
-struct nf_conntrack_l3proto __rcu *nf_ct_l3protos[AF_MAX] __read_mostly;
+static struct nf_conntrack_l4proto __rcu **nf_ct_protos[NFPROTO_NUMPROTO] __read_mostly;
+struct nf_conntrack_l3proto __rcu *nf_ct_l3protos[NFPROTO_NUMPROTO] __read_mostly;
 EXPORT_SYMBOL_GPL(nf_ct_l3protos);

 static DEFINE_MUTEX(nf_ct_proto_mutex);
@@ -68,7 +68,7 @@ nf_ct_unregister_sysctl(struct ctl_table_header **header,
 struct nf_conntrack_l4proto *
 __nf_ct_l4proto_find(u_int16_t l3proto, u_int8_t l4proto)
 {
-	if (unlikely(l3proto >= AF_MAX || nf_ct_protos[l3proto] == NULL))
+	if (unlikely(l3proto >= NFPROTO_NUMPROTO || nf_ct_protos[l3proto] == NULL))
 		return &nf_conntrack_l4proto_generic;

 	return rcu_dereference(nf_ct_protos[l3proto][l4proto]);
@@ -212,7 +212,7 @@ int nf_ct_l3proto_register(struct nf_conntrack_l3proto *proto)
 	int ret = 0;
 	struct nf_conntrack_l3proto *old;

-	if (proto->l3proto >= AF_MAX)
+	if (proto->l3proto >= NFPROTO_NUMPROTO)
 		return -EBUSY;

 	if (proto->tuple_to_nlattr && !proto->nlattr_tuple_size)
@@ -254,7 +254,7 @@ EXPORT_SYMBOL_GPL(nf_ct_l3proto_pernet_register);

 void nf_ct_l3proto_unregister(struct nf_conntrack_l3proto *proto)
 {
-	BUG_ON(proto->l3proto >= AF_MAX);
+	BUG_ON(proto->l3proto >= NFPROTO_NUMPROTO);

 	mutex_lock(&nf_ct_proto_mutex);
 	BUG_ON(rcu_dereference_protected(nf_ct_l3protos[proto->l3proto],
@@ -341,7 +341,7 @@ int nf_ct_l4proto_register_one(struct nf_conntrack_l4proto *l4proto)
 {
 	int ret = 0;

-	if (l4proto->l3proto >= PF_MAX)
+	if (l4proto->l3proto >= ARRAY_SIZE(nf_ct_protos))
 		return -EBUSY;

 	if ((l4proto->to_nlattr && !l4proto->nlattr_size) ||
@@ -423,7 +423,7 @@ EXPORT_SYMBOL_GPL(nf_ct_l4proto_pernet_register_one);

 static void __nf_ct_l4proto_unregister_one(struct nf_conntrack_l4proto *l4proto)
 {
-	BUG_ON(l4proto->l3proto >= PF_MAX);
+	BUG_ON(l4proto->l3proto >= ARRAY_SIZE(nf_ct_protos));

 	BUG_ON(rcu_dereference_protected(
 			nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
@@ -556,7 +556,7 @@ void nf_conntrack_proto_pernet_fini(struct net *net)
 int nf_conntrack_proto_init(void)
 {
 	unsigned int i;
-	for (i = 0; i < AF_MAX; i++)
+	for (i = 0; i < NFPROTO_NUMPROTO; i++)
 		rcu_assign_pointer(nf_ct_l3protos[i],
 				   &nf_conntrack_l3proto_generic);
 	return 0;
@@ -566,6 +566,6 @@ void nf_conntrack_proto_fini(void)
 {
 	unsigned int i;
 	/* free l3proto protocol tables */
-	for (i = 0; i < PF_MAX; i++)
+	for (i = 0; i < ARRAY_SIZE(nf_ct_protos); i++)
 		kfree(nf_ct_protos[i]);
 }
--
2.1.4
[PATCH 11/29] netfilter: nf_tables: no size estimation if number of set elements is unknown
This size estimation is ignored by the existing set backend selection
logic. Since this estimation structure is stack allocated, set this to ~0
to make it easier to catch bugs in future changes.

Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nft_set_hash.c   | 17 -
 net/netfilter/nft_set_rbtree.c |  8 +++-
 2 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 850be3a00e62..1f1cc33895fd 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -365,22 +365,13 @@ static void nft_hash_destroy(const struct nft_set *set)
 static bool nft_hash_estimate(const struct nft_set_desc *desc, u32 features,
 			      struct nft_set_estimate *est)
 {
-	unsigned int esize;
-
-	esize = sizeof(struct nft_hash_elem);
-	if (desc->size) {
+	if (desc->size)
 		est->size = sizeof(struct nft_hash) +
 			    roundup_pow_of_two(desc->size * 4 / 3) *
 			    sizeof(struct nft_hash_elem *) +
-			    desc->size * esize;
-	} else {
-		/* Resizing happens when the load drops below 30% or goes
-		 * above 75%. The average of 52.5% load (approximated by 50%)
-		 * is used for the size estimation of the hash buckets,
-		 * meaning we calculate two buckets per element.
-		 */
-		est->size = esize + 2 * sizeof(struct nft_hash_elem *);
-	}
+			    desc->size * sizeof(struct nft_hash_elem);
+	else
+		est->size = ~0;

 	est->lookup = NFT_SET_CLASS_O_1;
 	est->space  = NFT_SET_CLASS_O_N;
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index e97e2fb53f0a..fbfb3cbb3916 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -283,13 +283,11 @@ static void nft_rbtree_destroy(const struct nft_set *set)
 static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
 				struct nft_set_estimate *est)
 {
-	unsigned int nsize;
-
-	nsize = sizeof(struct nft_rbtree_elem);
 	if (desc->size)
-		est->size = sizeof(struct nft_rbtree) + desc->size * nsize;
+		est->size = sizeof(struct nft_rbtree) +
+			    desc->size * sizeof(struct nft_rbtree_elem);
 	else
-		est->size = nsize;
+		est->size = ~0;

 	est->lookup = NFT_SET_CLASS_O_LOG_N;
 	est->space  = NFT_SET_CLASS_O_N;
--
2.1.4
[PATCH 02/29] netfilter: ipt_CLUSTERIP: switch to nf_register_net_hook
From: Florian Westphal

One of the last remaining users of the old API; hopefully a followup
commit can remove it soon.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 038f293c2376..f30bee8e407b 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -743,14 +743,20 @@ static const struct file_operations clusterip_proc_fops = {
 static int clusterip_net_init(struct net *net)
 {
 	struct clusterip_net *cn = net_generic(net, clusterip_net_id);
+	int ret;

 	INIT_LIST_HEAD(&cn->configs);

 	spin_lock_init(&cn->lock);

+	ret = nf_register_net_hook(net, &cip_arp_ops);
+	if (ret < 0)
+		return ret;
+
 #ifdef CONFIG_PROC_FS
 	cn->procdir = proc_mkdir("ipt_CLUSTERIP", net->proc_net);
 	if (!cn->procdir) {
+		nf_unregister_net_hook(net, &cip_arp_ops);
 		pr_err("Unable to proc dir entry\n");
 		return -ENOMEM;
 	}
@@ -765,6 +771,7 @@ static void clusterip_net_exit(struct net *net)
 	struct clusterip_net *cn = net_generic(net, clusterip_net_id);
 	proc_remove(cn->procdir);
 #endif
+	nf_unregister_net_hook(net, &cip_arp_ops);
 }

 static struct pernet_operations clusterip_net_ops = {
@@ -786,17 +793,11 @@ static int __init clusterip_tg_init(void)
 	if (ret < 0)
 		goto cleanup_subsys;

-	ret = nf_register_hook(&cip_arp_ops);
-	if (ret < 0)
-		goto cleanup_target;
-
 	pr_info("ClusterIP Version %s loaded successfully\n",
 		CLUSTERIP_VERSION);

 	return 0;

-cleanup_target:
-	xt_unregister_target(&clusterip_tg_reg);
 cleanup_subsys:
 	unregister_pernet_subsys(&clusterip_net_ops);
 	return ret;
@@ -806,7 +807,6 @@ static void __exit clusterip_tg_exit(void)
 {
 	pr_info("ClusterIP Version %s unloading\n", CLUSTERIP_VERSION);

-	nf_unregister_hook(&cip_arp_ops);
 	xt_unregister_target(&clusterip_tg_reg);
 	unregister_pernet_subsys(&clusterip_net_ops);
--
2.1.4
[PATCH 10/29] netfilter: nft_set_hash: unnecessary forward declaration
Replace the struct rhashtable_params forward declaration by the structure
definition itself.

Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nft_set_hash.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 3d3a6df4ce70..850be3a00e62 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -40,8 +40,6 @@ struct nft_hash_cmp_arg {
 	u8 genmask;
 };

-static const struct rhashtable_params nft_hash_params;
-
 static inline u32 nft_hash_key(const void *data, u32 len, u32 seed)
 {
 	const struct nft_hash_cmp_arg *arg = data;
@@ -71,6 +69,14 @@ static inline int nft_hash_cmp(struct rhashtable_compare_arg *arg,
 	return 0;
 }

+static const struct rhashtable_params nft_hash_params = {
+	.head_offset		= offsetof(struct nft_hash_elem, node),
+	.hashfn			= nft_hash_key,
+	.obj_hashfn		= nft_hash_obj,
+	.obj_cmpfn		= nft_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
 static bool nft_hash_lookup(const struct net *net, const struct nft_set *set,
 			    const u32 *key, const struct nft_set_ext **ext)
 {
@@ -320,14 +326,6 @@ static unsigned int nft_hash_privsize(const struct nlattr * const nla[])
 	return sizeof(struct nft_hash);
 }

-static const struct rhashtable_params nft_hash_params = {
-	.head_offset		= offsetof(struct nft_hash_elem, node),
-	.hashfn			= nft_hash_key,
-	.obj_hashfn		= nft_hash_obj,
-	.obj_cmpfn		= nft_hash_cmp,
-	.automatic_shrinking	= true,
-};
-
 static int nft_hash_init(const struct nft_set *set,
			 const struct nft_set_desc *desc,
			 const struct nlattr * const tb[])
--
2.1.4
Re: CAN-FD Transceiver Limitations
> >>
> >> mcan@0 {
> >>	...
> >>	fixed-transceiver {
> >>		max-canfd-speed = <2000>
> >>	};
> >>	...
> >> };

Since when would a transceiver support different speeds for CAN & CANFD?
No transceivers were available, but they are now. I see no datalink
problem applying 2MBit for regular CAN with an appropriate physical
layer, and CAN does not predefine the physical layer (advise != define).

IMHO,

	fixed-transceiver {
		max-arbitration-speed = <200>
		max-data-speed = <400>
	};

is way better to describe the hardware. Regular CAN chips would not
consider max-data-speed...

Kind regards,
Kurt
Re: [PATCH v1 1/2] dt-binding: ptp: add bindings document for dte based ptp clock
On Tue, Jun 27, 2017 at 12:10 AM, Scott Branden wrote:
> Hi Rob/Florian,
>
> Thanks for input but still don't see any need for SoC-specific
> compatible strings. IP-revision-specific, yes.
>
> On 17-06-22 06:04 PM, Florian Fainelli wrote:
>>
>> On 06/22/2017 05:42 PM, Scott Branden wrote:
>>>
>>> On 17-06-21 08:19 PM, Rob Herring wrote:

On Tue, Jun 20, 2017 at 3:48 PM, Scott Branden wrote:
> Hi Rob,
>
> On 17-06-18 07:04 AM, Rob Herring wrote:
>> On Mon, Jun 12, 2017 at 01:26:00PM -0700, Arun Parameswaran wrote:
>>> Add device tree binding documentation for the Broadcom DTE
>>> PTP clock driver.
>>>
>>> Signed-off-by: Arun Parameswaran
>>> ---
>>>  Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt | 13 +
>>>  1 file changed, 13 insertions(+)
>>>  create mode 100644 Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt
>>>
>>> diff --git a/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt b/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt
>>> new file mode 100644
>>> index 000..07590bc
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt
>>> @@ -0,0 +1,13 @@
>>> +* Broadcom Digital Timing Engine (DTE) based PTP clock driver
>>
>> Bindings describe h/w, not drivers.
>>
>>> +
>>> +Required properties:
>>> +- compatible: should be "brcm,ptp-dte"
>>
>> Looks too generic. You need SoC-specific compatible strings.
>
> Rob, could you please help me understand the use of adding SoC-specific
> compatible strings. I still don't get it.
>
> It's my understanding that the SoC compatible string is to future-proof
> against bugs/incompatibilities between different versions of the
> hardware block due to integration issues or any other reason.
> You can then compare in your driver because the strings were already
> used in the dtb.
>
> That would make sense if you can't already differentiate what SoC you
> are running on.
> But the SoC is already specified in the root of the device tree in the
> compatible string?
> Why can't you just use of_machine_is_compatible inside your driver when
> needed?

Use of of_machine_is_compatible in drivers will result in the same mess
we had with machine_is_X defines pre-DT. It practically guarantees that
you must update the driver for a new SoC (with fallback compatibles you
don't). Plus the matching logic for of_machine_is_compatible is
open-coded logic in every driver, which is worse IMO than having a
standard match table.

>>> I don't understand what you mean by fallback compatible then.
>>>
>>> Let's say I have 3 SoCs that each contain the same IP block.
>>> You want us to add a fallback compatibility per SoC, is that correct?
>>
>> I think Rob meant a fallback compatibility for the particular block.
>> E.g.: brcm,iproc-ptp is the fallback compatible string, but in your
>> SoC-specific DTS, you would have at least:
>>
>> compatible = "brcm,cygnus-ptp", "brcm,iproc-ptp";
>>
>> Where cygnus-ptp is more specific than iproc-ptp
>
>>> Then, if there is a workaround discovered in a particular SoC the driver
>>> can be updated in the future without changing the dtb.
>>>
>>> Then, the block gets added to a 4th SoC.
>>> You want us to add another new compatibility string for the new SoC?
>>> If the new SoC has a bug then the driver has to be updated whether it
>>> uses the fallback compatible or machine_is_compatible string.
>>>
>>> There is no difference in amount of code added to a driver when a new
>>> SoC is introduced into the system that has bugs that need to be handled
>>> by the driver.
>>>
>>> The difference is in your recommendation we need to go through all the
>>> drivers used by the new SoC and add fallback compatibility strings.
>> Not really, the fallback is what the driver should be matching by
>> default (hence the name fallback), and if and only if you need to have
>> SoC-specific behavior in your driver (because of bugs, or slight
>> differences) would you be matching this SoC-specific compatible string
>> to capture that and key the driver behavior based off that.
>>
>>> Then, we have to modify all the devicetree documentation for all the
>>> drivers. Then, we have to ensure that all dts files populate this new
>>> fallback string (even if it is unused). We don't see the benefit in
>>> doing any of that. Using machine_is_compatible and having fewer
>>> compatibility strings to deal with appears much cleaner and more
>>> foolproof for all situations.
>>
>> When you introduce a new SoC, you would update all the bindings for the
>> devices (not drivers) that are susceptible to be used by this
Re: [PATCH net-next v4 04/16] bpf: Sample bpf program to set SYN/SYN-ACK RTOs
On 6/29/17, 12:39 PM, "netdev-ow...@vger.kernel.org on behalf of Jesper Dangaard Brouer" wrote:

    On Wed, 28 Jun 2017 10:31:12 -0700 Lawrence Brakmo wrote:

    > +++ b/samples/bpf/tcp_synrto_kern.c
    > @@ -0,0 +1,60 @@
    > +/* Copyright (c) 2017 Facebook
    > + *
    > + * This program is free software; you can redistribute it and/or
    > + * modify it under the terms of version 2 of the GNU General Public
    > + * License as published by the Free Software Foundation.
    > + *
    > + * BPF program to set SYN and SYN-ACK RTOs to 10ms when using IPv6 addresses
    > + * and the first 5.5 bytes of the IPv6 addresses are the same (in this example
    > + * that means both hosts are in the same datacenter.

    Missing end ")". I really like this short comment of what the program
    does, as it helps people browsing these sample programs. Can you also
    mention in the comment (of all these bpf programs) that people load
    this bpf object file via the program 'load_sock_ops'?

Thank you for finding the typo and for the comment on adding how to load
the sample programs. Will be done in v5, due later today.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH RFC 0/2] kproxy: Kernel Proxy
Hi Tom,

On 29 June 2017 at 11:27, Tom Herbert wrote:
> This is raw, minimally tested, and error handling needs work. Posting
> as RFC to get feedback on the design...
>
> Sidecar proxies are becoming quite popular on servers as a means to
> perform layer 7 processing on application data as it is sent. Such
> sidecars are used for SSL proxies, application firewalls, and L7
> load balancers. While these proxies provide nice functionality,
> their performance is obviously terrible since all the data needs
> to take an extra hop through userspace.

I really appreciate this work. It would have been nice to at least
attribute me in some way, as this is exactly what I presented at
Netconf 2017 [0]. I'm also wondering why this is not built on top of
KCM, which you suggested to use when we discussed this.

[0] https://docs.google.com/presentation/d/1dwSKSBGpUHD3WO5xxzZWj8awV_-xL-oYhvqQMOBhhtk/edit#slide=id.g203aae413f_0_0
RE: [net-next v2 6/6] ixgbe: Add malicious driver detection support
>-Original Message- >From: David Miller [mailto:da...@davemloft.net] >Sent: Thursday, June 29, 2017 11:28 AM >To: gerlitz...@gmail.com >Cc: Tantilov, Emil S; Kirsher, Jeffrey T > ; Greenwalt, Paul ; >netdev@vger.kernel.org; nhor...@redhat.com; sassm...@redhat.com; >jogre...@redhat.com >Subject: Re: [net-next v2 6/6] ixgbe: Add malicious driver detection >support > >From: Or Gerlitz >Date: Wed, 28 Jun 2017 17:28:59 +0300 > >> On Wed, Jun 28, 2017 at 1:14 AM, Tantilov, Emil S >> wrote: >> >>> Mainly because I am not sure that other (non-Intel) drivers will benefit >from >>> such an option. In normal operation this functionality should not cause >issues >>> and if it doesn't we may be able to deprecate the private flag in the >future. >> >> If you think this functionality makes sense any driver running over HW >> implementing >> it would like to be able to expose that and hence you better not use >> private flag. As I mentioned I don't know if this will be useful for other drivers. The i40e driver enables it by default without possibility to disable it and if this protection does not cause problems for ixgbe then we may not need the option in the future. Because of this I wasn't sure if it's worth polluting the tools with options that may end up not being needed/used at all. >> Are we sure the trust UAPI can't be extended for that matter? > >Yeah, we should probably make this a generic control if possible. MDD is set globally for the device, while the trusted option is set per VF. So if we do end up adding an option it probably won't work as extension for trusted. Thanks, Emil
Re: [PATCH RFC 0/2] kproxy: Kernel Proxy
On Thu, Jun 29, 2017 at 01:40:26PM -0700, Tom Herbert wrote:
> > In fact that's not much what I observe in the field. In practice, large
> > data streams are cheaply relayed using splice(); I could achieve
> > 60 Gbps of HTTP forwarding via HAProxy on a 4-core Xeon 2 years ago.
> > And when you use SSL, the cost of the copy to/from the kernel is small
> > compared to all the crypto operations surrounding it.
>
> Right, getting rid of the extra crypto operations and so-called "SSL
> inspection" is the ultimate goal this is going towards.

Yep, but in order to take decisions at L7 you need to decapsulate SSL.

> HTTP is only one use case. There are other interesting use cases such
> as those in container security, where the application protocol might be
> something like simple RPC.

OK, that indeed makes sense in such environments.

> Performance is relevant because we potentially want security applied to
> every message in every communication in a containerized data center.
> Putting the userspace hop in the datapath of every packet is known to
> be problematic, not just for the performance hit but also because it
> increases the attack surface on users' privacy.

While I totally agree on the performance hit when inspecting each packet,
I fail to see the relation with users' privacy. In fact under some
circumstances it can even be the opposite. For example, using something
like kTLS for a TCP/HTTP proxy can result in cleartext being observable
in strace while it's not visible when TLS is terminated in userland,
because all you see are openssl's read()/write() operations. Maybe you
have specific attacks in mind?

> > Regarding kernel-side protocol parsing, there's an unfortunate trend
> > at moving more and more protocols to userland due to these protocols
> > evolving very quickly.
> > At least you'll want to find a way to provide these parsers from
> > userspace, which will inevitably come with its set of problems or
> > limitations :-/
>
> That's why everything is going BPF now ;-)

Yes, I knew you were going to suggest this :-) I'm still prudent on it,
to be honest. I don't think it would be that easy to implement an HPACK
encoder/decoder using BPF. And even regarding just plain HTTP parsing,
certain very small operations in haproxy's parser can quickly result in
a 10% performance degradation when improperly optimized (ie: changing a
"likely", altering branch prediction, or cache walk patterns when using
arrays to evaluate character classes faster). But for general usage I
indeed think it should be OK.

> > All this to say that while I can definitely imagine the benefits of
> > having in-kernel sockets for in-kernel L7 processing or filtering,
> > I'm having strong doubts about the benefits that userland may receive
> > by using this (or maybe you already have any performance numbers
> > supporting this ?).
>
> Nope, no numbers yet.

OK, no worries. Thanks for your explanations!

Willy
Re: [PATCH RFC 0/2] kproxy: Kernel Proxy
Hi Willy,

Thanks for the comments!

> In fact that's not much what I observe in the field. In practice, large
> data streams are cheaply relayed using splice(); I could achieve
> 60 Gbps of HTTP forwarding via HAProxy on a 4-core Xeon 2 years ago.
> And when you use SSL, the cost of the copy to/from the kernel is small
> compared to all the crypto operations surrounding it.

Right, getting rid of the extra crypto operations and so-called "SSL
inspection" is the ultimate goal this is going towards.

> Another point is that most HTTP requests are quite small (typically ~80%
> 20kB or less), and in this case the L7 processing and certain syscalls
> significantly dominate the operations; data copies are comparatively
> small. Simply parsing an HTTP header takes time (when you do it
> correctly). You can hardly parse and index more than 800MB-1GB/s of
> HTTP headers per core, which limits you to roughly 1-1.2 M req+resp per
> second for a 400 byte request and a 400 byte response, and that's
> without any processing at all. But when doing this, certain syscalls
> like connect(), close() or epoll_ctl() start to be quite expensive.
> Even splice() is expensive to forward small data chunks because you
> need two calls, and recv+send is faster. In fact our TCP stack has been
> so much optimized for realistic workloads over the years that it
> becomes hard to gain more by cheating on it :-)
>
> In the end in haproxy I'm seeing about 300k req+resp per second in
> HTTP keep-alive and more like 100-130k with close, when disabling
> TCP quick-ack during accept() and connect() to save one ACK on each
> side (just doing this generally brings performance gains between 7
> and 10%).

HTTP is only one use case. There are other interesting use cases such as
those in container security where the application protocol might be
something like simple RPC. Performance is relevant because we
potentially want security applied to every message in every
communication in a containerized data center.
Putting the userspace hop in the datapath of every packet is known to be
problematic, not just for the performance hit but also because it
increases the attack surface on users' privacy.

> Regarding kernel-side protocol parsing, there's an unfortunate trend
> at moving more and more protocols to userland due to these protocols
> evolving very quickly. At least you'll want to find a way to provide
> these parsers from userspace, which will inevitably come with its set
> of problems or limitations :-/

That's why everything is going BPF now ;-)

> All this to say that while I can definitely imagine the benefits of
> having in-kernel sockets for in-kernel L7 processing or filtering,
> I'm having strong doubts about the benefits that userland may receive
> by using this (or maybe you already have any performance numbers
> supporting this ?).

Nope, no numbers yet.

> Just my two cents,
> Willy
[GIT] Networking
1) Need to access netdev->num_rx_queues behind an accessor in the netvsc
   driver, otherwise the build breaks with some configs, from Arnd Bergmann.

2) Add dummy xfrm_dev_event() so that the build doesn't fail when
   CONFIG_XFRM_OFFLOAD is not set. From Hangbin Liu.

3) Don't OOPS when pfkey_msg2xfrm_state() signals an error, from Dan
   Carpenter.

4) Fix MCDI command size for filter operations in the sfc driver, from
   Martin Habets.

5) Fix UFO segmenting so that we don't calculate incorrect checksums,
   from Michal Kubecek.

6) When ipv6 datagram connects fail, reset destination address and port.
   From Wei Wang.

7) TCP disconnect must reset the cached receive DST, from WANG Cong.

8) Fix sign extension bug on 32-bit in dev_get_stats(), from Eric Dumazet.

9) fman driver has to depend on HAS_DMA, from Madalin Bucur.

10) Fix bpf pointer leak with xadd in the verifier, from Daniel Borkmann.

11) Fix negative page counts with GRO, from Michal Kubecek.

Please pull, thanks a lot!

The following changes since commit 48b6bbef9a1789f0365c1a385879a1fea4460016:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-06-21 12:40:20 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to d58299a478c416c0b48e4b31c6332fe7beb63000:

  sfc: fix attempt to translate invalid filter ID (2017-06-29 15:59:38 -0400)

----------------------------------------------------------------
Andrew F. Davis (1):
      net: usb: asix88179_178a: Add support for the Belkin B2B128

Arnd Bergmann (1):
      netvsc: don't access netdev->num_rx_queues directly

Colin Ian King (1):
      Trivial fix to spelling mistake in arc_printk message

Dan Carpenter (3):
      xfrm: Oops on error in pfkey_msg2xfrm_state()
      xfrm: NULL dereference on allocation failure
      rocker: move dereference before free

Daniel Borkmann (1):
      bpf: prevent leaking pointer via xadd on unpriviledged

David S. Miller (4):
      Merge branch 'macvlan-Fix-some-issues-with-changing-mac-addresses'
      Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec
      Merge branch 'bnxt_en-fixes'
      Merge branch 'arcnet-fixes'

Edward Cree (1):
      sfc: fix attempt to translate invalid filter ID

Eric Dumazet (1):
      net: prevent sign extension in dev_get_stats()

Gao Feng (1):
      net: sched: Fix one possible panic when no destroy callback

Hangbin Liu (2):
      xfrm: fix xfrm_dev_event() missing when compile without CONFIG_XFRM_OFFLOAD
      xfrm: move xfrm_garbage_collect out of xfrm_policy_flush

Ido Schimmel (1):
      mlxsw: spectrum_router: Fix NULL pointer dereference

Jason Wang (1):
      virtio-net: serialize tx routine during reset

Lokesh Vutla (1):
      drivers: net: cpsw-common: Fix reading of mac address for am43 SoCs

Madalin Bucur (1):
      fsl/fman: add dependency on HAS_DMA

Martin Habets (1):
      sfc: Fix MCDI command size for filter operations

Michael Chan (2):
      bnxt_en: Add missing logic to handle TPA end error conditions.
      bnxt_en: Fix netpoll handling.

Michael Grzeschik (4):
      arcnet: change irq handler to lock irqsave
      arcnet: com20020: remove needless base_addr assignment
      arcnet: com20020-pci: fix dev_id calculation
      arcnet: com20020-pci: add missing pdev setup in netdev structure

Michal Kubeček (2):
      net: account for current skb length when deciding about UFO
      net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()

Mintz, Yuval (1):
      bnx2x: Don't log mc removal needlessly

Richard Cochran (1):
      net: dp83640: Avoid NULL pointer dereference.

Serhey Popovych (1):
      veth: Be more robust on network device creation when no attributes

Vlad Yasevich (4):
      macvlan: Do not return error when setting the same mac address
      macvlan: Fix passthru macvlan mac address inheritance
      macvlan: convert port passthru to flags.
      macvlan: Let passthru macvlan correctly restore lower mac address

WANG Cong (4):
      ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
      ipv6: avoid unregistering inet6_dev for loopback
      sit: use __GFP_NOWARN for user controlled allocation
      tcp: reset sk_rx_dst in tcp_disconnect()

Wei Liu (1):
      xen-netback: correctly schedule rate-limited queues

Wei Wang (1):
      net: ipv6: reset daddr and dport in sk if connect() fails

Yossi Kuperman (2):
      xfrm6: Fix IPv6 payload_len in xfrm6_transport_finish
      esp6_offload: Fix IP6CB(skb)->nhoff for ESP GRO

Zach Brown (1):
      net/phy: micrel: configure intterupts after autoneg workaround

 drivers/net/arcnet/arcnet.c                      | 7 ---
 drivers/net/arcnet/capmode.c                     | 2 +-
 drivers/net/arcnet/com20020-pci.c                | 6 --
 drivers/net/arcnet/com20020.c                    | 2 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
Re: BUG: Dentry ffff9f795a08fe60{i=af565f,n=lo} still in use (1) [unmount of proc proc]
On Thu, Jun 29, 2017 at 12:06 PM, Eric W. Biederman wrote:
> Andrei Vagin writes:
>
>> Hello,
>>
>> We run CRIU tests on Linus' tree and today we found this issue.
>>
>> CRIU tests are a set of small programs to check checkpoint/restore
>> of different primitives (files, sockets, signals, pipes, etc).
>> https://github.com/xemul/criu/tree/master/test
>>
>> Each test is executed three times: without namespaces, in a set of all
>> namespaces except userns, and in a set of all namespaces. When a test
>> passes the preparation steps, it sends a signal to an executor, and
>> then the executor dumps and restores the test processes, and sends a
>> signal back to the test to check that everything was restored
>> correctly.
>
> I am not certain what you are saying, and you seem to have Cc'd
> every list except the netdev and netfilter lists that are needed
> to deal with this.
>
> Are you saying that the change from Liping Zhang is needed? Or are you
> saying that change introduces the problem below?

Hi Eric,

Here I tried to explain our use case. I don't know which changes in the
kernel affect this issue. Actually I reported a similar problem a few
months ago on linux-next: https://lkml.org/lkml/2017/3/10/1586

So I don't think that the change from Liping Zhang affects this issue
somehow. I mentioned it just to describe what kernel we used. And I
don't know how to reproduce the issue. You can see from the kernel log
that the kernel worked for more than 6 hours in our case. During this
period we ran all our tests a few times, so I think there is a kind of
race.

> I could not find the mentioned commits. Are they in Linus's tree or
> someone's next tree that feeds into linux-next?
Here is the patch from Liping Zhang https://patchwork.ozlabs.org/patch/770887/ The second mentioned commit is HEAD of the master branch in Linus' tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6474924e2b5ddb0030c38966adcbe3b49022 Thanks, Andrei > > Eric > >> [root@zdtm linux]# git describe HEAD~1 >> v4.12-rc7-25-g6474924 >> >> And there is one more patch from the netfilter tree: >> commit b216759c0cb5d37d1eec3cd5b67ba38bace94fd8 >> Author: Liping Zhang >> Date: Sun Jun 4 19:17:34 2017 +0800 >> >> netfilter: nf_ct_dccp/sctp: fix memory leak after netns cleanup >> >> [root@zdtm linux]# uname -a >> Linux zdtm.openvz.org 4.12.0-rc7+ #9 SMP Thu Jun 29 08:28:18 CEST 2017 >> x86_64 x86_64 x86_64 GNU/Linux >> >> A kernel config is attached. >> >> >> [22458.504137] BUG: Dentry 9f795a08fe60{i=af565f,n=lo} still in >> use (1) [unmount of proc proc] >> [22458.505117] [ cut here ] >> [22458.505299] WARNING: CPU: 0 PID: 15036 at fs/dcache.c:1445 >> umount_check+0x66/0x80 >> [22458.505564] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace >> sunrpc macvlan tun veth nf_conntrack_netlink xt_mark udp_diag tcp_diag >> inet_diag netlink_diag af_packet_diag unix_diag binfmt_misc >> ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink >> ebtable_broute bridge stp llc ebtable_nat ip6table_security >> ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6 >> nf_defrag_ipv6 nf_nat_ipv6 iptable_security iptable_mangle iptable_raw >> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat >> nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables btrfs >> xor raid6_pq loop ppdev crct10dif_pclmul crc32_pclmul >> ghash_clmulni_intel lpc_ich sbs virtio_balloon shpchp sbshc parport_pc >> parport tpm_tis tpm_tis_core tpm xfs libcrc32c crc32c_intel >> ata_generic serio_raw pata_acpi >> [22458.507586] virtio_pci virtio virtio_ring e1000 >> [22458.507771] CPU: 0 PID: 15036 Comm: kworker/0:2 Not tainted 4.12.0-rc7+ #9 
>> [22458.507830] systemd-journald[561]: Compressed data object 807 -> >> 605 using LZ4 >> [22458.508184] Hardware name: Parallels Software International Inc. >> Parallels Virtual Platform/Parallels Virtual Platform, BIOS >> 6.12.26068.1232434 02/27/2017 >> [22458.508641] Workqueue: events proc_cleanup_work >> [22458.508848] task: 9f797be8 task.stack: b2b8825f >> [22458.509033] RIP: 0010:umount_check+0x66/0x80 >> [22458.509172] RSP: 0018:b2b8825f3c68 EFLAGS: 00010286 >> [22458.509363] RAX: 0054 RBX: 9f798492afa0 RCX: >> >> [22458.509589] RDX: RSI: 9f79bce0e388 RDI: >> 9f79bce0e388 >> [22458.509823] RBP: b2b8825f3c70 R08: R09: >> >> [22458.510022] R10: b2b8825f3bf0 R11: R12: >> 9f795a08fe60 >> [22458.510217] R13: 9f795a08ff50 R14: 9f795a08fee0 R15: >> 9f798492afa0 >> [22458.510456] FS: () GS:9f79bce0() >> knlGS: >> [22458.510726] CS: 0010 DS: ES: CR0: 80050033 >> [22458.510953] CR2: 7f834657a6b0 CR3: 602a3000 CR4: >> 06f0 >>
Re: [PATCH NET V7 0/2] Add loopback support in phy_driver and hns ethtool fix
From: Andrew Lunn
Date: Thu, 29 Jun 2017 20:22:16 +0200

> Is less broken a sufficient criteria for acceptance?

Sometimes, it depends upon the situation.

If you are continuing to resolve those issues, I'll wait and
watch for future respins then.