date:20150820

Re: Exporting obscene amounts of data in rtnl_link_ops-fill_info()

2015-08-20 Thread Scott Feldman

On Wed, Aug 19, 2015 at 5:47 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> Hi guys,
>
> I have a new link driver that registers a rtnl_link_ops. For many
> things, the rtnl interfaces are perfectly suited: I can use netlink in
> userspace to check out packet counts, adjust interface parameters, and
> all sorts of things. There is even the fill_info function exporting
> interface-specific types of data to userspace through the standard
> netlink interfaces. I'm glad this is here, because it's exactly what I
> want.
>
> Problem: sometimes I want to export a *lot* of data to userspace. When
> this happens, even if I make the netlink socket receive buffer really
> huge, this code path is still reached in rtnetlink.c:
>
> 	err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK,
> 			       NETLINK_CB(cb->skb).portid,
> 			       cb->nlh->nlmsg_seq, 0,
> 			       NLM_F_MULTI,
> 			       ext_filter_mask);
> 	/* If we ran out of room on the first message,
> 	 * we're in trouble
> 	 */
> 	WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
>
> That is -- it tries to fill the skb (for sending it back to
> userspace), but doesn't have enough room, so it returns -EMSGSIZE.
> That seems like reasonable behavior, but it doesn't really help me
> obtain my goal. I'd like to send quite a bit of data back to userspace
> for a network interface, and I'd like to do it using the standard
> netlink APIs. Is this possible?



What kind of data are you sending up?  Maybe there is an alternate interface.
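
[Editor's sketch] The -EMSGSIZE case quoted above arises when one message cannot hold everything. A common way to export a large set is to emit it in several bounded messages and remember where the previous one stopped. The following is only a userspace model of that idea under stated assumptions: `sketch_cursor`, `sketch_fill_chunk` and `sketch_count_chunks` are hypothetical names, not rtnetlink API, and the items are assumed to be fixed-size records.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical resumable cursor: remembers where the last chunk stopped. */
struct sketch_cursor {
	size_t next_item;	/* index of the first item not yet emitted */
};

/*
 * Copy as many fixed-size items as fit into buf.  Returns the number of
 * bytes written.  A return of 0 while items are still pending means that
 * not even one item fits, which is the analogue of -EMSGSIZE on an
 * empty skb.
 */
static size_t sketch_fill_chunk(struct sketch_cursor *cur, const void *items,
				size_t item_size, size_t n_items,
				void *buf, size_t buf_len)
{
	size_t written = 0;

	while (cur->next_item < n_items && written + item_size <= buf_len) {
		memcpy((char *)buf + written,
		       (const char *)items + cur->next_item * item_size,
		       item_size);
		written += item_size;
		cur->next_item++;
	}
	return written;
}

/* Count how many buffers of buf_len bytes (capped at 256) a full export needs. */
static size_t sketch_count_chunks(const void *items, size_t item_size,
				  size_t n_items, size_t buf_len)
{
	struct sketch_cursor cur = { 0 };
	char buf[256];
	size_t chunks = 0;

	if (buf_len > sizeof(buf))
		buf_len = sizeof(buf);
	while (cur.next_item < n_items) {
		if (sketch_fill_chunk(&cur, items, item_size, n_items,
				      buf, buf_len) == 0)
			break;	/* a single item does not fit: give up */
		chunks++;
	}
	return chunks;
}
```

With 100 four-byte items and a 64-byte buffer, `sketch_count_chunks()` needs seven chunks. With a 2-byte buffer it reports zero, because no single item fits.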
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v2 3/9] tunnel: introduce udp_tun_rx_dst()

2015-08-20 Thread Thomas Graf

On 08/17/15 at 02:11pm, Pravin B Shelar wrote:
> Introduce function udp_tun_rx_dst() to initialize tunnel dst on
> receive path.
>
> Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
> Reviewed-by: Jesse Gross <je...@nicira.com>

This looks great but conflicts with Jiri Benc's IPv6 series. Can we
rebase this on top of his work so we get IPv6 support in the new
helpers from the beginning?

Re: [PATCH net-next v2 4/9] geneve: Make dst-port configurable.

2015-08-20 Thread Thomas Graf

On 08/17/15 at 02:11pm, Pravin B Shelar wrote:
> @@ -403,6 +416,7 @@ static size_t geneve_get_size(const struct net_device *dev)
>  		nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE */
>  		nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TTL */
>  		nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TOS */
> +		nla_total_size(sizeof(__u16)) +  /* IFLA_GENEVE_PORT */
>  		0;
>  }
>  
> @@ -423,6 +437,9 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
>  	    nla_put_u8(skb, IFLA_GENEVE_TOS, geneve->tos))
>  		goto nla_put_failure;
>  
> +	if (nla_put_u32(skb, IFLA_GENEVE_PORT, ntohs(geneve->dst_port)))
> +		goto nla_put_failure;

nla_put_u16?
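
[Editor's sketch] The mismatch being pointed out is that geneve_get_size() reserves room for a __u16 while geneve_fill_info() emits a u32. The arithmetic below mirrors the netlink attribute layout (4-byte header, 4-byte alignment). The `sketch_` helpers are stand-ins written for this note, not the kernel functions. Under these rules both payloads pad to the same total size, but the length recorded in the attribute header differs (6 versus 8), so a reader expecting a 2-byte port would see an attribute of the wrong length.

```c
#include <stddef.h>

/* Constants mirroring the netlink attribute layout: a 4-byte header
 * (u16 length + u16 type) and 4-byte alignment.
 */
#define SKETCH_NLA_ALIGNTO	4
#define SKETCH_NLA_HDRLEN	4

/* Round len up to the next multiple of the attribute alignment. */
static size_t sketch_nla_align(size_t len)
{
	return (len + SKETCH_NLA_ALIGNTO - 1) &
	       ~(size_t)(SKETCH_NLA_ALIGNTO - 1);
}

/* Value stored in the attribute's length field: header plus payload. */
static size_t sketch_nla_attr_size(size_t payload)
{
	return SKETCH_NLA_HDRLEN + payload;
}

/* Space the attribute occupies in the message, padding included. */
static size_t sketch_nla_total_size(size_t payload)
{
	return sketch_nla_align(sketch_nla_attr_size(payload));
}
```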

[PATCH net-next] tcp: fix slow start after idle vs TSO/GSO

2015-08-20 Thread Eric Dumazet

From: Eric Dumazet eduma...@google.com

slow start after idle might reduce cwnd, but we perform this
after first packet was cooked and sent.

With TSO/GSO, it means that we might send a full TSO packet
even if cwnd should have been reduced to IW10.

Moving the SSAI check in skb_entail() makes sense, because
we slightly reduce number of times this check is done,
especially for large send() and TCP Small queue callbacks from
softirq context.

Tested:

Following packetdrill test demonstrates the problem
// Test of slow start after idle

`sysctl -q net.ipv4.tcp_slow_start_after_idle=1`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.100 < . 1:1(0) ack 1 win 511
+0    accept(3, ..., ...) = 4
+0    setsockopt(4, SOL_SOCKET, SO_SNDBUF, [20], 4) = 0

+0    write(4, ..., 26000) = 26000
+0    > . 1:5001(5000) ack 1
+0    > . 5001:10001(5000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

+.100 < . 1:1(0) ack 10001 win 511
+0    %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }%
+0    > . 10001:20001(10000) ack 1
+0    > P. 20001:26001(6000) ack 1

+.100 < . 1:1(0) ack 26001 win 511
+0    %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }%

+4    write(4, ..., 2) = 2
// If slow start after idle works properly, we should send 5 MSS here (cwnd/2)
+0    > . 26001:31001(5000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
+0    > . 31001:36001(5000) ack 1

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Neal Cardwell ncardw...@google.com
Cc: Yuchung Cheng ych...@google.com
---
 include/net/tcp.h     |    1 +
 net/ipv4/tcp.c        |    8 ++++++++
 net/ipv4/tcp_output.c |   12 ++++--------
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 364426a..639f64e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1165,6 +1165,7 @@ static inline void tcp_sack_reset(struct tcp_options_received *rx_opt)
 }
 
 u32 tcp_default_init_rwnd(u32 mss);
+void tcp_cwnd_restart(struct sock *sk, s32 delta);
 
 /* Determine a window scaling and initial window to offer. */
 void tcp_select_initial_window(int __space, __u32 mss, __u32 *rcv_wnd,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 45534a5..e228433 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -627,6 +627,14 @@ static void skb_entail(struct sock *sk, struct sk_buff *skb)
 	sk_mem_charge(sk, skb->truesize);
 	if (tp->nonagle & TCP_NAGLE_PUSH)
 		tp->nonagle &= ~TCP_NAGLE_PUSH;
+
+	if (sysctl_tcp_slow_start_after_idle &&
+	    sk->sk_write_queue.next == skb) {
+		s32 delta = tcp_time_stamp - tp->lsndtime;
+
+		if (delta > inet_csk(sk)->icsk_rto)
+			tcp_cwnd_restart(sk, delta);
+	}
 }
 
 static inline void tcp_mark_urg(struct tcp_sock *tp, int flags)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 444ab5b..1188e4f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -137,12 +137,12 @@ static __u16 tcp_advertise_mss(struct sock *sk)
 }
 
 /* RFC2861. Reset CWND after idle period longer RTO to restart window.
- * This is the first part of cwnd validation mechanism. */
-static void tcp_cwnd_restart(struct sock *sk, const struct dst_entry *dst)
+ * This is the first part of cwnd validation mechanism.
+ */
+void tcp_cwnd_restart(struct sock *sk, s32 delta)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	s32 delta = tcp_time_stamp - tp->lsndtime;
-	u32 restart_cwnd = tcp_init_cwnd(tp, dst);
+	u32 restart_cwnd = tcp_init_cwnd(tp, __sk_dst_get(sk));
 	u32 cwnd = tp->snd_cwnd;
 
 	tcp_ca_event(sk, CA_EVENT_CWND_RESTART);
@@ -164,10 +164,6 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const u32 now = tcp_time_stamp;
 
-	if (sysctl_tcp_slow_start_after_idle &&
-	    (!tp->packets_out && (s32)(now - tp->lsndtime) > icsk->icsk_rto))
-		tcp_cwnd_restart(sk, __sk_dst_get(sk));
-
 	tp->lsndtime = now;
 
 	/* If it is a reply for ato after last received


Re: [PATCH] usbnet: dereference after null check in usbnet_start_xmit() and __usbnet_read_cmd()

2015-08-20 Thread Bjørn Mork

Vivek Kumar Bhagat <vivek.bha...@samsung.com> writes:

> Dear Bjorn,
>
>> This is wrong.  There are usbnet minidrivers depending on info->tx_fixup
>> being called with a NULL skb.
>
> Also, if dev_hard_start_xmit() ensures that skb can not be NULL in
> usbnet_start_xmit() then we should remove below check.
>
>     if (skb)  <--- This check is confusing which says skb can be NULL.
>         skb_tx_timestamp(skb);


No, that test is there because of the ugly hack in cdc_ncm.  It doesn't
go through dev_hard_start_xmit(), but calls usbnet_start_xmit() directly
with a NULL skb as a signal to itself.  Yes, I told you it was ugly ;)

I do agree that it would be nice to make this go away.  But until that
happens usbnet_start_xmit() has to deal with NULL skbs, forwarding them
to the tx_fixup hook.


Bjørn
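
[Editor's sketch] The constraint described above is that the transmit path must tolerate a NULL skb and still hand it to the tx_fixup hook. A minimal userspace model follows. Every name here (`sketch_skb`, `sketch_start_xmit`, `sketch_flush_fixup`) is hypothetical, and the fixup only imitates a minidriver that treats NULL as "flush what you have aggregated".

```c
#include <stddef.h>

struct sketch_skb {
	size_t len;
};

/* Hook type: may receive NULL, may return its own buffer or NULL. */
typedef struct sketch_skb *(*sketch_tx_fixup_fn)(struct sketch_skb *skb);

enum sketch_xmit_result { SKETCH_XMIT_SENT, SKETCH_XMIT_NOTHING };

static struct sketch_skb sketch_pending = { 64 };

/* Imitates a minidriver that flushes its aggregate when called with NULL. */
static struct sketch_skb *sketch_flush_fixup(struct sketch_skb *skb)
{
	return skb ? skb : &sketch_pending;
}

static enum sketch_xmit_result
sketch_start_xmit(struct sketch_skb *skb, sketch_tx_fixup_fn fixup,
		  size_t *timestamped)
{
	/* Only touch the skb itself when there is one ... */
	if (skb)
		(*timestamped)++;

	/* ... but always let the hook run, NULL or not. */
	if (fixup)
		skb = fixup(skb);

	return skb ? SKETCH_XMIT_SENT : SKETCH_XMIT_NOTHING;
}
```

Removing the `if (skb)` guard would dereference NULL on the flush path. Rejecting NULL before the hook runs would break the flush signal.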

RE: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.

2015-08-20 Thread Premkumar Jonnala

> -----Original Message-----
> From: Michal Kubecek [mailto:mkube...@suse.cz]
> Sent: Thursday, August 20, 2015 12:00 PM
> To: Premkumar Jonnala
> Cc: Wilson, Daniel G; Scott Feldman; netdev@vger.kernel.org
> Subject: Re: [PATCH] bridge: Enable configuration of ageing interval for
> bridges and switch devices.
>
> On Thu, Aug 20, 2015 at 05:08:51AM +0000, Premkumar Jonnala wrote:
> > > From: Wilson, Daniel G [mailto:daniel.wil...@intel.com]
> > >
> > > > Can you extend bridge command to allow setting/getting these bridge
> > > > attrs?  Looks like you construct a RTM_NEWLINK IFLA_INFO_DATA msg.
> > > > No changes needed to the kernel.
> > > >
> > > > bridge link set dev br0 ageing_time 1000
> > > >
> > > > --or--
> > > >
> > > > ip link set dev br0 type bridge ageing_time 1000
> > >
> > > Being able to set these attributes via both bridge and ip would be great.
> >
> > IMHO, we should choose only one command.  Otherwise, we'd have to
> > spend effort in trying to keep both the commands in sync.
>
> As long as they are using the same netlink interface, I don't think it's
> a serious problem. After all, there will be also other tools (wicked,
> perhaps systemd-networkd) setting it directly via netlink rather than
> calling either ip or bridge.
>
> > My vote would be for the bridge command - since the options/parameters
> > are related to bridges.  If there is no objection, I'll move all the
> > bridge options from 'ip link' command to 'bridge' command.
>
> This would break existing scripts using ip to set the parameter. Is the
> possibility to use any of the two really that bad?

There was another email on this thread where Scott indicated existence of
other commands where both ip and bridge are available, and they are for the
same function.

I will keep both the ip and bridge commands, and try to share the underlying
code as much as possible.

-Prem

  Michal Kubecek


[PATCH net] netlink: mmap: fix status setting in skb destructor

2015-08-20 Thread Ken-ichirou MATSUZAWA

I don't know the intention behind setting VALID status in the skb
destructor. But I think the frame needs to be set to UNUSED status in
case of error before the skb is released, or the rx ring might fill up
with RESERVED frames.

Signed-off-by: Ken-ichirou MATSUZAWA cha...@h4.dion.ne.jp
---
 net/netlink/af_netlink.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index e6134f4..85ccd8b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -864,7 +864,7 @@ static void netlink_skb_destructor(struct sk_buff *skb)
 		} else {
 			if (!(NETLINK_CB(skb).flags & NETLINK_SKB_DELIVERED)) {
 				hdr->nm_len = 0;
-				netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
+				netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
 			}
 			ring = &nlk_sk(sk)->rx_ring;
 		}
-- 
1.7.10.4
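
[Editor's sketch] The concern in this patch is that a frame reserved for a message which is never delivered must go back to the free state, otherwise the ring slowly loses slots. The model below is a deliberately simplified userspace ring. Its status names only echo the NL_MMAP_STATUS_* values mentioned in the patch, and all `sketch_` identifiers are hypothetical.

```c
#include <stddef.h>

enum sketch_frame_status {
	SKETCH_STATUS_UNUSED,	/* free, can be reserved */
	SKETCH_STATUS_RESERVED,	/* claimed by a producer, not yet filled */
	SKETCH_STATUS_VALID,	/* filled, waiting for the consumer */
};

struct sketch_frame {
	enum sketch_frame_status status;
	unsigned int len;
};

/* Reserve a free frame.  Returns 0 on success, -1 if the frame is busy. */
static int sketch_reserve(struct sketch_frame *f)
{
	if (f->status != SKETCH_STATUS_UNUSED)
		return -1;
	f->status = SKETCH_STATUS_RESERVED;
	return 0;
}

/* Called when the buffer backing a reserved frame is released. */
static void sketch_release(struct sketch_frame *f, int delivered,
			   unsigned int len)
{
	if (delivered) {
		f->len = len;
		f->status = SKETCH_STATUS_VALID;
	} else {
		/* Error path: hand the slot back instead of leaking it. */
		f->len = 0;
		f->status = SKETCH_STATUS_UNUSED;
	}
}

/* Count the frames a producer could still reserve. */
static size_t sketch_free_frames(const struct sketch_frame *ring, size_t n)
{
	size_t i, free_count = 0;

	for (i = 0; i < n; i++)
		if (ring[i].status == SKETCH_STATUS_UNUSED)
			free_count++;
	return free_count;
}
```

If the error path left the status at RESERVED, `sketch_free_frames()` would shrink with every failed delivery until nothing could be reserved.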


Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-20 Thread Grumbach, Emmanuel



On 08/20/2015 10:21 AM, Grumbach, Emmanuel wrote:
 
 
 On 08/19/2015 11:39 PM, Eric Dumazet wrote:
 On Wed, 2015-08-19 at 19:17 +0000, Grumbach, Emmanuel wrote:

 Hm.. how would net/core/tso.c avoid this?

 Because a driver using these helpers keep around the original LSO packet
 and frees it normally at TX completion time.

 I can't see anything related to truesize there.
 Note that this work since it is guaranteed that we release the skbs in
 order.


 (BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
 yet we want backpressure mostly for TCP stack (TCP Small Queues))



 I am not sure I follow here.
 You want me to test:
 if (skb_gso->destructor == tcp_wfree) ?


 Yes.

 Look for example at tcp_gso_segment() (called from skb_gso_segment())

 	copy_destructor = gso_skb->destructor == tcp_wfree;
 	...
 	/* Following permits TCP Small Queues to work well with GSO :
 	 * The callback to TCP stack will be called at the time last frag
 	 * is freed at TX completion, and not right now when gso_skb
 	 * is freed by GSO engine
 	 */
 	if (copy_destructor) {
 		swap(gso_skb->sk, skb->sk);
 		swap(gso_skb->destructor, skb->destructor);
 		sum_truesize += skb->truesize;
 		atomic_add(sum_truesize - gso_skb->truesize,
 			   &skb->sk->sk_wmem_alloc);
 	}



 I checked that code using iperf and saw that I don't get into this if,
 but I (probably wrongly) assumed that other applications would set a
 flag on the socket (forgive my ignorance) that would make this if be taken.

 If you do not see skb->destructor == tcp_wfree, then something is
 definitely wrong on your setup.

 
 tcp_wfree isn't exported. I can change that. It will be a challenge for
 backport though. Hm
 

But you seem to say that gso_skb->destructor *should* point to
tcp_wfree, so maybe testing gso_skb->destructor isn't NULL is good enough?
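
[Editor's sketch] The accounting being discussed can be modelled in userspace as follows: the socket was charged the truesize of the large GSO buffer, the segments will be uncharged one by one as they complete, so the charge is adjusted by the difference and the original buffer gives up its destructor. This is an assumption-laden simplification of the quoted tcp_gso_segment() fragment, with hypothetical `sketch_` names and no atomics.

```c
#include <stddef.h>

struct sketch_sock {
	long wmem_alloc;	/* bytes currently charged to the socket */
};

struct sketch_buf {
	struct sketch_sock *owner;	/* non-NULL models "has a destructor" */
	long truesize;
};

/* Destructor model: uncharge this buffer's truesize from its owner. */
static void sketch_free(struct sketch_buf *b)
{
	if (b->owner)
		b->owner->wmem_alloc -= b->truesize;
	b->owner = NULL;
}

/* Move ownership from the GSO buffer to its n segments and fix the charge. */
static void sketch_transfer(struct sketch_buf *gso, struct sketch_buf *segs,
			    size_t n)
{
	struct sketch_sock *sk = gso->owner;
	long sum_truesize = 0;
	size_t i;

	if (!sk)
		return;
	for (i = 0; i < n; i++) {
		segs[i].owner = sk;
		sum_truesize += segs[i].truesize;
	}
	/* Charged gso->truesize so far, sum_truesize will be uncharged. */
	sk->wmem_alloc += sum_truesize - gso->truesize;
	gso->owner = NULL;	/* freeing gso must not uncharge anything */
}
```

The socket stays charged until the last segment is freed, which is what keeps backpressure (TCP Small Queues) working while segments are still in flight.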

[PATCH v2.2 21/22] fjes: handle receive cancellation request interrupt

2015-08-20 Thread Taku Izumi

This patch adds handling of the IRQ raised by another
receiver's receive cancellation request.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_main.c | 78 
 1 file changed, 78 insertions(+)

diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index faaf2ed..ba7e607 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -822,6 +822,74 @@ static int fjes_vlan_rx_kill_vid(struct net_device *netdev,
 	return 0;
 }
 
+static void fjes_txrx_stop_req_irq(struct fjes_adapter *adapter,
+				   int src_epid)
+{
+	struct fjes_hw *hw = &adapter->hw;
+	enum ep_partner_status status;
+
+	status = fjes_hw_get_partner_ep_status(hw, src_epid);
+	switch (status) {
+	case EP_PARTNER_UNSHARE:
+	case EP_PARTNER_COMPLETE:
+	default:
+		break;
+	case EP_PARTNER_WAITING:
+		if (src_epid < hw->my_epid) {
+			hw->ep_shm_info[src_epid].tx.info->v1i.rx_status |=
+				FJES_RX_STOP_REQ_DONE;
+
+			clear_bit(src_epid, &hw->txrx_stop_req_bit);
+			set_bit(src_epid, &adapter->unshare_watch_bitmask);
+
+			if (!work_pending(&adapter->unshare_watch_task))
+				queue_work(adapter->control_wq,
+					   &adapter->unshare_watch_task);
+		}
+		break;
+	case EP_PARTNER_SHARED:
+		if (hw->ep_shm_info[src_epid].rx.info->v1i.rx_status &
+		    FJES_RX_STOP_REQ_REQUEST) {
+			set_bit(src_epid, &hw->epstop_req_bit);
+			if (!work_pending(&hw->epstop_task))
+				queue_work(adapter->control_wq,
+					   &hw->epstop_task);
+		}
+		break;
+	}
+}
+
+static void fjes_stop_req_irq(struct fjes_adapter *adapter, int src_epid)
+{
+	struct fjes_hw *hw = &adapter->hw;
+	enum ep_partner_status status;
+
+	set_bit(src_epid, &hw->hw_info.buffer_unshare_reserve_bit);
+
+	status = fjes_hw_get_partner_ep_status(hw, src_epid);
+	switch (status) {
+	case EP_PARTNER_WAITING:
+		hw->ep_shm_info[src_epid].tx.info->v1i.rx_status |=
+				FJES_RX_STOP_REQ_DONE;
+		clear_bit(src_epid, &hw->txrx_stop_req_bit);
+		/* fall through */
+	case EP_PARTNER_UNSHARE:
+	case EP_PARTNER_COMPLETE:
+	default:
+		set_bit(src_epid, &adapter->unshare_watch_bitmask);
+		if (!work_pending(&adapter->unshare_watch_task))
+			queue_work(adapter->control_wq,
+				   &adapter->unshare_watch_task);
+		break;
+	case EP_PARTNER_SHARED:
+		set_bit(src_epid, &hw->epstop_req_bit);
+
+		if (!work_pending(&hw->epstop_task))
+			queue_work(adapter->control_wq, &hw->epstop_task);
+		break;
+	}
+}
+
 static void fjes_update_zone_irq(struct fjes_adapter *adapter,
 				 int src_epid)
 {
@@ -844,6 +912,16 @@ static irqreturn_t fjes_intr(int irq, void *data)
 		if (icr & REG_ICTL_MASK_RX_DATA)
 			fjes_rx_irq(adapter, icr & REG_IS_MASK_EPID);
 
+		if (icr & REG_ICTL_MASK_DEV_STOP_REQ)
+			fjes_stop_req_irq(adapter, icr & REG_IS_MASK_EPID);
+
+		if (icr & REG_ICTL_MASK_TXRX_STOP_REQ)
+			fjes_txrx_stop_req_irq(adapter, icr & REG_IS_MASK_EPID);
+
+		if (icr & REG_ICTL_MASK_TXRX_STOP_DONE)
+			fjes_hw_set_irqmask(hw,
+					    REG_ICTL_MASK_TXRX_STOP_DONE, true);
+
 		if (icr & REG_ICTL_MASK_INFO_UPDATE)
 			fjes_update_zone_irq(adapter, icr & REG_IS_MASK_EPID);
 
-- 
1.8.3.1


[PATCH v2.2 06/22] fjes: buffer address regist/unregistration routine

2015-08-20 Thread Taku Izumi

This patch adds the buffer address registration/unregistration routines.

These functions are mainly invoked on network device
activation (open) and deactivation (close)
in order to register/unregister the shared buffer address.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_hw.c | 186 +
 drivers/net/fjes/fjes_hw.h |   9 ++-
 2 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
index 3fbe68e..5f43957 100644
--- a/drivers/net/fjes/fjes_hw.c
+++ b/drivers/net/fjes/fjes_hw.c
@@ -452,6 +452,192 @@ int fjes_hw_request_info(struct fjes_hw *hw)
 	return result;
 }
 
+int fjes_hw_register_buff_addr(struct fjes_hw *hw, int dest_epid,
+			       struct ep_share_mem_info *buf_pair)
+{
+	union fjes_device_command_req *req_buf = hw->hw_info.req_buf;
+	union fjes_device_command_res *res_buf = hw->hw_info.res_buf;
+	enum fjes_dev_command_response_e ret;
+	int i, idx;
+	int page_count;
+	void *addr;
+	int timeout;
+	int result;
+
+	if (test_bit(dest_epid, &hw->hw_info.buffer_share_bit))
+		return 0;
+
+	memset(req_buf, 0, hw->hw_info.req_buf_size);
+	memset(res_buf, 0, hw->hw_info.res_buf_size);
+
+	req_buf->share_buffer.length = FJES_DEV_COMMAND_SHARE_BUFFER_REQ_LEN(
+						buf_pair->tx.size,
+						buf_pair->rx.size);
+	req_buf->share_buffer.epid = dest_epid;
+
+	idx = 0;
+	req_buf->share_buffer.buffer[idx++] = buf_pair->tx.size;
+	page_count = buf_pair->tx.size / EP_BUFFER_INFO_SIZE;
+	for (i = 0; i < page_count; i++) {
+		addr = ((u8 *)(buf_pair->tx.buffer)) +
+				(i * EP_BUFFER_INFO_SIZE);
+		req_buf->share_buffer.buffer[idx++] =
+				(__le64)(page_to_phys(vmalloc_to_page(addr)) +
+						offset_in_page(addr));
+	}
+
+	req_buf->share_buffer.buffer[idx++] = buf_pair->rx.size;
+	page_count = buf_pair->rx.size / EP_BUFFER_INFO_SIZE;
+	for (i = 0; i < page_count; i++) {
+		addr = ((u8 *)(buf_pair->rx.buffer)) +
+				(i * EP_BUFFER_INFO_SIZE);
+		req_buf->share_buffer.buffer[idx++] =
+				(__le64)(page_to_phys(vmalloc_to_page(addr)) +
+						offset_in_page(addr));
+	}
+
+	res_buf->share_buffer.length = 0;
+	res_buf->share_buffer.code = 0;
+
+	ret = fjes_hw_issue_request_command(hw, FJES_CMD_REQ_SHARE_BUFFER);
+
+	timeout = FJES_COMMAND_REQ_BUFF_TIMEOUT * 1000;
+	while ((ret == FJES_CMD_STATUS_NORMAL) &&
+	       (res_buf->share_buffer.length ==
+		FJES_DEV_COMMAND_SHARE_BUFFER_RES_LEN) &&
+	       (res_buf->share_buffer.code == FJES_CMD_REQ_RES_CODE_BUSY) &&
+	       (timeout > 0)) {
+		msleep(200 + hw->my_epid * 20);
+		timeout -= (200 + hw->my_epid * 20);
+
+		res_buf->share_buffer.length = 0;
+		res_buf->share_buffer.code = 0;
+
+		ret = fjes_hw_issue_request_command(
+				hw, FJES_CMD_REQ_SHARE_BUFFER);
+	}
+
+	result = 0;
+
+	if (res_buf->share_buffer.length !=
+			FJES_DEV_COMMAND_SHARE_BUFFER_RES_LEN)
+		result = -ENOMSG;
+	else if (ret == FJES_CMD_STATUS_NORMAL) {
+		switch (res_buf->share_buffer.code) {
+		case FJES_CMD_REQ_RES_CODE_NORMAL:
+			result = 0;
+			set_bit(dest_epid, &hw->hw_info.buffer_share_bit);
+			break;
+		case FJES_CMD_REQ_RES_CODE_BUSY:
+			result = -EBUSY;
+			break;
+		default:
+			result = -EPERM;
+			break;
+		}
+	} else {
+		switch (ret) {
+		case FJES_CMD_STATUS_UNKNOWN:
+			result = -EPERM;
+			break;
+		case FJES_CMD_STATUS_TIMEOUT:
+			result = -EBUSY;
+			break;
+		case FJES_CMD_STATUS_ERROR_PARAM:
+		case FJES_CMD_STATUS_ERROR_STATUS:
+		default:
+			result = -EPERM;
+			break;
+		}
+	}
+
+	return result;
+}
+
+int fjes_hw_unregister_buff_addr(struct fjes_hw *hw, int dest_epid)
+{
+	union fjes_device_command_req *req_buf = hw->hw_info.req_buf;
+	union fjes_device_command_res *res_buf = hw->hw_info.res_buf;
+	struct fjes_device_shared_info *share = hw->hw_info.share;
+	enum fjes_dev_command_response_e ret;
+

[PATCH v2.2 01/22] fjes: Introduce FUJITSU Extended Socket Network Device driver

2015-08-20 Thread Taku Izumi

This patch adds the basic code of FUJITSU Extended Socket
Network Device driver.

When PNP0C02 is found in the ACPI DSDT, the driver evaluates _STR
to check whether PNP0C02 is for the Extended Socket device
and retrieves the ACPI resource information. It then creates
a platform_device.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/Kconfig  |   7 ++
 drivers/net/Makefile |   2 +
 drivers/net/fjes/Makefile|  31 +++
 drivers/net/fjes/fjes.h  |  33 +++
 drivers/net/fjes/fjes_main.c | 214 +++
 5 files changed, 287 insertions(+)
 create mode 100644 drivers/net/fjes/Makefile
 create mode 100644 drivers/net/fjes/fjes.h
 create mode 100644 drivers/net/fjes/fjes_main.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c18f9e6..c78a81a 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -407,6 +407,13 @@ config VMXNET3
  To compile this driver as a module, choose M here: the
  module will be called vmxnet3.
 
+config FUJITSU_ES
+	tristate "FUJITSU Extended Socket Network Device driver"
+	depends on ACPI
+	help
+	  This driver provides support for Extended Socket network device
+	  on Extended Partitioning of FUJITSU PRIMEQUEST 2000 E2 series.
+
 source "drivers/net/hyperv/Kconfig"
 
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index c12cb22..677c7b4 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -67,3 +67,5 @@ obj-$(CONFIG_USB_NET_DRIVERS) += usb/
 
 obj-$(CONFIG_HYPERV_NET) += hyperv/
 obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o
+
+obj-$(CONFIG_FUJITSU_ES) += fjes/
diff --git a/drivers/net/fjes/Makefile b/drivers/net/fjes/Makefile
new file mode 100644
index 0000000..98e59cb
--- /dev/null
+++ b/drivers/net/fjes/Makefile
@@ -0,0 +1,31 @@
+
+#
+# FUJITSU Extended Socket Network Device driver
+# Copyright (c) 2015 FUJITSU LIMITED
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# You should have received a copy of the GNU General Public License along with
+# this program; if not, see <http://www.gnu.org/licenses/>.
+#
+# The full GNU General Public License is included in this distribution in
+# the file called COPYING.
+#
+
+
+
+#
+# Makefile for the FUJITSU Extended Socket network device driver
+#
+
+obj-$(CONFIG_FUJITSU_ES) += fjes.o
+
+fjes-objs := fjes_main.o
+
diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
new file mode 100644
index 0000000..4622da1
--- /dev/null
+++ b/drivers/net/fjes/fjes.h
@@ -0,0 +1,33 @@
+/*
+ *  FUJITSU Extended Socket Network Device driver
+ *  Copyright (c) 2015 FUJITSU LIMITED
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called COPYING.
+ *
+ */
+
+#ifndef FJES_H_
+#define FJES_H_
+
+#include <linux/acpi.h>
+
+#define FJES_ACPI_SYMBOL	"Extended Socket"
+
+extern char fjes_driver_name[];
+extern char fjes_driver_version[];
+extern u32 fjes_support_mtu[];
+
+#endif /* FJES_H_ */
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
new file mode 100644
index 0000000..ab4d20c
--- /dev/null
+++ b/drivers/net/fjes/fjes_main.c
@@ -0,0 +1,214 @@
+/*
+ *  FUJITSU Extended Socket Network Device driver
+ *  Copyright (c) 2015 FUJITSU LIMITED
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this

[PATCH v2.2 20/22] fjes: epstop_task

2015-08-20 Thread Taku Izumi

This patch adds epstop_task.
This task is used to process other receiver's
cancellation request.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_hw.c   | 30 ++
 drivers/net/fjes/fjes_hw.h   |  1 +
 drivers/net/fjes/fjes_main.c |  1 +
 3 files changed, 32 insertions(+)

diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
index 4588ef3..ada0b9e 100644
--- a/drivers/net/fjes/fjes_hw.c
+++ b/drivers/net/fjes/fjes_hw.c
@@ -23,6 +23,7 @@
 #include "fjes.h"
 
 static void fjes_hw_update_zone_task(struct work_struct *);
+static void fjes_hw_epstop_task(struct work_struct *);
 
 /* supported MTU list */
 const u32 fjes_support_mtu[] = {
@@ -325,6 +326,7 @@ int fjes_hw_init(struct fjes_hw *hw)
 	fjes_hw_set_irqmask(hw, REG_ICTL_MASK_ALL, true);
 
 	INIT_WORK(&hw->update_zone_task, fjes_hw_update_zone_task);
+	INIT_WORK(&hw->epstop_task, fjes_hw_epstop_task);
 
 	mutex_init(&hw->hw_info.lock);
 
@@ -355,6 +357,7 @@ void fjes_hw_exit(struct fjes_hw *hw)
 	fjes_hw_cleanup(hw);
 
 	cancel_work_sync(&hw->update_zone_task);
+	cancel_work_sync(&hw->epstop_task);
 }
 
 static enum fjes_dev_command_response_e
@@ -1085,3 +1088,30 @@ static void fjes_hw_update_zone_task(struct work_struct *work)
 	}
 }
 
+static void fjes_hw_epstop_task(struct work_struct *work)
+{
+	struct fjes_hw *hw = container_of(work, struct fjes_hw, epstop_task);
+	struct fjes_adapter *adapter = (struct fjes_adapter *)hw->back;
+	int epid_bit;
+	unsigned long remain_bit;
+
+	while ((remain_bit = hw->epstop_req_bit)) {
+		for (epid_bit = 0; remain_bit; remain_bit >>= 1, epid_bit++) {
+			if (remain_bit & 1) {
+				hw->ep_shm_info[epid_bit].
+					tx.info->v1i.rx_status |=
+						FJES_RX_STOP_REQ_DONE;
+
+				clear_bit(epid_bit, &hw->epstop_req_bit);
+				set_bit(epid_bit,
+					&adapter->unshare_watch_bitmask);
+
+				if (!work_pending(&adapter->unshare_watch_task))
+					queue_work(
+						adapter->control_wq,
+						&adapter->unshare_watch_task);
+			}
+		}
+	}
+}
+
diff --git a/drivers/net/fjes/fjes_hw.h b/drivers/net/fjes/fjes_hw.h
index fe51041..4d3184a 100644
--- a/drivers/net/fjes/fjes_hw.h
+++ b/drivers/net/fjes/fjes_hw.h
@@ -283,6 +283,7 @@ struct fjes_hw {
unsigned long txrx_stop_req_bit;
unsigned long epstop_req_bit;
struct work_struct update_zone_task;
+   struct work_struct epstop_task;
 
int my_epid;
int max_epid;
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 615c1ef..faaf2ed 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -316,6 +316,7 @@ static int fjes_close(struct net_device *netdev)
 	cancel_work_sync(&adapter->tx_stall_task);
 
 	cancel_work_sync(&hw->update_zone_task);
+	cancel_work_sync(&hw->epstop_task);
 
 	fjes_hw_wait_epstop(hw);
 
-- 
1.8.3.1
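
[Editor's sketch] The work function in this patch drains a request bitmask: it snapshots the mask, walks the snapshot bit by bit, clears each handled bit in the live mask, and rescans until the live mask is empty. A plain userspace model of that loop shape follows. It uses ordinary (non-atomic) bit operations and hypothetical names, so it only illustrates the control flow, not the driver's concurrency behaviour.

```c
/*
 * Handle every bit set in *pending, clearing each one as it is handled.
 * Rescans until nothing is left.  Handled bits are recorded in
 * *handled_mask; the return value is the number of bits handled.
 */
static unsigned int sketch_drain_bits(unsigned long *pending,
				      unsigned long *handled_mask)
{
	unsigned long remain;
	unsigned int bit, handled = 0;

	while ((remain = *pending)) {
		for (bit = 0; remain; remain >>= 1, bit++) {
			if (remain & 1) {
				*pending &= ~(1UL << bit);
				*handled_mask |= 1UL << bit;
				handled++;
			}
		}
	}
	return handled;
}
```

The outer `while` matters when new bits can be set while the inner loop is walking an older snapshot. In this single-threaded model it simply runs once more and finds the mask empty.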


[PATCH v2.2 16/22] fjes: interrupt_watch_task

2015-08-20 Thread Taku Izumi

This patch adds interrupt_watch_task.
This task is used to prevent delay of interrupts.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes.h  |  5 +
 drivers/net/fjes/fjes_main.c | 40 +++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
index b04ea9d..1743dbb 100644
--- a/drivers/net/fjes/fjes.h
+++ b/drivers/net/fjes/fjes.h
@@ -32,6 +32,7 @@
 #define FJES_TX_RETRY_TIMEOUT  (100)
 #define FJES_TX_TX_STALL_TIMEOUT   (FJES_TX_RETRY_INTERVAL / 2)
 #define FJES_OPEN_ZONE_UPDATE_WAIT (300) /* msec */
+#define FJES_IRQ_WATCH_DELAY   (HZ)
 
 /* board specific private data structure */
 struct fjes_adapter {
@@ -52,10 +53,14 @@ struct fjes_adapter {
bool irq_registered;
 
struct workqueue_struct *txrx_wq;
+   struct workqueue_struct *control_wq;
 
struct work_struct tx_stall_task;
struct work_struct raise_intr_rxdata_task;
 
+   struct delayed_work interrupt_watch_task;
+   bool interrupt_watch_enable;
+
struct fjes_hw hw;
 };
 
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 1bb9347..2bf9f71 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -71,7 +71,7 @@ static int fjes_remove(struct platform_device *);
 
 static int fjes_sw_init(struct fjes_adapter *);
 static void fjes_netdev_setup(struct net_device *);
-
+static void fjes_irq_watch_task(struct work_struct *);
 static void fjes_rx_irq(struct fjes_adapter *, int);
 static int fjes_poll(struct napi_struct *, int);
 
@@ -197,6 +197,13 @@ static int fjes_request_irq(struct fjes_adapter *adapter)
struct net_device *netdev = adapter-netdev;
int result = -1;
 
+   adapter-interrupt_watch_enable = true;
+   if (!delayed_work_pending(adapter-interrupt_watch_task)) {
+   queue_delayed_work(adapter-control_wq,
+  adapter-interrupt_watch_task,
+  FJES_IRQ_WATCH_DELAY);
+   }
+
if (!adapter-irq_registered) {
result = request_irq(adapter-hw.hw_res.irq, fjes_intr,
 IRQF_SHARED, netdev-name, adapter);
@@ -213,6 +220,9 @@ static void fjes_free_irq(struct fjes_adapter *adapter)
 {
struct fjes_hw *hw = adapter-hw;
 
+   adapter-interrupt_watch_enable = false;
+   cancel_delayed_work_sync(adapter-interrupt_watch_task);
+
fjes_hw_set_irqmask(hw, REG_ICTL_MASK_ALL, true);
 
if (adapter-irq_registered) {
@@ -297,6 +307,7 @@ static int fjes_close(struct net_device *netdev)
 
fjes_free_irq(adapter);
 
+   cancel_delayed_work_sync(adapter-interrupt_watch_task);
cancel_work_sync(adapter-raise_intr_rxdata_task);
cancel_work_sync(adapter-tx_stall_task);
 
@@ -999,11 +1010,15 @@ static int fjes_probe(struct platform_device *plat_dev)
 	adapter->open_guard = false;
 
 	adapter->txrx_wq = create_workqueue(DRV_NAME "/txrx");
+	adapter->control_wq = create_workqueue(DRV_NAME "/control");
 
 	INIT_WORK(&adapter->tx_stall_task, fjes_tx_stall_task);
 	INIT_WORK(&adapter->raise_intr_rxdata_task,
 		  fjes_raise_intr_rxdata_task);
 
+	INIT_DELAYED_WORK(&adapter->interrupt_watch_task, fjes_irq_watch_task);
+	adapter->interrupt_watch_enable = false;
+
 	res = platform_get_resource(plat_dev, IORESOURCE_MEM, 0);
 	hw->hw_res.start = res->start;
 	hw->hw_res.size = res->end - res->start + 1;
@@ -1044,8 +1059,11 @@ static int fjes_remove(struct platform_device *plat_dev)
 	struct fjes_adapter *adapter = netdev_priv(netdev);
 	struct fjes_hw *hw = &adapter->hw;
 
+	cancel_delayed_work_sync(&adapter->interrupt_watch_task);
 	cancel_work_sync(&adapter->raise_intr_rxdata_task);
 	cancel_work_sync(&adapter->tx_stall_task);
+	if (adapter->control_wq)
+		destroy_workqueue(adapter->control_wq);
 	if (adapter->txrx_wq)
 		destroy_workqueue(adapter->txrx_wq);
 
@@ -1081,6 +1099,26 @@ static void fjes_netdev_setup(struct net_device *netdev)
 	netdev->features |= NETIF_F_HW_CSUM | NETIF_F_HW_VLAN_CTAG_FILTER;
 }
 
+static void fjes_irq_watch_task(struct work_struct *work)
+{
+	struct fjes_adapter *adapter = container_of(to_delayed_work(work),
+			struct fjes_adapter, interrupt_watch_task);
+
+	local_irq_disable();
+	fjes_intr(adapter->hw.hw_res.irq, adapter);
+	local_irq_enable();
+
+	if (fjes_rxframe_search_exist(adapter, 0) >= 0)
+		napi_schedule(&adapter->napi);
+
+	if (adapter->interrupt_watch_enable) {
+		if (!delayed_work_pending(&adapter->interrupt_watch_task))
+			queue_delayed_work(adapter->control_wq,
+					   &adapter->interrupt_watch_task,
+

[PATCH v2.2 18/22] fjes: unshare_watch_task

2015-08-20 Thread Taku Izumi

This patch adds unshare_watch_task.
Shared buffer's status can be changed into unshared.
This task is used to monitor shared buffer's status.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes.h  |   3 ++
 drivers/net/fjes/fjes_main.c | 126 +++
 2 files changed, 129 insertions(+)

diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
index d31d4c3..57feee8 100644
--- a/drivers/net/fjes/fjes.h
+++ b/drivers/net/fjes/fjes.h
@@ -59,6 +59,9 @@ struct fjes_adapter {
struct work_struct tx_stall_task;
struct work_struct raise_intr_rxdata_task;
 
+   struct work_struct unshare_watch_task;
+   unsigned long unshare_watch_bitmask;
+
struct delayed_work interrupt_watch_task;
bool interrupt_watch_enable;
 
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 3a8cc5b..e31a229 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -73,6 +73,7 @@ static int fjes_remove(struct platform_device *);
 static int fjes_sw_init(struct fjes_adapter *);
 static void fjes_netdev_setup(struct net_device *);
 static void fjes_irq_watch_task(struct work_struct *);
+static void fjes_watch_unshare_task(struct work_struct *);
 static void fjes_rx_irq(struct fjes_adapter *, int);
 static int fjes_poll(struct napi_struct *, int);
 
@@ -309,6 +310,8 @@ static int fjes_close(struct net_device *netdev)
 	fjes_free_irq(adapter);
 
 	cancel_delayed_work_sync(&adapter->interrupt_watch_task);
+	cancel_work_sync(&adapter->unshare_watch_task);
+	adapter->unshare_watch_bitmask = 0;
 	cancel_work_sync(&adapter->raise_intr_rxdata_task);
 	cancel_work_sync(&adapter->tx_stall_task);
 
@@ -1028,6 +1031,8 @@ static int fjes_probe(struct platform_device *plat_dev)
 	INIT_WORK(&adapter->tx_stall_task, fjes_tx_stall_task);
 	INIT_WORK(&adapter->raise_intr_rxdata_task,
 		  fjes_raise_intr_rxdata_task);
+	INIT_WORK(&adapter->unshare_watch_task, fjes_watch_unshare_task);
+	adapter->unshare_watch_bitmask = 0;
 
 	INIT_DELAYED_WORK(&adapter->interrupt_watch_task, fjes_irq_watch_task);
 	adapter->interrupt_watch_enable = false;
@@ -1073,6 +1078,7 @@ static int fjes_remove(struct platform_device *plat_dev)
 	struct fjes_hw *hw = &adapter->hw;
 
 	cancel_delayed_work_sync(&adapter->interrupt_watch_task);
+	cancel_work_sync(&adapter->unshare_watch_task);
 	cancel_work_sync(&adapter->raise_intr_rxdata_task);
 	cancel_work_sync(&adapter->tx_stall_task);
 	if (adapter->control_wq)
@@ -1132,6 +1138,126 @@ static void fjes_irq_watch_task(struct work_struct *work)
 	}
 }
 
+static void fjes_watch_unshare_task(struct work_struct *work)
+{
+	struct fjes_adapter *adapter =
+		container_of(work, struct fjes_adapter, unshare_watch_task);
+
+	struct fjes_hw *hw = &adapter->hw;
+	struct net_device *netdev = adapter->netdev;
+	int epidx;
+	int max_epid, my_epid;
+	unsigned long unshare_watch_bitmask;
+	int wait_time = 0;
+	int is_shared;
+	int stop_req, stop_req_done;
+	int unshare_watch, unshare_reserve;
+	int ret;
+
+	my_epid = hw->my_epid;
+	max_epid = hw->max_epid;
+
+	unshare_watch_bitmask = adapter->unshare_watch_bitmask;
+	adapter->unshare_watch_bitmask = 0;
+
+	while ((unshare_watch_bitmask || hw->txrx_stop_req_bit) &&
+	       (wait_time < 3000)) {
+		for (epidx = 0; epidx < hw->max_epid; epidx++) {
+			if (epidx == hw->my_epid)
+				continue;
+
+			is_shared = fjes_hw_epid_is_shared(hw->hw_info.share,
+							   epidx);
+
+			stop_req = test_bit(epidx, &hw->txrx_stop_req_bit);
+
+			stop_req_done = hw->ep_shm_info[epidx].rx.info->v1i.rx_status &
+					FJES_RX_STOP_REQ_DONE;
+
+			unshare_watch = test_bit(epidx, &unshare_watch_bitmask);
+
+			unshare_reserve = test_bit(epidx,
+					&hw->hw_info.buffer_unshare_reserve_bit);
+
+			if ((!stop_req ||
+			     (is_shared && (!is_shared || !stop_req_done))) &&
+			    (is_shared || !unshare_watch || !unshare_reserve))
+				continue;
+
+			mutex_lock(&hw->hw_info.lock);
+			ret = fjes_hw_unregister_buff_addr(hw, epidx);
+			switch (ret) {
+			case 0:
+				break;
+			case -ENOMSG:
+			case -EBUSY:
+			default:
+				if (!work_pending(
+					&adapter->force_close_task)) {
+

[PATCH v2.2 19/22] fjes: update_zone_task

2015-08-20 Thread Taku Izumi

This patch adds update_zone_task.
Zoning information can be changed by user.
This task is used to monitor if zoning information is
changed or not.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_hw.c   | 171 +++
 drivers/net/fjes/fjes_hw.h   |   1 +
 drivers/net/fjes/fjes_main.c |  14 
 3 files changed, 186 insertions(+)

diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
index 46e114c..4588ef3 100644
--- a/drivers/net/fjes/fjes_hw.c
+++ b/drivers/net/fjes/fjes_hw.c
@@ -22,6 +22,8 @@
 #include "fjes_hw.h"
 #include "fjes.h"
 
+static void fjes_hw_update_zone_task(struct work_struct *);
+
 /* supported MTU list */
 const u32 fjes_support_mtu[] = {
FJES_MTU_DEFINE(8 * 1024),
@@ -322,6 +324,8 @@ int fjes_hw_init(struct fjes_hw *hw)
 
 	fjes_hw_set_irqmask(hw, REG_ICTL_MASK_ALL, true);
 
+	INIT_WORK(&hw->update_zone_task, fjes_hw_update_zone_task);
+
 	mutex_init(&hw->hw_info.lock);
 
 	hw->max_epid = fjes_hw_get_max_epid(hw);
@@ -349,6 +353,8 @@ void fjes_hw_exit(struct fjes_hw *hw)
 	}
 
 	fjes_hw_cleanup(hw);
+
+	cancel_work_sync(&hw->update_zone_task);
 }
 
 static enum fjes_dev_command_response_e
@@ -914,3 +920,168 @@ int fjes_hw_epbuf_tx_pkt_send(struct epbuf_handler *epbh,
return 0;
 }
 
+static void fjes_hw_update_zone_task(struct work_struct *work)
+{
+	struct fjes_hw *hw = container_of(work,
+			struct fjes_hw, update_zone_task);
+	struct fjes_adapter *adapter = (struct fjes_adapter *)hw->back;
+	struct net_device *netdev = adapter->netdev;
+	int ret;
+	int epidx;
+	enum ep_partner_status pstatus;
+	unsigned long share_bit = 0;
+	unsigned long unshare_bit = 0;
+	unsigned long irq_bit = 0;
+	union fjes_device_command_res *res_buf = hw->hw_info.res_buf;
+	struct my_s {u8 es_status; u8 zone; } *info =
+		(struct my_s *)&res_buf->info.info;
+
+	mutex_lock(&hw->hw_info.lock);
+
+	ret = fjes_hw_request_info(hw);
+	switch (ret) {
+	case -ENOMSG:
+	case -EBUSY:
+	default:
+		if (!work_pending(&adapter->force_close_task)) {
+			adapter->force_reset = true;
+			schedule_work(&adapter->force_close_task);
+		}
+		break;
+
+	case 0:
+
+		for (epidx = 0; epidx < hw->max_epid; epidx++) {
+			if (epidx == hw->my_epid) {
+				hw->ep_shm_info[epidx].es_status =
+					info[epidx].es_status;
+				hw->ep_shm_info[epidx].zone =
+					info[epidx].zone;
+				continue;
+			}
+
+			pstatus = fjes_hw_get_partner_ep_status(hw, epidx);
+			switch (pstatus) {
+			case EP_PARTNER_UNSHARE:
+			default:
+				if ((info[epidx].zone !=
+					FJES_ZONING_ZONE_TYPE_NONE) &&
+				    (info[epidx].es_status ==
+					FJES_ZONING_STATUS_ENABLE) &&
+				    (info[epidx].zone ==
+					info[hw->my_epid].zone))
+					set_bit(epidx, &share_bit);
+				else
+					set_bit(epidx, &unshare_bit);
+				break;
+
+			case EP_PARTNER_COMPLETE:
+			case EP_PARTNER_WAITING:
+				if ((info[epidx].zone ==
+					FJES_ZONING_ZONE_TYPE_NONE) ||
+				    (info[epidx].es_status !=
+					FJES_ZONING_STATUS_ENABLE) ||
+				    (info[epidx].zone !=
+					info[hw->my_epid].zone)) {
+					set_bit(epidx,
+						&adapter->unshare_watch_bitmask);
+					set_bit(epidx,
+						&hw->hw_info.buffer_unshare_reserve_bit);
+				}
+				break;
+
+			case EP_PARTNER_SHARED:
+				if ((info[epidx].zone ==
+					FJES_ZONING_ZONE_TYPE_NONE) ||
+				    (info[epidx].es_status !=
+					FJES_ZONING_STATUS_ENABLE) ||
+				    (info[epidx].zone !=
+					info[hw->my_epid].zone))
+					set_bit(epidx, &irq_bit);
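
The zoning test that update_zone_task applies to each peer endpoint can be sketched in plain userspace C. This is only an illustration under stated assumptions: the EX_* constants are placeholder values (the real ones are defined in fjes_hw.h, which this excerpt does not show), and ex_zone_info / ex_should_share() are hypothetical names, not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values; the driver's real constants are not shown here. */
#define EX_ZONE_TYPE_NONE        0xFF
#define EX_ZONING_STATUS_ENABLE  0x01

/* Mirrors the two per-endpoint fields the task reads from the info buffer. */
struct ex_zone_info {
	uint8_t es_status;
	uint8_t zone;
};

/*
 * A peer qualifies for buffer sharing only if it has a zone assigned,
 * its Extended Socket status is enabled, and it is in our own zone.
 * The negation of this test is what the patch uses to decide that an
 * already-shared peer must be unshared.
 */
static bool ex_should_share(const struct ex_zone_info *peer,
			    const struct ex_zone_info *self)
{
	return peer->zone != EX_ZONE_TYPE_NONE &&
	       peer->es_status == EX_ZONING_STATUS_ENABLE &&
	       peer->zone == self->zone;
}
```

Keeping the predicate in one helper would make the three switch branches in the patch read as "should share" versus "should not share" instead of repeating the three-way comparison.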
+

[PATCH v2.2 13/22] fjes: net_device_ops.ndo_change_mtu

2015-08-20 Thread Taku Izumi

This patch adds net_device_ops.ndo_change_mtu.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_main.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 842edbb..bb94890 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -57,6 +57,7 @@ static void fjes_tx_stall_task(struct work_struct *);
 static irqreturn_t fjes_intr(int, void*);
 static struct rtnl_link_stats64 *
 fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);
+static int fjes_change_mtu(struct net_device *, int);
 
 static int fjes_acpi_add(struct acpi_device *);
 static int fjes_acpi_remove(struct acpi_device *);
@@ -222,6 +223,7 @@ static const struct net_device_ops fjes_netdev_ops = {
.ndo_stop   = fjes_close,
.ndo_start_xmit = fjes_xmit_frame,
.ndo_get_stats64= fjes_get_stats64,
+   .ndo_change_mtu = fjes_change_mtu,
 };
 
 /* fjes_open - Called when a network interface is made active */
@@ -715,6 +717,33 @@ fjes_get_stats64(struct net_device *netdev, struct rtnl_link_stats64 *stats)
 	return stats;
 }
 
+static int fjes_change_mtu(struct net_device *netdev, int new_mtu)
+{
+	int idx;
+	bool running = netif_running(netdev);
+	int ret = 0;
+
+	for (idx = 0; fjes_support_mtu[idx] != 0; idx++) {
+		if (new_mtu <= fjes_support_mtu[idx]) {
+			new_mtu = fjes_support_mtu[idx];
+			if (new_mtu == netdev->mtu)
+				return 0;
+
+			if (running)
+				fjes_close(netdev);
+
+			netdev->mtu = new_mtu;
+
+			if (running)
+				ret = fjes_open(netdev);
+
+			return ret;
+		}
+	}
+
+	return -EINVAL;
+}
+
 static irqreturn_t fjes_intr(int irq, void *data)
 {
 	struct fjes_adapter *adapter = data;
-- 
1.8.3.1
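
The rounding rule in fjes_change_mtu() above (pick the first supported MTU that is not smaller than the request) can be tried out in userspace with a small sketch. The table values below are hypothetical; the driver's real list is built with FJES_MTU_DEFINE() and its resulting values are not shown in this patch. The helper name is also illustrative only.

```c
#include <stddef.h>

/* Hypothetical supported-MTU table: ascending, terminated by 0. */
static const unsigned int example_support_mtu[] = {
	8000, 16000, 32000, 64000, 0
};

/*
 * Return the smallest table entry that is >= requested, or -1 when the
 * request is larger than every entry (the driver returns -EINVAL there).
 */
static int round_up_to_supported_mtu(const unsigned int *table, int requested)
{
	size_t idx;

	for (idx = 0; table[idx] != 0; idx++) {
		if (requested <= (int)table[idx])
			return (int)table[idx];
	}
	return -1;
}
```

Note that, as in the patch, a request below the smallest entry is silently rounded up to it rather than rejected.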


[PATCH v2.2 17/22] fjes: force_close_task

2015-08-20 Thread Taku Izumi

This patch adds force_close_task.
This task is used to close network device forcibly.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes.h  |  1 +
 drivers/net/fjes/fjes_main.c | 13 +
 2 files changed, 14 insertions(+)

diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
index 1743dbb..d31d4c3 100644
--- a/drivers/net/fjes/fjes.h
+++ b/drivers/net/fjes/fjes.h
@@ -47,6 +47,7 @@ struct fjes_adapter {
unsigned long rx_last_jiffies;
bool unset_rx_last;
 
+   struct work_struct force_close_task;
bool force_reset;
bool open_guard;
 
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 2bf9f71..3a8cc5b 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -54,6 +54,7 @@ static void fjes_free_resources(struct fjes_adapter *);
 static netdev_tx_t fjes_xmit_frame(struct sk_buff *, struct net_device *);
 static void fjes_raise_intr_rxdata_task(struct work_struct *);
 static void fjes_tx_stall_task(struct work_struct *);
+static void fjes_force_close_task(struct work_struct *);
 static irqreturn_t fjes_intr(int, void*);
 static struct rtnl_link_stats64 *
 fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);
@@ -494,6 +495,17 @@ static void fjes_tx_stall_task(struct work_struct *work)
 	queue_work(adapter->txrx_wq, &adapter->tx_stall_task);
 }
 
+static void fjes_force_close_task(struct work_struct *work)
+{
+	struct fjes_adapter *adapter = container_of(work,
+			struct fjes_adapter, force_close_task);
+	struct net_device *netdev = adapter->netdev;
+
+	rtnl_lock();
+	dev_close(netdev);
+	rtnl_unlock();
+}
+
 static void fjes_raise_intr_rxdata_task(struct work_struct *work)
 {
 	struct fjes_adapter *adapter = container_of(work,
@@ -1006,6 +1018,7 @@ static int fjes_probe(struct platform_device *plat_dev)
 	if (err)
 		goto err_sw_init;
 
+	INIT_WORK(&adapter->force_close_task, fjes_force_close_task);
 	adapter->force_reset = false;
 	adapter->open_guard = false;
 
-- 
1.8.3.1


[PATCH v2.2 00/22] FUJITSU Extended Socket network device driver

2015-08-20 Thread Taku Izumi

This patchset adds the FUJITSU Extended Socket network device driver.
The Extended Socket network device is a shared-memory-based high-speed network
interface between Extended Partitions of the PRIMEQUEST 2000 E2 series.
 
You can get some information about Extended Partition and Extended
Socket by referring to the following manual.
 
http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf
3.2.1 Extended Partitioning
3.2.2 Extended Socket

v2.1 -> v2.2:
   - minor fix patch 21/22 according to Sergei's comment


Taku Izumi (22):
  fjes: Introduce FUJITSU Extended Socket Network Device driver
  fjes: Hardware initialization routine
  fjes: Hardware cleanup routine
  fjes: platform_driver's .probe and .remove routine
  fjes: ES information acquisition routine
  fjes: buffer address regist/unregistration routine
  fjes: net_device_ops.ndo_open and .ndo_stop
  fjes: net_device_ops.ndo_start_xmit
  fjes: raise_intr_rxdata_task
  fjes: tx_stall_task
  fjes: NAPI polling function
  fjes: net_device_ops.ndo_get_stats64
  fjes: net_device_ops.ndo_change_mtu
  fjes: net_device_ops.ndo_tx_timeout
  fjes: net_device_ops.ndo_vlan_rx_add/kill_vid
  fjes: interrupt_watch_task
  fjes: force_close_task
  fjes: unshare_watch_task
  fjes: update_zone_task
  fjes: epstop_task
  fjes: handle receive cancellation request interrupt
  fjes: ethtool support

 drivers/net/Kconfig |7 +
 drivers/net/Makefile|2 +
 drivers/net/fjes/Makefile   |   31 +
 drivers/net/fjes/fjes.h |   77 +++
 drivers/net/fjes/fjes_ethtool.c |  135 
 drivers/net/fjes/fjes_hw.c  | 1117 +++
 drivers/net/fjes/fjes_hw.h  |  334 ++
 drivers/net/fjes/fjes_main.c| 1388 +++
 drivers/net/fjes/fjes_regs.h|  142 
 9 files changed, 3233 insertions(+)
 create mode 100644 drivers/net/fjes/Makefile
 create mode 100644 drivers/net/fjes/fjes.h
 create mode 100644 drivers/net/fjes/fjes_ethtool.c
 create mode 100644 drivers/net/fjes/fjes_hw.c
 create mode 100644 drivers/net/fjes/fjes_hw.h
 create mode 100644 drivers/net/fjes/fjes_main.c
 create mode 100644 drivers/net/fjes/fjes_regs.h

-- 
1.8.3.1


Re: [Xen-devel] [PATCH v3 17/20] net/xen-netfront: Make it running on 64KB page granularity

2015-08-20 Thread David Vrabel

On 07/08/15 17:46, Julien Grall wrote:
> The PV network protocol is using 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity using network
> device on a non-modified Xen.
>
> It's only necessary to adapt the ring size and break skb data in small
> chunk of 4KB. The rest of the code is relying on the grant table code.
>
> Note that we allocate a Linux page for each rx skb but only the first
> 4KB is used. We may improve the memory usage by extending the size of
> the rx skb.

Reviewed-by: David Vrabel david.vra...@citrix.com

David

[PATCH net-next 3/3] tipc: fix stale link problem during synchronization

2015-08-20 Thread Jon Maloy

Recent changes to the link synchronization means that we can now just
drop packets arriving on the synchronizing link before the synch point
is reached. This has led to significant simplifications to the
implementation, but also turns out to have a flip side that we need
to consider.

Under unlucky circumstances, the two endpoints may end up
repeatedly dropping each other's packets, while immediately
asking for retransmission of the same packets, just to drop
them once more. This pattern will eventually be broken when
the synch point is reached on the other link, but before that,
the endpoints may have arrived at the retransmission limit
(stale counter) that indicates that the link should be broken.
We see this happen on rare occasions.

The fix for this is to not ask for retransmissions when a link is in
state LINK_SYNCHING. The fact that the link has reached this state
means that it has already received the first SYNCH packet, and that it
knows the synch point. Hence, it doesn't need any more packets until the
other link has reached the synch point, whereafter it can go ahead and
ask for the missing packets.

However, because of the reduced traffic on the synching link that
follows this change, it may now take longer to discover that the
synch point has been reached. We compensate for this by letting all
packets, on any of the links, trigger a check for synchronization
termination. This is possible because the packets themselves don't
contain any information that is needed for discovering this condition.

Reviewed-by: Ying Xue ying@windriver.com
Signed-off-by: Jon Maloy jon.ma...@ericsson.com
---
 net/tipc/link.c |  3 ++-
 net/tipc/node.c | 12 ++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 7058c86..75db07c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1330,6 +1330,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 	u16 peers_snd_nxt =  msg_next_sent(hdr);
 	u16 peers_tol = msg_link_tolerance(hdr);
 	u16 peers_prio = msg_linkprio(hdr);
+	u16 rcv_nxt = l->rcv_nxt;
 	char *if_name;
 	int rc = 0;
 
@@ -1393,7 +1394,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 			break;
 
 		/* Send NACK if peer has sent pkts we haven't received yet */
-		if (more(peers_snd_nxt, l->rcv_nxt))
+		if (more(peers_snd_nxt, rcv_nxt) && !tipc_link_is_synching(l))
 			rcvgap = peers_snd_nxt - l->rcv_nxt;
 		if (rcvgap || (msg_probe(hdr)))
 			tipc_link_build_proto_msg(l, STATE_MSG, 0, rcvgap,
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 937cc61..703875f 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1079,7 +1079,7 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
 	u16 exp_pkts = msg_msgcnt(hdr);
 	u16 rcv_nxt, syncpt, dlv_nxt;
 	int state = n->state;
-	struct tipc_link *l, *pl = NULL;
+	struct tipc_link *l, *tnl, *pl = NULL;
 	struct tipc_media_addr *maddr;
 	int i, pb_id;
 
@@ -1164,12 +1164,20 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
 
 	/* Open tunnel link when parallel link reaches synch point */
 	if ((n->state == NODE_SYNCHING) && tipc_link_is_synching(l)) {
+		if (tipc_link_is_synching(l)) {
+			tnl = l;
+		} else {
+			tnl = pl;
+			pl = l;
+		}
 		dlv_nxt = pl->rcv_nxt - mod(skb_queue_len(pl->inputq));
 		if (more(dlv_nxt, n->sync_point)) {
-			tipc_link_fsm_evt(l, LINK_SYNCH_END_EVT);
+			tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT);
 			tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT);
 			return true;
 		}
+		if (l == pl)
+			return true;
 		if ((usr == TUNNEL_PROTOCOL) && (mtyp == SYNCH_MSG))
 			return true;
 		if (usr == LINK_PROTOCOL)
-- 
1.9.1
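
The NACK decision changed by this patch can be sketched outside the kernel as follows. This is a simplified illustration, not the TIPC code itself: ex_seq_more() is an assumed stand-in for the wraparound-aware 16-bit comparison that more() performs, and ex_rcv_gap() is a hypothetical helper that takes the synching state as a plain flag.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-aware test "left is ahead of right" for 16-bit sequence
 * numbers: true when right lies in the half-range behind left.
 */
static bool ex_seq_more(uint16_t left, uint16_t right)
{
	return (uint16_t)(right - left) >= 0x8000u;
}

/*
 * Number of packets to ask the peer to retransmit. While the link is
 * synching, no retransmission is requested even if a gap exists.
 */
static uint16_t ex_rcv_gap(uint16_t peers_snd_nxt, uint16_t rcv_nxt,
			   bool synching)
{
	if (ex_seq_more(peers_snd_nxt, rcv_nxt) && !synching)
		return (uint16_t)(peers_snd_nxt - rcv_nxt);
	return 0;
}
```

With a zero gap the endpoint only answers probes, which is what stops the drop-and-retransmit loop described in the commit message from running up the stale counter.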


[PATCH net-next 2/3] tipc: interrupt link synchronization when a link goes down

2015-08-20 Thread Jon Maloy

When we introduced the new link failover/synch mechanism
in commit 6e498158a827fd515b514842e9a06bdf0f75ab86
(tipc: move link synch and failover to link aggregation level),
we missed the case when the non-tunnel link goes down during the link
synchronization period. In this case the tunnel link will remain in
state LINK_SYNCHING, which leads to unpredictable behavior when
the failover procedure is initiated.

In this commit, we ensure that the node and remaining link go
back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED)
when one of the parallel links goes down. We also ensure that we don't
re-enter synch mode if subsequent SYNCH packets arrive on the remaining
link.

Reviewed-by: Ying Xue ying@windriver.com
Signed-off-by: Jon Maloy jon.ma...@ericsson.com
---
 net/tipc/link.c |  2 +-
 net/tipc/node.c | 11 ---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index f067e54..7058c86 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -351,11 +351,11 @@ int tipc_link_fsm_evt(struct tipc_link *l, int evt)
 			l->state = LINK_RESET;
 			break;
 		case LINK_ESTABLISH_EVT:
+		case LINK_SYNCH_END_EVT:
 			break;
 		case LINK_SYNCH_BEGIN_EVT:
 			l->state = LINK_SYNCHING;
 			break;
-		case LINK_SYNCH_END_EVT:
 		case LINK_FAILOVER_BEGIN_EVT:
 		case LINK_FAILOVER_END_EVT:
 		default:
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 004834b..937cc61 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -423,6 +423,8 @@ static void __tipc_node_link_down(struct tipc_node *n, int *bearer_id,
 
 	/* There is still a working link => initiate failover */
 	tnl = node_active_link(n, 0);
+	tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT);
+	tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT);
 	n->sync_point = tnl->rcv_nxt + (U16_MAX / 2 - 1);
 	tipc_link_tnl_prepare(l, tnl, FAILOVER_MSG, xmitq);
 	tipc_link_reset(l);
@@ -1140,6 +1142,10 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
 		return true;
 	}
 
+	/* No synching needed if only one link */
+	if (!pl || !tipc_link_is_up(pl))
+		return true;
+
 	/* Initiate or update synch mode if applicable */
 	if ((usr == TUNNEL_PROTOCOL) && (mtyp == SYNCH_MSG)) {
 		syncpt = iseqno + exp_pkts - 1;
@@ -1158,9 +1164,8 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
 
 	/* Open tunnel link when parallel link reaches synch point */
 	if ((n->state == NODE_SYNCHING) && tipc_link_is_synching(l)) {
-		if (pl)
-			dlv_nxt = mod(pl->rcv_nxt - skb_queue_len(pl->inputq));
-		if (!pl || more(dlv_nxt, n->sync_point)) {
+		dlv_nxt = pl->rcv_nxt - mod(skb_queue_len(pl->inputq));
+		if (more(dlv_nxt, n->sync_point)) {
 			tipc_link_fsm_evt(l, LINK_SYNCH_END_EVT);
 			tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT);
 			return true;
-- 
1.9.1
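
The link.c change makes LINK_SYNCH_END_EVT acceptable in the established state, so __tipc_node_link_down() can send it unconditionally. A reduced two-state sketch of that idea follows. The enum and function names are hypothetical, only the two states and two events relevant here are modelled, and the SYNCHING to ESTABLISHED transition is taken from the commit message rather than from code shown in the diff.

```c
/* Reduced model: only the states and events touched by this patch. */
enum ex_link_state { EX_LINK_ESTABLISHED, EX_LINK_SYNCHING };
enum ex_link_evt { EX_SYNCH_BEGIN_EVT, EX_SYNCH_END_EVT };

/* Return the state that follows 'state' after event 'evt'. */
static enum ex_link_state ex_link_fsm(enum ex_link_state state,
				      enum ex_link_evt evt)
{
	switch (state) {
	case EX_LINK_ESTABLISHED:
		if (evt == EX_SYNCH_BEGIN_EVT)
			return EX_LINK_SYNCHING;
		/* SYNCH_END while already established is a no-op. */
		return state;
	case EX_LINK_SYNCHING:
		if (evt == EX_SYNCH_END_EVT)
			return EX_LINK_ESTABLISHED;
		return state;
	}
	return state;
}
```

Because the end event is idempotent in this model, the caller does not need to know whether the remaining link was synching when its partner went down.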


RE: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.

2015-08-20 Thread Premkumar Jonnala

 -Original Message-
 From: Scott Feldman [mailto:sfel...@gmail.com]
 Sent: Thursday, August 20, 2015 11:09 AM
 To: Premkumar Jonnala
 Cc: netdev@vger.kernel.org
 Subject: Re: [PATCH] bridge: Enable configuration of ageing interval for 
 bridges
 and switch devices.

 On Wed, Aug 19, 2015 at 10:12 PM, Premkumar Jonnala
 pjonn...@broadcom.com wrote:

  -Original Message-
  From: Scott Feldman [mailto:sfel...@gmail.com]
  Sent: Thursday, August 20, 2015 10:31 AM
  To: Premkumar Jonnala
  Cc: netdev@vger.kernel.org
  Subject: Re: [PATCH] bridge: Enable configuration of ageing interval for
 bridges
  and switch devices.

  On Wed, Aug 19, 2015 at 9:56 PM, Premkumar Jonnala
  pjonn...@broadcom.com wrote:
   Thank you Scott.  Please see inline.

-Original Message-
From: Scott Feldman [mailto:sfel...@gmail.com]
Sent: Tuesday, August 18, 2015 12:48 PM
To: Premkumar Jonnala
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH] bridge: Enable configuration of ageing interval 
for
   bridges
and switch devices.

On Fri, 14 Aug 2015, Premkumar Jonnala wrote:

 Bridge devices have ageing interval used to age out MAC addresses
 from FDB.  This ageing interval was not configurable.

 Enable netlink based configuration of ageing interval for bridges 
 and
 switch devices.  The ageing interval changes the timer used to 
 purge
 inactive FDB entries in bridges.  The ageing interval config is
 propagated to switch devices, so that platform or hardware based
 ageing works according to configuration.

 Signed-off-by: Premkumar Jonnala pjonn...@broadcom.com

Hi Premkumar,

I agree with Roopa that we should use existing
 IFLA_BR_AGEING_TIME.

What is the motivation for using 'ip link' command to configure bridge
   attributes?  IMHO,
bridge command is better suited for that.

   Can you extend bridge command to allow setting/getting these bridge
   attrs?  Looks like you construct a RTM_NEWLINK IFLA_INFO_DATA msg.
 No
   changes needed to the kernel.

   bridge link set dev br0 ageing_time 1000

--or--

   ip link set dev br0 type bridge ageing_time 1000

   I'd prefer to deprecate/remove all the 6 options on the 'ip link' command
 and
  move them to 'bridge' command.

  We're probably stuck with the 'ip link' commands, since they're
  already released in the wild and folks may have dependencies on them.
  However, when looking at iproute2 code, look for opportunity to use
  same code for both paths.

  Ok.  Then we can have both ip and bridge commands supporting these options,
 and freeze the 'ip link' command as it
  exists today.  Any new options in future should be added to the bridge
 command.  Does that sound okay?

 It would be ideal if both command paths use same code so new options
 work with either command.  There are other examples where shared code
 approach could be used to create synonymous commands:

 ip link add name br0 type bridge   --or-- bridge add dev br0
 ip link del br0  --or-- bridge del dev br0
 ip link set dev sw1p1 master br0   --or-- bridge link add dev sw1p1 br0
 ip link set dev sw1p1 nomaster br0--or-- bridge link del sw1p1

 There is some precedent for synonymous bridge commands:

 ip link set dev sw1p1 type bridge_slave flood on
 --or--
 bridge link set dev sw1p1 flood on

 But I don't know if the same code path is used for both of these.  (It 
 should).

Sounds reasonable.  I'll try to follow this approach - with code sharing 
requirement in mind.

-Prem

 -scott
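
The ageing behaviour described in the quoted patch description (inactive FDB entries are purged once they are older than the ageing interval) can be sketched as below. This is a userspace illustration only: ex_fdb_entry and ex_fdb_expired() are hypothetical names, and time is a plain tick counter rather than jiffies.

```c
#include <stdbool.h>

/* Minimal stand-in for a forwarding database entry. */
struct ex_fdb_entry {
	unsigned long updated;	/* tick at which the address was last seen */
};

/*
 * An entry is due for removal once at least ageing_time ticks have
 * passed since it was last refreshed. Unsigned subtraction keeps the
 * comparison correct across counter wraparound.
 */
static bool ex_fdb_expired(const struct ex_fdb_entry *e, unsigned long now,
			   unsigned long ageing_time)
{
	return now - e->updated >= ageing_time;
}
```

Whichever command sets the value, it is this single interval that both the software timer and any hardware ageing are meant to follow.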

Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-20 Thread Grumbach, Emmanuel



On 08/19/2015 11:39 PM, Eric Dumazet wrote:
 On Wed, 2015-08-19 at 19:17 +, Grumbach, Emmanuel wrote:
 
 Hm.. how would net/core/tso.c avoid this?
 
 Because a driver using these helpers keep around the original LSO packet
 and frees it normally at TX completion time.
 

Which is why I can't really use it. The complexity is that I have to
(per the IEEE 802.11 specification) split an LSO into several 802.11
packets. The maximal 802.11 packet I can send under ideal conditions is
11K long or so. So I *must* generate several 802.11 frames from one single
LSO packet. OTOH, I can have more than MSS bytes in an 802.11 A-MSDU.

Maybe what would help would be to be able to dynamically change the
maximal size of an LSO packet. That would allow the wifi driver to
ensure that the LSO can fit in a single 802.11 packet. Note that since
the maximal length of the A-MSDU can vary based on link conditions
(since there is only one CRC for the whole A-MSDU, you don't want long
A-MSDUs in bad link conditions) the driver would need to be able to tell
the TCP stack to modify the length of an LSO packet.
To me, this sounds to be ... an overkill?
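
The constraint above (one LSO packet has to become several A-MSDUs, each holding more than one MSS-sized segment) can be put into numbers with a rough sketch. Everything here is an assumption for illustration: the helper name, the per-subframe overhead parameter, and the simplification that every subframe is treated as full-sized.

```c
/*
 * Estimate how many A-MSDUs are needed to carry payload_len bytes of
 * TCP payload, given the MSS, an assumed per-subframe header overhead
 * and the maximum A-MSDU length currently allowed by the link.
 * Returns 0 if there is nothing to send or one segment cannot fit.
 */
static unsigned int ex_amsdu_count(unsigned int payload_len, unsigned int mss,
				   unsigned int subframe_overhead,
				   unsigned int max_amsdu_len)
{
	unsigned int segs, per_amsdu;

	if (mss == 0 || payload_len == 0)
		return 0;

	/* How many MSS-sized subframes fit into one A-MSDU. */
	per_amsdu = max_amsdu_len / (mss + subframe_overhead);
	if (per_amsdu == 0)
		return 0;

	/* Round up twice: payload to segments, segments to A-MSDUs. */
	segs = (payload_len + mss - 1) / mss;
	return (segs + per_amsdu - 1) / per_amsdu;
}
```

For instance, with a 64K LSO, an MSS of 1460, an assumed 68 bytes of overhead per subframe and an 11000-byte limit, seven subframes fit per A-MSDU and the 45 segments need seven A-MSDUs.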

I'll sum up all these considerations in the v3 I'll send later today.

 I can't see anything related to truesize there.
 Note that this works since it is guaranteed that we release the skbs in
 order.


 (BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
 yet we want backpressure mostly for TCP stack (TCP Small Queues))



 I am not sure I follow here.
 You want me to test:
 if (skb_gso->destructor == tcp_wfree) ?
 
 
 Yes.
 
 Look for example at tcp_gso_segment() (called from skb_gso_segment())
 
 copy_destructor = gso_skb->destructor == tcp_wfree;
 ...
 /* Following permits TCP Small Queues to work well with GSO :
  * The callback to TCP stack will be called at the time last frag
  * is freed at TX completion, and not right now when gso_skb
  * is freed by GSO engine
  */
 if (copy_destructor) {
         swap(gso_skb->sk, skb->sk);
         swap(gso_skb->destructor, skb->destructor);
         sum_truesize += skb->truesize;
         atomic_add(sum_truesize - gso_skb->truesize,
                    &skb->sk->sk_wmem_alloc);
 }
 
 

 I checked that code using iperf and saw that I don't get into this if,
 but I (probably wrongly) assumed that other applications would set a
 flag on the socket (forgive my ignorance) that would make this if be taken.
 
 If you do not see skb->destructor == tcp_wfree, then something is
 definitely wrong on your setup.
 

I'll check this today. Thanks.

 
 
 

Re: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.

2015-08-20 Thread Michal Kubecek

On Thu, Aug 20, 2015 at 05:08:51AM +, Premkumar Jonnala wrote:
  From: Wilson, Daniel G [mailto:daniel.wil...@intel.com]
  
   Can you extend bridge command to allow setting/getting these bridge attrs?
   Looks like you construct a RTM_NEWLINK IFLA_INFO_DATA msg.  No changes
   needed to the kernel.
  
   bridge link set dev br0 ageing_time 1000
  
--or--
  
   ip link set dev br0 type bridge ageing_time 1000
  
  Being able to set these attributes via both bridge and ip would be great.
  
 IMHO, we should choose only one command.  Otherwise, we'd have to
 spend effort in trying to keep both the commands in sync.

As long as they are using the same netlink interface, I don't think it's
a serious problem. After all, there will be also other tools (wicked,
perhaps systemd-networkd) setting it directly via netlink rather than
calling either ip or bridge.

 My vote would be for the bridge command - since the options/parameters
 are related to bridges.  If there is no objection, I'll move all the
 bridge options from 'ip link' command to 'bridge' command.

This would break existing scripts using ip to set the parameter. Is the
possibility to use any of the two really that bad?

 Michal Kubecek


[PATCH 00/18] Export SPI and OF module aliases in missing drivers

2015-08-20 Thread Javier Martinez Canillas

Hello,

Short version:

This patch series is the SPI equivalent of the I2C one posted before [0].

This series adds the missing MODULE_DEVICE_TABLE() for OF and SPI tables
to export that information so modules have the correct aliases built-in
and autoloading works correctly.

Longer version:

The SPI core always reports the MODALIAS uevent as spi:modalias
regardless of the mechanism that was used to register the device (i.e:
OF or board code) and the table that is used later to match the driver
with the device (i.e: SPI id table or OF match table).

But this means that OF-only drivers need to have both OF and SPI id
tables that have to be kept in sync, and also that the device node's
compatible manufacturer prefix is stripped when reporting the MODALIAS.
This can lead to issues if, for example, two vendors use the same SPI
device name.
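
To make the collision concrete, here is a minimal userspace sketch of the prefix stripping described above. It is not the SPI core's code: the helper name spi_modalias_from_compatible() and the vendor "acme" are made up for illustration, and the only behaviour assumed is the one stated in the text (everything up to the first comma of the compatible string is dropped).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): build "spi:<device>" from a
 * DT compatible such as "micrel,ks8851" by dropping everything up to
 * and including the first comma, i.e. the manufacturer prefix.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int spi_modalias_from_compatible(const char *compatible,
					char *out, size_t len)
{
	const char *comma;
	const char *device;
	int n;

	assert(compatible != NULL && out != NULL && len > 0);

	comma = strchr(compatible, ',');
	device = comma ? comma + 1 : compatible;

	n = snprintf(out, len, "spi:%s", device);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this model, "micrel,ks8851" and a made-up "acme,ks8851" both yield the alias spi:ks8851, which is the ambiguity described above.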

Also, there are many SPI drivers whose module auto-loading is not working
because the SPI core always reports the MODALIAS as spi:modalias, and
many developers didn't expect this since it is not how other subsystems
behave.

I've identified SPI drivers with 3 different types of issues:

a) Those that have an spi_table that is not exported. The match works
   if the driver is built-in but since the ID table is not exported,
   module auto-load won't work.

b) Those that have an of_table that is not exported. This is currently
   not an issue since even when the of_table is used to match the dev
   with the driver, an OF modalias is not reported by the SPI core.
   But if the SPI core is changed to report a MODALIAS of the form
   of:N*T*C, as other subsystems do, then module auto-load
   will break for these drivers.

c) Those that don't have an of_table but should, since they are OF
   drivers with DT bindings docs. Since the SPI core does not report
   an OF modalias and since spi_match_device() falls back to matching
   the device part of the compatible string against the SPI device ID
   table, many OF drivers don't have an of_table to match. After all,
   having an SPI device ID table is mandatory, so it works without an
   of_table.

So, in order to not make it mandatory to have an SPI device ID table,
all three kinds of issues have to be addressed. This series does that.
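
For comparison with the spi: alias, a rough sketch of an OF-style alias of the form of:N*T*C follows. The helper name of_style_modalias() and the sample node name and type are assumptions for illustration; the text above only states the overall form. The point is that the full compatible string, vendor prefix included, ends up in the alias.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): format an alias of the form
 * of:N<name>T<type>C<compatible>. The three inputs stand in for a
 * device node's name, device type and first compatible string.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int of_style_modalias(const char *name, const char *type,
			     const char *compatible, char *out, size_t len)
{
	size_t needed;

	assert(name != NULL && type != NULL && compatible != NULL);
	assert(out != NULL);

	/* "of:N" + name + "T" + type + "C" + compatible + NUL */
	needed = 4 + strlen(name) + 1 + strlen(type) + 1 +
		 strlen(compatible) + 1;
	if (needed > len)
		return -1;

	snprintf(out, len, "of:N%sT%sC%s", name, type, compatible);
	return 0;
}
```

Unlike the spi: form, two vendors using the same device name produce different aliases here, because the manufacturer prefix is kept.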

I split the changes so the patches in this series are independent and
can be picked individually by subsystem maintainers.

Patches #1 and #2 solve a), patches #3 to #8 solve b) and patches
#9 to #17 solve c).

Patch #18 changes the logic of spi_uevent() to report an OF modalias if
the device was registered using OF. But this patch is included in the
series only as an RFC for illustration purposes, since changing that
without first applying all the other patches in this series will break
module autoloading for the drivers of devices registered using OF that
lack an of_match_table. I'll repost patch #18 once all the patches
in this series have landed.

[0]: https://lkml.org/lkml/2015/7/30/519

Best regards,
Javier


Javier Martinez Canillas (18):
  iio: Export SPI module alias information in missing drivers
  staging: iio: hmc5843: Export missing SPI module alias information
  mtd: dataflash: Export OF module alias information
  OMAPDSS: panel-sony-acx565akm: Export OF module alias information
  mmc: mmc_spi: Export OF module alias information
  staging: mt29f_spinand: Export OF module alias information
  net: ks8851: Export OF module alias information
  [media] s5c73m3: Export OF module alias information
  mfd: cros_ec: spi: Add OF match table
  iio: dac: ad7303: Add OF match table
  iio: adc: max1027: Set struct spi_driver .of_match_table
  mfd: stmpe: Add OF match table
  iio: adc: mcp320x: Set struct spi_driver .of_match_table
  iio: as3935: Add OF match table
  iio: adc128s052: Add OF match table
  iio: frequency: adf4350: Add OF match table
  NFC: trf7970a: Add OF match table
  spi: (RFC, don't apply) report OF style modalias when probing using DT

 drivers/iio/adc/max1027.c               |  1 +
 drivers/iio/adc/mcp320x.c               |  1 +
 drivers/iio/adc/ti-adc128s052.c         |  8
 drivers/iio/amplifiers/ad8366.c         |  1 +
 drivers/iio/dac/ad7303.c                |  7 +++
 drivers/iio/frequency/adf4350.c         |  9 +
 drivers/iio/proximity/as3935.c          |  7 +++
 drivers/media/i2c/s5c73m3/s5c73m3-spi.c |  1 +
 drivers/mfd/cros_ec_spi.c               |  7 +++
 drivers/mfd/stmpe-spi.c                 | 13 +
 drivers/mmc/host/mmc_spi.c              |  1 +
 drivers/mtd/devices/mtd_dataflash.c     |  1 +
 drivers/net/ethernet/micrel/ks8851.c    |  1 +
 drivers/nfc/trf7970a.c                  |  7 +++
 drivers/spi/spi.c                       |  8

[PATCH 07/18] net: ks8851: Export OF module alias information

2015-08-20 Thread Javier Martinez Canillas

The SPI core always reports the MODALIAS uevent as spi:modalias
regardless of the mechanism that was used to register the device
(i.e: OF or board code) and the table that is used later to match
the driver with the device (i.e: SPI id table or OF match table).

So drivers need to export the SPI id table, and it must be built into
the module, or udev won't have the necessary information to autoload
the needed driver module when the device is added.

But this means that OF-only drivers need to have both OF and SPI id
tables that have to be kept in sync, and also that the dev node
compatible manufacturer prefix is stripped when reporting the MODALIAS.
This can lead to issues if, for example, two vendors use the same SPI
device name.

To avoid the above, the SPI core behavior may be changed in the future
to not require an SPI device table for OF-only drivers and report the
OF module alias. So, it's better to also export the OF table even when
it is unused now, to prevent breaking module loading when the core
changes.

Signed-off-by: Javier Martinez Canillas jav...@osg.samsung.com
---

 drivers/net/ethernet/micrel/ks8851.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/micrel/ks8851.c 
b/drivers/net/ethernet/micrel/ks8851.c
index 66d4ab703f45..60f43ec22175 100644
--- a/drivers/net/ethernet/micrel/ks8851.c
+++ b/drivers/net/ethernet/micrel/ks8851.c
@@ -1601,6 +1601,7 @@ static const struct of_device_id ks8851_match_table[] = {
{ .compatible = "micrel,ks8851" },
{ }
 };
+MODULE_DEVICE_TABLE(of, ks8851_match_table);
 
 static struct spi_driver ks8851_driver = {
.driver = {
-- 
2.4.3


[RFC v3 1/3] iwlwifi: mvm: add real TSO implementation

2015-08-20 Thread Emmanuel Grumbach

The segmentation is done completely in software. The
driver creates several MPDUs out of a single large send.
Each MPDU is a newly allocated SKB.
A page is allocated to create the headers that need to be
duplicated (SNAP / IP / TCP). The WiFi header is in the
header of the newly created SKBs.
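
As a rough illustration of the arithmetic behind such a split (not the driver's code), the sketch below cuts a payload into MSS-sized pieces and records, for each piece, the payload offset that would be added to the TCP sequence number and the IPv4 total length. The names struct tso_seg and tso_split() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One segment produced by the split (hypothetical layout). */
struct tso_seg {
	unsigned int seq_offset;  /* payload bytes that precede this segment */
	unsigned int payload_len; /* TCP payload bytes carried by this segment */
	unsigned int ip_tot_len;  /* IPv4 total length: IP hdr + TCP hdr + payload */
};

/*
 * Split payload_len bytes into segments of at most mss bytes each.
 * Fills at most max_segs entries of segs and returns how many were
 * filled.
 */
static size_t tso_split(unsigned int payload_len, unsigned int mss,
			unsigned int ip_hdr_len, unsigned int tcp_hdr_len,
			struct tso_seg *segs, size_t max_segs)
{
	unsigned int pos = 0;
	size_t n = 0;

	assert(mss > 0 && segs != NULL);

	while (pos < payload_len && n < max_segs) {
		unsigned int left = payload_len - pos;
		unsigned int len = left < mss ? left : mss;

		segs[n].seq_offset = pos;
		segs[n].payload_len = len;
		segs[n].ip_tot_len = ip_hdr_len + tcp_hdr_len + len;

		pos += len;
		n++;
	}

	return n;
}
```

For a 3000-byte payload with an MSS of 1460 and 20-byte IP and TCP headers, this yields three segments at offsets 0, 1460 and 2920, the last one carrying 80 bytes.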

type=feature

Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a
Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
---
 drivers/net/wireless/iwlwifi/mvm/tx.c | 513 +++---
 1 file changed, 481 insertions(+), 32 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c 
b/drivers/net/wireless/iwlwifi/mvm/tx.c
index 90f0ea1..a63686c 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -65,6 +65,7 @@
 #include <linux/ieee80211.h>
 #include <linux/etherdevice.h>
 #include <net/tcp.h>
+#include <net/ip.h>
 
 #include "iwl-trans.h"
 #include "iwl-eeprom-parse.h"
@@ -435,32 +436,471 @@ int iwl_mvm_tx_skb_non_sta(struct iwl_mvm *mvm, struct 
sk_buff *skb)
return 0;
 }
 
+/*
+ * Update the IP / TCP headers and recompute the IP header CSUM +
+ * pseudo header CSUM.
+ */
+static void iwl_update_ip_tcph(void *iph, struct tcphdr *tcph, bool ipv6,
+  unsigned int len, unsigned int tcp_seq_offset,
+  u16 num_segment)
+{
+   be32_add_cpu(&tcph->seq, tcp_seq_offset);
+
+   if (ipv6) {
+   struct ipv6hdr *iphv6 = iph;
+
+   iphv6->payload_len = cpu_to_be16(len + tcph->doff * 4);
+
+   /* Compute CSUM on the pseudo-header */
+   tcph->check = ~csum_ipv6_magic(&iphv6->saddr, &iphv6->daddr,
+  len + tcph->doff * 4,
+  IPPROTO_TCP, 0);
+   } else {
+   struct iphdr *iphv4 = iph;
+
+   iphv4->tot_len =
+   cpu_to_be16(len + tcph->doff * 4 + iphv4->ihl * 4);
+   be16_add_cpu(&iphv4->id, num_segment);
+   ip_send_check(iphv4);
+
+   /* Compute CSUM on the pseudo-header */
+   tcph->check = ~csum_tcpudp_magic(iphv4->saddr, iphv4->daddr,
+len + tcph->doff * 4,
+IPPROTO_TCP, 0);
+   }
+}
+
+/**
+ * struct iwl_lso_splitter - state of the split.
+ * @linear_payload_len: The length of the payload inside the header of the
+ * original GSO skb.
+ * @gso_frag_num: The fragment number from which to take the data in the
+ * original GSO skb.
+ * @gso_payload_len: The length of the payload in the original GSO skb.
+ * @gso_payload_pos: The incrementing position in the payload of the original
+ * GSO skb.
+ * @gso_offset_in_page: The offset in the page of gso_frag_num.
+ * @gso_current_frag_size: The size of gso_frag_num.
+ * @gso_offset_in_frag: The offset in the gso_frag_num.
+ * @frag_in_mpdu: The index of the frag inside the new (split) MPDU.
+ * @mss: The maximal segment size.
+ * @si: Points to the shared info of the original GSO skb.
+ * @ieee80211_hdr *hdr: Points to the WiFi header.
+ * @gso_nr_frags: The number of frags in the original GSO skb.
+ * @wifi_hdr_iv_len: The length of the WiFi header including IV.
+ * @tcp_fin: True if TCP_FIN is set in the original GSO skb.
+ * @tcp_push: True if TCP_PSH is set in the original GSO skb.
+ */
+struct iwl_lso_splitter {
+   unsigned int linear_payload_len;
+   unsigned int gso_frag_num;
+   unsigned int gso_payload_len;
+   unsigned int gso_payload_pos;
+   unsigned int gso_offset_in_page;
+   unsigned int gso_current_frag_size;
+   unsigned int gso_offset_in_frag;
+   unsigned int frag_in_mpdu;
+   unsigned int mss;
+   struct skb_shared_info *si;
+   struct ieee80211_hdr *hdr;
+   u8 gso_nr_frags;
+   u8 wifi_hdr_iv_len;
+   bool tcp_fin;
+   bool tcp_push;
+};
+
+/*
+ * Adds a TCP segment from skb_gso to skb. All the state is taken from
+ * and fed back to p. This function takes care about the payload only.
+ * This MSDU might already have msdu_sz bytes of payload that come from
+ * the original GSO skb's header.
+ */
+static unsigned int
+iwl_add_tcp_segment(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
+   struct sk_buff *skb, struct iwl_lso_splitter *p,
+   unsigned int msdu_sz)
+{
+   while (msdu_sz < p->mss) {
+   unsigned int frag_sz =
+   min_t(unsigned int, p->gso_current_frag_size,
+ p->mss - msdu_sz);
+
+   if (p->frag_in_mpdu >= mvm->trans->max_skb_frags)
+   return msdu_sz;
+
+   skb_add_rx_frag(skb, p->frag_in_mpdu,
+   skb_frag_page(&p->si->frags[p->gso_frag_num]),
+   p->gso_offset_in_page, frag_sz, 0);
+
+   /* We just added one frag to the mpdu ... */

[RFC v3 2/3] iwlwifi: mvm: allow to create A-MSDUs from a large send

2015-08-20 Thread Emmanuel Grumbach

Now that we can get a big chunk of data from the network
stack, we can create an A-MSDU out of it. The purpose is to
get a throughput improvement since sending one single A-MSDU
is more efficient than sending several MSDUs at least under
ideal link conditions.

type=feature

Change-Id: I5ea1b1132a57542187cd4c34c5299dbf44fe8b01
Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
---
 drivers/net/wireless/iwlwifi/mvm/mac80211.c |   3 +-
 drivers/net/wireless/iwlwifi/mvm/sta.c  |   4 +-
 drivers/net/wireless/iwlwifi/mvm/sta.h  |   6 +-
 drivers/net/wireless/iwlwifi/mvm/tx.c   | 159 ++--
 4 files changed, 160 insertions(+), 12 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c 
b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
index 3dd4e97..dd15e04 100644
--- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
@@ -925,7 +925,8 @@ static int iwl_mvm_mac_ampdu_action(struct ieee80211_hw *hw,
ret = iwl_mvm_sta_tx_agg_flush(mvm, vif, sta, tid);
break;
case IEEE80211_AMPDU_TX_OPERATIONAL:
-   ret = iwl_mvm_sta_tx_agg_oper(mvm, vif, sta, tid, buf_size);
+   ret = iwl_mvm_sta_tx_agg_oper(mvm, vif, sta, tid,
+ buf_size, amsdu);
break;
default:
WARN_ON_ONCE(1);
diff --git a/drivers/net/wireless/iwlwifi/mvm/sta.c 
b/drivers/net/wireless/iwlwifi/mvm/sta.c
index df216cd..606fc09 100644
--- a/drivers/net/wireless/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/iwlwifi/mvm/sta.c
@@ -976,7 +976,8 @@ int iwl_mvm_sta_tx_agg_start(struct iwl_mvm *mvm, struct 
ieee80211_vif *vif,
 }
 
 int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
-   struct ieee80211_sta *sta, u16 tid, u8 buf_size)
+   struct ieee80211_sta *sta, u16 tid, u8 buf_size,
+   bool amsdu)
 {
struct iwl_mvm_sta *mvmsta = iwl_mvm_sta_from_mac80211(sta);
struct iwl_mvm_tid_data *tid_data = &mvmsta->tid_data[tid];
@@ -995,6 +996,7 @@ int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct 
ieee80211_vif *vif,
queue = tid_data->txq_id;
tid_data->state = IWL_AGG_ON;
mvmsta->agg_tids |= BIT(tid);
+   tid_data->amsdu_in_ampdu_allowed = amsdu;
tid_data-ssn = 0x;
spin_unlock_bh(mvmsta-lock);
 
diff --git a/drivers/net/wireless/iwlwifi/mvm/sta.h 
b/drivers/net/wireless/iwlwifi/mvm/sta.h
index eedb215..26d1e31 100644
--- a/drivers/net/wireless/iwlwifi/mvm/sta.h
+++ b/drivers/net/wireless/iwlwifi/mvm/sta.h
@@ -258,6 +258,8 @@ enum iwl_mvm_agg_state {
  * Tx response (TX_CMD), and the block ack notification (COMPRESSED_BA).
  * @reduced_tpc: Reduced tx power. Holds the data between the
  * Tx response (TX_CMD), and the block ack notification (COMPRESSED_BA).
+ * @amsdu_in_ampdu_allowed: true if A-MSDU in A-MPDU is allowed. Relevant only
+ * if state is %IWL_AGG_ON.
  * @state: state of the BA agreement establishment / tear down.
  * @txq_id: Tx queue used by the BA session
  * @ssn: the first packet to be sent in AGG HW queue in Tx AGG start flow, or
@@ -272,6 +274,7 @@ struct iwl_mvm_tid_data {
/* The rest is Tx AGG related */
u32 rate_n_flags;
u8 reduced_tpc;
+   bool amsdu_in_ampdu_allowed;
enum iwl_mvm_agg_state state;
u16 txq_id;
u16 ssn;
@@ -387,7 +390,8 @@ int iwl_mvm_sta_rx_agg(struct iwl_mvm *mvm, struct 
ieee80211_sta *sta,
 int iwl_mvm_sta_tx_agg_start(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
struct ieee80211_sta *sta, u16 tid, u16 *ssn);
 int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
-   struct ieee80211_sta *sta, u16 tid, u8 buf_size);
+   struct ieee80211_sta *sta, u16 tid, u8 buf_size,
+   bool amsdu);
 int iwl_mvm_sta_tx_agg_stop(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
struct ieee80211_sta *sta, u16 tid);
 int iwl_mvm_sta_tx_agg_flush(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c 
b/drivers/net/wireless/iwlwifi/mvm/tx.c
index a63686c..5046833 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -488,8 +488,10 @@ static void iwl_update_ip_tcph(void *iph, struct tcphdr 
*tcph, bool ipv6,
  * @ieee80211_hdr *hdr: Points to the WiFi header.
  * @gso_nr_frags: The number of frags in the original GSO skb.
  * @wifi_hdr_iv_len: The length of the WiFi header including IV.
+ * @amsdu_pad: Number of bytes for the A-MSDU subframe
  * @tcp_fin: True if TCP_FIN is set in the original GSO skb.
  * @tcp_push: True if TCP_PSH is set in the original GSO skb.
+ * @amsdu: True if we are building an A-MSDU
  */
 struct iwl_lso_splitter {
unsigned int

[RFC v3 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-20 Thread Emmanuel Grumbach

An A-MSDU is an 802.11 frame that comprises several L3
segments:
Each subframe has an ETH / SNAP / IP / TCP header (in case
TCP is used of course). Each subframe has no 802.11
limitation on its length (only the MSS limitation), but
the whole A-MSDU has a size limit both in number of
subframes, and in bytes. In the best conditions, the
A-MSDU is unlimited in number of subframes, and can be up
to 11K bytes long. An A-MSDU has only one WiFi MAC address,
and hence, each A-MSDU can be sent to only one WiFi peer.

There are 3 ways to get an A-MSDU:
 * xmit_more
 * LSO
 * LSO + skb_gso_segment
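
To get a feel for these limits, the sketch below estimates how many equally sized subframes fit into one A-MSDU. It is a back-of-the-envelope model, not driver code: the function names are made up, and it assumes the usual 802.11 layout of a 14-byte subframe header with every subframe except the last padded to a multiple of 4 bytes. The byte limit is passed in by the caller; 11454 is used below only as an example value for the "11K" case.

```c
#include <assert.h>

/* Assumed A-MSDU subframe header length (DA + SA + length field). */
#define AMSDU_SUBFRAME_HDR_LEN 14u

/* Length of one subframe including its padding to a 4-byte boundary. */
static unsigned int amsdu_padded_len(unsigned int msdu_len)
{
	unsigned int len = AMSDU_SUBFRAME_HDR_LEN + msdu_len;

	return (len + 3u) & ~3u;
}

/*
 * Number of subframes carrying msdu_len bytes each that fit into an
 * A-MSDU of at most max_amsdu_len bytes. max_subframes == 0 means
 * "no limit on the number of subframes".
 */
static unsigned int amsdu_max_subframes(unsigned int msdu_len,
					unsigned int max_amsdu_len,
					unsigned int max_subframes)
{
	unsigned int unpadded = AMSDU_SUBFRAME_HDR_LEN + msdu_len;
	unsigned int padded = amsdu_padded_len(msdu_len);
	unsigned int n;

	assert(msdu_len > 0);

	if (unpadded > max_amsdu_len)
		return 0;

	/* n subframes occupy (n - 1) * padded + unpadded bytes. */
	n = 1 + (max_amsdu_len - unpadded) / padded;
	if (max_subframes && n > max_subframes)
		n = max_subframes;

	return n;
}
```

With a 1460-byte MSS plus 8 bytes of SNAP, 20 of IP and 20 of TCP header (1508 bytes per subframe payload), an 11454-byte limit leaves room for 7 subframes under these assumptions.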

The problem with xmit_more is that in AP mode, the Qdisc
can have packets for several L2 clients, which makes it
harder to use since, as stated above, an A-MSDU can be sent
to one single WiFi peer only.

The problem with LSO is that the LSO packet is very likely
to be way bigger than the maximal A-MSDU length. This means
that a single LSO packet will generate several WiFi packets.
From an arch point of view, having different WiFi packets
sitting in one single skb is a problem, so we need to create
several skbs out of one single LSO packet. This makes the
usage of currently existing helpers (net/core/tso.c) very
hard.

Using LSO and skb_gso_segment is a bit wasteful since we
don't need an skb for each MSS, so we would have to
reassemble the segs into one single packet. This is
inefficient.

Out of the three aforementioned ways, I chose to work with
LSO and implemented the segmentation in the driver itself.
While this code can technically be made driver agnostic,
the trend in the industry seems to show that more and more
vendors split the buffers in the firmware, so I don't
expect to see any new devices coming up with the same
behavior as ours. I did take a look at currently existing
drivers and they already do the work in firmware.
If someone in linux-wireless thinks that this code can
serve their purposes, please speak up. All you'd need is to
support TX_CSUM and SG.

The additional headers needed (SNAP / IP / TCP) are copied
and updated on a page which is allocated in xmit. This is
supposed to be very cheap. I add skb frags from that page to
the skb. Since I can have several 802.11 packets from a
single LSO packet, I allocate skbs on the way if needed.

I am quite a newbie in skb handling, so I guess that this
code can be improved. I have tested it decently using iperf,
but this doesn't mean that there are no issues using other
applications. We are enabling pktgen on TCP (using patches
that were sent a year ago or so) to test the different
layouts of the skb (payload partition amongst the header
and the different frags).

Changes since v2:
   + I fixed the whitespace problem spotted by Sergei.
   + I changed the D'tor check for truesize accounting.
 Since the D'tor seems to be set to tcp_wfree if it
 exists, I just check for the D'tor not being NULL.

Emmanuel Grumbach (3):
  iwlwifi: mvm: add real TSO implementation
  iwlwifi: mvm: allow to create A-MSDUs from a large send
  iwlwifi: mvm: transfer the truesize to the last TSO segment

 drivers/net/wireless/iwlwifi/mvm/mac80211.c |   3 +-
 drivers/net/wireless/iwlwifi/mvm/sta.c  |   4 +-
 drivers/net/wireless/iwlwifi/mvm/sta.h  |   6 +-
 drivers/net/wireless/iwlwifi/mvm/tx.c   | 669 ++--
 4 files changed, 647 insertions(+), 35 deletions(-)

-- 
2.1.4


[RFC v3 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-20 Thread Emmanuel Grumbach

This allows releasing the backpressure on the socket only
when the last segment is released.
Now the truesize looks like this:
if the truesize of the original skb is 65420, all the
segments will have a truesize of 704 (skb itself) and the
last one will have 65420.
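
The resulting layout can be pictured with a small model in which the truesize values are plain integers and the first entry stands for the original skb. This only mirrors the numbers quoted above; the function name is hypothetical and no socket accounting is modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the truesize hand-over: the first entry initially holds the
 * large truesize of the original skb, the others hold the per-skb
 * overhead. After the call the large value sits on the last entry.
 */
static void move_truesize_to_last(unsigned int *truesize, size_t nsegs)
{
	unsigned int tmp;

	assert(truesize != NULL);

	if (nsegs < 2)
		return;

	tmp = truesize[0];
	truesize[0] = truesize[nsegs - 1];
	truesize[nsegs - 1] = tmp;
}
```

Starting from {65420, 704, 704, 704}, the model ends with {704, 704, 704, 65420}, so the large amount is only given back once the last segment is freed.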

Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e
Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
---
 drivers/net/wireless/iwlwifi/mvm/tx.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c 
b/drivers/net/wireless/iwlwifi/mvm/tx.c
index 5046833..2aeb5fd 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -764,7 +764,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct 
sk_buff *skb_gso,
bool ipv6 = skb_shinfo(skb_gso)->gso_type & SKB_GSO_TCPV6;
struct iwl_lso_splitter s = {};
struct page *hdr_page;
-   unsigned int mpdu_sz;
+   unsigned int mpdu_sz, sum_truesize = 0;
u8 *hdr_page_pos, *qc, tid;
int i, ret;
 
@@ -898,6 +898,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct 
sk_buff *skb_gso,
mpdu_sz, tcp_hdrlen(skb_gso));
 
__skb_queue_tail(mpdus_skb, skb_gso);
+   sum_truesize += skb_gso->truesize;
 
/* mss bytes have been consumed from the data */
s.gso_payload_pos = s.mss;
@@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct 
sk_buff *skb_gso,
}
 
__skb_queue_tail(mpdus_skb, skb);
+   sum_truesize += skb->truesize;
+   }
+
+   /* Release the backpressure on the socket only when
+* the last segment is released.
+*/
+   if (skb_gso->destructor) {
+   struct sk_buff *tail = mpdus_skb->prev;
+
+   swap(tail->truesize, skb_gso->truesize);
+   swap(tail->destructor, skb_gso->destructor);
+   swap(tail->sk, skb_gso->sk);
+   atomic_add(sum_truesize - skb_gso->truesize,
+  &skb_gso->sk->sk_wmem_alloc);
}
 
ret = 0;
-- 
2.1.4


[PATCH v2.2 03/22] fjes: Hardware cleanup routine

2015-08-20 Thread Taku Izumi

This patch adds a hardware cleanup routine to be
invoked from the driver's .remove routine.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_hw.c | 66 ++
 drivers/net/fjes/fjes_hw.h |  1 +
 2 files changed, 67 insertions(+)

diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
index e94538f..abe583e 100644
--- a/drivers/net/fjes/fjes_hw.c
+++ b/drivers/net/fjes/fjes_hw.c
@@ -56,6 +56,12 @@ static u8 *fjes_hw_iomap(struct fjes_hw *hw)
return base;
 }
 
+static void fjes_hw_iounmap(struct fjes_hw *hw)
+{
+   iounmap(hw->base);
+   release_mem_region(hw->hw_res.start, hw->hw_res.size);
+}
+
 int fjes_hw_reset(struct fjes_hw *hw)
 {
int timeout;
@@ -109,6 +115,12 @@ static int fjes_hw_alloc_shared_status_region(struct 
fjes_hw *hw)
return 0;
 }
 
+static void fjes_hw_free_shared_status_region(struct fjes_hw *hw)
+{
+   kfree(hw->hw_info.share);
+   hw->hw_info.share = NULL;
+}
+
 static int fjes_hw_alloc_epbuf(struct epbuf_handler *epbh)
 {
void *mem;
@@ -126,6 +138,18 @@ static int fjes_hw_alloc_epbuf(struct epbuf_handler *epbh)
return 0;
 }
 
+static void fjes_hw_free_epbuf(struct epbuf_handler *epbh)
+{
+   if (epbh->buffer)
+   vfree(epbh->buffer);
+
+   epbh->buffer = NULL;
+   epbh->size = 0;
+
+   epbh->info = NULL;
+   epbh->ring = NULL;
+}
+
 void fjes_hw_setup_epbuf(struct epbuf_handler *epbh, u8 *mac_addr, u32 mtu)
 {
union ep_buffer_info *info = epbh-info;
@@ -258,6 +282,32 @@ static int fjes_hw_setup(struct fjes_hw *hw)
return 0;
 }
 
+static void fjes_hw_cleanup(struct fjes_hw *hw)
+{
+   int epidx;
+
+   if (!hw->ep_shm_info)
+   return;
+
+   fjes_hw_free_shared_status_region(hw);
+
+   kfree(hw->hw_info.req_buf);
+   hw->hw_info.req_buf = NULL;
+
+   kfree(hw->hw_info.res_buf);
+   hw->hw_info.res_buf = NULL;
+
+   for (epidx = 0; epidx < hw->max_epid ; epidx++) {
+   if (epidx == hw->my_epid)
+   continue;
+   fjes_hw_free_epbuf(&hw->ep_shm_info[epidx].tx);
+   fjes_hw_free_epbuf(&hw->ep_shm_info[epidx].rx);
+   }
+
+   kfree(hw->ep_shm_info);
+   hw->ep_shm_info = NULL;
+}
+
 int fjes_hw_init(struct fjes_hw *hw)
 {
int ret;
@@ -285,6 +335,22 @@ int fjes_hw_init(struct fjes_hw *hw)
return ret;
 }
 
+void fjes_hw_exit(struct fjes_hw *hw)
+{
+   int ret;
+
+   if (hw->base) {
+   ret = fjes_hw_reset(hw);
+   if (ret)
+   pr_err("%s: reset error", __func__);
+
+   fjes_hw_iounmap(hw);
+   hw->base = NULL;
+   }
+
+   fjes_hw_cleanup(hw);
+}
+
 void fjes_hw_set_irqmask(struct fjes_hw *hw,
 enum REG_ICTL_MASK intr_mask, bool mask)
 {
diff --git a/drivers/net/fjes/fjes_hw.h b/drivers/net/fjes/fjes_hw.h
index 836ebe2..1b3e9ca 100644
--- a/drivers/net/fjes/fjes_hw.h
+++ b/drivers/net/fjes/fjes_hw.h
@@ -241,6 +241,7 @@ struct fjes_hw {
 };
 
 int fjes_hw_init(struct fjes_hw *);
+void fjes_hw_exit(struct fjes_hw *);
 int fjes_hw_reset(struct fjes_hw *);
 
 void fjes_hw_init_command_registers(struct fjes_hw *,
-- 
1.8.3.1


[PATCH v2.2 04/22] fjes: platform_driver's .probe and .remove routine

2015-08-20 Thread Taku Izumi

This patch implements the platform_driver's .probe and .remove
routines, and also adds a board-specific private data structure.

This driver registers the net_device in the platform_driver's .probe
routine and unregisters it in its .remove routine.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes.h  | 25 
 drivers/net/fjes/fjes_main.c | 95 
 2 files changed, 120 insertions(+)

diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
index 15ded96..54bc189 100644
--- a/drivers/net/fjes/fjes.h
+++ b/drivers/net/fjes/fjes.h
@@ -24,7 +24,32 @@
 
 #include <linux/acpi.h>
 
+#include "fjes_hw.h"
+
 #define FJES_ACPI_SYMBOL   "Extended Socket"
+#define FJES_MAX_QUEUES    1
+#define FJES_TX_RETRY_INTERVAL (20 * HZ)
+
+/* board specific private data structure */
+struct fjes_adapter {
+   struct net_device *netdev;
+   struct platform_device *plat_dev;
+
+   struct napi_struct napi;
+   struct rtnl_link_stats64 stats64;
+
+   unsigned int tx_retry_count;
+   unsigned long tx_start_jiffies;
+   unsigned long rx_last_jiffies;
+   bool unset_rx_last;
+
+   bool force_reset;
+   bool open_guard;
+
+   bool irq_registered;
+
+   struct fjes_hw hw;
+};
 
 extern char fjes_driver_name[];
 extern char fjes_driver_version[];
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index ab4d20c..7695b84 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -23,6 +23,7 @@
 #include <linux/types.h>
 #include <linux/nls.h>
 #include <linux/platform_device.h>
+#include <linux/netdevice.h>
 
 #include "fjes.h"
 
@@ -49,6 +50,9 @@ static acpi_status fjes_get_acpi_resource(struct 
acpi_resource *, void*);
 static int fjes_probe(struct platform_device *);
 static int fjes_remove(struct platform_device *);
 
+static int fjes_sw_init(struct fjes_adapter *);
+static void fjes_netdev_setup(struct net_device *);
+
 static const struct acpi_device_id fjes_acpi_ids[] = {
{"PNP0C02", 0},
{"", 0},
@@ -166,18 +170,109 @@ fjes_get_acpi_resource(struct acpi_resource *acpi_res, 
void *data)
return AE_OK;
 }
 
+static const struct net_device_ops fjes_netdev_ops = {
+};
+
 /* fjes_probe - Device Initialization Routine */
 static int fjes_probe(struct platform_device *plat_dev)
 {
+   struct net_device *netdev;
+   struct fjes_adapter *adapter;
+   struct fjes_hw *hw;
+   struct resource *res;
+   int err;
+
+   err = -ENOMEM;
+   netdev = alloc_netdev_mq(sizeof(struct fjes_adapter), "es%d",
+NET_NAME_UNKNOWN, fjes_netdev_setup,
+FJES_MAX_QUEUES);
+
+   if (!netdev)
+   goto err_alloc_netdev;
+
+   SET_NETDEV_DEV(netdev, &plat_dev->dev);
+
+   dev_set_drvdata(&plat_dev->dev, netdev);
+   adapter = netdev_priv(netdev);
+   adapter->netdev = netdev;
+   adapter->plat_dev = plat_dev;
+   hw = &adapter->hw;
+   hw->back = adapter;
+
+   /* setup the private structure */
+   err = fjes_sw_init(adapter);
+   if (err)
+   goto err_sw_init;
+
+   adapter->force_reset = false;
+   adapter->open_guard = false;
+
+   res = platform_get_resource(plat_dev, IORESOURCE_MEM, 0);
+   hw->hw_res.start = res->start;
+   hw->hw_res.size = res->end - res->start + 1;
+   hw->hw_res.irq = platform_get_irq(plat_dev, 0);
+   err = fjes_hw_init(&adapter->hw);
+   if (err)
+   goto err_hw_init;
+
+   /* setup MAC address (02:00:00:00:00:[epid])*/
+   netdev->dev_addr[0] = 2;
+   netdev->dev_addr[1] = 0;
+   netdev->dev_addr[2] = 0;
+   netdev->dev_addr[3] = 0;
+   netdev->dev_addr[4] = 0;
+   netdev->dev_addr[5] = hw->my_epid; /* EPID */
+
+   err = register_netdev(netdev);
+   if (err)
+   goto err_register;
+
+   netif_carrier_off(netdev);
+
return 0;
+
+err_register:
+   fjes_hw_exit(&adapter->hw);
+err_hw_init:
+err_sw_init:
+   free_netdev(netdev);
+err_alloc_netdev:
+   return err;
 }
 
 /* fjes_remove - Device Removal Routine */
 static int fjes_remove(struct platform_device *plat_dev)
 {
+   struct net_device *netdev = dev_get_drvdata(&plat_dev->dev);
+   struct fjes_adapter *adapter = netdev_priv(netdev);
+   struct fjes_hw *hw = &adapter->hw;
+
+   unregister_netdev(netdev);
+
+   fjes_hw_exit(hw);
+
+   free_netdev(netdev);
+
return 0;
 }
 
+static int fjes_sw_init(struct fjes_adapter *adapter)
+{
+   return 0;
+}
+
+/* fjes_netdev_setup - netdevice initialization routine */
+static void fjes_netdev_setup(struct net_device *netdev)
+{
+   ether_setup(netdev);
+
+   netdev->watchdog_timeo = FJES_TX_RETRY_INTERVAL;
+   netdev->netdev_ops = &fjes_netdev_ops;
+   netdev->mtu = fjes_support_mtu[0];
+   netdev->flags |= IFF_BROADCAST;
+   netdev->features |= NETIF_F_HW_CSUM |

[PATCH v2.2 14/22] fjes: net_device_ops.ndo_tx_timeout

2015-08-20 Thread Taku Izumi

This patch adds net_device_ops.ndo_tx_timeout callback.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_main.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index bb94890..c611c58 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -58,6 +58,7 @@ static irqreturn_t fjes_intr(int, void*);
 static struct rtnl_link_stats64 *
 fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);
 static int fjes_change_mtu(struct net_device *, int);
+static void fjes_tx_retry(struct net_device *);
 
 static int fjes_acpi_add(struct acpi_device *);
 static int fjes_acpi_remove(struct acpi_device *);
@@ -224,6 +225,7 @@ static const struct net_device_ops fjes_netdev_ops = {
.ndo_start_xmit = fjes_xmit_frame,
.ndo_get_stats64= fjes_get_stats64,
.ndo_change_mtu = fjes_change_mtu,
+   .ndo_tx_timeout = fjes_tx_retry,
 };
 
 /* fjes_open - Called when a network interface is made active */
@@ -707,6 +709,13 @@ fjes_xmit_frame(struct sk_buff *skb, struct net_device 
*netdev)
return ret;
 }
 
+static void fjes_tx_retry(struct net_device *netdev)
+{
+   struct netdev_queue *queue = netdev_get_tx_queue(netdev, 0);
+
+   netif_tx_wake_queue(queue);
+}
+
 static struct rtnl_link_stats64 *
 fjes_get_stats64(struct net_device *netdev, struct rtnl_link_stats64 *stats)
 {
-- 
1.8.3.1


[PATCH v2.2 12/22] fjes: net_device_ops.ndo_get_stats64

2015-08-20 Thread Taku Izumi

This patch adds net_device_ops.ndo_get_stats64 callback.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_main.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 186197d..842edbb 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -55,6 +55,8 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *, struct 
net_device *);
 static void fjes_raise_intr_rxdata_task(struct work_struct *);
 static void fjes_tx_stall_task(struct work_struct *);
 static irqreturn_t fjes_intr(int, void*);
+static struct rtnl_link_stats64 *
+fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);
 
 static int fjes_acpi_add(struct acpi_device *);
 static int fjes_acpi_remove(struct acpi_device *);
@@ -219,6 +221,7 @@ static const struct net_device_ops fjes_netdev_ops = {
.ndo_open   = fjes_open,
.ndo_stop   = fjes_close,
.ndo_start_xmit = fjes_xmit_frame,
+   .ndo_get_stats64= fjes_get_stats64,
 };
 
 /* fjes_open - Called when a network interface is made active */
@@ -702,6 +705,16 @@ fjes_xmit_frame(struct sk_buff *skb, struct net_device 
*netdev)
return ret;
 }
 
+static struct rtnl_link_stats64 *
+fjes_get_stats64(struct net_device *netdev, struct rtnl_link_stats64 *stats)
+{
+   struct fjes_adapter *adapter = netdev_priv(netdev);
+
+   memcpy(stats, &adapter->stats64, sizeof(struct rtnl_link_stats64));
+
+   return stats;
+}
+
 static irqreturn_t fjes_intr(int irq, void *data)
 {
struct fjes_adapter *adapter = data;
-- 
1.8.3.1


[PATCH v2.2 11/22] fjes: NAPI polling function

2015-08-20 Thread Taku Izumi

This patch adds the NAPI polling function and the related receive handling.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_hw.c   |  40 ++
 drivers/net/fjes/fjes_hw.h   |   5 ++
 drivers/net/fjes/fjes_main.c | 172 ++-
 3 files changed, 215 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
index 4965791..ae7b7cd 100644
--- a/drivers/net/fjes/fjes_hw.c
+++ b/drivers/net/fjes/fjes_hw.c
@@ -825,6 +825,46 @@ bool fjes_hw_check_vlan_id(struct epbuf_handler *epbh, u16 
vlan_id)
return ret;
 }
 
+bool fjes_hw_epbuf_rx_is_empty(struct epbuf_handler *epbh)
+{
+   union ep_buffer_info *info = epbh->info;
+
+   if (info->v1i.count_max == 0)
+   return true;
+
+   return EP_RING_EMPTY(info->v1i.head, info->v1i.tail,
+info->v1i.count_max);
+}
+
+void *fjes_hw_epbuf_rx_curpkt_get_addr(struct epbuf_handler *epbh,
+  size_t *psize)
+{
+   union ep_buffer_info *info = epbh->info;
+   struct esmem_frame *ring_frame;
+   void *frame;
+
+   ring_frame = (struct esmem_frame *)&(epbh->ring[EP_RING_INDEX
+(info->v1i.head,
+ info->v1i.count_max) *
+info->v1i.frame_max]);
+
+   *psize = (size_t)ring_frame->frame_size;
+
+   frame = ring_frame->frame_data;
+
+   return frame;
+}
+
+void fjes_hw_epbuf_rx_curpkt_drop(struct epbuf_handler *epbh)
+{
+   union ep_buffer_info *info = epbh->info;
+
+   if (fjes_hw_epbuf_rx_is_empty(epbh))
+   return;
+
+   EP_RING_INDEX_INC(epbh->info->v1i.head, info->v1i.count_max);
+}
+
 int fjes_hw_epbuf_tx_pkt_send(struct epbuf_handler *epbh,
  void *frame, size_t size)
 {
diff --git a/drivers/net/fjes/fjes_hw.h b/drivers/net/fjes/fjes_hw.h
index 2b49421..0b43bc3 100644
--- a/drivers/net/fjes/fjes_hw.h
+++ b/drivers/net/fjes/fjes_hw.h
@@ -69,6 +69,8 @@ struct fjes_hw;
((_num) = EP_RING_INDEX((_num) + 1, (_max)))
 #define EP_RING_FULL(_head, _tail, _max)   \
(0 == EP_RING_INDEX(((_tail) - (_head)), (_max)))
+#define EP_RING_EMPTY(_head, _tail, _max) \
+   (1 == EP_RING_INDEX(((_tail) - (_head)), (_max)))
 
 #define FJES_MTU_TO_BUFFER_SIZE(mtu) \
(ETH_HLEN + VLAN_HLEN + (mtu) + ETH_FCS_LEN)
@@ -320,6 +322,9 @@ int fjes_hw_epid_is_shared(struct fjes_device_shared_info 
*, int);
 bool fjes_hw_check_epbuf_version(struct epbuf_handler *, u32);
 bool fjes_hw_check_mtu(struct epbuf_handler *, u32);
 bool fjes_hw_check_vlan_id(struct epbuf_handler *, u16);
+bool fjes_hw_epbuf_rx_is_empty(struct epbuf_handler *);
+void *fjes_hw_epbuf_rx_curpkt_get_addr(struct epbuf_handler *, size_t *);
+void fjes_hw_epbuf_rx_curpkt_drop(struct epbuf_handler *);
 int fjes_hw_epbuf_tx_pkt_send(struct epbuf_handler *, void *, size_t);
 
 #endif /* FJES_HW_H_ */
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 8cc687e..186197d 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -66,6 +66,9 @@ static int fjes_remove(struct platform_device *);
 static int fjes_sw_init(struct fjes_adapter *);
 static void fjes_netdev_setup(struct net_device *);
 
+static void fjes_rx_irq(struct fjes_adapter *, int);
+static int fjes_poll(struct napi_struct *, int);
+
 static const struct acpi_device_id fjes_acpi_ids[] = {
{"PNP0C02", 0},
{"", 0},
@@ -235,6 +238,8 @@ static int fjes_open(struct net_device *netdev)
hw->txrx_stop_req_bit = 0;
hw->epstop_req_bit = 0;
 
+   napi_enable(&adapter->napi);
+
fjes_hw_capture_interrupt_status(hw);
 
result = fjes_request_irq(adapter);
@@ -250,6 +255,7 @@ static int fjes_open(struct net_device *netdev)
 
 err_req_irq:
fjes_free_irq(adapter);
+   napi_disable(&adapter->napi);
 
 err_setup_res:
fjes_free_resources(adapter);
@@ -268,6 +274,8 @@ static int fjes_close(struct net_device *netdev)
 
fjes_hw_raise_epstop(hw);
 
+   napi_disable(&adapter->napi);
+
for (epidx = 0; epidx < hw->max_epid; epidx++) {
if (epidx == hw->my_epid)
continue;
@@ -703,14 +711,168 @@ static irqreturn_t fjes_intr(int irq, void *data)
 
icr = fjes_hw_capture_interrupt_status(hw);
 
-   if (icr & REG_IS_MASK_IS_ASSERT)
+   if (icr & REG_IS_MASK_IS_ASSERT) {
+   if (icr & REG_ICTL_MASK_RX_DATA)
+   fjes_rx_irq(adapter, icr & REG_IS_MASK_EPID);
+
ret = IRQ_HANDLED;
-   else
+   } else {
ret = IRQ_NONE;
+   }
 
return ret;
 }
 
+static int fjes_rxframe_search_exist(struct fjes_adapter *adapter,
+int start_epid)
+{
+   struct fjes_hw *hw = &adapter->hw;
+   int cur_epid;
+
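
For readers following the ring arithmetic in this patch, here is a minimal userspace sketch of how EP_RING_EMPTY and EP_RING_FULL relate to head and tail. The three ring macros are copied from the hunks above; the body of EP_RING_INDEX is not shown in this patch, so the modulo definition below is an assumption, as is the initial state (head = 0, tail = 1). struct ring_model and its helpers are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Assumed definition: the patch uses EP_RING_INDEX but does not show it. */
#define EP_RING_INDEX(_num, _max) (((_num) + (_max)) % (_max))

/* These three follow the macros quoted from fjes_hw.h in the patch. */
#define EP_RING_INDEX_INC(_num, _max) \
	((_num) = EP_RING_INDEX((_num) + 1, (_max)))
#define EP_RING_FULL(_head, _tail, _max) \
	(0 == EP_RING_INDEX(((_tail) - (_head)), (_max)))
#define EP_RING_EMPTY(_head, _tail, _max) \
	(1 == EP_RING_INDEX(((_tail) - (_head)), (_max)))

/* Hypothetical stand-in for the head/tail/count_max fields. */
struct ring_model {
	int head;
	int tail;
	int count_max;
};

/* Assumption: an empty ring starts with tail one slot ahead of head. */
static void ring_model_init(struct ring_model *r, int count_max)
{
	r->head = 0;
	r->tail = 1;
	r->count_max = count_max;
}

/* Producer side: advance tail unless the ring is full. */
static bool ring_model_push(struct ring_model *r)
{
	if (EP_RING_FULL(r->head, r->tail, r->count_max))
		return false;
	EP_RING_INDEX_INC(r->tail, r->count_max);
	return true;
}

/* Consumer side: advance head unless the ring is empty,
 * in the same spirit as fjes_hw_epbuf_rx_curpkt_drop().
 */
static bool ring_model_pop(struct ring_model *r)
{
	if (EP_RING_EMPTY(r->head, r->tail, r->count_max))
		return false;
	EP_RING_INDEX_INC(r->head, r->count_max);
	return true;
}

/* Number of queued frames; never more than count_max - 1. */
static int ring_model_used(const struct ring_model *r)
{
	return EP_RING_INDEX(r->tail - r->head - 1, r->count_max);
}
```

Under these assumptions one slot always stays unused, so a ring with count_max entries holds at most count_max - 1 frames. That is what lets "tail - head == 0" mean full and "tail - head == 1" mean empty without a separate counter.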

[PATCH v2.2 15/22] fjes: net_device_ops.ndo_vlan_rx_add/kill_vid

2015-08-20 Thread Taku Izumi

This patch adds net_device_ops.ndo_vlan_rx_add_vid and
net_device_ops.ndo_vlan_rx_kill_vid callback.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_hw.c   | 27 +++
 drivers/net/fjes/fjes_hw.h   |  2 ++
 drivers/net/fjes/fjes_main.c | 40 
 3 files changed, 69 insertions(+)

diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
index ae7b7cd..46e114c 100644
--- a/drivers/net/fjes/fjes_hw.c
+++ b/drivers/net/fjes/fjes_hw.c
@@ -825,6 +825,33 @@ bool fjes_hw_check_vlan_id(struct epbuf_handler *epbh, u16 
vlan_id)
return ret;
 }
 
+bool fjes_hw_set_vlan_id(struct epbuf_handler *epbh, u16 vlan_id)
+{
+   union ep_buffer_info *info = epbh->info;
+   int i;
+
+   for (i = 0; i < EP_BUFFER_SUPPORT_VLAN_MAX; i++) {
+   if (info->v1i.vlan_id[i] == 0) {
+   info->v1i.vlan_id[i] = vlan_id;
+   return true;
+   }
+   }
+   return false;
+}
+
+void fjes_hw_del_vlan_id(struct epbuf_handler *epbh, u16 vlan_id)
+{
+   union ep_buffer_info *info = epbh->info;
+   int i;
+
+   if (0 != vlan_id) {
+   for (i = 0; i < EP_BUFFER_SUPPORT_VLAN_MAX; i++) {
+   if (vlan_id == info->v1i.vlan_id[i])
+   info->v1i.vlan_id[i] = 0;
+   }
+   }
+}
+
 bool fjes_hw_epbuf_rx_is_empty(struct epbuf_handler *epbh)
 {
union ep_buffer_info *info = epbh->info;
diff --git a/drivers/net/fjes/fjes_hw.h b/drivers/net/fjes/fjes_hw.h
index 0b43bc3..2fcbfeb 100644
--- a/drivers/net/fjes/fjes_hw.h
+++ b/drivers/net/fjes/fjes_hw.h
@@ -322,6 +322,8 @@ int fjes_hw_epid_is_shared(struct fjes_device_shared_info 
*, int);
 bool fjes_hw_check_epbuf_version(struct epbuf_handler *, u32);
 bool fjes_hw_check_mtu(struct epbuf_handler *, u32);
 bool fjes_hw_check_vlan_id(struct epbuf_handler *, u16);
+bool fjes_hw_set_vlan_id(struct epbuf_handler *, u16);
+void fjes_hw_del_vlan_id(struct epbuf_handler *, u16);
 bool fjes_hw_epbuf_rx_is_empty(struct epbuf_handler *);
 void *fjes_hw_epbuf_rx_curpkt_get_addr(struct epbuf_handler *, size_t *);
 void fjes_hw_epbuf_rx_curpkt_drop(struct epbuf_handler *);
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index c611c58..1bb9347 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -58,6 +58,8 @@ static irqreturn_t fjes_intr(int, void*);
 static struct rtnl_link_stats64 *
 fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);
 static int fjes_change_mtu(struct net_device *, int);
+static int fjes_vlan_rx_add_vid(struct net_device *, __be16 proto, u16);
+static int fjes_vlan_rx_kill_vid(struct net_device *, __be16 proto, u16);
 static void fjes_tx_retry(struct net_device *);
 
 static int fjes_acpi_add(struct acpi_device *);
@@ -226,6 +228,8 @@ static const struct net_device_ops fjes_netdev_ops = {
.ndo_get_stats64= fjes_get_stats64,
.ndo_change_mtu = fjes_change_mtu,
.ndo_tx_timeout = fjes_tx_retry,
+   .ndo_vlan_rx_add_vid= fjes_vlan_rx_add_vid,
+   .ndo_vlan_rx_kill_vid = fjes_vlan_rx_kill_vid,
 };
 
 /* fjes_open - Called when a network interface is made active */
@@ -753,6 +757,42 @@ static int fjes_change_mtu(struct net_device *netdev, int 
new_mtu)
return -EINVAL;
 }
 
+static int fjes_vlan_rx_add_vid(struct net_device *netdev,
+   __be16 proto, u16 vid)
+{
+   struct fjes_adapter *adapter = netdev_priv(netdev);
+   bool ret = true;
+   int epid;
+
+   for (epid = 0; epid < adapter->hw.max_epid; epid++) {
+   if (epid == adapter->hw.my_epid)
+   continue;
+
+   if (!fjes_hw_check_vlan_id(
+   &adapter->hw.ep_shm_info[epid].tx, vid))
+   ret = fjes_hw_set_vlan_id(
+   &adapter->hw.ep_shm_info[epid].tx, vid);
+   }
+
+   return ret ? 0 : -ENOSPC;
+}
+
+static int fjes_vlan_rx_kill_vid(struct net_device *netdev,
+__be16 proto, u16 vid)
+{
+   struct fjes_adapter *adapter = netdev_priv(netdev);
+   int epid;
+
+   for (epid = 0; epid < adapter->hw.max_epid; epid++) {
+   if (epid == adapter->hw.my_epid)
+   continue;
+
+   fjes_hw_del_vlan_id(&adapter->hw.ep_shm_info[epid].tx, vid);
+   }
+
+   return 0;
+}
+
 static irqreturn_t fjes_intr(int irq, void *data)
 {
struct fjes_adapter *adapter = data;
-- 
1.8.3.1
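
As an illustration of the slot-table scheme used by fjes_hw_set_vlan_id() and fjes_hw_del_vlan_id() above, here is a small userspace sketch. It assumes, as the patch does, that a slot value of 0 means "free". VLAN_SLOTS is a stand-in for EP_BUFFER_SUPPORT_VLAN_MAX (its real value is not shown in this patch), and the vlan_table_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for EP_BUFFER_SUPPORT_VLAN_MAX; the value 4 is an assumption. */
#define VLAN_SLOTS 4

/* Hypothetical helper: true if a non-zero vid is already in the table. */
static bool vlan_table_has(const uint16_t *table, uint16_t vid)
{
	int i;

	for (i = 0; i < VLAN_SLOTS; i++) {
		if (table[i] == vid)
			return true;
	}
	return false;
}

/* Same idea as fjes_hw_set_vlan_id(): claim the first slot holding 0. */
static bool vlan_table_set(uint16_t *table, uint16_t vid)
{
	int i;

	for (i = 0; i < VLAN_SLOTS; i++) {
		if (table[i] == 0) {
			table[i] = vid;
			return true;
		}
	}
	return false;	/* table full; the driver maps this to -ENOSPC */
}

/* Same idea as fjes_hw_del_vlan_id(): clear every slot holding vid. */
static void vlan_table_del(uint16_t *table, uint16_t vid)
{
	int i;

	if (vid == 0)
		return;	/* 0 marks a free slot and is never a stored id */

	for (i = 0; i < VLAN_SLOTS; i++) {
		if (table[i] == vid)
			table[i] = 0;
	}
}
```

A freed slot is reused by the next set call, so the table needs no compaction; the cost is a linear scan, which is reasonable for a handful of slots.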


[PATCH v2.2 05/22] fjes: ES information acquisition routine

2015-08-20 Thread Taku Izumi

This patch adds the ES information acquisition routine.
ES information can be retrieved by issuing an information
request command. It indicates which receivers are in the
same zone.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes_hw.c   | 101 +++
 drivers/net/fjes/fjes_hw.h   |  24 ++
 drivers/net/fjes/fjes_regs.h |  23 ++
 3 files changed, 148 insertions(+)

diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
index abe583e..3fbe68e 100644
--- a/drivers/net/fjes/fjes_hw.c
+++ b/drivers/net/fjes/fjes_hw.c
@@ -351,6 +351,107 @@ void fjes_hw_exit(struct fjes_hw *hw)
fjes_hw_cleanup(hw);
 }
 
+static enum fjes_dev_command_response_e
+fjes_hw_issue_request_command(struct fjes_hw *hw,
+ enum fjes_dev_command_request_type type)
+{
+   union REG_CR cr;
+   union REG_CS cs;
+   enum fjes_dev_command_response_e ret = FJES_CMD_STATUS_UNKNOWN;
+   int timeout;
+
+   cr.reg = 0;
+   cr.bits.req_start = 1;
+   cr.bits.req_code = type;
+   wr32(XSCT_CR, cr.reg);
+   cr.reg = rd32(XSCT_CR);
+
+   if (cr.bits.error == 0) {
+   timeout = FJES_COMMAND_REQ_TIMEOUT * 1000;
+   cs.reg = rd32(XSCT_CS);
+
+   while ((cs.bits.complete != 1) && timeout > 0) {
+   msleep(1000);
+   cs.reg = rd32(XSCT_CS);
+   timeout -= 1000;
+   }
+
+   if (cs.bits.complete == 1)
+   ret = FJES_CMD_STATUS_NORMAL;
+   else if (timeout <= 0)
+   ret = FJES_CMD_STATUS_TIMEOUT;
+
+   } else {
+   switch (cr.bits.err_info) {
+   case FJES_CMD_REQ_ERR_INFO_PARAM:
+   ret = FJES_CMD_STATUS_ERROR_PARAM;
+   break;
+   case FJES_CMD_REQ_ERR_INFO_STATUS:
+   ret = FJES_CMD_STATUS_ERROR_STATUS;
+   break;
+   default:
+   ret = FJES_CMD_STATUS_UNKNOWN;
+   break;
+   }
+   }
+
+   return ret;
+}
+
+int fjes_hw_request_info(struct fjes_hw *hw)
+{
+   union fjes_device_command_req *req_buf = hw->hw_info.req_buf;
+   union fjes_device_command_res *res_buf = hw->hw_info.res_buf;
+   enum fjes_dev_command_response_e ret;
+   int result;
+
+   memset(req_buf, 0, hw->hw_info.req_buf_size);
+   memset(res_buf, 0, hw->hw_info.res_buf_size);
+
+   req_buf->info.length = FJES_DEV_COMMAND_INFO_REQ_LEN;
+
+   res_buf->info.length = 0;
+   res_buf->info.code = 0;
+
+   ret = fjes_hw_issue_request_command(hw, FJES_CMD_REQ_INFO);
+
+   result = 0;
+
+   if (FJES_DEV_COMMAND_INFO_RES_LEN((*hw->hw_info.max_epid)) !=
+   res_buf->info.length) {
+   result = -ENOMSG;
+   } else if (ret == FJES_CMD_STATUS_NORMAL) {
+   switch (res_buf->info.code) {
+   case FJES_CMD_REQ_RES_CODE_NORMAL:
+   result = 0;
+   break;
+   default:
+   result = -EPERM;
+   break;
+   }
+   } else {
+   switch (ret) {
+   case FJES_CMD_STATUS_UNKNOWN:
+   result = -EPERM;
+   break;
+   case FJES_CMD_STATUS_TIMEOUT:
+   result = -EBUSY;
+   break;
+   case FJES_CMD_STATUS_ERROR_PARAM:
+   result = -EPERM;
+   break;
+   case FJES_CMD_STATUS_ERROR_STATUS:
+   result = -EPERM;
+   break;
+   default:
+   result = -EPERM;
+   break;
+   }
+   }
+
+   return result;
+}
+
 void fjes_hw_set_irqmask(struct fjes_hw *hw,
 enum REG_ICTL_MASK intr_mask, bool mask)
 {
diff --git a/drivers/net/fjes/fjes_hw.h b/drivers/net/fjes/fjes_hw.h
index 1b3e9ca..df30d3d 100644
--- a/drivers/net/fjes/fjes_hw.h
+++ b/drivers/net/fjes/fjes_hw.h
@@ -34,6 +34,12 @@ struct fjes_hw;
 #define EP_BUFFER_INFO_SIZE 4096
 
 #define FJES_DEVICE_RESET_TIMEOUT  ((17 + 1) * 3) /* sec */
+#define FJES_COMMAND_REQ_TIMEOUT  (5 + 1) /* sec */
+
+#define FJES_CMD_REQ_ERR_INFO_PARAM  (0x0001)
+#define FJES_CMD_REQ_ERR_INFO_STATUS (0x0002)
+
+#define FJES_CMD_REQ_RES_CODE_NORMAL (0)
 
 #define EP_BUFFER_SIZE \
(((sizeof(union ep_buffer_info) + (128 * (64 * 1024))) \
@@ -50,6 +56,7 @@ struct fjes_hw;
((size) - sizeof(struct esmem_frame) - \
(ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN))
 
+#define FJES_DEV_COMMAND_INFO_REQ_LEN  (4)
 #define FJES_DEV_COMMAND_INFO_RES_LEN(epnum) (8 + 2 * (epnum))
 #define FJES_DEV_COMMAND_SHARE_BUFFER_REQ_LEN(txb, rxb) \
(24 + (8 * ((txb) / EP_BUFFER_INFO_SIZE + (rxb)
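
The polling loop in fjes_hw_issue_request_command() above (read status, sleep, re-read, count down a timeout) can be sketched in userspace as follows. This is only a model under stated assumptions: the callbacks, struct fake_dev and all poll_* names are hypothetical and stand in for rd32(XSCT_CS) and msleep(), so that the loop can be exercised without hardware.

```c
#include <stdbool.h>

enum poll_status { POLL_DONE, POLL_TIMEOUT };

/* Hypothetical callbacks so the loop can run without real hardware. */
typedef bool (*poll_done_fn)(void *ctx);
typedef void (*poll_sleep_fn)(int ms);

/* Check once, then sleep and re-check until done or the budget is spent. */
static enum poll_status poll_until_done(poll_done_fn done, void *ctx,
					int timeout_ms, int step_ms,
					poll_sleep_fn sleep_ms)
{
	bool complete = done(ctx);

	while (!complete && timeout_ms > 0) {
		sleep_ms(step_ms);
		complete = done(ctx);
		timeout_ms -= step_ms;
	}

	return complete ? POLL_DONE : POLL_TIMEOUT;
}

/* Fake device: reports completion after a fixed number of status reads. */
struct fake_dev {
	int reads_left;
};

static bool fake_dev_done(void *ctx)
{
	struct fake_dev *dev = ctx;

	if (dev->reads_left == 0)
		return true;
	dev->reads_left--;
	return false;
}

/* No-op sleep so the sketch runs instantly. */
static void fake_sleep(int ms)
{
	(void)ms;
}
```

With a 6000 ms budget and 1000 ms steps, as in the patch (FJES_COMMAND_REQ_TIMEOUT * 1000 with msleep(1000)), the status is read at most seven times: once up front and once after each of six sleeps.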

[PATCH v2.2 10/22] fjes: tx_stall_task

2015-08-20 Thread Taku Izumi

This patch adds tx_stall_task.
When the receiver's buffer is full, the sender stops
its tx queue. This task monitors the receiver's
status and resumes the tx queue once the receiver's
buffer is available.

Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
 drivers/net/fjes/fjes.h  |  2 ++
 drivers/net/fjes/fjes_main.c | 63 
 2 files changed, 65 insertions(+)

diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
index 8e9899e..b04ea9d 100644
--- a/drivers/net/fjes/fjes.h
+++ b/drivers/net/fjes/fjes.h
@@ -30,6 +30,7 @@
 #define FJES_MAX_QUEUES 1
 #define FJES_TX_RETRY_INTERVAL (20 * HZ)
 #define FJES_TX_RETRY_TIMEOUT  (100)
+#define FJES_TX_TX_STALL_TIMEOUT   (FJES_TX_RETRY_INTERVAL / 2)
 #define FJES_OPEN_ZONE_UPDATE_WAIT (300) /* msec */
 
 /* board specific private data structure */
@@ -52,6 +53,7 @@ struct fjes_adapter {
 
struct workqueue_struct *txrx_wq;
 
+   struct work_struct tx_stall_task;
struct work_struct raise_intr_rxdata_task;
 
struct fjes_hw hw;
diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 8d67097..8cc687e 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -53,6 +53,7 @@ static int fjes_setup_resources(struct fjes_adapter *);
 static void fjes_free_resources(struct fjes_adapter *);
 static netdev_tx_t fjes_xmit_frame(struct sk_buff *, struct net_device *);
 static void fjes_raise_intr_rxdata_task(struct work_struct *);
+static void fjes_tx_stall_task(struct work_struct *);
 static irqreturn_t fjes_intr(int, void*);
 
 static int fjes_acpi_add(struct acpi_device *);
@@ -278,6 +279,7 @@ static int fjes_close(struct net_device *netdev)
fjes_free_irq(adapter);
 
cancel_work_sync(&adapter->raise_intr_rxdata_task);
+   cancel_work_sync(&adapter->tx_stall_task);
 
fjes_hw_wait_epstop(hw);
 
@@ -407,6 +409,61 @@ static void fjes_free_resources(struct fjes_adapter 
*adapter)
}
 }
 
+static void fjes_tx_stall_task(struct work_struct *work)
+{
+   struct fjes_adapter *adapter = container_of(work,
+   struct fjes_adapter, tx_stall_task);
+   struct fjes_hw *hw = &adapter->hw;
+   struct net_device *netdev = adapter->netdev;
+   enum ep_partner_status pstatus;
+   int epid;
+   int max_epid, my_epid;
+   union ep_buffer_info *info;
+   int all_queue_available;
+   int i;
+   int sendable;
+
+   if (((long)jiffies -
+   (long)(netdev->trans_start)) > FJES_TX_TX_STALL_TIMEOUT) {
+   netif_wake_queue(netdev);
+   return;
+   }
+
+   my_epid = hw->my_epid;
+   max_epid = hw->max_epid;
+
+   for (i = 0; i < 5; i++) {
+   all_queue_available = 1;
+
+   for (epid = 0; epid < max_epid; epid++) {
+   if (my_epid == epid)
+   continue;
+
+   pstatus = fjes_hw_get_partner_ep_status(hw, epid);
+   sendable = (pstatus == EP_PARTNER_SHARED);
+   if (!sendable)
+   continue;
+
+   info = adapter->hw.ep_shm_info[epid].tx.info;
+
+   if (EP_RING_FULL(info->v1i.head, info->v1i.tail,
+info->v1i.count_max)) {
+   all_queue_available = 0;
+   break;
+   }
+   }
+
+   if (all_queue_available) {
+   netif_wake_queue(netdev);
+   return;
+   }
+   }
+
+   usleep_range(50, 100);
+
+   queue_work(adapter->txrx_wq, &adapter->tx_stall_task);
+}
+
 static void fjes_raise_intr_rxdata_task(struct work_struct *work)
 {
struct fjes_adapter *adapter = container_of(work,
@@ -602,6 +659,10 @@ fjes_xmit_frame(struct sk_buff *skb, struct net_device 
*netdev)
netdev->trans_start = jiffies;
netif_tx_stop_queue(cur_queue);
 
+   if (!work_pending(&adapter->tx_stall_task))
+   queue_work(adapter->txrx_wq,
+  &adapter->tx_stall_task);
+
ret = NETDEV_TX_BUSY;
}
} else {
@@ -686,6 +747,7 @@ static int fjes_probe(struct platform_device *plat_dev)
 
adapter->txrx_wq = create_workqueue(DRV_NAME "/txrx");
 
+   INIT_WORK(&adapter->tx_stall_task, fjes_tx_stall_task);
INIT_WORK(&adapter->raise_intr_rxdata_task,
  fjes_raise_intr_rxdata_task);
 
@@ -730,6 +792,7 @@ static int fjes_remove(struct platform_device *plat_dev)
struct fjes_hw *hw = &adapter->hw;
 
cancel_work_sync(&adapter->raise_intr_rxdata_task);
+

Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters

2015-08-20 Thread Corinna Vinschen

On Aug 20 02:43, Hayes Wang wrote:
 Corinna Vinschen [mailto:vinsc...@redhat.com]
  Sent: Thursday, August 20, 2015 3:24 AM
 [...]
  +   /*
  +* Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the
  +* tally counters.
  +*/
  +   if (tp->mac_version >= RTL_GIGA_MAC_VER_19) {
  +   RTL_W32(CounterAddrHigh, 0);
  +   RTL_W32(CounterAddrLow, CounterReset);
 
 I checked this with our engineers, and they say bits 6 ~ 63 should be a
 valid 64-byte-aligned memory address. Even if you don’t want to dump
 the counters, the hw may also clear the data in the memory indicated
 by bits 6 ~ 63 when you reset the counters.

Ok, that's easy enough to implement.  What about CmdRxEnb?  Are there
chips which need this flag set to perform the counter reset?


Thanks,
Corinna



RE: [PATCH] r8169: Add values missing in @get_stats64 from HW counters

2015-08-20 Thread Hayes Wang

Corinna Vinschen [mailto:vinsc...@redhat.com]
 Sent: Thursday, August 20, 2015 5:42 PM
[...]
 What about CmdRxEnb?  Are there
 chips which need this flag set to perform the counter reset?

No. CmdRxEnb is used to enable/disable the rx, and you could
reset the counters without changing CmdRxEnb.

Best Regards,
Hayes


[PATCH net-next 0/3] tipc: fix link failover/synch problems

2015-08-20 Thread Jon Maloy

We fix three problems with the new link failover/synch implementation,
which was introduced earlier in this release cycle. They are all related
to situations where there is a very short interval between the disabling
and enabling of interfaces.

Jon Maloy (3):
  tipc: eliminate risk of premature link setup during failover
  tipc: interrupt link synchronization when a link goes down
  tipc: fix stale link problem during synchronization

 net/tipc/link.c |  5 +++--
 net/tipc/node.c | 27 +--
 2 files changed, 24 insertions(+), 8 deletions(-)

-- 
1.9.1


[PATCH net-next 1/3] tipc: eliminate risk of premature link setup during failover

2015-08-20 Thread Jon Maloy

When a link goes down, and there is still a working link towards its
destination node, a failover is initiated, and the failed link is not
allowed to re-establish until that procedure is finished. To ensure
this, the concerned link endpoints are set to state LINK_FAILINGOVER,
and the node endpoints to NODE_FAILINGOVER during the failover period.

However, if the link reset is due to a disabled bearer, the corresponding
link endpoint is deleted, and only the node endpoint knows
about the ongoing failover. Now, if the disabled bearer is re-enabled
during the failover period, the discovery mechanism may create a new
link endpoint that is ready to be established, even though this is not
permitted. This situation may cause both the ongoing failover and any
subsequent link synchronization to fail.

In this commit, we ensure that a newly created link goes directly to
state LINK_FAILINGOVER if the corresponding node state is
NODE_FAILINGOVER. This eliminates the problem described above.

Furthermore, we tighten the criteria for which packets are allowed
to end a failover state in the function tipc_node_check_state().
By checking that the receiving link is up and running, instead of just
checking that it is not in failover mode, we eliminate the risk that
protocol packets from the re-created link may cause the failover to
be prematurely terminated.

Reviewed-by: Ying Xue ying@windriver.com
Signed-off-by: Jon Maloy jon.ma...@ericsson.com
---
 net/tipc/node.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index 7c19164..004834b 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -565,6 +565,8 @@ void tipc_node_check_dest(struct net *net, u32 onode,
goto exit;
}
tipc_link_reset(l);
+   if (n->state == NODE_FAILINGOVER)
+   tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT);
le->link = l;
n->link_cnt++;
tipc_node_calculate_timer(n, l);
@@ -1129,7 +1131,7 @@ static bool tipc_node_check_state(struct tipc_node *n, 
struct sk_buff *skb,
}
 
/* Open parallel link when tunnel link reaches synch point */
-   if ((n->state == NODE_FAILINGOVER) && !tipc_link_is_failingover(l)) {
+   if ((n->state == NODE_FAILINGOVER) && tipc_link_is_up(l)) {
if (!more(rcv_nxt, n->sync_point))
return true;
tipc_node_fsm_evt(n, NODE_FAILOVER_END_EVT);
-- 
1.9.1
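
To make the two changes in this patch concrete, here is a toy model of the state rules described in the commit message. It is not TIPC code: the TOY_* enums and toy_* functions are hypothetical simplifications, and the sync-point comparison done by the real tipc_node_check_state() is left out.

```c
#include <stdbool.h>

/* Toy states; simplified stand-ins, not the real TIPC FSM. */
enum toy_node_state { TOY_NODE_UP, TOY_NODE_FAILINGOVER };
enum toy_link_state { TOY_LINK_RESET, TOY_LINK_UP, TOY_LINK_FAILINGOVER };

/* First change: a link created while the node is failing over
 * starts out in the failover state instead of being ready to establish.
 */
static enum toy_link_state toy_new_link_state(enum toy_node_state n)
{
	return n == TOY_NODE_FAILINGOVER ? TOY_LINK_FAILINGOVER
					 : TOY_LINK_RESET;
}

/* Old criterion: any receiving link that is not failing over qualifies. */
static bool toy_may_end_failover_old(enum toy_node_state n,
				     enum toy_link_state l)
{
	return n == TOY_NODE_FAILINGOVER && l != TOY_LINK_FAILINGOVER;
}

/* Second change: only a receiving link that is up qualifies. */
static bool toy_may_end_failover_new(enum toy_node_state n,
				     enum toy_link_state l)
{
	return n == TOY_NODE_FAILINGOVER && l == TOY_LINK_UP;
}
```

In this model a freshly reset link satisfies the old criterion but not the new one, which is the premature-termination case the commit message describes.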


Re: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.

2015-08-20 Thread Michal Kubecek

On Thu, Aug 20, 2015 at 06:40:01AM +, Premkumar Jonnala wrote:
  From: Michal Kubecek [mailto:mkube...@suse.cz]
  
  This would break existing scripts using ip to set the parameter. Is the
  possibility to use any of the two really that bad?
 
 There was another email on this thread where Scott indicated existence
 of other commands where both ip and bridge are available, and they are
 for the same function.

Yes, I already noticed it. I'm sorry, I should have checked the whole
thread before replying.

   Michal Kubecek


Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-20 Thread Grumbach, Emmanuel



On 08/19/2015 11:39 PM, Eric Dumazet wrote:
 On Wed, 2015-08-19 at 19:17 +, Grumbach, Emmanuel wrote:
 
 Hm.. how would net/core/tso.c avoid this?
 
 Because a driver using these helpers keep around the original LSO packet
 and frees it normally at TX completion time.
 
 I can't see anything related to truesize there.
 Note that this works since it is guaranteed that we release the skbs in
 order.


 (BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
 yet we want backpressure mostly for TCP stack (TCP Small Queues))



 I am not sure I follow here.
 You want me to test:
 if (skb_gso->destructor == tcp_wfree) ?
 
 
 Yes.
 
 Look for example at tcp_gso_segment() (called from skb_gso_segment())
 
 copy_destructor = gso_skb->destructor == tcp_wfree;
 ...
 /* Following permits TCP Small Queues to work well with GSO :
  * The callback to TCP stack will be called at the time last frag
  * is freed at TX completion, and not right now when gso_skb
  * is freed by GSO engine
  */
 if (copy_destructor) {
 swap(gso_skb->sk, skb->sk);
 swap(gso_skb->destructor, skb->destructor);
 sum_truesize += skb->truesize;
 atomic_add(sum_truesize - gso_skb->truesize,
&skb->sk->sk_wmem_alloc);
 }
 
 

 I checked that code using iperf and saw that I don't get into this if,
 but I (probably wrongly) assumed that other applications would set a
 flag on the socket (forgive my ignorance) that would make this if be taken.
 
 If you do not see skb->destructor == tcp_wfree, then something is
 definitely wrong on your setup.
 

tcp_wfree isn't exported. I can change that. It will be a challenge for
backport though. Hm

[PATCH 12/15] netfilter: nf_conntrack: push zone object into functions

2015-08-20 Thread Pablo Neira Ayuso

From: Daniel Borkmann dan...@iogearbox.net

This patch replaces the zone id which is pushed down into functions
with the actual zone object. It's a bigger one-time change, but
needed for later on extending zones with a direction parameter, and
thus decoupling this additional information from all call-sites.

No functional changes in this patch.

The default zone becomes a global const object, nf_ct_zone_dflt,
which is returned directly in various cases, for example when there
is no zoning support.

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nf_conntrack.h   |   10 ++-
 include/net/netfilter/nf_conntrack_core.h  |3 +-
 include/net/netfilter/nf_conntrack_expect.h|   11 +++-
 include/net/netfilter/nf_conntrack_zones.h |   33 +++---
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |3 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c|   11 ++--
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |3 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |   12 ++--
 net/netfilter/ipvs/ip_vs_nfct.c|2 +-
 net/netfilter/nf_conntrack_core.c  |   75 +
 net/netfilter/nf_conntrack_expect.c|   21 +++---
 net/netfilter/nf_conntrack_netlink.c   |   84 +---
 net/netfilter/nf_conntrack_pptp.c  |3 +-
 net/netfilter/nf_conntrack_standalone.c|   17 +++--
 net/netfilter/nf_nat_core.c|   19 --
 net/netfilter/nf_synproxy_core.c   |4 +-
 net/netfilter/xt_CT.c  |6 +-
 net/netfilter/xt_connlimit.c   |9 +--
 net/sched/act_connmark.c   |5 +-
 21 files changed, 203 insertions(+), 132 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 37cd391..f5e23c6 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -250,8 +250,12 @@ void nf_ct_untracked_status_or(unsigned long bits);
 void nf_ct_iterate_cleanup(struct net *net,
   int (*iter)(struct nf_conn *i, void *data),
   void *data, u32 portid, int report);
+
+struct nf_conntrack_zone;
+
 void nf_conntrack_free(struct nf_conn *ct);
-struct nf_conn *nf_conntrack_alloc(struct net *net, u16 zone,
+struct nf_conn *nf_conntrack_alloc(struct net *net,
+  const struct nf_conntrack_zone *zone,
   const struct nf_conntrack_tuple *orig,
   const struct nf_conntrack_tuple *repl,
   gfp_t gfp);
@@ -291,7 +295,9 @@ extern unsigned int nf_conntrack_max;
 extern unsigned int nf_conntrack_hash_rnd;
 void init_nf_conntrack_hash_rnd(void);
 
-struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags);
+struct nf_conn *nf_ct_tmpl_alloc(struct net *net,
+const struct nf_conntrack_zone *zone,
+gfp_t flags);
 
 #define NF_CT_STAT_INC(net, count)   __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
diff --git a/include/net/netfilter/nf_conntrack_core.h 
b/include/net/netfilter/nf_conntrack_core.h
index f2f0fa3..c03f9c4 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -52,7 +52,8 @@ bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
 
 /* Find a connection corresponding to a tuple. */
 struct nf_conntrack_tuple_hash *
-nf_conntrack_find_get(struct net *net, u16 zone,
+nf_conntrack_find_get(struct net *net,
+ const struct nf_conntrack_zone *zone,
  const struct nf_conntrack_tuple *tuple);
 
 int __nf_conntrack_confirm(struct sk_buff *skb);
diff --git a/include/net/netfilter/nf_conntrack_expect.h 
b/include/net/netfilter/nf_conntrack_expect.h
index 3f3aecb..dce56f0 100644
--- a/include/net/netfilter/nf_conntrack_expect.h
+++ b/include/net/netfilter/nf_conntrack_expect.h
@@ -4,7 +4,9 @@
 
 #ifndef _NF_CONNTRACK_EXPECT_H
 #define _NF_CONNTRACK_EXPECT_H
+
 #include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_zones.h>
 
 extern unsigned int nf_ct_expect_hsize;
 extern unsigned int nf_ct_expect_max;
@@ -76,15 +78,18 @@ int nf_conntrack_expect_init(void);
 void nf_conntrack_expect_fini(void);
 
 struct nf_conntrack_expect *
-__nf_ct_expect_find(struct net *net, u16 zone,
+__nf_ct_expect_find(struct net *net,
+   const struct nf_conntrack_zone *zone,
const struct nf_conntrack_tuple *tuple);
 
 struct nf_conntrack_expect *
-nf_ct_expect_find_get(struct net *net, u16 zone,

[PATCH 07/15] netfilter: nft_limit: factor out shared code with per-byte limiting

2015-08-20 Thread Pablo Neira Ayuso

This patch prepares the introduction of per-byte limiting.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c |   86 -
 1 file changed, 53 insertions(+), 33 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index c79703e..c4d1b1b 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -27,65 +27,54 @@ struct nft_limit {
u64 nsecs;
 };
 
-static void nft_limit_pkts_eval(const struct nft_expr *expr,
-   struct nft_regs *regs,
-   const struct nft_pktinfo *pkt)
+static inline bool nft_limit_eval(struct nft_limit *limit, u64 cost)
 {
-   struct nft_limit *priv = nft_expr_priv(expr);
-   u64 now, tokens, cost = div_u64(priv->nsecs, priv->rate);
+   u64 now, tokens;
s64 delta;
 
spin_lock_bh(&limit_lock);
now = ktime_get_ns();
-   tokens = priv->tokens + now - priv->last;
-   if (tokens > priv->tokens_max)
-   tokens = priv->tokens_max;
+   tokens = limit->tokens + now - limit->last;
+   if (tokens > limit->tokens_max)
+   tokens = limit->tokens_max;
 
-   priv->last = now;
+   limit->last = now;
delta = tokens - cost;
if (delta >= 0) {
-   priv->tokens = delta;
+   limit->tokens = delta;
spin_unlock_bh(&limit_lock);
-   return;
+   return false;
}
-   priv->tokens = tokens;
+   limit->tokens = tokens;
spin_unlock_bh(&limit_lock);
-
-   regs->verdict.code = NFT_BREAK;
+   return true;
 }
 
-static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
-   [NFTA_LIMIT_RATE]   = { .type = NLA_U64 },
-   [NFTA_LIMIT_UNIT]   = { .type = NLA_U64 },
-};
-
-static int nft_limit_init(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
+static int nft_limit_init(struct nft_limit *limit,
  const struct nlattr * const tb[])
 {
-   struct nft_limit *priv = nft_expr_priv(expr);
u64 unit;
 
if (tb[NFTA_LIMIT_RATE] == NULL ||
tb[NFTA_LIMIT_UNIT] == NULL)
return -EINVAL;
 
-   priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
+   limit->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
-   priv->nsecs = unit * NSEC_PER_SEC;
-   if (priv->rate == 0 || priv->nsecs < unit)
+   limit->nsecs = unit * NSEC_PER_SEC;
+   if (limit->rate == 0 || limit->nsecs < unit)
return -EOVERFLOW;
-   priv->tokens = priv->tokens_max = priv->nsecs;
-   priv->last = ktime_get_ns();
+   limit->tokens = limit->tokens_max = limit->nsecs;
+   limit->last = ktime_get_ns();
+
return 0;
 }
 
-static int nft_limit_dump(struct sk_buff *skb, const struct nft_expr *expr)
+static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit)
 {
-   const struct nft_limit *priv = nft_expr_priv(expr);
-   u64 secs = div_u64(priv->nsecs, NSEC_PER_SEC);
+   u64 secs = div_u64(limit->nsecs, NSEC_PER_SEC);
 
-   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)) ||
+   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(limit->rate)) ||
nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)))
goto nla_put_failure;
return 0;
@@ -94,13 +83,44 @@ nla_put_failure:
return -1;
 }
 
+static void nft_limit_pkts_eval(const struct nft_expr *expr,
+   struct nft_regs *regs,
+   const struct nft_pktinfo *pkt)
+{
+   struct nft_limit *priv = nft_expr_priv(expr);
+
+   if (nft_limit_eval(priv, div_u64(priv->nsecs, priv->rate)))
+   regs->verdict.code = NFT_BREAK;
+}
+
+static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
+   [NFTA_LIMIT_RATE]   = { .type = NLA_U64 },
+   [NFTA_LIMIT_UNIT]   = { .type = NLA_U64 },
+};
+
+static int nft_limit_pkts_init(const struct nft_ctx *ctx,
+  const struct nft_expr *expr,
+  const struct nlattr * const tb[])
+{
+   struct nft_limit *priv = nft_expr_priv(expr);
+
+   return nft_limit_init(priv, tb);
+}
+
+static int nft_limit_pkts_dump(struct sk_buff *skb, const struct nft_expr *expr)
+{
+   const struct nft_limit *priv = nft_expr_priv(expr);
+
+   return nft_limit_dump(skb, priv);
+}
+
 static struct nft_expr_type nft_limit_type;
 static const struct nft_expr_ops nft_limit_pkts_ops = {
.type   = &nft_limit_type,
.size   = NFT_EXPR_SIZE(sizeof(struct nft_limit)),
.eval   = nft_limit_pkts_eval,
-   .init   = nft_limit_init,
-   .dump   = nft_limit_dump,
+   .init   = nft_limit_pkts_init,
+

[PATCH 10/15] netfilter: nft_limit: add per-byte limiting

2015-08-20 Thread Pablo Neira Ayuso

This patch adds a new NFTA_LIMIT_TYPE netlink attribute to indicate the type of
limiting.

Contrary to per-packet limiting, the cost is calculated from the packet path
since this depends on the packet length.

The burst attribute indicates the number of bytes by which the rate can be
exceeded.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/uapi/linux/netfilter/nf_tables.h |7 
 net/netfilter/nft_limit.c|   63 --
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h 
b/include/uapi/linux/netfilter/nf_tables.h
index cafd789..d8c8a7c 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -756,18 +756,25 @@ enum nft_ct_attributes {
 };
 #define NFTA_CT_MAX(__NFTA_CT_MAX - 1)
 
+enum nft_limit_type {
+   NFT_LIMIT_PKTS,
+   NFT_LIMIT_PKT_BYTES
+};
+
 /**
  * enum nft_limit_attributes - nf_tables limit expression netlink attributes
  *
  * @NFTA_LIMIT_RATE: refill rate (NLA_U64)
  * @NFTA_LIMIT_UNIT: refill unit (NLA_U64)
  * @NFTA_LIMIT_BURST: burst (NLA_U32)
+ * @NFTA_LIMIT_TYPE: type of limit (NLA_U32: enum nft_limit_type)
  */
 enum nft_limit_attributes {
NFTA_LIMIT_UNSPEC,
NFTA_LIMIT_RATE,
NFTA_LIMIT_UNIT,
NFTA_LIMIT_BURST,
+   NFTA_LIMIT_TYPE,
__NFTA_LIMIT_MAX
 };
 #define NFTA_LIMIT_MAX (__NFTA_LIMIT_MAX - 1)
diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index b418698..5d67938 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -83,14 +83,16 @@ static int nft_limit_init(struct nft_limit *limit,
return 0;
 }
 
-static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit)
+static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit,
+ enum nft_limit_type type)
 {
u64 secs = div_u64(limit->nsecs, NSEC_PER_SEC);
u64 rate = limit->rate - limit->burst;
 
if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(rate)) ||
nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)) ||
-   nla_put_be32(skb, NFTA_LIMIT_BURST, htonl(limit->burst)))
+   nla_put_be32(skb, NFTA_LIMIT_BURST, htonl(limit->burst)) ||
+   nla_put_be32(skb, NFTA_LIMIT_TYPE, htonl(type)))
goto nla_put_failure;
return 0;
 
@@ -117,6 +119,7 @@ static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
[NFTA_LIMIT_RATE]   = { .type = NLA_U64 },
[NFTA_LIMIT_UNIT]   = { .type = NLA_U64 },
[NFTA_LIMIT_BURST]  = { .type = NLA_U32 },
+   [NFTA_LIMIT_TYPE]   = { .type = NLA_U32 },
 };
 
 static int nft_limit_pkts_init(const struct nft_ctx *ctx,
@@ -138,7 +141,7 @@ static int nft_limit_pkts_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
const struct nft_limit_pkts *priv = nft_expr_priv(expr);
 
-   return nft_limit_dump(skb, &priv->limit);
+   return nft_limit_dump(skb, &priv->limit, NFT_LIMIT_PKTS);
 }
 
 static struct nft_expr_type nft_limit_type;
@@ -150,9 +153,61 @@ static const struct nft_expr_ops nft_limit_pkts_ops = {
.dump   = nft_limit_pkts_dump,
 };
 
+static void nft_limit_pkt_bytes_eval(const struct nft_expr *expr,
+struct nft_regs *regs,
+const struct nft_pktinfo *pkt)
+{
+   struct nft_limit *priv = nft_expr_priv(expr);
+   u64 cost = div_u64(priv->nsecs * pkt->skb->len, priv->rate);
+
+   if (nft_limit_eval(priv, cost))
+   regs->verdict.code = NFT_BREAK;
+}
+
+static int nft_limit_pkt_bytes_init(const struct nft_ctx *ctx,
+   const struct nft_expr *expr,
+   const struct nlattr * const tb[])
+{
+   struct nft_limit *priv = nft_expr_priv(expr);
+
+   return nft_limit_init(priv, tb);
+}
+
+static int nft_limit_pkt_bytes_dump(struct sk_buff *skb,
+   const struct nft_expr *expr)
+{
+   const struct nft_limit *priv = nft_expr_priv(expr);
+
+   return nft_limit_dump(skb, priv, NFT_LIMIT_PKT_BYTES);
+}
+
+static const struct nft_expr_ops nft_limit_pkt_bytes_ops = {
+   .type   = &nft_limit_type,
+   .size   = NFT_EXPR_SIZE(sizeof(struct nft_limit)),
+   .eval   = nft_limit_pkt_bytes_eval,
+   .init   = nft_limit_pkt_bytes_init,
+   .dump   = nft_limit_pkt_bytes_dump,
+};
+
+static const struct nft_expr_ops *
+nft_limit_select_ops(const struct nft_ctx *ctx,
+const struct nlattr * const tb[])
+{
+   if (tb[NFTA_LIMIT_TYPE] == NULL)
+   return &nft_limit_pkts_ops;
+
+   switch (ntohl(nla_get_be32(tb[NFTA_LIMIT_TYPE]))) {
+   case NFT_LIMIT_PKTS:
+   return &nft_limit_pkts_ops;
+   case NFT_LIMIT_PKT_BYTES:
+
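
The per-byte cost described in this patch (cost computed from the packet path because it depends on the packet length) can be sketched in userspace as below. This is only an illustration: byte_cost_ns() and its parameter names are hypothetical, and the sketch assumes the product unit_nsecs * len fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper, not kernel code: cost of one packet in nanosecond
 * tokens when the limit is "rate" bytes per unit of "unit_nsecs" ns.
 * Mirrors the expression nsecs * len / rate used by the bytes eval path.
 * Assumption: unit_nsecs * len does not overflow a 64-bit integer.
 */
static uint64_t byte_cost_ns(uint64_t unit_nsecs, uint64_t rate, uint32_t len)
{
	assert(rate != 0);	/* a zero rate is rejected at init time */
	return unit_nsecs * len / rate;
}
```

With a limit of 1000000 bytes per second, a 1500-byte packet would consume 1500000 ns worth of tokens, so larger packets drain the bucket proportionally faster than small ones.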

[PATCH 11/15] netfilter: nfacct: per network namespace support

2015-08-20 Thread Pablo Neira Ayuso

From: Andreas Schultz aschu...@tpip.net

- Move the nfnl_acct_list into the network namespace, initialize
  and destroy it per namespace
- Keep track of refcnt on nfacct objects, the old logic no
  longer works with a per namespace list
- Adjust xt_nfacct to pass the namespace when registering objects

Signed-off-by: Andreas Schultz aschu...@tpip.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/linux/netfilter/nfnetlink_acct.h |3 +-
 include/net/net_namespace.h  |3 ++
 net/netfilter/nfnetlink_acct.c   |   71 +-
 net/netfilter/xt_nfacct.c|2 +-
 4 files changed, 56 insertions(+), 23 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink_acct.h 
b/include/linux/netfilter/nfnetlink_acct.h
index 6ec9757..80ca889 100644
--- a/include/linux/netfilter/nfnetlink_acct.h
+++ b/include/linux/netfilter/nfnetlink_acct.h
@@ -2,6 +2,7 @@
 #define _NFNL_ACCT_H_
 
 #include <uapi/linux/netfilter/nfnetlink_acct.h>
+#include <net/net_namespace.h>
 
 enum {
NFACCT_NO_QUOTA = -1,
@@ -11,7 +12,7 @@ enum {
 
 struct nf_acct;
 
-struct nf_acct *nfnl_acct_find_get(const char *filter_name);
+struct nf_acct *nfnl_acct_find_get(struct net *net, const char *filter_name);
 void nfnl_acct_put(struct nf_acct *acct);
 void nfnl_acct_update(const struct sk_buff *skb, struct nf_acct *nfacct);
 extern int nfnl_acct_overquota(const struct sk_buff *skb,
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index e951453..2dcea63 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -118,6 +118,9 @@ struct net {
 #endif
struct sock *nfnl;
struct sock *nfnl_stash;
+#if IS_ENABLED(CONFIG_NETFILTER_NETLINK_ACCT)
+   struct list_head nfnl_acct_list;
+#endif
 #endif
 #ifdef CONFIG_WEXT_CORE
struct sk_buff_head wext_nlevents;
diff --git a/net/netfilter/nfnetlink_acct.c b/net/netfilter/nfnetlink_acct.c
index c18af2f..fefbf5f 100644
--- a/net/netfilter/nfnetlink_acct.c
+++ b/net/netfilter/nfnetlink_acct.c
@@ -27,8 +27,6 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Pablo Neira Ayuso <pa...@netfilter.org>");
 MODULE_DESCRIPTION("nfacct: Extended Netfilter accounting infrastructure");
 
-static LIST_HEAD(nfnl_acct_list);
-
 struct nf_acct {
atomic64_t  pkts;
atomic64_t  bytes;
@@ -53,6 +51,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb,
 const struct nlmsghdr *nlh, const struct nlattr * const tb[])
 {
struct nf_acct *nfacct, *matching = NULL;
+   struct net *net = sock_net(nfnl);
char *acct_name;
unsigned int size = 0;
u32 flags = 0;
@@ -64,7 +63,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb,
if (strlen(acct_name) == 0)
return -EINVAL;
 
-   list_for_each_entry(nfacct, &nfnl_acct_list, head) {
+   list_for_each_entry(nfacct, &net->nfnl_acct_list, head) {
if (strncmp(nfacct->name, acct_name, NFACCT_NAME_MAX) != 0)
continue;
 
@@ -124,7 +123,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb,
 be64_to_cpu(nla_get_be64(tb[NFACCT_PKTS])));
}
atomic_set(&nfacct->refcnt, 1);
-   list_add_tail_rcu(&nfacct->head, &nfnl_acct_list);
+   list_add_tail_rcu(&nfacct->head, &net->nfnl_acct_list);
return 0;
 }
 
@@ -185,6 +184,7 @@ nla_put_failure:
 static int
 nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   struct net *net = sock_net(skb->sk);
struct nf_acct *cur, *last;
const struct nfacct_filter *filter = cb->data;
 
@@ -196,7 +196,7 @@ nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb)
cb->args[1] = 0;
 
rcu_read_lock();
-   list_for_each_entry_rcu(cur, &nfnl_acct_list, head) {
+   list_for_each_entry_rcu(cur, &net->nfnl_acct_list, head) {
if (last) {
if (cur != last)
continue;
@@ -257,6 +257,7 @@ static int
 nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb,
 const struct nlmsghdr *nlh, const struct nlattr * const tb[])
 {
+   struct net *net = sock_net(nfnl);
int ret = -ENOENT;
struct nf_acct *cur;
char *acct_name;
@@ -283,7 +284,7 @@ nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb,
return -EINVAL;
acct_name = nla_data(tb[NFACCT_NAME]);
 
-   list_for_each_entry(cur, &nfnl_acct_list, head) {
+   list_for_each_entry(cur, &net->nfnl_acct_list, head) {
struct sk_buff *skb2;
 
if (strncmp(cur->name, acct_name, NFACCT_NAME_MAX)!= 0)
@@ -336,19 +337,20 @@ static int
 nfnl_acct_del(struct sock *nfnl, struct sk_buff *skb,
 const struct nlmsghdr *nlh, const struct nlattr * const tb[])
 {
+   struct net *net = sock_net(nfnl);
char *acct_name;

[PATCH 05/15] netfilter: nft_limit: rename to nft_limit_pkts

2015-08-20 Thread Pablo Neira Ayuso

To prepare for the introduction of bytes ratelimit support.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index 435c1cc..d0788e1 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -26,9 +26,9 @@ struct nft_limit {
unsigned long   stamp;
 };
 
-static void nft_limit_eval(const struct nft_expr *expr,
-  struct nft_regs *regs,
-  const struct nft_pktinfo *pkt)
+static void nft_limit_pkts_eval(const struct nft_expr *expr,
+   struct nft_regs *regs,
+   const struct nft_pktinfo *pkt)
 {
struct nft_limit *priv = nft_expr_priv(expr);
 
@@ -85,17 +85,17 @@ nla_put_failure:
 }
 
 static struct nft_expr_type nft_limit_type;
-static const struct nft_expr_ops nft_limit_ops = {
+static const struct nft_expr_ops nft_limit_pkts_ops = {
.type   = &nft_limit_type,
.size   = NFT_EXPR_SIZE(sizeof(struct nft_limit)),
-   .eval   = nft_limit_eval,
+   .eval   = nft_limit_pkts_eval,
.init   = nft_limit_init,
.dump   = nft_limit_dump,
 };
 
 static struct nft_expr_type nft_limit_type __read_mostly = {
.name   = "limit",
-   .ops= &nft_limit_ops,
+   .ops= &nft_limit_pkts_ops,
.policy = nft_limit_policy,
.maxattr= NFTA_LIMIT_MAX,
.flags  = NFT_EXPR_STATEFUL,
-- 
1.7.10.4


[PATCH 15/15] netfilter: nft_payload: work around vlan header stripping

2015-08-20 Thread Pablo Neira Ayuso

From: Florian Westphal f...@strlen.de

Make the payload expression aware of the fact that VLAN offload may have
removed a vlan header.

When we encounter tagged skb, transparently insert the tag into the
register so that vlan header matching can work without userspace being
aware of offload features.

Signed-off-by: Florian Westphal f...@strlen.de
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_payload.c |   57 ++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 94fb3b2..09b4b07 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/kernel.h>
+#include <linux/if_vlan.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/netlink.h>
@@ -17,6 +18,53 @@
 #include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nf_tables.h>
 
+/* add vlan header into the user buffer for if tag was removed by offloads */
+static bool
+nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len)
+{
+   int mac_off = skb_mac_header(skb) - skb->data;
+   u8 vlan_len, *vlanh, *dst_u8 = (u8 *) d;
+   struct vlan_ethhdr veth;
+
+   vlanh = (u8 *) &veth;
+   if (offset < ETH_HLEN) {
+   u8 ethlen = min_t(u8, len, ETH_HLEN - offset);
+
+   if (skb_copy_bits(skb, mac_off, &veth, ETH_HLEN))
+   return false;
+
+   veth.h_vlan_proto = skb->vlan_proto;
+
+   memcpy(dst_u8, vlanh + offset, ethlen);
+
+   len -= ethlen;
+   if (len == 0)
+   return true;
+
+   dst_u8 += ethlen;
+   offset = ETH_HLEN;
+   } else if (offset >= VLAN_ETH_HLEN) {
+   offset -= VLAN_HLEN;
+   goto skip;
+   }
+
+   veth.h_vlan_TCI = htons(skb_vlan_tag_get(skb));
+   veth.h_vlan_encapsulated_proto = skb->protocol;
+
+   vlanh += offset;
+
+   vlan_len = min_t(u8, len, VLAN_ETH_HLEN - offset);
+   memcpy(dst_u8, vlanh, vlan_len);
+
+   len -= vlan_len;
+   if (!len)
+   return true;
+
+   dst_u8 += vlan_len;
+ skip:
+   return skb_copy_bits(skb, offset + mac_off, dst_u8, len) == 0;
+}
+
 static void nft_payload_eval(const struct nft_expr *expr,
 struct nft_regs *regs,
 const struct nft_pktinfo *pkt)
@@ -26,10 +74,18 @@ static void nft_payload_eval(const struct nft_expr *expr,
u32 *dest = &regs->data[priv->dreg];
int offset;
 
+   dest[priv->len / NFT_REG32_SIZE] = 0;
switch (priv->base) {
case NFT_PAYLOAD_LL_HEADER:
if (!skb_mac_header_was_set(skb))
goto err;
+
+   if (skb_vlan_tag_present(skb)) {
+   if (!nft_payload_copy_vlan(dest, skb,
+  priv->offset, priv->len))
+   goto err;
+   return;
+   }
offset = skb_mac_header(skb) - skb->data;
break;
case NFT_PAYLOAD_NETWORK_HEADER:
@@ -43,7 +99,6 @@ static void nft_payload_eval(const struct nft_expr *expr,
}
offset += priv->offset;
 
-   dest[priv->len / NFT_REG32_SIZE] = 0;
if (skb_copy_bits(skb, offset, dest, priv->len) < 0)
goto err;
return;
-- 
1.7.10.4
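
The idea of this patch, presenting a link-layer header to the matcher as if the offloaded VLAN tag were still in the frame, can be illustrated with a small userspace sketch. It is not the kernel implementation: copy_with_vlan() and its parameters are hypothetical, it works on a plain byte buffer instead of an skb, and it copies byte by byte for clarity where the patch uses memcpy() on a struct vlan_ethhdr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN   6	/* length of one MAC address */
#define ETH_HLEN  14	/* dst MAC + src MAC + ethertype */
#define VLAN_HLEN  4	/* TPID + TCI */

/*
 * Hypothetical helper: copy len bytes starting at offset from the frame as
 * it would look with the VLAN tag in place. "frame" holds the untagged
 * frame (tag already stripped by the NIC); tpid and tci are the stripped
 * tag fields in host byte order and are written out in network order.
 */
static bool copy_with_vlan(uint8_t *dst, const uint8_t *frame, size_t frame_len,
			   uint16_t tpid, uint16_t tci,
			   size_t offset, size_t len)
{
	size_t i;

	assert(dst != NULL && frame != NULL);
	if (frame_len < ETH_HLEN || offset + len > frame_len + VLAN_HLEN)
		return false;

	for (i = 0; i < len; i++) {
		size_t pos = offset + i;	/* position in the tagged view */

		if (pos < 2 * ETH_ALEN)
			dst[i] = frame[pos];		/* MAC addresses */
		else if (pos == 12)
			dst[i] = (uint8_t)(tpid >> 8);
		else if (pos == 13)
			dst[i] = (uint8_t)(tpid & 0xff);
		else if (pos == 14)
			dst[i] = (uint8_t)(tci >> 8);
		else if (pos == 15)
			dst[i] = (uint8_t)(tci & 0xff);
		else
			dst[i] = frame[pos - VLAN_HLEN];	/* ethertype, payload */
	}
	return true;
}
```

Reading 6 bytes at offset 12 from an untagged IPv4 frame with tpid 0x8100 and tci 100 would yield the tag followed by the original ethertype, which is what a rule matching on the vlan header expects to see.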


Re: RX packet loss on i.MX6Q running 4.2-rc7

2015-08-20 Thread Jon Nettleton

On Fri, Aug 21, 2015 at 12:30 AM, Clemens Gruber
clemens.gru...@pqgruber.com wrote:
 Hi,

 I am experiencing massive RX packet loss on my i.MX6Q (Chip rev 1.3) on Linux
 4.2-rc7 with a Marvell 88E1510 Gigabit Ethernet PHY connected over RGMII.
 I noticed it when doing an UDP benchmark with iperf3. When sending UDP packets
 from a Debian PC to the i.MX6 with a rate of 100 Mbit/s, 99% of the packets are
 lost. With a rate of 10 Mbit/s, we are still losing 93% of all packets. TCP RX
 does suffer from packet loss too, but still achieves about 211 Mbit/s.
 TX is not affected.

 Steps to reproduce:
 On the i.MX6: iperf3 -s
 On a desktop PC:  iperf3 -b 10M -u -c MX6IP

 The iperf3 results:
 [ ID] Interval   Transfer Bandwidth   JitterLost/Total
 [  4]   0.00-10.00  sec  11.8 MBytes  9.90 Mbits/sec  0.687 ms  1397/1497 (93%)

 During the 10 Mbit UDP test, the IEEE_rx_macerr counter increased to 5371.
 ifconfig eth0 shows:
  RX packets:9216 errors:5248 dropped:170 overruns:5248 frame:5248
  TX packets:83 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0

 Here are the TCP results with iperf3 -c MX6IP:
 [ ID] Interval   Transfer Bandwidth   Retr
 [  4]   0.00-10.00  sec   252 MBytes   211 Mbits/sec  4343 sender
 [  4]   0.00-10.00  sec   251 MBytes   211 Mbits/sec  receiver

 During the TCP test, IEEE_rx_macerr increased to 4059.
 ifconfig eth0 shows:
 RX packets:186368 errors:4206 dropped:50 overruns:4206 frame:4206
 TX packets:41861 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0

 Freescale errata entry ERR004512 did mention a RX FIFO overrun. Is this related?

 Forcing pause frames via ethtool -A eth0 rx on tx on, does not improve it:
 Same amount of UDP packet loss with reduced TCP throughput of 190 Mbit/s.
 IEEE_rx_macerr increased up to 5232 during UDP 10Mbit and up to 4270 for TCP.

 I am already using the MX6QDL_PAD_GPIO_6__ENET_IRQ workaround, which solved the
 ping latency issues from ERR006687 but not the packet loss problem.

 I read through the mailing list archives and found a discussion between Russell
 King, Marek Vasut, Eric Nelson, Fugang Duan and others about a similar problem.
 I therefore added you and contributors to fec_main.c to the CC.

 One suggestion I found, was adding udelay(210); to fec_enet_rx():
 https://lkml.org/lkml/2014/8/22/88
 But this also did not reduce the packet loss. (I added it to the fec_enet_rx
 function just before return pkt_received; but I still got 93% packet loss)

 Does anyone have the equipment/setup to trace an i.MX6Q during UDP RX traffic
 from iperf3 to find the root cause of this packet loss problem?

 What else could we do to fix this?


This is a bug in iperf3's UDP tests.  Do the same test with iperf2 and
you will see expected performance.  I believe there is a bug open in
github about it.

-Jon

Re: [PATCH net-next RFC 00/10] socket sendmsg MSG_ZEROCOPY

2015-08-20 Thread David Miller

From: Willem de Bruijn will...@google.com
Date: Thu, 20 Aug 2015 22:49:25 -0400

 But there may still be others. Most obvious use case for copy
 avoidance is pure device transmit. Excluding loopback may be
 a reasonable way to initially limit the attack surface. With a flag
 NETIF_F_ZC not supported on lo.

Good luck avoiding every case where a packet can be looped back into
the machine in some way.  A simple device flag is not going to do it.

[PATCH 08/15] netfilter: nft_limit: add burst parameter

2015-08-20 Thread Pablo Neira Ayuso

This patch adds the burst parameter. This burst indicates the number of packets
that can exceed the limit.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/uapi/linux/netfilter/nf_tables.h |2 ++
 net/netfilter/nft_limit.c|   20 ++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h 
b/include/uapi/linux/netfilter/nf_tables.h
index 2ef35f2..cafd789 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -761,11 +761,13 @@ enum nft_ct_attributes {
  *
  * @NFTA_LIMIT_RATE: refill rate (NLA_U64)
  * @NFTA_LIMIT_UNIT: refill unit (NLA_U64)
+ * @NFTA_LIMIT_BURST: burst (NLA_U32)
  */
 enum nft_limit_attributes {
NFTA_LIMIT_UNSPEC,
NFTA_LIMIT_RATE,
NFTA_LIMIT_UNIT,
+   NFTA_LIMIT_BURST,
__NFTA_LIMIT_MAX
 };
 #define NFTA_LIMIT_MAX (__NFTA_LIMIT_MAX - 1)
diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index c4d1b1b..d8c5ff1 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -25,6 +25,7 @@ struct nft_limit {
u64 tokens_max;
u64 rate;
u64 nsecs;
+   u32 burst;
 };
 
 static inline bool nft_limit_eval(struct nft_limit *limit, u64 cost)
@@ -65,6 +66,18 @@ static int nft_limit_init(struct nft_limit *limit,
if (limit->rate == 0 || limit->nsecs < unit)
return -EOVERFLOW;
limit->tokens = limit->tokens_max = limit->nsecs;
+
+   if (tb[NFTA_LIMIT_BURST]) {
+   u64 rate;
+
+   limit->burst = ntohl(nla_get_be32(tb[NFTA_LIMIT_BURST]));
+
+   rate = limit->rate + limit->burst;
+   if (rate < limit->rate)
+   return -EOVERFLOW;
+
+   limit->rate = rate;
+   }
limit->last = ktime_get_ns();
 
return 0;
@@ -73,9 +86,11 @@ static int nft_limit_init(struct nft_limit *limit,
 static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit)
 {
u64 secs = div_u64(limit->nsecs, NSEC_PER_SEC);
+   u64 rate = limit->rate - limit->burst;
 
-   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(limit->rate)) ||
-   nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)))
+   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(rate)) ||
+   nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)) ||
+   nla_put_be32(skb, NFTA_LIMIT_BURST, htonl(limit->burst)))
goto nla_put_failure;
return 0;
 
@@ -96,6 +111,7 @@ static void nft_limit_pkts_eval(const struct nft_expr *expr,
 static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
[NFTA_LIMIT_RATE]   = { .type = NLA_U64 },
[NFTA_LIMIT_UNIT]   = { .type = NLA_U64 },
+   [NFTA_LIMIT_BURST]  = { .type = NLA_U32 },
 };
 
 static int nft_limit_pkts_init(const struct nft_ctx *ctx,
-- 
1.7.10.4
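
The way this patch folds the burst into the stored rate, rejects wrap-around, and subtracts the burst again on dump can be sketched in userspace as follows. The helper names rate_add_burst() and rate_for_dump() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add the 32-bit burst to the 64-bit rate.
 * Unsigned addition wraps on overflow, so a sum smaller than the
 * original rate means the result did not fit.
 */
static int rate_add_burst(uint64_t *rate, uint32_t burst)
{
	uint64_t sum;

	assert(rate != NULL);
	sum = *rate + burst;
	if (sum < *rate)
		return -EOVERFLOW;

	*rate = sum;
	return 0;
}

/* Value reported back to userspace: the configured rate without the burst. */
static uint64_t rate_for_dump(uint64_t stored_rate, uint32_t burst)
{
	return stored_rate - burst;
}
```

Storing rate + burst and undoing the addition on dump keeps the evaluation path unchanged while userspace still reads back the rate it configured.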


[PATCH 03/15] netfilter: factor out packet duplication for IPv4/IPv6

2015-08-20 Thread Pablo Neira Ayuso

Extracted from the xtables TEE target. This creates two new modules for IPv4
and IPv6 that are shared between the TEE target and the new nf_tables dup
expressions.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/ipv4/nf_dup_ipv4.h |7 ++
 include/net/netfilter/ipv6/nf_dup_ipv6.h |7 ++
 net/ipv4/netfilter/Kconfig   |6 ++
 net/ipv4/netfilter/Makefile  |2 +
 net/ipv4/netfilter/nf_dup_ipv4.c |  120 +++
 net/ipv6/netfilter/Kconfig   |6 ++
 net/ipv6/netfilter/Makefile  |2 +
 net/ipv6/netfilter/nf_dup_ipv6.c |   96 ++
 net/netfilter/Kconfig|2 +
 net/netfilter/xt_TEE.c   |  158 ++
 10 files changed, 254 insertions(+), 152 deletions(-)
 create mode 100644 include/net/netfilter/ipv4/nf_dup_ipv4.h
 create mode 100644 include/net/netfilter/ipv6/nf_dup_ipv6.h
 create mode 100644 net/ipv4/netfilter/nf_dup_ipv4.c
 create mode 100644 net/ipv6/netfilter/nf_dup_ipv6.c

diff --git a/include/net/netfilter/ipv4/nf_dup_ipv4.h 
b/include/net/netfilter/ipv4/nf_dup_ipv4.h
new file mode 100644
index 000..42008f1
--- /dev/null
+++ b/include/net/netfilter/ipv4/nf_dup_ipv4.h
@@ -0,0 +1,7 @@
+#ifndef _NF_DUP_IPV4_H_
+#define _NF_DUP_IPV4_H_
+
+void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
+const struct in_addr *gw, int oif);
+
+#endif /* _NF_DUP_IPV4_H_ */
diff --git a/include/net/netfilter/ipv6/nf_dup_ipv6.h 
b/include/net/netfilter/ipv6/nf_dup_ipv6.h
new file mode 100644
index 000..ed6bd66
--- /dev/null
+++ b/include/net/netfilter/ipv6/nf_dup_ipv6.h
@@ -0,0 +1,7 @@
+#ifndef _NF_DUP_IPV6_H_
+#define _NF_DUP_IPV6_H_
+
+void nf_dup_ipv6(struct sk_buff *skb, unsigned int hooknum,
+const struct in6_addr *gw, int oif);
+
+#endif /* _NF_DUP_IPV6_H_ */
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 2199a5d..0142ea2 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -67,6 +67,12 @@ config NF_TABLES_ARP
 
 endif # NF_TABLES
 
+config NF_DUP_IPV4
+   tristate "Netfilter IPv4 packet duplication to alternate destination"
+   help
+ This option enables the nf_dup_ipv4 core, which duplicates an IPv4
+ packet to be rerouted to another destination.
+
 config NF_LOG_ARP
tristate "ARP packet logging"
default m if NETFILTER_ADVANCED=n
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 7fe6c70..9136ffc 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -70,3 +70,5 @@ obj-$(CONFIG_IP_NF_ARP_MANGLE) += arpt_mangle.o
 
 # just filtering instance of ARP tables for now
 obj-$(CONFIG_IP_NF_ARPFILTER) += arptable_filter.o
+
+obj-$(CONFIG_NF_DUP_IPV4) += nf_dup_ipv4.o
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
new file mode 100644
index 000..eff85ab
--- /dev/null
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -0,0 +1,120 @@
+/*
+ * (C) 2007 by Sebastian Claßen <sebastian.clas...@freenet.ag>
+ * (C) 2007-2010 by Jan Engelhardt <jeng...@medozas.de>
+ *
+ * Extracted from xt_TEE.c
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 or later, as
+ * published by the Free Software Foundation.
+ */
+#include <linux/ip.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/route.h>
+#include <linux/skbuff.h>
+#include <net/checksum.h>
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/route.h>
+#include <net/netfilter/ipv4/nf_dup_ipv4.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+
+static struct net *pick_net(struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_NS
+   const struct dst_entry *dst;
+
+   if (skb->dev != NULL)
+   return dev_net(skb->dev);
+   dst = skb_dst(skb);
+   if (dst != NULL && dst->dev != NULL)
+   return dev_net(dst->dev);
+#endif
+   return &init_net;
+}
+
+static bool nf_dup_ipv4_route(struct sk_buff *skb, const struct in_addr *gw,
+ int oif)
+{
+   const struct iphdr *iph = ip_hdr(skb);
+   struct net *net = pick_net(skb);
+   struct rtable *rt;
+   struct flowi4 fl4;
+
+   memset(&fl4, 0, sizeof(fl4));
+   if (oif != -1)
+   fl4.flowi4_oif = oif;
+
+   fl4.daddr = gw->s_addr;
+   fl4.flowi4_tos = RT_TOS(iph->tos);
+   fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
+   fl4.flowi4_flags = FLOWI_FLAG_KNOWN_NH;
+   rt = ip_route_output_key(net, &fl4);
+   if (IS_ERR(rt))
+   return false;
+
+   skb_dst_drop(skb);
+   skb_dst_set(skb, &rt->dst);
+   skb->dev  = rt->dst.dev;
+   skb->protocol = htons(ETH_P_IP);
+
+   return true;
+}
+
+void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
+const struct in_addr

[PATCH 14/15] netfilter: nf_conntrack: add efficient mark to zone mapping

2015-08-20 Thread Pablo Neira Ayuso

From: Daniel Borkmann dan...@iogearbox.net

This work adds the possibility of deriving the zone id from the skb->mark
field in a scalable manner. This allows for having only a single template
serving hundreds/thousands of different zones, for example, instead of the
need to have one match for each zone as an extra CT jump target.

Note that we'd need to have this information attached to the template as at
the time when we're trying to lookup a possible ct object, we already need
to know zone information for a possible match when going into
__nf_conntrack_find_get(). This work provides a minimal implementation for
a possible mapping.

In order to not add/expose an extra ct->status bit, the zone structure has
been extended to carry a flag for deriving the mark.

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nf_conntrack_zones.h |   45 +++--
 include/uapi/linux/netfilter/xt_CT.h   |4 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |3 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |4 +-
 net/netfilter/nf_conntrack_core.c  |   50 
 net/netfilter/nf_conntrack_netlink.c   |5 +--
 net/netfilter/xt_CT.c  |5 ++-
 7 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_zones.h 
b/include/net/netfilter/nf_conntrack_zones.h
index 3942ddf..5316c7b 100644
--- a/include/net/netfilter/nf_conntrack_zones.h
+++ b/include/net/netfilter/nf_conntrack_zones.h
@@ -10,9 +10,12 @@
 
 #define NF_CT_DEFAULT_ZONE_DIR (NF_CT_ZONE_DIR_ORIG | NF_CT_ZONE_DIR_REPL)
 
+#define NF_CT_FLAG_MARK 1
+
 struct nf_conntrack_zone {
u16 id;
-   u16 dir;
+   u8  flags;
+   u8  dir;
 };
 
 extern const struct nf_conntrack_zone nf_ct_zone_dflt;
@@ -32,9 +35,45 @@ nf_ct_zone(const struct nf_conn *ct)
 }
 
 static inline const struct nf_conntrack_zone *
-nf_ct_zone_tmpl(const struct nf_conn *tmpl)
+nf_ct_zone_init(struct nf_conntrack_zone *zone, u16 id, u8 dir, u8 flags)
+{
+   zone->id = id;
+   zone->flags = flags;
+   zone->dir = dir;
+
+   return zone;
+}
+
+static inline const struct nf_conntrack_zone *
+nf_ct_zone_tmpl(const struct nf_conn *tmpl, const struct sk_buff *skb,
+   struct nf_conntrack_zone *tmp)
+{
+   const struct nf_conntrack_zone *zone;
+
+   if (!tmpl)
+   return &nf_ct_zone_dflt;
+
+   zone = nf_ct_zone(tmpl);
+   if (zone->flags & NF_CT_FLAG_MARK)
+   zone = nf_ct_zone_init(tmp, skb->mark, zone->dir, 0);
+
+   return zone;
+}
+
+static inline int nf_ct_zone_add(struct nf_conn *ct, gfp_t flags,
+const struct nf_conntrack_zone *info)
 {
-   return tmpl ? nf_ct_zone(tmpl) : &nf_ct_zone_dflt;
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+   struct nf_conntrack_zone *nf_ct_zone;
+
+   nf_ct_zone = nf_ct_ext_add(ct, NF_CT_EXT_ZONE, flags);
+   if (!nf_ct_zone)
+   return -ENOMEM;
+
+   nf_ct_zone_init(nf_ct_zone, info->id, info->dir,
+   info->flags);
+#endif
+   return 0;
 }
 
 static inline bool nf_ct_zone_matches_dir(const struct nf_conntrack_zone *zone,
diff --git a/include/uapi/linux/netfilter/xt_CT.h 
b/include/uapi/linux/netfilter/xt_CT.h
index 452005f..9e52041 100644
--- a/include/uapi/linux/netfilter/xt_CT.h
+++ b/include/uapi/linux/netfilter/xt_CT.h
@@ -8,9 +8,11 @@ enum {
XT_CT_NOTRACK_ALIAS = 1 << 1,
XT_CT_ZONE_DIR_ORIG = 1 << 2,
XT_CT_ZONE_DIR_REPL = 1 << 3,
+   XT_CT_ZONE_MARK = 1 << 4,
 
XT_CT_MASK  = XT_CT_NOTRACK | XT_CT_NOTRACK_ALIAS |
- XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL,
+ XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL |
+ XT_CT_ZONE_MARK,
 };
 
 struct xt_ct_target_info {
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c 
b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index 8a2f41c..cdde3ec 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -135,9 +135,10 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb,
const struct nf_conntrack_l4proto *innerproto;
const struct nf_conntrack_tuple_hash *h;
const struct nf_conntrack_zone *zone;
+   struct nf_conntrack_zone tmp;
 
NF_CT_ASSERT(skb->nfct == NULL);
-   zone = nf_ct_zone_tmpl(tmpl);
+   zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
 
/* Are they talking about one of our connections? */
if (!nf_ct_get_tuplepr(skb,
diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c 
b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
index 2029141..0e6fae1 100644
--- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
+++
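
The zone selection that this patch describes (one template serving many zones by taking the zone id from the packet mark) can be sketched in userspace as below. The struct, the constant names and zone_from_template() are hypothetical stand-ins; in particular the default direction value 3 is an assumption standing in for NF_CT_DEFAULT_ZONE_DIR, and the 32-bit mark is truncated to the 16-bit id as in the quoted nf_ct_zone_init() call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZONE_FLAG_MARK 1	/* stands in for NF_CT_FLAG_MARK */

/* Hypothetical userspace mirror of the zone layout shown in the patch. */
struct zone {
	uint16_t id;
	uint8_t flags;
	uint8_t dir;
};

/* Assumed default: zone 0, both directions. */
static const struct zone zone_default = { 0, 0, 3 };

/*
 * Pick the zone for a packet. Without a template the default zone is used.
 * If the template carries the mark flag, a temporary zone is filled in
 * with the id taken from the packet mark and the template's direction.
 */
static const struct zone *zone_from_template(const struct zone *tmpl,
					     uint32_t mark, struct zone *tmp)
{
	assert(tmp != NULL);
	if (tmpl == NULL)
		return &zone_default;

	if (tmpl->flags & ZONE_FLAG_MARK) {
		tmp->id = (uint16_t)mark;
		tmp->flags = 0;
		tmp->dir = tmpl->dir;
		return tmp;
	}
	return tmpl;
}
```

The caller owns the temporary zone, which is why the patch adds a struct nf_conntrack_zone tmp on the stack of each lookup path instead of allocating one.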

[PATCH 06/15] netfilter: nft_limit: convert to token-based limiting at nanosecond granularity

2015-08-20 Thread Pablo Neira Ayuso

Rework the limit expression to use a token-based limiting approach that refills
the bucket gradually. The tokens are calculated at nanosecond granularity
instead of jiffies to improve precision.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c |   42 ++
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index d0788e1..c79703e 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -20,10 +20,11 @@
 static DEFINE_SPINLOCK(limit_lock);
 
 struct nft_limit {
+   u64 last;
u64 tokens;
+   u64 tokens_max;
u64 rate;
-   u64 unit;
-   unsigned long   stamp;
+   u64 nsecs;
 };
 
 static void nft_limit_pkts_eval(const struct nft_expr *expr,
@@ -31,18 +32,23 @@ static void nft_limit_pkts_eval(const struct nft_expr *expr,
const struct nft_pktinfo *pkt)
 {
struct nft_limit *priv = nft_expr_priv(expr);
+   u64 now, tokens, cost = div_u64(priv->nsecs, priv->rate);
+   s64 delta;
 
spin_lock_bh(&limit_lock);
-   if (time_after_eq(jiffies, priv->stamp)) {
-   priv->tokens = priv->rate;
-   priv->stamp = jiffies + priv->unit * HZ;
-   }
-
-   if (priv->tokens >= 1) {
-   priv->tokens--;
+   now = ktime_get_ns();
+   tokens = priv->tokens + now - priv->last;
+   if (tokens > priv->tokens_max)
+   tokens = priv->tokens_max;
+
+   priv->last = now;
+   delta = tokens - cost;
+   if (delta >= 0) {
+   priv->tokens = delta;
spin_unlock_bh(&limit_lock);
return;
}
+   priv->tokens = tokens;
spin_unlock_bh(&limit_lock);
 
regs->verdict.code = NFT_BREAK;
@@ -58,25 +64,29 @@ static int nft_limit_init(const struct nft_ctx *ctx,
  const struct nlattr * const tb[])
 {
struct nft_limit *priv = nft_expr_priv(expr);
+   u64 unit;
 
if (tb[NFTA_LIMIT_RATE] == NULL ||
tb[NFTA_LIMIT_UNIT] == NULL)
return -EINVAL;
 
-   priv->rate   = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
-   priv->unit   = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
-   priv->stamp  = jiffies + priv->unit * HZ;
-   priv->tokens = priv->rate;
+   priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
+   unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
+   priv->nsecs = unit * NSEC_PER_SEC;
+   if (priv->rate == 0 || priv->nsecs < unit)
+   return -EOVERFLOW;
+   priv->tokens = priv->tokens_max = priv->nsecs;
+   priv->last = ktime_get_ns();
return 0;
 }
 
 static int nft_limit_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
const struct nft_limit *priv = nft_expr_priv(expr);
+   u64 secs = div_u64(priv->nsecs, NSEC_PER_SEC);
 
-   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)))
-   goto nla_put_failure;
-   if (nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(priv->unit)))
+   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)) ||
+   nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)))
goto nla_put_failure;
return 0;
 
-- 
1.7.10.4
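
For anyone who wants to play with the refill arithmetic outside the kernel,
here is a rough userspace sketch of the same idea. It is only an illustration
under assumptions: the names limit_state, limit_init and limit_allow are made
up for this example, the clock value is passed in by the caller instead of
coming from ktime_get_ns(), and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical userspace mirror of the fields the patch adds. */
struct limit_state {
	uint64_t last;		/* time of the previous evaluation, in ns */
	uint64_t tokens;	/* current budget, expressed in ns */
	uint64_t tokens_max;	/* budget cap, one full unit */
	uint64_t rate;		/* packets allowed per unit */
	uint64_t nsecs;		/* unit length in ns */
};

/* Returns 0 on success, -1 if rate is zero or the unit overflows. */
static int limit_init(struct limit_state *l, uint64_t rate,
		      uint64_t unit_secs, uint64_t now)
{
	assert(l != NULL);
	l->rate = rate;
	l->nsecs = unit_secs * NSEC_PER_SEC;
	if (l->rate == 0 || l->nsecs < unit_secs)
		return -1;
	l->tokens = l->tokens_max = l->nsecs;
	l->last = now;
	return 0;
}

/* Returns true if a packet seen at time 'now' fits in the budget. */
static bool limit_allow(struct limit_state *l, uint64_t now)
{
	uint64_t cost = l->nsecs / l->rate;	/* ns of budget per packet */
	uint64_t tokens = l->tokens + now - l->last;	/* refill by elapsed time */
	int64_t delta;

	if (tokens > l->tokens_max)
		tokens = l->tokens_max;

	l->last = now;
	delta = (int64_t)(tokens - cost);
	if (delta >= 0) {
		l->tokens = (uint64_t)delta;
		return true;
	}
	l->tokens = tokens;
	return false;
}
```

With rate 2 and a one second unit, each packet costs 500000000 ns of budget,
so two packets pass back to back and the third has to wait for the refill.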


[PATCH 04/15] netfilter: nf_tables: add nft_dup expression

2015-08-20 Thread Pablo Neira Ayuso

This new expression uses the nf_dup engine to clone packets to a given gateway.
Unlike xt_TEE, we use an index to indicate the output interface, which should
be fine at this stage.

Moreover, change to the preemption-safe this_cpu_read(nf_skb_duplicated) from
nf_dup_ipv{4,6} to silence a lockdep splat.

Based on the original tee expression from Arturo Borrero Gonzalez, although
this patch has diverged quite a bit from this initial effort due to the
change to support maps.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nft_dup.h  |9 +++
 include/uapi/linux/netfilter/nf_tables.h |   14 
 net/ipv4/netfilter/Kconfig   |6 ++
 net/ipv4/netfilter/Makefile  |1 +
 net/ipv4/netfilter/nf_dup_ipv4.c |2 +-
 net/ipv4/netfilter/nft_dup_ipv4.c|  110 ++
 net/ipv6/netfilter/Kconfig   |6 ++
 net/ipv6/netfilter/Makefile  |1 +
 net/ipv6/netfilter/nf_dup_ipv6.c |2 +-
 net/ipv6/netfilter/nft_dup_ipv6.c|  108 +
 10 files changed, 257 insertions(+), 2 deletions(-)
 create mode 100644 include/net/netfilter/nft_dup.h
 create mode 100644 net/ipv4/netfilter/nft_dup_ipv4.c
 create mode 100644 net/ipv6/netfilter/nft_dup_ipv6.c

diff --git a/include/net/netfilter/nft_dup.h b/include/net/netfilter/nft_dup.h
new file mode 100644
index 000..6b84cf6
--- /dev/null
+++ b/include/net/netfilter/nft_dup.h
@@ -0,0 +1,9 @@
+#ifndef _NFT_DUP_H_
+#define _NFT_DUP_H_
+
+struct nft_dup_inet {
+   enum nft_registers  sreg_addr:8;
+   enum nft_registers  sreg_dev:8;
+};
+
+#endif /* _NFT_DUP_H_ */
diff --git a/include/uapi/linux/netfilter/nf_tables.h 
b/include/uapi/linux/netfilter/nf_tables.h
index a99e6a9..2ef35f2 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -936,6 +936,20 @@ enum nft_redir_attributes {
 #define NFTA_REDIR_MAX (__NFTA_REDIR_MAX - 1)
 
 /**
+ * enum nft_dup_attributes - nf_tables dup expression netlink attributes
+ *
+ * @NFTA_DUP_SREG_ADDR: source register of address (NLA_U32: nft_registers)
+ * @NFTA_DUP_SREG_DEV: source register of output interface (NLA_U32: 
nft_register)
+ */
+enum nft_dup_attributes {
+   NFTA_DUP_UNSPEC,
+   NFTA_DUP_SREG_ADDR,
+   NFTA_DUP_SREG_DEV,
+   __NFTA_DUP_MAX
+};
+#define NFTA_DUP_MAX   (__NFTA_DUP_MAX - 1)
+
+/**
  * enum nft_gen_attributes - nf_tables ruleset generation attributes
  *
  * @NFTA_GEN_ID: Ruleset generation ID (NLA_U32)
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 0142ea2..690d27d 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -58,6 +58,12 @@ config NFT_REJECT_IPV4
default NFT_REJECT
tristate
 
+config NFT_DUP_IPV4
+   tristate "IPv4 nf_tables packet duplication support"
+   select NF_DUP_IPV4
+   help
+ This module enables IPv4 packet duplication support for nf_tables.
+
 endif # NF_TABLES_IPV4
 
 config NF_TABLES_ARP
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 9136ffc..87b073d 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_NFT_CHAIN_NAT_IPV4) += nft_chain_nat_ipv4.o
 obj-$(CONFIG_NFT_REJECT_IPV4) += nft_reject_ipv4.o
 obj-$(CONFIG_NFT_MASQ_IPV4) += nft_masq_ipv4.o
 obj-$(CONFIG_NFT_REDIR_IPV4) += nft_redir_ipv4.o
+obj-$(CONFIG_NFT_DUP_IPV4) += nft_dup_ipv4.o
 obj-$(CONFIG_NF_TABLES_ARP) += nf_tables_arp.o
 
 # generic IP tables 
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
index eff85ab..b5bb375 100644
--- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -69,7 +69,7 @@ void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
 {
struct iphdr *iph;
 
-   if (__this_cpu_read(nf_skb_duplicated))
+   if (this_cpu_read(nf_skb_duplicated))
return;
/*
 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
diff --git a/net/ipv4/netfilter/nft_dup_ipv4.c 
b/net/ipv4/netfilter/nft_dup_ipv4.c
new file mode 100644
index 000..25419fb
--- /dev/null
+++ b/net/ipv4/netfilter/nft_dup_ipv4.c
@@ -0,0 +1,110 @@
+/*
+ * Copyright (c) 2015 Pablo Neira Ayuso pa...@netfilter.org
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/ipv4/nf_dup_ipv4.h>
+
+struct nft_dup_ipv4 {
+   enum nft_registers  sreg_addr:8;
+   enum nft_registers  sreg_dev:8;
+};
+
+static void nft_dup_ipv4_eval(const struct nft_expr

[PATCH 02/15] netfilter: xt_TEE: get rid of WITH_CONNTRACK definition

2015-08-20 Thread Pablo Neira Ayuso

Use IS_ENABLED(CONFIG_NF_CONNTRACK) instead.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/xt_TEE.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index c5d6556..0ed9fb6 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -24,10 +24,8 @@
 #include <net/route.h>
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter/xt_TEE.h>
-
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
-#  define WITH_CONNTRACK 1
-#  include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack.h>
 #endif
 
 struct xt_tee_priv {
@@ -99,7 +97,7 @@ tee_tg4(struct sk_buff *skb, const struct xt_action_param 
*par)
if (skb == NULL)
return XT_CONTINUE;
 
-#ifdef WITH_CONNTRACK
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
/* Avoid counting cloned packets towards the original connection. */
nf_conntrack_put(skb->nfct);
skb->nfct = &nf_ct_untracked_get()->ct_general;
@@ -175,7 +173,7 @@ tee_tg6(struct sk_buff *skb, const struct xt_action_param 
*par)
if (skb == NULL)
return XT_CONTINUE;
 
-#ifdef WITH_CONNTRACK
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
nf_conntrack_put(skb->nfct);
skb->nfct = &nf_ct_untracked_get()->ct_general;
skb->nfctinfo = IP_CT_NEW;
-- 
1.7.10.4
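
As background on why IS_ENABLED() can replace the private WITH_CONNTRACK
define: it evaluates to 1 both for built-in (=y) and modular (=m) options.
Below is a simplified imitation of the preprocessor trick, not the real macro
(that one lives in include/linux/kconfig.h and differs in detail). All DEMO_*
names and the CONFIG_DEMO_* symbols are invented for this sketch.

```c
#include <assert.h>

/* Pretend Kconfig output: one built-in option, one modular, one unset. */
#define CONFIG_DEMO_BUILTIN 1
#define CONFIG_DEMO_MODULAR_MODULE 1
/* CONFIG_DEMO_UNSET is deliberately not defined. */

/*
 * If 'x' is defined to 1, DEMO_PLACEHOLDER_1 expands to "0," and shifts a
 * literal 1 into the second argument slot. Otherwise the pasted name is
 * junk that stays in the first slot, and the second slot holds 0.
 */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND(ignored, val, ...) val
#define DEMO_DEFINED3(arg1_or_junk) DEMO_SECOND(arg1_or_junk 1, 0, 0)
#define DEMO_DEFINED2(val) DEMO_DEFINED3(DEMO_PLACEHOLDER_##val)
#define DEMO_DEFINED(x) DEMO_DEFINED2(x)

/* True for "=y" (option defined) or "=m" (option_MODULE defined). */
#define DEMO_IS_ENABLED(option) \
	(DEMO_DEFINED(option) || DEMO_DEFINED(option##_MODULE))

/* The result is a constant expression, so it is usable at compile time. */
static_assert(DEMO_IS_ENABLED(CONFIG_DEMO_UNSET) == 0,
	      "an unset option must read as disabled");

/* It also works in #if, which is how the patch above uses the real macro. */
static int demo_feature_compiled_in(void)
{
#if DEMO_IS_ENABLED(CONFIG_DEMO_BUILTIN)
	return 1;
#else
	return 0;
#endif
}
```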


[PATCH 01/15] netfilter: nft_counter: convert it to use per-cpu counters

2015-08-20 Thread Pablo Neira Ayuso

This patch converts the existing seqlock to per-cpu counters.

Suggested-by: Eric Dumazet eric.duma...@gmail.com
Suggested-by: Patrick McHardy ka...@trash.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_counter.c |   97 ++-
 1 file changed, 69 insertions(+), 28 deletions(-)

diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index 1759123..1067fb4 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -18,39 +18,59 @@
 #include <net/netfilter/nf_tables.h>
 
 struct nft_counter {
-   seqlock_t   lock;
u64 bytes;
u64 packets;
 };
 
+struct nft_counter_percpu {
+   struct nft_counter  counter;
+   struct u64_stats_sync   syncp;
+};
+
+struct nft_counter_percpu_priv {
+   struct nft_counter_percpu __percpu *counter;
+};
+
 static void nft_counter_eval(const struct nft_expr *expr,
 struct nft_regs *regs,
 const struct nft_pktinfo *pkt)
 {
-   struct nft_counter *priv = nft_expr_priv(expr);
-
-   write_seqlock_bh(&priv->lock);
-   priv->bytes += pkt->skb->len;
-   priv->packets++;
-   write_sequnlock_bh(&priv->lock);
+   struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+   struct nft_counter_percpu *this_cpu;
+
+   local_bh_disable();
+   this_cpu = this_cpu_ptr(priv->counter);
+   u64_stats_update_begin(&this_cpu->syncp);
+   this_cpu->counter.bytes += pkt->skb->len;
+   this_cpu->counter.packets++;
+   u64_stats_update_end(&this_cpu->syncp);
+   local_bh_enable();
 }
 
 static int nft_counter_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
-   struct nft_counter *priv = nft_expr_priv(expr);
+   struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+   struct nft_counter_percpu *cpu_stats;
+   struct nft_counter total;
+   u64 bytes, packets;
unsigned int seq;
-   u64 bytes;
-   u64 packets;
-
-   do {
-   seq = read_seqbegin(&priv->lock);
-   bytes   = priv->bytes;
-   packets = priv->packets;
-   } while (read_seqretry(&priv->lock, seq));
-
-   if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(bytes)))
-   goto nla_put_failure;
-   if (nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(packets)))
+   int cpu;
+
+   memset(&total, 0, sizeof(total));
+   for_each_possible_cpu(cpu) {
+   cpu_stats = per_cpu_ptr(priv->counter, cpu);
+   do {
+   seq = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
+   bytes   = cpu_stats->counter.bytes;
+   packets = cpu_stats->counter.packets;
+   } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
+
+   total.packets += packets;
+   total.bytes += bytes;
+   }
+
+   if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes)) ||
+   nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(total.packets)))
goto nla_put_failure;
return 0;
 
@@ -67,23 +87,44 @@ static int nft_counter_init(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nlattr * const tb[])
 {
-   struct nft_counter *priv = nft_expr_priv(expr);
+   struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+   struct nft_counter_percpu __percpu *cpu_stats;
+   struct nft_counter_percpu *this_cpu;
+
+   cpu_stats = netdev_alloc_pcpu_stats(struct nft_counter_percpu);
+   if (cpu_stats == NULL)
+   return ENOMEM;
+
+   preempt_disable();
+   this_cpu = this_cpu_ptr(cpu_stats);
+   if (tb[NFTA_COUNTER_PACKETS]) {
+   this_cpu->counter.packets =
+   be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS]));
+   }
+   if (tb[NFTA_COUNTER_BYTES]) {
+   this_cpu->counter.bytes =
+   be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES]));
+   }
+   preempt_enable();
+   priv->counter = cpu_stats;
+   return 0;
+}
 
-   if (tb[NFTA_COUNTER_PACKETS])
-   priv->packets = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS]));
-   if (tb[NFTA_COUNTER_BYTES])
-   priv->bytes = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES]));
+static void nft_counter_destroy(const struct nft_ctx *ctx,
+   const struct nft_expr *expr)
+{
+   struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
 
-   seqlock_init(&priv->lock);
-   return 0;
+   free_percpu(priv->counter);
 }
 
 static struct nft_expr_type nft_counter_type;
 static const struct nft_expr_ops nft_counter_ops = {
.type   = &nft_counter_type,
-   .size   = NFT_EXPR_SIZE(sizeof(struct nft_counter)),
+   .size
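
The general shape of the conversion (one counter slot per CPU, a sequence
count around each update, and a reader that sums all slots and retries a slot
whose sequence changed) can be sketched in userspace with C11 atomics. This is
an assumption-laden illustration, not the kernel's u64_stats_sync
implementation: struct slot, slot_account and slots_sum are invented names,
and a fixed array stands in for the per-cpu allocation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_SLOTS 4	/* stand-in for the number of possible CPUs */

struct slot {
	atomic_uint seq;		/* odd while an update is in flight */
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

struct totals {
	uint64_t bytes;
	uint64_t packets;
};

/* Writer side: only the owner of a slot ever updates that slot. */
static void slot_account(struct slot *s, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->bytes,
		atomic_load_explicit(&s->bytes, memory_order_relaxed) + len,
		memory_order_relaxed);
	atomic_store_explicit(&s->packets,
		atomic_load_explicit(&s->packets, memory_order_relaxed) + 1,
		memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry each slot until a consistent snapshot is seen. */
static struct totals slots_sum(struct slot *slots, int nr)
{
	struct totals total = { 0, 0 };
	int i;

	assert(nr > 0);
	for (i = 0; i < nr; i++) {
		unsigned int begin, end;
		uint64_t bytes, packets;

		do {
			begin = atomic_load_explicit(&slots[i].seq,
						     memory_order_acquire);
			bytes = atomic_load_explicit(&slots[i].bytes,
						     memory_order_relaxed);
			packets = atomic_load_explicit(&slots[i].packets,
						       memory_order_relaxed);
			atomic_thread_fence(memory_order_acquire);
			end = atomic_load_explicit(&slots[i].seq,
						   memory_order_relaxed);
		} while ((begin & 1) || begin != end);

		total.bytes += bytes;
		total.packets += packets;
	}
	return total;
}
```

The point of the layout is that writers never contend with each other; only
the (rare) dump path pays for walking every slot.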

[PATCH 00/15] Netfilter updates for net-next

2015-08-20 Thread Pablo Neira Ayuso

Hi David,

This second pull request includes the conflict resolution patch that
resulted from the updates that we got for the conntrack template through
kmalloc. There are no changes with regard to the previously sent 15 patches.

The following patchset contains Netfilter updates for your net-next tree, they
are:

1) Rework the existing nf_tables counter expression to make it per-cpu.

2) Prepare and factor out common packet duplication code from the TEE target so
   it can be reused from the new dup expression.

3) Add the new dup expression for the nf_tables IPv4 and IPv6 families.

4) Convert the nf_tables limit expression to use a token-based approach with
   64-bit precision.

5) Enhance the nf_tables limit expression to support per-byte limiting.
   This comes after several preparation patches.

6) Add a burst parameter to indicate the number of packets or bytes that can
   exceed the limit.

7) Add netns support to nfacct, from Andreas Schultz.

8) Pass the nf_conn_zone structure instead of the zone ID in nf_tables to allow
   accessing more zone specific information, from Daniel Borkmann.

9) Allow defining zones per direction to support netns containers with
   overlapping network addressing, also from Daniel.

10) Extend the CT target to allow setting the zone based on the skb->mark as a
    way to support simple mappings from iptables, also from Daniel.

11) Make the nf_tables payload expression aware of the fact that VLAN offload
    may have removed a vlan header, from Florian Westphal.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks!



The following changes since commit 938049e18dca57bcd2f93986fc1cbb5a83cdf027:

  net: xgene Remove xgene specific phy and MAC lookup functions (2015-08-20 
14:43:49 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

for you to fetch changes up to 81bf1c64e7fe08f956c74fe2b0f1fa6eb163bd91:

  Merge branch 'master' of 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next (2015-08-21 
06:09:05 +0200)



Andreas Schultz (1):
  netfilter: nfacct: per network namespace support

Daniel Borkmann (3):
  netfilter: nf_conntrack: push zone object into functions
  netfilter: nf_conntrack: add direction support for zones
  netfilter: nf_conntrack: add efficient mark to zone mapping

Florian Westphal (1):
  netfilter: nft_payload: work around vlan header stripping

Pablo Neira Ayuso (11):
  netfilter: nft_counter: convert it to use per-cpu counters
  netfilter: xt_TEE: get rid of WITH_CONNTRACK definition
  netfilter: factor out packet duplication for IPv4/IPv6
  netfilter: nf_tables: add nft_dup expression
  netfilter: nft_limit: rename to nft_limit_pkts
  netfilter: nft_limit: convert to token-based limiting at nanosecond 
granularity
  netfilter: nft_limit: factor out shared code with per-byte limiting
  netfilter: nft_limit: add burst parameter
  netfilter: nft_limit: constant token cost per packet
  netfilter: nft_limit: add per-byte limiting
  Merge branch 'master' of git://git.kernel.org/.../davem/net-next

 include/linux/netfilter/nfnetlink_acct.h   |3 +-
 include/net/net_namespace.h|3 +
 include/net/netfilter/ipv4/nf_dup_ipv4.h   |7 +
 include/net/netfilter/ipv6/nf_dup_ipv6.h   |7 +
 include/net/netfilter/nf_conntrack.h   |   10 +-
 include/net/netfilter/nf_conntrack_core.h  |3 +-
 include/net/netfilter/nf_conntrack_expect.h|   11 +-
 include/net/netfilter/nf_conntrack_zones.h |   99 -
 include/net/netfilter/nft_dup.h|9 +
 include/uapi/linux/netfilter/nf_tables.h   |   23 ++
 include/uapi/linux/netfilter/nfnetlink_conntrack.h |1 +
 include/uapi/linux/netfilter/xt_CT.h   |8 +-
 net/ipv4/netfilter/Kconfig |   12 ++
 net/ipv4/netfilter/Makefile|3 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |4 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c|   17 +-
 net/ipv4/netfilter/nf_dup_ipv4.c   |  120 +++
 net/ipv4/netfilter/nft_dup_ipv4.c  |  110 ++
 net/ipv6/netfilter/Kconfig |   12 ++
 net/ipv6/netfilter/Makefile|3 +
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |5 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |   18 +-
 net/ipv6/netfilter/nf_dup_ipv6.c   |   96 +
 net/ipv6/netfilter/nft_dup_ipv6.c  |  108 ++

Bug in tc of iproute2 ? Deleting single filter, deletes all the filters (apart from hashtable 800::) ...

2015-08-20 Thread Akshat Kakkar

When I try to delete a single tc filter, tc deletes all the
filters with the same priority/preference, i.e. it ignores the
handle specified.

But when I delete from hashtable 800:, it deletes only the
specified filter.

For example, the following set of commands creates hashtable 15: and adds
2 filters to it.

tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32
ht 15:2: match ip src 10.0.0.2 flowid 1:10
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
ht 15:2: match ip src 10.0.0.3 flowid 1:10

Now the following command DELETES ALL THE FILTERS, though it should only
delete FILTER 15:2:3!
tc filter del dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32

The output of tc filter show eth0 in this case is blank, as all filters are deleted.


However, similar commands executed for hashtable 800: delete
only the specified filter:
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 800:0:2 u32
ht 800:0: match ip src 10.0.0.2 flowid 1:10
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 800:0:3 u32
ht 800:0: match ip src 10.0.0.3 flowid 1:10

tc filter del dev eth0 protocol ip parent 1: prio 5 handle 800:0:2 u32

The above command deletes only a single filter.
The output of tc filter show eth0 in the 2nd case is:

filter parent 1: protocol ip pref 5 u32
filter parent 1: protocol ip pref 5 u32 fh 800: ht divisor 1
filter parent 1: protocol ip pref 5 u32 fh 800::3 order 3 key ht 800
bkt 0 flowid 1:10
  match 0a03/ at 12
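
For reference, a u32 handle such as 15:2:3 names one specific node: hash
table 15, bucket 2, item 3. The sketch below packs the three hex fields into
the 32-bit value, assuming the field layout given by the TC_U32_HTID,
TC_U32_HASH and TC_U32_NODE helpers in linux/pkt_cls.h (12, 8 and 12 bits).
The function names are made up here, and the parser is stricter than tc
itself: it only accepts a fully spelled-out HTID:HASH:NODE triple.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack the three fields: htid in bits 31..20, hash in 19..12, node in 11..0. */
static uint32_t u32_handle_pack(uint32_t htid, uint32_t hash, uint32_t node)
{
	assert(htid <= 0xFFF && hash <= 0xFF && node <= 0xFFF);
	return (htid << 20) | (hash << 12) | node;
}

/* Parse "HTID:HASH:NODE" (hex fields). Returns 0 on success, -1 otherwise. */
static int u32_handle_parse(const char *text, uint32_t *handle)
{
	unsigned int htid, hash, node;
	char trailing;

	/* Exactly three conversions: anything left over is rejected. */
	if (sscanf(text, "%x:%x:%x%c", &htid, &hash, &node, &trailing) != 3)
		return -1;
	if (htid > 0xFFF || hash > 0xFF || node > 0xFFF)
		return -1;
	*handle = u32_handle_pack(htid, hash, node);
	return 0;
}
```

Under that layout 15:2:2 and 15:2:3 are distinct 32-bit handles, which is why
a delete that names one of them would be expected to leave the other alone.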

Re: [PATCH 00/15] Netfilter updates for net-next

2015-08-20 Thread David Miller

From: Pablo Neira Ayuso pa...@netfilter.org
Date: Fri, 21 Aug 2015 06:32:29 +0200

 This is second pull request includes the conflict resolution patch that
 resulted from the updates that we got for the conntrack template through
 kmalloc. No changes with regards to the previously sent 15 patches.
 ...
 You can pull these changes from:

   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

This looks better, pulled, thanks!

Re: [PATCH net-next] 3c59x: Add BQL support for 3c59x ethernet driver.

2015-08-20 Thread David Miller


This is not a proper submission.

You must at the very least provide an appropriate signoff in the
commit message of your change.

Re: [PATCH ipsec-next v2] xfrm: Use VRF master index if output device is enslaved

2015-08-20 Thread Nikolay Aleksandrov


 On Aug 21, 2015, at 1:06 AM, David Ahern d...@cumulusnetworks.com wrote:
 
 Directs route lookups to VRF table. Compiles out if NET_VRF is not
 enabled. With this patch able to successfully bring up ipsec tunnels
 in VRFs, even with duplicate network configuration.
 
 Signed-off-by: David Ahern d...@cumulusnetworks.com
 ---
 v2
 - use vrf_master_ifindex rather than vrf_master_ifindex_rcu
 
 net/ipv4/xfrm4_policy.c | 7 +--
 net/ipv6/xfrm6_policy.c | 7 +--
 2 files changed, 10 insertions(+), 4 deletions(-)
 

Looks good to me,

Acked-by: Nikolay Aleksandrov niko...@cumulusnetworks.com

Re: [PATCHv4 net-next 10/10] openvswitch: Allow attaching helpers to ct action

2015-08-20 Thread Joe Stringer

On 19 August 2015 at 15:57, Pravin Shelar pshe...@nicira.com wrote:
 On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer joestrin...@nicira.com wrote:
 Add support for using conntrack helpers to assist protocol detection.
 The new OVS_CT_ATTR_HELPER attribute of the ct action specifies a helper
 to be used for this connection.

 Example ODP flows allowing FTP connections from ports 1-2:
 in_port=1,tcp,action=ct(helper=ftp,commit),2
 in_port=2,tcp,ct_state=-trk,action=ct(),recirc(1)
 recirc_id=1,in_port=2,tcp,ct_state=+trk-new+est,action=1
 recirc_id=1,in_port=2,tcp,ct_state=+trk+rel,action=1

 Signed-off-by: Joe Stringer joestrin...@nicira.com
 ---
 v2->v3: No change.
 v4: Change error code for unknown helper ENOENT->EINVAL.

 I got following compilation warning :

 net/openvswitch/conntrack.c:352:42: error: incompatible types in
 comparison expression (different address spaces)

Is this made available via another sparse flag? It looks like it's
related to the __rcu as you've mentioned below, but I'm not seeing
this (latest sparse, gcc-4.9.2)

 +static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
 +const struct sw_flow_key *key, bool log)
 +{
 +   struct nf_conntrack_helper *helper;
 +   struct nf_conn_help *help;
 +
 +   helper = nf_conntrack_helper_try_module_get(name, info->family,
 +   key->ip.proto);
 +   if (!helper) {
 +   OVS_NLERR(log, "Unknown helper \"%s\"", name);
 +   return -EINVAL;
 +   }
 +
 +   help = nf_ct_helper_ext_add(info->ct, helper, GFP_KERNEL);
 +   if (!help) {
 +   module_put(helper->me);
 +   return -ENOMEM;
 +   }
 +
 +   help->helper = helper;
 helper is rcu pointer so need to use rcu API to set the value. I know
 it is not required here, but it is still cleaner to use the API.

Will update, thanks.

[PATCH, net-next] r8169: Disable some bits on pcie

2015-08-20 Thread Corcodel Marian

Disable the legacy interrupt on the PCI Express interface and use MSI.
Also disable some bits on the PCI Express interface which are not needed on this NIC.


Signed-off-by: Corcodel Marian corcodel.mar...@gmail.com

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 6d16de3..b1fb54f 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -8164,6 +8164,13 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (!pci_is_pcie(pdev))
netif_info(tp, probe, dev, "not PCI Express\n");
 
+   if (pci_is_pcie(pdev))
+   pci_write_config_word(pdev, PCI_COMMAND, ~(PCI_COMMAND_FAST_BACK | PCI_COMMAND_WAIT |
+   PCI_COMMAND_VGA_PALETTE | PCI_COMMAND_INVALIDATE | PCI_COMMAND_SPECIAL));
+
+   if (pci_is_pcie(pdev))
+   pci_intx(pdev, 0);
+
/* Identify chip attached to board */
rtl8169_get_mac_version(tp, dev, cfg->default_ver);
 
-- 
2.1.4
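
A side note on clearing command bits in general: the usual pattern is
read-modify-write, i.e. read the current word, AND it with the inverted mask,
and write the result back. Writing the inverted mask itself as the new value
would instead set every bit outside the mask. The sketch below shows only the
arithmetic on a plain 16-bit value, with no PCI config space access. The bit
values are copied from linux/pci_regs.h as I remember them, while
DEMO_CLEAR_MASK and clear_command_bits are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the PCI command register (values as in linux/pci_regs.h). */
#define PCI_COMMAND_SPECIAL	0x8
#define PCI_COMMAND_INVALIDATE	0x10
#define PCI_COMMAND_VGA_PALETTE	0x20
#define PCI_COMMAND_WAIT	0x80
#define PCI_COMMAND_FAST_BACK	0x200

/* The set of bits the patch description wants switched off. */
#define DEMO_CLEAR_MASK (PCI_COMMAND_FAST_BACK | PCI_COMMAND_WAIT | \
			 PCI_COMMAND_VGA_PALETTE | PCI_COMMAND_INVALIDATE | \
			 PCI_COMMAND_SPECIAL)

static_assert(DEMO_CLEAR_MASK <= 0xFFFF,
	      "mask must fit the 16-bit command register");

/* Return 'cmd' with the bits in 'mask' cleared and all others untouched. */
static uint16_t clear_command_bits(uint16_t cmd, uint16_t mask)
{
	return (uint16_t)(cmd & (uint16_t)~mask);
}
```

In a driver the input would come from a config space read and the result
would go back through a config space write; that part is left out here.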


[PATCH v3 net-next 02/13] ip_tunnels: use u8/u16/u32

2015-08-20 Thread Jiri Benc

The ip_tunnels.h include file uses a mixture of __u16 and u16 (etc.) types.
Unify it to the non-underscore variants.

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 include/net/ip_tunnels.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 81cf11c931e4..ca173f22f07f 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -32,8 +32,8 @@ struct ip_tunnel_key {
__be32  ipv4_src;
__be32  ipv4_dst;
__be16  tun_flags;
-   __u8ipv4_tos;
-   __u8ipv4_ttl;
+   u8  ipv4_tos;
+   u8  ipv4_ttl;
__be16  tp_src;
__be16  tp_dst;
 };
@@ -64,8 +64,8 @@ struct ip_tunnel_6rd_parm {
 #endif
 
 struct ip_tunnel_encap {
-   __u16   type;
-   __u16   flags;
+   u16 type;
+   u16 flags;
__be16  sport;
__be16  dport;
 };
@@ -95,8 +95,8 @@ struct ip_tunnel {
 * arrived */
 
/* These four fields used only by GRE */
-   __u32   i_seqno;/* The last seen seqno  */
-   __u32   o_seqno;/* The last output seqno */
+   u32 i_seqno;/* The last seen seqno  */
+   u32 o_seqno;/* The last output seqno */
int tun_hlen;   /* Precalculated header length */
int mlink;
 
@@ -273,8 +273,8 @@ static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct 
iphdr *iph,
 
 int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto);
 int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
- __be32 src, __be32 dst, __u8 proto,
- __u8 tos, __u8 ttl, __be16 df, bool xnet);
+ __be32 src, __be32 dst, u8 proto,
+ u8 tos, u8 ttl, __be16 df, bool xnet);
 
 struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb, bool gre_csum,
 int gso_type_mask);
-- 
1.8.3.1


[PATCH v3 net-next 12/13] ipv6: route: extend flow representation with tunnel key

2015-08-20 Thread Jiri Benc

Use flowi_tunnel in flowi6 similarly to what is done with IPv4.
This complements commit 1b7179d3adff ("route: Extend flow representation
with tunnel key").

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 include/net/flow.h | 1 +
 net/ipv6/route.c   | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index f305588fc162..9e0297c4c11d 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -130,6 +130,7 @@ struct flowi6 {
 #define flowi6_proto   __fl_common.flowic_proto
 #define flowi6_flags   __fl_common.flowic_flags
 #define flowi6_secid   __fl_common.flowic_secid
+#define flowi6_tun_key __fl_common.flowic_tun_key
struct in6_addr daddr;
struct in6_addr saddr;
__be32  flowlabel;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c4f3b9fcca9d..6c0fe4c7ce8d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -54,11 +54,13 @@
 #include <net/tcp.h>
 #include <linux/rtnetlink.h>
 #include <net/dst.h>
+#include <net/dst_metadata.h>
 #include <net/xfrm.h>
 #include <net/netevent.h>
 #include <net/netlink.h>
 #include <net/nexthop.h>
 #include <net/lwtunnel.h>
+#include <net/ip_tunnels.h>
 
 #include <asm/uaccess.h>
 
@@ -1131,6 +1133,7 @@ void ip6_route_input(struct sk_buff *skb)
const struct ipv6hdr *iph = ipv6_hdr(skb);
struct net *net = dev_net(skb->dev);
int flags = RT6_LOOKUP_F_HAS_SADDR;
+   struct ip_tunnel_info *tun_info;
struct flowi6 fl6 = {
.flowi6_iif = skb->dev->ifindex,
.daddr = iph->daddr,
@@ -1140,6 +1143,9 @@ void ip6_route_input(struct sk_buff *skb)
.flowi6_proto = iph->nexthdr,
};
 
+   tun_info = skb_tunnel_info(skb);
+   if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX)
+   fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;
skb_dst_drop(skb);
skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
 }
-- 
1.8.3.1


[PATCH v3 net-next 10/13] vxlan: do not shadow flags variable

2015-08-20 Thread Jiri Benc

The 'flags' variable is already defined in the outer scope.

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vxlan.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 070149f77072..2c1abf95c17d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2025,7 +2025,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
} else {
struct dst_entry *ndst;
struct flowi6 fl6;
-   u32 flags;
+   u32 rt6i_flags;
 
memset(&fl6, 0, sizeof(fl6));
fl6.flowi6_oif = rdst ? rdst->remote_ifindex : 0;
@@ -2050,9 +2050,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
}
 
/* Bypass encapsulation if the destination is local */
-   flags = ((struct rt6_info *)ndst)->rt6i_flags;
-   if (flags & RTF_LOCAL &&
-   !(flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
+   rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags;
+   if (rt6i_flags & RTF_LOCAL &&
+   !(rt6i_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
struct vxlan_dev *dst_vxlan;
 
dst_release(ndst);
-- 
1.8.3.1
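
To illustrate why shadowing is worth removing even when the code currently
works: once an inner block declares its own 'flags', the outer one becomes
unreachable there, and a later edit that means the outer value silently reads
the inner one. A toy example follows; the function names and DEMO_* constants
are invented for it and have nothing to do with vxlan itself.

```c
#include <assert.h>

#define DEMO_RTF_LOCAL	0x001u	/* hypothetical per-route flag */
#define DEMO_OUTER_FLAG	0x100u	/* hypothetical caller-supplied flag */

static_assert(DEMO_RTF_LOCAL != DEMO_OUTER_FLAG, "flags must be distinct bits");

/* Inner 'flags' hides the parameter, so the caller's bit is never seen. */
static int bypass_shadowed(unsigned int flags, unsigned int route_flags)
{
	int ret;

	(void)flags;	/* the outer value, about to be hidden */
	{
		unsigned int flags = route_flags;	/* shadows the parameter */

		/* Both tests read the inner variable. */
		ret = (flags & DEMO_RTF_LOCAL) && (flags & DEMO_OUTER_FLAG);
	}
	return ret;
}

/* With a distinct name, both values stay reachable. */
static int bypass_renamed(unsigned int flags, unsigned int route_flags)
{
	unsigned int rt_flags = route_flags;

	return (rt_flags & DEMO_RTF_LOCAL) && (flags & DEMO_OUTER_FLAG);
}
```

Compilers can flag this pattern too; gcc and clang have -Wshadow for it.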


[PATCH v3 net-next 08/13] ipv6: ndisc: inherit metadata dst when creating ndisc requests

2015-08-20 Thread Jiri Benc

If the output device wants to see the dst, inherit the dst of the original skb
in the ndisc request.

This is an IPv6 counterpart of commit 0accfc268f4d ("arp: Inherit metadata
dst when creating ARP requests").

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 include/net/ndisc.h |  3 ++-
 net/ipv6/addrconf.c |  2 +-
 net/ipv6/ndisc.c| 10 +++---
 net/ipv6/route.c|  2 +-
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index b3a7751251b4..aba5695fadb0 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -182,7 +182,8 @@ int ndisc_rcv(struct sk_buff *skb);
 
 void ndisc_send_ns(struct net_device *dev, struct neighbour *neigh,
   const struct in6_addr *solicit,
-  const struct in6_addr *daddr, const struct in6_addr *saddr);
+  const struct in6_addr *daddr, const struct in6_addr *saddr,
+  struct sk_buff *oskb);
 
 void ndisc_send_rs(struct net_device *dev,
   const struct in6_addr *saddr, const struct in6_addr *daddr);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 59242399b0b5..0f08d3b9e238 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3656,7 +3656,7 @@ static void addrconf_dad_work(struct work_struct *w)
 
/* send a neighbour solicitation for our addr */
addrconf_addr_solict_mult(&ifp->addr, &mcaddr);
-   ndisc_send_ns(ifp->idev->dev, NULL, &ifp->addr, &mcaddr, &in6addr_any);
+   ndisc_send_ns(ifp->idev->dev, NULL, &ifp->addr, &mcaddr, &in6addr_any, NULL);
 out:
in6_ifa_put(ifp);
rtnl_unlock();
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index b3054611f88a..13d3c2beb93e 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -553,7 +553,8 @@ static void ndisc_send_unsol_na(struct net_device *dev)
 
 void ndisc_send_ns(struct net_device *dev, struct neighbour *neigh,
   const struct in6_addr *solicit,
-  const struct in6_addr *daddr, const struct in6_addr *saddr)
+  const struct in6_addr *daddr, const struct in6_addr *saddr,
+  struct sk_buff *oskb)
 {
struct sk_buff *skb;
struct in6_addr addr_buf;
@@ -589,6 +590,9 @@ void ndisc_send_ns(struct net_device *dev, struct neighbour 
*neigh,
ndisc_fill_addr_option(skb, ND_OPT_SOURCE_LL_ADDR,
   dev->dev_addr);
 
+   if (!(dev->priv_flags & IFF_XMIT_DST_RELEASE) && oskb)
+   skb_dst_copy(skb, oskb);
+
ndisc_send_skb(skb, daddr, saddr);
 }
 
@@ -675,12 +679,12 @@ static void ndisc_solicit(struct neighbour *neigh, struct sk_buff *skb)
  "%s: trying to ucast probe in NUD_INVALID: %pI6\n",
  __func__, target);
}
-   ndisc_send_ns(dev, neigh, target, target, saddr);
+   ndisc_send_ns(dev, neigh, target, target, saddr, skb);
} else if ((probes -= NEIGH_VAR(neigh->parms, APP_PROBES)) < 0) {
neigh_app_ns(neigh);
} else {
addrconf_addr_solict_mult(target, &mcaddr);
-   ndisc_send_ns(dev, NULL, target, &mcaddr, saddr);
+   ndisc_send_ns(dev, NULL, target, &mcaddr, saddr, skb);
}
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0947ad0b3de8..c4f3b9fcca9d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -538,7 +538,7 @@ static void rt6_probe_deferred(struct work_struct *w)
container_of(w, struct __rt6_probe_work, work);
 
addrconf_addr_solict_mult(&work->target, &mcaddr);
-   ndisc_send_ns(work->dev, NULL, &work->target, &mcaddr, NULL);
+   ndisc_send_ns(work->dev, NULL, &work->target, &mcaddr, NULL, NULL);
dev_put(work->dev);
kfree(work);
 }
-- 
1.8.3.1


[PATCH v3 net-next 07/13] ipv6: drop metadata dst in ip6_route_input

2015-08-20 Thread Jiri Benc

The fix in commit 48fb6b554501 is incomplete, as now ip6_route_input can be
called with non-NULL dst if it's a metadata dst and the reference is leaked.
Drop the reference.

Fixes: 48fb6b554501 (ipv6: fix crash over flow-based vxlan device)
Fixes: ee122c79d422 (vxlan: Flow based tunneling)
CC: Wei-Chun Chao weich...@plumgrid.com
CC: Thomas Graf tg...@suug.ch
Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 net/ipv6/route.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e6bbcdee7707..0947ad0b3de8 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1140,6 +1140,7 @@ void ip6_route_input(struct sk_buff *skb)
.flowi6_proto = iph->nexthdr,
};
 
+   skb_dst_drop(skb);
skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
 }
 
-- 
1.8.3.1
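
The leak described in the commit message can be modelled in a few lines of
user-space C. This is only a toy sketch: `toy_dst`, `toy_skb` and the
`toy_*` helpers are hypothetical stand-ins, not the kernel's dst or skb
code, and the refcount is a plain integer. It shows why overwriting the
dst slot without dropping the old entry first leaves a reference behind.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: a refcounted dst and a packet that holds one reference. */
struct toy_dst { int refcnt; };
struct toy_skb { struct toy_dst *dst; };

static void toy_dst_release(struct toy_dst *dst)
{
	if (dst)
		dst->refcnt--;
}

/* Models skb_dst_drop(): release whatever is attached, then clear the slot. */
static void toy_skb_dst_drop(struct toy_skb *skb)
{
	toy_dst_release(skb->dst);
	skb->dst = NULL;
}

/* Models skb_dst_set(): overwrite the slot with a reference the caller holds. */
static void toy_skb_dst_set(struct toy_skb *skb, struct toy_dst *dst)
{
	skb->dst = dst;
}

/* Runs the input path with or without the drop and returns the
 * metadata dst's final reference count. */
static int toy_route_input(int drop_first)
{
	struct toy_dst metadata = { .refcnt = 1 };	/* held by the skb */
	struct toy_dst route = { .refcnt = 1 };		/* result of the lookup */
	struct toy_skb skb = { .dst = &metadata };

	if (drop_first)
		toy_skb_dst_drop(&skb);
	toy_skb_dst_set(&skb, &route);
	return metadata.refcnt;
}

/* Quick self-check of both paths. */
static void toy_check(void)
{
	assert(toy_route_input(0) == 1);	/* reference never released */
	assert(toy_route_input(1) == 0);	/* reference released */
}
```

Calling `toy_check()` exercises both variants; the count of 1 in the first
case corresponds to the reference that nobody can release any more.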


[PATCH v3 net-next 09/13] vxlan: provide access function for vxlan socket address family

2015-08-20 Thread Jiri Benc

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vxlan.c | 8 
 include/net/vxlan.h | 5 +
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 93613ffd8d7e..070149f77072 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -236,7 +236,7 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
 
hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
if (inet_sk(vs->sock->sk)->inet_sport == port &&
-   inet_sk(vs->sock->sk)->sk.sk_family == family &&
+   vxlan_get_sk_family(vs) == family &&
vs->flags == flags)
return vs;
}
@@ -625,7 +625,7 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
struct net_device *dev;
struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
-   sa_family_t sa_family = sk->sk_family;
+   sa_family_t sa_family = vxlan_get_sk_family(vs);
__be16 port = inet_sk(sk)->inet_sport;
int err;
 
@@ -650,7 +650,7 @@ static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
struct net_device *dev;
struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
-   sa_family_t sa_family = sk->sk_family;
+   sa_family_t sa_family = vxlan_get_sk_family(vs);
__be16 port = inet_sk(sk)->inet_sport;
 
rcu_read_lock();
@@ -2390,7 +2390,7 @@ void vxlan_get_rx_port(struct net_device *dev)
for (i = 0; i < PORT_HASH_SIZE; ++i) {
hlist_for_each_entry_rcu(vs, &vn->sock_list[i], hlist) {
port = inet_sk(vs->sock->sk)->inet_sport;
-   sa_family = vs->sock->sk->sk_family;
+   sa_family = vxlan_get_sk_family(vs);
dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
port);
}
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index e4534f1b2d8c..43677e6b9c43 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -241,3 +241,8 @@ static inline void vxlan_get_rx_port(struct net_device *netdev)
 }
 #endif
 #endif
+
+static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
+{
+   return vs->sock->sk->sk_family;
+}
-- 
1.8.3.1


[PATCH v3 net-next 04/13] ip_tunnels: add IPv6 addresses to ip_tunnel_key

2015-08-20 Thread Jiri Benc

Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the
newly introduced padding after the IPv4 addresses needs to be zeroed out.

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
Acked-by: Alexei Starovoitov a...@plumgrid.com
---
v1->v2: Fix incorrect IP_TUNNEL_KEY_IPV4_PAD_LEN calculation, thanks to
Alexei.
---
 drivers/net/vxlan.c|  6 +++---
 include/net/ip_tunnels.h   | 24 
 net/core/filter.c  |  4 ++--
 net/ipv4/ip_gre.c  | 10 +-
 net/ipv4/ip_tunnel_core.c  |  8 
 net/openvswitch/flow_netlink.c | 18 +-
 net/openvswitch/flow_table.c   |  2 +-
 net/openvswitch/vport-geneve.c |  2 +-
 net/openvswitch/vport.c|  2 +-
 net/openvswitch/vport.h|  4 ++--
 10 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ad51dac88d19..30a7abcf2c09 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1276,8 +1276,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
goto drop;
 
info = &tun_dst->u.tun_info;
-   info->key.ipv4_src = iph->saddr;
-   info->key.ipv4_dst = iph->daddr;
+   info->key.u.ipv4.src = iph->saddr;
+   info->key.u.ipv4.dst = iph->daddr;
info->key.ipv4_tos = iph->tos;
info->key.ipv4_ttl = iph->ttl;
info->key.tp_src = udp_hdr(skb)->source;
@@ -1925,7 +1925,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
vni = be64_to_cpu(info->key.tun_id);
remote_ip.sin.sin_family = AF_INET;
-   remote_ip.sin.sin_addr.s_addr = info->key.ipv4_dst;
+   remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
dst = &remote_ip;
}
 
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index cc3b39e9010b..6a51371dad00 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -25,10 +25,24 @@
 /* Used to memset ip_tunnel padding. */
 #define IP_TUNNEL_KEY_SIZE offsetofend(struct ip_tunnel_key, tp_dst)
 
+/* Used to memset ipv4 address padding. */
+#define IP_TUNNEL_KEY_IPV4_PAD offsetofend(struct ip_tunnel_key, u.ipv4.dst)
+#define IP_TUNNEL_KEY_IPV4_PAD_LEN \
+   (FIELD_SIZEOF(struct ip_tunnel_key, u) -\
+FIELD_SIZEOF(struct ip_tunnel_key, u.ipv4))
+
 struct ip_tunnel_key {
__be64  tun_id;
-   __be32  ipv4_src;
-   __be32  ipv4_dst;
+   union {
+   struct {
+   __be32  src;
+   __be32  dst;
+   } ipv4;
+   struct {
+   struct in6_addr src;
+   struct in6_addr dst;
+   } ipv6;
+   } u;
__be16  tun_flags;
u8  ipv4_tos;
u8  ipv4_ttl;
@@ -177,8 +191,10 @@ static inline void __ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
 const void *opts, u8 opts_len)
 {
tun_info->key.tun_id = tun_id;
-   tun_info->key.ipv4_src = saddr;
-   tun_info->key.ipv4_dst = daddr;
+   tun_info->key.u.ipv4.src = saddr;
+   tun_info->key.u.ipv4.dst = daddr;
+   memset((unsigned char *)&tun_info->key + IP_TUNNEL_KEY_IPV4_PAD,
+  0, IP_TUNNEL_KEY_IPV4_PAD_LEN);
tun_info->key.ipv4_tos = tos;
tun_info->key.ipv4_ttl = ttl;
tun_info->key.tun_flags = tun_flags;
diff --git a/net/core/filter.c b/net/core/filter.c
index 83f08cefeab7..379568562ffb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1495,7 +1495,7 @@ static u64 bpf_skb_get_tunnel_key(u64 r1, u64 r2, u64 size, u64 flags, u64 r5)
return -EINVAL;
 
to->tunnel_id = be64_to_cpu(info->key.tun_id);
-   to->remote_ipv4 = be32_to_cpu(info->key.ipv4_src);
+   to->remote_ipv4 = be32_to_cpu(info->key.u.ipv4.src);
 
return 0;
 }
@@ -1529,7 +1529,7 @@ static u64 bpf_skb_set_tunnel_key(u64 r1, u64 r2, u64 size, u64 flags, u64 r5)
info = &md->u.tun_info;
info->mode = IP_TUNNEL_INFO_TX;
info->key.tun_id = cpu_to_be64(from->tunnel_id);
-   info->key.ipv4_dst = cpu_to_be32(from->remote_ipv4);
+   info->key.u.ipv4.dst = cpu_to_be32(from->remote_ipv4);
 
return 0;
 }
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index fb44d693796e..b7bb7d6aa7a8 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -407,8 +407,8 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
return PACKET_REJECT;
 
info = &tun_dst->u.tun_info;
-   info->key.ipv4_src = iph->saddr;
-
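
The padding requirement from this commit message can be illustrated in user
space. The sketch below is a simplified stand-in, not the kernel
structure: `demo_key` keeps only a few fields, `in6_addr` is modelled as 16
raw bytes, and the `DEMO_*` macros and `demo_*` functions are hypothetical
names that mirror the shape of the macros in the patch. It assumes the
motivation is that keys may be compared or hashed as raw bytes, so the part
of the union that IPv4 does not cover must not contain stale data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIELD_SIZEOF(t, f)	(sizeof(((t *)0)->f))
#define offsetofend(t, f)	(offsetof(t, f) + FIELD_SIZEOF(t, f))

/* Simplified stand-in for struct ip_tunnel_key. */
struct demo_key {
	uint64_t tun_id;
	union {
		struct { uint32_t src, dst; } ipv4;
		struct { uint8_t src[16], dst[16]; } ipv6;
	} u;
	uint16_t tun_flags;
};

/* Same shape as IP_TUNNEL_KEY_IPV4_PAD and IP_TUNNEL_KEY_IPV4_PAD_LEN. */
#define DEMO_IPV4_PAD		offsetofend(struct demo_key, u.ipv4.dst)
#define DEMO_IPV4_PAD_LEN \
	(FIELD_SIZEOF(struct demo_key, u) - FIELD_SIZEOF(struct demo_key, u.ipv4))

/* 32 bytes of union minus 8 bytes of IPv4 addresses. */
static_assert(DEMO_IPV4_PAD_LEN == 24, "IPv4 leaves 24 union bytes unused");

/* Store an IPv4 pair and clear the union bytes that IPv4 does not cover. */
static void demo_key_set_ipv4(struct demo_key *key, uint32_t src, uint32_t dst)
{
	key->u.ipv4.src = src;
	key->u.ipv4.dst = dst;
	memset((unsigned char *)key + DEMO_IPV4_PAD, 0, DEMO_IPV4_PAD_LEN);
}

/* Returns 1 when two keys built on top of different stale bytes end up
 * with byte-identical address unions. */
static int demo_keys_match(void)
{
	struct demo_key a, b;

	memset(&a, 0xaa, sizeof(a));	/* simulate stale memory */
	memset(&b, 0x55, sizeof(b));
	demo_key_set_ipv4(&a, 0x0a000001, 0x0a000002);
	demo_key_set_ipv4(&b, 0x0a000001, 0x0a000002);
	return memcmp(&a.u, &b.u, sizeof(a.u)) == 0;
}
```

Without the memset() in `demo_key_set_ipv4()`, the two unions would differ in
their last 24 bytes even though both carry the same IPv4 addresses.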

[PATCH v3 net-next 11/13] vxlan: metadata based tunneling for IPv6

2015-08-20 Thread Jiri Benc

Support metadata based (formerly flow based) tunneling also for IPv6.
This complements commit ee122c79d422 (vxlan: Flow based tunneling).

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
Acked-by: Alexei Starovoitov a...@plumgrid.com
---
 drivers/net/vxlan.c | 69 +++--
 1 file changed, 40 insertions(+), 29 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2c1abf95c17d..54615bb9d916 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1269,17 +1269,27 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
}
 
if (vxlan_collect_metadata(vs)) {
-   const struct iphdr *iph = ip_hdr(skb);
-
tun_dst = metadata_dst_alloc(sizeof(*md), GFP_ATOMIC);
if (!tun_dst)
goto drop;
 
info = &tun_dst->u.tun_info;
-   info->key.u.ipv4.src = iph->saddr;
-   info->key.u.ipv4.dst = iph->daddr;
-   info->key.tos = iph->tos;
-   info->key.ttl = iph->ttl;
+   if (vxlan_get_sk_family(vs) == AF_INET) {
+   const struct iphdr *iph = ip_hdr(skb);
+
+   info->key.u.ipv4.src = iph->saddr;
+   info->key.u.ipv4.dst = iph->daddr;
+   info->key.tos = iph->tos;
+   info->key.ttl = iph->ttl;
+   } else {
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+   info->key.u.ipv6.src = ip6h->saddr;
+   info->key.u.ipv6.dst = ip6h->daddr;
+   info->key.tos = ipv6_get_dsfield(ip6h);
+   info->key.ttl = ip6h->hop_limit;
+   }
+
info->key.tp_src = udp_hdr(skb)->source;
info->key.tp_dst = udp_hdr(skb)->dest;
 
@@ -1894,6 +1904,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
struct ip_tunnel_info *info;
struct vxlan_dev *vxlan = netdev_priv(dev);
struct sock *sk = vxlan->vn_sock->sock->sk;
+   unsigned short family = vxlan_get_sk_family(vxlan->vn_sock);
struct rtable *rt = NULL;
const struct iphdr *old_iph;
struct flowi4 fl4;
@@ -1908,7 +1919,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
int err;
u32 flags = vxlan->flags;
 
-   /* FIXME: Support IPv6 */
info = skb_tunnel_info(skb);
 
if (rdst) {
@@ -1924,8 +1934,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
vni = be64_to_cpu(info->key.tun_id);
-   remote_ip.sin.sin_family = AF_INET;
-   remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
+   remote_ip.sa.sa_family = family;
+   if (family == AF_INET)
+   remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
+   else
+   remote_ip.sin6.sin6_addr = info->key.u.ipv6.dst;
dst = &remote_ip;
}
 
@@ -1951,23 +1964,24 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min,
 vxlan->cfg.port_max, true);
 
+   if (info) {
+   if (info->key.tun_flags & TUNNEL_CSUM)
+   flags |= VXLAN_F_UDP_CSUM;
+   else
+   flags &= ~VXLAN_F_UDP_CSUM;
+
+   ttl = info->key.ttl;
+   tos = info->key.tos;
+
+   if (info->options_len)
+   md = ip_tunnel_info_opts(info, sizeof(*md));
+   } else {
+   md->gbp = skb->mark;
+   }
+
if (dst->sa.sa_family == AF_INET) {
-   if (info) {
-   if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT)
-   df = htons(IP_DF);
-   if (info->key.tun_flags & TUNNEL_CSUM)
-   flags |= VXLAN_F_UDP_CSUM;
-   else
-   flags &= ~VXLAN_F_UDP_CSUM;
-
-   ttl = info->key.ttl;
-   tos = info->key.tos;
-
-   if (info->options_len)
-   md = ip_tunnel_info_opts(info, sizeof(*md));
-   } else {
-   md->gbp = skb->mark;
-   }
+   if (info && (info->key.tun_flags & TUNNEL_DONT_FRAGMENT))
+   df = htons(IP_DF);
 
memset(&fl4, 0, sizeof(fl4));
fl4.flowi4_oif = rdst ? rdst->remote_ifindex : 0;
@@ -2066,12 +2080,10 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
}
 
ttl = ttl ? : ip6_dst_hoplimit(ndst);
-   md->gbp = skb->mark;

[PATCH v3 net-next 06/13] route: move lwtunnel state to dst_entry

2015-08-20 Thread Jiri Benc

Currently, the lwtunnel state resides in per-protocol data. This is
a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa).
The xmit function of the tunnel does not know whether the packet has been
routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the
lwtstate data to dst_entry makes such inter-protocol tunneling possible.

As a bonus, this brings a nice diffstat.

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Roopa Prabhu ro...@cumulusnetworks.com
Acked-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vrf.c  |  1 -
 drivers/net/vxlan.c|  4 +--
 include/net/dst.h  |  3 +-
 include/net/dst_metadata.h | 15 +++--
 include/net/ip6_fib.h  |  1 -
 include/net/lwtunnel.h | 12 
 include/net/route.h|  1 -
 net/core/dst.c |  3 ++
 net/core/filter.c  |  2 +-
 net/core/lwtunnel.c| 70 ++
 net/ipv4/ip_gre.c  |  2 +-
 net/ipv4/route.c   | 20 +---
 net/ipv6/ila.c | 14 +++--
 net/ipv6/ip6_fib.c |  1 -
 net/ipv6/route.c   | 20 ++--
 net/mpls/mpls_iptunnel.c   |  7 ++---
 net/openvswitch/vport-netdev.c |  2 +-
 17 files changed, 48 insertions(+), 130 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index ed208317cbb5..8e03b84dcb7f 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -288,7 +288,6 @@ static struct rtable *vrf_rtable_create(struct net_device *dev)
rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
rth->rt_uncached_list = NULL;
-   rth->rt_lwtstate = NULL;
}
 
return rth;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ebeb3def06c5..93613ffd8d7e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1909,7 +1909,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
u32 flags = vxlan->flags;
 
/* FIXME: Support IPv6 */
-   info = skb_tunnel_info(skb, AF_INET);
+   info = skb_tunnel_info(skb);
 
if (rdst) {
dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port;
@@ -2105,7 +2105,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
struct vxlan_fdb *f;
 
/* FIXME: Support IPv6 */
-   info = skb_tunnel_info(skb, AF_INET);
+   info = skb_tunnel_info(skb);
 
skb_reset_mac_header(skb);
eth = eth_hdr(skb);
diff --git a/include/net/dst.h b/include/net/dst.h
index 2578811cef51..0a9a723f6c19 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -44,6 +44,7 @@ struct dst_entry {
 #else
void*__pad1;
 #endif
+   struct lwtunnel_state   *lwtstate;
int (*input)(struct sk_buff *);
int (*output)(struct sock *sk, struct sk_buff *skb);
 
@@ -89,7 +90,7 @@ struct dst_entry {
 * (L1_CACHE_SIZE would be too much)
 */
 #ifdef CONFIG_64BIT
-   long__pad_to_align_refcnt[2];
+   long__pad_to_align_refcnt[1];
 #endif
/*
 * __refcnt wants to be on a different cache line from
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 075f523ff23f..2cb52d562272 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -23,22 +23,17 @@ static inline struct metadata_dst *skb_metadata_dst(struct sk_buff *skb)
return NULL;
 }
 
-static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb,
-int family)
+static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
 {
struct metadata_dst *md_dst = skb_metadata_dst(skb);
-   struct rtable *rt;
+   struct dst_entry *dst;
 
if (md_dst)
return &md_dst->u.tun_info;
 
-   switch (family) {
-   case AF_INET:
-   rt = (struct rtable *)skb_dst(skb);
-   if (rt && rt->rt_lwtstate)
-   return lwt_tun_info(rt->rt_lwtstate);
-   break;
-   }
+   dst = skb_dst(skb);
+   if (dst && dst->lwtstate)
+   return lwt_tun_info(dst->lwtstate);
 
return NULL;
 }
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 276328e3daa6..063d30474cf6 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -133,7 +133,6 @@ struct rt6_info {
/* more non-fragment space at head required */
unsigned short  rt6i_nfheader_len;
u8  rt6i_protocol;
-   struct lwtunnel_state   *rt6i_lwtstate;
 };
 
 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index cfee53916ba5..843489884448 100644
---

[PATCH v3 net-next 13/13] ipv6: route: per route IP tunnel metadata via lightweight tunnel

2015-08-20 Thread Jiri Benc

Allow specification of per route IP tunnel instructions also for IPv6.
This complements commit 3093fbe7ff4b (route: Per route IP tunnel metadata
via lightweight tunnel).

Signed-off-by: Jiri Benc jb...@redhat.com
CC: YOSHIFUJI Hideaki hideaki.yoshif...@miraclelinux.com
Acked-by: Thomas Graf tg...@suug.ch
---
v2->v3: Moved LWTUNNEL_ENCAP_IP6 to the end of the enum.
---
 include/uapi/linux/lwtunnel.h |  16 +++
 net/ipv4/ip_tunnel_core.c | 102 ++
 2 files changed, 118 insertions(+)

diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index aa84ca396bcb..34141a5dfe74 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -8,6 +8,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_MPLS,
LWTUNNEL_ENCAP_IP,
LWTUNNEL_ENCAP_ILA,
+   LWTUNNEL_ENCAP_IP6,
__LWTUNNEL_ENCAP_MAX,
 };
 
@@ -28,4 +29,19 @@ enum lwtunnel_ip_t {
 
 #define LWTUNNEL_IP_MAX (__LWTUNNEL_IP_MAX - 1)
 
+enum lwtunnel_ip6_t {
+   LWTUNNEL_IP6_UNSPEC,
+   LWTUNNEL_IP6_ID,
+   LWTUNNEL_IP6_DST,
+   LWTUNNEL_IP6_SRC,
+   LWTUNNEL_IP6_HOPLIMIT,
+   LWTUNNEL_IP6_TC,
+   LWTUNNEL_IP6_SPORT,
+   LWTUNNEL_IP6_DPORT,
+   LWTUNNEL_IP6_FLAGS,
+   __LWTUNNEL_IP6_MAX,
+};
+
+#define LWTUNNEL_IP6_MAX (__LWTUNNEL_IP6_MAX - 1)
+
 #endif /* _UAPI_LWTUNNEL_H_ */
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index f0514e39e57c..289b6c26ce37 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -299,9 +299,111 @@ static const struct lwtunnel_encap_ops ip_tun_lwt_ops = {
.cmp_encap = ip_tun_cmp_encap,
 };
 
+static const struct nla_policy ip6_tun_policy[LWTUNNEL_IP6_MAX + 1] = {
+   [LWTUNNEL_IP6_ID]   = { .type = NLA_U64 },
+   [LWTUNNEL_IP6_DST]  = { .len = sizeof(struct in6_addr) },
+   [LWTUNNEL_IP6_SRC]  = { .len = sizeof(struct in6_addr) },
+   [LWTUNNEL_IP6_HOPLIMIT] = { .type = NLA_U8 },
+   [LWTUNNEL_IP6_TC]   = { .type = NLA_U8 },
+   [LWTUNNEL_IP6_SPORT]= { .type = NLA_U16 },
+   [LWTUNNEL_IP6_DPORT]= { .type = NLA_U16 },
+   [LWTUNNEL_IP6_FLAGS]= { .type = NLA_U16 },
+};
+
+static int ip6_tun_build_state(struct net_device *dev, struct nlattr *attr,
+  struct lwtunnel_state **ts)
+{
+   struct ip_tunnel_info *tun_info;
+   struct lwtunnel_state *new_state;
+   struct nlattr *tb[LWTUNNEL_IP6_MAX + 1];
+   int err;
+
+   err = nla_parse_nested(tb, LWTUNNEL_IP6_MAX, attr, ip6_tun_policy);
+   if (err < 0)
+   return err;
+
+   new_state = lwtunnel_state_alloc(sizeof(*tun_info));
+   if (!new_state)
+   return -ENOMEM;
+
+   new_state->type = LWTUNNEL_ENCAP_IP6;
+
+   tun_info = lwt_tun_info(new_state);
+
+   if (tb[LWTUNNEL_IP6_ID])
+   tun_info->key.tun_id = nla_get_u64(tb[LWTUNNEL_IP6_ID]);
+
+   if (tb[LWTUNNEL_IP6_DST])
+   tun_info->key.u.ipv6.dst = nla_get_in6_addr(tb[LWTUNNEL_IP6_DST]);
+
+   if (tb[LWTUNNEL_IP6_SRC])
+   tun_info->key.u.ipv6.src = nla_get_in6_addr(tb[LWTUNNEL_IP6_SRC]);
+
+   if (tb[LWTUNNEL_IP6_HOPLIMIT])
+   tun_info->key.ttl = nla_get_u8(tb[LWTUNNEL_IP6_HOPLIMIT]);
+
+   if (tb[LWTUNNEL_IP6_TC])
+   tun_info->key.tos = nla_get_u8(tb[LWTUNNEL_IP6_TC]);
+
+   if (tb[LWTUNNEL_IP6_SPORT])
+   tun_info->key.tp_src = nla_get_be16(tb[LWTUNNEL_IP6_SPORT]);
+
+   if (tb[LWTUNNEL_IP6_DPORT])
+   tun_info->key.tp_dst = nla_get_be16(tb[LWTUNNEL_IP6_DPORT]);
+
+   if (tb[LWTUNNEL_IP6_FLAGS])
+   tun_info->key.tun_flags = nla_get_u16(tb[LWTUNNEL_IP6_FLAGS]);
+
+   tun_info->mode = IP_TUNNEL_INFO_TX;
+   tun_info->options = NULL;
+   tun_info->options_len = 0;
+
+   *ts = new_state;
+
+   return 0;
+}
+
+static int ip6_tun_fill_encap_info(struct sk_buff *skb,
+  struct lwtunnel_state *lwtstate)
+{
+   struct ip_tunnel_info *tun_info = lwt_tun_info(lwtstate);
+
+   if (nla_put_u64(skb, LWTUNNEL_IP6_ID, tun_info->key.tun_id) ||
+   nla_put_in6_addr(skb, LWTUNNEL_IP6_DST, &tun_info->key.u.ipv6.dst) ||
+   nla_put_in6_addr(skb, LWTUNNEL_IP6_SRC, &tun_info->key.u.ipv6.src) ||
+   nla_put_u8(skb, LWTUNNEL_IP6_HOPLIMIT, tun_info->key.ttl) ||
+   nla_put_u8(skb, LWTUNNEL_IP6_TC, tun_info->key.tos) ||
+   nla_put_u16(skb, LWTUNNEL_IP6_SPORT, tun_info->key.tp_src) ||
+   nla_put_u16(skb, LWTUNNEL_IP6_DPORT, tun_info->key.tp_dst) ||
+   nla_put_u16(skb, LWTUNNEL_IP6_FLAGS, tun_info->key.tun_flags))
+   return -ENOMEM;
+
+   return 0;
+}
+
+static int ip6_tun_encap_nlsize(struct lwtunnel_state *lwtstate)
+{
+   return nla_total_size(8)/* LWTUNNEL_IP6_ID */
+   +

RE: [PATCH net-next 2/3] qeth: Convert use of __constant_htons to htons

2015-08-20 Thread David Laight

From: Ursula Braun
> Sent: 19 August 2015 09:21
> In little endian cases, the macro htons unfolds to __swab16 which
> provides special case for constants. In big endian cases,
> __constant_htons and htons expand directly to the same expression.
> So, replace __constant_htons with htons with the goal of getting
> rid of the definition of __constant_htons completely.
...
> diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
> index 70eb2f6..ecfe622 100644
> --- a/drivers/s390/net/qeth_l3_main.c
> +++ b/drivers/s390/net/qeth_l3_main.c
> @@ -1887,13 +1887,13 @@ static inline int qeth_l3_rebuild_skb(struct qeth_card *card,
>   case QETH_CAST_MULTICAST:
>   switch (prot) {
>  #ifdef CONFIG_QETH_IPV6
> - case __constant_htons(ETH_P_IPV6):
> + case htons(ETH_P_IPV6):

I didn't think htons() was 'constant enough' to be used as a case label.

Using byteswapped constants in a case statement can change it from being
implemented as a jump table to a branch tree.
This might be more expensive than byteswapping the value (even on systems
that don't have cheap byteswap instructions).

David


[PATCH, net-next] r8169:Avoid to use I/O Space access on new design and others

2015-08-20 Thread Corcodel Marian

Avoid I/O space access on PCI Express interfaces, and do not enable bus
mastering, which is set by the BIOS. Warning: do not apply this patch on
its own; the previous patch must be applied first.
To: netdev@vger.kernel.org

Signed-off-by: Corcodel Marian corcodel.mar...@gmail.com

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index b1fb54f..6cd7226 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -8171,6 +8171,9 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (pci_is_pcie(pdev))
pci_intx(pdev, 0);
 
+   if (pci_is_pcie(pdev))
+   pci_write_config_word(pdev, PCI_COMMAND, ~PCI_COMMAND_IO);
+
/* Identify chip attached to board */
rtl8169_get_mac_version(tp, dev, cfg->default_ver);
 
@@ -8183,8 +8186,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
rtl_hw_reset(tp);
 
rtl_ack_events(tp, 0xffff);
-
-   pci_set_master(pdev);
+   if (!pci_is_pcie(pdev))
+   pci_set_master(pdev);
 
rtl_init_mdio_ops(tp);
rtl_init_pll_power_ops(tp);
-- 
2.1.4
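
For reference, clearing one bit of a 16-bit command register is normally
done as a read-modify-write. The sketch below works on a plain integer
rather than real PCI config space; `clear_io_bit()` is a hypothetical
helper, and only the three bit values are taken from the PCI command
register layout. In a driver the same idea would presumably be expressed
with pci_read_config_word() followed by pci_write_config_word().

```c
#include <assert.h>
#include <stdint.h>

/* Bit values of the PCI command register. */
#define PCI_COMMAND_IO		0x1	/* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY	0x2	/* respond to memory space accesses */
#define PCI_COMMAND_MASTER	0x4	/* bus mastering */

/* The complement on its own has every bit except the I/O bit set. */
static_assert((uint16_t)~PCI_COMMAND_IO == 0xFFFE, "mask, not a register value");

/* Read-modify-write: clear only the I/O bit and keep the other bits. */
static uint16_t clear_io_bit(uint16_t cmd)
{
	return cmd & (uint16_t)~PCI_COMMAND_IO;
}
```

With a starting value of `PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
PCI_COMMAND_MASTER`, `clear_io_bit()` leaves memory decoding and bus
mastering as they were and only turns off I/O decoding.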


RE: [PATCH net-next 2/3] qeth: Convert use of __constant_htons to htons

2015-08-20 Thread David Laight

From: Ursula Braun [mailto:ubr...@linux.vnet.ibm.com]
> Sent: 20 August 2015 12:44
> On Thu, 2015-08-20 at 10:46 +0000, David Laight wrote:
> > From: Ursula Braun
> > > Sent: 19 August 2015 09:21
> > > In little endian cases, the macro htons unfolds to __swab16 which
> > > provides special case for constants. In big endian cases,
> > > __constant_htons and htons expand directly to the same expression.
> > > So, replace __constant_htons with htons with the goal of getting
> > > rid of the definition of __constant_htons completely.
> > ...
> > > diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
> > > index 70eb2f6..ecfe622 100644
> > > --- a/drivers/s390/net/qeth_l3_main.c
> > > +++ b/drivers/s390/net/qeth_l3_main.c
> > > @@ -1887,13 +1887,13 @@ static inline int qeth_l3_rebuild_skb(struct qeth_card *card,
> > >   case QETH_CAST_MULTICAST:
> > >   switch (prot) {
> > >  #ifdef CONFIG_QETH_IPV6
> > > - case __constant_htons(ETH_P_IPV6):
> > > + case htons(ETH_P_IPV6):
> >
> > I didn't think htons() was 'constant enough' to be used as a case label.
> >
> > Using byteswapped constants in a case statement can change it from being
> > implemented as a jump table to a branch tree.
> > This might be more expensive than byteswapping the value (even on systems
> > that don't have cheap byteswap instructions).
> >
> > David
> >
> For big endian systems both __constant_htons(x) and htons(x) are
> resolved to ((__force __be16)(__u16)(x)). Thus I do not see a reason to
> reject the patch proposal from Vaishali Thakkar.

Look at a little-endian one (eg amd64).
I think you'll find a C ?: expression that uses __builtin_constant() to
select between an expression the compiler can evaluate and a call to an
inline function that uses some appropriate asm.
The latter isn't a compile-time constant so can't be used as a case
label or as an initialiser.

David
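
The macro shape described above can be sketched in user space as follows.
This is a minimal illustration assuming GCC or Clang and a little-endian
host; `my_htons`, `const_swab16`, `runtime_swab16` and `is_ipv6` are
hypothetical names, not the kernel's definitions. The macro picks a plain
arithmetic expression when its argument is a compile-time constant and
falls back to an inline function otherwise. Accepting the result as a case
label relies on the compiler folding the constant arm, which is compiler
behaviour rather than something ISO C promises.

```c
#include <assert.h>
#include <stdint.h>

/* Constant arm: plain arithmetic that the compiler can fold at build time. */
#define const_swab16(x) \
	((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | \
		    (((uint16_t)(x) & 0xff00u) >> 8)))

/* Runtime arm: an ordinary inline function (a real one might use asm). */
static inline uint16_t runtime_swab16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical little-endian htons(): pick an arm via __builtin_constant_p(). */
#define my_htons(x) \
	(__builtin_constant_p((uint16_t)(x)) ? const_swab16(x) : runtime_swab16(x))

#define ETH_P_IPV6 0x86DD

/* The constant arm is usable where a constant expression is required. */
static_assert(const_swab16(ETH_P_IPV6) == 0xDD86, "constant arm folds");

/* Returns 1 when a network-order protocol field holds ETH_P_IPV6. */
static int is_ipv6(uint16_t prot_be)
{
	switch (prot_be) {
	case my_htons(ETH_P_IPV6):	/* constant argument selects the constant arm */
		return 1;
	default:
		return 0;
	}
}
```

With a non-constant argument the same macro expands to the function call,
which is the arm that cannot serve as a case label or static initialiser.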

Re: [PATCH] can: flexcan: demote register output to debug level

2015-08-20 Thread Marc Kleine-Budde

On 08/07/2015 05:16 PM, Lucas Stach wrote:
> This message isn't really helpful for the general reader of the kernel
> logs, so should not be printed with info level. All other register
> programming outputs in the flexcan driver already use the debug level.
>
> Signed-off-by: Lucas Stach l.st...@pengutronix.de

Added to can-next.

Thanks,
Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




RE: [PATCH net-next 2/3] qeth: Convert use of __constant_htons to htons

2015-08-20 Thread Ursula Braun

On Thu, 2015-08-20 at 10:46 +0000, David Laight wrote:
> From: Ursula Braun
> > Sent: 19 August 2015 09:21
> > In little endian cases, the macro htons unfolds to __swab16 which
> > provides special case for constants. In big endian cases,
> > __constant_htons and htons expand directly to the same expression.
> > So, replace __constant_htons with htons with the goal of getting
> > rid of the definition of __constant_htons completely.
> ...
> > diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
> > index 70eb2f6..ecfe622 100644
> > --- a/drivers/s390/net/qeth_l3_main.c
> > +++ b/drivers/s390/net/qeth_l3_main.c
> > @@ -1887,13 +1887,13 @@ static inline int qeth_l3_rebuild_skb(struct qeth_card *card,
> >   case QETH_CAST_MULTICAST:
> >   switch (prot) {
> >  #ifdef CONFIG_QETH_IPV6
> > - case __constant_htons(ETH_P_IPV6):
> > + case htons(ETH_P_IPV6):
>
> I didn't think htons() was 'constant enough' to be used as a case label.
>
> Using byteswapped constants in a case statement can change it from being
> implemented as a jump table to a branch tree.
> This might be more expensive than byteswapping the value (even on systems
> that don't have cheap byteswap instructions).
>
> David
>
For big endian systems both __constant_htons(x) and htons(x) are
resolved to ((__force __be16)(__u16)(x)). Thus I do not see a reason to
reject the patch proposal from Vaishali Thakkar.

Ursula


[PATCH v3 net-next 01/13] ip_tunnels: remove custom alignment and packing

2015-08-20 Thread Jiri Benc

The custom alignment of struct ip_tunnel_key is unnecessary. In struct
sw_flow_key, it starts at offset 256, in struct ip_tunnel_info it's the
first field.

The structure is also packed even without the __packed keyword.

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 include/net/ip_tunnels.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 984dbfa15e13..81cf11c931e4 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -36,7 +36,7 @@ struct ip_tunnel_key {
__u8ipv4_ttl;
__be16  tp_src;
__be16  tp_dst;
-} __packed __aligned(4); /* Minimize padding. */
+};
 
 /* Indicates whether the tunnel info structure represents receive
  * or transmit tunnel parameters.
-- 
1.8.3.1
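
The claim that the structure is already free of padding can be checked in
user space. The sketch below mirrors the field order shown in the
ip_tunnel_key hunks of this series, with fixed-width types standing in for
__be64, __be32, __be16 and __u8; the `demo_*` names are hypothetical, and
the attribute spelling assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Field list mirrored from the ip_tunnel_key hunks in this series. */
#define DEMO_KEY_FIELDS		\
	uint64_t tun_id;	\
	uint32_t ipv4_src;	\
	uint32_t ipv4_dst;	\
	uint16_t tun_flags;	\
	uint8_t  ipv4_tos;	\
	uint8_t  ipv4_ttl;	\
	uint16_t tp_src;	\
	uint16_t tp_dst;

/* Natural layout, as after the patch. */
struct demo_key_natural { DEMO_KEY_FIELDS };

/* Layout with the attributes that the patch removes. */
struct demo_key_packed { DEMO_KEY_FIELDS } __attribute__((packed, aligned(4)));

/* 8 + 4 + 4 + 2 + 1 + 1 + 2 + 2 = 24 bytes, with no holes between fields. */
static_assert(sizeof(struct demo_key_natural) == sizeof(struct demo_key_packed),
	      "packing does not change the size");
```

Every field already sits on its natural boundary, so the two layouts have
the same size; only the alignment requirement of the whole structure
differs between them.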


[PATCH v3 net-next 03/13] ip_tunnels: use offsetofend

2015-08-20 Thread Jiri Benc

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 include/net/ip_tunnels.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index ca173f22f07f..cc3b39e9010b 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -23,9 +23,7 @@
 #define IPTUNNEL_ERR_TIMEO (30*HZ)
 
 /* Used to memset ip_tunnel padding. */
-#define IP_TUNNEL_KEY_SIZE \
-   (offsetof(struct ip_tunnel_key, tp_dst) +   \
-FIELD_SIZEOF(struct ip_tunnel_key, tp_dst))
+#define IP_TUNNEL_KEY_SIZE offsetofend(struct ip_tunnel_key, tp_dst)
 
 struct ip_tunnel_key {
__be64  tun_id;
-- 
1.8.3.1
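
The equivalence behind this change is easy to confirm outside the kernel.
In the sketch below, `FIELD_SIZEOF` and `offsetofend` are user-space
stand-ins written to match how the patch uses them, and `demo_key` is a
hypothetical miniature structure in which only the last field matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel helpers named in the patch. */
#define FIELD_SIZEOF(t, f)	(sizeof(((t *)0)->f))
#define offsetofend(t, f)	(offsetof(t, f) + FIELD_SIZEOF(t, f))

/* Hypothetical miniature key; the tail is what the macro measures. */
struct demo_key {
	uint64_t tun_id;
	uint16_t tp_src;
	uint16_t tp_dst;
};

/* Old spelling of the macro body, as removed by the patch. */
#define DEMO_KEY_SIZE_OLD \
	(offsetof(struct demo_key, tp_dst) + FIELD_SIZEOF(struct demo_key, tp_dst))

/* New spelling, as added by the patch. */
#define DEMO_KEY_SIZE_NEW offsetofend(struct demo_key, tp_dst)

static_assert(DEMO_KEY_SIZE_OLD == DEMO_KEY_SIZE_NEW, "both spellings agree");
```

Both macros give the offset of the first byte after `tp_dst`, which can be
smaller than `sizeof(struct demo_key)` when the compiler adds trailing
padding.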


[PATCH v3 net-next 00/13] lwtunnel: per route ipv6 support for vxlan

2015-08-20 Thread Jiri Benc

v3: Moved LWTUNNEL_ENCAP_IP6 definition in patch 13.
v2: Fixed issues in patch 4 pointed out by Alexei.

This series enables IPv6 tunnels based on lwtunnel infrastructure. Only
vxlan is supported for now.

Tested in all combinations of IPv4 over IPv6, IPv6 over IPv4 and IPv6 over
IPv6.

Jiri Benc (13):
  ip_tunnels: remove custom alignment and packing
  ip_tunnels: use u8/u16/u32
  ip_tunnels: use offsetofend
  ip_tunnels: add IPv6 addresses to ip_tunnel_key
  ip_tunnels: use tos and ttl fields also for IPv6
  route: move lwtunnel state to dst_entry
  ipv6: drop metadata dst in ip6_route_input
  ipv6: ndisc: inherit metadata dst when creating ndisc requests
  vxlan: provide access function for vxlan socket address family
  vxlan: do not shadow flags variable
  vxlan: metadata based tunneling for IPv6
  ipv6: route: extend flow representation with tunnel key
  ipv6: route: per route IP tunnel metadata via lightweight tunnel

 drivers/net/vrf.c  |   1 -
 drivers/net/vxlan.c|  89 +--
 include/net/dst.h  |   3 +-
 include/net/dst_metadata.h |  15 ++
 include/net/flow.h |   1 +
 include/net/ip6_fib.h  |   1 -
 include/net/ip_tunnels.h   |  50 ++---
 include/net/lwtunnel.h |  12 -
 include/net/ndisc.h|   3 +-
 include/net/route.h|   1 -
 include/net/vxlan.h|   5 ++
 include/uapi/linux/lwtunnel.h  |  16 ++
 net/core/dst.c |   3 ++
 net/core/filter.c  |   6 +--
 net/core/lwtunnel.c|  70 
 net/ipv4/ip_gre.c  |  20 +++
 net/ipv4/ip_tunnel_core.c  | 118 ++---
 net/ipv4/route.c   |  20 +++
 net/ipv6/addrconf.c|   2 +-
 net/ipv6/ila.c |  14 ++---
 net/ipv6/ip6_fib.c |   1 -
 net/ipv6/ndisc.c   |  10 ++--
 net/ipv6/route.c   |  29 ++
 net/mpls/mpls_iptunnel.c   |   7 +--
 net/openvswitch/flow_netlink.c |  28 +-
 net/openvswitch/flow_table.c   |   2 +-
 net/openvswitch/vport-geneve.c |   4 +-
 net/openvswitch/vport-netdev.c |   2 +-
 net/openvswitch/vport.c|   6 +--
 net/openvswitch/vport.h|   6 +--
 30 files changed, 312 insertions(+), 233 deletions(-)

-- 
1.8.3.1


[PATCH v3 net-next 05/13] ip_tunnels: use tos and ttl fields also for IPv6

2015-08-20 Thread Jiri Benc

Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll
be used with IPv6 tunnels, too.

Signed-off-by: Jiri Benc jb...@redhat.com
Acked-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vxlan.c|  8 
 include/net/ip_tunnels.h   |  8 
 net/ipv4/ip_gre.c  |  8 
 net/ipv4/ip_tunnel_core.c  |  8 
 net/openvswitch/flow_netlink.c | 10 +-
 net/openvswitch/vport-geneve.c |  4 ++--
 net/openvswitch/vport.c|  4 ++--
 net/openvswitch/vport.h|  2 +-
 8 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 30a7abcf2c09..ebeb3def06c5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1278,8 +1278,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		info = &tun_dst->u.tun_info;
 		info->key.u.ipv4.src = iph->saddr;
 		info->key.u.ipv4.dst = iph->daddr;
-		info->key.ipv4_tos = iph->tos;
-		info->key.ipv4_ttl = iph->ttl;
+		info->key.tos = iph->tos;
+		info->key.ttl = iph->ttl;
 		info->key.tp_src = udp_hdr(skb)->source;
 		info->key.tp_dst = udp_hdr(skb)->dest;
 
@@ -1960,8 +1960,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		else
 			flags &= ~VXLAN_F_UDP_CSUM;
 
-		ttl = info->key.ipv4_ttl;
-		tos = info->key.ipv4_tos;
+		ttl = info->key.ttl;
+		tos = info->key.tos;
 
 		if (info->options_len)
 			md = ip_tunnel_info_opts(info, sizeof(*md));
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 6a51371dad00..224e4ecec91b 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -44,8 +44,8 @@ struct ip_tunnel_key {
} ipv6;
} u;
__be16  tun_flags;
-   u8  ipv4_tos;
-   u8  ipv4_ttl;
+   u8  tos;/* TOS for IPv4, TC for IPv6 */
+   u8  ttl;/* TTL for IPv4, HL for IPv6 */
__be16  tp_src;
__be16  tp_dst;
 };
@@ -195,8 +195,8 @@ static inline void __ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
 	tun_info->key.u.ipv4.dst = daddr;
 	memset((unsigned char *)&tun_info->key + IP_TUNNEL_KEY_IPV4_PAD,
 	       0, IP_TUNNEL_KEY_IPV4_PAD_LEN);
-	tun_info->key.ipv4_tos = tos;
-	tun_info->key.ipv4_ttl = ttl;
+	tun_info->key.tos = tos;
+	tun_info->key.ttl = ttl;
 	tun_info->key.tun_flags = tun_flags;
 
 	/* For the tunnel types on the top of IPsec, the tp_src and tp_dst of
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index b7bb7d6aa7a8..5193618b2600 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -409,8 +409,8 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
 		info = &tun_dst->u.tun_info;
 		info->key.u.ipv4.src = iph->saddr;
 		info->key.u.ipv4.dst = iph->daddr;
-		info->key.ipv4_tos = iph->tos;
-		info->key.ipv4_ttl = iph->ttl;
+		info->key.tos = iph->tos;
+		info->key.ttl = iph->ttl;
 
 		info->mode = IP_TUNNEL_INFO_RX;
 		info->key.tun_flags = tpi->flags &
@@ -529,7 +529,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 	memset(&fl, 0, sizeof(fl));
 	fl.daddr = key->u.ipv4.dst;
 	fl.saddr = key->u.ipv4.src;
-	fl.flowi4_tos = RT_TOS(key->ipv4_tos);
+	fl.flowi4_tos = RT_TOS(key->tos);
 	fl.flowi4_mark = skb->mark;
 	fl.flowi4_proto = IPPROTO_GRE;
 
@@ -565,7 +565,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ?  htons(IP_DF) : 0;
 	err = iptunnel_xmit(skb->sk, rt, skb, fl.saddr,
 			    key->u.ipv4.dst, IPPROTO_GRE,
-			    key->ipv4_tos, key->ipv4_ttl, df, false);
+			    key->tos, key->ttl, df, false);
 	iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
 	return;
 
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 93907d71cda6..f0514e39e57c 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -233,10 +233,10 @@ static int ip_tun_build_state(struct net_device *dev, struct nlattr *attr,
 		tun_info->key.u.ipv4.src = nla_get_be32(tb[LWTUNNEL_IP_SRC]);
 
 	if (tb[LWTUNNEL_IP_TTL])
-		tun_info->key.ipv4_ttl = nla_get_u8(tb[LWTUNNEL_IP_TTL]);
+		tun_info->key.ttl = nla_get_u8(tb[LWTUNNEL_IP_TTL]);
 
 	if (tb[LWTUNNEL_IP_TOS])
-

Re: [PATCH] veth: replace iflink by a dedicated symlink in sysfs

2015-08-20 Thread Jiri Benc

On Wed, 19 Aug 2015 18:33:14 +0200, Nicolas Dichtel wrote:
> Probably better to introduce veth netlink attribute then, something like
> IFLA_VETH_PEER and keeps IFLA_LINK_NETNSID.

I'd prefer IFLA_PEER. More generic attribute will be helpful should we
introduce an interface similar to veth in the future.

Also, I'd not combine IFLA_LINK_NETNSID with IFLA_PEER. There might
very well be an interface in the future that will need both IFLA_LINK and
IFLA_PEER and this would just create a confusion. It may be unlikely
but the attributes are cheap and it doesn't make sense to design uAPI
in a way that might bring problems in the future.

> I also don't know what is the best way to handle this. veth advertises
> its peer via IFLA_LINK since 4.1, so it's too late to change it for this
> release.

Apparently we need to pick our poison. Either way, we break something.

 Jiri

-- 
Jiri Benc

pull-request: can-next 2015-05-06

2015-08-20 Thread Marc Kleine-Budde

Hello David,

this is a pull request of two patches for net-next.

The first patch is by Nik Nyby and fixes a typo in a function name. The
second patch by Lucas Stach demotes register output to debug level.

regards,
Marc

---

The following changes since commit 824e7383e92815cb591793c74cc836aa5165f7f8:

  lwtunnel: Fix the sparse warnings in fib_encap_match (2015-08-19 17:37:51 
-0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git 
tags/linux-can-next-for-4.3-20150820

for you to fetch changes up to 7a4b6c860e7268a79545c30882928b234dd1655d:

  can: flexcan: demote register output to debug level (2015-08-20 10:52:54 
+0200)


linux-can-next-for-4.3-20150820


Lucas Stach (1):
  can: flexcan: demote register output to debug level

Nik Nyby (1):
  can: gs_usb: Fix typo in function name

 drivers/net/can/flexcan.c| 2 +-
 drivers/net/can/usb/gs_usb.c | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




Re: [PATCH v2.2 04/22] fjes: platform_driver's .probe and .remove routine

2015-08-20 Thread David Miller

From: Taku Izumi izumi.t...@jp.fujitsu.com
Date: Thu, 20 Aug 2015 17:46:08 +0900

> +
> +err_register:
> +	fjes_hw_exit(&adapter->hw);
> +err_hw_init:
> +err_sw_init:
> +	free_netdev(netdev);
> +err_alloc_netdev:
> +	return err;

Having multiple code labels in the same exact spot is suboptimal.

Instead, name the labels such that they describe the first cleanup
action they will perform, instead of the context in which they are
jumped to from.

So err_hw_exit:, err_free_netdev:, and err_out: would be
appropriate.
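
To make the naming advice concrete, here is a minimal userspace sketch of an unwind path whose labels say what cleanup they start with. Everything in it (struct demo_netdev, struct demo_trace, enum demo_fail, demo_probe()) is hypothetical and only loosely mirrors the quoted probe routine; it is not the fjes code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's device object. */
struct demo_netdev {
	int hw_ready;
};

/* Counts cleanup actions so a caller can see which ones ran. */
struct demo_trace {
	int hw_exits;
	int frees;
};

/* Which setup step should fail in this run of the sketch. */
enum demo_fail {
	DEMO_FAIL_NONE,
	DEMO_FAIL_ALLOC,
	DEMO_FAIL_SW_INIT,
	DEMO_FAIL_HW_INIT,
	DEMO_FAIL_REGISTER,
};

static int demo_probe(enum demo_fail fail, struct demo_trace *trace)
{
	struct demo_netdev *netdev;
	int err;

	netdev = (fail == DEMO_FAIL_ALLOC) ? NULL : calloc(1, sizeof(*netdev));
	if (!netdev) {
		err = -ENOMEM;
		goto err_out;
	}

	/* Software or hardware init failed: only the netdev needs freeing. */
	if (fail == DEMO_FAIL_SW_INIT || fail == DEMO_FAIL_HW_INIT) {
		err = -EIO;
		goto err_free_netdev;
	}
	netdev->hw_ready = 1;

	/* Registration failed after the hardware came up: shut it down first. */
	if (fail == DEMO_FAIL_REGISTER) {
		err = -EIO;
		goto err_hw_exit;
	}

	/* Success. A real driver keeps netdev until remove(); the sketch
	 * frees it here only to avoid leaking memory.
	 */
	free(netdev);
	return 0;

err_hw_exit:
	netdev->hw_ready = 0;
	trace->hw_exits++;
err_free_netdev:
	free(netdev);
	trace->frees++;
err_out:
	return err;
}
```

With this naming each goto reads as "start cleaning up from here", and no two labels share a spot: a registration failure runs both cleanups, an init failure only frees the netdev, and an allocation failure runs neither.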

Re: [PATCH v2.2 01/22] fjes: Introduce FUJITSU Extended Socket Network Device driver

2015-08-20 Thread David Miller

From: Taku Izumi izumi.t...@jp.fujitsu.com
Date: Thu, 20 Aug 2015 17:46:05 +0900

> +obj-$(CONFIG_FUJITSU_ES) += fjes.o
> +
> +fjes-objs := fjes_main.o
> +

Please do not have trailing empty lines in any files you add
or edit, 'git' warns about this even when applying patches.

> +static int fjes_acpi_add(struct acpi_device *device)
> +{
> +	acpi_status status;
> +	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL};
> +	union acpi_object *str;
> +	char str_buf[sizeof(FJES_ACPI_SYMBOL) + 1];
> +	int result;
> +	struct platform_device *plat_dev;

Please order your local variables in reverse christmas tree order, which
means longer lines come before shorter ones.

Please correct this problem in your entire submission, as I am not going
to point out each and every other place where this problem exists.
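
For illustration, the quoted declarations could be reordered as below, longest line first. This is only a sketch: the types and macros are hypothetical stand-ins defined here so the snippet builds outside the kernel, the FJES_ACPI_SYMBOL value is a placeholder, and the function body is invented just to use the variables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins so the declarations compile outside the kernel. */
#define ACPI_ALLOCATE_BUFFER	((size_t)-1)
#define FJES_ACPI_SYMBOL	"PLACEHOLDER"	/* placeholder value */
typedef unsigned int acpi_status;
struct acpi_buffer { size_t length; void *pointer; };
union acpi_object { int type; };
struct platform_device { int id; };
struct acpi_device { int handle; };

static int demo_acpi_add(struct acpi_device *device)
{
	/* Reverse christmas tree: each declaration line is no longer
	 * than the one above it.
	 */
	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
	char str_buf[sizeof(FJES_ACPI_SYMBOL) + 1];
	struct platform_device *plat_dev;
	union acpi_object *str;
	acpi_status status;
	int result;

	/* Invented body: touch every variable so the sketch builds cleanly. */
	(void)device;
	plat_dev = NULL;
	str = NULL;
	status = 0;
	memset(str_buf, 0, sizeof(str_buf));
	result = (!buffer.pointer && !plat_dev && !str && !status &&
		  str_buf[0] == '\0') ? 0 : -1;
	return result;
}
```

Only the order of the declarations changes; initializers and types stay exactly as they were in the submission.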

[GIT] [4.3] NFC update

2015-08-20 Thread Samuel Ortiz

Hi David,

This is the NFC pull request for 4.3.
With this one we have:

- A new driver for Samsung's S3FWRN5 NFC chipset. In order to
  properly support this driver, a few NCI core routines needed
  to be exported. Future drivers like Intel's Fields Peak will
  benefit from this.

- SPI support as a physical transport for STM st21nfcb.

- An additional netlink API for sending replies back to userspace
  from vendor commands.

- 2 small fixes for TI's trf7970a

- A few st-nci fixes.


The following changes since commit d52736e24fe2e927c26817256f8d1a3c8b5d51a0:

  Merge branch 'vrf-lite' (2015-08-13 22:43:22 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next.git 
tags/nfc-next-4.3-1

for you to fetch changes up to 29e76924cf087bc6a9114a9244828fd13ae959bb:

  nfc: netlink: Add capability to reply to vendor_cmd with data (2015-08-20 
22:00:11 +0200)


Christophe Ricard (14):
  nfc: st-nci: Remove duplicate file platform_data/st_nci.h
  nfc: st-nci: Fix typo when changing from st21nfcb to st-nci
  nfc: st-nci: Fix non accurate comment for st_nci_i2c_read
  NFC: st21nfca: fix use of uninitialized variables in error path
  NFC: st-nci: fix use of uninitialized variables in error path
  nfc: st-nci: Remove data from ack_pending_q when receiving a SYNC_ACK
  nfc: st-nci: Free data with irrelevant NDLC PCB_SYNC value
  nfc: st-nci: Add spi phy support for st21nfcb
  nfc: st-nci: Add device tree documentation for spi phy
  nfc: st-nci: Remove pr_err in rcv_queue when ndlc header is unknown
  nfc: netlink: Add check on NFC_ATTR_VENDOR_DATA
  nfc: netlink: Warning fix
  nfc: nci: hci: Add check on skb nci_hci_send_cmd parameter
  nfc: netlink: Add capability to reply to vendor_cmd with data

Mark Greer (2):
  NFC: trf7970a: SDD_EN is bit 5 not bit 3
  NFC: trf7970a: Add NULL check to clear up smatch warning

Robert Baldyga (3):
  NFC: nci: Add post_setup handler
  NFC: nci: export nci_core_reset and nci_core_init
  nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip

 .../devicetree/bindings/net/nfc/s3fwrn5.txt|  27 ++
 .../net/nfc/{st-nci.txt => st-nci-i2c.txt} |   0
 .../devicetree/bindings/net/nfc/st-nci-spi.txt |  31 ++
 MAINTAINERS|   6 +
 drivers/nfc/Kconfig|   1 +
 drivers/nfc/Makefile   |   1 +
 drivers/nfc/s3fwrn5/Kconfig|  19 +
 drivers/nfc/s3fwrn5/Makefile   |  11 +
 drivers/nfc/s3fwrn5/core.c | 219 +
 drivers/nfc/s3fwrn5/firmware.c | 511 +
 drivers/nfc/s3fwrn5/firmware.h | 111 +
 drivers/nfc/s3fwrn5/i2c.c  | 306 
 drivers/nfc/s3fwrn5/nci.c  | 165 +++
 drivers/nfc/s3fwrn5/nci.h  |  89 
 drivers/nfc/s3fwrn5/s3fwrn5.h  |  99 
 drivers/nfc/st-nci/Kconfig |  11 +
 drivers/nfc/st-nci/Makefile|   3 +
 drivers/nfc/st-nci/i2c.c   |  23 +-
 drivers/nfc/st-nci/ndlc.c  |   7 +-
 drivers/nfc/st-nci/spi.c   | 392 
 drivers/nfc/st-nci/st-nci_se.c |   8 +-
 drivers/nfc/st21nfca/st21nfca.c|  11 +-
 drivers/nfc/trf7970a.c |   6 +-
 include/linux/platform_data/st_nci.h   |  29 --
 include/net/nfc/nci_core.h |   3 +
 include/net/nfc/nfc.h  |  41 ++
 net/nfc/nci/core.c |  18 +
 net/nfc/nci/hci.c  |   2 +-
 net/nfc/netlink.c  |  91 +++-
 29 files changed, 2180 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/nfc/s3fwrn5.txt
 rename Documentation/devicetree/bindings/net/nfc/{st-nci.txt => st-nci-i2c.txt} (100%)
 create mode 100644 Documentation/devicetree/bindings/net/nfc/st-nci-spi.txt
 create mode 100644 drivers/nfc/s3fwrn5/Kconfig
 create mode 100644 drivers/nfc/s3fwrn5/Makefile
 create mode 100644 drivers/nfc/s3fwrn5/core.c
 create mode 100644 drivers/nfc/s3fwrn5/firmware.c
 create mode 100644 drivers/nfc/s3fwrn5/firmware.h
 create mode 100644 drivers/nfc/s3fwrn5/i2c.c
 create mode 100644 drivers/nfc/s3fwrn5/nci.c
 create mode 100644 drivers/nfc/s3fwrn5/nci.h
 create mode 100644 drivers/nfc/s3fwrn5/s3fwrn5.h
 create mode 100644 drivers/nfc/st-nci/spi.c
 delete mode 100644 include/linux/platform_data/st_nci.h

[PATCH net] net: bcmgenet: Avoid sleeping in bcmgenet_timeout

2015-08-20 Thread Florian Fainelli

bcmgenet_timeout() executes in atomic context, yet we will invoke
napi_disable() which does sleep. Looking back at the changes, disabling
TX napi and re-enabling it is completely useless, since we reclaim all
TX buffers and re-enable interrupts, and wake up the TX queues.

Fixes: 13ea657806cf ("net: bcmgenet: improve TX timeout")
Signed-off-by: Florian Fainelli f.faine...@gmail.com
---
Hi David,

Sorry this is coming very late; if this does not make it to 4.2, can you
queue this for the 4.2.1 stable tree when it shows up?

Thank you!

 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 64c1e9db6b0b..12a020c105bb 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2820,8 +2820,6 @@ static void bcmgenet_timeout(struct net_device *dev)
 
 	netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
 
-	bcmgenet_disable_tx_napi(priv);
-
 	for (q = 0; q < priv->hw_params->tx_queues; q++)
 		bcmgenet_dump_tx_queue(&priv->tx_rings[q]);
 	bcmgenet_dump_tx_queue(&priv->tx_rings[DESC_INDEX]);
@@ -2837,8 +2835,6 @@ static void bcmgenet_timeout(struct net_device *dev)
 	bcmgenet_intrl2_0_writel(priv, int0_enable, INTRL2_CPU_MASK_CLEAR);
 	bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
 
-	bcmgenet_enable_tx_napi(priv);
-
 	dev->trans_start = jiffies;
 
 	dev->stats.tx_errors++;
-- 
2.1.0


Re: [PATCH net-next RFC 00/10] socket sendmsg MSG_ZEROCOPY

2015-08-20 Thread David Miller

From: Willem de Bruijn will...@google.com
Date: Thu, 20 Aug 2015 10:36:39 -0400

> Datapath integrity does not otherwise depend on payload, with three
> exceptions: checksums, optional sk_filter/tc u32/.. and device +
> driver logic. The effect of wrong checksums is limited to the
> misbehaving process. Filters may have to be addressed by inserting a
> preventative skb_copy_ubufs(). Device drivers can be whitelisted,
> similar to scatter-gather support (NETIF_F_SG).

Consider a userland NFS implementation sending over loopback while
constantly modifying the page.  The sunrpc code could be tricked into
seeing one thing during validation of the RPC headers then doing
another after the user makes changes.

I really don't think this is completely safe as-is.
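
The concern is a check-then-use race on memory the sender can still write. A minimal single-threaded simulation of that pattern follows; struct demo_hdr, the DEMO_PROC_* values and all function names are hypothetical, the "racer" callback merely stands in for the user modifying the page at the worst moment, and none of this is sunrpc or zerocopy code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request header; not the sunrpc wire format. */
struct demo_hdr {
	uint32_t proc;
	uint32_t len;
};

#define DEMO_PROC_READ	1u
#define DEMO_PROC_WRITE	2u

/* Stands in for the user rewriting the page between check and use. */
typedef void (*demo_racer_fn)(struct demo_hdr *page);

/* Validation step: only short READ requests are allowed. */
static int demo_hdr_allowed(const struct demo_hdr *h)
{
	return h->proc == DEMO_PROC_READ && h->len <= 64;
}

/* Unsafe: validates, then re-reads the same user-visible memory.
 * Returns the procedure that is actually acted on, or 0 if rejected.
 */
static uint32_t demo_handle_in_place(struct demo_hdr *page, demo_racer_fn racer)
{
	if (!demo_hdr_allowed(page))
		return 0;
	if (racer)
		racer(page);	/* page changes after validation */
	return page->proc;
}

/* Safer: take a private snapshot first, then validate and use only that. */
static uint32_t demo_handle_snapshot(struct demo_hdr *page, demo_racer_fn racer)
{
	struct demo_hdr copy;

	memcpy(&copy, page, sizeof(copy));
	if (!demo_hdr_allowed(&copy))
		return 0;
	if (racer)
		racer(page);	/* later changes no longer matter */
	return copy.proc;
}

/* Example racer: turn the validated READ into a WRITE. */
static void demo_flip_to_write(struct demo_hdr *page)
{
	page->proc = DEMO_PROC_WRITE;
}
```

In the in-place variant the header passes validation as a READ but is acted on as a WRITE; the snapshot variant keeps acting on the READ it validated. Copying before validating is the same general idea as the preventative skb_copy_ubufs() mentioned in the quoted text, at the cost of giving up the zero-copy benefit on that path.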

Re: [PATCH v2.2 02/22] fjes: Hardware initialization routine

2015-08-20 Thread David Miller

From: Taku Izumi izumi.t...@jp.fujitsu.com
Date: Thu, 20 Aug 2015 17:46:06 +0900

> diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
> index 4622da1..15ded96 100644
> --- a/drivers/net/fjes/fjes.h
> +++ b/drivers/net/fjes/fjes.h
> @@ -28,6 +28,6 @@
>  
>  extern char fjes_driver_name[];
>  extern char fjes_driver_version[];
> -extern u32 fjes_support_mtu[];
> +extern const u32 fjes_support_mtu[];
>  
>  #endif /* FJES_H_ */

This is kind of ridiculous.  Just declare it 'const' from the start in the
patch where you add it for the first time.

Re: [PATCH] net af_key: Fix RCU splat

2015-08-20 Thread David Ahern


On 8/20/15 9:51 AM, Eric Dumazet wrote:

On Thu, 2015-08-20 at 08:51 -0700, David Ahern wrote:

Hit the following splat testing VRF change for ipsec:

[  113.475692] ===
[  113.476194] [ INFO: suspicious RCU usage. ]
[  113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED 
Not tainted
[  113.477545] ---
[  113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 
Illegal context switch in RCU read-side critical section!
[  113.479288]
[  113.479288] other info that might help us debug this:
[  113.479288]
[  113.480207]
[  113.480207] rcu_scheduler_active = 1, debug_locks = 1
[  113.480931] 2 locks held by setkey/6829:
[  113.481371]  #0:  (net-xfrm.xfrm_cfg_mutex){+.+.+.}, at: 
[814e9887] pfkey_sendmsg+0xfb/0x213
[  113.482509]  #1:  (rcu_read_lock){..}, at: [814e767f] 
rcu_read_lock+0x0/0x6e
[  113.483509]
[  113.483509] stack backtrace:
[  113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 
4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED
[  113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[  113.486845]  0001 88001d4c7a98 81518af2 
81086962
[  113.487732]  88001d538480 88001d4c7ac8 8107ae75 
8180a154
[  113.488628]  0b30  00d0 
88001d4c7ad8
[  113.489525] Call Trace:
[  113.489813]  [81518af2] dump_stack+0x4c/0x65
[  113.490389]  [81086962] ? console_unlock+0x3d6/0x405
[  113.491039]  [8107ae75] lockdep_rcu_suspicious+0xfa/0x103
[  113.491735]  [81064032] rcu_preempt_sleep_check+0x45/0x47
[  113.492442]  [8106404d] ___might_sleep+0x19/0x1c8
[  113.493077]  [81064268] __might_sleep+0x6c/0x82
[  113.493681]  [81133190] 
cache_alloc_debugcheck_before.isra.50+0x1d/0x24
[  113.494508]  [81134876] kmem_cache_alloc+0x31/0x18f
[  113.495149]  [814012b5] skb_clone+0x64/0x80
[  113.495712]  [814e6f71] pfkey_broadcast_one+0x3d/0xff
[  113.496380]  [814e7b84] pfkey_broadcast+0xb5/0x11e
[  113.497024]  [814e82d1] pfkey_register+0x191/0x1b1
[  113.497653]  [814e9770] pfkey_process+0x162/0x17e
[  113.498274]  [814e9895] pfkey_sendmsg+0x109/0x213

In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
the RCU lock. Fix by using GFP_ATOMIC for the allocation flag.

Signed-off-by: David Ahern d...@cumulusnetworks.com
---
  net/key/af_key.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index b397f0aa9005..73527e7dd247 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1670,7 +1670,7 @@ static int pfkey_register(struct sock *sk, struct sk_buff 
*skb, const struct sad
return -ENOBUFS;
}

-   pfkey_broadcast(supp_skb, GFP_KERNEL, BROADCAST_REGISTERED, sk, 
sock_net(sk));
+   pfkey_broadcast(supp_skb, GFP_ATOMIC, BROADCAST_REGISTERED, sk, 
sock_net(sk));

return 0;
  }


I would rather remove the useless rcu locking from pfkey_broadcast() if
a mutex properly protects the thing.


rcu_read_lock was added by Stephen with 7f6b9dbd5afbd. It does not
appear that the net->xfrm.xfrm_cfg_mutex mutex added by 283bc9f35bbbc
properly covers the locking, i.e., the rcu_read_lock is needed.


David

[PATCH net-next] 3c59x: Add BQL support for 3c59x ethernet driver.

2015-08-20 Thread Loganaden Velvindron

This BQL patch is based on work done by Tino Reichardt.

Tested on :05:00.0: 3Com PCI 3c905C Tornado at c9e6e000 by running
Flent several times.


Signed-off-by: Loganaden Velvindron lo...@elandsys.com
---
 drivers/net/ethernet/3com/3c59x.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c 
b/drivers/net/ethernet/3com/3c59x.c
index 753887d..2839af0 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -1726,6 +1726,7 @@ vortex_up(struct net_device *dev)
 	if (vp->cb_fn_base)			/* The PCMCIA people are idiots.  */
 		iowrite32(0x8000, vp->cb_fn_base + 4);
 	netif_start_queue (dev);
+	netdev_reset_queue(dev);
 err_out:
 	return err;
 }
@@ -1935,16 +1936,18 @@ static void vortex_tx_timeout(struct net_device *dev)
 		if (vp->cur_tx - vp->dirty_tx > 0 && ioread32(ioaddr + DownListPtr) == 0)
 			iowrite32(vp->tx_ring_dma + (vp->dirty_tx % TX_RING_SIZE) * sizeof(struct boom_tx_desc),
 				 ioaddr + DownListPtr);
-		if (vp->cur_tx - vp->dirty_tx < TX_RING_SIZE)
+		if (vp->cur_tx - vp->dirty_tx < TX_RING_SIZE) {
 			netif_wake_queue (dev);
+			netdev_reset_queue (dev);
+		}
 		if (vp->drv_flags & IS_BOOMERANG)
 			iowrite8(PKT_BUF_SZ>>8, ioaddr + TxFreeThreshold);
 		iowrite16(DownUnstall, ioaddr + EL3_CMD);
 	} else {
 		dev->stats.tx_dropped++;
 		netif_wake_queue(dev);
+		netdev_reset_queue(dev);
 	}
-
 	/* Issue Tx Enable */
 	iowrite16(TxEnable, ioaddr + EL3_CMD);
 	dev->trans_start = jiffies; /* prevent tx timeout */
@@ -2063,6 +2066,7 @@ vortex_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct vortex_private *vp = netdev_priv(dev);
 	void __iomem *ioaddr = vp->ioaddr;
+	int skblen = skb->len;
 
 	/* Put out the doubleword header... */
 	iowrite32(skb->len, ioaddr + TX_FIFO);
@@ -2094,6 +2098,7 @@ vortex_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
+	netdev_sent_queue(dev, skblen);
 
 	/* Clear the Tx status stack. */
 	{
@@ -2125,6 +2130,7 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	void __iomem *ioaddr = vp->ioaddr;
 	/* Calculate the next Tx descriptor entry. */
 	int entry = vp->cur_tx % TX_RING_SIZE;
+	int skblen = skb->len;
 	struct boom_tx_desc *prev_entry = &vp->tx_ring[(vp->cur_tx-1) % TX_RING_SIZE];
 	unsigned long flags;
 	dma_addr_t dma_addr;
@@ -2230,6 +2236,8 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	vp->cur_tx++;
+	netdev_sent_queue(dev, skblen);
+
 	if (vp->cur_tx - vp->dirty_tx > TX_RING_SIZE - 1) {
 		netif_stop_queue (dev);
 	} else {					/* Clear previous interrupt enable. */
@@ -2267,6 +2275,7 @@ vortex_interrupt(int irq, void *dev_id)
 	int status;
 	int work_done = max_interrupt_work;
 	int handled = 0;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
 
 	ioaddr = vp->ioaddr;
 	spin_lock(&vp->lock);
@@ -2314,6 +2323,8 @@ vortex_interrupt(int irq, void *dev_id)
 			if (ioread16(ioaddr + Wn7_MasterStatus) & 0x1000) {
 				iowrite16(0x1000, ioaddr + Wn7_MasterStatus); /* Ack the event. */
 				pci_unmap_single(VORTEX_PCI(vp), vp->tx_skb_dma, (vp->tx_skb->len + 3) & ~3, PCI_DMA_TODEVICE);
+				pkts_compl++;
+				bytes_compl += vp->tx_skb->len;
 				dev_kfree_skb_irq(vp->tx_skb); /* Release the transferred buffer */
 				if (ioread16(ioaddr + TxFree) > 1536) {
 					/*
@@ -2358,6 +2369,7 @@ vortex_interrupt(int irq, void *dev_id)
 		iowrite16(AckIntr | IntReq | IntLatch, ioaddr + EL3_CMD);
 	} while ((status = ioread16(ioaddr + EL3_STATUS)) & (IntLatch | RxComplete));
 
+	netdev_completed_queue(dev, pkts_compl, bytes_compl);
 	spin_unlock(&vp->window_lock);
 
 	if (vortex_debug > 4)
@@ -2382,6 +2394,7 @@ boomerang_interrupt(int irq, void *dev_id)
 	int status;
 	int work_done = max_interrupt_work;
 	int handled = 0;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
 
 	ioaddr = vp->ioaddr;
 
@@ -2455,6 +2468,8 @@ boomerang_interrupt(int irq, void *dev_id)
 					pci_unmap_single(VORTEX_PCI(vp),
 						le32_to_cpu(vp->tx_ring[entry].addr), skb->len, PCI_DMA_TODEVICE);
 #endif
+					pkts_compl++;
+

Re: [PATCH] net af_key: Fix RCU splat

2015-08-20 Thread Eric Dumazet

On Thu, 2015-08-20 at 15:57 -0700, David Ahern wrote:
 On 8/20/15 9:51 AM, Eric Dumazet wrote:
  On Thu, 2015-08-20 at 08:51 -0700, David Ahern wrote:
  Hit the following splat testing VRF change for ipsec:

...

 
  diff --git a/net/key/af_key.c b/net/key/af_key.c
  index b397f0aa9005..73527e7dd247 100644
  --- a/net/key/af_key.c
  +++ b/net/key/af_key.c
  @@ -1670,7 +1670,7 @@ static int pfkey_register(struct sock *sk, struct 
  sk_buff *skb, const struct sad
 return -ENOBUFS;
 }
 
  -  pfkey_broadcast(supp_skb, GFP_KERNEL, BROADCAST_REGISTERED, sk, 
  sock_net(sk));
  +  pfkey_broadcast(supp_skb, GFP_ATOMIC, BROADCAST_REGISTERED, sk, 
  sock_net(sk));
 
 return 0;
}
 
  I would rather remove the useless rcu locking from pfkey_broadcast() if
  a mutex properly protects the thing.
 
 rcu_read_lock was added by Stephen with 7f6b9dbd5afbd. It does not 
 appear the net-xfrm.xfrm_cfg_mutex mutex added by 283bc9f35bbbc 
 properly covers the locking. ie., the rcu_read_lock is needed.

Then please cook a complete patch, and add a 'Fixes: ...' tag


# git grep -n pfkey_broadcast|grep GFP_KERNEL
net/key/af_key.c:336:   pfkey_broadcast(skb, GFP_KERNEL, BROADCAST_ONE, sk, 
sock_net(sk));
net/key/af_key.c:1368:  pfkey_broadcast(resp_skb, GFP_KERNEL, BROADCAST_ONE, 
sk, net);
net/key/af_key.c:1673:  pfkey_broadcast(supp_skb, GFP_KERNEL, 
BROADCAST_REGISTERED, sk, sock_net(sk));
net/key/af_key.c:1850:  pfkey_broadcast(skb, GFP_KERNEL, BROADCAST_ALL, NULL, 
sock_net(sk));
net/key/af_key.c:2773:  pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL,

Presumably we should remove the gfp_t allocation argument of pfkey_broadcast()
if we need to use GFP_ATOMIC in all cases.




Re: [PATCHv4 net-next 09/10] openvswitch: Allow matching on conntrack label

2015-08-20 Thread Joe Stringer

On 20 August 2015 at 14:01, Pravin Shelar pshe...@nicira.com wrote:
 On Thu, Aug 20, 2015 at 12:13 PM, Joe Stringer joestrin...@nicira.com wrote:
 On 20 August 2015 at 08:45, Pravin Shelar pshe...@nicira.com wrote:
 On Wed, Aug 19, 2015 at 4:04 PM, Joe Stringer joestrin...@nicira.com 
 wrote:
 Thanks for the review,

 On 19 August 2015 at 14:24, Pravin Shelar pshe...@nicira.com wrote:
 On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer joestrin...@nicira.com 
 wrote:
 Allow matching and setting the conntrack label field. As with ct_mark,
 this is populated by executing the CT action, and is a writable field.
 Specifying a label and optional mask allows the label to be modified,
 which takes effect on the entry found by the lookup of the CT action.

 E.g.: actions:ct(zone=1,label=1)

 This will perform conntrack lookup in zone 1, then modify the label for
 that entry. The conntrack entry itself must be committed using the
 commit flag in the conntrack action flags for this change to persist.



 return false;
  }
 @@ -508,8 +601,12 @@ void ovs_ct_free_action(const struct nlattr *a)

  void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
  {
 +   unsigned int n_bits = sizeof(struct ovs_key_ct_label) * 
 BITS_PER_BYTE;
 +
 data-xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
 data-xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
 +   if (nf_connlabels_get(net, n_bits);
 +   OVS_NLERR(true, Failed to set connlabel length);
  }

 In case of error should we reject conntrack label actions? Otherwise
 user will never see any error. But action could drop packets.

 I suspect that currently errors would be seen from ovs_ct_set_label():

         if (!cl || cl->words * sizeof(long) < OVS_CT_LABEL_LEN)
                 return -ENOSPC;

 So, for cmd_execute, userspace would see this. For regular handling,
 pipeline processing would stop (so, drop).

 However, I agree it would be more friendly to have the attribute
 rejected up-front. Just means we'll pass the datapath all the way
 down:
 ovs_nla_get_match()
 --> ovs_key_from_nlattrs()
 --> metadata_from_nlattrs()
 --> ovs_ct_verify()

 Incidentally, we generally don't have the datapath by this point
 (ovs_nla_get_match()). There'd need to be a bit of rearranging in the
 ovs_flow_cmd_* functions, which would include holding the locks for
 longer. Given that the two most common cases are that either A) The
 kernel is configured with connlabel support, and built with support
 for at least 128 bits of label, or B) the kernel is configured without
 connlabel, and this is handled already in ovs_ct_verify(), I don't
 think it's worth making this particular change.

 Actually I do not see a need for this to be a per-datapath property.
 In fact there is no need to have struct ovs_ct_perdp_data;
 ovs_ct_init can be called from ovs-module init.
 The nf-connlabel bit length is a per-net-namespace property, so
 nf_connlabels_get() should be called from ovs namespace init. This
 way you can move the xt_label flag to ovs_net, which can be accessed
 from the ovs_flow_cmd_* functions.

That sounds like a tidier approach, I'll roll it into the next version.

Re: [PATCH] net: bcmgenet: fix uncleaned dma flags

2015-08-20 Thread Jaedon Shin

 On Aug 21, 2015, at 7:04 AM, Florian Fainelli f.faine...@gmail.com wrote:
 
 On 19/08/15 20:17, Jaedon Shin wrote:
 Clean the dma flags of the multiq ring buffers in the interface stop
 process. This patch fixes the genet not running after the
 interface is re-enabled.
 
 $ ifup eth0 - running after booting
 $ ifdown eth0
 $ ifup eth0 - not running, and tx_timeout occurs
 
 The bcmgenet_dma_disable() in bcmgenet_open() cleans the ring16 dma flag
 only. If the genet has multiq, the dma register is not cleaned, and
 bcmgenet_init_dma() is not done correctly. In the case of
 GENET_V2 (tx_queues=4), tdma_ctrl has 0x1e after running
 bcmgenet_dma_disable().
 
 It sounds like this should be moved to bcmgenet_dma_disable() where we
 are already modifying DMA_CTRL and returning a dma_ctrl value back to
 the caller, or at the very least bcmgenet_dma_teardown().
 
 Thanks!
 

I will send the changes immediately.

Thanks.

 
 Signed-off-by: Jaedon Shin jaedon.s...@gmail.com
 ---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 
 1 file changed, 16 insertions(+)
 
 diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
 b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
 index 64c1e9db6b0b..81bde6fa70b7 100644
 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
 +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
 @@ -2174,6 +2174,8 @@ static int bcmgenet_dma_teardown(struct bcmgenet_priv 
 *priv)
 
 static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
 {
 +u32 dma_ctrl;
 +u32 reg;
  int i;
 
  bcmgenet_fini_rx_napi(priv);
 @@ -2182,6 +2184,20 @@ static void bcmgenet_fini_dma(struct bcmgenet_priv 
 *priv)
  /* disable DMA */
  bcmgenet_dma_teardown(priv);
 
 +	dma_ctrl = 0;
 +	for (i = 0; i < priv->hw_params->rx_queues; i++)
 +		dma_ctrl |= (1 << (i + DMA_RING_BUF_EN_SHIFT));
 +	reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
 +	reg &= ~dma_ctrl;
 +	bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
 +
 +	dma_ctrl = 0;
 +	for (i = 0; i < priv->hw_params->tx_queues; i++)
 +		dma_ctrl |= (1 << (i + DMA_RING_BUF_EN_SHIFT));
 +	reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
 +	reg &= ~dma_ctrl;
 +	bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
 +
 	for (i = 0; i < priv->num_tx_bds; i++) {
 		if (priv->tx_cbs[i].skb != NULL) {
 			dev_kfree_skb(priv->tx_cbs[i].skb);
 
 
 
 -- 
 Florian

[PATCH v2] net: bcmgenet: fix uncleaned dma flags

2015-08-20 Thread Jaedon Shin

Clean the DMA flags of the multiq ring buffers in the interface stop
process. This patch fixes the problem that the genet does not run
after the interface is re-enabled.

$ ifup eth0 - running after booting
$ ifdown eth0
$ ifup eth0 - not running, and tx_timeout occurs

The bcmgenet_dma_disable() call in bcmgenet_open() cleans only the
ring16 DMA flag. If the genet has multiq, the DMA register is not
cleaned and bcmgenet_init_dma() is not done correctly. In the case of
GENET_V2 (tx_queues=4), tdma_ctrl holds 0x1e after running
bcmgenet_dma_disable().

Signed-off-by: Jaedon Shin jaedon.s...@gmail.com
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 64c1e9db6b0b..4812565c783e 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2126,6 +2126,8 @@ static int bcmgenet_dma_teardown(struct bcmgenet_priv *priv)
int ret = 0;
int timeout = 0;
u32 reg;
+   u32 dma_ctrl;
+   int i;
 
/* Disable TDMA to stop add more frames in TX DMA */
reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
@@ -2169,6 +2171,20 @@ static int bcmgenet_dma_teardown(struct bcmgenet_priv *priv)
ret = -ETIMEDOUT;
}
 
+   dma_ctrl = 0;
+   for (i = 0; i < priv->hw_params->rx_queues; i++)
+           dma_ctrl |= (1 << (i + DMA_RING_BUF_EN_SHIFT));
+   reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+   reg &= ~dma_ctrl;
+   bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+   dma_ctrl = 0;
+   for (i = 0; i < priv->hw_params->tx_queues; i++)
+           dma_ctrl |= (1 << (i + DMA_RING_BUF_EN_SHIFT));
+   reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+   reg &= ~dma_ctrl;
+   bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+
return ret;
 }
 
-- 
2.5.0
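
For illustration, the mask arithmetic used by the patch can be sketched as
standalone C. This is only a sketch under stated assumptions, not the driver
code: DMA_RING_BUF_EN_SHIFT is assumed to be 1 (chosen because it reproduces
the 0x1e value quoted for tx_queues=4), and ring_enable_mask() and
clear_ring_enables() are hypothetical helper names that do not exist in
bcmgenet.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: per-ring enable bits start at bit 1. */
#define DMA_RING_BUF_EN_SHIFT 1

/* Hypothetical helper: one enable bit per ring, for rings 0..n_rings-1. */
static uint32_t ring_enable_mask(unsigned int n_rings)
{
	uint32_t mask = 0;
	unsigned int i;

	/* Guard against shifting past the width of the 32-bit register. */
	assert(n_rings + DMA_RING_BUF_EN_SHIFT <= 32);

	for (i = 0; i < n_rings; i++)
		mask |= UINT32_C(1) << (i + DMA_RING_BUF_EN_SHIFT);

	return mask;
}

/* Hypothetical helper: clear the per-ring enable bits, keep all other bits. */
static uint32_t clear_ring_enables(uint32_t dma_ctrl_reg, unsigned int n_rings)
{
	return dma_ctrl_reg & ~ring_enable_mask(n_rings);
}
```

With four queues the mask is 0x1e, so a stale tdma_ctrl of 0x1e becomes 0
after clearing, while bit 0 of the register is left untouched.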

Re: [PATCH net-next RFC 00/10] socket sendmsg MSG_ZEROCOPY

2015-08-20 Thread Willem de Bruijn

On Thu, Aug 20, 2015 at 6:56 PM, David Miller da...@davemloft.net wrote:
 From: Willem de Bruijn will...@google.com
 Date: Thu, 20 Aug 2015 10:36:39 -0400

 Datapath integrity does not otherwise depend on payload, with three
 exceptions: checksums, optional sk_filter/tc u32/.. and device +
 driver logic. The effect of wrong checksums is limited to the
 misbehaving process. Filters may have to be addressed by inserting a
 preventative skb_copy_ubufs(). Device drivers can be whitelisted,
 similar to scatter-gather support (NETIF_F_SG).

 Consider a userland NFS implementation sending over loopback while
 constantly modifying the page.  The sunrpc code could be tricked into
 seeing one thing during validation of the RPC headers then doing
 another after the user makes changes.

 I really don't think this is completely safe as-is.

Sunrpc is a great counterexample. Anything that calls
kernel_recvmsg may be problematic, I guess. Copying when
passing to kernel sockets would plug that class of issues.

But there may still be others. The most obvious use case for copy
avoidance is pure device transmit. Excluding loopback may be
a reasonable way to initially limit the attack surface, with a flag
NETIF_F_ZC that is not supported on lo.