[PATCH] usbnet: dereference after null check in usbnet_start_xmit() and __usbnet_read_cmd()

2015-08-19 Thread Vivek Kumar Bhagat
usbnet_start_xmit() - If info->tx_fixup is not defined by class driver,
NULL check does not happen for skb pointer and leads to NULL dereference.
__usbnet_read_cmd() - if data pointer is passed as NULL, memcpy will
dereference NULL pointer.

Signed-off-by: Vivek Kumar Bhagat <vivek.bha...@samsung.com>
---
 drivers/net/usb/usbnet.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..ec4d224 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1294,6 +1294,8 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 
 	if (skb)
 		skb_tx_timestamp(skb);
+	else
+		goto drop;
 
 	// some devices want funky USB-level framing, for
 	// win32 driver (usually) and/or hardware quirks
@@ -1906,7 +1908,8 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
 		buf = kmalloc(size, GFP_KERNEL);
 		if (!buf)
 			goto out;
-	}
+	} else
+		goto out;
 
 	err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
 			      cmd, reqtype, value, index, buf, size,
-- 
1.7.9.5

[RFC v2 1/3] iwlwifi: mvm: add real TSO implementation

2015-08-19 Thread Emmanuel Grumbach
The segmentation is done completely in software. The
driver creates several MPDUs out of a single large send.
Each MPDU is a newly allocated SKB.
A page is allocated to create the headers that need to be
duplicated (SNAP / IP / TCP). The WiFi header is in the
header of the newly created SKBs.
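
As a rough illustration of the per-segment bookkeeping this description implies (it is not the driver code), the userspace C sketch below plans how one large send is cut into MSS-sized segments and which TCP sequence number, IPv4 ID and IPv4 total length each segment would carry. The names `seg_info` and `plan_segments` are hypothetical, and the sketch assumes IPv4 with fixed header lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment view; not a structure from the driver. */
struct seg_info {
	uint32_t tcp_seq;    /* TCP sequence number of the first payload byte */
	uint16_t ip_id;      /* IPv4 identification field */
	uint16_t ip_tot_len; /* IPv4 total length: IP hdr + TCP hdr + payload */
	size_t payload_len;  /* payload bytes carried by this segment */
};

/*
 * Split payload_len bytes into segments of at most mss bytes and fill out[]
 * with the header fields each segment would carry. Returns the number of
 * segments, or 0 if mss is 0 or out[] is too small.
 */
static size_t plan_segments(uint32_t first_seq, uint16_t first_id,
			    size_t payload_len, size_t mss,
			    size_t ip_hlen, size_t tcp_hlen,
			    struct seg_info *out, size_t max_out)
{
	size_t n, pos = 0;

	assert(out != NULL);
	if (mss == 0)
		return 0;

	for (n = 0; pos < payload_len; n++) {
		size_t len = payload_len - pos;

		if (len > mss)
			len = mss;
		if (n == max_out)
			return 0;

		/* Each segment starts where the previous payload ended. */
		out[n].tcp_seq = first_seq + (uint32_t)pos;
		out[n].ip_id = (uint16_t)(first_id + n);
		out[n].ip_tot_len = (uint16_t)(ip_hlen + tcp_hlen + len);
		out[n].payload_len = len;
		pos += len;
	}
	return n;
}
```

For a 3000-byte payload with an MSS of 1460 and 20-byte IP and TCP headers, this yields three segments of 1460, 1460 and 80 payload bytes.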

type=feature

Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a
Signed-off-by: Emmanuel Grumbach <emmanuel.grumb...@intel.com>
---
 drivers/net/wireless/iwlwifi/mvm/tx.c | 513 +++---
 1 file changed, 481 insertions(+), 32 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
index 90f0ea1..a63686c 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -65,6 +65,7 @@
 #include <linux/ieee80211.h>
 #include <linux/etherdevice.h>
 #include <net/tcp.h>
+#include <net/ip.h>
 
 #include "iwl-trans.h"
 #include "iwl-eeprom-parse.h"
@@ -435,32 +436,471 @@ int iwl_mvm_tx_skb_non_sta(struct iwl_mvm *mvm, struct sk_buff *skb)
 	return 0;
 }
 
+/*
+ * Update the IP / TCP headers and recompute the IP header CSUM +
+ * pseudo header CSUM.
+ */
+static void iwl_update_ip_tcph(void *iph, struct tcphdr *tcph, bool ipv6,
+			       unsigned int len, unsigned int tcp_seq_offset,
+			       u16 num_segment)
+{
+	be32_add_cpu(&tcph->seq, tcp_seq_offset);
+
+	if (ipv6) {
+		struct ipv6hdr *iphv6 = iph;
+
+		iphv6->payload_len = cpu_to_be16(len + tcph->doff * 4);
+
+		/* Compute CSUM on the pseudo-header */
+		tcph->check = ~csum_ipv6_magic(&iphv6->saddr, &iphv6->daddr,
+					       len + tcph->doff * 4,
+					       IPPROTO_TCP, 0);
+	} else {
+		struct iphdr *iphv4 = iph;
+
+		iphv4->tot_len =
+			cpu_to_be16(len + tcph->doff * 4 + iphv4->ihl * 4);
+		be16_add_cpu(&iphv4->id, num_segment);
+		ip_send_check(iphv4);
+
+		/* Compute CSUM on the pseudo-header */
+		tcph->check = ~csum_tcpudp_magic(iphv4->saddr, iphv4->daddr,
+						 len + tcph->doff * 4,
+						 IPPROTO_TCP, 0);
+	}
+}
+
+/**
+ * struct iwl_lso_splitter - state of the split.
+ * @linear_payload_len: The length of the payload inside the header of the
+ * original GSO skb.
+ * @gso_frag_num: The fragment number from which to take the data in the
+ * original GSO skb.
+ * @gso_payload_len: The length of the payload in the original GSO skb.
+ * @gso_payload_pos: The incrementing position in the payload of the original
+ * GSO skb.
+ * @gso_offset_in_page: The offset in the page of gso_frag_num.
+ * @gso_current_frag_size: The size of gso_frag_num.
+ * @gso_offset_in_frag: The offset in the gso_frag_num.
+ * @frag_in_mpdu: The index of the frag inside the new (split) MPDU.
+ * @mss: The maximal segment size.
+ * @si: Points to the shared info of the original GSO skb.
+ * @hdr: Points to the WiFi header.
+ * @gso_nr_frags: The number of frags in the original GSO skb.
+ * @wifi_hdr_iv_len: The length of the WiFi header including IV.
+ * @tcp_fin: True if TCP_FIN is set in the original GSO skb.
+ * @tcp_push: True if TCP_PSH is set in the original GSO skb.
+ */
+struct iwl_lso_splitter {
+   unsigned int linear_payload_len;
+   unsigned int gso_frag_num;
+   unsigned int gso_payload_len;
+   unsigned int gso_payload_pos;
+   unsigned int gso_offset_in_page;
+   unsigned int gso_current_frag_size;
+   unsigned int gso_offset_in_frag;
+   unsigned int frag_in_mpdu;
+   unsigned int mss;
+   struct skb_shared_info *si;
+   struct ieee80211_hdr *hdr;
+   u8 gso_nr_frags;
+   u8 wifi_hdr_iv_len;
+   bool tcp_fin;
+   bool tcp_push;
+};
+
+/*
+ * Adds a TCP segment from skb_gso to skb. All the state is taken from
+ * and fed back to p. This function takes care about the payload only.
+ * This MSDU might already have msdu_sz bytes of payload that come from
+ * the original GSO skb's header.
+ */
+static unsigned int
+iwl_add_tcp_segment(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
+   struct sk_buff *skb, struct iwl_lso_splitter *p,
+   unsigned int msdu_sz)
+{
+	while (msdu_sz < p->mss) {
+		unsigned int frag_sz =
+			min_t(unsigned int, p->gso_current_frag_size,
+			      p->mss - msdu_sz);
+
+		if (p->frag_in_mpdu >= mvm->trans->max_skb_frags)
+			return msdu_sz;
+
+		skb_add_rx_frag(skb, p->frag_in_mpdu,
+				skb_frag_page(&p->si->frags[p->gso_frag_num]),
+				p->gso_offset_in_page, frag_sz, 0);
+
+   /* We just added one frag to the mpdu ... */

[RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Emmanuel Grumbach
We enable TSO to get a lot of data at once to build A-MSDUs.
Our hardware doesn't have TCP CSUM offload yet, so we do
it manually. In release code, TSO won't be enabled on hardware
that doesn't support CSUM offload; computing the TCP CSUM
in the driver is just a way to start coding the flows.
This is why the CSUM implementation in the driver
is so inefficient. I preferred to keep the flows as close as
possible to what they will be once the hardware can do the
CSUM, rather than to seek efficiency.
Hardware that has CSUM offload will still require
the driver to split the skb in software, including the
IP / TCP header copy and update.

We could have enabled A-MSDU based on xmit-more, but the
rationale of using LSO is that when using pfifo-fast,
the Qdisc gets one packet and dequeues it straight away,
which limits the possibility to get a lot of packets at
once. (Am I right here?)

A note about A-MSDUs for non-wireless people:
An A-MSDU is an aggregated frame. It is one big 802.11
packet that contains several subframes. Each subframe
is a TCP segment. One A-MSDU is represented by one single
skb which means that we need to copy / duplicate the TCP
/ IP / SNAP headers in one single skb. This is why those
headers are copied to a separate page: that page is added
multiple times to the skb with different offsets. Each
subframe needs at least 2 frags: 1 for the headers, 1 (or
more) for the payload.
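
To make the subframe layout a little more concrete, here is a small userspace C sketch of the length arithmetic for an A-MSDU body. It relies on the 802.11 rule (not stated in this mail) that each subframe starts with a 14-byte header (DA, SA, length) and that every subframe except the last is padded to a multiple of 4 bytes. The function names are hypothetical, and `msdu_len` stands for SNAP + IP + TCP headers plus payload.

```c
#include <assert.h>
#include <stddef.h>

#define AMSDU_SUBFRAME_HDR_LEN 14 /* DA (6) + SA (6) + length (2) */

/* Padding needed after a subframe so the next one starts 4-byte aligned. */
static size_t amsdu_subframe_pad(size_t msdu_len)
{
	return (4 - (AMSDU_SUBFRAME_HDR_LEN + msdu_len) % 4) % 4;
}

/*
 * Total A-MSDU body length for n subframes with the given MSDU lengths.
 * The last subframe carries no padding.
 */
static size_t amsdu_total_len(const size_t *msdu_len, size_t n)
{
	size_t i, total = 0;

	assert(msdu_len != NULL || n == 0);
	for (i = 0; i < n; i++) {
		total += AMSDU_SUBFRAME_HDR_LEN + msdu_len[i];
		if (i + 1 < n)
			total += amsdu_subframe_pad(msdu_len[i]);
	}
	return total;
}
```

With the layout described above, an A-MSDU of n subframes would also need at least 2 * n frags in the skb: one header frag and one or more payload frags per subframe.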

I am quite a newbie in skb handling, so I guess that this
code can be improved. I have tested it decently using iperf,
but this doesn't mean that there are no issues using other
applications. We are enabling pktgen on TCP (using patches
that were sent a year ago or so) to test the different
layouts of the skb (payload partition amongst the header
and the different frags).

I'll be very happy to get comments on that code, this is
why I am sending it to netdev as well since the TSO experts
are there :)

Emmanuel Grumbach (3):
  iwlwifi: mvm: add real TSO implementation
  iwlwifi: mvm: allow to create A-MSDUs from a large send
  iwlwifi: mvm: transfer the truesize to the last TSO segment

 drivers/net/wireless/iwlwifi/mvm/mac80211.c |   3 +-
 drivers/net/wireless/iwlwifi/mvm/sta.c  |   4 +-
 drivers/net/wireless/iwlwifi/mvm/sta.h  |   6 +-
 drivers/net/wireless/iwlwifi/mvm/tx.c   | 669 ++--
 4 files changed, 647 insertions(+), 35 deletions(-)

-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-19 Thread Emmanuel Grumbach
This releases the backpressure on the socket only
when the last segment is released.
The truesize now looks like this:
if the truesize of the original skb is 65420, every segment
except the last has a truesize of 704 (the skb itself), and the
last one has 65420.

Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e
Signed-off-by: Emmanuel Grumbach <emmanuel.grumb...@intel.com>
---
 drivers/net/wireless/iwlwifi/mvm/tx.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
index 5046833..046e50d 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -764,7 +764,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
bool ipv6 = skb_shinfo(skb_gso)->gso_type & SKB_GSO_TCPV6;
struct iwl_lso_splitter s = {};
struct page *hdr_page;
-   unsigned int mpdu_sz;
+   unsigned int mpdu_sz, sum_truesize = 0;
u8 *hdr_page_pos, *qc, tid;
int i, ret;
 
@@ -898,6 +898,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
mpdu_sz, tcp_hdrlen(skb_gso));
 
__skb_queue_tail(mpdus_skb, skb_gso);
+   sum_truesize += skb_gso->truesize;
 
/* mss bytes have been consumed from the data */
s.gso_payload_pos = s.mss;
@@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
}
 
__skb_queue_tail(mpdus_skb, skb);
+   sum_truesize += skb->truesize;
+   }
+
+   /* Release the backpressure on the socket only when
+    * the last segment is released.
+    */
+   if (skb_gso->destructor == sock_wfree) {
+   struct sk_buff *tail = mpdus_skb->prev;
+
+   swap(tail->truesize, skb_gso->truesize);
+   swap(tail->destructor, skb_gso->destructor);
+   swap(tail->sk, skb_gso->sk);
+   atomic_add(sum_truesize - skb_gso->truesize,
+              &skb_gso->sk->sk_wmem_alloc);
}
 
ret = 0;
-- 
2.1.4



[RFC v2 2/3] iwlwifi: mvm: allow to create A-MSDUs from a large send

2015-08-19 Thread Emmanuel Grumbach
Now that we can get a big chunk of data from the network
stack, we can create an A-MSDU out of it. The purpose is to
get a throughput improvement since sending one single A-MSDU
is more efficient than sending several MSDUs at least under
ideal link conditions.

type=feature

Change-Id: I5ea1b1132a57542187cd4c34c5299dbf44fe8b01
Signed-off-by: Emmanuel Grumbach <emmanuel.grumb...@intel.com>
---
 drivers/net/wireless/iwlwifi/mvm/mac80211.c |   3 +-
 drivers/net/wireless/iwlwifi/mvm/sta.c  |   4 +-
 drivers/net/wireless/iwlwifi/mvm/sta.h  |   6 +-
 drivers/net/wireless/iwlwifi/mvm/tx.c   | 159 ++--
 4 files changed, 160 insertions(+), 12 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
index 3dd4e97..dd15e04 100644
--- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
@@ -925,7 +925,8 @@ static int iwl_mvm_mac_ampdu_action(struct ieee80211_hw *hw,
ret = iwl_mvm_sta_tx_agg_flush(mvm, vif, sta, tid);
break;
case IEEE80211_AMPDU_TX_OPERATIONAL:
-   ret = iwl_mvm_sta_tx_agg_oper(mvm, vif, sta, tid, buf_size);
+   ret = iwl_mvm_sta_tx_agg_oper(mvm, vif, sta, tid,
+ buf_size, amsdu);
break;
default:
WARN_ON_ONCE(1);
diff --git a/drivers/net/wireless/iwlwifi/mvm/sta.c b/drivers/net/wireless/iwlwifi/mvm/sta.c
index df216cd..606fc09 100644
--- a/drivers/net/wireless/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/iwlwifi/mvm/sta.c
@@ -976,7 +976,8 @@ int iwl_mvm_sta_tx_agg_start(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
 }
 
 int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
-   struct ieee80211_sta *sta, u16 tid, u8 buf_size)
+   struct ieee80211_sta *sta, u16 tid, u8 buf_size,
+   bool amsdu)
 {
struct iwl_mvm_sta *mvmsta = iwl_mvm_sta_from_mac80211(sta);
struct iwl_mvm_tid_data *tid_data = &mvmsta->tid_data[tid];
@@ -995,6 +996,7 @@ int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
queue = tid_data->txq_id;
tid_data->state = IWL_AGG_ON;
mvmsta->agg_tids |= BIT(tid);
+   tid_data->amsdu_in_ampdu_allowed = amsdu;
tid_data->ssn = 0x;
spin_unlock_bh(&mvmsta->lock);
 
diff --git a/drivers/net/wireless/iwlwifi/mvm/sta.h b/drivers/net/wireless/iwlwifi/mvm/sta.h
index eedb215..26d1e31 100644
--- a/drivers/net/wireless/iwlwifi/mvm/sta.h
+++ b/drivers/net/wireless/iwlwifi/mvm/sta.h
@@ -258,6 +258,8 @@ enum iwl_mvm_agg_state {
  * Tx response (TX_CMD), and the block ack notification (COMPRESSED_BA).
  * @reduced_tpc: Reduced tx power. Holds the data between the
  * Tx response (TX_CMD), and the block ack notification (COMPRESSED_BA).
+ * @amsdu_in_ampdu_allowed: true if A-MSDU in A-MPDU is allowed. Relevant only
+ * if state is %IWL_AGG_ON.
  * @state: state of the BA agreement establishment / tear down.
  * @txq_id: Tx queue used by the BA session
  * @ssn: the first packet to be sent in AGG HW queue in Tx AGG start flow, or
@@ -272,6 +274,7 @@ struct iwl_mvm_tid_data {
/* The rest is Tx AGG related */
u32 rate_n_flags;
u8 reduced_tpc;
+   bool amsdu_in_ampdu_allowed;
enum iwl_mvm_agg_state state;
u16 txq_id;
u16 ssn;
@@ -387,7 +390,8 @@ int iwl_mvm_sta_rx_agg(struct iwl_mvm *mvm, struct ieee80211_sta *sta,
 int iwl_mvm_sta_tx_agg_start(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
struct ieee80211_sta *sta, u16 tid, u16 *ssn);
 int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
-   struct ieee80211_sta *sta, u16 tid, u8 buf_size);
+   struct ieee80211_sta *sta, u16 tid, u8 buf_size,
+   bool amsdu);
 int iwl_mvm_sta_tx_agg_stop(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
struct ieee80211_sta *sta, u16 tid);
 int iwl_mvm_sta_tx_agg_flush(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
index a63686c..5046833 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -488,8 +488,10 @@ static void iwl_update_ip_tcph(void *iph, struct tcphdr *tcph, bool ipv6,
  * @ieee80211_hdr *hdr: Points to the WiFi header.
  * @gso_nr_frags: The number of frags in the original GSO skb.
  * @wifi_hdr_iv_len: The length of the WiFi header including IV.
+ * @amsdu_pad: Number of bytes for the A-MSDU subframe
  * @tcp_fin: True if TCP_FIN is set in the original GSO skb.
  * @tcp_push: True if TCP_PSH is set in the original GSO skb.
+ * @amsdu: True if we are building an A-MSDU
  */
 struct iwl_lso_splitter {
unsigned int 

Re: [PATCH] usbnet: dereference after null check in usbnet_start_xmit() and __usbnet_read_cmd()

2015-08-19 Thread Bjørn Mork
Vivek Kumar Bhagat <vivek.bha...@samsung.com> writes:

> usbnet_start_xmit() - If info->tx_fixup is not defined by class driver,
> NULL check does not happen for skb pointer and leads to NULL dereference.
> __usbnet_read_cmd() - if data pointer is passed as NULL, memcpy will
> dereference NULL pointer.

That's two completely different issues.  Mixing them in a single patch
is only confusing things.


> Signed-off-by: Vivek Kumar Bhagat <vivek.bha...@samsung.com>
> ---
>  drivers/net/usb/usbnet.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 3c86b10..ec4d224 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -1294,6 +1294,8 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
>  
>  	if (skb)
>  		skb_tx_timestamp(skb);
> +	else
> +		goto drop;
>  
>  	// some devices want funky USB-level framing, for
>  	// win32 driver (usually) and/or hardware quirks


This is wrong.  There are usbnet minidrivers depending on info->tx_fixup
being called with a NULL skb.


> @@ -1906,7 +1908,8 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
>  		buf = kmalloc(size, GFP_KERNEL);
>  		if (!buf)
>  			goto out;
> -	}
> +	} else
> +		goto out;
>  
>  	err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
>  			      cmd, reqtype, value, index, buf, size,


This is also wrong.  It makes __usbnet_read_cmd() return -ENOMEM if
called with a NULL data pointer.  I don't know if it is used, but it's
perfectly valid to call __usbnet_read_cmd() with data == NULL if
size == 0. No memcpy will happen in this case because usb_control_msg
can only return 0 or an error.

Please don't submit any more such patches without proper justification.
You cannot trust that someone will actually take the time to sanity
check your changes.  Patches claiming to fix a NULL dereference should
at least provide an oops.


Bjørn


Re: [PATCH] veth: replace iflink by a dedicated symlink in sysfs

2015-08-19 Thread Jiri Benc
On Wed, 19 Aug 2015 14:13:00 +0200, Vincent Bernat wrote:
> That's the main goal of this patch: advertising the peer link as
> IFLA_LINK attribute triggers an infinite loop in userland software when
> they follow iflink to discover network devices topology. iflink has
> always been the index of a lower device. If a sysfs symbolic link is not
> good enough, I can propose a new IFLA_PEER attribute instead.

This would cause regression and break applications for those of us who
started relying on the netnsid feature to match interfaces across net
name spaces.

This is tough. If you're going to do such thing, you would at least
need to also introduce IFLA_PEER_NETNSID.

 Jiri

-- 
Jiri Benc


Re: [PATCH] veth: replace iflink by a dedicated symlink in sysfs

2015-08-19 Thread Vincent Bernat
 ❦ 19 August 2015 14:38 +0200, Jiri Benc <jb...@redhat.com>:

>> That's the main goal of this patch: advertising the peer link as
>> IFLA_LINK attribute triggers an infinite loop in userland software when
>> they follow iflink to discover network devices topology. iflink has
>> always been the index of a lower device. If a sysfs symbolic link is not
>> good enough, I can propose a new IFLA_PEER attribute instead.
>
> This would cause regression and break applications for those of us who
> started relying on the netnsid feature to match interfaces across net
> name spaces.

Yes. Unfortunately.

> This is tough. If you're going to do such thing, you would at least
> need to also introduce IFLA_PEER_NETNSID.

Yes I can.

In my opinion, the change of semantics of IFLA_LINK is a break of
API. However, I can live with it since it's easy to work around. It
just seemed easier to start the discussion with a patch.
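
To illustrate the userland failure mode discussed here, the following self-contained C sketch walks a chain of iflink values to find the lowest device and gives up when the chain cannot terminate. The table `iflink[]` (indexed by ifindex) and the function name are hypothetical; real tools would read these values from netlink or sysfs. A veth pair whose two ends point at each other forms exactly the cycle the bounded walk detects.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Follow iflink from ifindex until a device that is its own lower device
 * (iflink == ifindex) is reached. iflink[] is a hypothetical table indexed
 * by ifindex; entry 0 is unused. Returns the lowest device's ifindex, or -1
 * if the chain leaves the table or does not terminate within n hops
 * (i.e. it contains a cycle).
 */
static int find_lowest_device(const int *iflink, size_t n, int ifindex)
{
	size_t hops;

	assert(iflink != NULL);
	for (hops = 0; hops < n; hops++) {
		if (ifindex <= 0 || (size_t)ifindex >= n)
			return -1;
		if (iflink[ifindex] == ifindex)
			return ifindex;
		ifindex = iflink[ifindex];
	}
	return -1;
}
```

An unbounded version of the same loop would spin forever on the veth pair, which is the behaviour the patch description refers to.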
-- 
Parenthesise to avoid ambiguity.
- The Elements of Programming Style (Kernighan & Plauger)


Re: pull-request: wireless-drivers-next 2015-08-19

2015-08-19 Thread Kalle Valo
Kalle Valo <kv...@codeaurora.org> writes:

> here's one more pull request for 4.3. More info in the signed tag below.
>
> This time I had to merge mac80211-next.git due to some iwlwifi
> dependencies and apparently that broke git-request-pull's diffstat
> again, it was showing changes which were not really coming from my tree.
> I think that's just a bug in my old git and really should update the
> tool. This time I just fixed the diffstat manually.
>
> But please be extra careful with this pull request and please let me
> know if you have any problems.

Oh, I forgot to mention that I saw this build error when I did a test
merge:

net/ipv4/fib_semantics.c:553:3: error: implicit declaration of function lwtstate_free [-Werror=implicit-function-declaration]

But I see that also with unmodified net-next so I'm assuming I didn't
cause that :)

-- 
Kalle Valo


Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters

2015-08-19 Thread Corinna Vinschen
On Aug 19 09:31, Hayes Wang wrote:
> Corinna Vinschen [mailto:vinsc...@redhat.com]
> > Sent: Wednesday, August 19, 2015 5:13 PM
> [...]
> > > It could be cleared by setting bit 0, such as rtl_tally_reset() of r8152.
> >
> > Is it safe to assume that this is implemented in all NICs covered by r8169?
>
> It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later.

Thanks.  In that case I would prefer the same generic method for all
chip versions, so I'd opt for storing the offset values at rtl_open
time as my patch is doing right now.  Is that acceptable?

If so, wouldn't it make even more sense to use the hardware collected
information in @get_stats64 throughout, except for the numbers collected
*only* in software?  

I would be willing to propose a matching patch.


Thanks,
Corinna




Re: [PATCH] rtlwifi: rtl8192cu: Add new device ID

2015-08-19 Thread Larry Finger

On 08/19/2015 08:51 AM, Adrien Schildknecht wrote:

> The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043
>
> Signed-off-by: Adrien Schildknecht <adrien+...@schischi.me>
> ---


Has this ID been tested with the Netgear device?

Larry


>  drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
> index 23806c2..8b4238a 100644
> --- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
> +++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
> @@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
>  	{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
>  	{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
>  	{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
> +	{RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NetGear WNA1000Mv2*/
>  	{RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
>  	{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/
>  	{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/





[PATCH] rtlwifi: rtl8192cu: Add new device ID

2015-08-19 Thread Adrien Schildknecht
The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043

Signed-off-by: Adrien Schildknecht <adrien+...@schischi.me>
---
 drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
index 23806c2..8b4238a 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
@@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
 	{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
 	{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
 	{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
+	{RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NetGear WNA1000Mv2*/
 	{RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
 	{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/
 	{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
-- 
2.5.0



Re: [PATCH v2 1/2] virtio-net: rephrase devconf fields description

2015-08-19 Thread Victor Kaplansky
On Mon, Aug 17, 2015 at 10:43:46AM +0800, Jason Wang wrote:
 
 
> On 08/16/2015 09:42 PM, Victor Kaplansky wrote:
> > Clarify general description of the mac, status and
> > max_virtqueue_pairs fields. Specifically, the old description is
> > vague about configuration layout and fields offsets when some of
> > the fields are non valid.
> >
> > Also clarify that validity of two status bits depends on two
> > different feature flags.
> >
> > Signed-off-by: Victor Kaplansky <vict...@redhat.com>
> > ---
> > +
> > +\item [\field{max_virtqueue_pairs}] tells the driver the maximum
> > +number of each of virtqueues (receiveq1\ldots receiveqN and
> > +transmitq1\ldots transmitqN respectively) that can be configured
> > +on the device once VIRTIO_NET_F_MQ is negotiated.
> > +\field{max_virtqueue_pairs} is valid only if VIRTIO_NET_F_MQ is
> > +set and can be read by the driver.
> > +
>
>
> I don't get the point of adding "can be read by the driver". Looks
> like it's hard for hypervisor to detect this?

AFAIU, if the device sets VIRTIO_NET_F_MQ, the device also sets
the value of 'max_virtqueue_pairs' even before driver negotiated
VIRTIO_NET_F_MQ. If so, the driver can read the value of
'max_virtqueue_pairs' during negotiation and potentially this
value can even affect negotiation decision of the driver.  

If above is correct, I'll change the description to make this
point more clear.

Thanks,
-- Victor


Re: [PATCH] usbnet: dereference after null check in usbnet_start_xmit() and __usbnet_read_cmd()

2015-08-19 Thread Bjørn Mork
Bjørn Mork <bj...@mork.no> writes:
> Vivek Kumar Bhagat <vivek.bha...@samsung.com> writes:
>
>> @@ -1906,7 +1908,8 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
>>  		buf = kmalloc(size, GFP_KERNEL);
>>  		if (!buf)
>>  			goto out;
>> -	}
>> +	} else
>> +		goto out;
>>  
>>  	err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
>>  			      cmd, reqtype, value, index, buf, size,
>
>
> This is also wrong.  It makes __usbnet_read_cmd() return -ENOMEM if
> called with a NULL data pointer.  I don't know if it is used, but it's
> perfectly valid to call __usbnet_read_cmd() with data == NULL if
> size == 0. No memcpy will happen in this case because usb_control_msg
> can only return 0 or an error.

Just for the record - a simple grep for usbnet_read_cmd shows that at
least drivers/net/usb/plusb.c depends on the current behaviour:

static inline int
pl_vendor_req(struct usbnet *dev, u8 req, u8 val, u8 index)
{
	return usbnet_read_cmd(dev, req,
				USB_DIR_IN | USB_TYPE_VENDOR |
				USB_RECIP_DEVICE,
				val, index, NULL, 0);
}
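
The effect on such a caller can be modelled in plain userspace C. The sketch below is only a model of the control flow around the disputed hunk, not kernel code: `fake_control_msg()` is a hypothetical stand-in for usb_control_msg() that pretends the device returned exactly `size` bytes, and `malloc()`/`free()` stand in for `kmalloc()`/`kfree()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for usb_control_msg(): pretends the device returned size bytes. */
static int fake_control_msg(void *buf, size_t size)
{
	if (size > 0)
		memset(buf, 0xab, size);
	return (int)size;
}

/*
 * Userspace model of the control flow in __usbnet_read_cmd(). With
 * patched == true it takes the extra "else goto out" branch from the patch.
 */
static int read_cmd_model(void *data, size_t size, bool patched)
{
	void *buf = NULL;
	int err = -ENOMEM;

	if (data) {
		buf = malloc(size);
		if (!buf)
			goto out;
	} else if (patched) {
		goto out; /* the proposed change: bail out when data == NULL */
	}

	err = fake_control_msg(buf, size);
	/* With size == 0 the result is never > 0, so data is not touched. */
	if (err > 0 && (size_t)err <= size)
		memcpy(data, buf, (size_t)err);
	free(buf);
out:
	return err;
}
```

In this model, a call with `data == NULL` and `size == 0` succeeds without the patch and fails with `-ENOMEM` once the extra branch is added, while calls with a real buffer behave the same either way.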



Bjørn


Re: [RFC v2 1/3] iwlwifi: mvm: add real TSO implementation

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
> The segmentation is done completely in software. The
> driver creates several MPDUs out of a single large send.
> Each MPDU is a newly allocated SKB.
> A page is allocated to create the headers that need to be
> duplicated (SNAP / IP / TCP). The WiFi header is in the
> header of the newly created SKBs.
>
> type=feature
>
> Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a
> Signed-off-by: Emmanuel Grumbach <emmanuel.grumb...@intel.com>
> ---
>  drivers/net/wireless/iwlwifi/mvm/tx.c | 513 +++---
>  1 file changed, 481 insertions(+), 32 deletions(-)

Ouch, dynamic allocations while doing xmit are certainly not needed.
Your driver should pre-allocate space for headers.

Drivers willing to implement tso have to use net/core/tso.c provided
helpers.
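
One way to picture "pre-allocated space for headers" is a fixed header slot per TX descriptor, reserved once at queue setup so the xmit path only computes an address. The userspace C sketch below shows that indexing scheme; the ring size, slot size and names are hypothetical, and it is an assumption of this sketch (not something stated in this mail) that a driver would size the slots to hold the largest header it has to replicate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_RING_SIZE 256  /* hypothetical number of TX descriptors */
#define HDR_SLOT_SIZE 128 /* hypothetical per-descriptor header budget */

/* Allocated once, e.g. when the queue is set up, never in the xmit path. */
struct hdr_pool {
	uint8_t slots[TX_RING_SIZE][HDR_SLOT_SIZE];
};

/* Header buffer that belongs to TX descriptor desc_idx; no allocation. */
static uint8_t *hdr_slot(struct hdr_pool *pool, unsigned int desc_idx)
{
	assert(pool != NULL);
	return pool->slots[desc_idx % TX_RING_SIZE];
}
```

Since a slot is tied to a descriptor, it becomes reusable as soon as that descriptor completes, so no per-packet free is needed either.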

$ git grep -n tso_build_hdr
drivers/net/ethernet/cavium/thunder/nicvf_queues.c:1030:	tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0);
drivers/net/ethernet/freescale/fec_main.c:729:	tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0);
drivers/net/ethernet/marvell/mv643xx_eth.c:842:	tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0);
drivers/net/ethernet/marvell/mvneta.c:1650:	tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0);
include/net/tso.h:15:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
net/core/tso.c:14:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
net/core/tso.c:37:EXPORT_SYMBOL(tso_build_hdr);




Re: [PATCH v2 2/2] virtio-net: add default_mtu configuration field

2015-08-19 Thread Victor Kaplansky
On Mon, Aug 17, 2015 at 11:07:15AM +0800, Jason Wang wrote:
 
 
 On 08/16/2015 09:42 PM, Victor Kaplansky wrote:
  @@ -3128,6 +3134,7 @@ struct virtio_net_config {
   u8 mac[6];
   le16 status;
   le16 max_virtqueue_pairs;
  +le16 default_mtu;
 
 Looks like mtu is ok, consider we use mac instead of default_mac.

Good point. I'll change the name in the next version of the patch.

 
   };
   \end{lstlisting}
   
  @@ -3158,6 +3165,15 @@ by the driver after negotiation.
   \field{max_virtqueue_pairs} is valid only if VIRTIO_NET_F_MQ is
   set and can be read by the driver.
   
  +\item [\field{default_mtu}] is a hint to the driver set by the
  +device. It is valid during feature negotiation only if
  +VIRTIO_NET_F_DEFAULT_MTU is offered and holds the initial value
  +of MTU to be used by the driver. If VIRTIO_NET_F_DEFAULT_MTU is
  +negotiated, the driver uses the \field{default_mtu} as an initial
  +value, and also reports MTU changes to the device by writes to
  +\field{default_mtu}.  Such reporting can be used for debugging,
  +or it can be used for tunning MTU along the network.
  +
 
 I vaguely remember that config is read only in some arch or transport
 and that's why we introduce another vq cmd to confirm the announcement.
 Probably we should do same for this?

If so, we need to add one more feature bit to confirm the ability
of the driver to report MTU, or we can weaken the requirement in
the conformance statement and write that the driver "may" report the MTU.
What do you say?

-- Victor


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-19 Thread Eugene Shatokhin

19.08.2015 13:54, Bjørn Mork wrote:

Eugene Shatokhin eugene.shatok...@rosalab.ru writes:


19.08.2015 04:54, David Miller wrote:

From: Eugene Shatokhin eugene.shatok...@rosalab.ru
Date: Fri, 14 Aug 2015 19:58:36 +0300


2. The second race is on dev->flags.

dev->flags is set to 0 here:
*0  usbnet_stop (usbnet.c:816)
  /* deferred work (task, timer, softirq) must also stop.
   * can't flush_scheduled_work() until we drop rtnl (later),
   * else workers could deadlock; so make workers a NOP.
   */
  dev->flags = 0;
  del_timer_sync (&dev->delay);
  tasklet_kill (&dev->bh);

And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
execute concurrently with the above operation:
*0 clear_bit (bitops.h:113, inlined)
*1 usbnet_bh (usbnet.c:1475)
  /* restart RX again after disabling due to high error rate */
  clear_bit(EVENT_RX_KILL, &dev->flags);

It seems, setting dev->flags to 0 is not necessarily atomic w.r.t.
clear_bit() and other bit operations with dev->flags. It is safer to
make it atomic and this way, make the race harmless.

While at it, the checking of EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop() was fixed too: the bit should be checked before dev->flags
is cleared.


The fix for this is excessive.

Instead of all of this madness, looping over expensive clear_bit()
atomics, just do whatever it takes to make sure that usbnet_bh() is
quiesced and cannot execute any more.  Then you can safely clear
dev->flags normally.



If I understand it correctly, it is to make sure usbnet_bh() is not
scheduled again that dev->flags should be set to 0 first, one way or
another. That is what this madness is for.


Assuming there is a race which may reorder these, exactly what
difference does it make wrt EVENT_RX_KILL if you do

a)  clear_bit(EVENT_RX_KILL, &dev->flags);
    dev->flags = 0;

or

b)  dev->flags = 0;
    clear_bit(EVENT_RX_KILL, &dev->flags);


AFAICS, the result will be a cleared EVENT_RX_KILL bit in either case.



Thanks for the review!

The problem is not in the reordering but rather in the fact that
dev->flags = 0 is not necessarily atomic w.r.t.
clear_bit(EVENT_RX_KILL, &dev->flags), and vice versa.


So the following might be possible, although unlikely:

CPU0                    CPU1
                        clear_bit: read dev->flags
                        clear_bit: clear EVENT_RX_KILL in the read value

dev->flags=0;

                        clear_bit: write updated dev->flags

As a result, dev->flags may become non-zero again.

I cannot prove yet that this is an impossible situation. If anyone can, 
please explain. If so, this part of the patch will not be needed.
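
The interleaving described above can be replayed step by step in plain userspace C. This is only a model under the assumption being discussed, namely that clear_bit() behaves as a separate load, modify and store that a plain store can slip between. The bit numbers and the replay_race() name are made up for the example.

```c
#define EVENT_RX_KILL 10  /* bit numbers are illustrative only */
#define EVENT_TX_HALT 0

/* Replay the interleaving by hand; returns the final flags value. */
static unsigned long replay_race(unsigned long flags)
{
    unsigned long tmp;

    tmp = flags;                      /* CPU1: clear_bit reads flags    */
    tmp &= ~(1UL << EVENT_RX_KILL);   /* CPU1: clears bit in its copy   */
    flags = 0;                        /* CPU0: usbnet_stop() store      */
    flags = tmp;                      /* CPU1: writes stale copy back   */
    return flags;
}
```

Starting with EVENT_RX_KILL and one other bit set, the returned value still has the other bit set, even though the store of 0 happened in between.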




The EVENT_NO_RUNTIME_PM bug should definitely be fixed.  Please split
that out as a separate fix.  It's a separate issue, and should be
backported to all maintained stable releases it applies to (anything
from v3.8 and newer)


Yes, that makes sense. However, this fix was originally provided by 
Oliver Neukum rather than me, so I would like to hear his opinion as 
well first.



Bjørn



Regards,
Eugene


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-19 Thread Bjørn Mork
Eugene Shatokhin eugene.shatok...@rosalab.ru writes:

 The problem is not in the reordering but rather in the fact that
 dev->flags = 0 is not necessarily atomic
 w.r.t. clear_bit(EVENT_RX_KILL, &dev->flags), and vice versa.

 So the following might be possible, although unlikely:

 CPU0                    CPU1
                         clear_bit: read dev->flags
                         clear_bit: clear EVENT_RX_KILL in the read value

 dev->flags=0;

                         clear_bit: write updated dev->flags

 As a result, dev->flags may become non-zero again.

Ah, right.  Thanks for explaining.

 I cannot prove yet that this is an impossible situation. If anyone
 can, please explain. If so, this part of the patch will not be needed.

I wonder if we could simply move the dev->flags = 0 down a few lines to
fix both issues?  It doesn't seem to do anything useful except for
resetting the flags to a sane initial state after the device is down.

Stopping the tasklet rescheduling etc depends only on netif_running(),
which will be false when usbnet_stop is called.  There is no need to
touch dev->flags for this to happen.

 The EVENT_NO_RUNTIME_PM bug should definitely be fixed.  Please split
 that out as a separate fix.  It's a separate issue, and should be
 backported to all maintained stable releases it applies to (anything
 from v3.8 and newer)

 Yes, that makes sense. However, this fix was originally provided by
 Oliver Neukum rather than me, so I would like to hear his opinion as
 well first.

If what I write above is correct (please help me verify...), then maybe
it does make sense to do these together anyway.



Bjørn


Re: [PATCH ipsec-next] xfrm: Use VRF master index if output device is enslaved

2015-08-19 Thread Nikolay Aleksandrov

 On Aug 18, 2015, at 6:54 PM, David Ahern d...@cumulusnetworks.com wrote:
 
 Directs route lookups to VRF table. Compiles out if NET_VRF is not
 enabled. With this patch able to successfully bring up ipsec tunnels
 in VRFs, even with duplicate network configuration (IPv4 tested).
 
 Signed-off-by: David Ahern d...@cumulusnetworks.com
 ---
 net/ipv4/xfrm4_policy.c | 7 +--
 net/ipv6/xfrm6_policy.c | 7 +--
 2 files changed, 10 insertions(+), 4 deletions(-)

I think you should use the new vrf_master_index() helper that acquires rcu,
because it looks possible to call ->decode_session() without the rcu read
lock, e.g. in the hold_timer function xfrm_policy_queue_process(), though I
haven't tested it and might be missing something. :-)

Cheers,
 Nik


Re: [PATCH net-next] r8169:Set RxConfig on same func. with TxConfig

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 08:39 +0300, Marian Corcodel wrote:
 It s not mandatory to accept
 these patches, if you wish
 to apply good if you not ,not problem.

How can we apply a patch that does not compile?

You are going to piss off all netdev people for good.




Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:

 We could have enabled A-MSDU based on xmit-more, but the
 rationale of using LSO is that when using pfifo-fast,
 the Qdisc gets one packet and dequeues is straight away
 which limits the possibility to get a lot of packets at
 once. (Am I right here?).

No, you are not ;)

Key point for xmit_more is BQL being implemented in your driver.

Relevant code is in try_bulk_dequeue_skb()




Re: [PATCHv1 net-next 0/5] netlink: mmap: kernel panic and some issues

2015-08-19 Thread Daniel Borkmann

On 08/17/2015 11:02 PM, David Miller wrote:

From: Daniel Borkmann dan...@iogearbox.net
Date: Fri, 14 Aug 2015 12:38:21 +0200


diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 67d2104..4307446 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -238,6 +238,13 @@ static void __netlink_deliver_tap(struct sk_buff
*skb)

  static void netlink_deliver_tap(struct sk_buff *skb)
  {
+   /* Netlink mmaped skbs must not access shared info, and thus
+* are not allowed to be cloned. For now, just don't allow
+* them to get inspected by taps.
+*/
+   if (netlink_skb_is_mmaped(skb))
+   return;
+


I would seriously rather see us do an expensive full copy of the SKB
than to have traffic which is unexpectedly invisible to taps.


Do you mean generically as we do in TX path, or only in this
particular scenario?

Thanks,
Daniel


Clarification on rtnetlink requests

2015-08-19 Thread David Chappelle
I am a bit confused with respect to the structure of rtnetlink requests.
It seems that in some circumstances a request can look like:

struct request
{
struct nlmsghdr header;
struct rtgenmsg body;
};

and in other cases it can look like:

struct request
{
struct nlmsghdr header;
struct ifinfomsg body;
};

How do I know which one to use when sending RTM_GETLINK and
RTM_GETADDR requests? Furthermore, it also seems that 'struct rtattr'
can be specified at the end of the request as well. Is there any
documentation that describes this?
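
For reference, here is a minimal sketch of the second layout filled in for an RTM_GETLINK dump, assuming a Linux userspace with <linux/rtnetlink.h>. The rtnetlink(7) man page describes RTM_GETLINK messages as carrying a struct ifinfomsg and RTM_GETADDR messages as carrying a struct ifaddrmsg, each optionally followed by struct rtattr attributes. The link_request and build_getlink_dump() names are made up for the example.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

struct link_request {
    struct nlmsghdr  header;
    struct ifinfomsg body;
};

/* Fill in a request asking the kernel to dump all links. */
static void build_getlink_dump(struct link_request *req, unsigned int seq)
{
    memset(req, 0, sizeof(*req));
    req->header.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->header.nlmsg_type  = RTM_GETLINK;
    req->header.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->header.nlmsg_seq   = seq;
    req->body.ifi_family    = AF_UNSPEC;
}
```

The filled-in request would then be sent on a socket opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); an RTM_GETADDR request would look the same with struct ifaddrmsg as the body.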


Re: [PATCH v2] rtlwifi: rtl8192cu: Add new device ID

2015-08-19 Thread Larry Finger

On 08/19/2015 10:33 AM, Adrien Schildknecht wrote:

The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043

Signed-off-by: Adrien Schildknecht adrien+...@schischi.me
Cc: Stable sta...@vger.kernel.org
---
  drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
  1 file changed, 1 insertion(+)


Acked-by: Larry Finger larry.fin...@lwfinger.net

Thanks,

Larry



diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c 
b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
index 23806c2..fd4a535 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
@@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
+   {RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NG WNA1000Mv2*/
{RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/
{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/





Re: [PATCH v2 net-next 01/13] ip_tunnels: remove custom alignment and packing

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
 The custom alignment of struct ip_tunnel_key is unnecessary. In struct
 sw_flow_key, it starts at offset 256, in struct ip_tunnel_info it's the
 first field.
 
 The structure is also packed even without the __packed keyword.
 
 Signed-off-by: Jiri Benc jb...@redhat.com

I came to the same conclusion but didn't want to change it in the original
series.

Acked-by: Thomas Graf tg...@suug.ch


Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
 This allows to release the backpressure on the socket only
 when the last segment is released.
 Now the truesize looks like this:
 if the truesize of the original skb is 65420, all the
 segments will have a truesize of 704 (skb itself) and the
 last one will have 65420.
 
 Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e
 Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
 ---
  drivers/net/wireless/iwlwifi/mvm/tx.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c 
 b/drivers/net/wireless/iwlwifi/mvm/tx.c
 index 5046833..046e50d 100644
 --- a/drivers/net/wireless/iwlwifi/mvm/tx.c
 +++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
 @@ -764,7 +764,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct 
 sk_buff *skb_gso,
   bool ipv6 = skb_shinfo(skb_gso)->gso_type & SKB_GSO_TCPV6;
   struct iwl_lso_splitter s = {};
   struct page *hdr_page;
 - unsigned int mpdu_sz;
 + unsigned int mpdu_sz, sum_truesize = 0;
   u8 *hdr_page_pos, *qc, tid;
   int i, ret;
  
 @@ -898,6 +898,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct 
 sk_buff *skb_gso,
   mpdu_sz, tcp_hdrlen(skb_gso));
  
   __skb_queue_tail(mpdus_skb, skb_gso);
 + sum_truesize += skb_gso->truesize;
  
   /* mss bytes have been consumed from the data */
   s.gso_payload_pos = s.mss;
 @@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct 
 sk_buff *skb_gso,
   }
  
   __skb_queue_tail(mpdus_skb, skb);
 + sum_truesize += skb->truesize;
 + }
 +
 + /* Release the backpressure on the socket only when
 +  * the last segment is released.
 +  */
 + if (skb_gso->destructor == sock_wfree) {
 + struct sk_buff *tail = mpdus_skb->prev;
 +
 + swap(tail->truesize, skb_gso->truesize);
 + swap(tail->destructor, skb_gso->destructor);
 + swap(tail->sk, skb_gso->sk);
 +atomic_add(sum_truesize - skb_gso->truesize,
 +   &skb_gso->sk->sk_wmem_alloc);
   }
  
   ret = 0;

Using existing net/core/tso.c helpers would avoid using this.

(BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
yet we want backpressure mostly for TCP stack (TCP Small Queues))




[PATCH net-next] vrf: plug skb leaks

2015-08-19 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

Currently whenever a packet different from ETH_P_IP is sent through the
VRF device it is leaked so plug the leaks and properly drop these
packets.

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
 drivers/net/vrf.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index ed208317cbb5..4aa06450fafa 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -97,6 +97,12 @@ static bool is_ip_rx_frame(struct sk_buff *skb)
return false;
 }
 
+static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
+{
+   vrf_dev->stats.tx_errors++;
+   kfree_skb(skb);
+}
+
 /* note: already called with rcu_read_lock */
 static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb)
 {
@@ -149,7 +155,8 @@ static struct rtnl_link_stats64 *vrf_get_stats64(struct 
net_device *dev,
 static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
   struct net_device *dev)
 {
-   return 0;
+   vrf_tx_error(dev, skb);
+   return NET_XMIT_DROP;
 }
 
 static int vrf_send_v4_prep(struct sk_buff *skb, struct flowi4 *fl4,
@@ -206,8 +213,7 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff 
*skb,
 out:
return ret;
 err:
-   vrf_dev->stats.tx_errors++;
-   kfree_skb(skb);
+   vrf_tx_error(vrf_dev, skb);
goto out;
 }
 
@@ -219,6 +225,7 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, 
struct net_device *dev)
case htons(ETH_P_IPV6):
return vrf_process_v6_outbound(skb, dev);
default:
+   vrf_tx_error(dev, skb);
return NET_XMIT_DROP;
}
 }
-- 
2.4.3



Re: inet_hashtables.c: warning: division by zero

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 18:24 +0300, Meelis Roos wrote:
 Noticed this while compiling 4.2-rc7+git on i386 with gcc 4.9.2:
 
   CC  net/ipv4/inet_hashtables.o
 In file included from include/linux/list.h:8:0,
  from include/linux/module.h:9,
  from net/ipv4/inet_hashtables.c:16:
 net/ipv4/inet_hashtables.c: In function ‘inet_ehash_locks_alloc’:
 net/ipv4/inet_hashtables.c:632:24: warning: division by zero [-Wdiv-by-zero]
  2 * L1_CACHE_BYTES / sizeof(spinlock_t),
 ^
 include/linux/kernel.h:769:17: note: in definition of macro ‘max_t’
   type __max1 = (x);   \
  ^
 


This warning was fixed:

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=89e478a2aa58af2548b7f316e4d5b6bcc9eade5b





Re: [RFC v2 1/3] iwlwifi: mvm: add real TSO implementation

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 07:17 -0700, Eric Dumazet wrote:
 On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
  The segmentation is done completely in software. The
  driver creates several MPDUs out of a single large send.
  Each MPDU is a newly allocated SKB.
  A page is allocated to create the headers that need to be
  duplicated (SNAP / IP / TCP). The WiFi header is in the
  header of the newly created SKBs.
  
  type=feature
  
  Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a
  Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
  ---
   drivers/net/wireless/iwlwifi/mvm/tx.c | 513 
  +++---
   1 file changed, 481 insertions(+), 32 deletions(-)
 
 Ouch, dynamic allocations while doing xmit are certainly not needed.
 Your driver should pre-allocate space for headers.
 
 Drivers willing to implement TSO have to use the helpers provided by
 net/core/tso.c.
 
 $ git grep -n tso_build_hdr
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c:1030: tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
 drivers/net/ethernet/freescale/fec_main.c:729: tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
 drivers/net/ethernet/marvell/mv643xx_eth.c:842: tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
 drivers/net/ethernet/marvell/mvneta.c:1650: tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
 include/net/tso.h:15:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
 net/core/tso.c:14:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
 net/core/tso.c:37:EXPORT_SYMBOL(tso_build_hdr);
 

Look at commit 2adb719d74f6e174071e5c913290b9bbd8c2c0e8 for a typical
use of these helpers.




Re: [PATCH v2 net-next 02/13] ip_tunnels: use u8/u16/u32

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
 The ip_tunnels.h include file uses mixture of __u16 and u16 (etc.) types.
 Unify it to the non-underscore variants.
 
 Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch


Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters

2015-08-19 Thread Corinna Vinschen
On Aug 19 15:07, Corinna Vinschen wrote:
 On Aug 19 09:31, Hayes Wang wrote:
  Corinna Vinschen [mailto:vinsc...@redhat.com]
   Sent: Wednesday, August 19, 2015 5:13 PM
  [...]
It could be cleared by setting bit 0, such as rtl_tally_reset() of 
r8152.
   
   Is it safe to assume that this is implemented in all NICs covered by 
   r8169? 
  
  It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later.
 
 Thanks.  In that case I would prefer the same generic method for all
 chip versions, so I'd opt for storing the offset values at rtl_open
 time as my patch is doing right now.  Is that acceptable?
 
 If so, wouldn't it make even more sense to use the hardware collected
 information in @get_stats64 throughout, except for the numbers collected
 *only* in software?  
 
 I would be willing to propose a matching patch.

It just occurred to me that the combination of resetting the counters on
post-RTL_GIGA_MAC_VER_19 chips plus offset handling would be quite
nice, because it would also reset the small 16 and 32 bit counters.

So I'd like to propose a patch which combines both techniques, if that's
an acceptable way to go forward.
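
The offset idea can be sketched in a few lines of userspace C. This is a model only, not the r8169 code: the struct and function names are hypothetical, and it assumes a free-running 32-bit hardware counter that cannot be reset, so the value seen at open time is remembered and only the growth since then is reported.

```c
#include <stdint.h>

struct tally {
    uint32_t last;   /* last raw hardware value seen */
    uint64_t total;  /* value reported to userspace */
};

/* Remember the raw counter at open time so old traffic is not counted. */
static void tally_open(struct tally *t, uint32_t hw_now)
{
    t->last = hw_now;
    t->total = 0;
}

/* Unsigned subtraction gives the right delta across one wrap. */
static uint64_t tally_update(struct tally *t, uint32_t hw_now)
{
    t->total += (uint32_t)(hw_now - t->last);
    t->last = hw_now;
    return t->total;
}
```

On chips where the hardware counters can be reset, the open step would reset them and the remembered value would simply be 0.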

Btw., does setting the reset bit in CounterAddrLow work the same way as
setting the CounterDump flag?  I.e, does the driver have to wait for the
hardware to set the bit to 0 again to be sure the reset is finished?


Thanks in advance,
Corinna




Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Grumbach, Emmanuel
Hi Eric,

First, thank you a lot for your comments.

On 08/19/2015 05:14 PM, Eric Dumazet wrote:
 On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
 
 We could have enabled A-MSDU based on xmit-more, but the
 rationale of using LSO is that when using pfifo-fast,
 the Qdisc gets one packet and dequeues is straight away
 which limits the possibility to get a lot of packets at
 once. (Am I right here?).
 
 No, you are not ;)
 
 Key point for xmit_more is BQL being implemented in your driver.
 
 Relevant code is in try_bulk_dequeue_skb()
 

I'll look at it.
I was almost starting to implement that but then I thought of another
(good?) reason to use LSO. LSO gives me the guarantee that the packet is
directed to one peer, which might not be the case with xmit_more since
we have one Qdisc for several clients in case we are in AP mode.
Building an A-MSDU for several clients is not possible, at least not for
several clients at L2 (different MAC addresses).
LSO avoids this problem completely.


Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 15:07 +, Grumbach, Emmanuel wrote:

 I'll look at it.
 I was almost starting to implement that but then I thought of another
 (good?) reason to use LSO. LSO gives me the guarantee that the packet is
 directed to one peer, which might not be the case with xmit_more since
 we have one Qdisc for several clients in case we are in AP mode.
 Building an A-MSDU for several clients is not possible, at least not for
 several clients at L2 (different MAC addresses).
 LSO avoids this problem completely.

Then, simply calling skb_gso_segment() from the driver might be enough,
and less work for you.

This would even support TSO on IPv6

segs = skb_gso_segment(skb, tp->dev->features &
                       ~(NETIF_F_TSO | NETIF_F_TSO6));






Re: [PATCH] rtlwifi: rtl8192cu: Add new device ID

2015-08-19 Thread Adrien Schildknecht
 Has this ID been tested with the Netgear device?
Yes, I have been using the device and the patch for 2 days.

-- 
Adrien Schildknecht


[PATCH v2] rtlwifi: rtl8192cu: Add new device ID

2015-08-19 Thread Adrien Schildknecht
The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043

Signed-off-by: Adrien Schildknecht adrien+...@schischi.me
Cc: Stable sta...@vger.kernel.org
---
 drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c 
b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
index 23806c2..fd4a535 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
@@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
+   {RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NG WNA1000Mv2*/
{RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/
{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
-- 
2.5.0



Re: Problem with fragmented packets on tun/tap interface

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 15:44 +0530, Prashant Upadhyaya wrote:


 Hi Eric,
 
 For some reason, the dropping in the raw table does not work for me
 for the usecase, though I recognize that the raw table operations
 theory, when matched with my usecase theory, is the apparent solution.
 
 I think the reason is that I use packet sockets with defrag option on
 so that it can select the right queue for load balancing purposes.
 
 Anyway, not disappointed with the above, I stuck to my theory and
 tried a simple approach. To tie-break the reassembly/defrag done by
 the kernel from the packets from the eth0 and the packets submitted
 from tap (via application), I made a small change in the application.
 I detected that the packets are fragmented in the app, and bumped up
 the 'Identification' field in the IP header and re-checksummed the IP
 header and then submitted it to tap. Since reassembly/defrag is done
 on the basis of the (srcip, destip, protocol, Identification) tuple
 from the IP header, I expected it to work, and it does!
 
 So there we are, I have a nice little solution in place which suits me.
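
The workaround described above (change the Identification field, then recompute the IPv4 header checksum) can be sketched as follows. This is a minimal userspace illustration, not the poster's actual code; ip_sum() and bump_ip_id() are names chosen for the example, and the header is assumed to be a raw byte buffer in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over the header, folded to 16 bits. */
static uint16_t ip_sum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Add delta to the Identification field and refresh the checksum. */
static void bump_ip_id(uint8_t *hdr, uint16_t delta)
{
    size_t ihl = (size_t)(hdr[0] & 0x0f) * 4;   /* header length in bytes */
    uint16_t id = (uint16_t)(hdr[4] << 8 | hdr[5]);
    uint16_t csum;

    id = (uint16_t)(id + delta);
    hdr[4] = (uint8_t)(id >> 8);
    hdr[5] = (uint8_t)(id & 0xff);

    hdr[10] = hdr[11] = 0;                      /* zero the checksum field */
    csum = (uint16_t)~ip_sum(hdr, ihl);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

A header with a valid checksum sums to 0xffff under ip_sum(), which gives a quick way to check the result. All fragments of one datagram have to receive the same new Identification value, otherwise they can no longer be reassembled.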

Another idea would be to put your tap device and ethernet device in
different namespaces, as the defrag unit is namespace aware.

Looks like eth0 could be put in a completely new namespace as it holds
no IP address?

ip netns add eth0ns
ip link set eth0 netns eth0ns




inet_hashtables.c: warning: division by zero

2015-08-19 Thread Meelis Roos
Noticed this while compiling 4.2-rc7+git on i386 with gcc 4.9.2:

  CC  net/ipv4/inet_hashtables.o
In file included from include/linux/list.h:8:0,
 from include/linux/module.h:9,
 from net/ipv4/inet_hashtables.c:16:
net/ipv4/inet_hashtables.c: In function ‘inet_ehash_locks_alloc’:
net/ipv4/inet_hashtables.c:632:24: warning: division by zero [-Wdiv-by-zero]
 2 * L1_CACHE_BYTES / sizeof(spinlock_t),
^
include/linux/kernel.h:769:17: note: in definition of macro ‘max_t’
  type __max1 = (x);   \
 ^

-- 
Meelis Roos (mr...@linux.ee)


Re: [PATCH] rtlwifi: rtl8192cu: Add new device ID

2015-08-19 Thread Larry Finger

On 08/19/2015 08:51 AM, Adrien Schildknecht wrote:

The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043

Signed-off-by: Adrien Schildknecht adrien+...@schischi.me


Add a Cc: Stable sta...@vger.kernel.org line here. That way the new ID will 
be available with older kernels.


The new line exceeds 80 characters. You might abbreviate Netgear as NG.

When you resubmit, do so as [PATCH V2].

Thanks,

Larry


---
  drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c 
b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
index 23806c2..8b4238a 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
@@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
+   {RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NetGear 
WNA1000Mv2*/
{RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/
{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/





[PATCH net-next 4/4] vrf: ndo_add|del_slave drop unnecessary checks

2015-08-19 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

When ndo_add|del_slave ops are used, they're taken from the respective
master device's netdev ops, so if the master device is a VRF only then
the VRF ops will get called thus no need to check the type of the
master.

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
 drivers/net/vrf.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 4825c65c62fd..dbeffe789185 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -393,8 +393,7 @@ out_fail:
 
 static int vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 {
-   if (!netif_is_vrf(dev) || netif_is_vrf(port_dev) ||
-   vrf_is_slave(port_dev))
+   if (netif_is_vrf(port_dev) || vrf_is_slave(port_dev))
return -EINVAL;
 
return do_vrf_add_slave(dev, port_dev);
@@ -431,9 +430,6 @@ static int do_vrf_del_slave(struct net_device *dev, struct 
net_device *port_dev)
 
 static int vrf_del_slave(struct net_device *dev, struct net_device *port_dev)
 {
-   if (!netif_is_vrf(dev))
-   return -EINVAL;
-
return do_vrf_del_slave(dev, port_dev);
 }
 
-- 
2.4.3



[PATCH net-next 1/4] vrf: don't panic on cache create failure

2015-08-19 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

It's pointless to panic on cache create failure when that case is
handled, even more so since it's not a kernel-wide fatal problem, so
don't panic.

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
 drivers/net/vrf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 4aa06450fafa..01dc91562a88 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -649,7 +649,7 @@ static int __init vrf_init_module(void)
vrf_dst_ops.kmem_cachep =
kmem_cache_create("vrf_ip_dst_cache",
  sizeof(struct rtable), 0,
- SLAB_HWCACHE_ALIGN | SLAB_PANIC,
+ SLAB_HWCACHE_ALIGN,
  NULL);
 
if (!vrf_dst_ops.kmem_cachep)
-- 
2.4.3



[PATCH net-next 3/4] vrf: move vrf_insert_slave so we can drop a goto label

2015-08-19 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

We can simplify do_vrf_add_slave by moving vrf_insert_slave to the end
of the enslaving and thus eliminate an error goto label. It always
succeeds and isn't needed before that anyway.

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
 drivers/net/vrf.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 9907550ff640..4825c65c62fd 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -363,15 +363,13 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
vrf_ptr->ifindex = dev->ifindex;
vrf_ptr->tb_id = vrf->tb_id;
 
-   __vrf_insert_slave(queue, slave);
-
/* register the packet handler for slave ports */
ret = netdev_rx_handler_register(port_dev, vrf_handle_frame, dev);
if (ret) {
netdev_err(port_dev,
   "Device %s failed to register rx_handler\n",
   port_dev->name);
-   goto out_remove;
+   goto out_fail;
}
 
ret = netdev_master_upper_dev_link(port_dev, dev);
@@ -379,7 +377,7 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
goto out_unregister;
 
port_dev->flags |= IFF_SLAVE;
-
+   __vrf_insert_slave(queue, slave);
rcu_assign_pointer(port_dev->vrf_ptr, vrf_ptr);
cycle_netdev(port_dev);
 
@@ -387,8 +385,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 
 out_unregister:
netdev_rx_handler_unregister(port_dev);
-out_remove:
-   __vrf_remove_slave(queue, slave);
 out_fail:
kfree(vrf_ptr);
kfree(slave);
-- 
2.4.3
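
The effect of the reordering on the unwind path can be modelled in userspace.
This is only a sketch: struct port and all helper names below are hypothetical
stand-ins for the steps in do_vrf_add_slave(), not kernel code. The point it
illustrates is that a step which cannot fail needs no unwind label once it
runs after every fallible step.

```c
/* Userspace model only: the struct and helpers are hypothetical stand-ins
 * for the steps in do_vrf_add_slave(), not kernel code. */
struct port {
	int handler_registered;
	int linked;
	int in_list;
};

static int register_handler(struct port *p, int fail)
{
	if (fail)
		return -1;
	p->handler_registered = 1;
	return 0;
}

static void unregister_handler(struct port *p)
{
	p->handler_registered = 0;
}

static int link_upper(struct port *p, int fail)
{
	if (fail)
		return -1;
	p->linked = 1;
	return 0;
}

/* Cannot fail, so it needs no unwind step when it runs last. */
static void insert_slave(struct port *p)
{
	p->in_list = 1;
}

int add_slave(struct port *p, int fail_register, int fail_link)
{
	int ret;

	ret = register_handler(p, fail_register);
	if (ret)
		goto out_fail;

	ret = link_upper(p, fail_link);
	if (ret)
		goto out_unregister;

	/* Last step: nothing after it can fail, so no "out_remove" label. */
	insert_slave(p);
	return 0;

out_unregister:
	unregister_handler(p);
out_fail:
	return ret;
}
```

On either failure path the port never enters the list, so there is nothing to
remove from it during cleanup.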



[PATCH net-next 0/4] vrf: cleanups part 2

2015-08-19 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

Hi,
This is the next part of vrf cleanups. Patch 1 drops the SLAB_PANIC when
creating kmem cache since it's handled, patch 2 removes a slave duplicate
check which is already done by the lower/upper code, patch 3 moves the
ndo_add_slave code around a bit so we can drop an error label, and patch 4
drops the master device checks, which are unnecessary because the ops are
taken from the master device itself, so it can't be different.

Cheers,
 Nik

Nikolay Aleksandrov (4):
  vrf: don't panic on cache create failure
  vrf: remove unnecessary duplicate check
  vrf: move vrf_insert_slave so we can drop a goto label
  vrf: ndo_add|del_slave drop unnecessary checks

 drivers/net/vrf.c | 24 
 1 file changed, 4 insertions(+), 20 deletions(-)

-- 
2.4.3



[PATCH net-next 2/4] vrf: remove unnecessary duplicate check

2015-08-19 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

The upper/lower functions already check for duplicate slaves so no need
to do it again.

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
 drivers/net/vrf.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 01dc91562a88..9907550ff640 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -352,7 +352,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 {
struct net_vrf_dev *vrf_ptr = kmalloc(sizeof(*vrf_ptr), GFP_KERNEL);
struct slave *slave = kzalloc(sizeof(*slave), GFP_KERNEL);
-   struct slave *duplicate_slave;
struct net_vrf *vrf = netdev_priv(dev);
struct slave_queue *queue = &vrf->queue;
int ret = -ENOMEM;
@@ -361,16 +360,9 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
goto out_fail;
 
slave->dev = port_dev;
-
vrf_ptr->ifindex = dev->ifindex;
vrf_ptr->tb_id = vrf->tb_id;
 
-   duplicate_slave = __vrf_find_slave_dev(queue, port_dev);
-   if (duplicate_slave) {
-   ret = -EBUSY;
-   goto out_fail;
-   }
-
__vrf_insert_slave(queue, slave);
 
/* register the packet handler for slave ports */
-- 
2.4.3



Re: [PATCH v2 net-next 03/13] ip_tunnels: use offsetofend

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch


Re: [PATCH] veth: replace iflink by a dedicated symlink in sysfs

2015-08-19 Thread Nicolas Dichtel

On 19/08/2015 14:48, Vincent Bernat wrote:

  ❦ 19 August 2015 14:38 +0200, Jiri Benc jb...@redhat.com:


That's the main goal of this patch: advertising the peer link as
IFLA_LINK attribute triggers an infinite loop in userland software when
they follow iflink to discover network devices topology. iflink has
always been the index of a lower device. If a sysfs symbolic link is not
good enough, I can propose a new IFLA_PEER attribute instead.


This would cause regression and break applications for those of us who
started relying on the netnsid feature to match interfaces across net
name spaces.


Yes. Unfortunately.


This is tough. If you're going to do such a thing, you would at least
need to also introduce IFLA_PEER_NETNSID.

Probably better to introduce a veth netlink attribute then, something like
IFLA_VETH_PEER, and keep IFLA_LINK_NETNSID.



Yes I can.

In my opinion, the change of semantics of IFLA_LINK is a break of
API. However, I can live with it since it's easy to work around. It
just seemed easier to start the discussion with a patch.


I also don't know what is the best way to handle this. veth advertises
its peer via IFLA_LINK since 4.1, so it's too late to change it for this
release.


Re: [PATCH v2 net-next 08/13] ipv6: ndisc: inherit metadata dst when creating ndisc requests

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> If output device wants to see the dst, inherit the dst of the original skb
> in the ndisc request.
>
> This is an IPv6 counterpart of commit 0accfc268f4d ("arp: Inherit metadata
> dst when creating ARP requests").
>
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch


Re: [PATCH net] sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state

2015-08-19 Thread Marcelo Ricardo Leitner
On Wed, Aug 19, 2015 at 12:38:03PM +0800, Xin Long wrote:
> commit f8d960524 fix the 0 peer.rwnd issue in SHUTDOWN_PENING state
> through not reseting the overall_error_count when recevie a heartbeat,
> but the same issue also exists in SHUTDOWN_RECEIVE state.

Please fix the typos in the changelog, especially in symbol names, so
that searching for them later is more successful.

Also, to make changelog closer to the actual change, explaining why it's
okay to include the other states in there too would be good, as you're
including not only SHUTDOWN_RECEIVE but also SHUTDOWN_SENT and
SHUTDOWN_ACK_SENT.
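
To make the point about the extra states concrete, here is a small userspace
sketch. It assumes the enum below follows the same ordering as the kernel's
sctp_state_t; the ST_ names and both helper functions are local to the sketch
and are not kernel symbols. It compares the condition before the patch (!=)
with the one after it (<).

```c
#include <stdbool.h>

/* Assumed to follow the ordering of the kernel's sctp_state_t; the ST_
 * names are local to this sketch. */
enum st {
	ST_CLOSED,
	ST_COOKIE_WAIT,
	ST_COOKIE_ECHOED,
	ST_ESTABLISHED,
	ST_SHUTDOWN_PENDING,
	ST_SHUTDOWN_SENT,
	ST_SHUTDOWN_RECEIVED,
	ST_SHUTDOWN_ACK_SENT,
};

/* Condition before the patch: reset unless exactly SHUTDOWN_PENDING. */
bool resets_before(enum st s)
{
	return s != ST_SHUTDOWN_PENDING;
}

/* Condition after the patch: reset only in states ordered before it. */
bool resets_after(enum st s)
{
	return s < ST_SHUTDOWN_PENDING;
}
```

Under that assumed ordering, the two conditions agree up to and including
SHUTDOWN_PENDING and differ for every later shutdown state, which is why the
changelog should explain all of them.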

> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown")
> Signed-off-by: Xin Long lucien@gmail.com
> ---
>  net/sctp/sm_sideeffect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index fef2acd..85e6f03 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -702,7 +702,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>    * outstanding data and rely on the retransmission limit be reached
>    * to shutdown the association.
>    */
> - if (t->asoc->state != SCTP_STATE_SHUTDOWN_PENDING)
> + if (t->asoc->state < SCTP_STATE_SHUTDOWN_PENDING)
>    t->asoc->overall_error_count = 0;
>
>    /* Clear the hb_sent flag to signal that we had a good
> --
> 2.1.0
>


Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.

2015-08-19 Thread Pravin Shelar
On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross je...@nicira.com wrote:
> On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar pshe...@nicira.com wrote:
>> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>> index e58468b..18ff83b 100644
>> --- a/drivers/net/Kconfig
>> +++ b/drivers/net/Kconfig
>> @@ -181,7 +181,7 @@ config VXLAN
>>
>>  config GENEVE
>>         tristate "Generic Network Virtualization Encapsulation netdev"
>> -       depends on INET && GENEVE_CORE
>> +       depends on INET
>>         select NET_IP_TUNNEL
>
> I think my comments on v1 on this patch were overlooked (about the
> UDP_TUNNEL dependency and the name).

right, I missed it.

>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>> index 5b43382..eb298ff 100644
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> +static void geneve_build_header(struct genevehdr *geneveh,
>> +   __be16 tun_flags, u8 vni[3],
>> +   u8 options_len, u8 *options)
>> [...]
>> +static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
>> +   __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
>> +   bool csum)
>
> It seems like we could just merge these functions. I'm not sure that
> the role is all that different.

ok.

> In geneve_build_skb(), the error labels are somewhat confusing (for
> example, free_rt doesn't free the rt). Also, is it right that we don't
> free the rt if udp_tunnel_handle_offloads() fails()? It might be
> cleaner if the caller retains ownership of rt.

ok.

> My guess is that if the issue from the earlier patch about overlapping
> collect_md tunnels is fixed then that might allow us to simplify
> things a little further, since for those tunnels we can assume there
> is a 1:1 mapping between collect_md tunnels and sockets.

I don't see how it would be different. Can you elaborate on this?


Re: [PATCH v2 net-next 07/13] ipv6: drop metadata dst in ip6_route_input

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> The fix in commit 48fb6b554501 is incomplete, as now ip6_route_input can be
> called with non-NULL dst if it's a metadata dst and the reference is leaked.
> Drop the reference.
>
> Fixes: 48fb6b554501 ("ipv6: fix crash over flow-based vxlan device")
> Fixes: ee122c79d422 ("vxlan: Flow based tunneling")
> CC: Wei-Chun Chao weich...@plumgrid.com
> CC: Thomas Graf tg...@suug.ch
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch


Re: [PATCH v2 net-next 09/13] vxlan: provide access function for vxlan socket address family

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch


[PATCH net-next] bridge: fix netlink max attr size

2015-08-19 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

.maxtype should match .policy.  Probably just been getting lucky here
because IFLA_BRPORT_MAX > IFLA_BR_MAX.

Fixes: 13323516 ("bridge: implement rtnl_link_ops->changelink")
Signed-off-by: Scott Feldman sfel...@gmail.com
---
 net/bridge/br_netlink.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 01401ea..d2c4d66 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -849,7 +849,7 @@ struct rtnl_link_ops br_link_ops __read_mostly = {
.kind   = "bridge",
.priv_size  = sizeof(struct net_bridge),
.setup  = br_dev_setup,
-   .maxtype= IFLA_BRPORT_MAX,
+   .maxtype= IFLA_BR_MAX,
.policy = br_policy,
.validate   = br_validate,
.newlink= br_dev_newlink,
-- 
1.7.10.4



Re: [PATCH v2 net-next 06/13] route: move lwtunnel state to dst_entry

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Currently, the lwtunnel state resides in per-protocol data. This is
> a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa).
> The xmit function of the tunnel does not know whether the packet has been
> routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the
> lwtstate data to dst_entry makes such inter-protocol tunneling possible.
>
> As a bonus, this brings a nice diffstat.
>
> Signed-off-by: Jiri Benc jb...@redhat.com
> Acked-by: Roopa Prabhu ro...@cumulusnetworks.com

Acked-by: Thomas Graf tg...@suug.ch


[PATCH 0/2] Small cleanups for smsc and device property

2015-08-19 Thread Jeremy Linton
These patches are against net-next.

This patch set adds a length check to device_get_mac_addr() before
calling is_valid_ether_addr(); it also removes an unnecessary dev==NULL
check.

The remainder is updates to the comments.

Jeremy Linton (2):
  device property: Add ETH_ALEN check, update comments.
  smsc911x: Remove dev==NULL check.

 drivers/base/property.c  | 21 +
 drivers/net/ethernet/smsc/smsc911x.c |  3 ---
 2 files changed, 13 insertions(+), 11 deletions(-)

-- 
2.4.3




[PATCH 1/2] device property: Add ETH_ALEN check, update comments.

2015-08-19 Thread Jeremy Linton
This patch adds MAC address length check back into
the device_get_mac_addr() function before calling
is_valid_ether_addr() similar to the way the OF
routine does it.

Update the comments for the two new functions.

Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
---
 drivers/base/property.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 2e8cd14..4c20828 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -537,7 +537,7 @@ bool device_dma_is_coherent(struct device *dev)
 EXPORT_SYMBOL_GPL(device_dma_is_coherent);
 
 /**
- * device_get_phy_mode - Get phy mode for given device_node
+ * device_get_phy_mode - Get phy mode for given device
  * @dev:   Pointer to the given device
  *
  * The function gets phy interface string from property 'phy-mode' or
@@ -570,13 +570,18 @@ static void *device_get_mac_addr(struct device *dev,
 {
int ret = device_property_read_u8_array(dev, name, addr, alen);
 
-   if (ret == 0 && is_valid_ether_addr(addr))
+   if (ret == 0 && alen == ETH_ALEN && is_valid_ether_addr(addr))
return addr;
return NULL;
 }
 
 /**
- * Search the device tree for the best MAC address to use.  'mac-address' is
+ * device_get_mac_address - Get the MAC for a given device
+ * @dev:   Pointer to the device
+ * @addr:  Address of buffer to store the MAC in
+ * @alen:  Length of the buffer pointed to by addr, should be ETH_ALEN
+ *
+ * Search the firmware node for the best MAC address to use.  'mac-address' is
  * checked first, because that is supposed to contain to most recent MAC
  * address. If that isn't set, then 'local-mac-address' is checked next,
  * because that is the default address.  If that isn't set, then the obsolete
@@ -587,11 +592,11 @@ static void *device_get_mac_addr(struct device *dev,
  * MAC address.
  *
  * All-zero MAC addresses are rejected, because those could be properties that
- * exist in the device tree, but were not set by U-Boot.  For example, the
- * DTS could define 'mac-address' and 'local-mac-address', with zero MAC
- * addresses.  Some older U-Boots only initialized 'local-mac-address'.  In
- * this case, the real MAC is in 'local-mac-address', and 'mac-address' exists
- * but is all zeros.
+ * exist in the firmware tables, but were not updated by the firmware.  For
+ * example, the DTS could define 'mac-address' and 'local-mac-address', with
+ * zero MAC addresses.  Some older U-Boots only initialized 'local-mac-address'.
+ * In this case, the real MAC is in 'local-mac-address', and 'mac-address'
+ * exists but is all zeros.
 */
 void *device_get_mac_address(struct device *dev, char *addr, int alen)
 {
-- 
2.4.3
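
The order of the checks matters because the validity test reads six bytes
unconditionally. A minimal userspace sketch of the same guard follows, assuming
the usual definition of a valid unicast Ethernet address (not all-zero and not
multicast). checked_mac() and the two static helpers are hypothetical names
for illustration, not the kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_ALEN 6

/* True if all six bytes are zero. */
static bool mac_is_zero(const unsigned char *a)
{
	return (a[0] | a[1] | a[2] | a[3] | a[4] | a[5]) == 0;
}

/* True if the group (multicast) bit of the first byte is set. */
static bool mac_is_multicast(const unsigned char *a)
{
	return (a[0] & 0x01) != 0;
}

/* Hypothetical helper: return addr only if the buffer has exactly
 * ETH_ALEN bytes and holds a usable unicast address. The length test
 * comes first so the six-byte reads never run on a shorter buffer. */
const unsigned char *checked_mac(const unsigned char *addr, size_t alen)
{
	if (alen == ETH_ALEN && !mac_is_multicast(addr) && !mac_is_zero(addr))
		return addr;
	return NULL;
}
```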




[PATCH 2/2] smsc911x: Remove dev==NULL check.

2015-08-19 Thread Jeremy Linton
The dev==NULL check in smsc911x_probe_config is useless
and isn't providing any additional protection. If a fwnode
doesn't exist then an appropriate error should be returned
by device_get_phy_mode() covering the original case
of a missing of/fwnode.

Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
---
 drivers/net/ethernet/smsc/smsc911x.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 34f9768..6eef325 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2370,9 +2370,6 @@ static int smsc911x_probe_config(struct smsc911x_platform_config *config,
int phy_interface;
u32 width = 0;
 
-   if (!dev)
-   return -ENODEV;
-
phy_interface = device_get_phy_mode(dev);
if (phy_interface < 0)
return phy_interface;
-- 
2.4.3




Re: [PATCH v2 net-next 05/13] ip_tunnels: use tos and ttl fields also for IPv6

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll
> be used with IPv6 tunnels, too.
>
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] sctp: start t5 timer only when peer.rwnd == 0 and in SHUTDOWN_PENDING

2015-08-19 Thread Marcelo Ricardo Leitner
On Wed, Aug 19, 2015 at 12:39:06PM +0800, Xin Long wrote:
> when A send a data to B, A close() to be in SHUTDOWN_PENDING state,
> but B neither claim his rwnd is 0 nor SACK this data, then A keep
> retransmiting this data. it should send abord after Max.Retrans
> times, only when peer.rwnd == 0 and more than Max.Retrans times, it
> will start t5 timer.
>
> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown")
> Signed-off-by: Xin Long lucien@gmail.com
> ---

The changelog is confusing; please reword it, especially the last part.

>  net/sctp/sm_statefuns.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 3ee27b7..7d9380c 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net *net,
>    SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS);
>
>    if (asoc->overall_error_count >= asoc->max_retrans) {
> - if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> + if (!q->asoc->peer.rwnd &&
> + asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
^

Indentation issue here. 2nd if line should start where I marked.

Other than that, looks good to me.

>    /*
>     * We are here likely because the receiver had its rwnd
>     * closed for a while and we have not been able to
> --
> 2.1.0
>


Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Grumbach, Emmanuel


On 08/19/2015 07:08 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 15:07 +0000, Grumbach, Emmanuel wrote:
>
>> I'll look at it.
>> I was almost starting to implement that but then I thought with another
>> (good?) reason to use LSO. LSO gives me the guarantee that the packet is
>> directed to one peer, which might not be the case with xmit_more since
>> we have one Qdisc for several clients in case we are in AP mode.
>> Building an A-MSDU for several clients is not possible, at least not for
>> several client in the L2 (different MAC addresses).
>> LSO avoids this problem completely.
>
> Then, simply calling skb_gso_segment() from the driver might be enough,
> and less work for you.
>
> This would even support TSO on IPv6
>

Well... I did take care of IPv6.

> segs = skb_gso_segment(skb, tp->dev->features &
> ~(NETIF_F_TSO | NETIF_F_TSO6));
>
>

The thing is that our HW layers are currently implemented to receive one skb
per 802.11 packet. So if I call skb_gso_segment, I'd have to
re-assemble the segs into one A-MSDU, which would translate into one skb.
I guess I could change the HW layer in the driver to be able to get a
list of skbs and make a single packet out of it, but that'd be tricky or
wasteful. skb_gso_segment will duplicate the wifi header while it is not
needed. Only the TCP / IP / SNAP headers need to be duplicated.
Moreover, each subframe in the A-MSDU needs its own subframe header (same
format as ethhdr) and there is also some padding in there. So that would
be even more complicated IMHO.
My code doesn't copy any payload. Only the headers. This is why I
thought it'd be better than segmenting and then re-assembling.
I did call skb_gso_segment if I get lots of payload in the header (more
than 2 * mss) in order to simplify the implementation.
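
The per-subframe overhead mentioned above can be put into numbers with a small
sketch. It assumes the usual A-MSDU layout as I understand it: a 14-byte
subframe header (the ethhdr-like format mentioned above), followed by the MSDU,
with every subframe except the last padded to a multiple of 4 bytes.
amsdu_len() is a hypothetical helper, not a function from the iwlwifi driver.

```c
#include <stddef.h>

/* DA (6) + SA (6) + length (2), the same size as an Ethernet header. */
#define AMSDU_SUBFRAME_HDR_LEN 14

/* Hypothetical helper: total A-MSDU body length for n MSDUs whose
 * lengths (SNAP + IP + TCP + payload) are given in msdu_len[]. Every
 * subframe except the last is assumed to be padded to 4 bytes. */
size_t amsdu_len(const size_t *msdu_len, size_t n)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		size_t sub = AMSDU_SUBFRAME_HDR_LEN + msdu_len[i];

		if (i + 1 < n)
			sub = (sub + 3) & ~(size_t)3;	/* round up to 4 */
		total += sub;
	}
	return total;
}
```

For two 100-byte MSDUs this gives 116 + 114 = 230 bytes: the first subframe
carries 2 bytes of padding, the last one none.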


Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 17:00 +0000, Grumbach, Emmanuel wrote:
>
> On 08/19/2015 07:08 PM, Eric Dumazet wrote:
> > On Wed, 2015-08-19 at 15:07 +0000, Grumbach, Emmanuel wrote:
> >
> >> I'll look at it.
> >> I was almost starting to implement that but then I thought with another
> >> (good?) reason to use LSO. LSO gives me the guarantee that the packet is
> >> directed to one peer, which might not be the case with xmit_more since
> >> we have one Qdisc for several clients in case we are in AP mode.
> >> Building an A-MSDU for several clients is not possible, at least not for
> >> several client in the L2 (different MAC addresses).
> >> LSO avoids this problem completely.
> >
> > Then, simply calling skb_gso_segment() from the driver might be enough,
> > and less work for you.
> >
> > This would even support TSO on IPv6
> >
>
> Well... I did take care of IPv6.

net/core/tso.c does not yet handle IPv6




Re: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.

2015-08-19 Thread Scott Feldman
On Wed, Aug 19, 2015 at 2:34 AM, Premkumar Jonnala
pjonn...@broadcom.com wrote:
> Hello Scott,
>
> Thank you for the diff and comments.   Please see my comments inline.
>
>> -----Original Message-----
>> From: Scott Feldman [mailto:sfel...@gmail.com]
>> Sent: Tuesday, August 18, 2015 12:48 PM
>> To: Premkumar Jonnala
>> Cc: netdev@vger.kernel.org
>> Subject: Re: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.
>>
>>
>>
>> On Fri, 14 Aug 2015, Premkumar Jonnala wrote:
>>
>> > Bridge devices have ageing interval used to age out MAC addresses
>> > from FDB.  This ageing interval was not configurable.
>> >
>> > Enable netlink based configuration of ageing interval for bridges and
>> > switch devices.  The ageing interval changes the timer used to purge
>> > inactive FDB entries in bridges.  The ageing interval config is
>> > propagated to switch devices, so that platform or hardware based
>> > ageing works according to configuration.
>> >
>> > Signed-off-by: Premkumar Jonnala pjonn...@broadcom.com
>>
>> Hi Premkumar,
>>
>> I agree with Roopa that we should use existing IFLA_BR_AGEING_TIME.
>
> What is the motivation for using 'ip link' command to configure bridge
> attributes?  IMHO, bridge command is better suited for that.

Can you extend bridge command to allow setting/getting these bridge
attrs?  Looks like you construct a RTM_NEWLINK IFLA_INFO_DATA msg.  No
changes needed to the kernel.

bridge link set dev br0 ageing_time 1000

 --or--

ip link set dev br0 type bridge ageing_time 1000

>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>> index 0f2408f..01401ea 100644
>> --- a/net/bridge/br_netlink.c
>> +++ b/net/bridge/br_netlink.c
>> @@ -759,9 +759,9 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
>>    }
>>
>>    if (data[IFLA_BR_AGEING_TIME]) {
>> - u32 ageing_time = nla_get_u32(data[IFLA_BR_AGEING_TIME]);
>
> Should we do some range checking here to ensure that the value is within a
> certain range? IEEE 802.1d recommends that the ageing time be between
> 10 sec and 1 million seconds.

Sure, but make that a separate patch.

>> +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time)
>> +{
>> + struct switchdev_attr attr = {
>> + .id = SWITCHDEV_ATTR_BRIDGE,
>> + .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP,
>> + .u.bridge.attr = IFLA_BR_AGEING_TIME,
>> + .u.bridge.val = ageing_time,
>> + };
>> + int err;
>> +
>> + err = switchdev_port_attr_set(br->dev, &attr);
>> + if (err)
>> + return err;
>> +
>> + br->ageing_time = clock_t_to_jiffies(ageing_time);
>
> Should we restart the timer here so the new time takes effect?

I don't know...I just copied what the original code did.  If it does
need to be restarted, break that out as a separate patch.

-scott
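
The range check suggested in this thread could look roughly like the sketch
below. It is a userspace illustration only: check_ageing_time() and the two
limit macros are hypothetical names, the limits are the 10 second and 1 million
second bounds quoted above, and the value is assumed to be already converted to
seconds (the quoted code passes it through clock_t_to_jiffies(), so the netlink
attribute itself is in clock_t units).

```c
#include <errno.h>
#include <stdint.h>

/* Bounds quoted in the thread, in seconds. */
#define AGEING_TIME_MIN_SEC 10u
#define AGEING_TIME_MAX_SEC 1000000u

/* Hypothetical helper: returns 0 if the ageing time is acceptable,
 * -ERANGE otherwise. The caller is assumed to pass seconds. */
int check_ageing_time(uint32_t seconds)
{
	if (seconds < AGEING_TIME_MIN_SEC || seconds > AGEING_TIME_MAX_SEC)
		return -ERANGE;
	return 0;
}
```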


Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 17:56 +0000, Grumbach, Emmanuel wrote:
>

> So I feel that making net/core/tso.c more complicated just because of
> our craziness seems an overkill to me.
> I'll try a bit harder to see how I can use net/core/tso.c, but I have to
> say I am pessimistic.

net/core/tso.c is WIP, feel free to expand it to make it more generic
and meet your needs.

The point is : we want a core infrastructure, not something that each
individual driver implements in ~500 lines of code :(




netlink_route kernel data dump size increased

2015-08-19 Thread tej parkash
All,

We are running an application on Linux kernel 3.10 that collects network
interface information using the NETLINK_ROUTE protocol. Earlier (kernel
2.6.32) we had an 8K buffer allocated to collect all the data, but
with the new kernel (3.10) we are seeing a read socket error, as the buffer
size is not sufficient for all the network dump data.

We want to understand whether the userspace buffer limit has increased to
16K, or whether we need some other mechanism to collect the data in 8K chunks.
Or is there any other way an application can use the NETLINK_ROUTE protocol
so that it will not break if the data size increases
in the future?

I did some browsing and found a link, but it was not very conclusive.
http://www.spinics.net/lists/netdev/msg162185.html

I would appreciate any kind of help or pointers here.


Thanks
Tej
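
One way to avoid hard-coding 8K or 16K is to ask the kernel how large the
pending datagram is before reading it. The sketch below is not taken from this
thread: recv_whole_datagram() is a hypothetical helper, and it relies on the
Linux behaviour of recv() with MSG_PEEK | MSG_TRUNC, which reports the real
length of the next datagram without consuming it (the recv(2) manual page
lists netlink sockets among the supported types).

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: receive one whole datagram from fd, sizing the
 * buffer from the kernel's answer instead of a fixed guess. Returns a
 * malloc()ed buffer (caller frees) and stores its length in *len, or
 * returns NULL on error. */
char *recv_whole_datagram(int fd, size_t *len)
{
	/* Peek only: returns the real datagram length, consumes nothing. */
	ssize_t need = recv(fd, NULL, 0, MSG_PEEK | MSG_TRUNC);
	ssize_t got;
	char *buf;

	if (need < 0)
		return NULL;

	buf = malloc(need > 0 ? (size_t)need : 1);
	if (!buf)
		return NULL;

	/* Now read the datagram for real into the right-sized buffer. */
	got = recv(fd, buf, (size_t)need, 0);
	if (got < 0) {
		free(buf);
		return NULL;
	}
	*len = (size_t)got;
	return buf;
}
```

For a NETLINK_ROUTE dump, such a helper would be called in a loop, parsing the
netlink messages in each returned buffer until NLMSG_DONE is seen.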


Re: [PATCH v2 net-next 04/13] ip_tunnels: add IPv6 addresses to ip_tunnel_key

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the
> newly introduced padding after the IPv4 addresses needs to be zeroed out.
>
> Signed-off-by: Jiri Benc jb...@redhat.com
> ---
> v1->v2: Fix incorrect IP_TUNNEL_KEY_IPV4_PAD_LEN calculation, thanks to
> Alexei.


Acked-by: Thomas Graf tg...@suug.ch
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Grumbach, Emmanuel


On 08/19/2015 09:02 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 17:56 +0000, Grumbach, Emmanuel wrote:
>
>>
>> So I feel that making net/core/tso.c more complicated just because of
>> our craziness seems an overkill to me.
>> I'll try a bit harder to see how I can use net/core/tso.c, but I have to
>> say I am pessimistic.
>
> net/core/tso.c is WIP, feel free to expand it to make it more generic
> and meet your needs.

Yeah - trying to see what can be done.


> The point is : we want a core infrastructure, not something that each
> individual driver implements in ~500 lines of code :(
>
>

I totally understand that :) I just claim to be unique in a way that
each individual driver is ... only me :)

I guess that if we would build the DMA descriptors directly from the
skb_gso (the skb coming from the stack), that'd be easier. Our HW
abstraction layer wants an skb and I need to pass several skbs (because
skb->len is very likely not to fit in one single 802.11 packet even if
it is an A-MSDU).
So, trying to use net/core/tso.c basically means opening up the architecture
of our driver... Not impossible, but quite a bit of work.


Re: [RFC v2 1/3] iwlwifi: mvm: add real TSO implementation

2015-08-19 Thread Grumbach, Emmanuel


On 08/19/2015 05:18 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
>> The segmentation is done completely in software. The
>> driver creates several MPDUs out of a single large send.
>> Each MPDU is a newly allocated SKB.
>> A page is allocated to create the headers that need to be
>> duplicated (SNAP / IP / TCP). The WiFi header is in the
>> header of the newly created SKBs.
>>
>> type=feature
>>
>> Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a
>> Signed-off-by: Emmanuel Grumbach <emmanuel.grumb...@intel.com>
>> ---
>>  drivers/net/wireless/iwlwifi/mvm/tx.c | 513 +++---
>>  1 file changed, 481 insertions(+), 32 deletions(-)
>
> Ouch, dynamic allocations while doing xmit are certainly not needed.
> Your driver should pre-allocate space for headers.

This is right as long as you don't need *several* headers in one single
skb. In the case of A-MSDU, I need to have several TCP / IP / SNAP
headers in the same skb. At least that's how my HW layer in the driver
is built. See the other thread.

> Drivers willing to implement tso have to use net/core/tso.c provided
> helpers.
>
> $ git grep -n tso_build_hdr
> drivers/net/ethernet/cavium/thunder/nicvf_queues.c:1030: tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0);
> drivers/net/ethernet/freescale/fec_main.c:729: tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0);
> drivers/net/ethernet/marvell/mv643xx_eth.c:842: tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0);
> drivers/net/ethernet/marvell/mvneta.c:1650: tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0);
> include/net/tso.h:15:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
> net/core/tso.c:14:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
> net/core/tso.c:37:EXPORT_SYMBOL(tso_build_hdr);
 
 

This looks promising indeed. I'll take a close look.
Thanks a bunch.


Re: [PATCH net-next] vrf: plug skb leaks

2015-08-19 Thread David Ahern

Hi Nikolay:

On 8/18/15 8:12 PM, Nikolay Aleksandrov wrote:

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index ed208317cbb5..4aa06450fafa 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -97,6 +97,12 @@ static bool is_ip_rx_frame(struct sk_buff *skb)
 	return false;
 }
 
+static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
+{
+	vrf_dev->stats.tx_errors++;
+	kfree_skb(skb);
+}
+
 /* note: already called with rcu_read_lock */
 static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb)
 {
@@ -149,7 +155,8 @@ static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev,
 static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 					   struct net_device *dev)
 {
-	return 0;
+	vrf_tx_error(dev, skb);
+	return NET_XMIT_DROP;
 }
 
 static int vrf_send_v4_prep(struct sk_buff *skb, struct flowi4 *fl4,
@@ -206,8 +213,7 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 out:
 	return ret;
 err:
-	vrf_dev->stats.tx_errors++;
-	kfree_skb(skb);
+	vrf_tx_error(vrf_dev, skb);
 	goto out;
 }
 
@@ -219,6 +225,7 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
 	case htons(ETH_P_IPV6):
 		return vrf_process_v6_outbound(skb, dev);
 	default:
+		vrf_tx_error(dev, skb);
 		return NET_XMIT_DROP;
 	}
 }



Would be simpler to do the vrf_tx_error at the end of is_ip_tx_frame() 
if ret == NET_XMIT_DROP.
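
A minimal userspace sketch of that suggestion, assuming the per-protocol
helpers only report a verdict and leave the accounting and the free to the
caller. Every type and function name below is a hypothetical stand-in for
the kernel one, not code from the patch:

```c
#include <assert.h>

/* Userspace stand-ins for the kernel types; all names are hypothetical. */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1 };
enum { PROTO_IPV4, PROTO_IPV6, PROTO_OTHER };

struct mock_dev { unsigned long tx_errors; };
struct mock_skb { int proto; int freed; };

/* Counterpart of vrf_tx_error(): count the error and release the buffer. */
static void mock_tx_error(struct mock_dev *dev, struct mock_skb *skb)
{
	assert(!skb->freed);	/* freeing twice would be a bug */
	dev->tx_errors++;
	skb->freed = 1;
}

/* Assumption: the helpers return a verdict and never free the skb. */
static int mock_v4_outbound(struct mock_skb *skb)
{
	(void)skb;
	return XMIT_SUCCESS;
}

static int mock_v6_outbound(struct mock_skb *skb)
{
	(void)skb;
	return XMIT_DROP;	/* IPv6 output is a stub in the patch */
}

static int mock_is_ip_tx_frame(struct mock_skb *skb, struct mock_dev *dev)
{
	int ret;

	switch (skb->proto) {
	case PROTO_IPV4:
		ret = mock_v4_outbound(skb);
		break;
	case PROTO_IPV6:
		ret = mock_v6_outbound(skb);
		break;
	default:
		ret = XMIT_DROP;
		break;
	}

	/* Single place where every dropped frame is counted and freed. */
	if (ret == XMIT_DROP)
		mock_tx_error(dev, skb);

	return ret;
}
```

With this shape the error path lives in one spot, at the cost of requiring
that no helper frees the skb on its own.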


David



Re: [PATCH net-next 0/4] vrf: cleanups part 2

2015-08-19 Thread David Ahern

On 8/18/15 8:27 PM, Nikolay Aleksandrov wrote:

From: Nikolay Aleksandrov niko...@cumulusnetworks.com

Hi,
This is the next part of vrf cleanups, patch 1 drops the SLAB_PANIC when
creating kmem cache since it's handled, patch 02 removes a slave duplicate
check which is already done by the lower/upper code, patch 3 moves the
ndo_add_slave code around a bit so we can drop an error label and patch 4
drops the master device checks which are unnecessary because the ops are
taken from the master device itself so it can't be different.

Cheers,
  Nik

Nikolay Aleksandrov (4):
   vrf: don't panic on cache create failure
   vrf: remove unnecessary duplicate check
   vrf: move vrf_insert_slave so we can drop a goto label
   vrf: ndo_add|del_slave drop unnecessary checks

  drivers/net/vrf.c | 24 
  1 file changed, 4 insertions(+), 20 deletions(-)



Looks good to me. Thanks, Nikolay.

Acked-by: David Ahern d...@cumulusnetworks.com


Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi

2015-08-19 Thread Grumbach, Emmanuel


On 08/19/2015 08:20 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 17:00 +0000, Grumbach, Emmanuel wrote:
>>
>> On 08/19/2015 07:08 PM, Eric Dumazet wrote:
>>> On Wed, 2015-08-19 at 15:07 +0000, Grumbach, Emmanuel wrote:
>>>
>>>> I'll look at it.
>>>> I was almost starting to implement that but then I thought of another
>>>> (good?) reason to use LSO. LSO gives me the guarantee that the packet is
>>>> directed to one peer, which might not be the case with xmit_more since
>>>> we have one Qdisc for several clients in case we are in AP mode.
>>>> Building an A-MSDU for several clients is not possible, at least not for
>>>> several clients at L2 (different MAC addresses).
>>>> LSO avoids this problem completely.
>>>
>>> Then, simply calling skb_gso_segment() from the driver might be enough,
>>> and less work for you.
>>>
>>> This would even support TSO on IPv6
>>
>> Well... I did take care of IPv6.
>
> net/core/tso.c does not yet handle IPv6

Yeah - I can see that now.
I can teach it - that's not a big deal. The bigger problem is that
net/core/tso.c doesn't do what I really need: it does only a small
portion. Since I need to add one frag to several skbs, I need to
refcount the frags' page. net/core/tso.c hides the page from me.

I can try to use tso_build_hdr but it will copy the entire header where
I need only SNAP / IP / TCP (and not 802.11).
I am getting the feeling that net/core/tso.c is close to what I need,
but not close enough to be usable without making changes that would make
the implementation too complicated and changing net/core/tso.c in a way
that would be much less readable for other users.
I know that our device is quite unique in the sense that most other
vendors do all the header twiddling in hardware. We unfortunately don't.
The A-MSDU's format is also somewhat unusual:

802.11 HDR
ETH | SNAP | IP | TCP | PAYLOAD | PAD (variable length)
ETH | SNAP | IP | TCP | PAYLOAD | PAD (variable length)
ETH | SNAP | IP | TCP | PAYLOAD | PAD (variable length)
etc...

So I feel that making net/core/tso.c more complicated just because of
our craziness seems an overkill to me.
I'll try a bit harder to see how I can use net/core/tso.c, but I have to
say I am pessimistic.
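
The A-MSDU layout above can be sketched as a length calculation. This is
a userspace illustration only: the 14-byte subframe header, the 8-byte
LLC/SNAP header, the 4-byte alignment, and the choice to leave the last
subframe unpadded are assumptions not stated in the thread, and the
function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes, not taken from the thread. */
#define AMSDU_SUBFRAME_HDR_LEN	14u	/* the "ETH" part: DA + SA + length */
#define SNAP_HDR_LEN		8u	/* LLC/SNAP header */

/* Padding needed to bring a subframe up to a 4-byte boundary. */
static unsigned int amsdu_pad(unsigned int subframe_len)
{
	return (4u - (subframe_len & 3u)) & 3u;
}

/* ETH | SNAP | IP | TCP | PAYLOAD, without the trailing PAD. */
static unsigned int amsdu_subframe_len(unsigned int ip_tcp_hdr_len,
				       unsigned int payload_len)
{
	return AMSDU_SUBFRAME_HDR_LEN + SNAP_HDR_LEN +
	       ip_tcp_hdr_len + payload_len;
}

/* Bytes that follow the 802.11 header for n subframes. */
static unsigned int amsdu_body_len(unsigned int ip_tcp_hdr_len,
				   const unsigned int *payload_len, size_t n)
{
	unsigned int total = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		unsigned int len = amsdu_subframe_len(ip_tcp_hdr_len,
						      payload_len[i]);

		total += len;
		if (i + 1 < n)	/* assumption: last subframe is not padded */
			total += amsdu_pad(len);
	}
	return total;
}
```

Every subframe repeats the SNAP / IP / TCP headers, which is why a single
header copy per skb is not enough for this format.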


RE: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.

2015-08-19 Thread Wilson, Daniel G
 -Original Message-
 From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
 On Behalf Of Scott Feldman
 Sent: Wednesday, August 19, 2015 12:54 PM
 To: Premkumar Jonnala
 Cc: netdev@vger.kernel.org
 Subject: Re: [PATCH] bridge: Enable configuration of ageing interval for 
 bridges
 and switch devices.
 
 On Wed, Aug 19, 2015 at 2:34 AM, Premkumar Jonnala
 pjonn...@broadcom.com wrote:
  Hello Scott,
 
  Thank you for the diff and comments.   Please see my comments inline.
 
  -Original Message-
  From: Scott Feldman [mailto:sfel...@gmail.com]
  Sent: Tuesday, August 18, 2015 12:48 PM
  To: Premkumar Jonnala
  Cc: netdev@vger.kernel.org
  Subject: Re: [PATCH] bridge: Enable configuration of ageing interval
  for bridges and switch devices.
 
 
 
  On Fri, 14 Aug 2015, Premkumar Jonnala wrote:
 
   Bridge devices have ageing interval used to age out MAC addresses
   from FDB.  This ageing interval was not configurable.
  
   Enable netlink based configuration of ageing interval for bridges
   and switch devices.  The ageing interval changes the timer used to
   purge inactive FDB entries in bridges.  The ageing interval config
   is propagated to switch devices, so that platform or hardware based
   ageing works according to configuration.
  
   Signed-off-by: Premkumar Jonnala pjonn...@broadcom.com
 
  Hi Premkumar,
 
  I agree with Roopa that we should use existing IFLA_BR_AGEING_TIME.
 
  What is the motivation for using 'ip link' command to configure bridge
  attributes?  IMHO, bridge command is better suited for that.
 
 Can you extend bridge command to allow setting/getting these bridge attrs?
 Looks like you construct a RTM_NEWLINK IFLA_INFO_DATA msg.  No changes
 needed to the kernel.
 
 bridge link set dev br0 ageing_time 1000
 
  --or--
 
 ip link set dev br0 type bridge ageing_time 1000

Being able to set these attributes via both bridge and ip would be great.

  +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time) {
  + struct switchdev_attr attr = {
  + .id = SWITCHDEV_ATTR_BRIDGE,
  + .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP,
  + .u.bridge.attr = IFLA_BR_AGEING_TIME,
  + .u.bridge.val = ageing_time,
  + };
  + int err;
  +
  + err = switchdev_port_attr_set(br->dev, &attr);
  + if (err)
  + return err;
  +
  + br->ageing_time = clock_t_to_jiffies(ageing_time);
 
  Should we restart the timer here so the new time takes effect?
 
 I don't know... I just copied what the original code did.  If it does need to
 be restarted, break that out as a separate patch.

In my opinion, yes, the timer should be restarted. If the timer had been set to 
1 million seconds and is being changed to 1 minute, you wouldn't want to wait 
for the 1-million-second  timer to expire before resetting it to the 
newly-configured 1-minute timer value.
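
A small userspace model of that argument, assuming a single timer with
an absolute expiry time. The struct layout and function names are
hypothetical and do not mirror the bridge code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one timer with an absolute expiry, in seconds. */
struct mock_timer { uint64_t expires; };

struct mock_bridge {
	uint64_t ageing_time;		/* configured interval */
	struct mock_timer gc_timer;	/* armed with the old interval */
};

/* Only store the new interval; the armed timer keeps its old expiry. */
static void set_ageing_time_no_restart(struct mock_bridge *br,
				       uint64_t new_time)
{
	br->ageing_time = new_time;
}

/* Store the new interval and re-arm the timer relative to "now". */
static void set_ageing_time_restart(struct mock_bridge *br, uint64_t now,
				    uint64_t new_time)
{
	assert(new_time > 0);
	br->ageing_time = new_time;
	br->gc_timer.expires = now + new_time;
}
```

Without the re-arm, a timer armed for 1,000,000 seconds still fires at
the old deadline even though the configured interval is now 60 seconds.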

Dan.



Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.

2015-08-19 Thread Jesse Gross
On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar pshe...@nicira.com wrote:
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index e58468b..18ff83b 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -181,7 +181,7 @@ config VXLAN
>
>  config GENEVE
>  	tristate "Generic Network Virtualization Encapsulation netdev"
> -	depends on INET && GENEVE_CORE
> +	depends on INET
>  	select NET_IP_TUNNEL

I think my comments on v1 of this patch were overlooked (about the
UDP_TUNNEL dependency and the name).

> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 5b43382..eb298ff 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> +static void geneve_build_header(struct genevehdr *geneveh,
> +				__be16 tun_flags, u8 vni[3],
> +				u8 options_len, u8 *options)
[...]
> +static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
> +			    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
> +			    bool csum)

It seems like we could just merge these functions. I'm not sure that
the role is all that different.

In geneve_build_skb(), the error labels are somewhat confusing (for
example, free_rt doesn't free the rt). Also, is it right that we don't
free the rt if udp_tunnel_handle_offloads() fails? It might be
cleaner if the caller retains ownership of rt.

My guess is that if the issue from the earlier patch about overlapping
collect_md tunnels is fixed then that might allow us to simplify
things a little further, since for those tunnels we can assume there
is a 1:1 mapping between collect_md tunnels and sockets.


[PATCH 02/15] netfilter: xt_TEE: get rid of WITH_CONNTRACK definition

2015-08-19 Thread Pablo Neira Ayuso
Use IS_ENABLED(CONFIG_NF_CONNTRACK) instead.

Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
---
 net/netfilter/xt_TEE.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index c5d6556..0ed9fb6 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -24,10 +24,8 @@
 #include <net/route.h>
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter/xt_TEE.h>
-
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
-#  define WITH_CONNTRACK 1
-#  include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack.h>
 #endif
 
 struct xt_tee_priv {
@@ -99,7 +97,7 @@ tee_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 	if (skb == NULL)
 		return XT_CONTINUE;
 
-#ifdef WITH_CONNTRACK
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	/* Avoid counting cloned packets towards the original connection. */
 	nf_conntrack_put(skb->nfct);
 	skb->nfct = &nf_ct_untracked_get()->ct_general;
@@ -175,7 +173,7 @@ tee_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 	if (skb == NULL)
 		return XT_CONTINUE;
 
-#ifdef WITH_CONNTRACK
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	nf_conntrack_put(skb->nfct);
 	skb->nfct = &nf_ct_untracked_get()->ct_general;
 	skb->nfctinfo = IP_CT_NEW;
-- 
1.7.10.4



[PATCH 10/15] netfilter: nft_limit: add per-byte limiting

2015-08-19 Thread Pablo Neira Ayuso
This patch adds a new NFTA_LIMIT_TYPE netlink attribute to indicate the type of
limiting.

Contrary to per-packet limiting, the cost is calculated from the packet path
since this depends on the packet length.

The burst attribute indicates the number of bytes in which the rate can be
exceeded.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/uapi/linux/netfilter/nf_tables.h |7 
 net/netfilter/nft_limit.c|   63 --
 2 files changed, 66 insertions(+), 4 deletions(-)
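
The per-byte cost described in the changelog can be sketched in
userspace as follows. The cost formula (nsecs * len / rate) is the one
used in the patch; the token bucket around it is a simplified model
with hypothetical names and an explicit clock argument, not the
kernel's nft_limit_eval():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a byte-based limiter; names are hypothetical. */
struct byte_limit {
	uint64_t nsecs;		/* length of the rate unit, in ns */
	uint64_t rate;		/* bytes allowed per unit */
	uint64_t tokens;	/* current credit, in ns */
	uint64_t tokens_max;	/* cap on accumulated credit, in ns */
	uint64_t last;		/* time of the last evaluation, in ns */
};

/* Cost of one packet in ns of credit: it scales with the packet length. */
static uint64_t byte_cost(const struct byte_limit *l, uint32_t len)
{
	assert(l->rate > 0);
	return l->nsecs * len / l->rate;
}

/* Returns true when a packet of len bytes exceeds the limit at time now. */
static bool byte_limit_exceeded(struct byte_limit *l, uint64_t now,
				uint32_t len)
{
	uint64_t cost = byte_cost(l, len);
	uint64_t tokens = l->tokens + (now - l->last);

	if (tokens > l->tokens_max)
		tokens = l->tokens_max;
	l->last = now;

	if (tokens >= cost) {
		l->tokens = tokens - cost;
		return false;
	}
	l->tokens = tokens;
	return true;
}
```

With a unit of one second and a rate of 1000 bytes, a 500-byte packet
costs half a second of credit, which is why the cost has to be computed
in the packet path rather than once at init time.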

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index cafd789..d8c8a7c 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -756,18 +756,25 @@ enum nft_ct_attributes {
 };
 #define NFTA_CT_MAX		(__NFTA_CT_MAX - 1)
 
+enum nft_limit_type {
+	NFT_LIMIT_PKTS,
+	NFT_LIMIT_PKT_BYTES
+};
+
 /**
  * enum nft_limit_attributes - nf_tables limit expression netlink attributes
  *
  * @NFTA_LIMIT_RATE: refill rate (NLA_U64)
  * @NFTA_LIMIT_UNIT: refill unit (NLA_U64)
  * @NFTA_LIMIT_BURST: burst (NLA_U32)
+ * @NFTA_LIMIT_TYPE: type of limit (NLA_U32: enum nft_limit_type)
  */
 enum nft_limit_attributes {
 	NFTA_LIMIT_UNSPEC,
 	NFTA_LIMIT_RATE,
 	NFTA_LIMIT_UNIT,
 	NFTA_LIMIT_BURST,
+	NFTA_LIMIT_TYPE,
 	__NFTA_LIMIT_MAX
 };
 #define NFTA_LIMIT_MAX		(__NFTA_LIMIT_MAX - 1)
diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index b418698..5d67938 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -83,14 +83,16 @@ static int nft_limit_init(struct nft_limit *limit,
 	return 0;
 }
 
-static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit)
+static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit,
+			  enum nft_limit_type type)
 {
 	u64 secs = div_u64(limit->nsecs, NSEC_PER_SEC);
 	u64 rate = limit->rate - limit->burst;
 
 	if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(rate)) ||
 	    nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)) ||
-	    nla_put_be32(skb, NFTA_LIMIT_BURST, htonl(limit->burst)))
+	    nla_put_be32(skb, NFTA_LIMIT_BURST, htonl(limit->burst)) ||
+	    nla_put_be32(skb, NFTA_LIMIT_TYPE, htonl(type)))
 		goto nla_put_failure;
 	return 0;
 
@@ -117,6 +119,7 @@ static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
 	[NFTA_LIMIT_RATE]	= { .type = NLA_U64 },
 	[NFTA_LIMIT_UNIT]	= { .type = NLA_U64 },
 	[NFTA_LIMIT_BURST]	= { .type = NLA_U32 },
+	[NFTA_LIMIT_TYPE]	= { .type = NLA_U32 },
 };
 
 static int nft_limit_pkts_init(const struct nft_ctx *ctx,
@@ -138,7 +141,7 @@ static int nft_limit_pkts_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	const struct nft_limit_pkts *priv = nft_expr_priv(expr);
 
-	return nft_limit_dump(skb, &priv->limit);
+	return nft_limit_dump(skb, &priv->limit, NFT_LIMIT_PKTS);
 }
 
 static struct nft_expr_type nft_limit_type;
@@ -150,9 +153,61 @@ static const struct nft_expr_ops nft_limit_pkts_ops = {
 	.dump		= nft_limit_pkts_dump,
 };
 
+static void nft_limit_pkt_bytes_eval(const struct nft_expr *expr,
+				     struct nft_regs *regs,
+				     const struct nft_pktinfo *pkt)
+{
+	struct nft_limit *priv = nft_expr_priv(expr);
+	u64 cost = div_u64(priv->nsecs * pkt->skb->len, priv->rate);
+
+	if (nft_limit_eval(priv, cost))
+		regs->verdict.code = NFT_BREAK;
+}
+
+static int nft_limit_pkt_bytes_init(const struct nft_ctx *ctx,
+				    const struct nft_expr *expr,
+				    const struct nlattr * const tb[])
+{
+	struct nft_limit *priv = nft_expr_priv(expr);
+
+	return nft_limit_init(priv, tb);
+}
+
+static int nft_limit_pkt_bytes_dump(struct sk_buff *skb,
+				    const struct nft_expr *expr)
+{
+	const struct nft_limit *priv = nft_expr_priv(expr);
+
+	return nft_limit_dump(skb, priv, NFT_LIMIT_PKT_BYTES);
+}
+
+static const struct nft_expr_ops nft_limit_pkt_bytes_ops = {
+	.type		= &nft_limit_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_limit)),
+	.eval		= nft_limit_pkt_bytes_eval,
+	.init		= nft_limit_pkt_bytes_init,
+	.dump		= nft_limit_pkt_bytes_dump,
+};
+
+static const struct nft_expr_ops *
+nft_limit_select_ops(const struct nft_ctx *ctx,
+		     const struct nlattr * const tb[])
+{
+	if (tb[NFTA_LIMIT_TYPE] == NULL)
+		return &nft_limit_pkts_ops;
+
+	switch (ntohl(nla_get_be32(tb[NFTA_LIMIT_TYPE]))) {
+	case NFT_LIMIT_PKTS:
+		return &nft_limit_pkts_ops;
+	case NFT_LIMIT_PKT_BYTES:
+   

[PATCH 14/15] netfilter: nf_conntrack: add efficient mark to zone mapping

2015-08-19 Thread Pablo Neira Ayuso
From: Daniel Borkmann dan...@iogearbox.net

This work adds the possibility of deriving the zone id from the skb-mark
field in a scalable manner. This allows for having only a single template
serving hundreds/thousands of different zones, for example, instead of the
need to have one match for each zone as an extra CT jump target.

Note that we'd need to have this information attached to the template as at
the time when we're trying to lookup a possible ct object, we already need
to know zone information for a possible match when going into
__nf_conntrack_find_get(). This work provides a minimal implementation for
a possible mapping.

In order to not add/expose an extra ct-status bit, the zone structure has
been extended to carry a flag for deriving the mark.

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nf_conntrack_zones.h |   45 +++--
 include/uapi/linux/netfilter/xt_CT.h   |4 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |3 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |4 +-
 net/netfilter/nf_conntrack_core.c  |   50 
 net/netfilter/nf_conntrack_netlink.c   |5 +--
 net/netfilter/xt_CT.c  |5 ++-
 7 files changed, 72 insertions(+), 44 deletions(-)
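
The zone selection described in the changelog can be sketched in
userspace like this. It follows the logic of nf_ct_zone_tmpl() in the
patch, but the types and names are hypothetical stand-ins, and the
template is represented directly by its zone rather than by a struct
nf_conn:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions. */
#define ZONE_FLAG_MARK	1
#define ZONE_DIR_ORIG	1
#define ZONE_DIR_REPL	2
#define ZONE_DIR_BOTH	(ZONE_DIR_ORIG | ZONE_DIR_REPL)

struct zone {
	uint16_t id;
	uint8_t flags;
	uint8_t dir;
};

static const struct zone zone_dflt = { 0, 0, ZONE_DIR_BOTH };

/*
 * Pick the zone for a packet: the default zone without a template, the
 * template's own zone normally, or a temporary zone whose id is derived
 * from the packet mark when the template carries the mark flag.
 */
static const struct zone *zone_for_packet(const struct zone *tmpl_zone,
					  uint32_t mark, struct zone *tmp)
{
	if (tmpl_zone == NULL)
		return &zone_dflt;

	if (tmpl_zone->flags & ZONE_FLAG_MARK) {
		assert(tmp != NULL);
		tmp->id = (uint16_t)mark;	/* id is 16 bits wide */
		tmp->flags = 0;
		tmp->dir = tmpl_zone->dir;
		return tmp;
	}
	return tmpl_zone;
}
```

Because the id field is 16 bits wide, only the low 16 bits of the mark
end up in the zone id; one template with the flag set can therefore
serve many zones.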

diff --git a/include/net/netfilter/nf_conntrack_zones.h b/include/net/netfilter/nf_conntrack_zones.h
index 3942ddf..5316c7b 100644
--- a/include/net/netfilter/nf_conntrack_zones.h
+++ b/include/net/netfilter/nf_conntrack_zones.h
@@ -10,9 +10,12 @@
 
 #define NF_CT_DEFAULT_ZONE_DIR	(NF_CT_ZONE_DIR_ORIG | NF_CT_ZONE_DIR_REPL)
 
+#define NF_CT_FLAG_MARK	1
+
 struct nf_conntrack_zone {
 	u16	id;
-	u16	dir;
+	u8	flags;
+	u8	dir;
 };
 
 extern const struct nf_conntrack_zone nf_ct_zone_dflt;
@@ -32,9 +35,45 @@ nf_ct_zone(const struct nf_conn *ct)
 }
 
 static inline const struct nf_conntrack_zone *
-nf_ct_zone_tmpl(const struct nf_conn *tmpl)
+nf_ct_zone_init(struct nf_conntrack_zone *zone, u16 id, u8 dir, u8 flags)
+{
+	zone->id = id;
+	zone->flags = flags;
+	zone->dir = dir;
+
+	return zone;
+}
+
+static inline const struct nf_conntrack_zone *
+nf_ct_zone_tmpl(const struct nf_conn *tmpl, const struct sk_buff *skb,
+		struct nf_conntrack_zone *tmp)
+{
+	const struct nf_conntrack_zone *zone;
+
+	if (!tmpl)
+		return &nf_ct_zone_dflt;
+
+	zone = nf_ct_zone(tmpl);
+	if (zone->flags & NF_CT_FLAG_MARK)
+		zone = nf_ct_zone_init(tmp, skb->mark, zone->dir, 0);
+
+	return zone;
+}
+
+static inline int nf_ct_zone_add(struct nf_conn *ct, gfp_t flags,
+				 const struct nf_conntrack_zone *info)
 {
-	return tmpl ? nf_ct_zone(tmpl) : &nf_ct_zone_dflt;
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+	struct nf_conntrack_zone *nf_ct_zone;
+
+	nf_ct_zone = nf_ct_ext_add(ct, NF_CT_EXT_ZONE, flags);
+	if (!nf_ct_zone)
+		return -ENOMEM;
+
+	nf_ct_zone_init(nf_ct_zone, info->id, info->dir,
+			info->flags);
+#endif
+	return 0;
 }
 
 static inline bool nf_ct_zone_matches_dir(const struct nf_conntrack_zone *zone,
diff --git a/include/uapi/linux/netfilter/xt_CT.h b/include/uapi/linux/netfilter/xt_CT.h
index 452005f..9e52041 100644
--- a/include/uapi/linux/netfilter/xt_CT.h
+++ b/include/uapi/linux/netfilter/xt_CT.h
@@ -8,9 +8,11 @@ enum {
 	XT_CT_NOTRACK_ALIAS	= 1 << 1,
 	XT_CT_ZONE_DIR_ORIG	= 1 << 2,
 	XT_CT_ZONE_DIR_REPL	= 1 << 3,
+	XT_CT_ZONE_MARK		= 1 << 4,
 
 	XT_CT_MASK		= XT_CT_NOTRACK | XT_CT_NOTRACK_ALIAS |
-				  XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL,
+				  XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL |
+				  XT_CT_ZONE_MARK,
 };
 
 struct xt_ct_target_info {
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index 8a2f41c..cdde3ec 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -135,9 +135,10 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb,
 	const struct nf_conntrack_l4proto *innerproto;
 	const struct nf_conntrack_tuple_hash *h;
 	const struct nf_conntrack_zone *zone;
+	struct nf_conntrack_zone tmp;
 
 	NF_CT_ASSERT(skb->nfct == NULL);
-	zone = nf_ct_zone_tmpl(tmpl);
+	zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
 
 	/* Are they talking about one of our connections? */
 	if (!nf_ct_get_tuplepr(skb,
diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
index 2029141..0e6fae1 100644
--- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
+++ 

[PATCH 03/15] netfilter: factor out packet duplication for IPv4/IPv6

2015-08-19 Thread Pablo Neira Ayuso
Extracted from the xtables TEE target. This creates two new modules for IPv4
and IPv6 that are shared between the TEE target and the new nf_tables dup
expressions.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/ipv4/nf_dup_ipv4.h |7 ++
 include/net/netfilter/ipv6/nf_dup_ipv6.h |7 ++
 net/ipv4/netfilter/Kconfig   |6 ++
 net/ipv4/netfilter/Makefile  |2 +
 net/ipv4/netfilter/nf_dup_ipv4.c |  120 +++
 net/ipv6/netfilter/Kconfig   |6 ++
 net/ipv6/netfilter/Makefile  |2 +
 net/ipv6/netfilter/nf_dup_ipv6.c |   96 ++
 net/netfilter/Kconfig|2 +
 net/netfilter/xt_TEE.c   |  158 ++
 10 files changed, 254 insertions(+), 152 deletions(-)
 create mode 100644 include/net/netfilter/ipv4/nf_dup_ipv4.h
 create mode 100644 include/net/netfilter/ipv6/nf_dup_ipv6.h
 create mode 100644 net/ipv4/netfilter/nf_dup_ipv4.c
 create mode 100644 net/ipv6/netfilter/nf_dup_ipv6.c

diff --git a/include/net/netfilter/ipv4/nf_dup_ipv4.h b/include/net/netfilter/ipv4/nf_dup_ipv4.h
new file mode 100644
index 0000000..42008f1
--- /dev/null
+++ b/include/net/netfilter/ipv4/nf_dup_ipv4.h
@@ -0,0 +1,7 @@
+#ifndef _NF_DUP_IPV4_H_
+#define _NF_DUP_IPV4_H_
+
+void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
+		 const struct in_addr *gw, int oif);
+
+#endif /* _NF_DUP_IPV4_H_ */
diff --git a/include/net/netfilter/ipv6/nf_dup_ipv6.h b/include/net/netfilter/ipv6/nf_dup_ipv6.h
new file mode 100644
index 0000000..ed6bd66
--- /dev/null
+++ b/include/net/netfilter/ipv6/nf_dup_ipv6.h
@@ -0,0 +1,7 @@
+#ifndef _NF_DUP_IPV6_H_
+#define _NF_DUP_IPV6_H_
+
+void nf_dup_ipv6(struct sk_buff *skb, unsigned int hooknum,
+		 const struct in6_addr *gw, int oif);
+
+#endif /* _NF_DUP_IPV6_H_ */
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 2199a5d..0142ea2 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -67,6 +67,12 @@ config NF_TABLES_ARP
 
 endif # NF_TABLES
 
+config NF_DUP_IPV4
+	tristate "Netfilter IPv4 packet duplication to alternate destination"
+	help
+	  This option enables the nf_dup_ipv4 core, which duplicates an IPv4
+	  packet to be rerouted to another destination.
+
 config NF_LOG_ARP
 	tristate "ARP packet logging"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 7fe6c70..9136ffc 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -70,3 +70,5 @@ obj-$(CONFIG_IP_NF_ARP_MANGLE) += arpt_mangle.o
 
 # just filtering instance of ARP tables for now
 obj-$(CONFIG_IP_NF_ARPFILTER) += arptable_filter.o
+
+obj-$(CONFIG_NF_DUP_IPV4) += nf_dup_ipv4.o
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
new file mode 100644
index 0000000..eff85ab
--- /dev/null
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -0,0 +1,120 @@
+/*
+ * (C) 2007 by Sebastian Claßen <sebastian.clas...@freenet.ag>
+ * (C) 2007-2010 by Jan Engelhardt <jeng...@medozas.de>
+ *
+ * Extracted from xt_TEE.c
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 or later, as
+ * published by the Free Software Foundation.
+ */
+#include <linux/ip.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/route.h>
+#include <linux/skbuff.h>
+#include <net/checksum.h>
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/route.h>
+#include <net/netfilter/ipv4/nf_dup_ipv4.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+
+static struct net *pick_net(struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_NS
+	const struct dst_entry *dst;
+
+	if (skb->dev != NULL)
+		return dev_net(skb->dev);
+	dst = skb_dst(skb);
+	if (dst != NULL && dst->dev != NULL)
+		return dev_net(dst->dev);
+#endif
+	return &init_net;
+}
+
+static bool nf_dup_ipv4_route(struct sk_buff *skb, const struct in_addr *gw,
+			      int oif)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct net *net = pick_net(skb);
+	struct rtable *rt;
+	struct flowi4 fl4;
+
+	memset(&fl4, 0, sizeof(fl4));
+	if (oif != -1)
+		fl4.flowi4_oif = oif;
+
+	fl4.daddr = gw->s_addr;
+	fl4.flowi4_tos = RT_TOS(iph->tos);
+	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
+	fl4.flowi4_flags = FLOWI_FLAG_KNOWN_NH;
+	rt = ip_route_output_key(net, &fl4);
+	if (IS_ERR(rt))
+		return false;
+
+	skb_dst_drop(skb);
+	skb_dst_set(skb, &rt->dst);
+	skb->dev      = rt->dst.dev;
+	skb->protocol = htons(ETH_P_IP);
+
+	return true;
+}
+
+void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
+const struct in_addr 

[PATCH 01/15] netfilter: nft_counter: convert it to use per-cpu counters

2015-08-19 Thread Pablo Neira Ayuso
This patch converts the existing seqlock to per-cpu counters.

Suggested-by: Eric Dumazet eric.duma...@gmail.com
Suggested-by: Patrick McHardy ka...@trash.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_counter.c |   97 ++-
 1 file changed, 69 insertions(+), 28 deletions(-)
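
The per-CPU idea behind this conversion can be sketched in userspace as
follows. This is a single-threaded illustration with hypothetical names
and a fixed number of slots; it leaves out the u64_stats_sync sequence
counter and the bottom-half disabling that the patch uses:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 4	/* hypothetical fixed CPU count for the sketch */

struct demo_counter {
	uint64_t bytes;
	uint64_t packets;
};

/* One private slot per CPU, so writers never share a cache line lock. */
struct demo_percpu_counter {
	struct demo_counter slot[DEMO_NR_CPUS];
};

/* Fast path: touch only the slot that belongs to the current CPU. */
static void demo_counter_update(struct demo_percpu_counter *c,
				unsigned int cpu, uint32_t len)
{
	assert(cpu < DEMO_NR_CPUS);
	c->slot[cpu].bytes += len;
	c->slot[cpu].packets++;
}

/* Slow path (dump): sum every slot to get the totals. */
static struct demo_counter demo_counter_total(const struct demo_percpu_counter *c)
{
	struct demo_counter total = { 0, 0 };
	unsigned int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		total.bytes += c->slot[cpu].bytes;
		total.packets += c->slot[cpu].packets;
	}
	return total;
}
```

The trade-off is a cheaper packet path in exchange for a dump that has
to walk every possible CPU.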

diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index 1759123..1067fb4 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -18,39 +18,59 @@
 #include <net/netfilter/nf_tables.h>
 
 struct nft_counter {
-	seqlock_t	lock;
 	u64		bytes;
 	u64		packets;
 };
 
+struct nft_counter_percpu {
+	struct nft_counter	counter;
+	struct u64_stats_sync	syncp;
+};
+
+struct nft_counter_percpu_priv {
+	struct nft_counter_percpu __percpu *counter;
+};
+
 static void nft_counter_eval(const struct nft_expr *expr,
 			     struct nft_regs *regs,
 			     const struct nft_pktinfo *pkt)
 {
-	struct nft_counter *priv = nft_expr_priv(expr);
-
-	write_seqlock_bh(&priv->lock);
-	priv->bytes += pkt->skb->len;
-	priv->packets++;
-	write_sequnlock_bh(&priv->lock);
+	struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu *this_cpu;
+
+	local_bh_disable();
+	this_cpu = this_cpu_ptr(priv->counter);
+	u64_stats_update_begin(&this_cpu->syncp);
+	this_cpu->counter.bytes += pkt->skb->len;
+	this_cpu->counter.packets++;
+	u64_stats_update_end(&this_cpu->syncp);
+	local_bh_enable();
 }
 
 static int nft_counter_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
-	struct nft_counter *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu *cpu_stats;
+	struct nft_counter total;
+	u64 bytes, packets;
 	unsigned int seq;
-	u64 bytes;
-	u64 packets;
-
-	do {
-		seq = read_seqbegin(&priv->lock);
-		bytes	= priv->bytes;
-		packets	= priv->packets;
-	} while (read_seqretry(&priv->lock, seq));
-
-	if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(bytes)))
-		goto nla_put_failure;
-	if (nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(packets)))
+	int cpu;
+
+	memset(&total, 0, sizeof(total));
+	for_each_possible_cpu(cpu) {
+		cpu_stats = per_cpu_ptr(priv->counter, cpu);
+		do {
+			seq	= u64_stats_fetch_begin_irq(&cpu_stats->syncp);
+			bytes	= cpu_stats->counter.bytes;
+			packets	= cpu_stats->counter.packets;
+		} while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
+
+		total.packets += packets;
+		total.bytes += bytes;
+	}
+
+	if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes)) ||
+	    nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(total.packets)))
 		goto nla_put_failure;
 	return 0;
 
@@ -67,23 +87,44 @@ static int nft_counter_init(const struct nft_ctx *ctx,
 			    const struct nft_expr *expr,
 			    const struct nlattr * const tb[])
 {
-	struct nft_counter *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu __percpu *cpu_stats;
+	struct nft_counter_percpu *this_cpu;
+
+	cpu_stats = netdev_alloc_pcpu_stats(struct nft_counter_percpu);
+	if (cpu_stats == NULL)
+		return ENOMEM;
+
+	preempt_disable();
+	this_cpu = this_cpu_ptr(cpu_stats);
+	if (tb[NFTA_COUNTER_PACKETS]) {
+		this_cpu->counter.packets =
+			be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS]));
+	}
+	if (tb[NFTA_COUNTER_BYTES]) {
+		this_cpu->counter.bytes =
+			be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES]));
+	}
+	preempt_enable();
+	priv->counter = cpu_stats;
+	return 0;
+}
 
-	if (tb[NFTA_COUNTER_PACKETS])
-		priv->packets = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS]));
-	if (tb[NFTA_COUNTER_BYTES])
-		priv->bytes = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES]));
+static void nft_counter_destroy(const struct nft_ctx *ctx,
+				const struct nft_expr *expr)
+{
+	struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
 
-	seqlock_init(&priv->lock);
-	return 0;
+	free_percpu(priv->counter);
 }
 
 static struct nft_expr_type nft_counter_type;
 static const struct nft_expr_ops nft_counter_ops = {
 	.type		= &nft_counter_type,
-	.size		= NFT_EXPR_SIZE(sizeof(struct nft_counter)),
+   .size   

Re: [PATCH net-next v2 9/9] geneve: Implement rtnl changelink

2015-08-19 Thread Pravin Shelar
On Wed, Aug 19, 2015 at 12:40 PM, Jesse Gross je...@nicira.com wrote:
 On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar pshe...@nicira.com wrote:
 diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
 index e47cdd9..0d7fbef 100644
 --- a/drivers/net/geneve.c
 +++ b/drivers/net/geneve.c
 -static int geneve_configure(struct net *net, struct net_device *dev,
 -   __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
 -   __u16 dst_port, bool metadata)
 +static int __geneve_configure(struct net *net, struct net_device *dev,
 + __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
 + __u16 dst_port, bool metadata)
  {
 [...]
 geneve->net = net;
 geneve->dev = dev;

 I guess this stuff should really be in geneve_configure() - it seems a
 bit odd to change it for a running device (even if it shouldn't
 change).

ok.

 geneve->remote.sin_addr.s_addr = rem_addr;
 if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
 return -EINVAL;

 +   u32_to_vni(vni, geneve->vni);
 list_for_each_entry(t, &gn->geneve_list, next) {
 if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
 rem_addr == t->remote.sin_addr.s_addr &&

 I'm not sure that these types of operations are safe if the device is
 already running. We first overwrite the remote value and then we do
 error checking but that means that if there is an error, then the
 device will be left in a broken state. Don't we also need to update
 the hash table if some of these parameters change?

OK, I will stop the device before making changes. That way we can add it
to the hash table.

 +static int geneve_changelink(struct net_device *dev,
 +struct nlattr *tb[], struct nlattr *data[])
 +{
 [...]
 -   if (data[IFLA_GENEVE_PORT])
 -   dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
 +   if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
 +metadata != geneve->collect_md)) {

 It seems like in an ideal world, we wouldn't need to recreate the
 socket if metadata collection changed (assuming that there are no new
 conflicts).

To keep changelink simple I am thinking of disallowing metadata changes.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 19:17 +, Grumbach, Emmanuel wrote:

 Hm.. how would net/core/tso.c avoid this?

Because a driver using these helpers keeps around the original LSO packet
and frees it normally at TX completion time.

 I can't see anything related to truesize there.
 Note that this works since it is guaranteed that we release the skbs in
 order.
 
  
  (BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
  yet we want backpressure mostly for TCP stack (TCP Small Queues))
  
  
 
 I am not sure I follow here.
 You want me to test:
 if (skb_gso->destructor == tcp_wfree) ?


Yes.

Look for example at tcp_gso_segment() (called from skb_gso_segment())

copy_destructor = gso_skb->destructor == tcp_wfree;
...
/* Following permits TCP Small Queues to work well with GSO :
 * The callback to TCP stack will be called at the time last frag
 * is freed at TX completion, and not right now when gso_skb
 * is freed by GSO engine
 */
if (copy_destructor) {
swap(gso_skb->sk, skb->sk);
swap(gso_skb->destructor, skb->destructor);
sum_truesize += skb->truesize;
atomic_add(sum_truesize - gso_skb->truesize,
   &skb->sk->sk_wmem_alloc);
}


 
 I checked that code using iperf and saw that I don't get into this if,
 but I (probably wrongly) assumed that other applications would set a
 flag on the socket (forgive my ignorance) that would make this if be taken.

If you do not see skb->destructor == tcp_wfree, then something is
definitely wrong on your setup.





Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.

2015-08-19 Thread Pravin Shelar
On Wed, Aug 19, 2015 at 11:37 AM, Jesse Gross je...@nicira.com wrote:
 On Wed, Aug 19, 2015 at 11:29 AM, Pravin Shelar pshe...@nicira.com wrote:
 On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross je...@nicira.com wrote:
 My guess is that if the issue from the earlier patch about overlapping
 collect_md tunnels is fixed then that might allow us to simplify
 things a little further, since for those tunnels we can assume there
 is a 1:1 mapping between collect_md tunnels and sockets.

 I don't see how it would be different. Can you elaborate on this?

 Mostly just conceptually simpler. Right now it looks like we are doing
 some kind of refcounting between devices and tunnels in
 geneve_open/stop (I know it's not really but it appears like that in
 some ways.) We could just directly assign collect_md in geneve_open()
 and do nothing at all in geneve_stop().

If you look at the next patch, I have changed geneve_open and stop
further. The change is that geneve_open adds the tunnel to the hash table,
so that only devices which are open are in the hash table. Since geneve_open
and stop are common to both types of tunnel, I do not think there can be any
changes even after avoiding overlapping tunnel types in a given socket.


Re: [PATCH v2 net-next 11/13] vxlan: metadata based tunneling for IPv6

2015-08-19 Thread Alexei Starovoitov
On Wed, Aug 19, 2015 at 12:10:01PM +0200, Jiri Benc wrote:
 Support metadata based (formerly flow based) tunneling also for IPv6.
 This complements commit ee122c79d422 (vxlan: Flow based tunneling).
 
 Signed-off-by: Jiri Benc jb...@redhat.com
 ---
  drivers/net/vxlan.c | 69 +++--
  1 file changed, 40 insertions(+), 29 deletions(-)

Looks good.
Acked-by: Alexei Starovoitov a...@plumgrid.com



[PATCH 07/15] netfilter: nft_limit: factor out shared code with per-byte limiting

2015-08-19 Thread Pablo Neira Ayuso
This patch prepares the introduction of per-byte limiting.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c |   86 -
 1 file changed, 53 insertions(+), 33 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index c79703e..c4d1b1b 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -27,65 +27,54 @@ struct nft_limit {
u64 nsecs;
 };
 
-static void nft_limit_pkts_eval(const struct nft_expr *expr,
-   struct nft_regs *regs,
-   const struct nft_pktinfo *pkt)
+static inline bool nft_limit_eval(struct nft_limit *limit, u64 cost)
 {
-   struct nft_limit *priv = nft_expr_priv(expr);
-   u64 now, tokens, cost = div_u64(priv->nsecs, priv->rate);
+   u64 now, tokens;
s64 delta;
 
spin_lock_bh(&limit_lock);
now = ktime_get_ns();
-   tokens = priv->tokens + now - priv->last;
-   if (tokens > priv->tokens_max)
-   tokens = priv->tokens_max;
+   tokens = limit->tokens + now - limit->last;
+   if (tokens > limit->tokens_max)
+   tokens = limit->tokens_max;
 
-   priv->last = now;
+   limit->last = now;
delta = tokens - cost;
if (delta >= 0) {
-   priv->tokens = delta;
+   limit->tokens = delta;
spin_unlock_bh(&limit_lock);
-   return;
+   return false;
}
-   priv->tokens = tokens;
+   limit->tokens = tokens;
spin_unlock_bh(&limit_lock);
-
-   regs->verdict.code = NFT_BREAK;
+   return true;
 }
 
-static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
-   [NFTA_LIMIT_RATE]   = { .type = NLA_U64 },
-   [NFTA_LIMIT_UNIT]   = { .type = NLA_U64 },
-};
-
-static int nft_limit_init(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
+static int nft_limit_init(struct nft_limit *limit,
  const struct nlattr * const tb[])
 {
-   struct nft_limit *priv = nft_expr_priv(expr);
u64 unit;
 
if (tb[NFTA_LIMIT_RATE] == NULL ||
tb[NFTA_LIMIT_UNIT] == NULL)
return -EINVAL;
 
-   priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
+   limit->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
-   priv->nsecs = unit * NSEC_PER_SEC;
-   if (priv->rate == 0 || priv->nsecs < unit)
+   limit->nsecs = unit * NSEC_PER_SEC;
+   if (limit->rate == 0 || limit->nsecs < unit)
return -EOVERFLOW;
-   priv->tokens = priv->tokens_max = priv->nsecs;
-   priv->last = ktime_get_ns();
+   limit->tokens = limit->tokens_max = limit->nsecs;
+   limit->last = ktime_get_ns();
+
return 0;
 }
 
-static int nft_limit_dump(struct sk_buff *skb, const struct nft_expr *expr)
+static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit)
 {
-   const struct nft_limit *priv = nft_expr_priv(expr);
-   u64 secs = div_u64(priv->nsecs, NSEC_PER_SEC);
+   u64 secs = div_u64(limit->nsecs, NSEC_PER_SEC);
 
-   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)) ||
+   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(limit->rate)) ||
nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)))
goto nla_put_failure;
return 0;
@@ -94,13 +83,44 @@ nla_put_failure:
return -1;
 }
 
+static void nft_limit_pkts_eval(const struct nft_expr *expr,
+   struct nft_regs *regs,
+   const struct nft_pktinfo *pkt)
+{
+   struct nft_limit *priv = nft_expr_priv(expr);
+
+   if (nft_limit_eval(priv, div_u64(priv->nsecs, priv->rate)))
+   regs->verdict.code = NFT_BREAK;
+}
+
+static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
+   [NFTA_LIMIT_RATE]   = { .type = NLA_U64 },
+   [NFTA_LIMIT_UNIT]   = { .type = NLA_U64 },
+};
+
+static int nft_limit_pkts_init(const struct nft_ctx *ctx,
+  const struct nft_expr *expr,
+  const struct nlattr * const tb[])
+{
+   struct nft_limit *priv = nft_expr_priv(expr);
+
+   return nft_limit_init(priv, tb);
+}
+
+static int nft_limit_pkts_dump(struct sk_buff *skb, const struct nft_expr *expr)
+{
+   const struct nft_limit *priv = nft_expr_priv(expr);
+
+   return nft_limit_dump(skb, priv);
+}
+
 static struct nft_expr_type nft_limit_type;
 static const struct nft_expr_ops nft_limit_pkts_ops = {
.type   = &nft_limit_type,
.size   = NFT_EXPR_SIZE(sizeof(struct nft_limit)),
.eval   = nft_limit_pkts_eval,
-   .init   = nft_limit_init,
-   .dump   = nft_limit_dump,
+   .init   = nft_limit_pkts_init,
+   

[PATCH 11/15] netfilter: nfacct: per network namespace support

2015-08-19 Thread Pablo Neira Ayuso
From: Andreas Schultz aschu...@tpip.net

- Move the nfnl_acct_list into the network namespace, initialize
  and destroy it per namespace
- Keep track of refcnt on nfacct objects, the old logic no
  longer works with a per-namespace list
- Adjust xt_nfacct to pass the namespace when registering objects

Signed-off-by: Andreas Schultz aschu...@tpip.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/linux/netfilter/nfnetlink_acct.h |3 +-
 include/net/net_namespace.h  |3 ++
 net/netfilter/nfnetlink_acct.c   |   71 +-
 net/netfilter/xt_nfacct.c|2 +-
 4 files changed, 56 insertions(+), 23 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink_acct.h b/include/linux/netfilter/nfnetlink_acct.h
index 6ec9757..80ca889 100644
--- a/include/linux/netfilter/nfnetlink_acct.h
+++ b/include/linux/netfilter/nfnetlink_acct.h
@@ -2,6 +2,7 @@
 #define _NFNL_ACCT_H_
 
 #include <uapi/linux/netfilter/nfnetlink_acct.h>
+#include <net/net_namespace.h>
 
 enum {
NFACCT_NO_QUOTA = -1,
@@ -11,7 +12,7 @@ enum {
 
 struct nf_acct;
 
-struct nf_acct *nfnl_acct_find_get(const char *filter_name);
+struct nf_acct *nfnl_acct_find_get(struct net *net, const char *filter_name);
 void nfnl_acct_put(struct nf_acct *acct);
 void nfnl_acct_update(const struct sk_buff *skb, struct nf_acct *nfacct);
 extern int nfnl_acct_overquota(const struct sk_buff *skb,
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index e951453..2dcea63 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -118,6 +118,9 @@ struct net {
 #endif
struct sock *nfnl;
struct sock *nfnl_stash;
+#if IS_ENABLED(CONFIG_NETFILTER_NETLINK_ACCT)
+   struct list_head    nfnl_acct_list;
+#endif
 #endif
 #ifdef CONFIG_WEXT_CORE
struct sk_buff_head wext_nlevents;
diff --git a/net/netfilter/nfnetlink_acct.c b/net/netfilter/nfnetlink_acct.c
index c18af2f..fefbf5f 100644
--- a/net/netfilter/nfnetlink_acct.c
+++ b/net/netfilter/nfnetlink_acct.c
@@ -27,8 +27,6 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Pablo Neira Ayuso <pa...@netfilter.org>");
 MODULE_DESCRIPTION("nfacct: Extended Netfilter accounting infrastructure");
 
-static LIST_HEAD(nfnl_acct_list);
-
 struct nf_acct {
atomic64_t  pkts;
atomic64_t  bytes;
@@ -53,6 +51,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb,
 const struct nlmsghdr *nlh, const struct nlattr * const tb[])
 {
struct nf_acct *nfacct, *matching = NULL;
+   struct net *net = sock_net(nfnl);
char *acct_name;
unsigned int size = 0;
u32 flags = 0;
@@ -64,7 +63,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb,
if (strlen(acct_name) == 0)
return -EINVAL;
 
-   list_for_each_entry(nfacct, &nfnl_acct_list, head) {
+   list_for_each_entry(nfacct, &net->nfnl_acct_list, head) {
if (strncmp(nfacct->name, acct_name, NFACCT_NAME_MAX) != 0)
continue;
 
@@ -124,7 +123,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb,
 be64_to_cpu(nla_get_be64(tb[NFACCT_PKTS])));
}
atomic_set(&nfacct->refcnt, 1);
-   list_add_tail_rcu(&nfacct->head, &nfnl_acct_list);
+   list_add_tail_rcu(&nfacct->head, &net->nfnl_acct_list);
return 0;
 }
 
@@ -185,6 +184,7 @@ nla_put_failure:
 static int
 nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   struct net *net = sock_net(skb->sk);
struct nf_acct *cur, *last;
const struct nfacct_filter *filter = cb->data;
 
@@ -196,7 +196,7 @@ nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb)
cb->args[1] = 0;
 
rcu_read_lock();
-   list_for_each_entry_rcu(cur, &nfnl_acct_list, head) {
+   list_for_each_entry_rcu(cur, &net->nfnl_acct_list, head) {
if (last) {
if (cur != last)
continue;
@@ -257,6 +257,7 @@ static int
 nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb,
 const struct nlmsghdr *nlh, const struct nlattr * const tb[])
 {
+   struct net *net = sock_net(nfnl);
int ret = -ENOENT;
struct nf_acct *cur;
char *acct_name;
@@ -283,7 +284,7 @@ nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb,
return -EINVAL;
acct_name = nla_data(tb[NFACCT_NAME]);
 
-   list_for_each_entry(cur, &nfnl_acct_list, head) {
+   list_for_each_entry(cur, &net->nfnl_acct_list, head) {
struct sk_buff *skb2;
 
if (strncmp(cur->name, acct_name, NFACCT_NAME_MAX)!= 0)
@@ -336,19 +337,20 @@ static int
 nfnl_acct_del(struct sock *nfnl, struct sk_buff *skb,
 const struct nlmsghdr *nlh, const struct nlattr * const tb[])
 {
+   struct net *net = sock_net(nfnl);
char *acct_name;

[PATCH 13/15] netfilter: nf_conntrack: add direction support for zones

2015-08-19 Thread Pablo Neira Ayuso
From: Daniel Borkmann dan...@iogearbox.net

This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.

The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct-mark
approach we proposed initially.

As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.

Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a stable value that would be equal
for both directions at all times, f.e. if only zone-id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.

If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).

Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with
netperf sessions:

  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y

On the gateway netns, example:

  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully

  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865, from each tenant netns

  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport= dport=12865 
zone-orig=1
   src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
   [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport= dport=12865 
zone-orig=2
   src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=
   [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 
zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
   [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 
zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
   [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.

I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.

  [1] 
http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
  [2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nf_conntrack_zones.h |   31 +++-
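
The direction-dependent zone lookup described in the commit message above can be sketched in userspace C. This is only an illustration under stated assumptions, not the code from the patch: struct ct_zone, zone_id_for_dir(), zones_equal() and the ZONE_*/CT_DIR_* names are hypothetical, and the default zone id is assumed to be 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; stands in for NF_CT_DEFAULT_ZONE_ID. */
#define ZONE_ID_DEFAULT 0

/* Hypothetical direction indices for a tracked connection. */
enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1 };

/* One bit per direction in which the zone id takes effect. */
#define ZONE_DIR_ORIG (1u << CT_DIR_ORIGINAL)
#define ZONE_DIR_REPL (1u << CT_DIR_REPLY)
#define ZONE_DIR_BOTH (ZONE_DIR_ORIG | ZONE_DIR_REPL)

/* Hypothetical zone object: an id plus the directions it applies to. */
struct ct_zone {
	uint16_t id;
	uint8_t dir;
};

/* Return the zone id if it applies to 'dir', else the default zone id. */
static uint16_t zone_id_for_dir(const struct ct_zone *zone, enum ct_dir dir)
{
	return (zone->dir & (1u << dir)) ? zone->id : ZONE_ID_DEFAULT;
}

/* Equality check as a tuple comparison in direction 'dir' would use it. */
static bool zones_equal(const struct ct_zone *a, const struct ct_zone *b,
			enum ct_dir dir)
{
	return zone_id_for_dir(a, dir) == zone_id_for_dir(b, dir);
}
```

With two tenants configured as { .id = 1, .dir = ZONE_DIR_ORIG } and { .id = 2, .dir = ZONE_DIR_ORIG }, the zones differ in the original direction, so colliding original tuples stay separate, while both fall back to the shared default zone in the reply direction.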
 

[PATCH 12/15] netfilter: nf_conntrack: push zone object into functions

2015-08-19 Thread Pablo Neira Ayuso
From: Daniel Borkmann dan...@iogearbox.net

This patch replaces the zone id which is pushed down into functions
with the actual zone object. It's a bigger one-time change, but
needed for later on extending zones with a direction parameter, and
thus decoupling this additional information from all call-sites.

No functional changes in this patch.

The default zone becomes a global const object, namely nf_ct_zone_dflt
and will be returned directly in various cases, one being, when there's
f.e. no zoning support.

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nf_conntrack.h   |   10 ++-
 include/net/netfilter/nf_conntrack_core.h  |3 +-
 include/net/netfilter/nf_conntrack_expect.h|   11 +++-
 include/net/netfilter/nf_conntrack_zones.h |   33 +++---
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |3 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c|   11 ++--
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |3 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |   12 ++--
 net/netfilter/ipvs/ip_vs_nfct.c|2 +-
 net/netfilter/nf_conntrack_core.c  |   75 +
 net/netfilter/nf_conntrack_expect.c|   21 +++---
 net/netfilter/nf_conntrack_netlink.c   |   84 +---
 net/netfilter/nf_conntrack_pptp.c  |3 +-
 net/netfilter/nf_conntrack_standalone.c|   17 +++--
 net/netfilter/nf_nat_core.c|   19 --
 net/netfilter/nf_synproxy_core.c   |4 +-
 net/netfilter/xt_CT.c  |6 +-
 net/netfilter/xt_connlimit.c   |9 +--
 net/sched/act_connmark.c   |5 +-
 21 files changed, 203 insertions(+), 132 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 37cd391..f5e23c6 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -250,8 +250,12 @@ void nf_ct_untracked_status_or(unsigned long bits);
 void nf_ct_iterate_cleanup(struct net *net,
   int (*iter)(struct nf_conn *i, void *data),
   void *data, u32 portid, int report);
+
+struct nf_conntrack_zone;
+
 void nf_conntrack_free(struct nf_conn *ct);
-struct nf_conn *nf_conntrack_alloc(struct net *net, u16 zone,
+struct nf_conn *nf_conntrack_alloc(struct net *net,
+  const struct nf_conntrack_zone *zone,
   const struct nf_conntrack_tuple *orig,
   const struct nf_conntrack_tuple *repl,
   gfp_t gfp);
@@ -291,7 +295,9 @@ extern unsigned int nf_conntrack_max;
 extern unsigned int nf_conntrack_hash_rnd;
 void init_nf_conntrack_hash_rnd(void);
 
-struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags);
+struct nf_conn *nf_ct_tmpl_alloc(struct net *net,
+const struct nf_conntrack_zone *zone,
+gfp_t flags);
 
 #define NF_CT_STAT_INC(net, count)   __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index f2f0fa3..c03f9c4 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -52,7 +52,8 @@ bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
 
 /* Find a connection corresponding to a tuple. */
 struct nf_conntrack_tuple_hash *
-nf_conntrack_find_get(struct net *net, u16 zone,
+nf_conntrack_find_get(struct net *net,
+ const struct nf_conntrack_zone *zone,
  const struct nf_conntrack_tuple *tuple);
 
 int __nf_conntrack_confirm(struct sk_buff *skb);
diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h
index 3f3aecb..dce56f0 100644
--- a/include/net/netfilter/nf_conntrack_expect.h
+++ b/include/net/netfilter/nf_conntrack_expect.h
@@ -4,7 +4,9 @@
 
 #ifndef _NF_CONNTRACK_EXPECT_H
 #define _NF_CONNTRACK_EXPECT_H
+
 #include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_zones.h>
 
 extern unsigned int nf_ct_expect_hsize;
 extern unsigned int nf_ct_expect_max;
@@ -76,15 +78,18 @@ int nf_conntrack_expect_init(void);
 void nf_conntrack_expect_fini(void);
 
 struct nf_conntrack_expect *
-__nf_ct_expect_find(struct net *net, u16 zone,
+__nf_ct_expect_find(struct net *net,
+   const struct nf_conntrack_zone *zone,
const struct nf_conntrack_tuple *tuple);
 
 struct nf_conntrack_expect *
-nf_ct_expect_find_get(struct net *net, u16 zone,

[PATCH 00/15] Netfilter updates for net-next

2015-08-19 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains Netfilter updates for your net-next tree, they
are:

1) Rework the existing nf_tables counter expression to make it per-cpu.

2) Prepare and factor out common packet duplication code from the TEE target so
   it can be reused from the new dup expression.

3) Add the new dup expression for the nf_tables IPv4 and IPv6 families.

4) Convert the nf_tables limit expression to use a token-based approach with
   64-bits precision.

5) Enhance the nf_tables limit expression to support per-byte limiting.
   This comes after several preparation patches.

6) Add a burst parameter to indicate the number of packets or bytes that can
   exceed the limit.

7) Add netns support to nfacct, from Andreas Schultz.

8) Pass the nf_conn_zone structure instead of the zone ID in nf_tables to allow
   accessing more zone specific information, from Daniel Borkmann.

9) Allow defining zones per direction to support netns containers with
   overlapping network addressing, also from Daniel.

10) Extend the CT target to allow setting the zone based on the skb->mark as a
   way to support simple mappings from iptables, also from Daniel.

11) Make the nf_tables payload expression aware of the fact that VLAN offload
may have removed a vlan header, from Florian Westphal.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks!



The following changes since commit d92cff89a0c80e7e49796366e441d97f07b5d321:

  net_dbg_ratelimited: turn into no-op when !DEBUG (2015-08-06 23:51:30 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

for you to fetch changes up to 8cfd23e6740158817d2045915f6ea5a2daf11bce:

  netfilter: nft_payload: work around vlan header stripping (2015-08-19 08:39:53 +0200)


Andreas Schultz (1):
  netfilter: nfacct: per network namespace support

Daniel Borkmann (3):
  netfilter: nf_conntrack: push zone object into functions
  netfilter: nf_conntrack: add direction support for zones
  netfilter: nf_conntrack: add efficient mark to zone mapping

Florian Westphal (1):
  netfilter: nft_payload: work around vlan header stripping

Pablo Neira Ayuso (10):
  netfilter: nft_counter: convert it to use per-cpu counters
  netfilter: xt_TEE: get rid of WITH_CONNTRACK definition
  netfilter: factor out packet duplication for IPv4/IPv6
  netfilter: nf_tables: add nft_dup expression
  netfilter: nft_limit: rename to nft_limit_pkts
  netfilter: nft_limit: convert to token-based limiting at nanosecond granularity
  netfilter: nft_limit: factor out shared code with per-byte limiting
  netfilter: nft_limit: add burst parameter
  netfilter: nft_limit: constant token cost per packet
  netfilter: nft_limit: add per-byte limiting

 include/linux/netfilter/nfnetlink_acct.h   |3 +-
 include/net/net_namespace.h|3 +
 include/net/netfilter/ipv4/nf_dup_ipv4.h   |7 +
 include/net/netfilter/ipv6/nf_dup_ipv6.h   |7 +
 include/net/netfilter/nf_conntrack.h   |   10 +-
 include/net/netfilter/nf_conntrack_core.h  |3 +-
 include/net/netfilter/nf_conntrack_expect.h|   11 +-
 include/net/netfilter/nf_conntrack_zones.h |   99 -
 include/net/netfilter/nft_dup.h|9 +
 include/uapi/linux/netfilter/nf_tables.h   |   23 ++
 include/uapi/linux/netfilter/nfnetlink_conntrack.h |1 +
 include/uapi/linux/netfilter/xt_CT.h   |8 +-
 net/ipv4/netfilter/Kconfig |   12 ++
 net/ipv4/netfilter/Makefile|3 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |4 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c|   17 +-
 net/ipv4/netfilter/nf_dup_ipv4.c   |  120 +++
 net/ipv4/netfilter/nft_dup_ipv4.c  |  110 ++
 net/ipv6/netfilter/Kconfig |   12 ++
 net/ipv6/netfilter/Makefile|3 +
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |5 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |   18 +-
 net/ipv6/netfilter/nf_dup_ipv6.c   |   96 +
 net/ipv6/netfilter/nft_dup_ipv6.c  |  108 ++
 net/netfilter/Kconfig  |2 +
 net/netfilter/ipvs/ip_vs_nfct.c|2 +-
 net/netfilter/nf_conntrack_core.c  |  134 ++--
 net/netfilter/nf_conntrack_expect.c|   21 +-
 net/netfilter/nf_conntrack_netlink.c   |  228 ++--
 

[PATCH 06/15] netfilter: nft_limit: convert to token-based limiting at nanosecond granularity

2015-08-19 Thread Pablo Neira Ayuso
Rework the limit expression to use a token-based limiting approach that refills
the bucket gradually. The tokens are calculated at nanosecond granularity
instead of jiffies to improve precision.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c |   42 ++
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index d0788e1..c79703e 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -20,10 +20,11 @@
 static DEFINE_SPINLOCK(limit_lock);
 
 struct nft_limit {
+   u64 last;
u64 tokens;
+   u64 tokens_max;
u64 rate;
-   u64 unit;
-   unsigned long   stamp;
+   u64 nsecs;
 };
 
 static void nft_limit_pkts_eval(const struct nft_expr *expr,
@@ -31,18 +32,23 @@ static void nft_limit_pkts_eval(const struct nft_expr *expr,
const struct nft_pktinfo *pkt)
 {
struct nft_limit *priv = nft_expr_priv(expr);
+   u64 now, tokens, cost = div_u64(priv->nsecs, priv->rate);
+   s64 delta;
 
spin_lock_bh(&limit_lock);
-   if (time_after_eq(jiffies, priv->stamp)) {
-   priv->tokens = priv->rate;
-   priv->stamp = jiffies + priv->unit * HZ;
-   }
-
-   if (priv->tokens >= 1) {
-   priv->tokens--;
+   now = ktime_get_ns();
+   tokens = priv->tokens + now - priv->last;
+   if (tokens > priv->tokens_max)
+   tokens = priv->tokens_max;
+
+   priv->last = now;
+   delta = tokens - cost;
+   if (delta >= 0) {
+   priv->tokens = delta;
spin_unlock_bh(&limit_lock);
return;
}
+   priv->tokens = tokens;
spin_unlock_bh(&limit_lock);
 
regs->verdict.code = NFT_BREAK;
@@ -58,25 +64,29 @@ static int nft_limit_init(const struct nft_ctx *ctx,
  const struct nlattr * const tb[])
 {
struct nft_limit *priv = nft_expr_priv(expr);
+   u64 unit;
 
if (tb[NFTA_LIMIT_RATE] == NULL ||
tb[NFTA_LIMIT_UNIT] == NULL)
return -EINVAL;
 
-   priv->rate   = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
-   priv->unit   = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
-   priv->stamp  = jiffies + priv->unit * HZ;
-   priv->tokens = priv->rate;
+   priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
+   unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
+   priv->nsecs = unit * NSEC_PER_SEC;
+   if (priv->rate == 0 || priv->nsecs < unit)
+   return -EOVERFLOW;
+   priv->tokens = priv->tokens_max = priv->nsecs;
+   priv->last = ktime_get_ns();
return 0;
 }
 
 static int nft_limit_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
const struct nft_limit *priv = nft_expr_priv(expr);
+   u64 secs = div_u64(priv->nsecs, NSEC_PER_SEC);
 
-   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)))
-   goto nla_put_failure;
-   if (nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(priv->unit)))
+   if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)) ||
+   nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)))
goto nla_put_failure;
return 0;
 
-- 
1.7.10.4
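
The token-based scheme in this patch can be tried out in userspace with a small sketch. It is an illustration under stated assumptions rather than the kernel code: struct token_bucket, bucket_init() and bucket_over_limit() are hypothetical names, the caller passes the timestamp in explicitly instead of calling ktime_get_ns(), no locking is done, and the overflow check uses a division against UINT64_MAX rather than the patch's "nsecs < unit" comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical userspace stand-in for struct nft_limit. */
struct token_bucket {
	uint64_t last;       /* time of the previous evaluation, in ns */
	uint64_t tokens;     /* credit currently available, in ns */
	uint64_t tokens_max; /* bucket capacity, in ns */
	uint64_t rate;       /* packets allowed per interval */
	uint64_t nsecs;      /* interval length, in ns */
};

/* Returns 0 on success, -1 if rate is zero or the interval would overflow. */
static int bucket_init(struct token_bucket *tb, uint64_t rate,
		       uint64_t unit_secs, uint64_t now)
{
	if (rate == 0 || unit_secs > UINT64_MAX / NSEC_PER_SEC)
		return -1;

	tb->rate = rate;
	tb->nsecs = unit_secs * NSEC_PER_SEC;
	tb->tokens = tb->tokens_max = tb->nsecs; /* start with a full bucket */
	tb->last = now;
	return 0;
}

/*
 * Charge 'cost' nanoseconds of credit at time 'now'.
 * Returns true when the bucket cannot cover the cost (over the limit).
 * Assumes 'now' comes from a monotonic clock, so now >= tb->last.
 */
static bool bucket_over_limit(struct token_bucket *tb, uint64_t cost,
			      uint64_t now)
{
	uint64_t elapsed = now - tb->last;
	uint64_t room = tb->tokens_max - tb->tokens;
	/* Refill by the elapsed time, capped at the bucket capacity. */
	uint64_t tokens = elapsed > room ? tb->tokens_max
					 : tb->tokens + elapsed;

	tb->last = now;
	if (tokens >= cost) {
		tb->tokens = tokens - cost;
		return false;
	}
	tb->tokens = tokens;
	return true;
}
```

For a limit of 2 packets per second, each packet costs nsecs / rate = 500000000 ns of credit: two back-to-back packets drain the full bucket, a third is over the limit, and one more packet fits again after half a second has elapsed.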



[PATCH 04/15] netfilter: nf_tables: add nft_dup expression

2015-08-19 Thread Pablo Neira Ayuso
This new expression uses the nf_dup engine to clone packets to a given gateway.
Unlike xt_TEE, we use an index to indicate output interface which should be
fine at this stage.

Moreover, change to the preemption-safe this_cpu_read(nf_skb_duplicated) from
nf_dup_ipv{4,6} to silence a lockdep splat.

Based on the original tee expression from Arturo Borrero Gonzalez, although
this patch has diverted quite a bit from this initial effort due to the
change to support maps.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nft_dup.h  |9 +++
 include/uapi/linux/netfilter/nf_tables.h |   14 
 net/ipv4/netfilter/Kconfig   |6 ++
 net/ipv4/netfilter/Makefile  |1 +
 net/ipv4/netfilter/nf_dup_ipv4.c |2 +-
 net/ipv4/netfilter/nft_dup_ipv4.c|  110 ++
 net/ipv6/netfilter/Kconfig   |6 ++
 net/ipv6/netfilter/Makefile  |1 +
 net/ipv6/netfilter/nf_dup_ipv6.c |2 +-
 net/ipv6/netfilter/nft_dup_ipv6.c|  108 +
 10 files changed, 257 insertions(+), 2 deletions(-)
 create mode 100644 include/net/netfilter/nft_dup.h
 create mode 100644 net/ipv4/netfilter/nft_dup_ipv4.c
 create mode 100644 net/ipv6/netfilter/nft_dup_ipv6.c

diff --git a/include/net/netfilter/nft_dup.h b/include/net/netfilter/nft_dup.h
new file mode 100644
index 000..6b84cf6
--- /dev/null
+++ b/include/net/netfilter/nft_dup.h
@@ -0,0 +1,9 @@
+#ifndef _NFT_DUP_H_
+#define _NFT_DUP_H_
+
+struct nft_dup_inet {
+   enum nft_registers  sreg_addr:8;
+   enum nft_registers  sreg_dev:8;
+};
+
+#endif /* _NFT_DUP_H_ */
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index a99e6a9..2ef35f2 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -936,6 +936,20 @@ enum nft_redir_attributes {
 #define NFTA_REDIR_MAX (__NFTA_REDIR_MAX - 1)
 
 /**
+ * enum nft_dup_attributes - nf_tables dup expression netlink attributes
+ *
+ * @NFTA_DUP_SREG_ADDR: source register of address (NLA_U32: nft_registers)
+ * @NFTA_DUP_SREG_DEV: source register of output interface (NLA_U32: nft_registers)
+ */
+enum nft_dup_attributes {
+   NFTA_DUP_UNSPEC,
+   NFTA_DUP_SREG_ADDR,
+   NFTA_DUP_SREG_DEV,
+   __NFTA_DUP_MAX
+};
+#define NFTA_DUP_MAX   (__NFTA_DUP_MAX - 1)
+
+/**
  * enum nft_gen_attributes - nf_tables ruleset generation attributes
  *
  * @NFTA_GEN_ID: Ruleset generation ID (NLA_U32)
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 0142ea2..690d27d 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -58,6 +58,12 @@ config NFT_REJECT_IPV4
default NFT_REJECT
tristate
 
+config NFT_DUP_IPV4
+   tristate "IPv4 nf_tables packet duplication support"
+   select NF_DUP_IPV4
+   help
+ This module enables IPv4 packet duplication support for nf_tables.
+
 endif # NF_TABLES_IPV4
 
 config NF_TABLES_ARP
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 9136ffc..87b073d 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_NFT_CHAIN_NAT_IPV4) += nft_chain_nat_ipv4.o
 obj-$(CONFIG_NFT_REJECT_IPV4) += nft_reject_ipv4.o
 obj-$(CONFIG_NFT_MASQ_IPV4) += nft_masq_ipv4.o
 obj-$(CONFIG_NFT_REDIR_IPV4) += nft_redir_ipv4.o
+obj-$(CONFIG_NFT_DUP_IPV4) += nft_dup_ipv4.o
 obj-$(CONFIG_NF_TABLES_ARP) += nf_tables_arp.o
 
 # generic IP tables 
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
index eff85ab..b5bb375 100644
--- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -69,7 +69,7 @@ void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
 {
struct iphdr *iph;
 
-   if (__this_cpu_read(nf_skb_duplicated))
+   if (this_cpu_read(nf_skb_duplicated))
return;
/*
 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
diff --git a/net/ipv4/netfilter/nft_dup_ipv4.c b/net/ipv4/netfilter/nft_dup_ipv4.c
new file mode 100644
index 000..25419fb
--- /dev/null
+++ b/net/ipv4/netfilter/nft_dup_ipv4.c
@@ -0,0 +1,110 @@
+/*
+ * Copyright (c) 2015 Pablo Neira Ayuso pa...@netfilter.org
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/ipv4/nf_dup_ipv4.h>
+
+struct nft_dup_ipv4 {
+   enum nft_registers  sreg_addr:8;
+   enum nft_registers  sreg_dev:8;
+};
+
+static void nft_dup_ipv4_eval(const struct nft_expr 

[PATCH 15/15] netfilter: nft_payload: work around vlan header stripping

2015-08-19 Thread Pablo Neira Ayuso
From: Florian Westphal f...@strlen.de

make payload expression aware of the fact that VLAN offload may have
removed a vlan header.

When we encounter tagged skb, transparently insert the tag into the
register so that vlan header matching can work without userspace being
aware of offload features.

Signed-off-by: Florian Westphal f...@strlen.de
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_payload.c |   57 ++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 94fb3b2..09b4b07 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/kernel.h>
+#include <linux/if_vlan.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/netlink.h>
@@ -17,6 +18,53 @@
 #include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nf_tables.h>
 
+/* add vlan header into the user buffer if the tag was removed by offloads */
+static bool
+nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len)
+{
+   int mac_off = skb_mac_header(skb) - skb->data;
+   u8 vlan_len, *vlanh, *dst_u8 = (u8 *) d;
+   struct vlan_ethhdr veth;
+
+   vlanh = (u8 *) &veth;
+   if (offset < ETH_HLEN) {
+   u8 ethlen = min_t(u8, len, ETH_HLEN - offset);
+
+   if (skb_copy_bits(skb, mac_off, &veth, ETH_HLEN))
+   return false;
+
+   veth.h_vlan_proto = skb->vlan_proto;
+
+   memcpy(dst_u8, vlanh + offset, ethlen);
+
+   len -= ethlen;
+   if (len == 0)
+   return true;
+
+   dst_u8 += ethlen;
+   offset = ETH_HLEN;
+   } else if (offset >= VLAN_ETH_HLEN) {
+   offset -= VLAN_HLEN;
+   goto skip;
+   }
+
+   veth.h_vlan_TCI = htons(skb_vlan_tag_get(skb));
+   veth.h_vlan_encapsulated_proto = skb->protocol;
+
+   vlanh += offset;
+
+   vlan_len = min_t(u8, len, VLAN_ETH_HLEN - offset);
+   memcpy(dst_u8, vlanh, vlan_len);
+
+   len -= vlan_len;
+   if (!len)
+   return true;
+
+   dst_u8 += vlan_len;
+ skip:
+   return skb_copy_bits(skb, offset + mac_off, dst_u8, len) == 0;
+}
+
 static void nft_payload_eval(const struct nft_expr *expr,
 struct nft_regs *regs,
 const struct nft_pktinfo *pkt)
@@ -26,10 +74,18 @@ static void nft_payload_eval(const struct nft_expr *expr,
u32 *dest = &regs->data[priv->dreg];
int offset;
 
+   dest[priv->len / NFT_REG32_SIZE] = 0;
switch (priv->base) {
case NFT_PAYLOAD_LL_HEADER:
if (!skb_mac_header_was_set(skb))
goto err;
+
+   if (skb_vlan_tag_present(skb)) {
+   if (!nft_payload_copy_vlan(dest, skb,
+  priv->offset, priv->len))
+   goto err;
+   return;
+   }
offset = skb_mac_header(skb) - skb->data;
break;
case NFT_PAYLOAD_NETWORK_HEADER:
@@ -43,7 +99,6 @@ static void nft_payload_eval(const struct nft_expr *expr,
}
offset += priv->offset;
 
-   dest[priv->len / NFT_REG32_SIZE] = 0;
if (skb_copy_bits(skb, offset, dest, priv->len) < 0)
goto err;
goto err;
return;
-- 
1.7.10.4
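
Editorial sketch, not part of the posted patch: the nft_payload change above
presents the link-layer header to the matcher as if the VLAN tag were still
in the frame. The userspace model below shows the same idea with a simpler
byte-by-byte copy instead of the three-region memcpy used in the patch. The
function name copy_with_vlan and its parameters are hypothetical, and the tag
values are taken in host byte order here purely for readability.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN    6
#define VLAN_HLEN   4
#define TAG_OFFSET  (2 * ETH_ALEN)   /* tag sits after dst + src MAC */

/*
 * Copy len bytes starting at offset from the frame as it looked on the
 * wire, given the untagged frame plus the tag values kept out of band.
 * Returns 0 on success, -1 if the request runs past the frame.
 */
static int copy_with_vlan(uint8_t *dst, const uint8_t *frame, size_t frame_len,
			  uint16_t tpid, uint16_t tci,
			  size_t offset, size_t len)
{
	/* The 4-byte tag in network byte order: TPID, then TCI. */
	const uint8_t tag[VLAN_HLEN] = {
		(uint8_t)(tpid >> 8), (uint8_t)tpid,
		(uint8_t)(tci >> 8), (uint8_t)tci,
	};
	size_t i;

	for (i = 0; i < len; i++) {
		size_t pos = offset + i;   /* position in the tagged view */

		if (pos < TAG_OFFSET) {
			/* MAC addresses: unchanged by the offload. */
			if (pos >= frame_len)
				return -1;
			dst[i] = frame[pos];
		} else if (pos < TAG_OFFSET + VLAN_HLEN) {
			/* The stripped tag, rebuilt from metadata. */
			dst[i] = tag[pos - TAG_OFFSET];
		} else {
			/* Everything after the tag is shifted by 4 bytes. */
			if (pos - VLAN_HLEN >= frame_len)
				return -1;
			dst[i] = frame[pos - VLAN_HLEN];
		}
	}
	return 0;
}
```

Reading 6 bytes at offset 12 from an untagged IPv4 frame with TPID 0x8100 and
TCI 100 yields the tag followed by the original EtherType, which is what a
rule matching on the VLAN header expects to see.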



[PATCH 05/15] netfilter: nft_limit: rename to nft_limit_pkts

2015-08-19 Thread Pablo Neira Ayuso
Rename in preparation for the introduction of bytes ratelimit support.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index 435c1cc..d0788e1 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -26,9 +26,9 @@ struct nft_limit {
unsigned long   stamp;
 };
 
-static void nft_limit_eval(const struct nft_expr *expr,
-  struct nft_regs *regs,
-  const struct nft_pktinfo *pkt)
+static void nft_limit_pkts_eval(const struct nft_expr *expr,
+   struct nft_regs *regs,
+   const struct nft_pktinfo *pkt)
 {
struct nft_limit *priv = nft_expr_priv(expr);
 
@@ -85,17 +85,17 @@ nla_put_failure:
 }
 
 static struct nft_expr_type nft_limit_type;
-static const struct nft_expr_ops nft_limit_ops = {
+static const struct nft_expr_ops nft_limit_pkts_ops = {
.type   = &nft_limit_type,
.size   = NFT_EXPR_SIZE(sizeof(struct nft_limit)),
-   .eval   = nft_limit_eval,
+   .eval   = nft_limit_pkts_eval,
.init   = nft_limit_init,
.dump   = nft_limit_dump,
 };
 
 static struct nft_expr_type nft_limit_type __read_mostly = {
.name   = "limit",
-   .ops= &nft_limit_ops,
+   .ops= &nft_limit_pkts_ops,
.policy = nft_limit_policy,
.maxattr= NFTA_LIMIT_MAX,
.flags  = NFT_EXPR_STATEFUL,
-- 
1.7.10.4



Re: [PATCH net-next v2 9/9] geneve: Implement rtnl changelink

2015-08-19 Thread Jesse Gross
On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar pshe...@nicira.com wrote:
 diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
 index e47cdd9..0d7fbef 100644
 --- a/drivers/net/geneve.c
 +++ b/drivers/net/geneve.c
 -static int geneve_configure(struct net *net, struct net_device *dev,
 -   __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
 -   __u16 dst_port, bool metadata)
 +static int __geneve_configure(struct net *net, struct net_device *dev,
 + __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
 + __u16 dst_port, bool metadata)
  {
[...]
 geneve->net = net;
 geneve->dev = dev;

I guess this stuff should really be in geneve_configure() - it seems a
bit odd to change it for a running device (even if it shouldn't
change).

 geneve->remote.sin_addr.s_addr = rem_addr;
 if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
 return -EINVAL;

 +   u32_to_vni(vni, geneve->vni);
 list_for_each_entry(t, &gn->geneve_list, next) {
 if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
 rem_addr == t->remote.sin_addr.s_addr &&

I'm not sure that these types of operations are safe if the device is
already running. We first overwrite the remote value and then we do
error checking but that means that if there is an error, then the
device will be left in a broken state. Don't we also need to update
the hash table if some of these parameters change?
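
Editorial sketch, not part of the original message: the concern above is
about ordering, that is, writing new values into a live device before the
checks that may reject them. One common way to avoid that is to validate a
candidate configuration first and copy it into the device only at the end.
The model below is a userspace illustration under that assumption; the
struct, the function names and the -EBUSY return for a duplicate are
hypothetical and are not taken from geneve.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel settings (addresses in host byte order). */
struct tunnel_cfg {
	uint32_t remote;
	uint32_t vni;
};

/* IPv4 multicast addresses start with the bits 1110. */
static int is_multicast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/*
 * Validate `wanted` against every other tunnel first; only touch `live`
 * once nothing can fail any more.
 */
static int tunnel_change(struct tunnel_cfg *live,
			 const struct tunnel_cfg *wanted,
			 const struct tunnel_cfg *others, size_t n_others)
{
	size_t i;

	if (is_multicast(wanted->remote))
		return -EINVAL;

	for (i = 0; i < n_others; i++) {
		if (others[i].vni == wanted->vni &&
		    others[i].remote == wanted->remote)
			return -EBUSY;   /* assumed error code for a duplicate */
	}

	*live = *wanted;                 /* commit point */
	return 0;
}
```

On any error return the live configuration is untouched, so a rejected
change cannot leave the device half-updated. Updating a lookup table keyed
on these fields would belong at the commit point as well.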

 +static int geneve_changelink(struct net_device *dev,
 +struct nlattr *tb[], struct nlattr *data[])
 +{
[...]
 -   if (data[IFLA_GENEVE_PORT])
 -   dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
 +   if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
 +metadata != geneve->collect_md)) {

It seems like in an ideal world, we wouldn't need to recreate the
socket if metadata collection changed (assuming that there are no new
conflicts).


Re: [PATCH net-next v3] rocker: add debugfs support to dump internal tables

2015-08-19 Thread Scott Feldman
On Tue, Aug 18, 2015 at 1:47 PM, David Miller da...@davemloft.net wrote:
 From: Scott Feldman sfel...@gmail.com
 Date: Tue, 18 Aug 2015 13:37:56 -0700


  Hi Scott
 
  David is not so keen on debugfs stuff. He already NACKed adding more
  than what is currently in DSA:
 
  https://lkml.org/lkml/2015/7/11/8

 That patch added writable debugfs files, which I can see might be used
 as a back-door to program hardware.  That does seem bad.

 I fully agreed with respect to write. But if you read the whole
 message, David is also not happy with read only.

 I think before you spend too much more time on this, you need some
 indication from David if he is going to merge it or not.

 David, please give us guidance on debugfs in drivers/net.  Is there
 some criteria we can define to know when it's OK to use debugfs?

 The less you use it the better, seriously.

 I see some drivers where the foo_debugfs.c file is larger than the rest
 of the driver.  Once people start using it, it's like crack, and they
 dump every single debugging widget they found useful at some point into
 there.

 This is not what we want.  Most things I see in debugfs support was
 probably useful for debugging one particular bug but then it was never
 really useful again in the future.  Those kinds of things can be done
 locally in someone's tree.

 I often see various kinds of statistics ending up in these things,
 or register dumps, both of which are 'ethtool' or similar material.

# git grep debugfs_create drivers/net

^^^  this is scary.  I see some crazy things being done here.  The
writable nodes look like workarounds for driver/device bugs or
backdoors providing interfaces that don't exist natively.

I say we clean up this mess.  Just eliminating the writable files
would force bugs to get fixed and get new interfaces defined.  And
replace readable files where an interface already exists (stats/reg).
Finally, look for readable files that can be converted to new shared
common interfaces.  What's left should be read-only (S_IRUGO) files (no
binary blobs) containing data unique to the driver/device and useful for
field troubleshooting.

I'm motivated.  Next net-next cycle I'm going to go down the list with
a big eraser.  I'm sure I'll be a popular guy.

-scott


[PATCH net-next] 3c59x: Add BQL support for 3c59x ethernet driver.

2015-08-19 Thread Loganaden Velvindron
---
 drivers/net/ethernet/3com/3c59x.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 753887d..2839af0 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -1726,6 +1726,7 @@ vortex_up(struct net_device *dev)
if (vp->cb_fn_base) /* The PCMCIA people are idiots.  */
iowrite32(0x8000, vp->cb_fn_base + 4);
netif_start_queue (dev);
+   netdev_reset_queue(dev);
 err_out:
return err;
 }
@@ -1935,16 +1936,18 @@ static void vortex_tx_timeout(struct net_device *dev)
if (vp->cur_tx - vp->dirty_tx > 0 && ioread32(ioaddr + DownListPtr) == 0)
iowrite32(vp->tx_ring_dma + (vp->dirty_tx % TX_RING_SIZE) * sizeof(struct boom_tx_desc),
 ioaddr + DownListPtr);
-   if (vp->cur_tx - vp->dirty_tx < TX_RING_SIZE)
+   if (vp->cur_tx - vp->dirty_tx < TX_RING_SIZE) {
netif_wake_queue (dev);
+   netdev_reset_queue (dev);
+   }
if (vp->drv_flags & IS_BOOMERANG)
iowrite8(PKT_BUF_SZ>>8, ioaddr + TxFreeThreshold);
iowrite16(DownUnstall, ioaddr + EL3_CMD);
} else {
dev->stats.tx_dropped++;
netif_wake_queue(dev);
+   netdev_reset_queue(dev);
}
-
/* Issue Tx Enable */
iowrite16(TxEnable, ioaddr + EL3_CMD);
dev->trans_start = jiffies; /* prevent tx timeout */
@@ -2063,6 +2066,7 @@ vortex_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr = vp->ioaddr;
+   int skblen = skb->len;
 
/* Put out the doubleword header... */
iowrite32(skb->len, ioaddr + TX_FIFO);
@@ -2094,6 +2098,7 @@ vortex_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
}
 
+   netdev_sent_queue(dev, skblen);
 
/* Clear the Tx status stack. */
{
@@ -2125,6 +2130,7 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
void __iomem *ioaddr = vp->ioaddr;
/* Calculate the next Tx descriptor entry. */
int entry = vp->cur_tx % TX_RING_SIZE;
+   int skblen = skb->len;
struct boom_tx_desc *prev_entry = &vp->tx_ring[(vp->cur_tx-1) % TX_RING_SIZE];
unsigned long flags;
dma_addr_t dma_addr;
@@ -2230,6 +2236,8 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
 
vp->cur_tx++;
+   netdev_sent_queue(dev, skblen);
+
if (vp->cur_tx - vp->dirty_tx > TX_RING_SIZE - 1) {
netif_stop_queue (dev);
} else {/* Clear previous interrupt enable. */
@@ -2267,6 +2275,7 @@ vortex_interrupt(int irq, void *dev_id)
int status;
int work_done = max_interrupt_work;
int handled = 0;
+   unsigned int bytes_compl = 0, pkts_compl = 0;
 
ioaddr = vp->ioaddr;
spin_lock(&vp->lock);
@@ -2314,6 +2323,8 @@ vortex_interrupt(int irq, void *dev_id)
if (ioread16(ioaddr + Wn7_MasterStatus) & 0x1000) {
iowrite16(0x1000, ioaddr + Wn7_MasterStatus); /* Ack the event. */
pci_unmap_single(VORTEX_PCI(vp), vp->tx_skb_dma, (vp->tx_skb->len + 3) & ~3, PCI_DMA_TODEVICE);
+   pkts_compl++;
+   bytes_compl += vp->tx_skb->len;
dev_kfree_skb_irq(vp->tx_skb); /* Release the transferred buffer */
if (ioread16(ioaddr + TxFree) > 1536) {
/*
@@ -2358,6 +2369,7 @@ vortex_interrupt(int irq, void *dev_id)
iowrite16(AckIntr | IntReq | IntLatch, ioaddr + EL3_CMD);
} while ((status = ioread16(ioaddr + EL3_STATUS)) & (IntLatch | RxComplete));
 
+   netdev_completed_queue(dev, pkts_compl, bytes_compl);
spin_unlock(&vp->window_lock);
 
if (vortex_debug > 4)
@@ -2382,6 +2394,7 @@ boomerang_interrupt(int irq, void *dev_id)
int status;
int work_done = max_interrupt_work;
int handled = 0;
+   unsigned int bytes_compl = 0, pkts_compl = 0;
 
ioaddr = vp->ioaddr;
 
@@ -2455,6 +2468,8 @@ boomerang_interrupt(int irq, void *dev_id)
pci_unmap_single(VORTEX_PCI(vp),
le32_to_cpu(vp->tx_ring[entry].addr), skb->len, PCI_DMA_TODEVICE);
 #endif
+   pkts_compl++;
+   bytes_compl += skb->len;
dev_kfree_skb_irq(skb);
vp->tx_skbuff[entry] = NULL;
  

Re: linux-next: unregister_netdevice: waiting for lo to become free. Usage count = 1

2015-08-19 Thread Andrey Wagin
2015-08-18 18:27 GMT+03:00 David Ahern d...@cumulusnetworks.com:
 On 8/18/15 9:24 AM, Andrey Wagin wrote:

 Hello David,

 CRIU tests detect that references on net devices leak on
 4.2.0-rc6-next-20150817. Looks like it started with
 v4.2-rc6-882-g3bfd847.


 1e3136789975f03e461798149309034e5213c1b4 should have fixed it.

Yes, it works now. Thanks!


 David


Re: [PATCH 20/21] net: warn if drivers set tx_queue_len = 0

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 13:31 -0700, Eric Dumazet wrote:

 lpaa5:~# tc qd sh dev eth1
 qdisc mq 0: root 
 qdisc fq 0: parent :4 limit 1p flow_limit 1000p buckets 1024 bands 3 
 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3028 initial_quantum 15140 
 qdisc fq 0: parent :3 limit 1p flow_limit 1000p buckets 1024 bands 3 
 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3028 initial_quantum 15140 
 qdisc fq 0: parent :2 limit 1p flow_limit 1000p buckets 1024 bands 3 
 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3028 initial_quantum 15140 
 qdisc fq 0: parent :1 limit 1p flow_limit 1000p buckets 1024 bands 3 
 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3028 initial_quantum 15140 

Well, it seems I just leaked the fact that we use 3 bands in our fq
implementation ;)





Re: [PATCH v2 net-next 10/13] vxlan: do not shadow flags variable

2015-08-19 Thread Thomas Graf
On 08/19/15 at 12:10pm, Jiri Benc wrote:
 The 'flags' variable is already defined in the outer scope.
 
 Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch


Re: [PATCH v2 net-next 04/13] ip_tunnels: add IPv6 addresses to ip_tunnel_key

2015-08-19 Thread Alexei Starovoitov
On Wed, Aug 19, 2015 at 12:09:54PM +0200, Jiri Benc wrote:
 Add the IPv6 addresses as a union with IPv4 ones. When using IPv4, the
 newly introduced padding after the IPv4 addresses needs to be zeroed out.
 
 Signed-off-by: Jiri Benc jb...@redhat.com
 ---
 v1-v2: Fix incorrect IP_TUNNEL_KEY_IPV4_PAD_LEN calculation, thanks to
 Alexei.

Acked-by: Alexei Starovoitov a...@plumgrid.com



RE: Clarification on rtnetlink requests

2015-08-19 Thread Anish Bhatt

 -Original Message-
 From: netdev-ow...@vger.kernel.org [mailto:netdev-
 ow...@vger.kernel.org] On Behalf Of David Chappelle
 Sent: Wednesday, August 19, 2015 8:05 AM
 To: netdev@vger.kernel.org
 Subject: Clarification on rtnetlink requests
 
 I am a bit confused with respect to the structure of rtnetlink requests.
 It seems that in some circumstances a request can look like:
 
 struct request
 {
 struct nlmsghdr header;
 struct rtgenmsg body;
 };
 
 and in other cases it can look like:
 
 struct request
 {
 struct nlmsghdr header;
 struct ifinfomsg body;
 };
 
 How do I know which one to use when sending RTM_GETLINK and
 RTM_GETADDR requests? Furthermore, it also seems that 'struct rtattr'
 can be specified at the end of the request as well. Is there any
 documentation that describes this.

RTM_GETLINK uses ifinfomsg and RTM_GETADDR uses ifaddrmsg; see man 7 rtnetlink.
struct rtgenmsg is just a generic type; look at include/linux/rtnetlink.h.
-Anish
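
Editorial sketch, not part of the original reply: the answer above can be
written out as two request layouts, one per message type. The struct names
link_req and addr_req and the two builder functions are hypothetical; the
header fields, message types and flags come from the netlink and rtnetlink
UAPI headers. Sending the request over an AF_NETLINK socket is left out.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* RTM_GETLINK request: netlink header followed by struct ifinfomsg. */
struct link_req {
	struct nlmsghdr  hdr;
	struct ifinfomsg body;
};

/* RTM_GETADDR request: netlink header followed by struct ifaddrmsg. */
struct addr_req {
	struct nlmsghdr  hdr;
	struct ifaddrmsg body;
};

/* Fill in a "dump all links" request. */
static void build_getlink_dump(struct link_req *req, unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->hdr.nlmsg_len   = NLMSG_LENGTH(sizeof(req->body));
	req->hdr.nlmsg_type  = RTM_GETLINK;
	req->hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->hdr.nlmsg_seq   = seq;
	req->body.ifi_family = AF_UNSPEC;
}

/* Fill in a "dump all addresses of one family" request. */
static void build_getaddr_dump(struct addr_req *req, unsigned int seq,
			       unsigned char family)
{
	memset(req, 0, sizeof(*req));
	req->hdr.nlmsg_len   = NLMSG_LENGTH(sizeof(req->body));
	req->hdr.nlmsg_type  = RTM_GETADDR;
	req->hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->hdr.nlmsg_seq   = seq;
	req->body.ifa_family = family;
}
```

Optional struct rtattr attributes, when a request needs them, follow the
fixed body, and nlmsg_len has to grow to cover them.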


Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-19 Thread Sergei Shtylyov

Hello.

On 08/19/2015 03:59 PM, Emmanuel Grumbach wrote:


This allows releasing the backpressure on the socket only
when the last segment is released.
Now the truesize looks like this:
if the truesize of the original skb is 65420, all the
segments will have a truesize of 704 (skb itself) and the
last one will have 65420.

Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e
Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
---
  drivers/net/wireless/iwlwifi/mvm/tx.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
index 5046833..046e50d 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c

[...]

@@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
}

__skb_queue_tail(mpdus_skb, skb);
+   sum_truesize += skb->truesize;
+   }
+
+   /* Release the backpressure on the socket only when
+* the last segment is released.
+*/
+   if (skb_gso->destructor == sock_wfree) {
+   struct sk_buff *tail = mpdus_skb->prev;
+
+   swap(tail->truesize, skb_gso->truesize);
+   swap(tail->destructor, skb_gso->destructor);
+   swap(tail->sk, skb_gso->sk);
+atomic_add(sum_truesize - skb_gso->truesize,


   Please indent using tabs, not spaces.


+   &skb_gso->sk->sk_wmem_alloc);
}

ret = 0;


MBR, Sergei



Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment

2015-08-19 Thread Grumbach, Emmanuel


On 08/19/2015 05:24 PM, Eric Dumazet wrote:
 On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
 This allows releasing the backpressure on the socket only
 when the last segment is released.
 Now the truesize looks like this:
 if the truesize of the original skb is 65420, all the
 segments will have a truesize of 704 (skb itself) and the
 last one will have 65420.

 Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e
 Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
 ---
  drivers/net/wireless/iwlwifi/mvm/tx.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

 diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
 index 5046833..046e50d 100644
 --- a/drivers/net/wireless/iwlwifi/mvm/tx.c
 +++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
 @@ -764,7 +764,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
  bool ipv6 = skb_shinfo(skb_gso)->gso_type & SKB_GSO_TCPV6;
  struct iwl_lso_splitter s = {};
  struct page *hdr_page;
 -unsigned int mpdu_sz;
 +unsigned int mpdu_sz, sum_truesize = 0;
  u8 *hdr_page_pos, *qc, tid;
  int i, ret;
  
 @@ -898,6 +898,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
  mpdu_sz, tcp_hdrlen(skb_gso));
  
  __skb_queue_tail(mpdus_skb, skb_gso);
 +sum_truesize += skb_gso->truesize;
  
  /* mss bytes have been consumed from the data */
  s.gso_payload_pos = s.mss;
 @@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
  }
  
  __skb_queue_tail(mpdus_skb, skb);
 +sum_truesize += skb->truesize;
 +}
 +
 +/* Release the backpressure on the socket only when
 + * the last segment is released.
 + */
 +if (skb_gso->destructor == sock_wfree) {
 +struct sk_buff *tail = mpdus_skb->prev;
 +
 +swap(tail->truesize, skb_gso->truesize);
 +swap(tail->destructor, skb_gso->destructor);
 +swap(tail->sk, skb_gso->sk);
 +atomic_add(sum_truesize - skb_gso->truesize,
 +   &skb_gso->sk->sk_wmem_alloc);
  }
  
  ret = 0;
 
 Using existing net/core/tso.c helpers would avoid using this.

Hm.. how would net/core/tso.c avoid this?
I can't see anything related to truesize there.
Note that this works since it is guaranteed that we release the skbs in
order.

 
 (BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
 yet we want backpressure mostly for TCP stack (TCP Small Queues))
 
 

I am not sure I follow here.
You want me to test:
if (skb_gso->destructor == tcp_wfree) ?

I checked that code using iperf and saw that I don't get into this if,
but I (probably wrongly) assumed that other applications would set a
flag on the socket (forgive my ignorance) that would make this if be taken.


Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.

2015-08-19 Thread Jesse Gross
On Wed, Aug 19, 2015 at 11:49 AM, Pravin Shelar pshe...@nicira.com wrote:
 On Wed, Aug 19, 2015 at 11:37 AM, Jesse Gross je...@nicira.com wrote:
 On Wed, Aug 19, 2015 at 11:29 AM, Pravin Shelar pshe...@nicira.com wrote:
 On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross je...@nicira.com wrote:
 My guess is that if the issue from the earlier patch about overlapping
 collect_md tunnels is fixed then that might allow us to simplify
 things a little further, since for those tunnels we can assume there
 is a 1:1 mapping between collect_md tunnels and sockets.

 I don't see how it would be different. Can you elaborate on this?

 Mostly just conceptually simpler. Right now it looks like we are doing
 some kind of refcounting between devices and tunnels in
 geneve_open/stop (I know it's not really but it appears like that in
 some ways.) We could just directly assign collect_md in geneve_open()
 and do nothing at all in geneve_stop().

 If you look at the next patch, I have changed geneve_open and stop
 further. The change is that geneve_open adds the tunnel to the hash table
 so that only devices which are open are in the hash table. Since
 geneve_open and stop are common to both types of tunnel, I do not think
 there can be any changes even after avoiding overlapping tunnel types in
 a given socket.

I guess I'm not sure why with the later changes it would be
incompatible. All I'm talking about is something pretty small:

geneve_open:
    if (geneve->collect_md)
        gs->collect_md = true;
to
    gs->collect_md = geneve->collect_md;

geneve_close:
remove
    if (geneve->collect_md)
        gs->collect_md = false;
since the socket is about to be freed anyways.

It's not very different in practice but it looks less like refcounting
and more like a 1:1 mapping.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

