[PATCH] usbnet: dereference after null check in usbnet_start_xmit() and __usbnet_read_cmd()
usbnet_start_xmit() - If info->tx_fixup is not defined by class driver,
NULL check does not happen for skb pointer and leads to NULL dereference.

__usbnet_read_cmd() - if data pointer is passed as NULL, memcpy will
dereference NULL pointer.

Signed-off-by: Vivek Kumar Bhagat vivek.bha...@samsung.com
---
 drivers/net/usb/usbnet.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..ec4d224 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1294,6 +1294,8 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 
 	if (skb)
 		skb_tx_timestamp(skb);
+	else
+		goto drop;
 
 	// some devices want funky USB-level framing, for
 	// win32 driver (usually) and/or hardware quirks
@@ -1906,7 +1908,8 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
 		buf = kmalloc(size, GFP_KERNEL);
 		if (!buf)
 			goto out;
-	}
+	} else
+		goto out;
 
 	err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
 			      cmd, reqtype, value, index, buf, size,
-- 
1.7.9.5
[RFC v2 1/3] iwlwifi: mvm: add real TSO implementation
The segmentation is done completely in software. The driver
creates several MPDUs out of a single large send. Each MPDU
is a newly allocated SKB.
A page is allocated to create the headers that need to be
duplicated (SNAP / IP / TCP). The WiFi header is in the
header of the newly created SKBs.

type=feature

Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a
Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
---
 drivers/net/wireless/iwlwifi/mvm/tx.c | 513 +++--
 1 file changed, 481 insertions(+), 32 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
index 90f0ea1..a63686c 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -65,6 +65,7 @@
 #include <linux/ieee80211.h>
 #include <linux/etherdevice.h>
 #include <net/tcp.h>
+#include <net/ip.h>
 
 #include "iwl-trans.h"
 #include "iwl-eeprom-parse.h"
@@ -435,32 +436,471 @@ int iwl_mvm_tx_skb_non_sta(struct iwl_mvm *mvm, struct sk_buff *skb)
 	return 0;
 }
 
+/*
+ * Update the IP / TCP headers and recompute the IP header CSUM +
+ * pseudo header CSUM.
+ */
+static void iwl_update_ip_tcph(void *iph, struct tcphdr *tcph, bool ipv6,
+			       unsigned int len, unsigned int tcp_seq_offset,
+			       u16 num_segment)
+{
+	be32_add_cpu(&tcph->seq, tcp_seq_offset);
+
+	if (ipv6) {
+		struct ipv6hdr *iphv6 = iph;
+
+		iphv6->payload_len = cpu_to_be16(len + tcph->doff * 4);
+
+		/* Compute CSUM on the pseudo-header */
+		tcph->check = ~csum_ipv6_magic(&iphv6->saddr, &iphv6->daddr,
+					       len + tcph->doff * 4,
+					       IPPROTO_TCP, 0);
+	} else {
+		struct iphdr *iphv4 = iph;
+
+		iphv4->tot_len =
+			cpu_to_be16(len + tcph->doff * 4 + iphv4->ihl * 4);
+		be16_add_cpu(&iphv4->id, num_segment);
+		ip_send_check(iphv4);
+
+		/* Compute CSUM on the pseudo-header */
+		tcph->check = ~csum_tcpudp_magic(iphv4->saddr, iphv4->daddr,
+						 len + tcph->doff * 4,
+						 IPPROTO_TCP, 0);
+	}
+}
+
+/**
+ * struct iwl_lso_splitter - state of the split.
+ * @linear_payload_len: The length of the payload inside the header of the
+ *	original GSO skb.
+ * @gso_frag_num: The fragment number from which to take the data in the
+ *	original GSO skb.
+ * @gso_payload_len: The length of the payload in the original GSO skb.
+ * @gso_payload_pos: The incrementing position in the payload of the original
+ *	GSO skb.
+ * @gso_offset_in_page: The offset in the page of gso_frag_num.
+ * @gso_current_frag_size: The size of gso_frag_num.
+ * @gso_offset_in_frag: The offset in the gso_frag_num.
+ * @frag_in_mpdu: The index of the frag inside the new (split) MPDU.
+ * @mss: The maximal segment size.
+ * @si: Points to the shared info of the original GSO skb.
+ * @ieee80211_hdr *hdr: Points to the WiFi header.
+ * @gso_nr_frags: The number of frags in the original GSO skb.
+ * @wifi_hdr_iv_len: The length of the WiFi header including IV.
+ * @tcp_fin: True if TCP_FIN is set in the original GSO skb.
+ * @tcp_push: True if TCP_PSH is set in the original GSO skb.
+ */
+struct iwl_lso_splitter {
+	unsigned int linear_payload_len;
+	unsigned int gso_frag_num;
+	unsigned int gso_payload_len;
+	unsigned int gso_payload_pos;
+	unsigned int gso_offset_in_page;
+	unsigned int gso_current_frag_size;
+	unsigned int gso_offset_in_frag;
+	unsigned int frag_in_mpdu;
+	unsigned int mss;
+	struct skb_shared_info *si;
+	struct ieee80211_hdr *hdr;
+	u8 gso_nr_frags;
+	u8 wifi_hdr_iv_len;
+	bool tcp_fin;
+	bool tcp_push;
+};
+
+/*
+ * Adds a TCP segment from skb_gso to skb. All the state is taken from
+ * and fed back to p. This function takes care about the payload only.
+ * This MSDU might already have msdu_sz bytes of payload that come from
+ * the original GSO skb's header.
+ */
+static unsigned int
+iwl_add_tcp_segment(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
+		    struct sk_buff *skb, struct iwl_lso_splitter *p,
+		    unsigned int msdu_sz)
+{
+	while (msdu_sz < p->mss) {
+		unsigned int frag_sz =
+			min_t(unsigned int, p->gso_current_frag_size,
+			      p->mss - msdu_sz);
+
+		if (p->frag_in_mpdu >= mvm->trans->max_skb_frags)
+			return msdu_sz;
+
+		skb_add_rx_frag(skb, p->frag_in_mpdu,
+				skb_frag_page(&p->si->frags[p->gso_frag_num]),
+				p->gso_offset_in_page, frag_sz, 0);
+
+		/* We just added one frag to the mpdu ... */
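The per-segment bookkeeping done by iwl_update_ip_tcph() in the IPv4 case can be modelled in userspace. The following is only a sketch under stated assumptions: the model_* structs and function are hypothetical, host-endian stand-ins for the kernel's struct iphdr / struct tcphdr, and the two checksum recomputations from the patch are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, host-endian stand-ins for the kernel headers. Only the
 * fields touched by the IPv4 branch of the patch are modelled. */
struct model_iphdr {
	uint8_t ihl;		/* IP header length in 32-bit words */
	uint16_t tot_len;	/* total length: IP hdr + TCP hdr + payload */
	uint16_t id;		/* IP identification */
};

struct model_tcphdr {
	uint8_t doff;		/* TCP header length in 32-bit words */
	uint32_t seq;		/* sequence number */
};

/*
 * Mirrors the arithmetic of the IPv4 branch: advance the sequence number,
 * recompute the total length and bump the IP id. The checksum updates
 * (ip_send_check() and the pseudo-header checksum) are not modelled.
 */
void model_update_ip_tcph(struct model_iphdr *iph, struct model_tcphdr *tcph,
			  unsigned int len, unsigned int tcp_seq_offset,
			  uint16_t num_segment)
{
	/* Both header lengths have a minimum of 5 words (20 bytes). */
	assert(iph->ihl >= 5 && tcph->doff >= 5);

	tcph->seq += tcp_seq_offset;
	iph->tot_len = (uint16_t)(len + tcph->doff * 4 + iph->ihl * 4);
	iph->id = (uint16_t)(iph->id + num_segment);
}
```

For a 1448-byte segment with a 32-byte TCP header and a 20-byte IP header, this gives a total length of 1500 bytes, and the sequence number moves forward by whatever offset the caller passes in.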
[RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
We enable TSO to get a lot of data at once to build A-MSDUs. Our hardware
doesn't have (yet) TCP CSUM offload, so we do it manually. TSO won't be
enabled in release code on hardware that doesn't support CSUM offload;
computing the TCP CSUM in the driver is just a way to start coding the
flows. This is why the CSUM offload implementation in the driver is so bad
in terms of efficiency. I preferred to have the flows as close as possible
to what they will be when the hardware is able to do the CSUM, rather than
to seek efficiency. The hardware that will have CSUM offload will still
require the driver to split the skb in software, including the IP / TCP
header copy and update etc...

We could have enabled A-MSDU based on xmit-more, but the rationale of using
LSO is that when using pfifo-fast, the Qdisc gets one packet and dequeues
it straight away, which limits the possibility to get a lot of packets at
once. (Am I right here?)

A note about A-MSDUs for non-wireless people:
* An A-MSDU is an aggregated frame. It is one big 802.11 packet that
  contains several subframes. Each subframe is a TCP segment. One A-MSDU is
  represented by one single skb, which means that we need to copy /
  duplicate the TCP / IP / SNAP headers in one single skb. This is why
  those headers are copied to a separate page: that page is added multiple
  times to the skb with different offsets. Each subframe needs at least
  2 frags: 1 for the headers, 1 (or more) for the payload.

I am quite a newbie in skb handling, so I guess that this code can be
improved. I have tested it decently using iperf, but this doesn't mean that
there are no issues using other applications. We are enabling pktgen on TCP
(using patches that were sent a year ago or so) to test the different
layouts of the skb (payload partition amongst the header and the different
frags).

I'll be very happy to get comments on that code; this is why I am sending
it to netdev as well, since the TSO experts are there :)

Emmanuel Grumbach (3):
  iwlwifi: mvm: add real TSO implementation
  iwlwifi: mvm: allow to create A-MSDUs from a large send
  iwlwifi: mvm: transfer the truesize to the last TSO segment

 drivers/net/wireless/iwlwifi/mvm/mac80211.c |   3 +-
 drivers/net/wireless/iwlwifi/mvm/sta.c      |   4 +-
 drivers/net/wireless/iwlwifi/mvm/sta.h      |   6 +-
 drivers/net/wireless/iwlwifi/mvm/tx.c       | 669 ++--
 4 files changed, 647 insertions(+), 35 deletions(-)

-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
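The frag accounting in the note about A-MSDUs can be put into numbers. The sketch below is an illustration only: the model_* helpers are hypothetical and not part of the series, and they assume one subframe per TCP segment of at most mss bytes, with one header frag plus at least one payload frag per subframe.

```c
#include <assert.h>

/* Number of TCP segments (one A-MSDU subframe each) needed to carry
 * payload_len bytes when each segment holds at most mss bytes. */
unsigned int model_num_subframes(unsigned int payload_len, unsigned int mss)
{
	assert(mss > 0);
	return (payload_len + mss - 1) / mss;
}

/* Lower bound on the number of skb frags for the whole A-MSDU:
 * one frag for the duplicated headers and at least one for the
 * payload of every subframe. Payload that spans several pages of
 * the original skb needs more. */
unsigned int model_min_frags(unsigned int payload_len, unsigned int mss)
{
	return 2 * model_num_subframes(payload_len, mss);
}
```

With an MSS of 1448, a 4344-byte payload splits into 3 subframes and needs at least 6 frags; one more byte of payload adds a fourth subframe.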
[RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment
This allows releasing the backpressure on the socket only
when the last segment is released.
Now the truesize looks like this:
if the truesize of the original skb is 65420, all the
segments will have a truesize of 704 (skb itself) and the
last one will have 65420.

Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e
Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
---
 drivers/net/wireless/iwlwifi/mvm/tx.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
index 5046833..046e50d 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -764,7 +764,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
 	bool ipv6 = skb_shinfo(skb_gso)->gso_type & SKB_GSO_TCPV6;
 	struct iwl_lso_splitter s = {};
 	struct page *hdr_page;
-	unsigned int mpdu_sz;
+	unsigned int mpdu_sz, sum_truesize = 0;
 	u8 *hdr_page_pos, *qc, tid;
 	int i, ret;
 
@@ -898,6 +898,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
 			   mpdu_sz, tcp_hdrlen(skb_gso));
 
 	__skb_queue_tail(mpdus_skb, skb_gso);
+	sum_truesize += skb_gso->truesize;
 
 	/* mss bytes have been consumed from the data */
 	s.gso_payload_pos = s.mss;
@@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
 		}
 
 		__skb_queue_tail(mpdus_skb, skb);
+		sum_truesize += skb->truesize;
+	}
+
+	/* Release the backpressure on the socket only when
+	 * the last segment is released.
+	 */
+	if (skb_gso->destructor == sock_wfree) {
+		struct sk_buff *tail = mpdus_skb->prev;
+
+		swap(tail->truesize, skb_gso->truesize);
+		swap(tail->destructor, skb_gso->destructor);
+		swap(tail->sk, skb_gso->sk);
+		atomic_add(sum_truesize - skb_gso->truesize,
+			   &skb_gso->sk->sk_wmem_alloc);
 	}
 
 	ret = 0;
-- 
2.1.4
[RFC v2 2/3] iwlwifi: mvm: allow to create A-MSDUs from a large send
Now that we can get a big chunk of data from the network
stack, we can create an A-MSDU out of it. The purpose is to
get a throughput improvement since sending one single A-MSDU
is more efficient than sending several MSDUs at least under
ideal link conditions.

type=feature

Change-Id: I5ea1b1132a57542187cd4c34c5299dbf44fe8b01
Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
---
 drivers/net/wireless/iwlwifi/mvm/mac80211.c |   3 +-
 drivers/net/wireless/iwlwifi/mvm/sta.c      |   4 +-
 drivers/net/wireless/iwlwifi/mvm/sta.h      |   6 +-
 drivers/net/wireless/iwlwifi/mvm/tx.c       | 159 ++--
 4 files changed, 160 insertions(+), 12 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
index 3dd4e97..dd15e04 100644
--- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
@@ -925,7 +925,8 @@ static int iwl_mvm_mac_ampdu_action(struct ieee80211_hw *hw,
 		ret = iwl_mvm_sta_tx_agg_flush(mvm, vif, sta, tid);
 		break;
 	case IEEE80211_AMPDU_TX_OPERATIONAL:
-		ret = iwl_mvm_sta_tx_agg_oper(mvm, vif, sta, tid, buf_size);
+		ret = iwl_mvm_sta_tx_agg_oper(mvm, vif, sta, tid,
+					      buf_size, amsdu);
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/drivers/net/wireless/iwlwifi/mvm/sta.c b/drivers/net/wireless/iwlwifi/mvm/sta.c
index df216cd..606fc09 100644
--- a/drivers/net/wireless/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/iwlwifi/mvm/sta.c
@@ -976,7 +976,8 @@ int iwl_mvm_sta_tx_agg_start(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
 }
 
 int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
-			    struct ieee80211_sta *sta, u16 tid, u8 buf_size)
+			    struct ieee80211_sta *sta, u16 tid, u8 buf_size,
+			    bool amsdu)
 {
 	struct iwl_mvm_sta *mvmsta = iwl_mvm_sta_from_mac80211(sta);
 	struct iwl_mvm_tid_data *tid_data = &mvmsta->tid_data[tid];
@@ -995,6 +996,7 @@ int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
 	queue = tid_data->txq_id;
 	tid_data->state = IWL_AGG_ON;
 	mvmsta->agg_tids |= BIT(tid);
+	tid_data->amsdu_in_ampdu_allowed = amsdu;
 	tid_data->ssn = 0x;
 	spin_unlock_bh(&mvmsta->lock);
 
diff --git a/drivers/net/wireless/iwlwifi/mvm/sta.h b/drivers/net/wireless/iwlwifi/mvm/sta.h
index eedb215..26d1e31 100644
--- a/drivers/net/wireless/iwlwifi/mvm/sta.h
+++ b/drivers/net/wireless/iwlwifi/mvm/sta.h
@@ -258,6 +258,8 @@ enum iwl_mvm_agg_state {
  *	Tx response (TX_CMD), and the block ack notification (COMPRESSED_BA).
  * @reduced_tpc: Reduced tx power. Holds the data between the
  *	Tx response (TX_CMD), and the block ack notification (COMPRESSED_BA).
+ * @amsdu_in_ampdu_allowed: true if A-MSDU in A-MPDU is allowed. Relevant only
+ *	if state is %IWL_AGG_ON.
  * @state: state of the BA agreement establishment / tear down.
  * @txq_id: Tx queue used by the BA session
  * @ssn: the first packet to be sent in AGG HW queue in Tx AGG start flow, or
@@ -272,6 +274,7 @@ struct iwl_mvm_tid_data {
 	/* The rest is Tx AGG related */
 	u32 rate_n_flags;
 	u8 reduced_tpc;
+	bool amsdu_in_ampdu_allowed;
 	enum iwl_mvm_agg_state state;
 	u16 txq_id;
 	u16 ssn;
@@ -387,7 +390,8 @@ int iwl_mvm_sta_rx_agg(struct iwl_mvm *mvm, struct ieee80211_sta *sta,
 int iwl_mvm_sta_tx_agg_start(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
 			     struct ieee80211_sta *sta, u16 tid, u16 *ssn);
 int iwl_mvm_sta_tx_agg_oper(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
-			    struct ieee80211_sta *sta, u16 tid, u8 buf_size);
+			    struct ieee80211_sta *sta, u16 tid, u8 buf_size,
+			    bool amsdu);
 int iwl_mvm_sta_tx_agg_stop(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
 			    struct ieee80211_sta *sta, u16 tid);
 int iwl_mvm_sta_tx_agg_flush(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
index a63686c..5046833 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -488,8 +488,10 @@ static void iwl_update_ip_tcph(void *iph, struct tcphdr *tcph, bool ipv6,
  * @ieee80211_hdr *hdr: Points to the WiFi header.
  * @gso_nr_frags: The number of frags in the original GSO skb.
  * @wifi_hdr_iv_len: The length of the WiFi header including IV.
+ * @amsdu_pad: Number of bytes for the A-MSDU subframe
  * @tcp_fin: True if TCP_FIN is set in the original GSO skb.
  * @tcp_push: True if TCP_PSH is set in the original GSO skb.
+ * @amsdu: True if we are building an A-MSDU
  */
 struct iwl_lso_splitter {
 	unsigned int
Re: [PATCH] usbnet: dereference after null check in usbnet_start_xmit() and __usbnet_read_cmd()
Vivek Kumar Bhagat vivek.bha...@samsung.com writes:

> usbnet_start_xmit() - If info->tx_fixup is not defined by class driver,
> NULL check does not happen for skb pointer and leads to NULL dereference.
>
> __usbnet_read_cmd() - if data pointer is passed as NULL, memcpy will
> dereference NULL pointer.

That's two completely different issues. Mixing them in a single patch is
only confusing things.

> Signed-off-by: Vivek Kumar Bhagat vivek.bha...@samsung.com
> ---
>  drivers/net/usb/usbnet.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 3c86b10..ec4d224 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -1294,6 +1294,8 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
>
>  	if (skb)
>  		skb_tx_timestamp(skb);
> +	else
> +		goto drop;
>
>  	// some devices want funky USB-level framing, for
>  	// win32 driver (usually) and/or hardware quirks

This is wrong. There are usbnet minidrivers depending on info->tx_fixup
being called with a NULL skb.

> @@ -1906,7 +1908,8 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
>  		buf = kmalloc(size, GFP_KERNEL);
>  		if (!buf)
>  			goto out;
> -	}
> +	} else
> +		goto out;
>
>  	err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
>  			      cmd, reqtype, value, index, buf, size,

This is also wrong. It makes __usbnet_read_cmd() return -ENOMEM if called
with a NULL data pointer.

I don't know if it is used, but it's perfectly valid to call
__usbnet_read_cmd() with data == NULL if size == 0. No memcpy will happen
in this case because usb_control_msg can only return 0 or an error

Please don't submit any more such patches without proper justification.
You cannot trust that someone will actually take the time to sanity check
your changes. Patches claiming to fix a NULL dereference should at least
provide an oops.


Bjørn
Re: [PATCH] veth: replace iflink by a dedicated symlink in sysfs
On Wed, 19 Aug 2015 14:13:00 +0200, Vincent Bernat wrote:
> That's the main goal of this patch: advertising the peer link as
> IFLA_LINK attribute triggers an infinite loop in userland software when
> they follow iflink to discover network devices topology. iflink has
> always been the index of a lower device. If a sysfs symbolic link is not
> good enough, I can propose a new IFLA_PEER attribute instead.

This would cause regression and break applications for those of us who
started relying on the netnsid feature to match interfaces across net name
spaces.

This is tough. If you're going to do such thing, you would at least need
to also introduce IFLA_PEER_NETNSID.

 Jiri

-- 
Jiri Benc
Re: [PATCH] veth: replace iflink by a dedicated symlink in sysfs
 ❦ 19 August 2015 14:38 +0200, Jiri Benc jb...@redhat.com :

>> That's the main goal of this patch: advertising the peer link as
>> IFLA_LINK attribute triggers an infinite loop in userland software when
>> they follow iflink to discover network devices topology. iflink has
>> always been the index of a lower device. If a sysfs symbolic link is not
>> good enough, I can propose a new IFLA_PEER attribute instead.
>
> This would cause regression and break applications for those of us who
> started relying on the netnsid feature to match interfaces across net
> name spaces.

Yes. Unfortunately.

> This is tough. If you're going to do such thing, you would at least need
> to also introduce IFLA_PEER_NETNSID.

Yes I can. In my opinion, the change of semantics of IFLA_LINK is a break
of API. However, I can live with it since it's easy to work around it. It
just seemed easier to start the discussion with a patch.
-- 
Parenthesise to avoid ambiguity.
            - The Elements of Programming Style (Kernighan & Plauger)
Re: pull-request: wireless-drivers-next 2015-08-19
Kalle Valo kv...@codeaurora.org writes:

> here's one more pull request for 4.3. More info in the signed tag below.
>
> This time I had to merge mac80211-next.git due to some iwlwifi
> dependencies and apparently that broke git-request-pull's diffstat
> again, it was showing changes which were not really coming from my tree.
> I think that's just a bug in my old git and really should update the
> tool. This time I just fixed the diffstat manually. But please be extra
> careful with this pull request and please let me know if you have any
> problems.

Oh, I forgot to mention that I saw this build error when I did a test
merge:

net/ipv4/fib_semantics.c:553:3: error: implicit declaration of function
lwtstate_free [-Werror=implicit-function-declaration]

But I see that also with unmodified net-next so I'm assuming I didn't
cause that :)

-- 
Kalle Valo
Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters
On Aug 19 09:31, Hayes Wang wrote:
> Corinna Vinschen [mailto:vinsc...@redhat.com]
> > Sent: Wednesday, August 19, 2015 5:13 PM
> [...]
> > > It could be cleared by setting bit 0, such as rtl_tally_reset() of
> > > r8152.
> >
> > Is it safe to assume that this is implemented in all NICs covered by
> > r8169?
>
> It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later.

Thanks. In that case I would prefer the same generic method for all chip
versions, so I'd opt for storing the offset values at rtl_open time as my
patch is doing right now. Is that acceptable?

If so, wouldn't it make even more sense to use the hardware collected
information in @get_stats64 throughout, except for the numbers collected
*only* in software?

I would be willing to propose a matching patch.


Thanks,
Corinna
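The "store the offset values at open time" approach can be sketched in a few lines. This is a hypothetical userspace model, not the r8169 code: the struct and function names are made up for illustration, and it assumes the hardware tally only ever counts up and is never reset while the interface is open.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the counters kept by the NIC. */
struct model_counters {
	uint64_t tx_errors;
	uint64_t rx_missed;
};

struct model_nic {
	struct model_counters hw;	/* free-running hardware tally */
	struct model_counters offset;	/* snapshot taken at open time */
};

/* At open time, remember where the hardware counters currently stand. */
void model_open(struct model_nic *nic)
{
	nic->offset = nic->hw;
}

/* Report only what was counted since the last open. */
uint64_t model_get_tx_errors(const struct model_nic *nic)
{
	assert(nic->hw.tx_errors >= nic->offset.tx_errors);
	return nic->hw.tx_errors - nic->offset.tx_errors;
}
```

The point of the design is that it works the same way on every chip version, whether or not the hardware supports resetting the tally.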
Re: [PATCH] rtlwifi: rtl8192cu: Add new device ID
On 08/19/2015 08:51 AM, Adrien Schildknecht wrote:
> The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043
>
> Signed-off-by: Adrien Schildknecht adrien+...@schischi.me
> ---

Has this ID been tested with the Netgear device?

Larry

>  drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
> index 23806c2..8b4238a 100644
> --- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
> +++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
> @@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
>  	{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
>  	{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
>  	{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
> +	{RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NetGear WNA1000Mv2*/
>  	{RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
>  	{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/
>  	{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
[PATCH] rtlwifi: rtl8192cu: Add new device ID
The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043

Signed-off-by: Adrien Schildknecht adrien+...@schischi.me
---
 drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
index 23806c2..8b4238a 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
@@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
 	{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
 	{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
 	{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
+	{RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NetGear WNA1000Mv2*/
 	{RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
 	{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/
 	{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
-- 
2.5.0
Re: [PATCH v2 1/2] virtio-net: rephrase devconf fields description
On Mon, Aug 17, 2015 at 10:43:46AM +0800, Jason Wang wrote:
>
> On 08/16/2015 09:42 PM, Victor Kaplansky wrote:
> > Clarify general description of the mac, status and
> > max_virtqueue_pairs fields. Specifically, the old description is
> > vague about configuration layout and fields offsets when some of
> > the fields are non valid.
> >
> > Also clarify that validity of two status bits depends on two
> > different feature flags.
> >
> > Signed-off-by: Victor Kaplansky vict...@redhat.com
> > ---
> > +
> > +\item [\field{max_virtqueue_pairs}] tells the driver the maximum
> > +number of each of virtqueues (receiveq1\ldots receiveqN and
> > +transmitq1\ldots transmitqN respectively) that can be configured
> > +on the device once VIRTIO_NET_F_MQ is negotiated.
> > +\field{max_virtqueue_pairs} is valid only if VIRTIO_NET_F_MQ is
> > +set and can be read by the driver.
> > +
>
> I don't get the point that adding "can be read by the driver". Looks
> like it's hard for hypervisor to detect this?

AFAIU, if the device sets VIRTIO_NET_F_MQ, the device also sets
the value of 'max_virtqueue_pairs' even before driver negotiated
VIRTIO_NET_F_MQ. If so, the driver can read the value of
'max_virtqueue_pairs' during negotiation and potentially this
value can even affect negotiation decision of the driver.

If above is correct, I'll change the description to make this
point more clear.

Thanks,
-- Victor
Re: [PATCH] usbnet: dereference after null check in usbnet_start_xmit() and __usbnet_read_cmd()
Bjørn Mork bj...@mork.no writes:
> Vivek Kumar Bhagat vivek.bha...@samsung.com writes:
>
>> @@ -1906,7 +1908,8 @@ static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
>>  		buf = kmalloc(size, GFP_KERNEL);
>>  		if (!buf)
>>  			goto out;
>> -	}
>> +	} else
>> +		goto out;
>>
>>  	err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
>>  			      cmd, reqtype, value, index, buf, size,
>
> This is also wrong. It makes __usbnet_read_cmd() return -ENOMEM if called
> with a NULL data pointer.
>
> I don't know if it is used, but it's perfectly valid to call
> __usbnet_read_cmd() with data == NULL if size == 0. No memcpy will happen
> in this case because usb_control_msg can only return 0 or an error

Just for the record - a simple grep for usbnet_read_cmd shows that at
least drivers/net/usb/plusb.c depends on the current behaviour:

static inline int
pl_vendor_req(struct usbnet *dev, u8 req, u8 val, u8 index)
{
	return usbnet_read_cmd(dev, req,
				USB_DIR_IN | USB_TYPE_VENDOR |
				USB_RECIP_DEVICE,
				val, index, NULL, 0);
}


Bjørn
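The effect of the proposed "else goto out" on a caller like pl_vendor_req() can be reproduced with a small userspace model. This is a sketch under stated assumptions, not the kernel function: model_read_cmd() and model_control_msg() are hypothetical names, malloc()/free() stand in for kmalloc()/kfree(), and the stubbed transfer simply pretends the device returned exactly size bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for usb_control_msg(): pretends the device answered with
 * exactly `size` bytes, so a zero-length request returns 0. */
static int model_control_msg(void *buf, unsigned short size)
{
	if (buf)
		memset(buf, 0xab, size);
	return size;
}

/*
 * Models the control flow of __usbnet_read_cmd(). With with_patch set,
 * the "else goto out" branch from the proposed patch is taken whenever
 * data is NULL, which leaves err at its initial -ENOMEM.
 */
int model_read_cmd(void *data, unsigned short size, bool with_patch)
{
	void *buf = NULL;
	int err = -ENOMEM;

	if (data) {
		buf = malloc(size);
		if (!buf)
			goto out;
	} else if (with_patch) {
		goto out;
	}

	err = model_control_msg(buf, size);
	/* With size == 0 the result is never > 0, so no copy happens. */
	if (err > 0 && err <= size)
		memcpy(data, buf, (size_t)err);
	free(buf);
out:
	return err;
}
```

In this model, a call with data == NULL and size == 0 returns 0 without the patch and -ENOMEM with it, which is the regression described above for zero-length vendor requests.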
Re: [RFC v2 1/3] iwlwifi: mvm: add real TSO implementation
On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
> The segmentation is done completely in software. The driver
> creates several MPDUs out of a single large send. Each MPDU
> is a newly allocated SKB.
> A page is allocated to create the headers that need to be
> duplicated (SNAP / IP / TCP). The WiFi header is in the
> header of the newly created SKBs.
>
> type=feature
>
> Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a
> Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
> ---
>  drivers/net/wireless/iwlwifi/mvm/tx.c | 513 +++--
>  1 file changed, 481 insertions(+), 32 deletions(-)

Ouch, dynamic allocations while doing xmit are certainly not needed.

Your driver should pre-allocate space for headers.

Drivers willing to implement tso have to use net/core/tso.c provided
helpers.

$ git grep -n tso_build_hdr
drivers/net/ethernet/cavium/thunder/nicvf_queues.c:1030:	tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
drivers/net/ethernet/freescale/fec_main.c:729:	tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
drivers/net/ethernet/marvell/mv643xx_eth.c:842:	tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
drivers/net/ethernet/marvell/mvneta.c:1650:	tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
include/net/tso.h:15:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
net/core/tso.c:14:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
net/core/tso.c:37:EXPORT_SYMBOL(tso_build_hdr);
Re: [PATCH v2 2/2] virtio-net: add default_mtu configuration field
On Mon, Aug 17, 2015 at 11:07:15AM +0800, Jason Wang wrote:
>
> On 08/16/2015 09:42 PM, Victor Kaplansky wrote:
> > @@ -3128,6 +3134,7 @@ struct virtio_net_config {
> >          u8 mac[6];
> >          le16 status;
> >          le16 max_virtqueue_pairs;
> > +        le16 default_mtu;
>
> Looks like mtu is ok, consider we use mac instead of default_mac.

Good point. I'll change the name in the next version of the
patch.

> >  };
> >  \end{lstlisting}
> >
> > @@ -3158,6 +3165,15 @@ by the driver after negotiation.
> >  \field{max_virtqueue_pairs} is valid only if VIRTIO_NET_F_MQ is
> >  set and can be read by the driver.
> >
> > +\item [\field{default_mtu}] is a hint to the driver set by the
> > +device. It is valid during feature negotiation only if
> > +VIRTIO_NET_F_DEFAULT_MTU is offered and holds the initial value
> > +of MTU to be used by the driver. If VIRTIO_NET_F_DEFAULT_MTU is
> > +negotiated, the driver uses the \field{default_mtu} as an initial
> > +value, and also reports MTU changes to the device by writes to
> > +\field{default_mtu}. Such reporting can be used for debugging,
> > +or it can be used for tuning MTU along the network.
> > +
>
> I vaguely remember that config is read only in some arch or transport
> and that's why we introduce another vq cmd to confirm the announcement.
> Probably we should do same for this?

If so, we need to add one more feature bit to confirm the ability
of the driver to report MTU, or we can weaken the requirement in
the conformance statement and write "the driver may report the MTU".
What do you say?

-- Victor
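Since the thread is about the configuration layout and field offsets, the offsets implied by the proposed struct can be checked with a short sketch. This is an illustration only: the struct name is hypothetical, uint16_t stands in for le16 (byte order is ignored), and default_mtu is the name from the patch under review, which the reply above says will be renamed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the layout quoted from the patch. */
struct model_virtio_net_config {
	uint8_t  mac[6];
	uint16_t status;		/* le16 in the specification */
	uint16_t max_virtqueue_pairs;	/* le16 */
	uint16_t default_mtu;		/* le16, proposed field */
};

/* The new field lands right after max_virtqueue_pairs, with no padding. */
static_assert(offsetof(struct model_virtio_net_config, default_mtu) == 10,
	      "default_mtu is expected at byte offset 10");
```

The offsets come out as 0 for mac, 6 for status, 8 for max_virtqueue_pairs and 10 for the new field. The offsets stay fixed regardless of which feature bits are negotiated; the features only decide which fields are valid.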
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
19.08.2015 13:54, Bjørn Mork пишет: Eugene Shatokhin eugene.shatok...@rosalab.ru writes: 19.08.2015 04:54, David Miller пишет: From: Eugene Shatokhin eugene.shatok...@rosalab.ru Date: Fri, 14 Aug 2015 19:58:36 +0300 2. The second race is on dev-flags. dev-flags is set to 0 here: *0 usbnet_stop (usbnet.c:816) /* deferred work (task, timer, softirq) must also stop. * can't flush_scheduled_work() until we drop rtnl (later), * else workers could deadlock; so make workers a NOP. */ dev-flags = 0; del_timer_sync (dev-delay); tasklet_kill (dev-bh); And here, the code clears EVENT_RX_KILL bit in dev-flags, which may execute concurrently with the above operation: *0 clear_bit (bitops.h:113, inlined) *1 usbnet_bh (usbnet.c:1475) /* restart RX again after disabling due to high error rate */ clear_bit(EVENT_RX_KILL, dev-flags); It seems, setting dev-flags to 0 is not necessarily atomic w.r.t. clear_bit() and other bit operations with dev-flags. It is safer to make it atomic and this way, make the race harmless. While at it, the checking of EVENT_NO_RUNTIME_PM bit of dev-flags in usbnet_stop() was fixed too: the bit should be checked before dev-flags is cleared. The fix for this is excessive. Instead of all of this madness, looping over expensive clear_bit() atomics, just do whatever it takes to make sure that usbnet_bh() is quiesced and cannot execute any more. Then you can safely clear dev-flags normally. If I understand it correctly, it is to make sure usbnet_bh() is not scheduled again that dev-flags should be set to 0 first, one way or another. That is what this madness is for. Assuming there is a race which may reorder these, exactly what difference does it make wrt EVENT_RX_KILL if you do a) clear_bit(EVENT_RX_KILL, dev-flags); dev-flags = 0; or b) dev-flags = 0; clear_bit(EVENT_RX_KILL, dev-flags); AFAICS, the result will be a cleared EVENT_RX_KILL bit in either case. Thanks for the review! 
The problem is not in the reordering but rather in the fact that
"dev->flags = 0" is not necessarily atomic w.r.t.
"clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.

So the following might be possible, although unlikely:

CPU0                    CPU1
                        clear_bit: read dev->flags
                        clear_bit: clear EVENT_RX_KILL in the read value
dev->flags = 0;
                        clear_bit: write updated dev->flags

As a result, dev->flags may become non-zero again.

I cannot prove yet that this is an impossible situation. If anyone can,
please explain. If so, this part of the patch will not be needed.

> The EVENT_NO_RUNTIME_PM bug should definitely be fixed. Please split
> that out as a separate fix. It's a separate issue, and should be
> backported to all maintained stable releases it applies to (anything
> from v3.8 and newer)

Yes, that makes sense. However, this fix was originally provided by
Oliver Neukum rather than me, so I would like to hear his opinion as
well first.

> Bjørn

Regards,
Eugene
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
Eugene Shatokhin eugene.shatok...@rosalab.ru writes:

> The problem is not in the reordering but rather in the fact that
> "dev->flags = 0" is not necessarily atomic
> w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.
>
> So the following might be possible, although unlikely:
>
> CPU0                    CPU1
>                         clear_bit: read dev->flags
>                         clear_bit: clear EVENT_RX_KILL in the read value
> dev->flags = 0;
>                         clear_bit: write updated dev->flags
>
> As a result, dev->flags may become non-zero again.

Ah, right. Thanks for explaining.

> I cannot prove yet that this is an impossible situation. If anyone
> can, please explain. If so, this part of the patch will not be needed.

I wonder if we could simply move the dev->flags = 0 down a few lines to
fix both issues? It doesn't seem to do anything useful except for
resetting the flags to a sane initial state after the device is down.

Stopping the tasklet rescheduling etc depends only on netif_running(),
which will be false when usbnet_stop is called. There is no need to
touch dev->flags for this to happen.

>> The EVENT_NO_RUNTIME_PM bug should definitely be fixed. Please split
>> that out as a separate fix. It's a separate issue, and should be
>> backported to all maintained stable releases it applies to (anything
>> from v3.8 and newer)
>
> Yes, that makes sense. However, this fix was originally provided by
> Oliver Neukum rather than me, so I would like to hear his opinion as
> well first.

If what I write above is correct (please help me verify...), then maybe
it does make sense to do these together anyway.

Bjørn
Re: [PATCH ipsec-next] xfrm: Use VRF master index if output device is enslaved
On Aug 18, 2015, at 6:54 PM, David Ahern d...@cumulusnetworks.com wrote:

> Directs route lookups to VRF table. Compiles out if NET_VRF is not
> enabled. With this patch able to successfully bring up ipsec tunnels
> in VRFs, even with duplicate network configuration (IPv4 tested).
>
> Signed-off-by: David Ahern d...@cumulusnetworks.com
> ---
>  net/ipv4/xfrm4_policy.c | 7 +--
>  net/ipv6/xfrm6_policy.c | 7 +--
>  2 files changed, 10 insertions(+), 4 deletions(-)

I think you should use the new vrf_master_index() helper that acquires
rcu because it looks possible to call ->decode_session() without rcu
read lock, e.g. in the hold_timer function xfrm_policy_queue_process(),
though I haven’t tested it and might be missing something. :-)

Cheers,
 Nik
Re: [PATCH net-next] r8169:Set RxConfig on same func. with TxConfig
On Wed, 2015-08-19 at 08:39 +0300, Marian Corcodel wrote:

> It's not mandatory to accept these patches, if you wish to apply good
> if you not, not problem.

How can we apply a patch that does not compile?

You are going to piss all netdev people for good.
Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:

> We could have enabled A-MSDU based on xmit-more, but the rationale of
> using LSO is that when using pfifo-fast, the Qdisc gets one packet and
> dequeues it straight away which limits the possibility to get a lot of
> packets at once. (Am I right here?).

No, you are not ;)

Key point for xmit_more is BQL being implemented in your driver.

Relevant code is in try_bulk_dequeue_skb()
Re: [PATCHv1 net-next 0/5] netlink: mmap: kernel panic and some issues
On 08/17/2015 11:02 PM, David Miller wrote: From: Daniel Borkmann dan...@iogearbox.net Date: Fri, 14 Aug 2015 12:38:21 +0200 diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 67d2104..4307446 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -238,6 +238,13 @@ static void __netlink_deliver_tap(struct sk_buff *skb) static void netlink_deliver_tap(struct sk_buff *skb) { + /* Netlink mmaped skbs must not access shared info, and thus +* are not allowed to be cloned. For now, just don't allow +* them to get inspected by taps. +*/ + if (netlink_skb_is_mmaped(skb)) + return; + I would seriously rather see us do an expensive full copy of the SKB than to have traffic which is unexpectedly invisible to taps. Do you mean generically as we do in TX path, or only in this particular scenario? Thanks, Daniel -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Clarification on rtnetlink requests
I am a bit confused with respect to the structure of rtnetlink requests.
It seems that in some circumstances a request can look like:

    struct request {
            struct nlmsghdr header;
            struct rtgenmsg body;
    };

and in other cases it can look like:

    struct request {
            struct nlmsghdr header;
            struct ifinfomsg body;
    };

How do I know which one to use when sending RTM_GETLINK and RTM_GETADDR
requests?

Furthermore, it also seems that 'struct rtattr' can be specified at the
end of the request as well. Is there any documentation that describes
this?
Re: [PATCH v2] rtlwifi: rtl8192cu: Add new device ID
On 08/19/2015 10:33 AM, Adrien Schildknecht wrote: The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043 Signed-off-by: Adrien Schildknecht adrien+...@schischi.me Cc: Stable sta...@vger.kernel.org --- drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 + 1 file changed, 1 insertion(+) Acked-by: Larry Finger larry.fin...@lwfinger.net Thanks, Larry diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c index 23806c2..fd4a535 100644 --- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c +++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c @@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = { {RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/ {RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/ {RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/ + {RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NG WNA1000Mv2*/ {RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/ {RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/ {RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 net-next 01/13] ip_tunnels: remove custom alignment and packing
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> The custom alignment of struct ip_tunnel_key is unnecessary. In struct
> sw_flow_key, it starts at offset 256, in struct ip_tunnel_info it's the
> first field. The structure is also packed even without the __packed
> keyword.
>
> Signed-off-by: Jiri Benc jb...@redhat.com

I came to the same conclusion but didn't want to change it in the
original series.

Acked-by: Thomas Graf tg...@suug.ch
Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment
On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote: This allows to release the backpressure on the socket only when the last segment is released. Now the truesize looks like this: if the truesize of the original skb is 65420, all the segments will have a truesize of 704 (skb itself) and the last one will have 65420. Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com --- drivers/net/wireless/iwlwifi/mvm/tx.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c index 5046833..046e50d 100644 --- a/drivers/net/wireless/iwlwifi/mvm/tx.c +++ b/drivers/net/wireless/iwlwifi/mvm/tx.c @@ -764,7 +764,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso, bool ipv6 = skb_shinfo(skb_gso)-gso_type SKB_GSO_TCPV6; struct iwl_lso_splitter s = {}; struct page *hdr_page; - unsigned int mpdu_sz; + unsigned int mpdu_sz, sum_truesize = 0; u8 *hdr_page_pos, *qc, tid; int i, ret; @@ -898,6 +898,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso, mpdu_sz, tcp_hdrlen(skb_gso)); __skb_queue_tail(mpdus_skb, skb_gso); + sum_truesize += skb_gso-truesize; /* mss bytes have been consumed from the data */ s.gso_payload_pos = s.mss; @@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso, } __skb_queue_tail(mpdus_skb, skb); + sum_truesize += skb-truesize; + } + + /* Release the backpressure on the socket only when + * the last segment is released. + */ + if (skb_gso-destructor == sock_wfree) { + struct sk_buff *tail = mpdus_skb-prev; + + swap(tail-truesize, skb_gso-truesize); + swap(tail-destructor, skb_gso-destructor); + swap(tail-sk, skb_gso-sk); +atomic_add(sum_truesize - skb_gso-truesize, + skb_gso-sk-sk_wmem_alloc); } ret = 0; Using existing net/core/tso.c helpers would avoid using this. 
(BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
yet we want backpressure mostly for TCP stack (TCP Small Queues))
[PATCH net-next] vrf: plug skb leaks
From: Nikolay Aleksandrov niko...@cumulusnetworks.com Currently whenever a packet different from ETH_P_IP is sent through the VRF device it is leaked so plug the leaks and properly drop these packets. Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com --- drivers/net/vrf.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index ed208317cbb5..4aa06450fafa 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -97,6 +97,12 @@ static bool is_ip_rx_frame(struct sk_buff *skb) return false; } +static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb) +{ + vrf_dev-stats.tx_errors++; + kfree_skb(skb); +} + /* note: already called with rcu_read_lock */ static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb) { @@ -149,7 +155,8 @@ static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev, static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb, struct net_device *dev) { - return 0; + vrf_tx_error(dev, skb); + return NET_XMIT_DROP; } static int vrf_send_v4_prep(struct sk_buff *skb, struct flowi4 *fl4, @@ -206,8 +213,7 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb, out: return ret; err: - vrf_dev-stats.tx_errors++; - kfree_skb(skb); + vrf_tx_error(vrf_dev, skb); goto out; } @@ -219,6 +225,7 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev) case htons(ETH_P_IPV6): return vrf_process_v6_outbound(skb, dev); default: + vrf_tx_error(dev, skb); return NET_XMIT_DROP; } } -- 2.4.3 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: inet_hashtables.c: warning: division by zero
On Wed, 2015-08-19 at 18:24 +0300, Meelis Roos wrote:

> Noticed this while compiling 4.2-rc7+git on i386 with gcc 4.9.2:
>
>   CC      net/ipv4/inet_hashtables.o
> In file included from include/linux/list.h:8:0,
>                  from include/linux/module.h:9,
>                  from net/ipv4/inet_hashtables.c:16:
> net/ipv4/inet_hashtables.c: In function ‘inet_ehash_locks_alloc’:
> net/ipv4/inet_hashtables.c:632:24: warning: division by zero [-Wdiv-by-zero]
>        2 * L1_CACHE_BYTES / sizeof(spinlock_t),
>                         ^
> include/linux/kernel.h:769:17: note: in definition of macro ‘max_t’
>   type __max1 = (x);   \
>                  ^

This warning was fixed:

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=89e478a2aa58af2548b7f316e4d5b6bcc9eade5b
Re: [RFC v2 1/3] iwlwifi: mvm: add real TSO implementation
On Wed, 2015-08-19 at 07:17 -0700, Eric Dumazet wrote: On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote: The segmentation is done completely in software. The driver creates several MPDUs out of a single large send. Each MPDU is a newly allocated SKB. A page is allocated to create the headers that need to be duplicated (SNAP / IP / TCP). The WiFi header is in the header of the newly created SKBs. type=feature Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com --- drivers/net/wireless/iwlwifi/mvm/tx.c | 513 +++--- 1 file changed, 481 insertions(+), 32 deletions(-) Ouch dynamic allocations while doing xmit are certainly not needed. Your driver should pre-allocated space for headers. Drivers willing to implement tso have to use net/core/tso.c provided helpers. $ git grep -n tso_build_hdr drivers/net/ethernet/cavium/thunder/nicvf_queues.c:1030: tso_build_hdr(skb, hdr, tso, data_left, total_len == 0); drivers/net/ethernet/freescale/fec_main.c:729: tso_build_hdr(skb, hdr, tso, data_left, total_len == 0); drivers/net/ethernet/marvell/mv643xx_eth.c:842: tso_build_hdr(skb, hdr, tso, data_left, total_len == 0); drivers/net/ethernet/marvell/mvneta.c:1650: tso_build_hdr(skb, hdr, tso, data_left, total_len == 0); include/net/tso.h:15:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso, net/core/tso.c:14:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso, net/core/tso.c:37:EXPORT_SYMBOL(tso_build_hdr); Look at commit 2adb719d74f6e174071e5c913290b9bbd8c2c0e8 for a typical use of these helpers. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 net-next 02/13] ip_tunnels: use u8/u16/u32
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> The ip_tunnels.h include file uses mixture of __u16 and u16 (etc.) types.
> Unify it to the non-underscore variants.
>
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch
Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters
On Aug 19 15:07, Corinna Vinschen wrote:
> On Aug 19 09:31, Hayes Wang wrote:
>> Corinna Vinschen [mailto:vinsc...@redhat.com]
>> Sent: Wednesday, August 19, 2015 5:13 PM
>> [...]
>>>> It could be cleared by setting bit 0, such as rtl_tally_reset() of
>>>> r8152.
>>>
>>> Is it safe to assume that this is implemented in all NICs covered by
>>> r8169?
>>
>> It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later.
>
> Thanks. In that case I would prefer the same generic method for all
> chip versions, so I'd opt for storing the offset values at rtl_open
> time as my patch is doing right now. Is that acceptable?
>
> If so, wouldn't it make even more sense to use the hardware collected
> information in @get_stats64 throughout, except for the numbers
> collected *only* in software?
>
> I would be willing to propose a matching patch.

It just occurred to me that the combination of resetting the counters on
post-RTL_GIGA_MAC_VER_19 chips plus offset handling would be quite nice,
because it would reset also the small 16 and 32 bit counters.

So I'd like to propose a patch which combines both techniques, if that's
an acceptable way to go forward.

Btw., does setting the reset bit in CounterAddrLow work the same way as
setting the CounterDump flag? I.e., does the driver have to wait for the
hardware to set the bit to 0 again to be sure the reset is finished?

Thanks in advance,
Corinna
Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
Hi Eric,

First, thank you a lot for your comments.

On 08/19/2015 05:14 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
>
>> We could have enabled A-MSDU based on xmit-more, but the rationale of
>> using LSO is that when using pfifo-fast, the Qdisc gets one packet and
>> dequeues it straight away which limits the possibility to get a lot of
>> packets at once. (Am I right here?).
>
> No, you are not ;)
>
> Key point for xmit_more is BQL being implemented in your driver.
>
> Relevant code is in try_bulk_dequeue_skb()

I'll look at it. I was almost starting to implement that but then I
thought of another (good?) reason to use LSO. LSO gives me the
guarantee that the packet is directed to one peer, which might not be
the case with xmit_more since we have one Qdisc for several clients in
case we are in AP mode. Building an A-MSDU for several clients is not
possible, at least not for several clients in the L2 (different MAC
addresses). LSO avoids this problem completely.
Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
On Wed, 2015-08-19 at 15:07 +0000, Grumbach, Emmanuel wrote:

> I'll look at it. I was almost starting to implement that but then I
> thought with another (good?) reason to use LSO. LSO gives me the
> guarantee that the packet is directed to one peer, which might not be
> the case with xmit_more since we have one Qdisc for several clients in
> case we are in AP mode. Building an A-MSDU for several clients is not
> possible, at least not for several client in the L2 (different MAC
> addresses). LSO avoids this problem completely.

Then, simply calling skb_gso_segment() from the driver might be enough,
and less work for you. This would even support TSO on IPv6

	segs = skb_gso_segment(skb, tp->dev->features &
				    ~(NETIF_F_TSO | NETIF_F_TSO6));
Re: [PATCH] rtlwifi: rtl8192cu: Add new device ID
> Has this ID been tested with the Netgear device?

Yes, I have been using the device and the patch for 2 days.

--
Adrien Schildknecht
[PATCH v2] rtlwifi: rtl8192cu: Add new device ID
The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043 Signed-off-by: Adrien Schildknecht adrien+...@schischi.me Cc: Stable sta...@vger.kernel.org --- drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c index 23806c2..fd4a535 100644 --- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c +++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c @@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = { {RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/ {RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/ {RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/ + {RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NG WNA1000Mv2*/ {RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/ {RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/ {RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/ -- 2.5.0 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with fragmented packets on tun/tap interface
On Wed, 2015-08-19 at 15:44 +0530, Prashant Upadhyaya wrote:
> Hi Eric,
>
> For some reason, the dropping in the raw table does not work for me
> for the usecase, though I recognize that the raw table operations
> theory, when matched with my usecase theory, is the apparent solution.
> I think the reason is that I use packet sockets with defrag option on
> so that it can select the right queue for load balancing purposes.
>
> Anyway, not disappointed with the above, I stuck to my theory and
> tried a simple approach. To tie-break the reassembly/defrag done by
> the kernel from the packets from the eth0 and the packets submitted
> from tap (via application), I made a small change in the application.
> I detected that the packets are fragmented in the app, and bumped up
> the 'Identification' field in the IP header and re-checksummed the IP
> header and then submitted it to tap. Since reassembly/defrag is done
> on the basis of srcip, destip, protocol and Identification field tuple
> from IP header, I expected it to work and it does!
>
> So there we are, I have a nice little solution in place which suits me.

Another idea would be to put your tap device and ethernet device in
different namespaces, as the defrag unit is namespace aware.

Looks like eth0 could be put in a completely new namespace as it holds
no IP address?

ip netns add eth0ns
ip link set eth0 netns eth0ns
inet_hashtables.c: warning: division by zero
Noticed this while compiling 4.2-rc7+git on i386 with gcc 4.9.2:

  CC      net/ipv4/inet_hashtables.o
In file included from include/linux/list.h:8:0,
                 from include/linux/module.h:9,
                 from net/ipv4/inet_hashtables.c:16:
net/ipv4/inet_hashtables.c: In function ‘inet_ehash_locks_alloc’:
net/ipv4/inet_hashtables.c:632:24: warning: division by zero [-Wdiv-by-zero]
       2 * L1_CACHE_BYTES / sizeof(spinlock_t),
                        ^
include/linux/kernel.h:769:17: note: in definition of macro ‘max_t’
  type __max1 = (x);   \
                 ^

--
Meelis Roos (mr...@linux.ee)
Re: [PATCH] rtlwifi: rtl8192cu: Add new device ID
On 08/19/2015 08:51 AM, Adrien Schildknecht wrote: The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043 Signed-off-by: Adrien Schildknecht adrien+...@schischi.me Add a Cc: Stable sta...@vger.kernel.org line here. That way the new ID will be available with older kernels. The new line exceeds 80 characters. You might abbreviate Netgear as NG. When you resubmit, do so as [PATCH V2]. Thanks, Larry --- drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c index 23806c2..8b4238a 100644 --- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c +++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c @@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = { {RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/ {RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/ {RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/ + {RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NetGear WNA1000Mv2*/ {RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/ {RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CCC*/ {RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 4/4] vrf: ndo_add|del_slave drop unnecessary checks
From: Nikolay Aleksandrov niko...@cumulusnetworks.com When ndo_add|del_slave ops are used, they're taken from the respective master device's netdev ops, so if the master device is a VRF only then the VRF ops will get called thus no need to check the type of the master. Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com --- drivers/net/vrf.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 4825c65c62fd..dbeffe789185 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -393,8 +393,7 @@ out_fail: static int vrf_add_slave(struct net_device *dev, struct net_device *port_dev) { - if (!netif_is_vrf(dev) || netif_is_vrf(port_dev) || - vrf_is_slave(port_dev)) + if (netif_is_vrf(port_dev) || vrf_is_slave(port_dev)) return -EINVAL; return do_vrf_add_slave(dev, port_dev); @@ -431,9 +430,6 @@ static int do_vrf_del_slave(struct net_device *dev, struct net_device *port_dev) static int vrf_del_slave(struct net_device *dev, struct net_device *port_dev) { - if (!netif_is_vrf(dev)) - return -EINVAL; - return do_vrf_del_slave(dev, port_dev); } -- 2.4.3 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 1/4] vrf: don't panic on cache create failure
From: Nikolay Aleksandrov niko...@cumulusnetworks.com It's pointless to panic on cache create failure when that case is handled and even more so since it's not a kernel-wide fatal problem so don't panic. Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com --- drivers/net/vrf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 4aa06450fafa..01dc91562a88 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -649,7 +649,7 @@ static int __init vrf_init_module(void) vrf_dst_ops.kmem_cachep = kmem_cache_create(vrf_ip_dst_cache, sizeof(struct rtable), 0, - SLAB_HWCACHE_ALIGN | SLAB_PANIC, + SLAB_HWCACHE_ALIGN, NULL); if (!vrf_dst_ops.kmem_cachep) -- 2.4.3 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 3/4] vrf: move vrf_insert_slave so we can drop a goto label
From: Nikolay Aleksandrov niko...@cumulusnetworks.com We can simplify do_vrf_add_slave by moving vrf_insert_slave in the end of the enslaving and thus eliminate an error goto label. It always succeeds and isn't needed before that anyway. Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com --- drivers/net/vrf.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 9907550ff640..4825c65c62fd 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -363,15 +363,13 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev) vrf_ptr-ifindex = dev-ifindex; vrf_ptr-tb_id = vrf-tb_id; - __vrf_insert_slave(queue, slave); - /* register the packet handler for slave ports */ ret = netdev_rx_handler_register(port_dev, vrf_handle_frame, dev); if (ret) { netdev_err(port_dev, Device %s failed to register rx_handler\n, port_dev-name); - goto out_remove; + goto out_fail; } ret = netdev_master_upper_dev_link(port_dev, dev); @@ -379,7 +377,7 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev) goto out_unregister; port_dev-flags |= IFF_SLAVE; - + __vrf_insert_slave(queue, slave); rcu_assign_pointer(port_dev-vrf_ptr, vrf_ptr); cycle_netdev(port_dev); @@ -387,8 +385,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev) out_unregister: netdev_rx_handler_unregister(port_dev); -out_remove: - __vrf_remove_slave(queue, slave); out_fail: kfree(vrf_ptr); kfree(slave); -- 2.4.3 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 0/4] vrf: cleanups part 2
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

Hi,
This is the next part of vrf cleanups, patch 1 drops the SLAB_PANIC when
creating kmem cache since it's handled, patch 2 removes a slave
duplicate check which is already done by the lower/upper code, patch 3
moves the ndo_add_slave code around a bit so we can drop an error label
and patch 4 drops the master device checks which are unnecessary because
the ops are taken from the master device itself so it can't be different.

Cheers,
 Nik

Nikolay Aleksandrov (4):
  vrf: don't panic on cache create failure
  vrf: remove unnecessary duplicate check
  vrf: move vrf_insert_slave so we can drop a goto label
  vrf: ndo_add|del_slave drop unnecessary checks

 drivers/net/vrf.c | 24
 1 file changed, 4 insertions(+), 20 deletions(-)

--
2.4.3
[PATCH net-next 2/4] vrf: remove unnecessary duplicate check
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

The upper/lower functions already check for duplicate slaves so there is
no need to do it again.

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
 drivers/net/vrf.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 01dc91562a88..9907550ff640 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -352,7 +352,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 {
 	struct net_vrf_dev *vrf_ptr = kmalloc(sizeof(*vrf_ptr), GFP_KERNEL);
 	struct slave *slave = kzalloc(sizeof(*slave), GFP_KERNEL);
-	struct slave *duplicate_slave;
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct slave_queue *queue = &vrf->queue;
 	int ret = -ENOMEM;
@@ -361,16 +360,9 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 		goto out_fail;
 
 	slave->dev = port_dev;
-
 	vrf_ptr->ifindex = dev->ifindex;
 	vrf_ptr->tb_id = vrf->tb_id;
 
-	duplicate_slave = __vrf_find_slave_dev(queue, port_dev);
-	if (duplicate_slave) {
-		ret = -EBUSY;
-		goto out_fail;
-	}
-
 	__vrf_insert_slave(queue, slave);
 
 	/* register the packet handler for slave ports */
-- 
2.4.3
Re: [PATCH v2 net-next 03/13] ip_tunnels: use offsetofend
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch
Re: [PATCH] veth: replace iflink by a dedicated symlink in sysfs
On 19/08/2015 14:48, Vincent Bernat wrote:
❦ 19 August 2015 14:38 +0200, Jiri Benc jb...@redhat.com:

That's the main goal of this patch: advertising the peer link as the IFLA_LINK attribute triggers an infinite loop in userland software when it follows iflink to discover the network device topology. iflink has always been the index of a lower device. If a sysfs symbolic link is not good enough, I can propose a new IFLA_PEER attribute instead.

This would cause a regression and break applications for those of us who started relying on the netnsid feature to match interfaces across net name spaces.

Yes. Unfortunately.

This is tough. If you're going to do such a thing, you would at least need to also introduce IFLA_PEER_NETNSID.

Probably better to introduce a veth netlink attribute then, something like IFLA_VETH_PEER, and keep IFLA_LINK_NETNSID.

Yes I can. In my opinion, the change of semantics of IFLA_LINK is a break of API. However, I can live with it since it's easy to work around it. It just seemed easier to start the discussion with a patch.

I also don't know what is the best way to handle this. veth has advertised its peer via IFLA_LINK since 4.1, so it's too late to change it for this release.
Re: [PATCH v2 net-next 08/13] ipv6: ndisc: inherit metadata dst when creating ndisc requests
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> If output device wants to see the dst, inherit the dst of the original skb
> in the ndisc request.
>
> This is an IPv6 counterpart of commit 0accfc268f4d (arp: Inherit metadata
> dst when creating ARP requests).
>
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch
Re: [PATCH net] sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state
On Wed, Aug 19, 2015 at 12:38:03PM +0800, Xin Long wrote:
> commit f8d960524 fix the 0 peer.rwnd issue in SHUTDOWN_PENING state through
> not reseting the overall_error_count when recevie a heartbeat, but the same
> issue also exists in SHUTDOWN_RECEIVE state.

Please fix the typos in the changelog, especially in symbol names, so that
searching for them later is more successful.

Also, to bring the changelog closer to the actual change, it would be good
to explain why it's okay to include the other states too, as you're
including not only SHUTDOWN_RECEIVE but also SHUTDOWN_SENT and
SHUTDOWN_ACK_SENT.

> Fixes: f8d960524 (sctp: Enforce retransmission limit during shutdown)
> Signed-off-by: Xin Long lucien@gmail.com
> ---
>  net/sctp/sm_sideeffect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index fef2acd..85e6f03 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -702,7 +702,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>  	 * outstanding data and rely on the retransmission limit be reached
>  	 * to shutdown the association.
>  	 */
> -	if (t->asoc->state != SCTP_STATE_SHUTDOWN_PENDING)
> +	if (t->asoc->state < SCTP_STATE_SHUTDOWN_PENDING)
>  		t->asoc->overall_error_count = 0;
>
>  	/* Clear the hb_sent flag to signal that we had a good
> --
> 2.1.0
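The review point about the extra states can be seen in a small standalone sketch. The enum below is a local mirror written for illustration; its ordering is an assumption about how the SCTP association states are numbered, and the function names are hypothetical. It compares which states reset the error count under the old `!=` test and under an ordered `<` test.

```c
#include <stdbool.h>

/* Local mirror of the association states; the ordering is an assumption. */
enum assoc_state {
	STATE_CLOSED,
	STATE_COOKIE_WAIT,
	STATE_COOKIE_ECHOED,
	STATE_ESTABLISHED,
	STATE_SHUTDOWN_PENDING,
	STATE_SHUTDOWN_SENT,
	STATE_SHUTDOWN_RECEIVED,
	STATE_SHUTDOWN_ACK_SENT,
};

/* Old test: only SHUTDOWN_PENDING keeps its error count. */
static bool resets_count_old(enum assoc_state s)
{
	return s != STATE_SHUTDOWN_PENDING;
}

/* Ordered test: every state from SHUTDOWN_PENDING onwards keeps it. */
static bool resets_count_new(enum assoc_state s)
{
	return s < STATE_SHUTDOWN_PENDING;
}
```

Under this assumed ordering the ordered comparison covers SHUTDOWN_SENT, SHUTDOWN_RECEIVED and SHUTDOWN_ACK_SENT as well, which is why the changelog should say so.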
Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross je...@nicira.com wrote:
> On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar pshe...@nicira.com wrote:
> > diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> > index e58468b..18ff83b 100644
> > --- a/drivers/net/Kconfig
> > +++ b/drivers/net/Kconfig
> > @@ -181,7 +181,7 @@ config VXLAN
> >  config GENEVE
> >  	tristate "Generic Network Virtualization Encapsulation netdev"
> > -	depends on INET && GENEVE_CORE
> > +	depends on INET
> >  	select NET_IP_TUNNEL
>
> I think my comments on v1 of this patch were overlooked (about the
> UDP_TUNNEL dependency and the name).

right, I missed it.

> > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> > index 5b43382..eb298ff 100644
> > --- a/drivers/net/geneve.c
> > +++ b/drivers/net/geneve.c
> > +static void geneve_build_header(struct genevehdr *geneveh,
> > +				__be16 tun_flags, u8 vni[3],
> > +				u8 options_len, u8 *options)
> [...]
> > +static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
> > +			    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
> > +			    bool csum)
>
> It seems like we could just merge these functions. I'm not sure that
> the role is all that different.

ok.

> In geneve_build_skb(), the error labels are somewhat confusing (for
> example, free_rt doesn't free the rt). Also, is it right that we don't
> free the rt if udp_tunnel_handle_offloads() fails? It might be cleaner
> if the caller retains ownership of rt.

ok.

> My guess is that if the issue from the earlier patch about overlapping
> collect_md tunnels is fixed then that might allow us to simplify
> things a little further, since for those tunnels we can assume there
> is a 1:1 mapping between collect_md tunnels and sockets.

I don't see how it would be different. Can you elaborate on this?
Re: [PATCH v2 net-next 07/13] ipv6: drop metadata dst in ip6_route_input
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> The fix in commit 48fb6b554501 is incomplete, as now ip6_route_input can be
> called with non-NULL dst if it's a metadata dst and the reference is leaked.
>
> Drop the reference.
>
> Fixes: 48fb6b554501 (ipv6: fix crash over flow-based vxlan device)
> Fixes: ee122c79d422 (vxlan: Flow based tunneling)
> CC: Wei-Chun Chao weich...@plumgrid.com
> CC: Thomas Graf tg...@suug.ch
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch
Re: [PATCH v2 net-next 09/13] vxlan: provide access function for vxlan socket address family
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch
[PATCH net-next] bridge: fix netlink max attr size
From: Scott Feldman sfel...@gmail.com

.maxtype should match .policy. Probably just been getting lucky here
because IFLA_BRPORT_MAX > IFLA_BR_MAX.

Fixes: 13323516 (bridge: implement rtnl_link_ops->changelink)
Signed-off-by: Scott Feldman sfel...@gmail.com
---
 net/bridge/br_netlink.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 01401ea..d2c4d66 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -849,7 +849,7 @@ struct rtnl_link_ops br_link_ops __read_mostly = {
 	.kind			= "bridge",
 	.priv_size		= sizeof(struct net_bridge),
 	.setup			= br_dev_setup,
-	.maxtype		= IFLA_BRPORT_MAX,
+	.maxtype		= IFLA_BR_MAX,
 	.policy			= br_policy,
 	.validate		= br_validate,
 	.newlink		= br_dev_newlink,
-- 
1.7.10.4
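Why .maxtype has to match .policy can be sketched in plain C. This is a simplified model, not the rtnetlink implementation: the enums, struct link_ops, struct attr_policy and the helper names are all hypothetical. It assumes a parser that may index the policy table for any attribute type up to maxtype.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute numbering for two unrelated attribute sets. */
enum { BR_UNSPEC, BR_FORWARD_DELAY, BR_HELLO_TIME, BR_MAX_AGE, BR_END };
#define BR_MAX (BR_END - 1)

enum { BRPORT_UNSPEC, BRPORT_STATE, BRPORT_PRIORITY, BRPORT_COST,
       BRPORT_MODE, BRPORT_GUARD, BRPORT_END };
#define BRPORT_MAX (BRPORT_END - 1)

struct attr_policy { size_t payload_len; };

struct link_ops {
	int maxtype;				/* highest attribute type parsed */
	const struct attr_policy *policy;	/* indexed by attribute type */
	size_t policy_len;			/* number of entries in policy */
};

static const struct attr_policy br_policy[BR_MAX + 1] = {
	[BR_FORWARD_DELAY] = { 4 },
	[BR_HELLO_TIME]    = { 4 },
	[BR_MAX_AGE]       = { 4 },
};

/* The table must have exactly one entry per type in 0..maxtype. */
static bool ops_consistent(const struct link_ops *ops)
{
	return ops->policy_len == (size_t)ops->maxtype + 1;
}

/* True when looking up policy[type] would stay inside the table. */
static bool lookup_in_bounds(const struct link_ops *ops, int type)
{
	return type >= 0 && type <= ops->maxtype &&
	       (size_t)type < ops->policy_len;
}
```

With maxtype taken from the larger port attribute set, a type above BR_MAX passes the maxtype check but would read past the end of br_policy in this model.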
Re: [PATCH v2 net-next 06/13] route: move lwtunnel state to dst_entry
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Currently, the lwtunnel state resides in per-protocol data. This is
> a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa).
> The xmit function of the tunnel does not know whether the packet has been
> routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the
> lwtstate data to dst_entry makes such inter-protocol tunneling possible.
>
> As a bonus, this brings a nice diffstat.
>
> Signed-off-by: Jiri Benc jb...@redhat.com
> Acked-by: Roopa Prabhu ro...@cumulusnetworks.com

Acked-by: Thomas Graf tg...@suug.ch
[PATCH 0/2] Small cleanups for smsc and device property
These patches are against net-next.

This patch set adds a length check to device_get_mac_addr() before
calling is_valid_ether_addr(); it also removes an unnecessary dev==NULL
check. The remainder is updates to the comments.

Jeremy Linton (2):
  device property: Add ETH_ALEN check, update comments.
  smsc911x: Remove dev==NULL check.

 drivers/base/property.c              | 21 +++++++++++++--------
 drivers/net/ethernet/smsc/smsc911x.c |  3 ---
 2 files changed, 13 insertions(+), 11 deletions(-)

-- 
2.4.3
[PATCH 1/2] device property: Add ETH_ALEN check, update comments.
This patch adds the MAC address length check back into the
device_get_mac_addr() function before calling is_valid_ether_addr(),
similar to the way the OF routine does it.

Update the comments for the two new functions.

Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
---
 drivers/base/property.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 2e8cd14..4c20828 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -537,7 +537,7 @@ bool device_dma_is_coherent(struct device *dev)
 EXPORT_SYMBOL_GPL(device_dma_is_coherent);
 
 /**
- * device_get_phy_mode - Get phy mode for given device_node
+ * device_get_phy_mode - Get phy mode for given device
  * @dev:	Pointer to the given device
  *
  * The function gets phy interface string from property 'phy-mode' or
@@ -570,13 +570,18 @@ static void *device_get_mac_addr(struct device *dev,
 {
 	int ret = device_property_read_u8_array(dev, name, addr, alen);
 
-	if (ret == 0 && is_valid_ether_addr(addr))
+	if (ret == 0 && alen == ETH_ALEN && is_valid_ether_addr(addr))
 		return addr;
 	return NULL;
 }
 
 /**
- * Search the device tree for the best MAC address to use.  'mac-address' is
+ * device_get_mac_address - Get the MAC for a given device
+ * @dev:	Pointer to the device
+ * @addr:	Address of buffer to store the MAC in
+ * @alen:	Length of the buffer pointed to by addr, should be ETH_ALEN
+ *
+ * Search the firmware node for the best MAC address to use.  'mac-address' is
  * checked first, because that is supposed to contain to most recent MAC
  * address. If that isn't set, then 'local-mac-address' is checked next,
  * because that is the default address.  If that isn't set, then the obsolete
@@ -587,11 +592,11 @@ static void *device_get_mac_addr(struct device *dev,
  * MAC address.
  *
  * All-zero MAC addresses are rejected, because those could be properties that
- * exist in the device tree, but were not set by U-Boot.  For example, the
- * DTS could define 'mac-address' and 'local-mac-address', with zero MAC
- * addresses.  Some older U-Boots only initialized 'local-mac-address'.  In
- * this case, the real MAC is in 'local-mac-address', and 'mac-address' exists
- * but is all zeros.
+ * exist in the firmware tables, but were not updated by the firmware.  For
+ * example, the DTS could define 'mac-address' and 'local-mac-address', with
+ * zero MAC addresses.  Some older U-Boots only initialized 'local-mac-address'.
+ * In this case, the real MAC is in 'local-mac-address', and 'mac-address'
+ * exists but is all zeros.
  */
 void *device_get_mac_address(struct device *dev, char *addr, int alen)
 {
-- 
2.4.3
[PATCH 2/2] smsc911x: Remove dev==NULL check.
The dev==NULL check in smsc911x_probe_config is useless and isn't
providing any additional protection. If a fwnode doesn't exist then an
appropriate error should be returned by device_get_phy_mode(), covering
the original case of a missing of/fwnode.

Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
---
 drivers/net/ethernet/smsc/smsc911x.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 34f9768..6eef325 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2370,9 +2370,6 @@ static int smsc911x_probe_config(struct smsc911x_platform_config *config,
 	int phy_interface;
 	u32 width = 0;
 
-	if (!dev)
-		return -ENODEV;
-
 	phy_interface = device_get_phy_mode(dev);
 	if (phy_interface < 0)
 		return phy_interface;
-- 
2.4.3
Re: [PATCH v2 net-next 05/13] ip_tunnels: use tos and ttl fields also for IPv6
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll
> be used with IPv6 tunnels, too.
>
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch
Re: [PATCH net] sctp: start t5 timer only when peer.rwnd == 0 and in SHUTDOWN_PENDING
On Wed, Aug 19, 2015 at 12:39:06PM +0800, Xin Long wrote:
> when A send a data to B, A close() to be in SHUTDOWN_PENDING state, but B
> neither claim his rwnd is 0 nor SACK this data, then A keep retransmiting
> this data. it should send abord after Max.Retrans times, only when
> peer.rwnd == 0 and more than Max.Retrans times, it will start t5 timer.
>
> Fixes: f8d960524 (sctp: Enforce retransmission limit during shutdown)
> Signed-off-by: Xin Long lucien@gmail.com
> ---

The changelog is confusing, please reword it, especially the last part.

>  net/sctp/sm_statefuns.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 3ee27b7..7d9380c 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net *net,
>  	SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS);
>
>  	if (asoc->overall_error_count >= asoc->max_retrans) {
> -		if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> +		if (!q->asoc->peer.rwnd &&
> +			asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
		    ^
Indentation issue here. The 2nd if line should start where I marked.
Other than that, looks good to me.

>  			/*
>  			 * We are here likely because the receiver had its rwnd
>  			 * closed for a while and we have not been able to
> --
> 2.1.0
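The decision this patch is aiming for can be written as a small pure function. It is a simplified sketch, not the state machine code: the enum, the function name and the parameters are hypothetical, and it only models what happens when the T3 retransmission timer expires.

```c
#include <stdbool.h>

enum rtx_action {
	RTX_RETRANSMIT,		/* below the limit: keep retransmitting */
	RTX_START_T5,		/* wait under the shutdown guard timer */
	RTX_ABORT,		/* give up on the association */
};

/*
 * Once the error count reaches the limit, only an association that is in
 * SHUTDOWN_PENDING *and* whose peer advertises a zero window gets the T5
 * guard timer; every other case aborts.
 */
static enum rtx_action on_t3_rtx_expired(unsigned int error_count,
					 unsigned int max_retrans,
					 bool shutdown_pending,
					 unsigned int peer_rwnd)
{
	if (error_count < max_retrans)
		return RTX_RETRANSMIT;
	if (shutdown_pending && peer_rwnd == 0)
		return RTX_START_T5;
	return RTX_ABORT;
}
```

The case from the changelog, a peer in SHUTDOWN_PENDING that neither SACKs nor closes its window, now falls through to the abort branch.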
Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
On 08/19/2015 07:08 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 15:07 +0000, Grumbach, Emmanuel wrote:
>
> > I'll look at it. I was almost starting to implement that but then I
> > thought with another (good?) reason to use LSO. LSO gives me the
> > guarantee that the packet is directed to one peer, which might not be
> > the case with xmit_more since we have one Qdisc for several clients in
> > case we are in AP mode. Building an A-MSDU for several clients is not
> > possible, at least not for several client in the L2 (different MAC
> > addresses). LSO avoids this problem completely.
>
> Then, simply calling skb_gso_segment() from the driver might be enough,
> and less work for you. This would even support TSO on IPv6

Well... I did take care of IPv6.

> segs = skb_gso_segment(skb, tp->dev->features &
>                             ~(NETIF_F_TSO | NETIF_F_TSO6));

Thing is that our HW layers are currently implemented to receive one skb
per 802.11 packet. So if I call skb_gso_segment, I'd have to re-assemble
the segs into one A-MSDU, which would translate into one skb. I guess I
could change the HW layer in the driver to be able to get a list of skbs
and make a single packet out of it, but that'd be tricky or wasteful.
skb_gso_segment will duplicate the wifi header while it is not needed.
Only the TCP / IP / SNAP headers need to be duplicated. Moreover, each
subframe in the A-MSDU needs its own subframe header (same format as
ethhdr) and there is also some padding in there. So that would be even
more complicated IMHO. My code doesn't copy any payload. Only the
headers. This is why I thought it'd be better than segmenting and then
re-assembling.

I did call skb_gso_segment if I get lots of payload in the header (more
than 2 * mss) in order to simplify the implementation.
Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
On Wed, 2015-08-19 at 17:00 +0000, Grumbach, Emmanuel wrote:
> On 08/19/2015 07:08 PM, Eric Dumazet wrote:
> > On Wed, 2015-08-19 at 15:07 +0000, Grumbach, Emmanuel wrote:
> >
> > > I'll look at it. I was almost starting to implement that but then I
> > > thought with another (good?) reason to use LSO. LSO gives me the
> > > guarantee that the packet is directed to one peer, which might not be
> > > the case with xmit_more since we have one Qdisc for several clients in
> > > case we are in AP mode. Building an A-MSDU for several clients is not
> > > possible, at least not for several client in the L2 (different MAC
> > > addresses). LSO avoids this problem completely.
> >
> > Then, simply calling skb_gso_segment() from the driver might be enough,
> > and less work for you. This would even support TSO on IPv6
>
> Well... I did take care of IPv6.

net/core/tso.c does not yet handle IPv6
Re: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.
On Wed, Aug 19, 2015 at 2:34 AM, Premkumar Jonnala pjonn...@broadcom.com wrote:
> Hello Scott,
>
> Thank you for the diff and comments. Please see my comments inline.
>
> > -----Original Message-----
> > From: Scott Feldman [mailto:sfel...@gmail.com]
> > Sent: Tuesday, August 18, 2015 12:48 PM
> > To: Premkumar Jonnala
> > Cc: netdev@vger.kernel.org
> > Subject: Re: [PATCH] bridge: Enable configuration of ageing interval for
> > bridges and switch devices.
> >
> > On Fri, 14 Aug 2015, Premkumar Jonnala wrote:
> >
> > > Bridge devices have ageing interval used to age out MAC addresses
> > > from FDB. This ageing interval was not configurable.
> > >
> > > Enable netlink based configuration of ageing interval for bridges and
> > > switch devices. The ageing interval changes the timer used to purge
> > > inactive FDB entries in bridges. The ageing interval config is
> > > propagated to switch devices, so that platform or hardware based
> > > ageing works according to configuration.
> > >
> > > Signed-off-by: Premkumar Jonnala pjonn...@broadcom.com
> >
> > Hi Premkumar,
> >
> > I agree with Roopa that we should use existing IFLA_BR_AGEING_TIME.
>
> What is the motivation for using 'ip link' command to configure bridge
> attributes? IMHO, bridge command is better suited for that.

Can you extend bridge command to allow setting/getting these bridge
attrs? Looks like you construct a RTM_NEWLINK IFLA_INFO_DATA msg. No
changes needed to the kernel.

bridge link set dev br0 ageing_time 1000

--or--

ip link set dev br0 type bridge ageing_time 1000

> > diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> > index 0f2408f..01401ea 100644
> > --- a/net/bridge/br_netlink.c
> > +++ b/net/bridge/br_netlink.c
> > @@ -759,9 +759,9 @@ static int br_changelink(struct net_device *brdev,
> > struct nlattr *tb[],
> >  	}
> >
> >  	if (data[IFLA_BR_AGEING_TIME]) {
> > -		u32 ageing_time = nla_get_u32(data[IFLA_BR_AGEING_TIME]);
>
> Should we do some range checking here to ensure that the value is within
> a certain range? IEEE 802.1d recommends that the ageing time be between
> 10 sec and 1 million seconds.

Sure, but make that a separate patch.

> > +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time)
> > +{
> > +	struct switchdev_attr attr = {
> > +		.id = SWITCHDEV_ATTR_BRIDGE,
> > +		.flags = SWITCHDEV_F_SKIP_EOPNOTSUPP,
> > +		.u.bridge.attr = IFLA_BR_AGEING_TIME,
> > +		.u.bridge.val = ageing_time,
> > +	};
> > +	int err;
> > +
> > +	err = switchdev_port_attr_set(br->dev, &attr);
> > +	if (err)
> > +		return err;
> > +
> > +	br->ageing_time = clock_t_to_jiffies(ageing_time);
>
> Should we restart the timer here so the new time takes effect?

I don't know... I just copied what the original code did. If it does need
to be restarted, break that out as a separate patch.

-scott
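The range check discussed above could look roughly like the sketch below. It is a hypothetical helper, not kernel code: the name, the constants and the unit handling are assumptions. It takes the 10 second and 1 million second bounds quoted in the thread and assumes the netlink value arrives in clock_t ticks at 100 ticks per second.

```c
#include <stdbool.h>
#include <stdint.h>

#define AGEING_MIN_SEC	10u		/* lower bound quoted in the thread */
#define AGEING_MAX_SEC	1000000u	/* upper bound quoted in the thread */
#define TICKS_PER_SEC	100u		/* assumed clock_t resolution */

/* Compare in ticks so that no precision is lost to integer division. */
static bool ageing_time_in_range(uint32_t ageing_ticks)
{
	return ageing_ticks >= AGEING_MIN_SEC * TICKS_PER_SEC &&
	       ageing_ticks <= AGEING_MAX_SEC * TICKS_PER_SEC;
}
```

Under these assumptions, a check like this would sit before the value is handed to the switch device or converted to jiffies, so an out-of-range request is rejected without touching any state.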
Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
On Wed, 2015-08-19 at 17:56 +0000, Grumbach, Emmanuel wrote:
> So I feel that making net/core/tso.c more complicated just because of
> our craziness seems an overkill to me. I'll try a bit harder to see how
> I can use net/core/tso.c, but I have to say I am pessimistic.

net/core/tso.c is WIP, feel free to expand it to make it more generic
and meet your needs.

The point is: we want a core infrastructure, not something that each
individual driver implements in ~500 lines of code :(
netlink_route kernel data dump size increased
All,

We are running an application on Linux kernel 3.10 that collects network interface information using the NETLINK_ROUTE protocol. Earlier (kernel 2.6.32) we allocated an 8K buffer to collect all the data, but with the new kernel (3.10) we are seeing a read socket error, as the buffer is no longer large enough for the network dump data.

We want to understand whether the userspace buffer limit has increased to 16K, or whether we need some other mechanism to collect the data in 8K chunks. Or is there any other way the application can use the NETLINK_ROUTE protocol so that it will not break if the data size increases in the future?

I did some browsing and found a link, but it was not very conclusive:
http://www.spinics.net/lists/netdev/msg162185.html

Any help or pointers would be appreciated.

Thanks
Tej
Re: [PATCH v2 net-next 04/13] ip_tunnels: add IPv6 addresses to ip_tunnel_key
On 08/19/15 at 12:09pm, Jiri Benc wrote:
> Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the
> newly introduced padding after the IPv4 addresses needs to be zeroed out.
>
> Signed-off-by: Jiri Benc jb...@redhat.com
> ---
> v1->v2: Fix incorrect IP_TUNNEL_KEY_IPV4_PAD_LEN calculation, thanks to
> Alexei.

Acked-by: Thomas Graf tg...@suug.ch
Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
On 08/19/2015 09:02 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 17:56 +0000, Grumbach, Emmanuel wrote:
>
> > So I feel that making net/core/tso.c more complicated just because of
> > our craziness seems an overkill to me. I'll try a bit harder to see how
> > I can use net/core/tso.c, but I have to say I am pessimistic.
>
> net/core/tso.c is WIP, feel free to expand it to make it more generic
> and meet your needs.

Yeah - trying to see what can be done.

> The point is: we want a core infrastructure, not something that each
> individual driver implements in ~500 lines of code :(

I totally understand that :) I just claim to be unique in a way that
each individual driver is ... only me :) I guess that if we built the
DMA descriptors directly from the skb_gso (the skb coming from the
stack), that'd be easier. Our HW abstraction layer wants an skb and I
need to pass several skbs (because skb->len is very likely not to fit in
one single 802.11 packet even if it is an A-MSDU). So, trying to use
net/core/tso.c basically means opening up the architecture of our
driver... Not impossible, but quite a bit of work.
Re: [RFC v2 1/3] iwlwifi: mvm: add real TSO implementation
On 08/19/2015 05:18 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
> > The segmentation is done completely in software. The
> > driver creates several MPDUs out of a single large send.
> > Each MPDU is a newly allocated SKB.
> > A page is allocated to create the headers that need to be
> > duplicated (SNAP / IP / TCP). The WiFi header is in the
> > header of the newly created SKBs.
> >
> > type=feature
> >
> > Change-Id: I238ffa79cacc5bbdacdfbf3e9673c8d4f02b462a
> > Signed-off-by: Emmanuel Grumbach emmanuel.grumb...@intel.com
> > ---
> >  drivers/net/wireless/iwlwifi/mvm/tx.c | 513 +++---
> >  1 file changed, 481 insertions(+), 32 deletions(-)
>
> Ouch, dynamic allocations while doing xmit are certainly not needed.
> Your driver should pre-allocate space for headers.

This is right as long as you don't need *several* headers in one single
skb. In the case of A-MSDU, I need to have several TCP / IP / SNAP
headers in the same skb. At least that's how my HW layer in the driver
is built. See the other thread.

> Drivers willing to implement tso have to use net/core/tso.c provided
> helpers.
>
> $ git grep -n tso_build_hdr
> drivers/net/ethernet/cavium/thunder/nicvf_queues.c:1030:	tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
> drivers/net/ethernet/freescale/fec_main.c:729:	tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
> drivers/net/ethernet/marvell/mv643xx_eth.c:842:	tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
> drivers/net/ethernet/marvell/mvneta.c:1650:	tso_build_hdr(skb, hdr, tso, data_left, total_len == 0);
> include/net/tso.h:15:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
> net/core/tso.c:14:void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
> net/core/tso.c:37:EXPORT_SYMBOL(tso_build_hdr);

This looks promising indeed. I'll take a close look. Thanks a bunch.
Re: [PATCH net-next] vrf: plug skb leaks
Hi Nikolay:

On 8/18/15 8:12 PM, Nikolay Aleksandrov wrote:
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index ed208317cbb5..4aa06450fafa 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -97,6 +97,12 @@ static bool is_ip_rx_frame(struct sk_buff *skb)
>  	return false;
>  }
>
> +static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
> +{
> +	vrf_dev->stats.tx_errors++;
> +	kfree_skb(skb);
> +}
> +
>  /* note: already called with rcu_read_lock */
>  static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb)
>  {
> @@ -149,7 +155,8 @@ static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev,
>  static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
>  					   struct net_device *dev)
>  {
> -	return 0;
> +	vrf_tx_error(dev, skb);
> +	return NET_XMIT_DROP;
>  }
>
>  static int vrf_send_v4_prep(struct sk_buff *skb, struct flowi4 *fl4,
> @@ -206,8 +213,7 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
>  out:
>  	return ret;
>  err:
> -	vrf_dev->stats.tx_errors++;
> -	kfree_skb(skb);
> +	vrf_tx_error(vrf_dev, skb);
>  	goto out;
>  }
>
> @@ -219,6 +225,7 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
>  	case htons(ETH_P_IPV6):
>  		return vrf_process_v6_outbound(skb, dev);
>  	default:
> +		vrf_tx_error(dev, skb);
>  		return NET_XMIT_DROP;
>  	}
>  }

It would be simpler to do the vrf_tx_error at the end of is_ip_tx_frame()
if ret == NET_XMIT_DROP.

David
Re: [PATCH net-next 0/4] vrf: cleanups part 2
On 8/18/15 8:27 PM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov niko...@cumulusnetworks.com
>
> Hi,
> This is the next part of vrf cleanups, patch 1 drops the SLAB_PANIC when
> creating kmem cache since it's handled, patch 02 removes a slave duplicate
> check which is already done by the lower/upper code, patch 3 moves the
> ndo_add_slave code around a bit so we can drop an error label and patch 4
> drops the master device checks which are unnecessary because the ops are
> taken from the master device itself so it can't be different.
>
> Cheers,
>  Nik
>
> Nikolay Aleksandrov (4):
>   vrf: don't panic on cache create failure
>   vrf: remove unnecessary duplicate check
>   vrf: move vrf_insert_slave so we can drop a goto label
>   vrf: ndo_add|del_slave drop unnecessary checks
>
>  drivers/net/vrf.c | 24 ++++--------------------
>  1 file changed, 4 insertions(+), 20 deletions(-)

Looks good to me. Thanks, Nikolay.

Acked-by: David Ahern d...@cumulusnetworks.com
Re: [RFC v2 0/3] add TSO / A-MSDU TX for iwlwifi
On 08/19/2015 08:20 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 17:00 +, Grumbach, Emmanuel wrote:
>> On 08/19/2015 07:08 PM, Eric Dumazet wrote:
>>> On Wed, 2015-08-19 at 15:07 +, Grumbach, Emmanuel wrote:
>>>> I'll look at it. I was almost starting to implement that, but then I
>>>> thought of another (good?) reason to use LSO. LSO gives me the
>>>> guarantee that the packet is directed to one peer, which might not be
>>>> the case with xmit_more, since we have one Qdisc for several clients
>>>> when we are in AP mode. Building an A-MSDU for several clients is not
>>>> possible, at least not for several clients at L2 (different MAC
>>>> addresses). LSO avoids this problem completely.
>>>
>>> Then, simply calling skb_gso_segment() from the driver might be enough,
>>> and less work for you. This would even support TSO on IPv6
>>
>> Well... I did take care of IPv6.
>
> net/core/tso.c does not yet handle IPv6

Yeah - I can see that now. I can teach it - that's not a big deal.

The bigger problem is that net/core/tso.c doesn't do what I really need:
it does only a small portion. Since I need to add one frag to several
skbs, I need to refcount the frags' page. net/core/tso.c hides the page
from me. I can try to use tso_build_hdr, but it will copy the entire
header where I need only SNAP / IP / TCP (and not 802.11).

I am getting the feeling that net/core/tso.c is close to what I need, but
not close enough to be usable without changes that would make the
implementation too complicated and make net/core/tso.c much less readable
for other users.

I know that our device is quite unique in the sense that most other
vendors do all the header twiddling in hardware. We unfortunately don't.
The A-MSDU's format is also somewhat unusual:

802.11 HDR
ETH | SNAP | IP | TCP | PAYLOAD | PAD (variable length)
ETH | SNAP | IP | TCP | PAYLOAD | PAD (variable length)
ETH | SNAP | IP | TCP | PAYLOAD | PAD (variable length)
etc...

So making net/core/tso.c more complicated just because of our craziness
seems like overkill to me.

I'll try a bit harder to see how I can use net/core/tso.c, but I have to
say I am pessimistic.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
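The A-MSDU layout described in the message above (one 802.11 header followed by repeated subframes, each ending in variable-length padding) can be sketched as a length calculation. This is only an illustration, not code from the iwlwifi patch: it assumes the usual 802.11 rule that every subframe except the last is padded to a 4-byte boundary, that the "ETH" part is a 14-byte subframe header, and the names amsdu_pad(), amsdu_body_len() and AMSDU_SUBFRAME_HDR_LEN are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed subframe header size: DA (6) + SA (6) + length (2). */
#define AMSDU_SUBFRAME_HDR_LEN 14u

/* Padding needed so that the next subframe starts on a 4-byte boundary. */
static unsigned int amsdu_pad(unsigned int subframe_len)
{
	return (4u - (subframe_len & 3u)) & 3u;
}

/*
 * Length of everything that follows the 802.11 header, for n MSDUs whose
 * lengths (SNAP + IP + TCP + payload) are given in msdu_len[].
 * The last subframe is not padded.
 */
static unsigned int amsdu_body_len(const unsigned int *msdu_len, size_t n)
{
	unsigned int total = 0;
	size_t i;

	assert(msdu_len != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned int sub = AMSDU_SUBFRAME_HDR_LEN + msdu_len[i];

		total += sub;
		if (i + 1 < n)
			total += amsdu_pad(sub);
	}
	return total;
}
```

The padding depends on each subframe's own length, which is why the PAD field is marked as variable length and why every segment needs its own header bookkeeping.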
RE: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.
-----Original Message-----
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Scott Feldman
Sent: Wednesday, August 19, 2015 12:54 PM
To: Premkumar Jonnala
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.

On Wed, Aug 19, 2015 at 2:34 AM, Premkumar Jonnala pjonn...@broadcom.com wrote:

Hello Scott,

Thank you for the diff and comments. Please see my comments inline.

-----Original Message-----
From: Scott Feldman [mailto:sfel...@gmail.com]
Sent: Tuesday, August 18, 2015 12:48 PM
To: Premkumar Jonnala
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.

On Fri, 14 Aug 2015, Premkumar Jonnala wrote:

Bridge devices have an ageing interval used to age out MAC addresses from
the FDB. This ageing interval was not configurable. Enable netlink based
configuration of the ageing interval for bridges and switch devices. The
ageing interval changes the timer used to purge inactive FDB entries in
bridges. The ageing interval config is propagated to switch devices, so
that platform or hardware based ageing works according to configuration.

Signed-off-by: Premkumar Jonnala pjonn...@broadcom.com

Hi Premkumar,

I agree with Roopa that we should use the existing IFLA_BR_AGEING_TIME.

What is the motivation for using the 'ip link' command to configure bridge
attributes? IMHO, the bridge command is better suited for that.

Can you extend the bridge command to allow setting/getting these bridge
attrs? Looks like you construct a RTM_NEWLINK IFLA_INFO_DATA msg. No
changes needed to the kernel.

bridge link set dev br0 ageing_time 1000
--or--
ip link set dev br0 type bridge ageing_time 1000

Being able to set these attributes via both bridge and ip would be great.

+int br_set_ageing_time(struct net_bridge *br, u32 ageing_time)
+{
+	struct switchdev_attr attr = {
+		.id = SWITCHDEV_ATTR_BRIDGE,
+		.flags = SWITCHDEV_F_SKIP_EOPNOTSUPP,
+		.u.bridge.attr = IFLA_BR_AGEING_TIME,
+		.u.bridge.val = ageing_time,
+	};
+	int err;
+
+	err = switchdev_port_attr_set(br->dev, &attr);
+	if (err)
+		return err;
+
+	br->ageing_time = clock_t_to_jiffies(ageing_time);

Should we restart the timer here so the new time takes effect?

I don't know... I just copied what the original code did. If it does need
to be restarted, break that out as a separate patch.

In my opinion, yes, the timer should be restarted. If the timer had been
set to 1 million seconds and is being changed to 1 minute, you wouldn't
want to wait for the 1-million-second timer to expire before resetting it
to the newly-configured 1-minute timer value.

Dan.
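The point made at the end of the message above (changing the interval without re-arming the timer leaves the old expiry in place) can be shown with a small userspace model. This is a sketch under stated assumptions, not bridge code: struct toy_bridge and both setter functions are hypothetical, and time is counted in abstract ticks rather than jiffies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a bridge with an ageing timer. */
struct toy_bridge {
	unsigned long ageing_time;   /* configured interval, in ticks */
	unsigned long timer_expires; /* absolute tick at which the timer fires */
};

/* Change the interval only; the pending timer keeps its old expiry. */
static void set_ageing_time_no_restart(struct toy_bridge *br, unsigned long t)
{
	assert(br != NULL);
	br->ageing_time = t;
}

/* Change the interval and re-arm the timer relative to the current tick. */
static void set_ageing_time_restart(struct toy_bridge *br, unsigned long t,
				    unsigned long now)
{
	assert(br != NULL);
	br->ageing_time = t;
	br->timer_expires = now + t;
}
```

With an initial interval of 1000000 ticks, the first variant still fires at tick 1000000 after the interval is lowered to 60, while the second fires 60 ticks after the change.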
Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar pshe...@nicira.com wrote:
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index e58468b..18ff83b 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -181,7 +181,7 @@ config VXLAN
>
>  config GENEVE
>  	tristate "Generic Network Virtualization Encapsulation netdev"
> -	depends on INET && GENEVE_CORE
> +	depends on INET
>  	select NET_IP_TUNNEL

I think my comments on v1 of this patch were overlooked (about the
UDP_TUNNEL dependency and the name).

> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 5b43382..eb298ff 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> +static void geneve_build_header(struct genevehdr *geneveh,
> +				__be16 tun_flags, u8 vni[3],
> +				u8 options_len, u8 *options)
[...]
> +static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
> +			    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
> +			    bool csum)

It seems like we could just merge these functions. I'm not sure that
the role is all that different.

In geneve_build_skb(), the error labels are somewhat confusing (for
example, free_rt doesn't free the rt). Also, is it right that we don't
free the rt if udp_tunnel_handle_offloads() fails? It might be cleaner
if the caller retains ownership of rt.

My guess is that if the issue from the earlier patch about overlapping
collect_md tunnels is fixed then that might allow us to simplify
things a little further, since for those tunnels we can assume there
is a 1:1 mapping between collect_md tunnels and sockets.
[PATCH 02/15] netfilter: xt_TEE: get rid of WITH_CONNTRACK definition
Use IS_ENABLED(CONFIG_NF_CONNTRACK) instead.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/xt_TEE.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index c5d6556..0ed9fb6 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -24,10 +24,8 @@
 #include <net/route.h>
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter/xt_TEE.h>
-
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
-#	define WITH_CONNTRACK 1
-#	include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack.h>
 #endif
 
 struct xt_tee_priv {
@@ -99,7 +97,7 @@ tee_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 	if (skb == NULL)
 		return XT_CONTINUE;
 
-#ifdef WITH_CONNTRACK
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	/* Avoid counting cloned packets towards the original connection. */
 	nf_conntrack_put(skb->nfct);
 	skb->nfct = &nf_ct_untracked_get()->ct_general;
@@ -175,7 +173,7 @@ tee_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 	if (skb == NULL)
 		return XT_CONTINUE;
 
-#ifdef WITH_CONNTRACK
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	nf_conntrack_put(skb->nfct);
 	skb->nfct = &nf_ct_untracked_get()->ct_general;
 	skb->nfctinfo = IP_CT_NEW;
--
1.7.10.4
[PATCH 10/15] netfilter: nft_limit: add per-byte limiting
This patch adds a new NFTA_LIMIT_TYPE netlink attribute to indicate the type
of limiting.

Contrary to per-packet limiting, the cost is calculated from the packet path
since this depends on the packet length.

The burst attribute indicates the number of bytes in which the rate can be
exceeded.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/uapi/linux/netfilter/nf_tables.h |  7
 net/netfilter/nft_limit.c                | 63
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index cafd789..d8c8a7c 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -756,18 +756,25 @@ enum nft_ct_attributes {
 };
 #define NFTA_CT_MAX		(__NFTA_CT_MAX - 1)
 
+enum nft_limit_type {
+	NFT_LIMIT_PKTS,
+	NFT_LIMIT_PKT_BYTES
+};
+
 /**
  * enum nft_limit_attributes - nf_tables limit expression netlink attributes
  *
  * @NFTA_LIMIT_RATE: refill rate (NLA_U64)
  * @NFTA_LIMIT_UNIT: refill unit (NLA_U64)
  * @NFTA_LIMIT_BURST: burst (NLA_U32)
+ * @NFTA_LIMIT_TYPE: type of limit (NLA_U32: enum nft_limit_type)
  */
 enum nft_limit_attributes {
 	NFTA_LIMIT_UNSPEC,
 	NFTA_LIMIT_RATE,
 	NFTA_LIMIT_UNIT,
 	NFTA_LIMIT_BURST,
+	NFTA_LIMIT_TYPE,
 	__NFTA_LIMIT_MAX
 };
 #define NFTA_LIMIT_MAX		(__NFTA_LIMIT_MAX - 1)
diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index b418698..5d67938 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -83,14 +83,16 @@ static int nft_limit_init(struct nft_limit *limit,
 	return 0;
 }
 
-static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit)
+static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit,
+			  enum nft_limit_type type)
 {
 	u64 secs = div_u64(limit->nsecs, NSEC_PER_SEC);
 	u64 rate = limit->rate - limit->burst;
 
 	if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(rate)) ||
 	    nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)) ||
-	    nla_put_be32(skb, NFTA_LIMIT_BURST, htonl(limit->burst)))
+	    nla_put_be32(skb, NFTA_LIMIT_BURST, htonl(limit->burst)) ||
+	    nla_put_be32(skb, NFTA_LIMIT_TYPE, htonl(type)))
 		goto nla_put_failure;
 	return 0;
 
@@ -117,6 +119,7 @@ static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
 	[NFTA_LIMIT_RATE]	= { .type = NLA_U64 },
 	[NFTA_LIMIT_UNIT]	= { .type = NLA_U64 },
 	[NFTA_LIMIT_BURST]	= { .type = NLA_U32 },
+	[NFTA_LIMIT_TYPE]	= { .type = NLA_U32 },
 };
 
 static int nft_limit_pkts_init(const struct nft_ctx *ctx,
@@ -138,7 +141,7 @@ static int nft_limit_pkts_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	const struct nft_limit_pkts *priv = nft_expr_priv(expr);
 
-	return nft_limit_dump(skb, &priv->limit);
+	return nft_limit_dump(skb, &priv->limit, NFT_LIMIT_PKTS);
 }
 
 static struct nft_expr_type nft_limit_type;
@@ -150,9 +153,61 @@ static const struct nft_expr_ops nft_limit_pkts_ops = {
 	.dump		= nft_limit_pkts_dump,
 };
 
+static void nft_limit_pkt_bytes_eval(const struct nft_expr *expr,
+				     struct nft_regs *regs,
+				     const struct nft_pktinfo *pkt)
+{
+	struct nft_limit *priv = nft_expr_priv(expr);
+	u64 cost = div_u64(priv->nsecs * pkt->skb->len, priv->rate);
+
+	if (nft_limit_eval(priv, cost))
+		regs->verdict.code = NFT_BREAK;
+}
+
+static int nft_limit_pkt_bytes_init(const struct nft_ctx *ctx,
+				    const struct nft_expr *expr,
+				    const struct nlattr * const tb[])
+{
+	struct nft_limit *priv = nft_expr_priv(expr);
+
+	return nft_limit_init(priv, tb);
+}
+
+static int nft_limit_pkt_bytes_dump(struct sk_buff *skb,
+				    const struct nft_expr *expr)
+{
+	const struct nft_limit *priv = nft_expr_priv(expr);
+
+	return nft_limit_dump(skb, priv, NFT_LIMIT_PKT_BYTES);
+}
+
+static const struct nft_expr_ops nft_limit_pkt_bytes_ops = {
+	.type		= &nft_limit_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_limit)),
+	.eval		= nft_limit_pkt_bytes_eval,
+	.init		= nft_limit_pkt_bytes_init,
+	.dump		= nft_limit_pkt_bytes_dump,
+};
+
+static const struct nft_expr_ops *
+nft_limit_select_ops(const struct nft_ctx *ctx,
+		     const struct nlattr * const tb[])
+{
+	if (tb[NFTA_LIMIT_TYPE] == NULL)
+		return &nft_limit_pkts_ops;
+
+	switch (ntohl(nla_get_be32(tb[NFTA_LIMIT_TYPE]))) {
+	case NFT_LIMIT_PKTS:
+		return &nft_limit_pkts_ops;
+	case NFT_LIMIT_PKT_BYTES:
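The commit message above says the per-byte cost has to be computed in the packet path because it depends on the packet length. A minimal userspace sketch of that arithmetic follows; it mirrors the `nsecs * len / rate` expression from the patch, but the function names byte_limit_cost() and pkt_limit_cost() are hypothetical, and the sketch ignores the burst handling and the possibility of 64-bit overflow for very large units.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Cost of one packet, expressed in nanoseconds of bucket credit, when
 * "rate" bytes are allowed every "unit_secs" seconds.
 */
static uint64_t byte_limit_cost(uint64_t rate, uint64_t unit_secs,
				uint32_t pkt_len)
{
	uint64_t nsecs = unit_secs * NSEC_PER_SEC;

	assert(rate != 0);
	return nsecs * pkt_len / rate;
}

/* Per-packet limiting charges the same cost whatever the packet length. */
static uint64_t pkt_limit_cost(uint64_t rate, uint64_t unit_secs)
{
	assert(rate != 0);
	return unit_secs * NSEC_PER_SEC / rate;
}
```

For a limit of 1000 bytes per second, a 100-byte packet costs a tenth of a second of credit, while a 1500-byte packet costs more than the whole one-second bucket holds.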
[PATCH 14/15] netfilter: nf_conntrack: add efficient mark to zone mapping
From: Daniel Borkmann dan...@iogearbox.net

This work adds the possibility of deriving the zone id from the skb->mark
field in a scalable manner. This allows for having only a single template
serving hundreds/thousands of different zones, for example, instead of the
need to have one match for each zone as an extra CT jump target.

Note that we'd need to have this information attached to the template as at
the time when we're trying to lookup a possible ct object, we already need
to know zone information for a possible match when going into
__nf_conntrack_find_get(). This work provides a minimal implementation for
a possible mapping.

In order to not add/expose an extra ct->status bit, the zone structure has
been extended to carry a flag for deriving the mark.

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nf_conntrack_zones.h     | 45
 include/uapi/linux/netfilter/xt_CT.h           |  4
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |  3
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |  4
 net/netfilter/nf_conntrack_core.c              | 50
 net/netfilter/nf_conntrack_netlink.c           |  5
 net/netfilter/xt_CT.c                          |  5
 7 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_zones.h b/include/net/netfilter/nf_conntrack_zones.h
index 3942ddf..5316c7b 100644
--- a/include/net/netfilter/nf_conntrack_zones.h
+++ b/include/net/netfilter/nf_conntrack_zones.h
@@ -10,9 +10,12 @@
 
 #define NF_CT_DEFAULT_ZONE_DIR	(NF_CT_ZONE_DIR_ORIG | NF_CT_ZONE_DIR_REPL)
 
+#define NF_CT_FLAG_MARK	1
+
 struct nf_conntrack_zone {
 	u16	id;
-	u16	dir;
+	u8	flags;
+	u8	dir;
 };
 
 extern const struct nf_conntrack_zone nf_ct_zone_dflt;
@@ -32,9 +35,45 @@ nf_ct_zone(const struct nf_conn *ct)
 }
 
 static inline const struct nf_conntrack_zone *
-nf_ct_zone_tmpl(const struct nf_conn *tmpl)
+nf_ct_zone_init(struct nf_conntrack_zone *zone, u16 id, u8 dir, u8 flags)
+{
+	zone->id = id;
+	zone->flags = flags;
+	zone->dir = dir;
+
+	return zone;
+}
+
+static inline const struct nf_conntrack_zone *
+nf_ct_zone_tmpl(const struct nf_conn *tmpl, const struct sk_buff *skb,
+		struct nf_conntrack_zone *tmp)
+{
+	const struct nf_conntrack_zone *zone;
+
+	if (!tmpl)
+		return &nf_ct_zone_dflt;
+
+	zone = nf_ct_zone(tmpl);
+	if (zone->flags & NF_CT_FLAG_MARK)
+		zone = nf_ct_zone_init(tmp, skb->mark, zone->dir, 0);
+
+	return zone;
+}
+
+static inline int nf_ct_zone_add(struct nf_conn *ct, gfp_t flags,
+				 const struct nf_conntrack_zone *info)
 {
-	return tmpl ? nf_ct_zone(tmpl) : &nf_ct_zone_dflt;
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+	struct nf_conntrack_zone *nf_ct_zone;
+
+	nf_ct_zone = nf_ct_ext_add(ct, NF_CT_EXT_ZONE, flags);
+	if (!nf_ct_zone)
+		return -ENOMEM;
+
+	nf_ct_zone_init(nf_ct_zone, info->id, info->dir,
+			info->flags);
+#endif
+	return 0;
 }
 
 static inline bool nf_ct_zone_matches_dir(const struct nf_conntrack_zone *zone,
diff --git a/include/uapi/linux/netfilter/xt_CT.h b/include/uapi/linux/netfilter/xt_CT.h
index 452005f..9e52041 100644
--- a/include/uapi/linux/netfilter/xt_CT.h
+++ b/include/uapi/linux/netfilter/xt_CT.h
@@ -8,9 +8,11 @@ enum {
 	XT_CT_NOTRACK_ALIAS	= 1 << 1,
 	XT_CT_ZONE_DIR_ORIG	= 1 << 2,
 	XT_CT_ZONE_DIR_REPL	= 1 << 3,
+	XT_CT_ZONE_MARK		= 1 << 4,
 
 	XT_CT_MASK		= XT_CT_NOTRACK | XT_CT_NOTRACK_ALIAS |
-				  XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL,
+				  XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL |
+				  XT_CT_ZONE_MARK,
 };
 
 struct xt_ct_target_info {
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index 8a2f41c..cdde3ec 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -135,9 +135,10 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb,
 	const struct nf_conntrack_l4proto *innerproto;
 	const struct nf_conntrack_tuple_hash *h;
 	const struct nf_conntrack_zone *zone;
+	struct nf_conntrack_zone tmp;
 
 	NF_CT_ASSERT(skb->nfct == NULL);
-	zone = nf_ct_zone_tmpl(tmpl);
+	zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
 
 	/* Are they talking about one of our connections? */
 	if (!nf_ct_get_tuplepr(skb,
diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
index 2029141..0e6fae1 100644
--- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
+++
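The zone selection described in the patch above (default zone without a template, zone built from the packet mark when the template carries the mark flag, otherwise the template's own zone) can be modelled outside the kernel. The sketch below is an assumption-laden illustration: struct toy_zone, TOY_ZONE_FLAG_MARK, toy_zone_dflt and toy_zone_for_packet() are invented names, and the default direction value 3 simply stands for "both directions".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ZONE_FLAG_MARK 1u /* hypothetical stand-in for NF_CT_FLAG_MARK */

struct toy_zone {
	uint16_t id;
	uint8_t flags;
	uint8_t dir;
};

/* Assumed default: zone 0, no flags, both directions. */
static const struct toy_zone toy_zone_dflt = { 0, 0, 3 };

/*
 * Pick the zone for a packet. A template flagged with TOY_ZONE_FLAG_MARK
 * means "derive the zone id from the packet mark"; the result is written
 * to caller-provided scratch space so nothing has to be allocated.
 */
static const struct toy_zone *toy_zone_for_packet(const struct toy_zone *tmpl,
						  uint32_t mark,
						  struct toy_zone *scratch)
{
	assert(scratch != NULL);

	if (tmpl == NULL)
		return &toy_zone_dflt;

	if (tmpl->flags & TOY_ZONE_FLAG_MARK) {
		scratch->id = (uint16_t)mark; /* the id keeps the low 16 bits */
		scratch->flags = 0;
		scratch->dir = tmpl->dir;
		return scratch;
	}
	return tmpl;
}
```

Passing the scratch zone by pointer is what lets one template serve many zones: the per-packet zone lives on the caller's stack, as the `struct nf_conntrack_zone tmp` local does in the icmp hunk.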
[PATCH 03/15] netfilter: factor out packet duplication for IPv4/IPv6
Extracted from the xtables TEE target. This creates two new modules for IPv4
and IPv6 that are shared between the TEE target and the new nf_tables dup
expressions.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/ipv4/nf_dup_ipv4.h |   7
 include/net/netfilter/ipv6/nf_dup_ipv6.h |   7
 net/ipv4/netfilter/Kconfig               |   6
 net/ipv4/netfilter/Makefile              |   2
 net/ipv4/netfilter/nf_dup_ipv4.c         | 120
 net/ipv6/netfilter/Kconfig               |   6
 net/ipv6/netfilter/Makefile              |   2
 net/ipv6/netfilter/nf_dup_ipv6.c         |  96
 net/netfilter/Kconfig                    |   2
 net/netfilter/xt_TEE.c                   | 158
 10 files changed, 254 insertions(+), 152 deletions(-)
 create mode 100644 include/net/netfilter/ipv4/nf_dup_ipv4.h
 create mode 100644 include/net/netfilter/ipv6/nf_dup_ipv6.h
 create mode 100644 net/ipv4/netfilter/nf_dup_ipv4.c
 create mode 100644 net/ipv6/netfilter/nf_dup_ipv6.c

diff --git a/include/net/netfilter/ipv4/nf_dup_ipv4.h b/include/net/netfilter/ipv4/nf_dup_ipv4.h
new file mode 100644
index 0000000..42008f1
--- /dev/null
+++ b/include/net/netfilter/ipv4/nf_dup_ipv4.h
@@ -0,0 +1,7 @@
+#ifndef _NF_DUP_IPV4_H_
+#define _NF_DUP_IPV4_H_
+
+void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
+		 const struct in_addr *gw, int oif);
+
+#endif /* _NF_DUP_IPV4_H_ */
diff --git a/include/net/netfilter/ipv6/nf_dup_ipv6.h b/include/net/netfilter/ipv6/nf_dup_ipv6.h
new file mode 100644
index 0000000..ed6bd66
--- /dev/null
+++ b/include/net/netfilter/ipv6/nf_dup_ipv6.h
@@ -0,0 +1,7 @@
+#ifndef _NF_DUP_IPV6_H_
+#define _NF_DUP_IPV6_H_
+
+void nf_dup_ipv6(struct sk_buff *skb, unsigned int hooknum,
+		 const struct in6_addr *gw, int oif);
+
+#endif /* _NF_DUP_IPV6_H_ */
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 2199a5d..0142ea2 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -67,6 +67,12 @@ config NF_TABLES_ARP
 
 endif # NF_TABLES
 
+config NF_DUP_IPV4
+	tristate "Netfilter IPv4 packet duplication to alternate destination"
+	help
+	  This option enables the nf_dup_ipv4 core, which duplicates an IPv4
+	  packet to be rerouted to another destination.
+
 config NF_LOG_ARP
 	tristate "ARP packet logging"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 7fe6c70..9136ffc 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -70,3 +70,5 @@ obj-$(CONFIG_IP_NF_ARP_MANGLE) += arpt_mangle.o
 
 # just filtering instance of ARP tables for now
 obj-$(CONFIG_IP_NF_ARPFILTER) += arptable_filter.o
+
+obj-$(CONFIG_NF_DUP_IPV4) += nf_dup_ipv4.o
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
new file mode 100644
index 0000000..eff85ab
--- /dev/null
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -0,0 +1,120 @@
+/*
+ * (C) 2007 by Sebastian Claßen sebastian.clas...@freenet.ag
+ * (C) 2007-2010 by Jan Engelhardt jeng...@medozas.de
+ *
+ * Extracted from xt_TEE.c
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 or later, as
+ * published by the Free Software Foundation.
+ */
+#include <linux/ip.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/route.h>
+#include <linux/skbuff.h>
+#include <net/checksum.h>
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/route.h>
+#include <net/netfilter/ipv4/nf_dup_ipv4.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+
+static struct net *pick_net(struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_NS
+	const struct dst_entry *dst;
+
+	if (skb->dev != NULL)
+		return dev_net(skb->dev);
+	dst = skb_dst(skb);
+	if (dst != NULL && dst->dev != NULL)
+		return dev_net(dst->dev);
+#endif
+	return &init_net;
+}
+
+static bool nf_dup_ipv4_route(struct sk_buff *skb, const struct in_addr *gw,
+			      int oif)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct net *net = pick_net(skb);
+	struct rtable *rt;
+	struct flowi4 fl4;
+
+	memset(&fl4, 0, sizeof(fl4));
+	if (oif != -1)
+		fl4.flowi4_oif = oif;
+
+	fl4.daddr = gw->s_addr;
+	fl4.flowi4_tos = RT_TOS(iph->tos);
+	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
+	fl4.flowi4_flags = FLOWI_FLAG_KNOWN_NH;
+	rt = ip_route_output_key(net, &fl4);
+	if (IS_ERR(rt))
+		return false;
+
+	skb_dst_drop(skb);
+	skb_dst_set(skb, &rt->dst);
+	skb->dev      = rt->dst.dev;
+	skb->protocol = htons(ETH_P_IP);
+
+	return true;
+}
+
+void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
+		 const struct in_addr
[PATCH 01/15] netfilter: nft_counter: convert it to use per-cpu counters
This patch converts the existing seqlock to per-cpu counters.

Suggested-by: Eric Dumazet eric.duma...@gmail.com
Suggested-by: Patrick McHardy ka...@trash.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_counter.c | 97
 1 file changed, 69 insertions(+), 28 deletions(-)

diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index 1759123..1067fb4 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -18,39 +18,59 @@
 #include <net/netfilter/nf_tables.h>
 
 struct nft_counter {
-	seqlock_t	lock;
 	u64		bytes;
 	u64		packets;
 };
 
+struct nft_counter_percpu {
+	struct nft_counter	counter;
+	struct u64_stats_sync	syncp;
+};
+
+struct nft_counter_percpu_priv {
+	struct nft_counter_percpu __percpu *counter;
+};
+
 static void nft_counter_eval(const struct nft_expr *expr,
 			     struct nft_regs *regs,
 			     const struct nft_pktinfo *pkt)
 {
-	struct nft_counter *priv = nft_expr_priv(expr);
-
-	write_seqlock_bh(&priv->lock);
-	priv->bytes += pkt->skb->len;
-	priv->packets++;
-	write_sequnlock_bh(&priv->lock);
+	struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu *this_cpu;
+
+	local_bh_disable();
+	this_cpu = this_cpu_ptr(priv->counter);
+	u64_stats_update_begin(&this_cpu->syncp);
+	this_cpu->counter.bytes += pkt->skb->len;
+	this_cpu->counter.packets++;
+	u64_stats_update_end(&this_cpu->syncp);
+	local_bh_enable();
 }
 
 static int nft_counter_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
-	struct nft_counter *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu *cpu_stats;
+	struct nft_counter total;
+	u64 bytes, packets;
 	unsigned int seq;
-	u64 bytes;
-	u64 packets;
-
-	do {
-		seq = read_seqbegin(&priv->lock);
-		bytes	= priv->bytes;
-		packets	= priv->packets;
-	} while (read_seqretry(&priv->lock, seq));
-
-	if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(bytes)))
-		goto nla_put_failure;
-	if (nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(packets)))
+	int cpu;
+
+	memset(&total, 0, sizeof(total));
+	for_each_possible_cpu(cpu) {
+		cpu_stats = per_cpu_ptr(priv->counter, cpu);
+		do {
+			seq	= u64_stats_fetch_begin_irq(&cpu_stats->syncp);
+			bytes	= cpu_stats->counter.bytes;
+			packets	= cpu_stats->counter.packets;
+		} while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
+
+		total.packets += packets;
+		total.bytes += bytes;
+	}
+
+	if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes)) ||
+	    nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(total.packets)))
 		goto nla_put_failure;
 	return 0;
 
@@ -67,23 +87,44 @@ static int nft_counter_init(const struct nft_ctx *ctx,
 			    const struct nft_expr *expr,
 			    const struct nlattr * const tb[])
 {
-	struct nft_counter *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+	struct nft_counter_percpu __percpu *cpu_stats;
+	struct nft_counter_percpu *this_cpu;
+
+	cpu_stats = netdev_alloc_pcpu_stats(struct nft_counter_percpu);
+	if (cpu_stats == NULL)
+		return ENOMEM;
+
+	preempt_disable();
+	this_cpu = this_cpu_ptr(cpu_stats);
+	if (tb[NFTA_COUNTER_PACKETS]) {
+		this_cpu->counter.packets =
+			be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS]));
+	}
+	if (tb[NFTA_COUNTER_BYTES]) {
+		this_cpu->counter.bytes =
+			be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES]));
+	}
+	preempt_enable();
+	priv->counter = cpu_stats;
+	return 0;
+}
 
-	if (tb[NFTA_COUNTER_PACKETS])
-		priv->packets = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS]));
-	if (tb[NFTA_COUNTER_BYTES])
-		priv->bytes = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES]));
+static void nft_counter_destroy(const struct nft_ctx *ctx,
+				const struct nft_expr *expr)
+{
+	struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
 
-	seqlock_init(&priv->lock);
-	return 0;
+	free_percpu(priv->counter);
 }
 
 static struct nft_expr_type nft_counter_type;
 static const struct nft_expr_ops nft_counter_ops = {
 	.type		= &nft_counter_type,
-	.size		= NFT_EXPR_SIZE(sizeof(struct nft_counter)),
+	.size
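The idea behind the conversion above is that each CPU updates only its own counter slot, and a reader adds the slots up when the counter is dumped. The following single-threaded userspace sketch shows just that split between the update path and the summing path. It is not the kernel implementation: the toy_* names and the fixed TOY_NR_CPUS are assumptions, and the u64_stats_sync sequence protection used in the patch is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 4 /* hypothetical fixed CPU count for the sketch */

struct toy_counter {
	uint64_t bytes;
	uint64_t packets;
};

/* One slot per CPU: each CPU only ever writes its own slot. */
struct toy_percpu_counter {
	struct toy_counter slot[TOY_NR_CPUS];
};

/* Update path: account one packet on the slot of the CPU handling it. */
static void toy_counter_add(struct toy_percpu_counter *c, unsigned int cpu,
			    uint32_t len)
{
	assert(c != NULL && cpu < TOY_NR_CPUS);
	c->slot[cpu].bytes += len;
	c->slot[cpu].packets++;
}

/* Dump path: fold all slots into one total. */
static struct toy_counter toy_counter_sum(const struct toy_percpu_counter *c)
{
	struct toy_counter total = { 0, 0 };
	unsigned int cpu;

	assert(c != NULL);
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		total.bytes += c->slot[cpu].bytes;
		total.packets += c->slot[cpu].packets;
	}
	return total;
}
```

The trade-off is the usual one for per-CPU statistics: updates touch no shared lock or shared cache line, and the cost moves to the comparatively rare dump, which has to walk every slot.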
Re: [PATCH net-next v2 9/9] geneve: Implement rtnl changelink
On Wed, Aug 19, 2015 at 12:40 PM, Jesse Gross je...@nicira.com wrote:
> On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar pshe...@nicira.com wrote:
>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>> index e47cdd9..0d7fbef 100644
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> -static int geneve_configure(struct net *net, struct net_device *dev,
>> -			    __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
>> -			    __u16 dst_port, bool metadata)
>> +static int __geneve_configure(struct net *net, struct net_device *dev,
>> +			      __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
>> +			      __u16 dst_port, bool metadata)
>>  {
> [...]
>>  	geneve->net = net;
>>  	geneve->dev = dev;
>
> I guess this stuff should really be in geneve_configure() - it seems a
> bit odd to change it for a running device (even if it shouldn't
> change).

ok.

>>  	geneve->remote.sin_addr.s_addr = rem_addr;
>>  	if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
>>  		return -EINVAL;
>> +	u32_to_vni(vni, geneve->vni);
>>  	list_for_each_entry(t, &gn->geneve_list, next) {
>>  		if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
>>  		    rem_addr == t->remote.sin_addr.s_addr
>
> I'm not sure that these types of operations are safe if the device is
> already running. We first overwrite the remote value and then we do
> error checking but that means that if there is an error, then the
> device will be left in a broken state.
>
> Don't we also need to update the hash table if some of these parameters
> change?

ok, I will stop the device before making changes. That way we can add it
to the hash table.

>> +static int geneve_changelink(struct net_device *dev,
>> +			     struct nlattr *tb[], struct nlattr *data[])
>> +{
> [...]
>> -	if (data[IFLA_GENEVE_PORT])
>> -		dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
>> +	if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
>> +	    metadata != geneve->collect_md)) {
>
> It seems like in an ideal world, we wouldn't need to recreate the
> socket if metadata collection changed (assuming that there are no new
> conflicts).

To keep changelink simple I am thinking of disallowing metadata changes.
Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment
On Wed, 2015-08-19 at 19:17 +, Grumbach, Emmanuel wrote:

Hm.. how would net/core/tso.c avoid this?

Because a driver using these helpers keep around the original LSO packet
and frees it normally at TX completion time.

I can't see anything related to truesize there.

Note that this work since it is guaranteed that we release the skbs in
order.

(BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
yet we want backpressure mostly for TCP stack (TCP Small Queues))

I am not sure I follow here. You want me to test:
if (skb_gso->destructor == tcp_wfree) ?

Yes. Look for example at tcp_gso_segment() (called from skb_gso_segment())

	copy_destructor = gso_skb->destructor == tcp_wfree;
	...
	/* Following permits TCP Small Queues to work well with GSO :
	 * The callback to TCP stack will be called at the time last frag
	 * is freed at TX completion, and not right now when gso_skb
	 * is freed by GSO engine
	 */
	if (copy_destructor) {
		swap(gso_skb->sk, skb->sk);
		swap(gso_skb->destructor, skb->destructor);
		sum_truesize += skb->truesize;
		atomic_add(sum_truesize - gso_skb->truesize,
			   &skb->sk->sk_wmem_alloc);
	}

I checked that code using iperf and saw that I don't get into this if,
but I (probably wrongly) assumed that other applications would set a
flag on the socket (forgive my ignorance) that would make this if be
taken.

If you do not see skb->destructor == tcp_wfree, then something is
definitely wrong on your setup.
Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
On Wed, Aug 19, 2015 at 11:37 AM, Jesse Gross je...@nicira.com wrote:
> On Wed, Aug 19, 2015 at 11:29 AM, Pravin Shelar pshe...@nicira.com wrote:
>> On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross je...@nicira.com wrote:
>>> My guess is that if the issue from the earlier patch about overlapping
>>> collect_md tunnels is fixed then that might allow us to simplify
>>> things a little further, since for those tunnels we can assume there
>>> is a 1:1 mapping between collect_md tunnels and sockets.
>>
>> I dont see how it would be different. Can you elaborate on this ?
>
> Mostly just conceptually simpler. Right now it looks like we are doing
> some kind of refcounting between devices and tunnels in
> geneve_open/stop (I know it's not really but it appears like that in
> some ways.) We could just directly assign collect_md in geneve_open()
> and do nothing at all in geneve_stop().

If you look at the next patch, I have changed geneve_open and stop further.
The change is that geneve_open adds the tunnel to the hash table, so that
only devices which are open are in the hash table. Since geneve_open and
stop are common to both types of tunnel, I do not think there can be any
changes even after avoiding overlapping tunnel types in a given socket.
Re: [PATCH v2 net-next 11/13] vxlan: metadata based tunneling for IPv6
On Wed, Aug 19, 2015 at 12:10:01PM +0200, Jiri Benc wrote:
> Support metadata based (formerly flow based) tunneling also for IPv6.
> This complements commit ee122c79d422 ("vxlan: Flow based tunneling").
>
> Signed-off-by: Jiri Benc jb...@redhat.com
> ---
>  drivers/net/vxlan.c | 69
>  1 file changed, 40 insertions(+), 29 deletions(-)

Looks good.
Acked-by: Alexei Starovoitov a...@plumgrid.com
[PATCH 07/15] netfilter: nft_limit: factor out shared code with per-byte limiting
This patch prepares the introduction of per-byte limiting.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c | 86
 1 file changed, 53 insertions(+), 33 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index c79703e..c4d1b1b 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -27,65 +27,54 @@ struct nft_limit {
 	u64		nsecs;
 };
 
-static void nft_limit_pkts_eval(const struct nft_expr *expr,
-				struct nft_regs *regs,
-				const struct nft_pktinfo *pkt)
+static inline bool nft_limit_eval(struct nft_limit *limit, u64 cost)
 {
-	struct nft_limit *priv = nft_expr_priv(expr);
-	u64 now, tokens, cost = div_u64(priv->nsecs, priv->rate);
+	u64 now, tokens;
 	s64 delta;
 
 	spin_lock_bh(&limit_lock);
 	now = ktime_get_ns();
-	tokens = priv->tokens + now - priv->last;
-	if (tokens > priv->tokens_max)
-		tokens = priv->tokens_max;
+	tokens = limit->tokens + now - limit->last;
+	if (tokens > limit->tokens_max)
+		tokens = limit->tokens_max;
 
-	priv->last = now;
+	limit->last = now;
 	delta = tokens - cost;
 	if (delta >= 0) {
-		priv->tokens = delta;
+		limit->tokens = delta;
 		spin_unlock_bh(&limit_lock);
-		return;
+		return false;
 	}
-	priv->tokens = tokens;
+	limit->tokens = tokens;
 	spin_unlock_bh(&limit_lock);
-
-	regs->verdict.code = NFT_BREAK;
+	return true;
 }
 
-static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
-	[NFTA_LIMIT_RATE]	= { .type = NLA_U64 },
-	[NFTA_LIMIT_UNIT]	= { .type = NLA_U64 },
-};
-
-static int nft_limit_init(const struct nft_ctx *ctx,
-			  const struct nft_expr *expr,
+static int nft_limit_init(struct nft_limit *limit,
 			  const struct nlattr * const tb[])
 {
-	struct nft_limit *priv = nft_expr_priv(expr);
 	u64 unit;
 
 	if (tb[NFTA_LIMIT_RATE] == NULL ||
 	    tb[NFTA_LIMIT_UNIT] == NULL)
 		return -EINVAL;
 
-	priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
+	limit->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
 	unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
-	priv->nsecs = unit * NSEC_PER_SEC;
-	if (priv->rate == 0 || priv->nsecs < unit)
+	limit->nsecs = unit * NSEC_PER_SEC;
+	if (limit->rate == 0 || limit->nsecs < unit)
 		return -EOVERFLOW;
-	priv->tokens = priv->tokens_max = priv->nsecs;
-	priv->last = ktime_get_ns();
+	limit->tokens = limit->tokens_max = limit->nsecs;
+	limit->last = ktime_get_ns();
+
 	return 0;
 }
 
-static int nft_limit_dump(struct sk_buff *skb, const struct nft_expr *expr)
+static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit *limit)
 {
-	const struct nft_limit *priv = nft_expr_priv(expr);
-	u64 secs = div_u64(priv->nsecs, NSEC_PER_SEC);
+	u64 secs = div_u64(limit->nsecs, NSEC_PER_SEC);
 
-	if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)) ||
+	if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(limit->rate)) ||
 	    nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)))
 		goto nla_put_failure;
 	return 0;
@@ -94,13 +83,44 @@ nla_put_failure:
 	return -1;
 }
 
+static void nft_limit_pkts_eval(const struct nft_expr *expr,
+				struct nft_regs *regs,
+				const struct nft_pktinfo *pkt)
+{
+	struct nft_limit *priv = nft_expr_priv(expr);
+
+	if (nft_limit_eval(priv, div_u64(priv->nsecs, priv->rate)))
+		regs->verdict.code = NFT_BREAK;
+}
+
+static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = {
+	[NFTA_LIMIT_RATE]	= { .type = NLA_U64 },
+	[NFTA_LIMIT_UNIT]	= { .type = NLA_U64 },
+};
+
+static int nft_limit_pkts_init(const struct nft_ctx *ctx,
+			       const struct nft_expr *expr,
+			       const struct nlattr * const tb[])
+{
+	struct nft_limit *priv = nft_expr_priv(expr);
+
+	return nft_limit_init(priv, tb);
+}
+
+static int nft_limit_pkts_dump(struct sk_buff *skb, const struct nft_expr *expr)
+{
+	const struct nft_limit *priv = nft_expr_priv(expr);
+
+	return nft_limit_dump(skb, priv);
+}
+
 static struct nft_expr_type nft_limit_type;
 static const struct nft_expr_ops nft_limit_pkts_ops = {
 	.type		= &nft_limit_type,
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_limit)),
 	.eval		= nft_limit_pkts_eval,
-	.init		= nft_limit_init,
-	.dump		= nft_limit_dump,
+	.init		= nft_limit_pkts_init,
[PATCH 11/15] netfilter: nfacct: per network namespace support
From: Andreas Schultz aschu...@tpip.net - Move the nfnl_acct_list into the network namespace, initialize and destroy it per namespace - Keep track of refcnt on nfacct objects, the old logic does not longer work with a per namespace list - Adjust xt_nfacct to pass the namespace when registring objects Signed-off-by: Andreas Schultz aschu...@tpip.net Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org --- include/linux/netfilter/nfnetlink_acct.h |3 +- include/net/net_namespace.h |3 ++ net/netfilter/nfnetlink_acct.c | 71 +- net/netfilter/xt_nfacct.c|2 +- 4 files changed, 56 insertions(+), 23 deletions(-) diff --git a/include/linux/netfilter/nfnetlink_acct.h b/include/linux/netfilter/nfnetlink_acct.h index 6ec9757..80ca889 100644 --- a/include/linux/netfilter/nfnetlink_acct.h +++ b/include/linux/netfilter/nfnetlink_acct.h @@ -2,6 +2,7 @@ #define _NFNL_ACCT_H_ #include uapi/linux/netfilter/nfnetlink_acct.h +#include net/net_namespace.h enum { NFACCT_NO_QUOTA = -1, @@ -11,7 +12,7 @@ enum { struct nf_acct; -struct nf_acct *nfnl_acct_find_get(const char *filter_name); +struct nf_acct *nfnl_acct_find_get(struct net *net, const char *filter_name); void nfnl_acct_put(struct nf_acct *acct); void nfnl_acct_update(const struct sk_buff *skb, struct nf_acct *nfacct); extern int nfnl_acct_overquota(const struct sk_buff *skb, diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index e951453..2dcea63 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -118,6 +118,9 @@ struct net { #endif struct sock *nfnl; struct sock *nfnl_stash; +#if IS_ENABLED(CONFIG_NETFILTER_NETLINK_ACCT) + struct list_headnfnl_acct_list; +#endif #endif #ifdef CONFIG_WEXT_CORE struct sk_buff_head wext_nlevents; diff --git a/net/netfilter/nfnetlink_acct.c b/net/netfilter/nfnetlink_acct.c index c18af2f..fefbf5f 100644 --- a/net/netfilter/nfnetlink_acct.c +++ b/net/netfilter/nfnetlink_acct.c @@ -27,8 +27,6 @@ MODULE_LICENSE(GPL); MODULE_AUTHOR(Pablo Neira Ayuso 
pa...@netfilter.org); MODULE_DESCRIPTION(nfacct: Extended Netfilter accounting infrastructure); -static LIST_HEAD(nfnl_acct_list); - struct nf_acct { atomic64_t pkts; atomic64_t bytes; @@ -53,6 +51,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const tb[]) { struct nf_acct *nfacct, *matching = NULL; + struct net *net = sock_net(nfnl); char *acct_name; unsigned int size = 0; u32 flags = 0; @@ -64,7 +63,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb, if (strlen(acct_name) == 0) return -EINVAL; - list_for_each_entry(nfacct, nfnl_acct_list, head) { + list_for_each_entry(nfacct, net-nfnl_acct_list, head) { if (strncmp(nfacct-name, acct_name, NFACCT_NAME_MAX) != 0) continue; @@ -124,7 +123,7 @@ nfnl_acct_new(struct sock *nfnl, struct sk_buff *skb, be64_to_cpu(nla_get_be64(tb[NFACCT_PKTS]))); } atomic_set(nfacct-refcnt, 1); - list_add_tail_rcu(nfacct-head, nfnl_acct_list); + list_add_tail_rcu(nfacct-head, net-nfnl_acct_list); return 0; } @@ -185,6 +184,7 @@ nla_put_failure: static int nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb) { + struct net *net = sock_net(skb-sk); struct nf_acct *cur, *last; const struct nfacct_filter *filter = cb-data; @@ -196,7 +196,7 @@ nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb) cb-args[1] = 0; rcu_read_lock(); - list_for_each_entry_rcu(cur, nfnl_acct_list, head) { + list_for_each_entry_rcu(cur, net-nfnl_acct_list, head) { if (last) { if (cur != last) continue; @@ -257,6 +257,7 @@ static int nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const tb[]) { + struct net *net = sock_net(nfnl); int ret = -ENOENT; struct nf_acct *cur; char *acct_name; @@ -283,7 +284,7 @@ nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb, return -EINVAL; acct_name = nla_data(tb[NFACCT_NAME]); - list_for_each_entry(cur, nfnl_acct_list, head) { + list_for_each_entry(cur, net-nfnl_acct_list, head) { struct 
sk_buff *skb2; if (strncmp(cur-name, acct_name, NFACCT_NAME_MAX)!= 0) @@ -336,19 +337,20 @@ static int nfnl_acct_del(struct sock *nfnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const tb[]) { + struct net *net = sock_net(nfnl); char *acct_name;
[PATCH 13/15] netfilter: nf_conntrack: add direction support for zones
From: Daniel Borkmann dan...@iogearbox.net

This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants on a
host (e.g. from a netns) without requiring each tenant to NAT twice
resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction. In
some restricted, local DNAT cases, also port redirection could be used
for making the reply traffic unique w/o requiring SNAT.

The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct-mark
approach we proposed initially.

As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply), we
either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID. CT
expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier in
this case.

Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a stable value that would be equal for
both directions at all times, f.e. if only zone-id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.

If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).

Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces to
user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with
netperf sessions:

  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y

On the gateway netns, example:

  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully

  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865, from each tenant netns

  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport= dport=12865 zone-orig=1
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
             [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport= dport=12865 zone-orig=2
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=
             [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
                        src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
             [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
                        src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
             [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in
ESTABLISHED state as expected.

I also did run various other tests with some permutations of the
script, to mention some: SNAT in random/random-fully/persistent mode,
no zones (no overlaps), static zones (original, reply, both
directions), etc.

  [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
  [2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nf_conntrack_zones.h | 31 +++-
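The "return the actual id, or the default" rule described in the commit message can be illustrated with a small user-space sketch. This is not the patch's code: `struct zone`, `zone_id_for_dir()`, `zone_equal()` and the `DIR_*` / `ZONE_DIR_*` / `DEFAULT_ZONE_ID` names are hypothetical stand-ins chosen for illustration, and the direction bitmask layout is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical direction indices; the bitmask layout below is an assumption. */
enum dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

#define ZONE_DIR_ORIG   (1u << DIR_ORIGINAL)
#define ZONE_DIR_REPL   (1u << DIR_REPLY)
#define ZONE_DIR_BOTH   (ZONE_DIR_ORIG | ZONE_DIR_REPL)
#define DEFAULT_ZONE_ID 0

/* Hypothetical stand-in for a zone object carrying an id and a direction mask. */
struct zone {
	uint16_t id;
	uint8_t dir;
};

/* The zone id only applies in the directions the zone was configured for;
 * in any other direction the lookup falls back to the shared default zone. */
static uint16_t zone_id_for_dir(const struct zone *z, enum dir d)
{
	assert(z);
	return (z->dir & (1u << d)) ? z->id : DEFAULT_ZONE_ID;
}

/* Equality checks compare the direction-dependent id, not the raw id. */
static int zone_equal(const struct zone *a, const struct zone *b, enum dir d)
{
	return zone_id_for_dir(a, d) == zone_id_for_dir(b, d);
}
```

With two zones configured for the original direction only, the ids differ on the original side and both collapse to the default on the reply side, which is what lets the NAT engine pick a unique reply tuple in the shared zone.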
[PATCH 12/15] netfilter: nf_conntrack: push zone object into functions
From: Daniel Borkmann dan...@iogearbox.net This patch replaces the zone id which is pushed down into functions with the actual zone object. It's a bigger one-time change, but needed for later on extending zones with a direction parameter, and thus decoupling this additional information from all call-sites. No functional changes in this patch. The default zone becomes a global const object, namely nf_ct_zone_dflt and will be returned directly in various cases, one being, when there's f.e. no zoning support. Signed-off-by: Daniel Borkmann dan...@iogearbox.net Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org --- include/net/netfilter/nf_conntrack.h | 10 ++- include/net/netfilter/nf_conntrack_core.h |3 +- include/net/netfilter/nf_conntrack_expect.h| 11 +++- include/net/netfilter/nf_conntrack_zones.h | 33 +++--- net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |2 +- net/ipv4/netfilter/nf_conntrack_proto_icmp.c |3 +- net/ipv4/netfilter/nf_defrag_ipv4.c| 11 ++-- net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |2 +- net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |3 +- net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 12 ++-- net/netfilter/ipvs/ip_vs_nfct.c|2 +- net/netfilter/nf_conntrack_core.c | 75 + net/netfilter/nf_conntrack_expect.c| 21 +++--- net/netfilter/nf_conntrack_netlink.c | 84 +--- net/netfilter/nf_conntrack_pptp.c |3 +- net/netfilter/nf_conntrack_standalone.c| 17 +++-- net/netfilter/nf_nat_core.c| 19 -- net/netfilter/nf_synproxy_core.c |4 +- net/netfilter/xt_CT.c |6 +- net/netfilter/xt_connlimit.c |9 +-- net/sched/act_connmark.c |5 +- 21 files changed, 203 insertions(+), 132 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 37cd391..f5e23c6 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -250,8 +250,12 @@ void nf_ct_untracked_status_or(unsigned long bits); void nf_ct_iterate_cleanup(struct net *net, int (*iter)(struct nf_conn *i, void *data), void *data, u32 
portid, int report); + +struct nf_conntrack_zone; + void nf_conntrack_free(struct nf_conn *ct); -struct nf_conn *nf_conntrack_alloc(struct net *net, u16 zone, +struct nf_conn *nf_conntrack_alloc(struct net *net, + const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *orig, const struct nf_conntrack_tuple *repl, gfp_t gfp); @@ -291,7 +295,9 @@ extern unsigned int nf_conntrack_max; extern unsigned int nf_conntrack_hash_rnd; void init_nf_conntrack_hash_rnd(void); -struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags); +struct nf_conn *nf_ct_tmpl_alloc(struct net *net, +const struct nf_conntrack_zone *zone, +gfp_t flags); #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)-ct.stat-count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)-ct.stat-count) diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h index f2f0fa3..c03f9c4 100644 --- a/include/net/netfilter/nf_conntrack_core.h +++ b/include/net/netfilter/nf_conntrack_core.h @@ -52,7 +52,8 @@ bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse, /* Find a connection corresponding to a tuple. 
*/ struct nf_conntrack_tuple_hash * -nf_conntrack_find_get(struct net *net, u16 zone, +nf_conntrack_find_get(struct net *net, + const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *tuple); int __nf_conntrack_confirm(struct sk_buff *skb); diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h index 3f3aecb..dce56f0 100644 --- a/include/net/netfilter/nf_conntrack_expect.h +++ b/include/net/netfilter/nf_conntrack_expect.h @@ -4,7 +4,9 @@ #ifndef _NF_CONNTRACK_EXPECT_H #define _NF_CONNTRACK_EXPECT_H + #include net/netfilter/nf_conntrack.h +#include net/netfilter/nf_conntrack_zones.h extern unsigned int nf_ct_expect_hsize; extern unsigned int nf_ct_expect_max; @@ -76,15 +78,18 @@ int nf_conntrack_expect_init(void); void nf_conntrack_expect_fini(void); struct nf_conntrack_expect * -__nf_ct_expect_find(struct net *net, u16 zone, +__nf_ct_expect_find(struct net *net, + const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *tuple); struct nf_conntrack_expect * -nf_ct_expect_find_get(struct net *net, u16 zone,
[PATCH 00/15] Netfilter updates for net-next
Hi David,

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Rework the existing nf_tables counter expression to make it per-cpu.

2) Prepare and factor out common packet duplication code from the TEE
   target so it can be reused from the new dup expression.

3) Add the new dup expression for the nf_tables IPv4 and IPv6 families.

4) Convert the nf_tables limit expression to use a token-based approach
   with 64-bits precision.

5) Enhance the nf_tables limit expression to support limiting at packet
   byte. This comes after several preparation patches.

6) Add a burst parameter to indicate the amount of packets or bytes that
   can exceed the limiting.

7) Add netns support to nfacct, from Andreas Schultz.

8) Pass the nf_conn_zone structure instead of the zone ID in nf_tables
   to allow accessing more zone specific information, from Daniel
   Borkmann.

9) Allow to define zone per-direction to support netns containers with
   overlapping network addressing, also from Daniel.

10) Extend the CT target to allow setting the zone based on the skb-mark
    as a way to support simple mappings from iptables, also from Daniel.

11) Make the nf_tables payload expression aware of the fact that VLAN
    offload may have removed a vlan header, from Florian Westphal.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks!
The following changes since commit d92cff89a0c80e7e49796366e441d97f07b5d321:

  net_dbg_ratelimited: turn into no-op when !DEBUG (2015-08-06 23:51:30 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

for you to fetch changes up to 8cfd23e6740158817d2045915f6ea5a2daf11bce:

  netfilter: nft_payload: work around vlan header stripping (2015-08-19 08:39:53 +0200)

Andreas Schultz (1):
      netfilter: nfacct: per network namespace support

Daniel Borkmann (3):
      netfilter: nf_conntrack: push zone object into functions
      netfilter: nf_conntrack: add direction support for zones
      netfilter: nf_conntrack: add efficient mark to zone mapping

Florian Westphal (1):
      netfilter: nft_payload: work around vlan header stripping

Pablo Neira Ayuso (10):
      netfilter: nft_counter: convert it to use per-cpu counters
      netfilter: xt_TEE: get rid of WITH_CONNTRACK definition
      netfilter: factor out packet duplication for IPv4/IPv6
      netfilter: nf_tables: add nft_dup expression
      netfilter: nft_limit: rename to nft_limit_pkts
      netfilter: nft_limit: convert to token-based limiting at nanosecond granularity
      netfilter: nft_limit: factor out shared code with per-byte limiting
      netfilter: nft_limit: add burst parameter
      netfilter: nft_limit: constant token cost per packet
      netfilter: nft_limit: add per-byte limiting

 include/linux/netfilter/nfnetlink_acct.h           |   3 +-
 include/net/net_namespace.h                        |   3 +
 include/net/netfilter/ipv4/nf_dup_ipv4.h           |   7 +
 include/net/netfilter/ipv6/nf_dup_ipv6.h           |   7 +
 include/net/netfilter/nf_conntrack.h               |  10 +-
 include/net/netfilter/nf_conntrack_core.h          |   3 +-
 include/net/netfilter/nf_conntrack_expect.h        |  11 +-
 include/net/netfilter/nf_conntrack_zones.h         |  99 -
 include/net/netfilter/nft_dup.h                    |   9 +
 include/uapi/linux/netfilter/nf_tables.h           |  23 ++
 include/uapi/linux/netfilter/nfnetlink_conntrack.h |   1 +
 include/uapi/linux/netfilter/xt_CT.h               |   8 +-
 net/ipv4/netfilter/Kconfig                         |  12 ++
 net/ipv4/netfilter/Makefile                        |   3 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c     |   2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c       |   4 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c                |  17 +-
 net/ipv4/netfilter/nf_dup_ipv4.c                   | 120 +++
 net/ipv4/netfilter/nft_dup_ipv4.c                  | 110 ++
 net/ipv6/netfilter/Kconfig                         |  12 ++
 net/ipv6/netfilter/Makefile                        |   3 +
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c     |   2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c     |   5 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c          |  18 +-
 net/ipv6/netfilter/nf_dup_ipv6.c                   |  96 +
 net/ipv6/netfilter/nft_dup_ipv6.c                  | 108 ++
 net/netfilter/Kconfig                              |   2 +
 net/netfilter/ipvs/ip_vs_nfct.c                    |   2 +-
 net/netfilter/nf_conntrack_core.c                  | 134 ++--
 net/netfilter/nf_conntrack_expect.c                |  21 +-
 net/netfilter/nf_conntrack_netlink.c               | 228 ++--
[PATCH 06/15] netfilter: nft_limit: convert to token-based limiting at nanosecond granularity
Rework the limit expression to use a token-based limiting approach that
refills the bucket gradually. The tokens are calculated at nanosecond
granularity instead of jiffies to improve precision.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c | 42 ++
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index d0788e1..c79703e 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -20,10 +20,11 @@
 static DEFINE_SPINLOCK(limit_lock);

 struct nft_limit {
+	u64 last;
 	u64 tokens;
+	u64 tokens_max;
 	u64 rate;
-	u64 unit;
-	unsigned long stamp;
+	u64 nsecs;
 };

 static void nft_limit_pkts_eval(const struct nft_expr *expr,
@@ -31,18 +32,23 @@ static void nft_limit_pkts_eval(const struct nft_expr *expr,
 				const struct nft_pktinfo *pkt)
 {
 	struct nft_limit *priv = nft_expr_priv(expr);
+	u64 now, tokens, cost = div_u64(priv->nsecs, priv->rate);
+	s64 delta;

 	spin_lock_bh(&limit_lock);
-	if (time_after_eq(jiffies, priv->stamp)) {
-		priv->tokens = priv->rate;
-		priv->stamp = jiffies + priv->unit * HZ;
-	}
-
-	if (priv->tokens >= 1) {
-		priv->tokens--;
+	now = ktime_get_ns();
+	tokens = priv->tokens + now - priv->last;
+	if (tokens > priv->tokens_max)
+		tokens = priv->tokens_max;
+
+	priv->last = now;
+	delta = tokens - cost;
+	if (delta >= 0) {
+		priv->tokens = delta;
 		spin_unlock_bh(&limit_lock);
 		return;
 	}
+	priv->tokens = tokens;
 	spin_unlock_bh(&limit_lock);

 	regs->verdict.code = NFT_BREAK;
@@ -58,25 +64,29 @@ static int nft_limit_init(const struct nft_ctx *ctx,
 			  const struct nlattr * const tb[])
 {
 	struct nft_limit *priv = nft_expr_priv(expr);
+	u64 unit;

 	if (tb[NFTA_LIMIT_RATE] == NULL ||
 	    tb[NFTA_LIMIT_UNIT] == NULL)
 		return -EINVAL;

-	priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
-	priv->unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
-	priv->stamp = jiffies + priv->unit * HZ;
-	priv->tokens = priv->rate;
+	priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
+	unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
+	priv->nsecs = unit * NSEC_PER_SEC;
+	if (priv->rate == 0 || priv->nsecs < unit)
+		return -EOVERFLOW;
+	priv->tokens = priv->tokens_max = priv->nsecs;
+	priv->last = ktime_get_ns();
 	return 0;
 }

 static int nft_limit_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	const struct nft_limit *priv = nft_expr_priv(expr);
+	u64 secs = div_u64(priv->nsecs, NSEC_PER_SEC);

-	if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)))
-		goto nla_put_failure;
-	if (nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(priv->unit)))
+	if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate)) ||
+	    nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs)))
 		goto nla_put_failure;
 	return 0;

--
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
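For readers who want to experiment with the refill logic outside the kernel, here is a minimal user-space sketch of the same token bucket. It follows the arithmetic of the patch (tokens are measured in nanoseconds, each packet costs nsecs / rate), but `struct bucket`, `bucket_init()` and `bucket_over_limit()` are hypothetical names, the caller passes `now` explicitly instead of calling ktime_get_ns(), and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical user-space mirror of the fields the patch adds. */
struct bucket {
	uint64_t last;       /* time of the previous evaluation, in ns */
	uint64_t tokens;     /* tokens currently available, in ns */
	uint64_t tokens_max; /* bucket capacity, in ns */
	uint64_t rate;       /* packets allowed per interval */
	uint64_t nsecs;      /* interval length, in ns */
};

/* Start with a full bucket, as the init path of the patch does. */
static void bucket_init(struct bucket *b, uint64_t rate, uint64_t secs,
			uint64_t now)
{
	assert(rate != 0);
	b->rate = rate;
	b->nsecs = secs * NSEC_PER_SEC;
	b->tokens = b->tokens_max = b->nsecs;
	b->last = now;
}

/* Returns true when the packet exceeds the limit (the NFT_BREAK case). */
static bool bucket_over_limit(struct bucket *b, uint64_t now)
{
	uint64_t cost = b->nsecs / b->rate;
	uint64_t tokens = b->tokens + (now - b->last);
	int64_t delta;

	/* Elapsed time refills the bucket, capped at its capacity. */
	if (tokens > b->tokens_max)
		tokens = b->tokens_max;

	b->last = now;
	delta = (int64_t)(tokens - cost);
	if (delta >= 0) {
		b->tokens = (uint64_t)delta; /* enough tokens: pay the cost */
		return false;
	}
	b->tokens = tokens;                  /* not enough: keep the refill */
	return true;
}
```

With rate = 2 and an interval of one second, the first two packets at t = 0 pass, the third is over the limit, and one more packet passes once half a second has elapsed.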
[PATCH 04/15] netfilter: nf_tables: add nft_dup expression
This new expression uses the nf_dup engine to clone packets to a given gateway. Unlike xt_TEE, we use an index to indicate output interface which should be fine at this stage. Moreover, change to the preemtion-safe this_cpu_read(nf_skb_duplicated) from nf_dup_ipv{4,6} to silence a lockdep splat. Based on the original tee expression from Arturo Borrero Gonzalez, although this patch has diverted quite a bit from this initial effort due to the change to support maps. Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org --- include/net/netfilter/nft_dup.h |9 +++ include/uapi/linux/netfilter/nf_tables.h | 14 net/ipv4/netfilter/Kconfig |6 ++ net/ipv4/netfilter/Makefile |1 + net/ipv4/netfilter/nf_dup_ipv4.c |2 +- net/ipv4/netfilter/nft_dup_ipv4.c| 110 ++ net/ipv6/netfilter/Kconfig |6 ++ net/ipv6/netfilter/Makefile |1 + net/ipv6/netfilter/nf_dup_ipv6.c |2 +- net/ipv6/netfilter/nft_dup_ipv6.c| 108 + 10 files changed, 257 insertions(+), 2 deletions(-) create mode 100644 include/net/netfilter/nft_dup.h create mode 100644 net/ipv4/netfilter/nft_dup_ipv4.c create mode 100644 net/ipv6/netfilter/nft_dup_ipv6.c diff --git a/include/net/netfilter/nft_dup.h b/include/net/netfilter/nft_dup.h new file mode 100644 index 000..6b84cf6 --- /dev/null +++ b/include/net/netfilter/nft_dup.h @@ -0,0 +1,9 @@ +#ifndef _NFT_DUP_H_ +#define _NFT_DUP_H_ + +struct nft_dup_inet { + enum nft_registers sreg_addr:8; + enum nft_registers sreg_dev:8; +}; + +#endif /* _NFT_DUP_H_ */ diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index a99e6a9..2ef35f2 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -936,6 +936,20 @@ enum nft_redir_attributes { #define NFTA_REDIR_MAX (__NFTA_REDIR_MAX - 1) /** + * enum nft_dup_attributes - nf_tables dup expression netlink attributes + * + * @NFTA_DUP_SREG_ADDR: source register of address (NLA_U32: nft_registers) + * @NFTA_DUP_SREG_DEV: source register of output 
interface (NLA_U32: nft_register) + */ +enum nft_dup_attributes { + NFTA_DUP_UNSPEC, + NFTA_DUP_SREG_ADDR, + NFTA_DUP_SREG_DEV, + __NFTA_DUP_MAX +}; +#define NFTA_DUP_MAX (__NFTA_DUP_MAX - 1) + +/** * enum nft_gen_attributes - nf_tables ruleset generation attributes * * @NFTA_GEN_ID: Ruleset generation ID (NLA_U32) diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig index 0142ea2..690d27d 100644 --- a/net/ipv4/netfilter/Kconfig +++ b/net/ipv4/netfilter/Kconfig @@ -58,6 +58,12 @@ config NFT_REJECT_IPV4 default NFT_REJECT tristate +config NFT_DUP_IPV4 + tristate IPv4 nf_tables packet duplication support + select NF_DUP_IPV4 + help + This module enables IPv4 packet duplication support for nf_tables. + endif # NF_TABLES_IPV4 config NF_TABLES_ARP diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile index 9136ffc..87b073d 100644 --- a/net/ipv4/netfilter/Makefile +++ b/net/ipv4/netfilter/Makefile @@ -41,6 +41,7 @@ obj-$(CONFIG_NFT_CHAIN_NAT_IPV4) += nft_chain_nat_ipv4.o obj-$(CONFIG_NFT_REJECT_IPV4) += nft_reject_ipv4.o obj-$(CONFIG_NFT_MASQ_IPV4) += nft_masq_ipv4.o obj-$(CONFIG_NFT_REDIR_IPV4) += nft_redir_ipv4.o +obj-$(CONFIG_NFT_DUP_IPV4) += nft_dup_ipv4.o obj-$(CONFIG_NF_TABLES_ARP) += nf_tables_arp.o # generic IP tables diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c index eff85ab..b5bb375 100644 --- a/net/ipv4/netfilter/nf_dup_ipv4.c +++ b/net/ipv4/netfilter/nf_dup_ipv4.c @@ -69,7 +69,7 @@ void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum, { struct iphdr *iph; - if (__this_cpu_read(nf_skb_duplicated)) + if (this_cpu_read(nf_skb_duplicated)) return; /* * Copy the skb, and route the copy. 
Will later return %XT_CONTINUE for diff --git a/net/ipv4/netfilter/nft_dup_ipv4.c b/net/ipv4/netfilter/nft_dup_ipv4.c new file mode 100644 index 000..25419fb --- /dev/null +++ b/net/ipv4/netfilter/nft_dup_ipv4.c @@ -0,0 +1,110 @@ +/* + * Copyright (c) 2015 Pablo Neira Ayuso pa...@netfilter.org + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include linux/kernel.h +#include linux/init.h +#include linux/module.h +#include linux/netlink.h +#include linux/netfilter.h +#include linux/netfilter/nf_tables.h +#include net/netfilter/nf_tables.h +#include net/netfilter/ipv4/nf_dup_ipv4.h + +struct nft_dup_ipv4 { + enum nft_registers sreg_addr:8; + enum nft_registers sreg_dev:8; +}; + +static void nft_dup_ipv4_eval(const struct nft_expr
[PATCH 15/15] netfilter: nft_payload: work around vlan header stripping
From: Florian Westphal f...@strlen.de

make payload expression aware of the fact that VLAN offload may have
removed a vlan header.

When we encounter tagged skb, transparently insert the tag into the
register so that vlan header matching can work without userspace being
aware of offload features.

Signed-off-by: Florian Westphal f...@strlen.de
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_payload.c | 57 ++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 94fb3b2..09b4b07 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -9,6 +9,7 @@
  */

 #include <linux/kernel.h>
+#include <linux/if_vlan.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/netlink.h>
@@ -17,6 +18,53 @@
 #include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nf_tables.h>

+/* add vlan header into the user buffer for if tag was removed by offloads */
+static bool
+nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len)
+{
+	int mac_off = skb_mac_header(skb) - skb->data;
+	u8 vlan_len, *vlanh, *dst_u8 = (u8 *) d;
+	struct vlan_ethhdr veth;
+
+	vlanh = (u8 *) &veth;
+	if (offset < ETH_HLEN) {
+		u8 ethlen = min_t(u8, len, ETH_HLEN - offset);
+
+		if (skb_copy_bits(skb, mac_off, &veth, ETH_HLEN))
+			return false;
+
+		veth.h_vlan_proto = skb->vlan_proto;
+
+		memcpy(dst_u8, vlanh + offset, ethlen);
+
+		len -= ethlen;
+		if (len == 0)
+			return true;
+
+		dst_u8 += ethlen;
+		offset = ETH_HLEN;
+	} else if (offset >= VLAN_ETH_HLEN) {
+		offset -= VLAN_HLEN;
+		goto skip;
+	}
+
+	veth.h_vlan_TCI = htons(skb_vlan_tag_get(skb));
+	veth.h_vlan_encapsulated_proto = skb->protocol;
+
+	vlanh += offset;
+
+	vlan_len = min_t(u8, len, VLAN_ETH_HLEN - offset);
+	memcpy(dst_u8, vlanh, vlan_len);
+
+	len -= vlan_len;
+	if (!len)
+		return true;
+
+	dst_u8 += vlan_len;
+ skip:
+	return skb_copy_bits(skb, offset + mac_off, dst_u8, len) == 0;
+}
+
 static void nft_payload_eval(const struct nft_expr *expr,
 			     struct nft_regs *regs,
 			     const struct nft_pktinfo *pkt)
@@ -26,10 +74,18 @@ static void nft_payload_eval(const struct nft_expr *expr,
 	u32 *dest = &regs->data[priv->dreg];
 	int offset;

+	dest[priv->len / NFT_REG32_SIZE] = 0;
 	switch (priv->base) {
 	case NFT_PAYLOAD_LL_HEADER:
 		if (!skb_mac_header_was_set(skb))
 			goto err;
+
+		if (skb_vlan_tag_present(skb)) {
+			if (!nft_payload_copy_vlan(dest, skb,
+						   priv->offset, priv->len))
+				goto err;
+			return;
+		}
 		offset = skb_mac_header(skb) - skb->data;
 		break;
 	case NFT_PAYLOAD_NETWORK_HEADER:
@@ -43,7 +99,6 @@ static void nft_payload_eval(const struct nft_expr *expr,
 	}
 	offset += priv->offset;

-	dest[priv->len / NFT_REG32_SIZE] = 0;
 	if (skb_copy_bits(skb, offset, dest, priv->len) < 0)
 		goto err;
 	return;
--
1.7.10.4
[PATCH 05/15] netfilter: nft_limit: rename to nft_limit_pkts
To prepare introduction of bytes ratelimit support.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nft_limit.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c
index 435c1cc..d0788e1 100644
--- a/net/netfilter/nft_limit.c
+++ b/net/netfilter/nft_limit.c
@@ -26,9 +26,9 @@ struct nft_limit {
 	unsigned long stamp;
 };

-static void nft_limit_eval(const struct nft_expr *expr,
-			   struct nft_regs *regs,
-			   const struct nft_pktinfo *pkt)
+static void nft_limit_pkts_eval(const struct nft_expr *expr,
+				struct nft_regs *regs,
+				const struct nft_pktinfo *pkt)
 {
 	struct nft_limit *priv = nft_expr_priv(expr);

@@ -85,17 +85,17 @@ nla_put_failure:
 }

 static struct nft_expr_type nft_limit_type;
-static const struct nft_expr_ops nft_limit_ops = {
+static const struct nft_expr_ops nft_limit_pkts_ops = {
 	.type = &nft_limit_type,
 	.size = NFT_EXPR_SIZE(sizeof(struct nft_limit)),
-	.eval = nft_limit_eval,
+	.eval = nft_limit_pkts_eval,
 	.init = nft_limit_init,
 	.dump = nft_limit_dump,
 };

 static struct nft_expr_type nft_limit_type __read_mostly = {
 	.name = "limit",
-	.ops = &nft_limit_ops,
+	.ops = &nft_limit_pkts_ops,
 	.policy = nft_limit_policy,
 	.maxattr = NFTA_LIMIT_MAX,
 	.flags = NFT_EXPR_STATEFUL,
--
1.7.10.4
Re: [PATCH net-next v2 9/9] geneve: Implement rtnl changelink
On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar pshe...@nicira.com wrote:
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index e47cdd9..0d7fbef 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> -static int geneve_configure(struct net *net, struct net_device *dev,
> -                            __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
> -                            __u16 dst_port, bool metadata)
> +static int __geneve_configure(struct net *net, struct net_device *dev,
> +                              __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
> +                              __u16 dst_port, bool metadata)
>  {
[...]
>         geneve->net = net;
>         geneve->dev = dev;

I guess this stuff should really be in geneve_configure() - it seems a
bit odd to change it for a running device (even if it shouldn't
change).

>         geneve->remote.sin_addr.s_addr = rem_addr;
>         if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
>                 return -EINVAL;
> +       u32_to_vni(vni, geneve->vni);
>         list_for_each_entry(t, &gn->geneve_list, next) {
>                 if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
>                     rem_addr == t->remote.sin_addr.s_addr

I'm not sure that these types of operations are safe if the device is
already running. We first overwrite the remote value and then we do
error checking but that means that if there is an error, then the
device will be left in a broken state. Don't we also need to update
the hash table if some of these parameters change?

> +static int geneve_changelink(struct net_device *dev,
> +                             struct nlattr *tb[], struct nlattr *data[])
> +{
[...]
> -       if (data[IFLA_GENEVE_PORT])
> -               dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
> +       if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
> +                            metadata != geneve->collect_md)) {

It seems like in an ideal world, we wouldn't need to recreate the
socket if metadata collection changed (assuming that there are no new
conflicts).
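The concern about overwriting live state before the error checks can be illustrated with a validate-then-commit sketch. This is a generic user-space illustration, not geneve code: `struct tunnel_cfg`, `is_multicast()` and `tunnel_change()` are hypothetical, addresses are kept in host byte order for simplicity, and the choice of -EBUSY for a conflicting tunnel is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a tunnel's configuration (host byte order). */
struct tunnel_cfg {
	uint32_t remote;
	uint32_t vni;
	uint16_t dst_port;
};

/* IPv4 multicast is 224.0.0.0/4. */
static bool is_multicast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/* Validate the requested configuration against the other tunnels first and
 * only then copy it over the live one, so a failed request changes nothing. */
static int tunnel_change(struct tunnel_cfg *live, const struct tunnel_cfg *req,
			 const struct tunnel_cfg *others, size_t n_others)
{
	size_t i;

	assert(live && req);
	if (is_multicast(req->remote))
		return -EINVAL;

	for (i = 0; i < n_others; i++) {
		if (others[i].vni == req->vni &&
		    others[i].remote == req->remote)
			return -EBUSY;
	}

	*live = *req; /* commit point: every check has already passed */
	return 0;
}
```

Because the copy happens after the last check, a rejected request leaves the running configuration exactly as it was. Any lookup structure keyed on these fields would still need to be updated at the same commit point.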
Re: [PATCH net-next v3] rocker: add debugfs support to dump internal tables
On Tue, Aug 18, 2015 at 1:47 PM, David Miller da...@davemloft.net wrote:
From: Scott Feldman sfel...@gmail.com
Date: Tue, 18 Aug 2015 13:37:56 -0700

Hi Scott

David is not so keen no debugfs stuff. He already NACKed adding more
than what is currently in DSA: https://lkml.org/lkml/2015/7/11/8

That patch added writable debugfs files, which I can see might be used
as a back-door to program hardware. That does seem bad.

I fully agreed with respect to write. But if you read the whole
message, David is also not happy with read only. I think before you
spend too much more time on this, you need some indication from David
if he is going to merge it or not.

David, please give us guidance on debugfs in drivers/net. Is there
some criteria we can define to know when it's OK to use debugfs?

The less you use it the better, seriously. I see some drivers where
the foo_debugfs.c file is larger than the rest of the driver. Once
people start using it, it's like crack, and they dump every single
debugging widget they found useful at some point into there. This is
not what we want.

Most things I see in debugfs support was probably useful for debugging
one particular bug but then it was never really useful again in the
future. Those kinds of things can be done locally in someone's tree.

I often see various kinds of statistics ending up in these things, or
register dumps, both of which are 'ethtool' or similar material.

# git grep debugfs_create drivers/net

^^^ this is scary. I see some crazy things being done here. The
writable nodes look like workaround driver/device bugs or to provide
backdoor interfaces that don't exist natively.

I say we clean up this mess. Just eliminating the writable files would
force bugs to get fixed and get new interfaces defined. And replace
readable files when interface exist (stats/reg). Finally, look for
readable files that can be converted to new shared common interfaces.
What's left should be read-only (S_IRUGO) files (no binary blobs)
containing data unique for driver/device useful for field
troubleshooting.

I'm motivated. Next net-next cycle I'm going to go down the list with
a big eraser. I'm sure I'll be a popular guy.

-scott
[PATCH net-next] 3c59x: Add BQL support for 3c59x ethernet driver.
---
 drivers/net/ethernet/3com/3c59x.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 753887d..2839af0 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -1726,6 +1726,7 @@ vortex_up(struct net_device *dev)
 	if (vp->cb_fn_base)			/* The PCMCIA people are idiots. */
 		iowrite32(0x8000, vp->cb_fn_base + 4);
 	netif_start_queue (dev);
+	netdev_reset_queue(dev);
 err_out:
 	return err;
 }
@@ -1935,16 +1936,18 @@ static void vortex_tx_timeout(struct net_device *dev)
 		if (vp->cur_tx - vp->dirty_tx > 0 && ioread32(ioaddr + DownListPtr) == 0)
 			iowrite32(vp->tx_ring_dma + (vp->dirty_tx % TX_RING_SIZE) * sizeof(struct boom_tx_desc),
 				 ioaddr + DownListPtr);
-		if (vp->cur_tx - vp->dirty_tx < TX_RING_SIZE)
+		if (vp->cur_tx - vp->dirty_tx < TX_RING_SIZE) {
 			netif_wake_queue (dev);
+			netdev_reset_queue (dev);
+		}
 		if (vp->drv_flags & IS_BOOMERANG)
 			iowrite8(PKT_BUF_SZ>>8, ioaddr + TxFreeThreshold);
 		iowrite16(DownUnstall, ioaddr + EL3_CMD);
 	} else {
 		dev->stats.tx_dropped++;
 		netif_wake_queue(dev);
+		netdev_reset_queue(dev);
 	}
-
 	/* Issue Tx Enable */
 	iowrite16(TxEnable, ioaddr + EL3_CMD);
 	dev->trans_start = jiffies; /* prevent tx timeout */
@@ -2063,6 +2066,7 @@ vortex_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct vortex_private *vp = netdev_priv(dev);
 	void __iomem *ioaddr = vp->ioaddr;
+	int skblen = skb->len;
 
 	/* Put out the doubleword header... */
 	iowrite32(skb->len, ioaddr + TX_FIFO);
@@ -2094,6 +2098,7 @@ vortex_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
+	netdev_sent_queue(dev, skblen);
 
 	/* Clear the Tx status stack. */
 	{
@@ -2125,6 +2130,7 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	void __iomem *ioaddr = vp->ioaddr;
 	/* Calculate the next Tx descriptor entry. */
 	int entry = vp->cur_tx % TX_RING_SIZE;
+	int skblen = skb->len;
 	struct boom_tx_desc *prev_entry = &vp->tx_ring[(vp->cur_tx-1) % TX_RING_SIZE];
 	unsigned long flags;
 	dma_addr_t dma_addr;
@@ -2230,6 +2236,8 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	vp->cur_tx++;
+	netdev_sent_queue(dev, skblen);
+
 	if (vp->cur_tx - vp->dirty_tx > TX_RING_SIZE - 1) {
 		netif_stop_queue (dev);
 	} else {					/* Clear previous interrupt enable. */
@@ -2267,6 +2275,7 @@ vortex_interrupt(int irq, void *dev_id)
 	int status;
 	int work_done = max_interrupt_work;
 	int handled = 0;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
 
 	ioaddr = vp->ioaddr;
 	spin_lock(&vp->lock);
@@ -2314,6 +2323,8 @@ vortex_interrupt(int irq, void *dev_id)
 			if (ioread16(ioaddr + Wn7_MasterStatus) & 0x1000) {
 				iowrite16(0x1000, ioaddr + Wn7_MasterStatus); /* Ack the event. */
 				pci_unmap_single(VORTEX_PCI(vp), vp->tx_skb_dma, (vp->tx_skb->len + 3) & ~3, PCI_DMA_TODEVICE);
+				pkts_compl++;
+				bytes_compl += vp->tx_skb->len;
 				dev_kfree_skb_irq(vp->tx_skb); /* Release the transferred buffer */
 				if (ioread16(ioaddr + TxFree) > 1536) {
 					/*
@@ -2358,6 +2369,7 @@ vortex_interrupt(int irq, void *dev_id)
 		iowrite16(AckIntr | IntReq | IntLatch, ioaddr + EL3_CMD);
 	} while ((status = ioread16(ioaddr + EL3_STATUS)) & (IntLatch | RxComplete));
 
+	netdev_completed_queue(dev, pkts_compl, bytes_compl);
 	spin_unlock(&vp->window_lock);
 
 	if (vortex_debug > 4)
@@ -2382,6 +2394,7 @@ boomerang_interrupt(int irq, void *dev_id)
 	int status;
 	int work_done = max_interrupt_work;
 	int handled = 0;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
 
 	ioaddr = vp->ioaddr;
 
@@ -2455,6 +2468,8 @@ boomerang_interrupt(int irq, void *dev_id)
 					pci_unmap_single(VORTEX_PCI(vp),
 						le32_to_cpu(vp->tx_ring[entry].addr), skb->len, PCI_DMA_TODEVICE);
 #endif
+					pkts_compl++;
+					bytes_compl += skb->len;
 					dev_kfree_skb_irq(skb);
 					vp->tx_skbuff[entry] = NULL;
Re: linux-next: unregister_netdevice: waiting for lo to become free. Usage count = 1
2015-08-18 18:27 GMT+03:00 David Ahern <d...@cumulusnetworks.com>:
> On 8/18/15 9:24 AM, Andrey Wagin wrote:
>> Hello David,
>>
>> CRIU tests detect that references on net devices leak on
>> 4.2.0-rc6-next-20150817. Looks like it started with
>> v4.2-rc6-882-g3bfd847.
>
> 1e3136789975f03e461798149309034e5213c1b4 should have fixed it.

Yes, it works now. Thanks!

> David
Re: [PATCH 20/21] net: warn if drivers set tx_queue_len = 0
On Wed, 2015-08-19 at 13:31 -0700, Eric Dumazet wrote:
> lpaa5:~# tc qd sh dev eth1
> qdisc mq 0: root
> qdisc fq 0: parent :4 limit 1p flow_limit 1000p buckets 1024 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3028 initial_quantum 15140
> qdisc fq 0: parent :3 limit 1p flow_limit 1000p buckets 1024 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3028 initial_quantum 15140
> qdisc fq 0: parent :2 limit 1p flow_limit 1000p buckets 1024 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3028 initial_quantum 15140
> qdisc fq 0: parent :1 limit 1p flow_limit 1000p buckets 1024 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3028 initial_quantum 15140

Well, it seems I just leaked the fact that we use 3 bands in our fq implementation ;)
Re: [PATCH v2 net-next 10/13] vxlan: do not shadow flags variable
On 08/19/15 at 12:10pm, Jiri Benc wrote:
> The 'flags' variable is already defined in the outer scope.
>
> Signed-off-by: Jiri Benc <jb...@redhat.com>

Acked-by: Thomas Graf <tg...@suug.ch>
Re: [PATCH v2 net-next 04/13] ip_tunnels: add IPv6 addresses to ip_tunnel_key
On Wed, Aug 19, 2015 at 12:09:54PM +0200, Jiri Benc wrote:
> Add the IPv6 addresses as a union with IPv4 ones. When using IPv4, the
> newly introduced padding after the IPv4 addresses needs to be zeroed out.
>
> Signed-off-by: Jiri Benc <jb...@redhat.com>
> ---
> v1->v2: Fix incorrect IP_TUNNEL_KEY_IPV4_PAD_LEN calculation, thanks to Alexei.

Acked-by: Alexei Starovoitov <a...@plumgrid.com>
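Why the padding has to be zeroed can be shown with a userspace sketch. The layout and names below (`struct tun_key`, `TUN_KEY_IPV4_PAD`, `TUN_KEY_IPV4_PAD_LEN`, `tun_key_init_ipv4()`) are hypothetical and only mirror the idea described in the patch, not the kernel's actual `struct ip_tunnel_key`. Once the IPv4 and IPv6 endpoints share storage, an IPv4 key leaves the tail of the union unused; if keys are compared bytewise, that tail must hold a known value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout: IPv4 and IPv6 endpoints share storage. */
struct tun_key {
	uint64_t tun_id;
	union {
		struct {
			uint32_t src;
			uint32_t dst;
		} ipv4;
		struct {
			uint8_t src[16];
			uint8_t dst[16];
		} ipv6;
	} u;
};

/* Offset of the first union byte that an IPv4 key does not use. */
#define TUN_KEY_IPV4_PAD \
	(offsetof(struct tun_key, u) + sizeof(((struct tun_key *)0)->u.ipv4))
/* Number of union bytes that an IPv4 key does not use. */
#define TUN_KEY_IPV4_PAD_LEN \
	(sizeof(((struct tun_key *)0)->u) - sizeof(((struct tun_key *)0)->u.ipv4))

static void tun_key_init_ipv4(struct tun_key *key, uint64_t id,
			      uint32_t src, uint32_t dst)
{
	key->tun_id = id;
	key->u.ipv4.src = src;
	key->u.ipv4.dst = dst;
	/* Zero the unused tail so bytewise comparison of the union is stable. */
	memset((unsigned char *)key + TUN_KEY_IPV4_PAD, 0, TUN_KEY_IPV4_PAD_LEN);
}

/* Compare two keys; the address union is compared as raw bytes. */
static int tun_key_equal(const struct tun_key *a, const struct tun_key *b)
{
	return a->tun_id == b->tun_id &&
	       memcmp(&a->u, &b->u, sizeof(a->u)) == 0;
}
```

Without the `memset()`, two IPv4 keys built in differently initialised memory could compare as unequal even though their addresses match.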
RE: Clarification on rtnetlink requests
> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of David Chappelle
> Sent: Wednesday, August 19, 2015 8:05 AM
> To: netdev@vger.kernel.org
> Subject: Clarification on rtnetlink requests
>
> I am a bit confused with respect to the structure of rtnetlink requests.
> It seems that in some circumstances a request can look like:
>
> struct request {
> 	struct nlmsghdr header;
> 	struct rtgenmsg body;
> };
>
> and in other cases it can look like:
>
> struct request {
> 	struct nlmsghdr header;
> 	struct ifinfomsg body;
> };
>
> How do I know which one to use when sending RTM_GETLINK and RTM_GETADDR
> requests? Furthermore, it also seems that 'struct rtattr' can be specified
> at the end of the request as well. Is there any documentation that
> describes this?

RTM_GETLINK uses ifinfomsg and RTM_GETADDR uses ifaddrmsg, see man 7 rtnetlink. struct rtgenmsg is just a generic type, look at include/linux/rtnetlink.h

-Anish
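As a minimal sketch of the answer above, the two request layouts can be written out as follows. The struct names `link_request` and `addr_request` and the two builder functions are made up for illustration; the header types, message types and flags come from `<linux/netlink.h>` and `<linux/rtnetlink.h>`. The sketch only fills in the buffers; sending them over an `AF_NETLINK` socket and appending any `struct rtattr` attributes after the fixed body are left out.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>

/* RTM_GETLINK: netlink header followed by a struct ifinfomsg body. */
struct link_request {
	struct nlmsghdr header;
	struct ifinfomsg body;
};

/* RTM_GETADDR: netlink header followed by a struct ifaddrmsg body. */
struct addr_request {
	struct nlmsghdr header;
	struct ifaddrmsg body;
};

/* Fill in a dump request for all links. */
static void build_getlink_dump(struct link_request *req, unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->header.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->header.nlmsg_type = RTM_GETLINK;
	req->header.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->header.nlmsg_seq = seq;
	req->body.ifi_family = AF_UNSPEC;
}

/* Fill in a dump request for all addresses of one address family. */
static void build_getaddr_dump(struct addr_request *req, unsigned int seq,
			       unsigned char family)
{
	memset(req, 0, sizeof(*req));
	req->header.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
	req->header.nlmsg_type = RTM_GETADDR;
	req->header.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->header.nlmsg_seq = seq;
	req->body.ifa_family = family;
}
```

The body type is chosen by the message type in `nlmsg_type`, which is why the same `struct nlmsghdr` can precede either `ifinfomsg` or `ifaddrmsg`.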
Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment
Hello.

On 08/19/2015 03:59 PM, Emmanuel Grumbach wrote:

> This allows to release the backpressure on the socket only
> when the last segment is released.
> Now the truesize looks like this:
> if the truesize of the original skb is 65420, all the
> segments will have a truesize of 704 (skb itself) and the
> last one will have 65420.
>
> Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e
> Signed-off-by: Emmanuel Grumbach <emmanuel.grumb...@intel.com>
> ---
>  drivers/net/wireless/iwlwifi/mvm/tx.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
> index 5046833..046e50d 100644
> --- a/drivers/net/wireless/iwlwifi/mvm/tx.c
> +++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
[...]
> @@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
>  		}
>
>  		__skb_queue_tail(mpdus_skb, skb);
> +		sum_truesize += skb->truesize;
> +	}
> +
> +	/* Release the backpressure on the socket only when
> +	 * the last segment is released.
> +	 */
> +	if (skb_gso->destructor == sock_wfree) {
> +		struct sk_buff *tail = mpdus_skb->prev;
> +
> +		swap(tail->truesize, skb_gso->truesize);
> +		swap(tail->destructor, skb_gso->destructor);
> +		swap(tail->sk, skb_gso->sk);
> +                atomic_add(sum_truesize - skb_gso->truesize,

   Please indent using tabs, not spaces.

> +			   &skb_gso->sk->sk_wmem_alloc);
>  	}
>
>  	ret = 0;

MBR, Sergei
Re: [RFC v2 3/3] iwlwifi: mvm: transfer the truesize to the last TSO segment
On 08/19/2015 05:24 PM, Eric Dumazet wrote:
> On Wed, 2015-08-19 at 15:59 +0300, Emmanuel Grumbach wrote:
>> This allows to release the backpressure on the socket only
>> when the last segment is released.
>> Now the truesize looks like this:
>> if the truesize of the original skb is 65420, all the
>> segments will have a truesize of 704 (skb itself) and the
>> last one will have 65420.
>>
>> Change-Id: I3c894cf2afc0aedfe7b2a5b992ba41653ff79c0e
>> Signed-off-by: Emmanuel Grumbach <emmanuel.grumb...@intel.com>
>> ---
>>  drivers/net/wireless/iwlwifi/mvm/tx.c | 17 ++++++++++++++++-
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
>> index 5046833..046e50d 100644
>> --- a/drivers/net/wireless/iwlwifi/mvm/tx.c
>> +++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
>> @@ -764,7 +764,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
>>  	bool ipv6 = skb_shinfo(skb_gso)->gso_type & SKB_GSO_TCPV6;
>>  	struct iwl_lso_splitter s = {};
>>  	struct page *hdr_page;
>> -	unsigned int mpdu_sz;
>> +	unsigned int mpdu_sz, sum_truesize = 0;
>>  	u8 *hdr_page_pos, *qc, tid;
>>  	int i, ret;
>>
>> @@ -898,6 +898,7 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
>>  			mpdu_sz, tcp_hdrlen(skb_gso));
>>
>>  	__skb_queue_tail(mpdus_skb, skb_gso);
>> +	sum_truesize += skb_gso->truesize;
>>
>>  	/* mss bytes have been consumed from the data */
>>  	s.gso_payload_pos = s.mss;
>> @@ -1034,6 +1035,20 @@ static int iwl_mvm_tx_tso(struct iwl_mvm *mvm, struct sk_buff *skb_gso,
>>  		}
>>
>>  		__skb_queue_tail(mpdus_skb, skb);
>> +		sum_truesize += skb->truesize;
>> +	}
>> +
>> +	/* Release the backpressure on the socket only when
>> +	 * the last segment is released.
>> +	 */
>> +	if (skb_gso->destructor == sock_wfree) {
>> +		struct sk_buff *tail = mpdus_skb->prev;
>> +
>> +		swap(tail->truesize, skb_gso->truesize);
>> +		swap(tail->destructor, skb_gso->destructor);
>> +		swap(tail->sk, skb_gso->sk);
>> +		atomic_add(sum_truesize - skb_gso->truesize,
>> +			   &skb_gso->sk->sk_wmem_alloc);
>>  	}
>>
>>  	ret = 0;
>
> Using existing net/core/tso.c helpers would avoid using this.

Hm.. how would net/core/tso.c avoid this? I can't see anything related to truesize there. Note that this works since it is guaranteed that we release the skbs in order.

> (BTW TCP packets do not have sock_wfree as destructor but tcp_wfree(),
> yet we want backpressure mostly for TCP stack (TCP Small Queues))

I am not sure I follow here. You want me to test:

	if (skb_gso->destructor == tcp_wfree) ?

I checked that code using iperf and saw that I don't get into this if, but I (probably wrongly) assumed that other applications would set a flag on the socket (forgive my ignorance) that would make this if be taken.
Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
On Wed, Aug 19, 2015 at 11:49 AM, Pravin Shelar <pshe...@nicira.com> wrote:
> On Wed, Aug 19, 2015 at 11:37 AM, Jesse Gross <je...@nicira.com> wrote:
>> On Wed, Aug 19, 2015 at 11:29 AM, Pravin Shelar <pshe...@nicira.com> wrote:
>>> On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross <je...@nicira.com> wrote:
>>>> My guess is that if the issue from the earlier patch about overlapping
>>>> collect_md tunnels is fixed then that might allow us to simplify
>>>> things a little further, since for those tunnels we can assume there
>>>> is a 1:1 mapping between collect_md tunnels and sockets.
>>>
>>> I don't see how it would be different. Can you elaborate on this?
>>
>> Mostly just conceptually simpler. Right now it looks like we are doing
>> some kind of refcounting between devices and tunnels in
>> geneve_open/stop (I know it's not really but it appears like that in
>> some ways.) We could just directly assign collect_md in geneve_open()
>> and do nothing at all in geneve_stop().
>
> If you look at next patch, I have changed geneve_open and stop
> further. The change is geneve_open adds tunnel to hash table so that
> only devices which are open are in hash table. Since geneve_open and
> stop is common for both type of tunnel I do not think there can be any
> changes even after avoiding overlapping tunnel types in given socket.

I guess I'm not sure why with the later changes it would be incompatible. All I'm talking about is something pretty small:

geneve_open:
	if (geneve->collect_md)
		gs->collect_md = true;
to
	gs->collect_md = geneve->collect_md;

geneve_close: remove
	if (geneve->collect_md)
		gs->collect_md = false;
since the socket is about to be freed anyways.

It's not very different in practice but it looks less like refcounting and more like a 1:1 mapping.
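The two forms of the open path being compared can be sketched outside the kernel. The structs below are simplified stand-ins, not the driver's real device and socket types, and the sketch assumes the 1:1 device-to-socket mapping discussed in this thread, with the socket flag starting out false when the device is opened.

```c
#include <stdbool.h>

/* Simplified stand-in for the per-socket state. */
struct sock_state {
	bool collect_md;
};

/* Simplified stand-in for the per-device state. */
struct dev_state {
	bool collect_md;
	struct sock_state *sock;
};

/* Conditional form: only ever sets the flag, so close has to clear it. */
static void open_conditional(struct dev_state *dev)
{
	if (dev->collect_md)
		dev->sock->collect_md = true;
}

/* Direct form: the socket simply mirrors its single device. */
static void open_direct(struct dev_state *dev)
{
	dev->sock->collect_md = dev->collect_md;
}
```

Under those assumptions both functions leave the socket in the same state; the direct form just makes the 1:1 relationship explicit and needs no matching cleanup on close.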