pull-request: wireless-drivers 2015-06-01
Hi Dave,

here are three more important fixes I'm hoping to get to 4.1 still, I hope I'm not too late with these. Please let me know if there are any problems.

Kalle

The following changes since commit aefa441b150279dd8d25658e018898a3fe9a6769:

  Merge tag 'iwlwifi-for-kalle-2015-05-21' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes (2015-05-22 10:47:02 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2015-06-01

for you to fetch changes up to 38fe44e61a894f1c7b3e60b0614030271070ea53:

  Merge tag 'iwlwifi-for-kalle-2015-05-28' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes (2015-05-28 16:28:03 +0300)

iwlwifi:

* fix OTP parsing 8260
* fix powersave handling for 8260

brcmfmac:

* fix null pointer crash

Arend van Spriel (1):
      brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails

Ilan Peer (1):
      iwlwifi: pcie: fix tracking of cmd_in_flight

Kalle Valo (1):
      Merge tag 'iwlwifi-for-kalle-2015-05-28' of https://git.kernel.org/.../iwlwifi/iwlwifi-fixes

Liad Kaufman (1):
      iwlwifi: nvm: fix otp parsing in 8000 hw family

 drivers/net/wireless/brcm80211/brcmfmac/msgbuf.c | 12 +--
 drivers/net/wireless/iwlwifi/iwl-nvm-parse.c     |  2 +-
 drivers/net/wireless/iwlwifi/pcie/internal.h     |  6 +++---
 drivers/net/wireless/iwlwifi/pcie/trans.c        |  4 ++--
 drivers/net/wireless/iwlwifi/pcie/tx.c           | 23 +-
 5 files changed, 20 insertions(+), 27 deletions(-)

-- 
Kalle Valo
-- 
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [cdc_ncm] guidance and help refactoring cdc_ncm
Hello Greg, hello everyone reading.
I am sorry if I wasn't specific enough; I'll be this time! :) Or at least I'll try.

On Mon, 1 Jun 2015, Greg KH wrote:

==Date: Mon, 1 Jun 2015 02:59:17
==From: Greg KH g...@kroah.com
==To: Enrico Mioso mrkiko...@gmail.com
==Cc: linux-...@vger.kernel.org, netdev@vger.kernel.org,
==Oliver Neukum oli...@neukum.org
==Subject: Re: [cdc_ncm] guidance and help refactoring cdc_ncm
==
==On Sun, May 31, 2015 at 04:37:11PM +0200, Enrico Mioso wrote:
== Hello guys.
== I am writing to you all to ask for help and assistance in refactoring the
== cdc_ncm driver to support newer devices.
== In particular, I would need step-by-step guidance in doing this, but any
== other kind of help would be greatly appreciated.
==
== 1 - What we need:
== We would need to refactor the driver to be able to re-order parts of the NCM
== package itself.
== In particular, since a single NCM frame is composed of different parts, we
== would need more flexibility in changing their order.
==
==Do you have hardware that needs this now? What exactly needs to be done
==here that currently doesn't work?

Yes, there is hardware that currently doesn't work with the current code.
In particular, I am referring to Huawei 3G/4G modems:
- Huawei E3131: does not work with some firmware versions, works with others (older ones)
- Huawei E3372 (LTE modem): does not work.

I received various mail messages from people trying to configure different
devices that aren't working, and the situation is partly confusing since
devices with very similar product names are sometimes quite different, or
derive from different hardware branches.

Regarding what needs to be done: it's important to note that these devices
follow a USB specification. The Network Control Model spec, found here:
http://www.usb.org/developers/docs/devclass_docs/NCM10_012011.zip
aims to provide more efficient networking over USB, for example by batching frames.
The fundamental packet unit here is the NCM package, which can hold several
Ethernet-style frames. The device and the host negotiate certain frame
characteristics up front.
The specification doesn't mandate where some parts of the package should be
placed: in fact, they can be put in somewhat arbitrary places. This is true
for the NDP part: as I understand it, we currently put it near the beginning
of the frame. Some Huawei devices started to require it at the end, ignoring
the frame otherwise.

==
== 2 - What might be nice
== To do so, it would be nice to have the driver queue up frames, sending them
== out as needed. This already happens to a certain extent, but the NCM package
== is created in the process and updated along the way, as I understood the code.
== The best thing would be to have the NCM package created only right before
== sending it out, to achieve the best performance and code readability.
==
==Would this really make things faster?

Probably it would, depending on the setup we are considering. In a standard
setup where these devices are connected to a laptop or a PC with relatively
high resources, it would not make much difference. But it's not so unusual
to see these devices coupled with small devices, and there it could make a
difference in some cases.
But this is not my main goal: getting things working faster is good, but
right now I would just like to see them working, so I am trying to gather
help, information, guidance and code in general, in case I need to try this
myself in the future.

==
== I already contacted some of you privately to get more insight into what
== needs to be done, and to better understand how to organize the effort. I
== unfortunately lack the time to do this right now, and in fact I can't even be
== sure I will be able to do this, due to various problems (my thesis, my life in
== general).
== But gathering more information and in general trying to get some help is
== the best thing I feel like doing right now.
==
== The compelling reasons I find for trying to fix the situation are:
== 1 - The fact these drivers are used in different products integrating or
== interfacing with 3G/4G technologies.
==
==Is there hardware that has out-of-tree drivers that implement what you
==are referring to here? Or does someone just want this to make the
==hardware work better?
==
==I think we need more specifics before being able to determine exactly
==what needs to be done.
==
==thanks,
==
==greg k-h
==

Thank you, Greg, for your valuable help.
Once again: some devices will just not work. There is an out-of-tree vendor
driver implementing what I am referring to: it contains code to work with
many different Huawei devices, but only the NCM-related parts would be of use
in this scenario. Other devices are already supported and working in the
kernel. The driver can be found here:
Re: 2.6.32.66 tcp regression OOPs
[cc: Willy Tarreau]

On Mon, Jun 1, 2015 at 3:26 AM, starlight.201...@binnacle.cx wrote:
> Hello,
>
> Apologies if I have submitted to the wrong lists.
>
> Encountered a regression in 2.6.32.66 relative to 2.6.32.65.
> Crash eight minutes after boot.
>
> Will respond with additional details if the OOPS is not sufficient.
>
> Best Regards

Did you bisect it?
Re: pull-request: mac80211-next 2015-05-29
On Sun, 2015-05-31 at 17:35 -0700, David Miller wrote:
> Pulled, but there was a small conflict in include/net/mac80211.h which
> seemed trivially resolvable.
>
> Take a look and send me a fixup if I got it wrong, thanks.

Looks good - it seems I managed to apply two different docbook fixes to
the two different trees ... Talk about being confused :)

johannes
Re: [cdc_ncm] guidance and help refactoring cdc_ncm
Thank you Oliver, thank you all for reading this thread and for the attention.
For a more detailed discussion and how we got here, you might google for the
threads "Is this 32-bit NCM?" and "Is this 32-bit NCM?y" (follow-up), where
the "y" comes from a mistake of mine.

The specification only mandates the position of the NTH (header). The rest
can, in general, be in any order and position. This works with most devices,
except, of course, those from Huawei.

From my perspective, our aggregate looks something like this (anyone correct
me if I'm wrong):

NTH: header
NDP: contains indexing information
ethernet packet 1
ethernet packet 2
...
ethernet packet n

While it should look like:

NTH: header
ethernet packet 1
ethernet packet 2
...
ethernet packet n
NDP: contains indexing information

But when introducing such a change, you might break other devices that
currently work. In fact, there are clearly multiple vendors producing NCM
devices, as you can also see by looking at the driver's comments. So, in
general, we should be able to dynamically change the way in which the driver
orders things in the package, and that's why I initially proposed the
redesign.

Thank you guys for your patience and time.
Enrico
[PATCH v4 21/25] time/posix-clock:Convert to the 64bit methods for k_clock and posix_clock_operations structure
This patch converts the posix clock operations over to the new methods with
timespec64/itimerspec64 type to make them ready for 2038, and it is based on
the ptp patch series.

It also switches to the 64-bit methods of the k_clock structure, converting
the timespec/itimerspec types to timespec64/itimerspec64.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 drivers/ptp/ptp_clock.c     | 26 --
 include/linux/posix-clock.h | 10 +-
 kernel/time/posix-clock.c   | 20 ++--
 3 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index bee8270..8c086e7 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,32 +97,24 @@ static s32 scaled_ppm_to_ppb(long ppm)
 
 /* posix clock implementation */
 
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
 {
 	tp->tv_sec = 0;
 	tp->tv_nsec = 1;
 	return 0;
 }
 
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc,
+			     const struct timespec64 *tp)
 {
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-	struct timespec64 ts = timespec_to_timespec64(*tp);
-
-	return ptp->info->settime64(ptp->info, &ts);
+	return ptp->info->settime64(ptp->info, tp);
 }
 
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
 {
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-	struct timespec64 ts;
-	int err;
-
-	err = ptp->info->gettime64(ptp->info, &ts);
-	if (!err)
-		*tp = timespec64_to_timespec(ts);
-
-	return err;
+	return ptp->info->gettime64(ptp->info, tp);
 }
 
 static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
@@ -134,8 +126,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
 	ops = ptp->info;
 
 	if (tx->modes & ADJ_SETOFFSET) {
-		struct timespec ts;
-		ktime_t kt;
+		struct timespec64 ts;
 		s64 delta;
 
 		ts.tv_sec = tx->time.tv_sec;
@@ -147,8 +138,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
 		if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
 			return -EINVAL;
 
-		kt = timespec_to_ktime(ts);
-		delta = ktime_to_ns(kt);
+		delta = timespec64_to_ns(&ts);
 		err = ops->adjtime(ops, delta);
 	} else if (tx->modes & ADJ_FREQUENCY) {
 		s32 ppb = scaled_ppm_to_ppb(tx->freq);
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index 34c4498..fd7e22c 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
 
 	int (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
 
-	int (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+	int (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
 
-	int (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+	int (*clock_getres)(struct posix_clock *pc, struct timespec64 *ts);
 
 	int (*clock_settime)(struct posix_clock *pc,
-			     const struct timespec *ts);
+			     const struct timespec64 *ts);
 
 	int (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
 
 	int (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
 
 	void (*timer_gettime)(struct posix_clock *pc,
-			      struct k_itimer *kit, struct itimerspec *tsp);
+			      struct k_itimer *kit, struct itimerspec64 *tsp);
 
 	int (*timer_settime)(struct posix_clock *pc,
 			     struct k_itimer *kit, int flags,
-			     struct itimerspec *tsp, struct itimerspec *old);
+			     struct itimerspec64 *tsp, struct itimerspec64 *old);
 	/*
 	 * Optional character device methods:
 	 */
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ce033c7..e21e4c1 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -297,7 +297,7 @@ out:
 	return err;
 }
 
-static int pc_clock_gettime(clockid_t id, struct timespec *ts)
+static int pc_clock_gettime(clockid_t id, struct timespec64 *ts)
 {
 	struct posix_clock_desc cd;
 	int err;
@@ -316,7 +316,7 @@ static int pc_clock_gettime(clockid_t id, struct timespec *ts)
 	return err;
 }
 
-static int pc_clock_getres(clockid_t id, struct timespec *ts)
+static int pc_clock_getres(clockid_t id, struct timespec64 *ts)
 {
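The `ADJ_SETOFFSET` path in `ptp_clock_adjtime()` turns a user-supplied offset into a 64-bit timespec, rejects an out-of-range `tv_nsec`, and converts it to a signed nanosecond delta. A minimal userspace sketch of that validation follows, assuming a hypothetical `struct ts64` standing in for the kernel's `struct timespec64`, and assuming the fractional part is in microseconds unless a `nano` flag (mirroring `ADJ_NANO`) is set. Unlike later kernels, it does no overflow clamping for extreme second values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for the kernel's struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Same arithmetic as timespec64_to_ns(), without overflow clamping. */
static int64_t ts64_to_ns(const struct ts64 *ts)
{
	return ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Validate an ADJ_SETOFFSET-style offset and convert it to nanoseconds.
 * 'frac' is microseconds unless 'nano' is true (assumed ADJ_NANO analogue).
 */
static int setoffset_to_delta(int64_t sec, long frac, bool nano, int64_t *delta)
{
	int64_t nsec = nano ? frac : (int64_t)frac * 1000;
	struct ts64 ts;

	/* The unsigned cast also rejects negative values in one comparison. */
	if ((uint64_t)nsec >= (uint64_t)NSEC_PER_SEC)
		return -EINVAL;

	ts.tv_sec = sec;
	ts.tv_nsec = (long)nsec;
	*delta = ts64_to_ns(&ts);
	return 0;
}
```

A 64-bit `tv_sec` is what lets an offset of 2^31 seconds, which no longer fits a 32-bit `time_t`, convert cleanly.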
Re: [PATCH v2] xen: netback: fix printf format string warning
On Mon, Jun 01, 2015 at 11:30:04AM +0100, Ian Campbell wrote:
> drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
> drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
>         (txreq.offset&~PAGE_MASK) + txreq.size);
>         ^
>
> PAGE_MASK's type can vary by arch, so a cast is needed.
>
> Signed-off-by: Ian Campbell ian.campb...@citrix.com

Acked-by: Wei Liu wei.l...@citrix.com

> v2: Cast to unsigned long, since PAGE_MASK can vary by arch.
> ---
>  drivers/net/xen-netback/netback.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 4de46aa..0d25943 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1250,7 +1250,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
>  				netdev_err(queue->vif->dev,
>  					   "txreq.offset: %x, size: %u, end: %lu\n",
>  					   txreq.offset, txreq.size,
> -					   (txreq.offset&~PAGE_MASK) + txreq.size);
> +					   (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
>  				xenvif_fatal_tx_err(queue->vif);
>  				break;
>  			}
> -- 
> 1.7.10.4
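A small userspace sketch of why the cast matters. It assumes a 4096-byte page, a hypothetical `DEMO_PAGE_MASK` whose type is `int` (as on the architecture that produced the warning), and 16-bit `offset`/`size` fields. Without the cast, `(offset & ~DEMO_PAGE_MASK) + size` has type `int`, which does not match `%lu`; casting to `unsigned long` makes the argument match however the mask happens to be defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical values: a 4096-byte page and an int-typed mask. */
#define DEMO_PAGE_SIZE 4096
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/*
 * Format the same message as the netdev_err() call in the patch.
 * The (unsigned long) cast makes the third argument match %lu.
 */
static int format_req_end(char *out, size_t outlen, uint16_t offset, uint16_t size)
{
	return snprintf(out, outlen, "txreq.offset: %x, size: %u, end: %lu",
			(unsigned int)offset, (unsigned int)size,
			(unsigned long)(offset & ~DEMO_PAGE_MASK) + size);
}
```

For an offset of 4100 and a size of 10, the in-page offset is 4, so the reported end is 14.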
Re: [cdc_ncm] guidance and help refactoring cdc_ncm
On Mon, 2015-06-01 at 13:41 +0200, Enrico Mioso wrote:
> Thank you Oliver, thank you all for reading this thread and the attention.
> For a more detailed discussion and how we got here, you might google for
> the thread: "Is this 32-bit NCM?" and "Is this 32-bit NCM?y" (follow up).
> Where the y letter comes from a mistake of mine.

Having read them, it looks like the issues of padding and sequence numbers
are open.

> The specification does only mandate the position of the NTH (header).
> The rest can be in any order and position in general.
> This will work with most devices: except, of course, those from Huawei.

Indeed. And a redesign for crap devices looks like a bad idea.

> Our aggregate looks something like this from my perspective (anyone
> correct me in case):
> NTH: header
> NDP: contains indexing information
> ethernet packet 1
> ethernet packet 2
> ...
> ethernet packet n;
> While it should look like:
> NTH: header
> ethernet packet 1
> ethernet packet 2
> ...
> ethernet packet n;
> NDP: contains indexing information
> but, when introducing such a change: you might break other devices now
> working. In fact, clearly there are multiple vendors producing NCM
> devices, as you might also see by looking at the driver's comments.
> So in general, we should be able to dynamically change the way in which
> the driver orders things in the package.
> and that's why I initially proposed the redesign.

OK, so the NDP needs to be at the end. However, in the old thread you state
that this requires the NDP to be built between the final aggregate and
physically transmitting. I think this is a false choice. You could just as
well copy the NDP around, provided you reserve enough space at the end of
the skb.

Regards
	Oliver
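The "NDP at the end" layout, combined with Oliver's suggestion to reserve room at the tail, can be sketched in userspace C. This is a minimal illustration, not cdc_ncm code: `struct ntb_builder`, `ntb_add()`, `ntb_finalize()` and the 16-datagram limit are hypothetical. It uses 16-bit NTBs without CRC (NTH16 signature "NCMH", NDP16 signature "NCM0") and assumes a fixed 4-byte alignment instead of the divisor/remainder values a real device negotiates. A datagram is accepted only if an NDP covering it still fits at the tail, and the NDP is written after the last datagram when the block is closed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NTH16_SIGN       0x484D434Eu /* "NCMH" as little-endian bytes */
#define NDP16_NOCRC_SIGN 0x304D434Eu /* "NCM0" as little-endian bytes */
#define NTH16_LEN        12u
#define NTB_MAX_DGRAMS   16u         /* hypothetical per-NTB limit */
#define ALIGN4(x)        (((x) + 3u) & ~(size_t)3u) /* assumed alignment */

/* Hypothetical builder state; not the cdc_ncm data structures. */
struct ntb_builder {
	uint8_t *buf;
	size_t cap, off, n;
	uint16_t idx[NTB_MAX_DGRAMS], len[NTB_MAX_DGRAMS];
};

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
	put_le16(p, (uint16_t)v);
	put_le16(p + 2, (uint16_t)(v >> 16));
}

/* NDP16 size: 8-byte header, one entry per datagram, zero terminator. */
static size_t ndp16_len(size_t n)
{
	return ALIGN4(8 + 4 * (n + 1));
}

static void ntb_init(struct ntb_builder *b, uint8_t *buf, size_t cap)
{
	memset(buf, 0, cap);                  /* padding and terminator stay zero */
	b->buf = buf;
	b->cap = cap < 0xffff ? cap : 0xffff; /* NTB16 lengths are 16-bit */
	b->off = NTH16_LEN;
	b->n = 0;
}

/* Append a datagram only if an NDP covering it still fits at the tail. */
static int ntb_add(struct ntb_builder *b, const void *data, uint16_t len)
{
	size_t start = ALIGN4(b->off);

	if (b->n == NTB_MAX_DGRAMS ||
	    ALIGN4(start + len) + ndp16_len(b->n + 1) > b->cap)
		return -1;
	memcpy(b->buf + start, data, len);
	b->idx[b->n] = (uint16_t)start;
	b->len[b->n] = len;
	b->n++;
	b->off = start + len;
	return 0;
}

/* Write the NTH16 at the front and the NDP16 after the last datagram. */
static size_t ntb_finalize(struct ntb_builder *b, uint16_t seq)
{
	size_t ndp, total, i;
	uint8_t *p;

	if (b->n == 0)
		return 0;
	ndp = ALIGN4(b->off);
	total = ndp + ndp16_len(b->n);
	p = b->buf + ndp;

	put_le32(b->buf, NTH16_SIGN);
	put_le16(b->buf + 4, NTH16_LEN);        /* wHeaderLength */
	put_le16(b->buf + 6, seq);              /* wSequence */
	put_le16(b->buf + 8, (uint16_t)total);  /* wBlockLength */
	put_le16(b->buf + 10, (uint16_t)ndp);   /* wNdpIndex */

	put_le32(p, NDP16_NOCRC_SIGN);
	put_le16(p + 4, (uint16_t)ndp16_len(b->n)); /* wLength */
	put_le16(p + 6, 0);                         /* wNextNdpIndex: none */
	for (i = 0; i < b->n; i++) {
		put_le16(p + 8 + 4 * i, b->idx[i]);  /* wDatagramIndex */
		put_le16(p + 10 + 4 * i, b->len[i]); /* wDatagramLength */
	}
	return total;
}
```

With a 5-byte and a 3-byte datagram, the datagrams land at offsets 12 and 20, the NDP at 24, and the block is 44 bytes long. Because the room check happens in `ntb_add()`, `ntb_finalize()` never has to move data.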
[PATCH net] bnx2x: Move statistics implementation into semaphores
Commit dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the
bnx2x locking around statistics state into using a mutex - but the lock
is also taken from a timer, which is forbidden for a mutex.
[If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about
accessing the mutex in interrupt context]

This moves the implementation into using a semaphore [with size '1']
instead.

Signed-off-by: Yuval Mintz yuval.mi...@qlogic.com
Signed-off-by: Ariel Elior ariel.el...@qlogic.com
---
Hi Dave,

Please consider applying this to `net'.

Thanks,
Yuval
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h       |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  9 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c | 20 ++--
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index a3b0f7a..1f82a04 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1774,7 +1774,7 @@ struct bnx2x {
 	int			stats_state;
 
 	/* used for synchronization of concurrent threads statistics handling */
-	struct mutex		stats_lock;
+	struct semaphore	stats_lock;
 
 	/* used by dmae command loader */
 	struct dmae_command	stats_dmae;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index fd52ce9..33501bc 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12054,7 +12054,7 @@ static int bnx2x_init_bp(struct bnx2x *bp)
 	mutex_init(&bp->port.phy_mutex);
 	mutex_init(&bp->fw_mb_mutex);
 	mutex_init(&bp->drv_info_mutex);
-	mutex_init(&bp->stats_lock);
+	sema_init(&bp->stats_lock, 1);
 	bp->drv_info_mng_owner = false;
 
 	INIT_DELAYED_WORK(&bp->sp_task, bnx2x_sp_task);
@@ -13690,9 +13690,10 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp)
 	cancel_delayed_work_sync(&bp->sp_task);
 	cancel_delayed_work_sync(&bp->period_task);
 
-	mutex_lock(&bp->stats_lock);
-	bp->stats_state = STATS_STATE_DISABLED;
-	mutex_unlock(&bp->stats_lock);
+	if (!down_timeout(&bp->stats_lock, HZ / 10)) {
+		bp->stats_state = STATS_STATE_DISABLED;
+		up(&bp->stats_lock);
+	}
 
 	bnx2x_save_statistics(bp);
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
index 266b055..69d699f0 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
@@ -1372,19 +1372,23 @@ void bnx2x_stats_handle(struct bnx2x *bp, enum bnx2x_stats_event event)
 	 * that context in case someone is in the middle of a transition.
 	 * For other events, wait a bit until lock is taken.
 	 */
-	if (!mutex_trylock(&bp->stats_lock)) {
+	if (down_trylock(&bp->stats_lock)) {
 		if (event == STATS_EVENT_UPDATE)
 			return;
 
 		DP(BNX2X_MSG_STATS,
 		   "Unlikely stats' lock contention [event %d]\n", event);
-		mutex_lock(&bp->stats_lock);
+		if (unlikely(down_timeout(&bp->stats_lock, HZ / 10))) {
+			BNX2X_ERR("Failed to take stats lock [event %d]\n",
+				  event);
+			return;
+		}
 	}
 
 	bnx2x_stats_stm[state][event].action(bp);
 	bp->stats_state = bnx2x_stats_stm[state][event].next_state;
 
-	mutex_unlock(&bp->stats_lock);
+	up(&bp->stats_lock);
 
 	if ((event != STATS_EVENT_UPDATE) || netif_msg_timer(bp))
 		DP(BNX2X_MSG_STATS, "state %d -> event %d -> state %d\n",
@@ -1970,7 +1974,11 @@ int bnx2x_stats_safe_exec(struct bnx2x *bp,
 	/* Wait for statistics to end [while blocking further requests],
 	 * then run supplied function 'safely'.
 	 */
-	mutex_lock(&bp->stats_lock);
+	rc = down_timeout(&bp->stats_lock, HZ / 10);
+	if (unlikely(rc)) {
+		BNX2X_ERR("Failed to take statistics lock for safe execution\n");
+		goto out_no_lock;
+	}
 
 	bnx2x_stats_comp(bp);
 	while (bp->stats_pending && cnt--)
@@ -1988,7 +1996,7 @@ out:
 	/* No need to restart statistics - if they're enabled, the timer
 	 * will restart the statistics.
 	 */
-	mutex_unlock(&bp->stats_lock);
-
+	up(&bp->stats_lock);
+out_no_lock:
 	return rc;
 }
-- 
1.8.3.1
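The locking pattern the patch ends up with (try once, give up immediately for `STATS_EVENT_UPDATE`, otherwise wait at most `HZ / 10` jiffies, i.e. 100 ms) can be modelled in userspace with POSIX semaphores. This is only an analogy: `sem_trywait()`/`sem_timedwait()` stand in for `down_trylock()`/`down_timeout()`, and `handle_event()` and the `demo_event` names are hypothetical. Both `down_trylock()` and `sem_trywait()` return non-zero when the lock is not taken, which is why the condition has no `!` in either version.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Userspace stand-in for down_timeout(): 0 on success, -1 on timeout/error. */
static int sem_down_timeout_ms(sem_t *s, long timeout_ms)
{
	struct timespec deadline;
	int rc;

	/* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
		return -1;
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	do {
		rc = sem_timedwait(s, &deadline);
	} while (rc != 0 && errno == EINTR);
	return rc;
}

/* Hypothetical event types mirroring STATS_EVENT_UPDATE vs. the rest. */
enum demo_event { DEMO_EVENT_UPDATE, DEMO_EVENT_STOP };

/*
 * Same shape as bnx2x_stats_handle() after the patch: returns 0 if the
 * state changed, 1 if an UPDATE was skipped, -1 if the lock timed out.
 */
static int handle_event(sem_t *lock, enum demo_event ev, int *state, int next)
{
	if (sem_trywait(lock) != 0) {
		if (ev == DEMO_EVENT_UPDATE)
			return 1;
		if (sem_down_timeout_ms(lock, 100) != 0) /* HZ / 10 jiffies */
			return -1;
	}
	*state = next;
	sem_post(lock);
	return 0;
}
```

A binary semaphore has no owner, so it can be released from any context; that, plus the bounded wait, is what the patch relies on instead of the mutex.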
[PATCH net-next v2 14/14] sfc: leak vports if a VF is assigned during PF unload
From: Daniel Pieczko dpiec...@solarflare.com

If any VF is assigned as the PF is unloaded, do not attempt to remove
its vport or the vswitch. These will be removed if the driver binds to
the PF again, as an entity reset occurs during probe.

A 'force' flag is added to efx_ef10_pci_sriov_disable() to distinguish
between disabling SR-IOV and driver unload. SR-IOV cannot be disabled
if VFs are assigned to guests.

If the PF driver is unloaded while VFs are assigned, the driver may try
to bind to the VF again at a later point if the driver has been reloaded
and the VF returns to the same domain as the PF. In this case, the PF
will not have a VF data structure, so the VF can check this and drop out
of probe early. In this case, efx->vf_count will be zero but VFs will be
present. The user is advised to remove the VF and re-create it.

The check at the beginning of efx_ef10_pci_sriov_disable() that
efx->vf_count is non-zero is removed to allow SR-IOV to be disabled in
this case. Also, if the PF driver is unloaded, it will disable SR-IOV
to remove these unknown VFs.

By not disabling bus-mastering if VFs are still assigned, the VF will
continue to pass traffic after the PF has been removed.

When using the max_vfs module parameter, if VFs are already present do
not try to initialise any more.
Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c       | 20
 drivers/net/ethernet/sfc/ef10_sriov.c | 35 ---
 drivers/net/ethernet/sfc/ef10_sriov.h |  2 ++
 drivers/net/ethernet/sfc/efx.c        |  4 +++-
 4 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index e73d7b5..142fa23 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -680,6 +680,24 @@ static int efx_ef10_probe_pf(struct efx_nic *efx)
 static int efx_ef10_probe_vf(struct efx_nic *efx)
 {
 	int rc;
+	struct pci_dev *pci_dev_pf;
+
+	/* If the parent PF has no VF data structure, it doesn't know about this
+	 * VF so fail probe.  The VF needs to be re-created.  This can happen
+	 * if the PF driver is unloaded while the VF is assigned to a guest.
+	 */
+	pci_dev_pf = efx->pci_dev->physfn;
+	if (pci_dev_pf) {
+		struct efx_nic *efx_pf = pci_get_drvdata(pci_dev_pf);
+		struct efx_ef10_nic_data *nic_data_pf = efx_pf->nic_data;
+
+		if (!nic_data_pf->vf) {
+			netif_info(efx, drv, efx->net_dev,
+				   "The VF cannot link to its parent PF; "
+				   "please destroy and re-create the VF\n");
+			return -EBUSY;
+		}
+	}
 
 	rc = efx_ef10_probe(efx);
 	if (rc)
@@ -697,6 +715,8 @@ static int efx_ef10_probe_vf(struct efx_nic *efx)
 		struct efx_ef10_nic_data *nic_data = efx->nic_data;
 
 		nic_data_p->vf[nic_data->vf_index].efx = efx;
+		nic_data_p->vf[nic_data->vf_index].pci_dev =
+			efx->pci_dev;
 	} else
 		netif_info(efx, drv, efx->net_dev,
 			   "Could not get the PF id from VF\n");
diff --git a/drivers/net/ethernet/sfc/ef10_sriov.c b/drivers/net/ethernet/sfc/ef10_sriov.c
index 41ab18d..6c9b6e4 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.c
+++ b/drivers/net/ethernet/sfc/ef10_sriov.c
@@ -165,6 +165,11 @@ static void efx_ef10_sriov_free_vf_vports(struct efx_nic *efx)
 	for (i = 0; i < efx->vf_count; i++) {
 		struct ef10_vf *vf = nic_data->vf + i;
 
+		/* If VF is assigned, do not free the vport  */
+		if (vf->pci_dev &&
+		    vf->pci_dev->dev_flags & PCI_DEV_FLAGS_ASSIGNED)
+			continue;
+
 		if (vf->vport_assigned) {
 			efx_ef10_evb_port_assign(efx, EVB_PORT_ID_NULL, i);
 			vf->vport_assigned = 0;
@@ -380,7 +385,9 @@ void efx_ef10_vswitching_remove_pf(struct efx_nic *efx)
 	efx_ef10_vport_free(efx, nic_data->vport_id);
 	nic_data->vport_id = EVB_PORT_ID_ASSIGNED;
 
-	efx_ef10_vswitch_free(efx, nic_data->vport_id);
+	/* Only free the vswitch if no VFs are assigned */
+	if (!pci_vfs_assigned(efx->pci_dev))
+		efx_ef10_vswitch_free(efx, nic_data->vport_id);
 }
 
 void efx_ef10_vswitching_remove_vf(struct efx_nic *efx)
@@ -413,20 +420,22 @@ fail1:
 	return rc;
 }
 
-static int efx_ef10_pci_sriov_disable(struct efx_nic *efx)
+static int efx_ef10_pci_sriov_disable(struct efx_nic *efx, bool force)
 {
 	struct pci_dev *dev = efx->pci_dev;
+	unsigned int vfs_assigned = 0;
 
-	if (!efx->vf_count)
-		return 0;
+	vfs_assigned = pci_vfs_assigned(dev);
 
-	if (pci_vfs_assigned(dev)) {
-		netif_err(efx, drv,
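The unload rule described above (skip freeing the vport of any VF still assigned to a guest, and only free the vswitch once no VFs are assigned) can be reduced to a small model. The structures and function names below are hypothetical stand-ins, not sfc driver code: the `assigned_to_guest` flag replaces the `PCI_DEV_FLAGS_ASSIGNED` / `pci_vfs_assigned()` checks, and clearing the booleans replaces the firmware free calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one VF as seen by the PF driver. */
struct demo_vf {
	bool assigned_to_guest;  /* stands in for PCI_DEV_FLAGS_ASSIGNED */
	bool vport_allocated;
};

/* Free the vports of unassigned VFs; return how many VFs are still assigned. */
static size_t demo_free_vf_vports(struct demo_vf *vfs, size_t count)
{
	size_t i, assigned = 0;

	for (i = 0; i < count; i++) {
		if (vfs[i].assigned_to_guest) {
			assigned++;      /* leak it: the guest is still using it */
			continue;
		}
		vfs[i].vport_allocated = false;
	}
	return assigned;
}

/* PF unload: the vswitch is only freed when no VF is assigned. */
static void demo_remove_pf(struct demo_vf *vfs, size_t count, bool *vswitch_allocated)
{
	if (demo_free_vf_vports(vfs, count) == 0)
		*vswitch_allocated = false;
}
```

Per the commit message, whatever is leaked here is cleaned up by the entity reset the next time the driver probes the PF.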
[PATCH net-next v2 13/14] sfc: force removal of VF and vport on driver removal
From: Daniel Pieczko dpiec...@solarflare.com

When the driver unloads, force the unbind and removal of any VFs in the
host with the PF.

The PF cannot remove vports and vswitches if they are still being used
by a VF driver, and when unloading the sfc driver the removal order is
not guaranteed, so the instruction from the PF to the VF to unbind
enforces a suitable ordering so that vswitches and vports can be removed.

As a result of this, manually unbinding the driver from a single PF will
result in all of its VFs in the host also being removed.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10_sriov.c | 9 +
 drivers/net/ethernet/sfc/efx.c        | 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ef10_sriov.c b/drivers/net/ethernet/sfc/ef10_sriov.c
index 083c534..41ab18d 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.c
+++ b/drivers/net/ethernet/sfc/ef10_sriov.c
@@ -448,11 +448,20 @@ int efx_ef10_sriov_init(struct efx_nic *efx)
 void efx_ef10_sriov_fini(struct efx_nic *efx)
 {
 	struct efx_ef10_nic_data *nic_data = efx->nic_data;
+	unsigned int i;
 	int rc;
 
 	if (!nic_data->vf)
 		return;
 
+	/* Remove any VFs in the host */
+	for (i = 0; i < efx->vf_count; ++i) {
+		struct efx_nic *vf_efx = nic_data->vf[i].efx;
+
+		if (vf_efx)
+			vf_efx->pci_dev->driver->remove(vf_efx->pci_dev);
+	}
+
 	rc = efx_ef10_pci_sriov_disable(efx);
 	if (rc)
 		netif_dbg(efx, drv, efx->net_dev,
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index fe3481c..6887871 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2867,7 +2867,8 @@ static void efx_pci_remove_main(struct efx_nic *efx)
 }
 
 /* Final NIC shutdown
- * This is called only at module unload (or hotplug removal).
+ * This is called only at module unload (or hotplug removal).  A PF can call
+ * this on its VFs to ensure they are unbound first.
  */
 static void efx_pci_remove(struct pci_dev *pci_dev)
 {
[PATCH net-next v2 10/14] sfc: suppress vadaptor stats when EVB is not present
From: Daniel Pieczko dpiec...@solarflare.com

The raw_mask array is not initialised, so it needs to be explicitly set
to zero in the 'else' branch.

If the EVB capability is not present, a port cannot have multiple
functions so the per-port MAC stats are correct and should match the
corresponding vadaptor stats, so this redundancy can be removed from the
ethtool stats output.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c      | 12 +---
 drivers/net/ethernet/sfc/mcdi_pcol.h |  2 ++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index cb4c972..39d0cf1 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1161,13 +1161,19 @@ static u64 efx_ef10_raw_stat_mask(struct efx_nic *efx)
 
 static void efx_ef10_get_stat_mask(struct efx_nic *efx, unsigned long *mask)
 {
+	struct efx_ef10_nic_data *nic_data = efx->nic_data;
 	u64 raw_mask[2];
 
 	raw_mask[0] = efx_ef10_raw_stat_mask(efx);
 
-	/* All functions see the vadaptor stats */
-	raw_mask[0] |= ~((1ULL << EF10_STAT_rx_unicast) - 1);
-	raw_mask[1] = (1ULL << (EF10_STAT_COUNT - 63)) - 1;
+	/* Only show vadaptor stats when EVB capability is present */
+	if (nic_data->datapath_caps &
+	    (1 << MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN)) {
+		raw_mask[0] |= ~((1ULL << EF10_STAT_rx_unicast) - 1);
+		raw_mask[1] = (1ULL << (EF10_STAT_COUNT - 63)) - 1;
+	} else {
+		raw_mask[1] = 0;
+	}
 
 #if BITS_PER_LONG == 64
 	mask[0] = raw_mask[0];
diff --git a/drivers/net/ethernet/sfc/mcdi_pcol.h b/drivers/net/ethernet/sfc/mcdi_pcol.h
index 181978d..45fca9f 100644
--- a/drivers/net/ethernet/sfc/mcdi_pcol.h
+++ b/drivers/net/ethernet/sfc/mcdi_pcol.h
@@ -5600,6 +5600,8 @@
 #define	MC_CMD_GET_CAPABILITIES_OUT_MCAST_FILTER_CHAINING_WIDTH 1
 #define	MC_CMD_GET_CAPABILITIES_OUT_PM_AND_RXDP_COUNTERS_LBN 27
 #define	MC_CMD_GET_CAPABILITIES_OUT_PM_AND_RXDP_COUNTERS_WIDTH 1
+#define	MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN 30
+#define	MC_CMD_GET_CAPABILITIES_OUT_EVB_WIDTH 1
 /* RxDPCPU firmware id. */
 #define	MC_CMD_GET_CAPABILITIES_OUT_RX_DPCPU_FW_ID_OFST 4
 #define	MC_CMD_GET_CAPABILITIES_OUT_RX_DPCPU_FW_ID_LEN 2
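The capability-gated mask can be sketched as a two-word bitmap. This is a simplified illustration rather than the driver's exact arithmetic: `first` and `count` are hypothetical stand-ins for `EF10_STAT_rx_unicast` and `EF10_STAT_COUNT`, and the vadaptor range is set bit by bit. Bit 30 matches the `MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN` value the patch adds. As in the patch, the second word is written on both paths because the array starts out uninitialised.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EVB_LBN 30  /* MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN in the patch */

/*
 * Build a 128-bit stats mask: 'base' covers the always-visible stats in
 * word 0; bits [first, count) are the vadaptor stats, shown only when
 * the EVB capability bit is set in 'caps'.
 */
static void demo_get_stat_mask(uint64_t base, uint32_t caps,
			       unsigned int first, unsigned int count,
			       uint64_t mask[2])
{
	unsigned int i;

	mask[0] = base;
	mask[1] = 0;  /* always written: the caller's array is uninitialised */

	if (!(caps & (1u << DEMO_EVB_LBN)))
		return;

	for (i = first; i < count && i < 128; i++)
		mask[i / 64] |= 1ULL << (i % 64);
}
```

With the vadaptor stats at indices 40 to 69, EVB present sets the top 24 bits of word 0 and the low 6 bits of word 1. Without EVB, only `base` survives.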
[PATCH net-next v2 11/14] sfc: don't update stats on VF when called in atomic context
From: Daniel Pieczko dpiec...@solarflare.com

The ifenslave command to set up a bond runs in an atomic context, and
it queries the stats on the devices that are being enslaved.

A VF needs to make an MCDI call to update its stats, which is not
allowed in atomic context.

The releasing of the stats_lock is moved to the beginning of the VF
stats update function so that in_interrupt() can be used; it must be
taken again before returning from this function.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 39d0cf1..e73d7b5 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1305,11 +1305,24 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
 	__le64 *dma_stats;
 	int rc;
 
+	spin_unlock_bh(&efx->stats_lock);
+
+	if (in_interrupt()) {
+		/* If in atomic context, cannot update stats.  Just update the
+		 * software stats and return so the caller can continue.
+		 */
+		spin_lock_bh(&efx->stats_lock);
+		efx_update_sw_stats(efx, stats);
+		return 0;
+	}
+
 	efx_ef10_get_stat_mask(efx, mask);
 
 	rc = efx_nic_alloc_buffer(efx, &stats_buf, dma_len, GFP_ATOMIC);
-	if (rc)
+	if (rc) {
+		spin_lock_bh(&efx->stats_lock);
 		return rc;
+	}
 
 	dma_stats = stats_buf.addr;
 	dma_stats[MC_CMD_MAC_GENERATION_END] = EFX_MC_STATS_GENERATION_INVALID;
@@ -1320,7 +1333,6 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
 	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_DMA_LEN, dma_len);
 	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, EVB_PORT_ID_ASSIGNED);
 
-	spin_unlock_bh(&efx->stats_lock);
 	rc = efx_mcdi_rpc_quiet(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
 				NULL, 0, NULL);
 	spin_lock_bh(&efx->stats_lock);
[PATCH net-next v2 12/14] sfc: do not allow VFs to be destroyed if assigned to guests
From: Daniel Pieczko <dpiec...@solarflare.com>

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10_sriov.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef10_sriov.c b/drivers/net/ethernet/sfc/ef10_sriov.c
index cd52454..083c534 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.c
+++ b/drivers/net/ethernet/sfc/ef10_sriov.c
@@ -417,6 +417,15 @@ static int efx_ef10_pci_sriov_disable(struct efx_nic *efx)
 {
 	struct pci_dev *dev = efx->pci_dev;

+	if (!efx->vf_count)
+		return 0;
+
+	if (pci_vfs_assigned(dev)) {
+		netif_err(efx, drv, efx->net_dev, "VFs are assigned to guests; "
+			  "please detach them before disabling SR-IOV\n");
+		return -EBUSY;
+	}
+
 	pci_disable_sriov(dev);
 	efx_ef10_sriov_free_vf_vswitching(efx);
 	efx->vf_count = 0;
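The guard added above has two early exits: nothing to do when no VFs exist, and refusal with -EBUSY while any VF is still passed through to a guest. A minimal user-space sketch of that decision, assuming a hypothetical struct sriov_state whose vfs_assigned field stands in for what pci_vfs_assigned() would report:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified model of the SR-IOV disable path. */
struct sriov_state {
	unsigned int vf_count;     /* VFs currently enabled */
	unsigned int vfs_assigned; /* VFs passed through to guests */
};

static int sriov_disable(struct sriov_state *s)
{
	if (!s->vf_count)
		return 0; /* nothing to tear down */

	if (s->vfs_assigned) {
		/* Refuse rather than yank a device out from under a guest. */
		fprintf(stderr, "VFs are assigned to guests; "
			"please detach them before disabling SR-IOV\n");
		return -EBUSY;
	}

	/* pci_disable_sriov() and vswitch cleanup would run here. */
	s->vf_count = 0;
	return 0;
}
```

The design choice is to fail the request and leave state untouched, so an administrator can detach the guests and simply retry.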
[PATCH v4 00/25] Convert the posix_clock_operations and k_clock structures to be ready for 2038
This patch series changes the 32-bit time types (timespec/itimerspec) to
the 64-bit types (timespec64/itimerspec64), since 32-bit time types
overflow in January 2038.

It introduces new methods that use the timespec64/itimerspec64 types and
removes the old timespec/itimerspec-based ones from the
posix_clock_operations and k_clock structures.

Baolin Wang (25):
  time:Introduce struct itimerspec64
  timekeeping:Introduce the current_kernel_time64()
  hrtimer:Introduce hrtimer_get_res64()
  security: Introduce security_settime64()
  time:Introduce the do_sys_settimeofday64()
  posix-timers:Introduce {get,put}_timespec/{get,put}_itimerspec
  posix-timers: Split up timer_gettime()/timer_settime()/clock_settime()/
    clock_gettime()/clock_getres().
  posix-timers: Convert timer_gettime()/timer_settime()/clock_settime()/
    clock_gettime()/clock_getres() to timespec64/itimerspec64.
  mmtimer:Convert to timespec64/itimerspec64
  alarmtimer:Convert to timespec64/itimerspec64
  posix-clock:Convert to timespec64/itimerspec64
  time:Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime:Introduce cputime_to_timespec64()/timespec64_to_cputime()
  posix-cpu-timers:Convert to timespec64/itimerspec64
  k_clock:Remove timespec/itimerspec

 arch/powerpc/include/asm/cputime.h    |    6 +-
 arch/s390/include/asm/cputime.h       |    8 +-
 drivers/char/mmtimer.c                |   36 +++--
 drivers/ptp/ptp_clock.c               |   26 +---
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |    4 +-
 include/linux/cputime.h               |   16 ++
 include/linux/hrtimer.h               |   16 +-
 include/linux/jiffies.h               |   21 ++-
 include/linux/posix-clock.h           |   10 +-
 include/linux/posix-timers.h          |   18 +--
 include/linux/security.h              |   25 +++-
 include/linux/time64.h                |   35 +
 include/linux/timekeeping.h           |   26 +++-
 kernel/time/alarmtimer.c              |   43 +++---
 kernel/time/hrtimer.c                 |   10 +-
 kernel/time/posix-clock.c             |   20 +--
 kernel/time/posix-cpu-timers.c        |   84 ++-
 kernel/time/posix-timers.c            |  259 +
 kernel/time/time.c                    |   20 +--
 kernel/time/timekeeping.c             |    6 +-
 kernel/time/timekeeping.h             |    1 -
 security/commoncap.c                  |    2 +-
 security/security.c                   |    2 +-
 24 files changed, 437 insertions(+), 267 deletions(-)

--
1.7.9.5
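The cover letter's premise can be checked with a little arithmetic. This is plain user-space C, unrelated to the kernel's timespec64 code: wrap_to_int32(), civil_from_days() (Howard Hinnant's widely used days-to-date algorithm) and utc_from_unix() are illustrative helpers, not kernel APIs. It shows that a signed 32-bit seconds counter tops out at 2038-01-19 03:14:07 UTC and, one second later, wraps back to 1901-12-13 20:45:52 UTC.

```c
#include <stdint.h>

/* Calendar fields for a UTC instant. */
struct utc {
	int year;
	unsigned mon, day, hour, min, sec;
};

/* Reduce v modulo 2^32 into [-2^31, 2^31): the value a two's-complement
 * 32-bit seconds counter would hold after overflowing. */
static int64_t wrap_to_int32(int64_t v)
{
	const int64_t span = 4294967296LL; /* 2^32 */
	int64_t r = v % span;              /* r is in (-2^32, 2^32) */

	if (r >= span / 2)
		r -= span;
	else if (r < -span / 2)
		r += span;
	return r;
}

/* Days since 1970-01-01 -> proleptic Gregorian date. */
static void civil_from_days(int64_t z, int *y, unsigned *m, unsigned *d)
{
	z += 719468;
	const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
	const unsigned doe = (unsigned)(z - era * 146097);
	const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
	const unsigned mp = (5 * doy + 2) / 153;

	*d = doy - (153 * mp + 2) / 5 + 1;
	*m = mp < 10 ? mp + 3 : mp - 9;
	*y = (int)((int64_t)yoe + era * 400 + (*m <= 2));
}

/* Seconds since the Unix epoch (may be negative) -> UTC fields. */
static struct utc utc_from_unix(int64_t secs)
{
	int64_t days = secs / 86400;
	int64_t rem = secs % 86400;
	struct utc t;

	if (rem < 0) { /* floor division for instants before 1970 */
		rem += 86400;
		days -= 1;
	}
	civil_from_days(days, &t.year, &t.mon, &t.day);
	t.hour = (unsigned)(rem / 3600);
	t.min = (unsigned)(rem % 3600 / 60);
	t.sec = (unsigned)(rem % 60);
	return t;
}
```

utc_from_unix(INT32_MAX) yields 2038-01-19 03:14:07, and utc_from_unix(wrap_to_int32((int64_t)INT32_MAX + 1)) yields 1901-12-13 20:45:52; 64-bit types such as timespec64 push that limit far beyond any practical horizon.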
[PATCH net-next v2 07/14] sfc: DMA the VF stats only when requested
From: Daniel Pieczko <dpiec...@solarflare.com>

Firmware does not support a periodic DMA of vadaptor-stats on VFs, so
only update the stats buffer when stats are requested (when running
ethtool -S or an ip/ifconfig command that reports stats).

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c      | 149 +--
 drivers/net/ethernet/sfc/mcdi_pcol.h |   4 +-
 2 files changed, 112 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 554aff4..323ca47 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1189,7 +1189,50 @@ static size_t efx_ef10_describe_stats(struct efx_nic *efx, u8 *names)
 				      mask, names);
 }

-static int efx_ef10_try_update_nic_stats(struct efx_nic *efx)
+static size_t efx_ef10_update_stats_common(struct efx_nic *efx, u64 *full_stats,
+					    struct rtnl_link_stats64 *core_stats)
+{
+	DECLARE_BITMAP(mask, EF10_STAT_COUNT);
+	struct efx_ef10_nic_data *nic_data = efx->nic_data;
+	u64 *stats = nic_data->stats;
+	size_t stats_count = 0, index;
+
+	efx_ef10_get_stat_mask(efx, mask);
+
+	if (full_stats) {
+		for_each_set_bit(index, mask, EF10_STAT_COUNT) {
+			if (efx_ef10_stat_desc[index].name) {
+				*full_stats++ = stats[index];
+				++stats_count;
+			}
+		}
+	}
+
+	if (core_stats) {
+		core_stats->rx_packets = stats[EF10_STAT_port_rx_packets];
+		core_stats->tx_packets = stats[EF10_STAT_port_tx_packets];
+		core_stats->rx_bytes = stats[EF10_STAT_port_rx_bytes];
+		core_stats->tx_bytes = stats[EF10_STAT_port_tx_bytes];
+		core_stats->rx_dropped = stats[EF10_STAT_port_rx_nodesc_drops] +
+					 stats[GENERIC_STAT_rx_nodesc_trunc] +
+					 stats[GENERIC_STAT_rx_noskb_drops];
+		core_stats->multicast = stats[EF10_STAT_port_rx_multicast];
+		core_stats->rx_length_errors =
+				stats[EF10_STAT_port_rx_gtjumbo] +
+				stats[EF10_STAT_port_rx_length_error];
+		core_stats->rx_crc_errors = stats[EF10_STAT_port_rx_bad];
+		core_stats->rx_frame_errors =
+				stats[EF10_STAT_port_rx_align_error];
+		core_stats->rx_fifo_errors = stats[EF10_STAT_port_rx_overflow];
+		core_stats->rx_errors = (core_stats->rx_length_errors +
+					 core_stats->rx_crc_errors +
+					 core_stats->rx_frame_errors);
+	}
+
+	return stats_count;
+}
+
+static int efx_ef10_try_update_nic_stats_pf(struct efx_nic *efx)
 {
 	struct efx_ef10_nic_data *nic_data = efx->nic_data;
 	DECLARE_BITMAP(mask, EF10_STAT_COUNT);
@@ -1226,57 +1269,83 @@ static int efx_ef10_try_update_nic_stats(struct efx_nic *efx)
 }


-static size_t efx_ef10_update_stats(struct efx_nic *efx, u64 *full_stats,
-				    struct rtnl_link_stats64 *core_stats)
+static size_t efx_ef10_update_stats_pf(struct efx_nic *efx, u64 *full_stats,
+				       struct rtnl_link_stats64 *core_stats)
 {
-	DECLARE_BITMAP(mask, EF10_STAT_COUNT);
-	struct efx_ef10_nic_data *nic_data = efx->nic_data;
-	u64 *stats = nic_data->stats;
-	size_t stats_count = 0, index;
 	int retry;

-	efx_ef10_get_stat_mask(efx, mask);
-
 	/* If we're unlucky enough to read statistics during the DMA, wait
 	 * up to 10ms for it to finish (typically takes 500us)
 	 */
 	for (retry = 0; retry < 100; ++retry) {
-		if (efx_ef10_try_update_nic_stats(efx) == 0)
+		if (efx_ef10_try_update_nic_stats_pf(efx) == 0)
 			break;
 		udelay(100);
 	}

-	if (full_stats) {
-		for_each_set_bit(index, mask, EF10_STAT_COUNT) {
-			if (efx_ef10_stat_desc[index].name) {
-				*full_stats++ = stats[index];
-				++stats_count;
-			}
-		}
-	}
+	return efx_ef10_update_stats_common(efx, full_stats, core_stats);
+}

-	if (core_stats) {
-		core_stats->rx_packets = stats[EF10_STAT_port_rx_packets];
-		core_stats->tx_packets = stats[EF10_STAT_port_tx_packets];
-		core_stats->rx_bytes = stats[EF10_STAT_port_rx_bytes];
-		core_stats->tx_bytes = stats[EF10_STAT_port_tx_bytes];
-		core_stats->rx_dropped = stats[EF10_STAT_port_rx_nodesc_drops] +
-					 stats[GENERIC_STAT_rx_nodesc_trunc] +
-
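Both stats paths above rely on generation markers in the DMA buffer to detect a torn read: the PF path retries up to 100 times with udelay(100), and the VF path seeds MC_CMD_MAC_GENERATION_END with EFX_MC_STATS_GENERATION_INVALID before each request so it can tell whether firmware wrote anything at all. The sketch below models that seqlock-style check in user space. It assumes, as in a classic seqlock, that the writer updates the start marker before the data and the end marker after; struct stats_dma, GEN_INVALID, writer_begin()/writer_finish() and read_stats() are hypothetical names, and this single-threaded model is not a substitute for the driver's rmb() barriers in a truly concurrent setting.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define NSTATS 4
#define GEN_INVALID UINT64_MAX /* hypothetical "not yet written" marker */

/* Hypothetical stats block: data bracketed by generation markers. */
struct stats_dma {
	volatile uint64_t gen_start;
	volatile uint64_t data[NSTATS];
	volatile uint64_t gen_end;
};

/* Writer side (the firmware's role): bump start, write data, then end. */
static void writer_begin(struct stats_dma *b, uint64_t gen)
{
	b->gen_start = gen;
	atomic_thread_fence(memory_order_release);
}

static void writer_finish(struct stats_dma *b, uint64_t gen)
{
	atomic_thread_fence(memory_order_release);
	b->gen_end = gen;
}

/* Reader side: read end marker, copy, then re-check the start marker. */
static int try_read_stats(const struct stats_dma *b, uint64_t *out)
{
	uint64_t end = b->gen_end;

	if (end == GEN_INVALID)
		return -ENODATA; /* nothing written yet */
	atomic_thread_fence(memory_order_acquire); /* like rmb() */
	for (int i = 0; i < NSTATS; i++)
		out[i] = b->data[i];
	atomic_thread_fence(memory_order_acquire); /* like rmb() */
	if (b->gen_start != end)
		return -EAGAIN; /* writer was mid-update: copy may be torn */
	return 0;
}

/* Bounded retry, shaped like the PF loop (which also udelay()s). */
static int read_stats(const struct stats_dma *b, uint64_t *out, int tries)
{
	int rc = -EAGAIN;

	while (tries-- > 0 && (rc = try_read_stats(b, out)) == -EAGAIN)
		; /* a driver would back off briefly here */
	return rc;
}
```

Reading the end marker first and the start marker last is what makes the comparison meaningful: any update that overlaps the copy changes the start marker before touching the data.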
[PATCH net-next v2 06/14] sfc: display vadaptor statistics for all interfaces
From: Daniel Pieczko <dpiec...@solarflare.com>

All interfaces will display vadaptor statistics, so set all the relevant
bits in the stats bitmask.

Only functions with the LINKCTRL flag will see other stats, including
(per-port) MAC stats.

The vadaptor stats are from rx_unicast to tx_overflow.

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c      | 39
 drivers/net/ethernet/sfc/mcdi_pcol.h | 20 ++
 drivers/net/ethernet/sfc/nic.h       | 18 +
 3 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index a574dd3..554aff4 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1045,6 +1045,24 @@ static const struct efx_hw_stat_desc efx_ef10_stat_desc[EF10_STAT_COUNT] = {
 	EF10_DMA_STAT(port_rx_dp_streaming_packets, RXDP_STREAMING_PKTS),
 	EF10_DMA_STAT(port_rx_dp_hlb_fetch, RXDP_HLB_FETCH_CONDITIONS),
 	EF10_DMA_STAT(port_rx_dp_hlb_wait, RXDP_HLB_WAIT_CONDITIONS),
+	EF10_DMA_STAT(rx_unicast, VADAPTER_RX_UNICAST_PACKETS),
+	EF10_DMA_STAT(rx_unicast_bytes, VADAPTER_RX_UNICAST_BYTES),
+	EF10_DMA_STAT(rx_multicast, VADAPTER_RX_MULTICAST_PACKETS),
+	EF10_DMA_STAT(rx_multicast_bytes, VADAPTER_RX_MULTICAST_BYTES),
+	EF10_DMA_STAT(rx_broadcast, VADAPTER_RX_BROADCAST_PACKETS),
+	EF10_DMA_STAT(rx_broadcast_bytes, VADAPTER_RX_BROADCAST_BYTES),
+	EF10_DMA_STAT(rx_bad, VADAPTER_RX_BAD_PACKETS),
+	EF10_DMA_STAT(rx_bad_bytes, VADAPTER_RX_BAD_BYTES),
+	EF10_DMA_STAT(rx_overflow, VADAPTER_RX_OVERFLOW),
+	EF10_DMA_STAT(tx_unicast, VADAPTER_TX_UNICAST_PACKETS),
+	EF10_DMA_STAT(tx_unicast_bytes, VADAPTER_TX_UNICAST_BYTES),
+	EF10_DMA_STAT(tx_multicast, VADAPTER_TX_MULTICAST_PACKETS),
+	EF10_DMA_STAT(tx_multicast_bytes, VADAPTER_TX_MULTICAST_BYTES),
+	EF10_DMA_STAT(tx_broadcast, VADAPTER_TX_BROADCAST_PACKETS),
+	EF10_DMA_STAT(tx_broadcast_bytes, VADAPTER_TX_BROADCAST_BYTES),
+	EF10_DMA_STAT(tx_bad, VADAPTER_TX_BAD_PACKETS),
+	EF10_DMA_STAT(tx_bad_bytes, VADAPTER_TX_BAD_BYTES),
+	EF10_DMA_STAT(tx_overflow, VADAPTER_TX_OVERFLOW),
 };

 #define HUNT_COMMON_STAT_MASK ((1ULL << EF10_STAT_port_tx_bytes) |	\
@@ -1125,6 +1143,10 @@ static u64 efx_ef10_raw_stat_mask(struct efx_nic *efx)
 	u32 port_caps = efx_mcdi_phy_get_caps(efx);
 	struct efx_ef10_nic_data *nic_data = efx->nic_data;

+	if (!(efx->mcdi->fn_flags &
+	      1 << MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_LINKCTRL))
+		return 0;
+
 	if (port_caps & (1 << MC_CMD_PHY_CAP_40000FDX_LBN))
 		raw_mask |= HUNT_40G_EXTRA_STAT_MASK;
 	else
@@ -1139,13 +1161,22 @@ static u64 efx_ef10_raw_stat_mask(struct efx_nic *efx)

 static void efx_ef10_get_stat_mask(struct efx_nic *efx, unsigned long *mask)
 {
-	u64 raw_mask = efx_ef10_raw_stat_mask(efx);
+	u64 raw_mask[2];
+
+	raw_mask[0] = efx_ef10_raw_stat_mask(efx);
+
+	/* All functions see the vadaptor stats */
+	raw_mask[0] |= ~((1ULL << EF10_STAT_rx_unicast) - 1);
+	raw_mask[1] = (1ULL << (EF10_STAT_COUNT - 63)) - 1;

 #if BITS_PER_LONG == 64
-	mask[0] = raw_mask;
+	mask[0] = raw_mask[0];
+	mask[1] = raw_mask[1];
 #else
-	mask[0] = raw_mask & 0xffffffff;
-	mask[1] = raw_mask >> 32;
+	mask[0] = raw_mask[0] & 0xffffffff;
+	mask[1] = raw_mask[0] >> 32;
+	mask[2] = raw_mask[1] & 0xffffffff;
+	mask[3] = raw_mask[1] >> 32;
 #endif
 }

diff --git a/drivers/net/ethernet/sfc/mcdi_pcol.h b/drivers/net/ethernet/sfc/mcdi_pcol.h
index 1e11bb8..0e497b3 100644
--- a/drivers/net/ethernet/sfc/mcdi_pcol.h
+++ b/drivers/net/ethernet/sfc/mcdi_pcol.h
@@ -2896,6 +2896,26 @@
  * descriptor fetch. Valid for EF10 with PM_AND_RXDP_COUNTERS capability only.
  */
 #define MC_CMD_MAC_RXDP_HLB_WAIT_CONDITIONS 0x48
+#define MC_CMD_MAC_VADAPTER_RX_DMABUF_START 0x4c /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_UNICAST_PACKETS 0x4c /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_UNICAST_BYTES 0x4d /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_MULTICAST_PACKETS 0x4e /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_MULTICAST_BYTES 0x4f /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_BROADCAST_PACKETS 0x50 /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_BROADCAST_BYTES 0x51 /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_BAD_PACKETS 0x52 /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_BAD_BYTES 0x53 /* enum */
+#define MC_CMD_MAC_VADAPTER_RX_OVERFLOW 0x54 /* enum */
+#define MC_CMD_MAC_VADAPTER_TX_DMABUF_START 0x57 /* enum */
+#define MC_CMD_MAC_VADAPTER_TX_UNICAST_PACKETS 0x57 /* enum */
+#define MC_CMD_MAC_VADAPTER_TX_UNICAST_BYTES 0x58 /* enum */
+#define MC_CMD_MAC_VADAPTER_TX_MULTICAST_PACKETS 0x59 /*
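efx_ef10_get_stat_mask() now builds a 128-bit raw mask in two u64 words (every bit from EF10_STAT_rx_unicast upward) and then copies it into the unsigned long bitmap, split into 32-bit pieces on 32-bit builds. The sketch below shows the same two steps under stated assumptions: set_bit_range() and split_to_u32() are hypothetical helpers, 55 and 73 are illustrative bit positions rather than the driver's enum values, and because the number of 32-bit words a bitmap really has depends on EF10_STAT_COUNT, the split takes the destination length explicitly and never writes past it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: set bits [first, count) in a 128-bit mask held as
 * two 64-bit words. Requires first < 64 and 64 < count <= 128. */
static void set_bit_range(uint64_t raw[2], unsigned first, unsigned count)
{
	unsigned hi_bits = count - 64;

	raw[0] |= ~((1ULL << first) - 1); /* bits first..63 */
	raw[1] |= hi_bits == 64 ? ~0ULL : (1ULL << hi_bits) - 1; /* 64..count-1 */
}

/* Split the two 64-bit words into 32-bit chunks (low half first), writing
 * only the nwords32 entries the destination has room for. */
static void split_to_u32(const uint64_t raw[2], uint32_t *out, size_t nwords32)
{
	for (size_t i = 0; i < nwords32 && i < 4; i++)
		out[i] = (uint32_t)(raw[i / 2] >> (32 * (i % 2)));
}
```

With first = 55 and count = 73, bits 55..63 of word 0 and bits 0..8 of word 1 are set (18 bits in total), and a three-word split gives 0x00000000, 0xff800000 and 0x000001ff.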
[PATCH net-next v2 08/14] sfc: update netdevice statistics to use vadaptor stats
From: Daniel Pieczko <dpiec...@solarflare.com>

The netdevice statistics (in /proc/net/dev) are per-function stats, so
they must use the vadaptor stats. Change the use of MAC stats to
vadaptor stats, and remove any statistics that can only be measured
per-port. All stats that are removed will be shown as zeroes when these
statistics are displayed.

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c | 41 ++---
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 323ca47..99bf296 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1209,24 +1209,25 @@ static size_t efx_ef10_update_stats_common(struct efx_nic *efx, u64 *full_stats,
 	}

 	if (core_stats) {
-		core_stats->rx_packets = stats[EF10_STAT_port_rx_packets];
-		core_stats->tx_packets = stats[EF10_STAT_port_tx_packets];
-		core_stats->rx_bytes = stats[EF10_STAT_port_rx_bytes];
-		core_stats->tx_bytes = stats[EF10_STAT_port_tx_bytes];
-		core_stats->rx_dropped = stats[EF10_STAT_port_rx_nodesc_drops] +
-					 stats[GENERIC_STAT_rx_nodesc_trunc] +
+		core_stats->rx_packets = stats[EF10_STAT_rx_unicast] +
+					 stats[EF10_STAT_rx_multicast] +
+					 stats[EF10_STAT_rx_broadcast];
+		core_stats->tx_packets = stats[EF10_STAT_tx_unicast] +
+					 stats[EF10_STAT_tx_multicast] +
+					 stats[EF10_STAT_tx_broadcast];
+		core_stats->rx_bytes = stats[EF10_STAT_rx_unicast_bytes] +
+				       stats[EF10_STAT_rx_multicast_bytes] +
+				       stats[EF10_STAT_rx_broadcast_bytes];
+		core_stats->tx_bytes = stats[EF10_STAT_tx_unicast_bytes] +
+				       stats[EF10_STAT_tx_multicast_bytes] +
+				       stats[EF10_STAT_tx_broadcast_bytes];
+		core_stats->rx_dropped = stats[GENERIC_STAT_rx_nodesc_trunc] +
 					 stats[GENERIC_STAT_rx_noskb_drops];
-		core_stats->multicast = stats[EF10_STAT_port_rx_multicast];
-		core_stats->rx_length_errors =
-				stats[EF10_STAT_port_rx_gtjumbo] +
-				stats[EF10_STAT_port_rx_length_error];
-		core_stats->rx_crc_errors = stats[EF10_STAT_port_rx_bad];
-		core_stats->rx_frame_errors =
-				stats[EF10_STAT_port_rx_align_error];
-		core_stats->rx_fifo_errors = stats[EF10_STAT_port_rx_overflow];
-		core_stats->rx_errors = (core_stats->rx_length_errors +
-					 core_stats->rx_crc_errors +
-					 core_stats->rx_frame_errors);
+		core_stats->multicast = stats[EF10_STAT_rx_multicast];
+		core_stats->rx_crc_errors = stats[EF10_STAT_rx_bad];
+		core_stats->rx_fifo_errors = stats[EF10_STAT_rx_overflow];
+		core_stats->rx_errors = core_stats->rx_crc_errors;
+		core_stats->tx_errors = stats[EF10_STAT_tx_bad];
 	}

 	return stats_count;
@@ -1309,7 +1310,7 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)

 	MCDI_SET_QWORD(inbuf, MAC_STATS_IN_DMA_ADDR, stats_buf.dma_addr);
 	MCDI_POPULATE_DWORD_1(inbuf, MAC_STATS_IN_CMD,
-			      MAC_STATS_IN_DMA, true);
+			      MAC_STATS_IN_DMA, 1);
 	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_DMA_LEN, dma_len);
 	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, EVB_PORT_ID_ASSIGNED);

@@ -1321,8 +1322,10 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
 		goto out;

 	generation_end = dma_stats[MC_CMD_MAC_GENERATION_END];
-	if (generation_end == EFX_MC_STATS_GENERATION_INVALID)
+	if (generation_end == EFX_MC_STATS_GENERATION_INVALID) {
+		WARN_ON_ONCE(1);
 		goto out;
+	}
 	rmb();
 	efx_nic_update_stats(efx_ef10_stat_desc, EF10_STAT_COUNT, mask,
 			     stats, stats_buf.addr, false);
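The new mapping derives each netdevice total by summing the unicast, multicast and broadcast vadaptor counters, and leaves per-port-only fields (length and frame errors) untouched so they read as zero. A user-space sketch of that arithmetic; struct vadaptor_stats and struct core_stats are hypothetical stand-ins for the driver's stats array and struct rtnl_link_stats64, and rx_dropped is omitted because the patch still takes it from the driver's software counters.

```c
#include <stdint.h>

/* Hypothetical per-function (vadaptor) counters; names follow the
 * EF10_STAT_* entries used in the patch. */
struct vadaptor_stats {
	uint64_t rx_unicast, rx_multicast, rx_broadcast;
	uint64_t rx_unicast_bytes, rx_multicast_bytes, rx_broadcast_bytes;
	uint64_t rx_bad, rx_overflow;
	uint64_t tx_unicast, tx_multicast, tx_broadcast;
	uint64_t tx_unicast_bytes, tx_multicast_bytes, tx_broadcast_bytes;
	uint64_t tx_bad;
};

/* The subset of rtnl_link_stats64-style fields the patch fills in. */
struct core_stats {
	uint64_t rx_packets, tx_packets, rx_bytes, tx_bytes;
	uint64_t multicast, rx_crc_errors, rx_fifo_errors;
	uint64_t rx_errors, tx_errors;
};

/* Derive netdevice totals from vadaptor counters; per-port-only fields
 * are not written, so a zeroed destination reports them as 0. */
static void fill_core_stats(const struct vadaptor_stats *v,
			    struct core_stats *c)
{
	c->rx_packets = v->rx_unicast + v->rx_multicast + v->rx_broadcast;
	c->tx_packets = v->tx_unicast + v->tx_multicast + v->tx_broadcast;
	c->rx_bytes = v->rx_unicast_bytes + v->rx_multicast_bytes +
		      v->rx_broadcast_bytes;
	c->tx_bytes = v->tx_unicast_bytes + v->tx_multicast_bytes +
		      v->tx_broadcast_bytes;
	c->multicast = v->rx_multicast;
	c->rx_crc_errors = v->rx_bad;
	c->rx_fifo_errors = v->rx_overflow;
	c->rx_errors = c->rx_crc_errors;
	c->tx_errors = v->tx_bad;
}
```

For example, 10 unicast, 3 multicast and 2 broadcast received packets give rx_packets = 15, with multicast = 3.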
[PATCH net-next v2 03/14] sfc: Implement ndo_get_phys_port_id() for EF10 VFs
Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c       | 11 +++++++++++
 drivers/net/ethernet/sfc/ef10_sriov.c | 14 ++++++++++++++
 drivers/net/ethernet/sfc/ef10_sriov.h |  3 +++
 drivers/net/ethernet/sfc/efx.c        |  1 +
 drivers/net/ethernet/sfc/net_driver.h |  2 ++
 drivers/net/ethernet/sfc/nic.h        |  1 +
 drivers/net/ethernet/sfc/sriov.c      | 11 +++++++++++
 drivers/net/ethernet/sfc/sriov.h      |  2 ++
 8 files changed, 45 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 389a45d..714e7cf 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -405,6 +405,16 @@ static int efx_ef10_probe(struct efx_nic *efx)

 	efx_ptp_probe(efx, NULL);

+#ifdef CONFIG_SFC_SRIOV
+	if ((efx->pci_dev->physfn) && (!efx->pci_dev->is_physfn)) {
+		struct pci_dev *pci_dev_pf = efx->pci_dev->physfn;
+		struct efx_nic *efx_pf = pci_get_drvdata(pci_dev_pf);
+
+		efx_pf->type->get_mac_address(efx_pf, nic_data->port_id);
+	} else
+#endif
+		ether_addr_copy(nic_data->port_id, efx->net_dev->perm_addr);
+
 	return 0;

 fail5:
@@ -4139,6 +4149,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
 	.vswitching_probe = efx_ef10_vswitching_probe_vf,
 	.vswitching_restore = efx_ef10_vswitching_restore_vf,
 	.vswitching_remove = efx_ef10_vswitching_remove_vf,
+	.sriov_get_phys_port_id = efx_ef10_sriov_get_phys_port_id,
 #endif
 	.get_mac_address = efx_ef10_get_mac_address_vf,
 	.set_mac_address = efx_ef10_set_mac_address,
diff --git a/drivers/net/ethernet/sfc/ef10_sriov.c b/drivers/net/ethernet/sfc/ef10_sriov.c
index 3969b1b..cd52454 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.c
+++ b/drivers/net/ethernet/sfc/ef10_sriov.c
@@ -736,3 +736,17 @@ int efx_ef10_sriov_get_vf_config(struct efx_nic *efx, int vf_i,

 	return 0;
 }
+
+int efx_ef10_sriov_get_phys_port_id(struct efx_nic *efx,
+				    struct netdev_phys_item_id *ppid)
+{
+	struct efx_ef10_nic_data *nic_data = efx->nic_data;
+
+	if (!is_valid_ether_addr(nic_data->port_id))
+		return -EOPNOTSUPP;
+
+	ppid->id_len = ETH_ALEN;
+	memcpy(ppid->id, nic_data->port_id, ppid->id_len);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/sfc/ef10_sriov.h b/drivers/net/ethernet/sfc/ef10_sriov.h
index b985576..ffc92a5 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.h
+++ b/drivers/net/ethernet/sfc/ef10_sriov.h
@@ -54,6 +54,9 @@ int efx_ef10_sriov_get_vf_config(struct efx_nic *efx, int vf_i,
 int efx_ef10_sriov_set_vf_link_state(struct efx_nic *efx, int vf_i,
 				     int link_state);

+int efx_ef10_sriov_get_phys_port_id(struct efx_nic *efx,
+				    struct netdev_phys_item_id *ppid);
+
 int efx_ef10_vswitching_probe_pf(struct efx_nic *efx);
 int efx_ef10_vswitching_probe_vf(struct efx_nic *efx);
 int efx_ef10_vswitching_restore_pf(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 9eafa39..fe3481c 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2282,6 +2282,7 @@ static const struct net_device_ops efx_netdev_ops = {
 	.ndo_set_vf_spoofchk	= efx_sriov_set_vf_spoofchk,
 	.ndo_get_vf_config	= efx_sriov_get_vf_config,
 	.ndo_set_vf_link_state	= efx_sriov_set_vf_link_state,
+	.ndo_get_phys_port_id	= efx_sriov_get_phys_port_id,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = efx_netpoll,
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index a468a22..d72f522 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1350,6 +1350,8 @@ struct efx_nic_type {
 				   struct ifla_vf_info *ivi);
 	int (*sriov_set_vf_link_state)(struct efx_nic *efx, int vf_i,
 				       int link_state);
+	int (*sriov_get_phys_port_id)(struct efx_nic *efx,
+				      struct netdev_phys_item_id *ppid);
 	int (*vswitching_probe)(struct efx_nic *efx);
 	int (*vswitching_restore)(struct efx_nic *efx);
 	void (*vswitching_remove)(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index db8562e..e146e30 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -524,6 +524,7 @@ struct efx_ef10_nic_data {
 	unsigned int vport_id;
 	bool must_probe_vswitching;
 	unsigned int pf_index;
+	u8 port_id[ETH_ALEN];
 #ifdef CONFIG_SFC_SRIOV
 	unsigned int vf_index;
 	struct ef10_vf *vf;
diff --git a/drivers/net/ethernet/sfc/sriov.c b/drivers/net/ethernet/sfc/sriov.c
index 6c5edbd..816c446 100644
--- a/drivers/net/ethernet/sfc/sriov.c
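The physical port ID in this patch is simply a MAC address: a VF copies its parent PF's address into nic_data->port_id at probe time, so every function on the same port reports the same ID, and an invalid address is answered with -EOPNOTSUPP. A user-space sketch of efx_ef10_sriov_get_phys_port_id()'s logic; struct phys_item_id and valid_ether_addr() are hypothetical stand-ins for struct netdev_phys_item_id and the kernel's is_valid_ether_addr() (which rejects multicast and all-zero addresses), and MAX_PHYS_ITEM_ID_LEN is assumed to mirror the kernel's value of 32.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define MAX_PHYS_ITEM_ID_LEN 32 /* assumed to match <linux/netdevice.h> */

/* User-space stand-in for struct netdev_phys_item_id. */
struct phys_item_id {
	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
	unsigned char id_len;
};

/* Same rule as is_valid_ether_addr(): not multicast and not all-zero. */
static bool valid_ether_addr(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Report the port's MAC address as its physical port ID. */
static int get_phys_port_id(const uint8_t port_id[ETH_ALEN],
			    struct phys_item_id *ppid)
{
	if (!valid_ether_addr(port_id))
		return -EOPNOTSUPP;

	ppid->id_len = ETH_ALEN;
	memcpy(ppid->id, port_id, ppid->id_len);
	return 0;
}
```

Because the PF and all of its VFs feed the same address into this function, tools comparing phys_port_id values can group the functions by physical port.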
[PATCH net-next v2 05/14] sfc: set the port-id when calling MC_CMD_MAC_STATS
From: Daniel Pieczko <dpiec...@solarflare.com>

The port-id must be known so that the RMON level can be set for the
collection of vadapter stats.

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/mcdi_port.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/sfc/mcdi_port.c b/drivers/net/ethernet/sfc/mcdi_port.c
index 9bf04cb..fffc348 100644
--- a/drivers/net/ethernet/sfc/mcdi_port.c
+++ b/drivers/net/ethernet/sfc/mcdi_port.c
@@ -924,6 +924,7 @@ enum efx_stats_action {
 static int efx_mcdi_mac_stats(struct efx_nic *efx,
 			      enum efx_stats_action action, int clear)
 {
+	struct efx_ef10_nic_data *nic_data = efx->nic_data;
 	MCDI_DECLARE_BUF(inbuf, MC_CMD_MAC_STATS_IN_LEN);
 	int rc;
 	int change = action == EFX_STATS_PULL ? 0 : 1;
@@ -945,6 +946,7 @@ static int efx_mcdi_mac_stats(struct efx_nic *efx,
 			      MAC_STATS_IN_PERIODIC_NOEVENT, 1,
 			      MAC_STATS_IN_PERIOD_MS, period);
 	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_DMA_LEN, dma_len);
+	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, nic_data->vport_id);

 	rc = efx_mcdi_rpc(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
 			  NULL, 0, NULL);
[PATCH net-next v2 09/14] sfc: suppress ENOENT error messages from MC_CMD_MAC_STATS
From: Daniel Pieczko <dpiec...@solarflare.com>

MC_CMD_MAC_STATS can be called on a function before a vadaptor has been
created, as the kernel can call into this through
ndo_get_stats/ndo_get_stats64.

If MC_CMD_MAC_STATS is called before the DMA queues have been set up,
and hence before a vadaptor has been created, firmware will return
ENOENT.

This is expected, so suppress the MCDI error message in this case.

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c      | 11 ++++++++---
 drivers/net/ethernet/sfc/mcdi_port.c |  8 ++++++--
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 99bf296..cb4c972 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1315,11 +1315,16 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
 	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, EVB_PORT_ID_ASSIGNED);

 	spin_unlock_bh(&efx->stats_lock);
-	rc = efx_mcdi_rpc(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf), NULL,
-			  0, NULL);
+	rc = efx_mcdi_rpc_quiet(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
+				NULL, 0, NULL);
 	spin_lock_bh(&efx->stats_lock);
-	if (rc)
+	if (rc) {
+		/* Expect ENOENT if DMA queues have not been set up */
+		if (rc != -ENOENT || atomic_read(&efx->active_queues))
+			efx_mcdi_display_error(efx, MC_CMD_MAC_STATS,
+					       sizeof(inbuf), NULL, 0, rc);
 		goto out;
+	}

 	generation_end = dma_stats[MC_CMD_MAC_GENERATION_END];
 	if (generation_end == EFX_MC_STATS_GENERATION_INVALID) {
diff --git a/drivers/net/ethernet/sfc/mcdi_port.c b/drivers/net/ethernet/sfc/mcdi_port.c
index fffc348..7f295c4 100644
--- a/drivers/net/ethernet/sfc/mcdi_port.c
+++ b/drivers/net/ethernet/sfc/mcdi_port.c
@@ -948,8 +948,12 @@ static int efx_mcdi_mac_stats(struct efx_nic *efx,
 	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_DMA_LEN, dma_len);
 	MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, nic_data->vport_id);

-	rc = efx_mcdi_rpc(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
-			  NULL, 0, NULL);
+	rc = efx_mcdi_rpc_quiet(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
+				NULL, 0, NULL);
+	/* Expect ENOENT if DMA queues have not been set up */
+	if (rc && (rc != -ENOENT || atomic_read(&efx->active_queues)))
+		efx_mcdi_display_error(efx, MC_CMD_MAC_STATS, sizeof(inbuf),
+				       NULL, 0, rc);
 	return rc;
 }
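Both call sites now use the same rule for whether a failed MC_CMD_MAC_STATS is worth logging: any error other than ENOENT is reported, and ENOENT is reported only if DMA queues are already active (when a vadaptor ought to exist). The predicate in isolation, with a hypothetical name and a plain int in place of atomic_read(&efx->active_queues):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the patch's condition:
 *   rc == 0                         -> nothing to report
 *   rc == -ENOENT, no active queues -> expected, stay quiet
 *   anything else                   -> report the MCDI error */
static bool should_report_mac_stats_error(int rc, int active_queues)
{
	return rc && (rc != -ENOENT || active_queues);
}
```

Keeping the expected case silent while still surfacing ENOENT once queues exist means a genuine firmware problem is not hidden by the suppression.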
[PATCH net-next v2 04/14] sfc: add port_ prefix to MAC stats
From: Daniel Pieczko <dpiec...@solarflare.com>

The MAC stats are per-port and will only be displayed on the PF with
control of the link (one per physical port). Vadapter stats will also be
displayed for this PF, so distinguish the MAC stats by adding a prefix
of "port_".

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c      | 251 ++-
 drivers/net/ethernet/sfc/mcdi_pcol.h |   4 +-
 drivers/net/ethernet/sfc/nic.h       | 106 +++
 3 files changed, 182 insertions(+), 179 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 714e7cf..a574dd3 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -990,93 +990,94 @@ static int efx_ef10_reset(struct efx_nic *efx, enum reset_type reset_type)
 	[GENERIC_STAT_ ## ext_name] = { #ext_name, 0, 0 }

 static const struct efx_hw_stat_desc efx_ef10_stat_desc[EF10_STAT_COUNT] = {
-	EF10_DMA_STAT(tx_bytes, TX_BYTES),
-	EF10_DMA_STAT(tx_packets, TX_PKTS),
-	EF10_DMA_STAT(tx_pause, TX_PAUSE_PKTS),
-	EF10_DMA_STAT(tx_control, TX_CONTROL_PKTS),
-	EF10_DMA_STAT(tx_unicast, TX_UNICAST_PKTS),
-	EF10_DMA_STAT(tx_multicast, TX_MULTICAST_PKTS),
-	EF10_DMA_STAT(tx_broadcast, TX_BROADCAST_PKTS),
-	EF10_DMA_STAT(tx_lt64, TX_LT64_PKTS),
-	EF10_DMA_STAT(tx_64, TX_64_PKTS),
-	EF10_DMA_STAT(tx_65_to_127, TX_65_TO_127_PKTS),
-	EF10_DMA_STAT(tx_128_to_255, TX_128_TO_255_PKTS),
-	EF10_DMA_STAT(tx_256_to_511, TX_256_TO_511_PKTS),
-	EF10_DMA_STAT(tx_512_to_1023, TX_512_TO_1023_PKTS),
-	EF10_DMA_STAT(tx_1024_to_15xx, TX_1024_TO_15XX_PKTS),
-	EF10_DMA_STAT(tx_15xx_to_jumbo, TX_15XX_TO_JUMBO_PKTS),
-	EF10_DMA_STAT(rx_bytes, RX_BYTES),
-	EF10_DMA_INVIS_STAT(rx_bytes_minus_good_bytes, RX_BAD_BYTES),
-	EF10_OTHER_STAT(rx_good_bytes),
-	EF10_OTHER_STAT(rx_bad_bytes),
-	EF10_DMA_STAT(rx_packets, RX_PKTS),
-	EF10_DMA_STAT(rx_good, RX_GOOD_PKTS),
-	EF10_DMA_STAT(rx_bad, RX_BAD_FCS_PKTS),
-	EF10_DMA_STAT(rx_pause, RX_PAUSE_PKTS),
-	EF10_DMA_STAT(rx_control, RX_CONTROL_PKTS),
-	EF10_DMA_STAT(rx_unicast, RX_UNICAST_PKTS),
-	EF10_DMA_STAT(rx_multicast, RX_MULTICAST_PKTS),
-	EF10_DMA_STAT(rx_broadcast, RX_BROADCAST_PKTS),
-	EF10_DMA_STAT(rx_lt64, RX_UNDERSIZE_PKTS),
-	EF10_DMA_STAT(rx_64, RX_64_PKTS),
-	EF10_DMA_STAT(rx_65_to_127, RX_65_TO_127_PKTS),
-	EF10_DMA_STAT(rx_128_to_255, RX_128_TO_255_PKTS),
-	EF10_DMA_STAT(rx_256_to_511, RX_256_TO_511_PKTS),
-	EF10_DMA_STAT(rx_512_to_1023, RX_512_TO_1023_PKTS),
-	EF10_DMA_STAT(rx_1024_to_15xx, RX_1024_TO_15XX_PKTS),
-	EF10_DMA_STAT(rx_15xx_to_jumbo, RX_15XX_TO_JUMBO_PKTS),
-	EF10_DMA_STAT(rx_gtjumbo, RX_GTJUMBO_PKTS),
-	EF10_DMA_STAT(rx_bad_gtjumbo, RX_JABBER_PKTS),
-	EF10_DMA_STAT(rx_overflow, RX_OVERFLOW_PKTS),
-	EF10_DMA_STAT(rx_align_error, RX_ALIGN_ERROR_PKTS),
-	EF10_DMA_STAT(rx_length_error, RX_LENGTH_ERROR_PKTS),
-	EF10_DMA_STAT(rx_nodesc_drops, RX_NODESC_DROPS),
+	EF10_DMA_STAT(port_tx_bytes, TX_BYTES),
+	EF10_DMA_STAT(port_tx_packets, TX_PKTS),
+	EF10_DMA_STAT(port_tx_pause, TX_PAUSE_PKTS),
+	EF10_DMA_STAT(port_tx_control, TX_CONTROL_PKTS),
+	EF10_DMA_STAT(port_tx_unicast, TX_UNICAST_PKTS),
+	EF10_DMA_STAT(port_tx_multicast, TX_MULTICAST_PKTS),
+	EF10_DMA_STAT(port_tx_broadcast, TX_BROADCAST_PKTS),
+	EF10_DMA_STAT(port_tx_lt64, TX_LT64_PKTS),
+	EF10_DMA_STAT(port_tx_64, TX_64_PKTS),
+	EF10_DMA_STAT(port_tx_65_to_127, TX_65_TO_127_PKTS),
+	EF10_DMA_STAT(port_tx_128_to_255, TX_128_TO_255_PKTS),
+	EF10_DMA_STAT(port_tx_256_to_511, TX_256_TO_511_PKTS),
+	EF10_DMA_STAT(port_tx_512_to_1023, TX_512_TO_1023_PKTS),
+	EF10_DMA_STAT(port_tx_1024_to_15xx, TX_1024_TO_15XX_PKTS),
+	EF10_DMA_STAT(port_tx_15xx_to_jumbo, TX_15XX_TO_JUMBO_PKTS),
+	EF10_DMA_STAT(port_rx_bytes, RX_BYTES),
+	EF10_DMA_INVIS_STAT(port_rx_bytes_minus_good_bytes, RX_BAD_BYTES),
+	EF10_OTHER_STAT(port_rx_good_bytes),
+	EF10_OTHER_STAT(port_rx_bad_bytes),
+	EF10_DMA_STAT(port_rx_packets, RX_PKTS),
+	EF10_DMA_STAT(port_rx_good, RX_GOOD_PKTS),
+	EF10_DMA_STAT(port_rx_bad, RX_BAD_FCS_PKTS),
+	EF10_DMA_STAT(port_rx_pause, RX_PAUSE_PKTS),
+	EF10_DMA_STAT(port_rx_control, RX_CONTROL_PKTS),
+	EF10_DMA_STAT(port_rx_unicast, RX_UNICAST_PKTS),
+	EF10_DMA_STAT(port_rx_multicast, RX_MULTICAST_PKTS),
+	EF10_DMA_STAT(port_rx_broadcast, RX_BROADCAST_PKTS),
+	EF10_DMA_STAT(port_rx_lt64, RX_UNDERSIZE_PKTS),
+	EF10_DMA_STAT(port_rx_64, RX_64_PKTS),
+	EF10_DMA_STAT(port_rx_65_to_127, RX_65_TO_127_PKTS),
+	EF10_DMA_STAT(port_rx_128_to_255, RX_128_TO_255_PKTS),
+	EF10_DMA_STAT(port_rx_256_to_511, RX_256_TO_511_PKTS),
+
Re: [PATCH v2 net] xen: netback: read hotplug script once at start of day.
On Mon, Jun 01, 2015 at 11:30:24AM +0100, Ian Campbell wrote:
> When we come to tear things down in netback_remove() and generate the
> uevent it is possible that the xenstore directory has already been
> removed (details below).
>
> In such cases netback_uevent() won't be able to read the hotplug
> script and will write a xenstore error node.
>
> A recent change to the hypervisor exposed this race such that we now
> sometimes lose it (where apparently we didn't ever before).
>
> Instead read the hotplug script configuration during setup and use it
> for the lifetime of the backend device.
>
> The apparently more obvious fix of moving the transition to
> state=Closed in netback_remove() to after the uevent does not work
> because it is possible that we are already in state=Closed (in
> reaction to the guest having disconnected as it shut down). Being
> already in Closed means the toolstack is at liberty to start tearing
> down the xenstore directories. In principle it might be possible to
> arrange to unregister the device sooner (e.g. on transition to
> Closing) such that xenstore would still be there, but this state
> machine is fragile and prone to anger...
>
> A modern Xen system only relies on the hotplug uevent for driver
> domains; when the backend is in the same domain as the toolstack, it
> will run the necessary setup/teardown directly in the correct sequence
> wrt xenstore changes.
>
> Signed-off-by: Ian Campbell <ian.campb...@citrix.com>

Acked-by: Wei Liu <wei.l...@citrix.com>
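The fix described above is a general pattern: read configuration that teardown will need while it is guaranteed to exist (at setup), keep a private copy for the device's lifetime, and never go back to the external store during removal. A user-space sketch of that pattern only; struct store (whose script field becomes NULL once the toolstack removes the directory), struct backend and the backend_* functions are hypothetical, not netback or xenbus APIs, and "vif-bridge" is just an illustrative script name.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key/value store; script is NULL once the directory has
 * been torn down by the toolstack. */
struct store { const char *script; };

/* Backend keeps its own copy, taken at setup time. */
struct backend { char *hotplug_script; };

static char *copy_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Setup: the store is guaranteed to be populated here, so read it once. */
static int backend_setup(struct backend *be, const struct store *st)
{
	if (!st->script)
		return -ENOENT;
	be->hotplug_script = copy_string(st->script);
	return be->hotplug_script ? 0 : -ENOMEM;
}

/* Teardown uevent: uses the cached copy, never the (possibly gone) store. */
static const char *backend_uevent_script(const struct backend *be)
{
	return be->hotplug_script;
}

static void backend_remove(struct backend *be)
{
	free(be->hotplug_script);
	be->hotplug_script = NULL;
}
```

The trade-off is that a change to the stored value after setup is no longer seen, which matches the patch's intent of fixing the script for the lifetime of the backend device.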
[PATCH net-next v2 01/14] sfc: Add code to export port_num in netdev->dev_port
In the case where we have multiple functions (PFs and VFs), this sysfs
entry is useful to identify the physical port corresponding to the
function we are interested in.

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index a547ceb..dacf9f8 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -249,6 +249,7 @@ static int efx_ef10_get_mac_address_vf(struct efx_nic *efx, u8 *mac_address)
 static int efx_ef10_probe(struct efx_nic *efx)
 {
 	struct efx_ef10_nic_data *nic_data;
+	struct net_device *net_dev = efx->net_dev;
 	int i, rc;

 	/* We can have one VI for each 8K region.  However, until we
@@ -326,6 +327,7 @@ static int efx_ef10_probe(struct efx_nic *efx)
 	if (rc < 0)
 		goto fail3;
 	efx->port_num = rc;
+	net_dev->dev_port = rc;

 	rc = efx->type->get_mac_address(efx, efx->net_dev->perm_addr);
 	if (rc)
@@ -334,6 +336,7 @@ static int efx_ef10_probe(struct efx_nic *efx)
 	rc = efx_ef10_get_sysclk_freq(efx);
 	if (rc < 0)
 		goto fail3;
+
 	efx->timer_quantum_ns = 1536000 / rc; /* 1536 cycles */

 	/* Check whether firmware supports bug 35388 workaround.
@@ -341,9 +344,9 @@ static int efx_ef10_probe(struct efx_nic *efx)
 	 * ask if it's already enabled
 	 */
 	rc = efx_mcdi_set_workaround(efx, MC_CMD_WORKAROUND_BUG35388, true);
-	if (rc == 0)
+	if (rc == 0) {
 		nic_data->workaround_35388 = true;
-	else if (rc == -EPERM) {
+	} else if (rc == -EPERM) {
 		unsigned int enabled;

 		rc = efx_mcdi_get_workarounds(efx, NULL, &enabled);
@@ -351,9 +354,10 @@ static int efx_ef10_probe(struct efx_nic *efx)
 			goto fail3;
 		nic_data->workaround_35388 = enabled &
 			MC_CMD_GET_WORKAROUNDS_OUT_BUG35388;
-	}
-	else if (rc != -ENOSYS && rc != -ENOENT)
+	} else if (rc != -ENOSYS && rc != -ENOENT) {
 		goto fail3;
+	}
+
 	netif_dbg(efx, probe, efx->net_dev,
 		  "workaround for bug 35388 is %sabled\n",
 		  nic_data->workaround_35388 ? "en" : "dis");
[PATCH net-next v2 02/14] sfc: Add sysfs entry for flags (link control and primary)
On every adapter there will be one primary PF per adaptor and one link control PF per port.

Signed-off-by: Shradha Shah <ss...@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c | 58
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index dacf9f8..389a45d 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -246,6 +246,34 @@ static int efx_ef10_get_mac_address_vf(struct efx_nic *efx, u8 *mac_address)
 	return 0;
 }
 
+static ssize_t efx_ef10_show_link_control_flag(struct device *dev,
+					       struct device_attribute *attr,
+					       char *buf)
+{
+	struct efx_nic *efx = pci_get_drvdata(to_pci_dev(dev));
+
+	return sprintf(buf, "%d\n",
+		       ((efx->mcdi->fn_flags) &
+			(1 << MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_LINKCTRL))
+		       ? 1 : 0);
+}
+
+static ssize_t efx_ef10_show_primary_flag(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct efx_nic *efx = pci_get_drvdata(to_pci_dev(dev));
+
+	return sprintf(buf, "%d\n",
+		       ((efx->mcdi->fn_flags) &
+			(1 << MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_PRIMARY))
+		       ? 1 : 0);
+}
+
+static DEVICE_ATTR(link_control_flag, 0444, efx_ef10_show_link_control_flag,
+		   NULL);
+static DEVICE_ATTR(primary_flag, 0444, efx_ef10_show_primary_flag, NULL);
+
 static int efx_ef10_probe(struct efx_nic *efx)
 {
 	struct efx_ef10_nic_data *nic_data;
@@ -312,30 +340,39 @@ static int efx_ef10_probe(struct efx_nic *efx)
 	if (rc)
 		goto fail3;
 
-	rc = efx_ef10_get_pf_index(efx);
+	rc = device_create_file(&efx->pci_dev->dev,
+				&dev_attr_link_control_flag);
 	if (rc)
 		goto fail3;
 
+	rc = device_create_file(&efx->pci_dev->dev, &dev_attr_primary_flag);
+	if (rc)
+		goto fail4;
+
+	rc = efx_ef10_get_pf_index(efx);
+	if (rc)
+		goto fail5;
+
 	rc = efx_ef10_init_datapath_caps(efx);
 	if (rc < 0)
-		goto fail3;
+		goto fail5;
 
 	efx->rx_packet_len_offset =
 		ES_DZ_RX_PREFIX_PKTLEN_OFST - ES_DZ_RX_PREFIX_SIZE;
 
 	rc = efx_mcdi_port_get_number(efx);
 	if (rc < 0)
-		goto fail3;
+		goto fail5;
 	efx->port_num = rc;
 	net_dev->dev_port = rc;
 
 	rc = efx->type->get_mac_address(efx, efx->net_dev->perm_addr);
 	if (rc)
-		goto fail3;
+		goto fail5;
 
 	rc = efx_ef10_get_sysclk_freq(efx);
 	if (rc < 0)
-		goto fail3;
+		goto fail5;
 
 	efx->timer_quantum_ns = 1536000 / rc; /* 1536 cycles */
 
@@ -355,7 +392,7 @@ static int efx_ef10_probe(struct efx_nic *efx)
 		nic_data->workaround_35388 = enabled &
 			MC_CMD_GET_WORKAROUNDS_OUT_BUG35388;
 	} else if (rc != -ENOSYS && rc != -ENOENT) {
-		goto fail3;
+		goto fail5;
 	}
 
 	netif_dbg(efx, probe, efx->net_dev,
@@ -364,12 +401,16 @@ static int efx_ef10_probe(struct efx_nic *efx)
 
 	rc = efx_mcdi_mon_probe(efx);
 	if (rc && rc != -EPERM)
-		goto fail3;
+		goto fail5;
 
 	efx_ptp_probe(efx, NULL);
 
 	return 0;
 
+fail5:
+	device_remove_file(&efx->pci_dev->dev, &dev_attr_primary_flag);
+fail4:
+	device_remove_file(&efx->pci_dev->dev, &dev_attr_link_control_flag);
 fail3:
 	efx_mcdi_fini(efx);
 fail2:
@@ -612,6 +653,9 @@ static void efx_ef10_remove(struct efx_nic *efx)
 	if (!nic_data->must_restore_piobufs)
 		efx_ef10_free_piobufs(efx);
 
+	device_remove_file(&efx->pci_dev->dev, &dev_attr_primary_flag);
+	device_remove_file(&efx->pci_dev->dev, &dev_attr_link_control_flag);
+
 	efx_mcdi_fini(efx);
 	efx_nic_free_buffer(efx, &nic_data->mcdi_buf);
 	kfree(nic_data);
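Both show functions reduce a single bit of efx->mcdi->fn_flags to the string "0\n" or "1\n". The sketch below isolates that test in plain C. The bit numbers are placeholders, because the real MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_* values come from the sfc MCDI headers and are not given in the patch; flag_bit and show_flag are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder bit numbers, NOT the real MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_*
 * values from the sfc firmware interface headers. */
#define FLAG_LINKCTRL_LBN 1
#define FLAG_PRIMARY_LBN  2

/* Normalise "is bit lbn set in flags" to 0 or 1, as the patch does
 * with its "? 1 : 0". Using 1u keeps a shift into bit 31 well defined. */
static int flag_bit(uint32_t flags, unsigned int lbn)
{
	return (flags & (1u << lbn)) ? 1 : 0;
}

/* Format the value the way a sysfs show() callback would ("%d\n").
 * Returns the number of characters written, like sprintf(). */
static int show_flag(char *buf, size_t len, uint32_t flags, unsigned int lbn)
{
	return snprintf(buf, len, "%d\n", flag_bit(flags, lbn));
}
```

Exposing each flag as its own read-only (0444) attribute follows the sysfs convention of one value per file, which is why the patch creates two attributes rather than one combined one.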
Re: [PATCH 1/7] net: dsa: add new driver for ar8xxx family
Just a nit: a license mismatch.

On Thu, 2015-05-28 at 18:42 -0700, Mathieu Olivari wrote:
> --- /dev/null
> +++ b/drivers/net/dsa/ar8xxx.c
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 and
> + * only version 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.

This states the license is GPL v2.

> +MODULE_LICENSE("GPL");

And this states, according to include/linux/module.h, that the license is GPL v2 or later. So I think that either the comment at the top of this file or the ident used in the MODULE_LICENSE() macro needs to change.

Paul Bolle
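The accepted MODULE_LICENSE() idents are documented in a comment in include/linux/module.h. The lookup table below is a hand transcription of part of that comment as it read around the time of this thread (later kernels revised it), so it is an illustration, not something the kernel exports; license_meaning is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-transcribed from the include/linux/module.h comment circa 2015. */
struct license_ident {
	const char *ident;   /* string passed to MODULE_LICENSE() */
	const char *meaning; /* what the comment says it denotes */
};

static const struct license_ident license_idents[] = {
	{ "GPL",                       "GNU Public License v2 or later" },
	{ "GPL v2",                    "GNU Public License v2" },
	{ "GPL and additional rights", "GNU Public License v2 rights and more" },
	{ "Dual BSD/GPL",              "GNU Public License v2 or BSD license choice" },
	{ "Dual MIT/GPL",              "GNU Public License v2 or MIT license choice" },
	{ "Dual MPL/GPL",              "GNU Public License v2 or Mozilla license choice" },
};

/* Return the documented meaning of an ident, or NULL if it is not listed. */
static const char *license_meaning(const char *ident)
{
	size_t i;

	for (i = 0; i < sizeof(license_idents) / sizeof(license_idents[0]); i++)
		if (strcmp(license_idents[i].ident, ident) == 0)
			return license_idents[i].meaning;
	return NULL;
}
```

Read this way, a header comment saying "version 2 only" pairs with the ident "GPL v2", while keeping MODULE_LICENSE("GPL") would instead call for a "v2 or later" header; that is the mismatch Paul points out.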
[Patch v3 30/36] net/mlx4: Cache irq_desc->affinity instead of irq_desc
The field 'affinity' in irq_desc won't change once the irq_desc data structure is created. So cache irq_desc->affinity instead of irq_desc. This also helps to hide struct irq_desc from device drivers.

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c   |    6 +++---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    5 +----
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 22da4d0d0f05..a03a01625398 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -31,6 +31,7 @@
  *
  */
 
+#include <linux/irq.h>
 #include <linux/mlx4/cq.h>
 #include <linux/mlx4/qp.h>
 #include <linux/mlx4/cmd.h>
@@ -135,9 +136,8 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 				mdev->dev->caps.num_comp_vectors;
 		}
 
-		cq->irq_desc =
-			irq_to_desc(mlx4_eq_get_irq(mdev->dev,
-						    cq->vector));
+		cq->irq_affinity = irq_get_affinity_mask(
+				mlx4_eq_get_irq(mdev->dev, cq->vector));
 	} else {
 		/* For TX we use the same irq per ring we assigned for the RX*/
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2a77a6b19121..5675febf478e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1044,14 +1044,11 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 	/* If we used up all the quota - we're probably not done yet... */
 	if (done == budget) {
 		int cpu_curr;
-		const struct cpumask *aff;
 
 		INC_PERF_COUNTER(priv->pstats.napi_quota);
 
 		cpu_curr = smp_processor_id();
-		aff = irq_desc_get_irq_data(cq->irq_desc)->affinity;
-
-		if (likely(cpumask_test_cpu(cpu_curr, aff)))
+		if (likely(cpumask_test_cpu(cpu_curr, cq->irq_affinity)))
 			return budget;
 
 		/* Current cpu is not according to smp_irq_affinity -
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d021f079f181..33d544a1fe84 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -357,7 +357,7 @@ struct mlx4_en_cq {
 #define CQ_USER_PEND (MLX4_EN_CQ_STATE_POLL | MLX4_EN_CQ_STATE_POLL_YIELD)
 	spinlock_t poll_lock; /* protects from LLS/napi conflicts */
 #endif  /* CONFIG_NET_RX_BUSY_POLL */
-	struct irq_desc *irq_desc;
+	struct cpumask *irq_affinity;
 };
 
 struct mlx4_en_port_profile {
-- 
1.7.10.4
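In mlx4_en_poll_rx_cq(), once the budget is used up the driver keeps polling only if the current CPU is in the IRQ's affinity mask; otherwise it stops so polling can move back to an affine CPU. The userspace analogue below mirrors just that decision with glibc's cpu_set_t macros in place of cpumask_test_cpu(); should_keep_polling is a hypothetical name, not driver code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical analogue of the check in mlx4_en_poll_rx_cq(): with the
 * budget exhausted, keep polling only if cpu_curr is set in the cached
 * affinity mask (the role cq->irq_affinity plays after this patch). */
static bool should_keep_polling(int done, int budget, int cpu_curr,
				const cpu_set_t *irq_affinity)
{
	if (done < budget)
		return false;	/* work finished; NAPI would complete */
	return CPU_ISSET(cpu_curr, irq_affinity);
}
```

Caching a pointer to the mask, rather than to the irq_desc, keeps this hot-path test identical while removing the driver's dependency on struct irq_desc.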
Re: [PATCH net] xen: netback: read hotplug script once at start of day.
On Fri, 2015-05-29 at 18:38 +0100, Wei Liu wrote:
> On Fri, May 29, 2015 at 05:24:53PM +0100, Ian Campbell wrote:
> [...]
> >  	if (be->vif != NULL)
> >  		return 0;
> > @@ -417,12 +409,23 @@ static int backend_create_xenvif(struct backend_info *be)
> >  		return (err < 0) ? err : -EINVAL;
> >  	}
> >  
> > +	script = xenbus_read(XBT_NIL, dev->nodename, "script", NULL);
> > +	if (IS_ERR(script)) {
> > +		int err = PTR_ERR(script);
> > +		xenbus_dev_fatal(dev, err, "reading script");
> > +		return err;
> > +	}
> > +
> >  	vif = xenvif_alloc(&dev->dev, dev->otherend_id, handle);
> >  	if (IS_ERR(vif)) {
> >  		err = PTR_ERR(vif);
> >  		xenbus_dev_fatal(dev, err, "creating interface");
> > +		kfree(script);
> >  		return err;
> >  	}
> > +
> > +	vif->hotplug_script = script;
> > +
> 
> IMO it's better we make xenvif_alloc accept a new parameter called script then allocate vif->hotplug_script there. Then free vif->hotplug_script in xenvif_free. This way it's less error prone because all memory allocated for vif is managed in proper place - xenvif_alloc and xenvif_free.

Well, except the allocation is still in xenbus_read via backend_create_xenvif, but yes I think that refactoring would be an improvement.

What about storing it in struct backend_info and setting/restoring in netback_{probe,remove}? That might be best of all?

Ian.
[RFC] [PATCH] net: socket: Fix the wrong returns for recvmsg and sendmsg
Hi Greg,

We found that after v3.10.73, recvmsg might return -EFAULT while -EINVAL was expected.

We tested it with the recvmsg01 testcase from the LTP test suite. It sets msg->msg_namelen to -1, and the recvmsg syscall returned errno 14 (EFAULT), which is unexpected (errno 22, EINVAL, is expected):

recvmsg01 4 TFAIL : invalid socket length ; returned -1 (expected -1), errno 14 (expected 22)

Linux mainline does not have this bug, because commit 08adb7dab fixes it accidentally. However, that commit is too large and complex to be backported to LTS 3.10.

So I made the following patch to fix the above problem for LTS 3.10.

Cheers,
Junling

Commit 281c9c36 ("net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour") made get_compat_msghdr() return an error if msg_sys->msg_namelen was negative, which changed the behaviour of the recvmsg and sendmsg syscalls for 32-bit compat (lib32) programs:

Before commit 281c9c36, get_compat_msghdr() wouldn't fail, and -EINVAL would be returned later (in move_addr_to_user() or elsewhere) if msg_sys->msg_namelen was invalid, so the syscall returned -EINVAL, which is correct. Now, when msg_sys->msg_namelen is negative, get_compat_msghdr() fails and wants to return -EINVAL; however, the outer syscall discards that and returns -EFAULT directly, which is unexpected.

This patch keeps the return value of get_compat_msghdr(), as is already done for copy_msghdr_from_user(), and returns that value if get_compat_msghdr() fails.

Fixes: 281c9c36 ("net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour")
Signed-off-by: Junling Zheng <zhengjunl...@huawei.com>
Signed-off-by: Hanbing Xu <xuhanb...@huawei.com>
Cc: Li Zefan <lize...@huawei.com>
Cc: Al Viro <v...@zeniv.linux.org.uk>
Cc: David Miller <da...@davemloft.net>
---
 net/socket.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index fc90b4f..53b6e41 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1988,14 +1988,12 @@ static int ___sys_sendmsg(struct socket *sock, struct msghdr __user *msg,
 	int err, ctl_len, total_len;
 
 	err = -EFAULT;
-	if (MSG_CMSG_COMPAT & flags) {
-		if (get_compat_msghdr(msg_sys, msg_compat))
-			return -EFAULT;
-	} else {
+	if (MSG_CMSG_COMPAT & flags)
+		err = get_compat_msghdr(msg_sys, msg_compat);
+	else
 		err = copy_msghdr_from_user(msg_sys, msg);
-		if (err)
-			return err;
-	}
+	if (err)
+		return err;
 
 	if (msg_sys->msg_iovlen > UIO_FASTIOV) {
 		err = -EMSGSIZE;
@@ -2200,14 +2198,12 @@ static int ___sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
 	struct sockaddr __user *uaddr;
 	int __user *uaddr_len;
 
-	if (MSG_CMSG_COMPAT & flags) {
-		if (get_compat_msghdr(msg_sys, msg_compat))
-			return -EFAULT;
-	} else {
+	if (MSG_CMSG_COMPAT & flags)
+		err = get_compat_msghdr(msg_sys, msg_compat);
+	else
 		err = copy_msghdr_from_user(msg_sys, msg);
-		if (err)
-			return err;
-	}
+	if (err)
+		return err;
 
 	if (msg_sys->msg_iovlen > UIO_FASTIOV) {
 		err = -EMSGSIZE;
-- 
1.8.3.4
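The heart of the fix is propagating the helper's own error code instead of overwriting it with -EFAULT. The userspace sketch below models that control flow with stand-in functions; fake_get_compat_msghdr and fake_copy_msghdr are hypothetical, not kernel APIs, and only model the negative msg_namelen check (the real helpers can still return -EFAULT for bad user pointers, which the fix preserves).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the part of struct msghdr these checks look at. */
struct fake_msghdr {
	int msg_namelen;
};

/* Hypothetical stand-ins for get_compat_msghdr() and
 * copy_msghdr_from_user(): both reject a negative name length. */
static int fake_get_compat_msghdr(const struct fake_msghdr *m)
{
	return m->msg_namelen < 0 ? -EINVAL : 0;
}

static int fake_copy_msghdr(const struct fake_msghdr *m)
{
	return m->msg_namelen < 0 ? -EINVAL : 0;
}

/* Before the fix: the compat path discards the helper's error. */
static int parse_msghdr_old(const struct fake_msghdr *m, bool compat)
{
	int err;

	if (compat) {
		if (fake_get_compat_msghdr(m))
			return -EFAULT;
	} else {
		err = fake_copy_msghdr(m);
		if (err)
			return err;
	}
	return 0;
}

/* After the fix: both paths return whatever the helper reported. */
static int parse_msghdr_new(const struct fake_msghdr *m, bool compat)
{
	int err;

	if (compat)
		err = fake_get_compat_msghdr(m);
	else
		err = fake_copy_msghdr(m);
	if (err)
		return err;
	return 0;
}
```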
Re: [cdc_ncm] guidance and help refactoring cdc_ncm
On Mon, 2015-06-01 at 10:24 +0200, Enrico Mioso wrote:
> We are failing in some cases only because of the position where we put the NDP part of the NCM frame. In fact, this 32-bit driver will work when the 16-bit one does, and fail when the 16-bit one does.

I think the discussion would benefit from a clearer explanation of how those devices need an aggregate to look and what our aggregates look like.

	Regards
		Oliver
Re: [cdc_ncm] guidance and help refactoring cdc_ncm
On Mon, 1 Jun 2015, Oliver Neukum wrote:

==Date: Mon, 1 Jun 2015 14:00:22
==From: Oliver Neukum <oneu...@suse.com>
==To: Enrico Mioso <mrkiko...@gmail.com>
==Cc: you...@gmail.com, Greg KH <g...@kroah.com>, linux-...@vger.kernel.org,
==netdev@vger.kernel.org
==Subject: Re: [cdc_ncm] guidance and help refactoring cdc_ncm
==
==On Mon, 2015-06-01 at 13:41 +0200, Enrico Mioso wrote:
== Thank you Oliver, thank you all for reading this thread and the attention.
== For a more detailed discussion and how we got here, you might google for the
== thread:
== "Is this 32-bit NCM?"
== and
== "Is this 32-bit NCM?y (follow up)".
== Where the y letter comes from a mistake of mine.
==
==Having read them it looks like the issues of padding and
==sequence numbers are open.
==
== The specification only mandates the position of the NTH (header). The rest
== can be in any order and position in general. This will work with most devices:
== except, of course, those from Huawei.
==
==Indeed. And a redesign for crap devices looks like
==a bad idea.
==
== Our aggregate looks something like this from my perspective (anyone correct me
== in case):
== NTH: header
== NDP: contains indexing information
== ethernet packet 1
== ethernet packet 2
== ...
== ethernet packet n;
==
== While it should look like:
== NTH: header
== ethernet packet 1
== ethernet packet 2
== ...
== ethernet packet n;
== NDP: contains indexing information
==
== but, when introducing such a change: you might break other devices now working.
== In fact, clearly there are multiple vendors producing NCM devices, as you might
== also see by looking at the driver's comments.
== So in general, we should be able to dynamically change the way in which the
== driver orders things in the package.
== and that's why I initially proposed the redesign.
==
==OK, so the NDP needs to be at the end. However in the old thread
==you state that this requires the NDP to be built between the
==final aggregate and physically transmitting. I think this is a false
==choice. You could just as well copy the NDP around provided you reserve
==enough space at the end of the skb.

Yes, probably you can do so. I am not against anything at this moment.

==
== Regards
== Oliver
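The layout Enrico describes (NTH16, then the datagrams, then the NDP16) and Oliver's point about reserving room at the end can be sketched as a builder that appends a 4-byte-aligned NDP after the payload and fills in the header last. Field offsets and signatures follow the NCM 1.0 NTB16 format (mirrored in include/uapi/linux/usb/cdc.h). The function build_ntb16_ndp_last is hypothetical, and it deliberately ignores the datagram alignment rules (wNdpOutDivisor and friends) that cdc_ncm honours.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NTH16_LEN        12
#define NDP16_HDR_LEN    8
#define NTH16_SIGN       0x484D434Eu /* "NCMH" little-endian */
#define NDP16_NOCRC_SIGN 0x304D434Eu /* "NCM0" little-endian */

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = v & 0xff;
	p[1] = v >> 8;
}

static void put_le32(uint8_t *p, uint32_t v)
{
	put_le16(p, v & 0xffff);
	put_le16(p + 2, v >> 16);
}

/* Hypothetical builder: NTH16 | datagrams | pad | NDP16 (NDP last).
 * Returns the block length (wBlockLength), or 0 if buf is too small. */
static size_t build_ntb16_ndp_last(uint8_t *buf, size_t cap, uint16_t seq,
				   const uint8_t *const *dgrams,
				   const uint16_t *lens, size_t n)
{
	size_t off = NTH16_LEN, ndp, ndp_len, i;

	/* Datagrams go straight after the 12-byte header. */
	for (i = 0; i < n; i++) {
		if (off + lens[i] > cap)
			return 0;
		memcpy(buf + off, dgrams[i], lens[i]);
		off += lens[i];
	}

	/* NDP16 at the end, 4-byte aligned: header plus one entry per
	 * datagram plus the terminating zero entry. */
	ndp = (off + 3) & ~(size_t)3;
	ndp_len = NDP16_HDR_LEN + 4 * (n + 1);
	if (ndp + ndp_len > cap || ndp + ndp_len > 0xffff)
		return 0;
	memset(buf + off, 0, ndp - off);

	put_le32(buf + ndp, NDP16_NOCRC_SIGN);
	put_le16(buf + ndp + 4, (uint16_t)ndp_len);
	put_le16(buf + ndp + 6, 0);	/* wNextNdpIndex: no further NDP */
	off = NTH16_LEN;
	for (i = 0; i < n; i++) {
		put_le16(buf + ndp + 8 + 4 * i, (uint16_t)off);	/* index */
		put_le16(buf + ndp + 10 + 4 * i, lens[i]);	/* length */
		off += lens[i];
	}
	put_le32(buf + ndp + 8 + 4 * n, 0);	/* terminating entry */

	/* Header last, once the NDP offset and total length are known. */
	put_le32(buf, NTH16_SIGN);
	put_le16(buf + 4, NTH16_LEN);
	put_le16(buf + 6, seq);
	put_le16(buf + 8, (uint16_t)(ndp + ndp_len));
	put_le16(buf + 10, (uint16_t)ndp);
	return ndp + ndp_len;
}
```

Writing the NTH last is what allows the NDP to trail the payload: its offset (wNdpIndex) is only known once every datagram has been placed.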
[PATCH v2 net-next] vlan: Add GRO support for non hardware accelerated vlan
Currently packets with non-hardware-accelerated vlan cannot be handled by GRO. This causes low performance for 802.1ad and stacked vlan, as their vlan tags are currently not stripped by hardware.

This patch adds GRO support for non-hardware-accelerated vlan and improves receive performance of them.

Test Environment:
 vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599)

Result:
- Before
$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    5233.17

Rx side CPU usage:
  %usr   %sys   %irq  %soft  %idle
  0.27  58.03   0.00  41.70   0.00

- After
$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    7586.85

Rx side CPU usage:
  %usr   %sys   %irq  %soft  %idle
  0.50  25.83   0.00  59.53  14.14

Signed-off-by: Toshiaki Makita <makita.toshi...@lab.ntt.co.jp>
---
v2:
 - Add compare_vlan_header() as per Eric Dumazet.

 include/linux/if_vlan.h | 20
 net/8021q/vlan.c        | 94
 2 files changed, 114 insertions(+)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index a40d298..67ce5bd 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -628,4 +628,24 @@ static inline netdev_features_t vlan_features_check(const struct sk_buff *skb,
 	return features;
 }
 
+/**
+ * compare_vlan_header - Compare two vlan headers
+ * @h1: Pointer to vlan header
+ * @h2: Pointer to vlan header
+ *
+ * Compare two vlan headers, returns 0 if equal.
+ *
+ * Please note that alignment of h1 & h2 are only guaranteed to be 16 bits.
+ */
+static inline unsigned long compare_vlan_header(const struct vlan_hdr *h1,
+						const struct vlan_hdr *h2)
+{
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+	return *(u32 *)h1 ^ *(u32 *)h2;
+#else
+	return ((__force u32)h1->h_vlan_TCI ^ (__force u32)h2->h_vlan_TCI) |
+	       ((__force u32)h1->h_vlan_encapsulated_proto ^
+		(__force u32)h2->h_vlan_encapsulated_proto);
+#endif
+}
 #endif /* !(_LINUX_IF_VLAN_H_) */
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 59555f0..9c4f884 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -618,6 +618,90 @@ out:
 	return err;
 }
 
+static struct sk_buff **vlan_gro_receive(struct sk_buff **head,
+					 struct sk_buff *skb)
+{
+	struct sk_buff *p, **pp = NULL;
+	struct vlan_hdr *vhdr;
+	unsigned int hlen, off_vlan;
+	const struct packet_offload *ptype;
+	__be16 type;
+	int flush = 1;
+
+	off_vlan = skb_gro_offset(skb);
+	hlen = off_vlan + sizeof(*vhdr);
+	vhdr = skb_gro_header_fast(skb, off_vlan);
+	if (skb_gro_header_hard(skb, hlen)) {
+		vhdr = skb_gro_header_slow(skb, hlen, off_vlan);
+		if (unlikely(!vhdr))
+			goto out;
+	}
+
+	type = vhdr->h_vlan_encapsulated_proto;
+
+	rcu_read_lock();
+	ptype = gro_find_receive_by_type(type);
+	if (!ptype)
+		goto out_unlock;
+
+	flush = 0;
+
+	for (p = *head; p; p = p->next) {
+		struct vlan_hdr *vhdr2;
+
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		vhdr2 = (struct vlan_hdr *)(p->data + off_vlan);
+		if (compare_vlan_header(vhdr, vhdr2))
+			NAPI_GRO_CB(p)->same_flow = 0;
+	}
+
+	skb_gro_pull(skb, sizeof(*vhdr));
+	skb_gro_postpull_rcsum(skb, vhdr, sizeof(*vhdr));
+	pp = ptype->callbacks.gro_receive(head, skb);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+	NAPI_GRO_CB(skb)->flush |= flush;
+
+	return pp;
+}
+
+static int vlan_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + nhoff);
+	__be16 type = vhdr->h_vlan_encapsulated_proto;
+	struct packet_offload *ptype;
+	int err = -ENOENT;
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (ptype)
+		err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*vhdr));
+
+	rcu_read_unlock();
+	return err;
+}
+
+static struct packet_offload vlan_packet_offloads[] __read_mostly = {
+	{
+		.type = cpu_to_be16(ETH_P_8021Q),
+		.callbacks = {
+			.gro_receive = vlan_gro_receive,
+			.gro_complete = vlan_gro_complete,
+		},
+	},
+	{
+		.type = cpu_to_be16(ETH_P_8021AD),
+		.callbacks = {
+			.gro_receive =
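compare_vlan_header() folds two 16-bit comparisons into one value: XOR is zero only for equal inputs, and OR-ing the two XOR results is zero only when both fields match. The userspace sketch below shows both variants on a local stand-in struct (not the kernel's struct vlan_hdr); it uses memcpy() for the combined 32-bit load, the portable way to express a possibly unaligned read, instead of the kernel's direct u32 cast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in with the same shape as struct vlan_hdr: two 16-bit
 * fields (network byte order in the kernel; byte order is irrelevant
 * for an equality test). */
struct vlan_hdr_sketch {
	uint16_t h_vlan_TCI;
	uint16_t h_vlan_encapsulated_proto;
};

_Static_assert(sizeof(struct vlan_hdr_sketch) == 4, "expected 4-byte header");

/* Field-by-field variant (the !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 * branch): zero if and only if both fields are equal. */
static unsigned long cmp_fields(const struct vlan_hdr_sketch *h1,
				const struct vlan_hdr_sketch *h2)
{
	return ((uint32_t)h1->h_vlan_TCI ^ h2->h_vlan_TCI) |
	       ((uint32_t)h1->h_vlan_encapsulated_proto ^
		h2->h_vlan_encapsulated_proto);
}

/* Single 32-bit comparison; memcpy avoids relying on unaligned loads. */
static unsigned long cmp_word(const struct vlan_hdr_sketch *h1,
			      const struct vlan_hdr_sketch *h2)
{
	uint32_t a, b;

	memcpy(&a, h1, sizeof(a));
	memcpy(&b, h2, sizeof(b));
	return a ^ b;
}
```

In vlan_gro_receive() a nonzero result clears same_flow, so packets with a different VLAN TCI or inner protocol are never merged into the same GRO flow.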
Re: [PATCH] xen: netback: fix error printf format string.
On Sun, 2015-05-31 at 21:26 -0700, David Miller wrote:
> From: Ian Campbell <ian.campb...@citrix.com>
> Date: Fri, 29 May 2015 17:22:04 +0100
> 
> > drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
> > drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
> >         (txreq.offset&~PAGE_MASK) + txreq.size);
> >         ^
> > 
> > txreq.offset and .size are uint16_t fields.
> > 
> > Signed-off-by: Ian Campbell <ian.campb...@citrix.com>
> 
> This may get rid of the compiler warning on your machine, but it creates one on mine:
> 
> drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
> drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 5 has type ‘long unsigned int’ [-Wformat=]
>         (txreq.offset&~PAGE_MASK) + txreq.size);
>         ^
> 
> There is a type involved in this calculation which is arch dependent, so you'll need to add a cast or something to make this warning go away in all cases.

Ah, I only considered the types txreq.{offset,size} and missed thinking about PAGE_MASK.

I'll resend with a cast.

Ian.
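The type of (txreq.offset & ~PAGE_MASK) + txreq.size depends on how the architecture defines PAGE_MASK, which is why David's and Ian's compilers disagree. One portable fix is an explicit cast that matches a single format specifier. The sketch below reproduces the situation in userspace with locally defined stand-ins (SKETCH_PAGE_MASK is assumed to be unsigned long here); it is not necessarily the exact cast Ian's resent patch used, which the thread does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins; in the kernel, PAGE_MASK's type is arch dependent. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical mirror of the two uint16_t request fields involved. */
struct tx_request_sketch {
	uint16_t offset;
	uint16_t size;
};

/* Format the end offset within the page. The explicit cast pins the
 * argument to unsigned long, so "%lu" is correct on every architecture. */
static int format_end(char *buf, size_t len,
		      const struct tx_request_sketch *txreq)
{
	return snprintf(buf, len, "%lu",
			(unsigned long)((txreq->offset & ~SKETCH_PAGE_MASK) +
					txreq->size));
}
```

For an offset of 4100 and a size of 10, the in-page offset is 4100 & 4095 = 4, so the formatted result is 14.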
Re: [PATCH net] xen: netback: read hotplug script once at start of day.
On Mon, Jun 01, 2015 at 09:52:45AM +0100, Ian Campbell wrote:
> On Fri, 2015-05-29 at 18:38 +0100, Wei Liu wrote:
> > On Fri, May 29, 2015 at 05:24:53PM +0100, Ian Campbell wrote:
> > [...]
> > > 	if (be->vif != NULL)
> > > 		return 0;
> > > @@ -417,12 +409,23 @@ static int backend_create_xenvif(struct backend_info *be)
> > > 		return (err < 0) ? err : -EINVAL;
> > > 	}
> > > 
> > > +	script = xenbus_read(XBT_NIL, dev->nodename, "script", NULL);
> > > +	if (IS_ERR(script)) {
> > > +		int err = PTR_ERR(script);
> > > +		xenbus_dev_fatal(dev, err, "reading script");
> > > +		return err;
> > > +	}
> > > +
> > > 	vif = xenvif_alloc(&dev->dev, dev->otherend_id, handle);
> > > 	if (IS_ERR(vif)) {
> > > 		err = PTR_ERR(vif);
> > > 		xenbus_dev_fatal(dev, err, "creating interface");
> > > +		kfree(script);
> > > 		return err;
> > > 	}
> > > +
> > > +	vif->hotplug_script = script;
> > > +
> > 
> > IMO it's better we make xenvif_alloc accept a new parameter called script then allocate vif->hotplug_script there. Then free vif->hotplug_script in xenvif_free. This way it's less error prone because all memory allocated for vif is managed in proper place - xenvif_alloc and xenvif_free.
> 
> Well, except the allocation is still in xenbus_read via backend_create_xenvif, but yes I think that refactoring would be an improvement.
> 
> What about storing it in struct backend_info and setting/restoring in netback_{probe,remove}? That might be best of all?

Yes, that would be best.

Wei.

> Ian.
[PATCH net-next] cxgb4: remove unused fn to enable/disable db coalescing
Remove unused function cxgb4_enable_db_coalescing() and cxgb4_disable_db_coalescing()

Signed-off-by: Hariprasad Shenai <haripra...@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 19
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |  2 --
 2 files changed, 21 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 4f69b52..974b27c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2069,25 +2069,6 @@ out:
 }
 EXPORT_SYMBOL(cxgb4_sync_txq_pidx);
 
-void cxgb4_disable_db_coalescing(struct net_device *dev)
-{
-	struct adapter *adap;
-
-	adap = netdev2adap(dev);
-	t4_set_reg_field(adap, SGE_DOORBELL_CONTROL_A, NOCOALESCE_F,
-			 NOCOALESCE_F);
-}
-EXPORT_SYMBOL(cxgb4_disable_db_coalescing);
-
-void cxgb4_enable_db_coalescing(struct net_device *dev)
-{
-	struct adapter *adap;
-
-	adap = netdev2adap(dev);
-	t4_set_reg_field(adap, SGE_DOORBELL_CONTROL_A, NOCOALESCE_F, 0);
-}
-EXPORT_SYMBOL(cxgb4_enable_db_coalescing);
-
 int cxgb4_read_tpte(struct net_device *dev, u32 stag, __be32 *tpte)
 {
 	struct adapter *adap;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index df34293..14e8110 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -298,8 +298,6 @@ struct sk_buff *cxgb4_pktgl_to_skb(const struct pkt_gl *gl,
 				   unsigned int skb_len, unsigned int pull_len);
 int cxgb4_sync_txq_pidx(struct net_device *dev, u16 qid, u16 pidx, u16 size);
 int cxgb4_flush_eq_cache(struct net_device *dev);
-void cxgb4_disable_db_coalescing(struct net_device *dev);
-void cxgb4_enable_db_coalescing(struct net_device *dev);
 int cxgb4_read_tpte(struct net_device *dev, u32 stag, __be32 *tpte);
 u64 cxgb4_read_sge_timestamp(struct net_device *dev);
 
-- 
2.3.4
Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.
Andrew Lunn <and...@lunn.ch> writes:

> So the ports look like normal ports, and you configure them using the normal mechanisms. DSA does not use vlans. It uses an additional protocol header which the switch supports, to allow the CPU to direct packets out a specific port. Similarly, packets coming to the CPU from a port are marked with the port they ingressed. This means the ports are completely separated by default. When you add interfaces to a bridge, calls are made by the bridge code into DSA to set up the switch to hardware bridge the interface. And if the switch driver does not support it, software bridging is used instead. Unless you know what is going on under the hood, you have no idea that eth0 and eth1 are used to carry packets to the switch, and that the switch is bridging the interfaces. So it is Linux concepts, with some hardware acceleration.

Thanks a lot. This filled most of the blank spots. I should have done some research into what DSA actually is before posting my question.

> Now back to your question. What is clearly hardware and needs to go into device tree is the mapping between switch ports and cpu ports: eth0 -- port 6, eth1 -- port 5. But I've reconsidered putting into device tree the load balancing of which slave ports, lan[1-4], wan, are attached to which master port, eth[01]. It should not be in DT. We want a sensible default, which I would say is what I had in DT, allocate them every other, but implement this in software. And allow the user to move slaves between masters, using a user space command. Something like:
>
> ip link set dev lan4 master eth0
>
> So if you wish, you can then have eth1 dedicated to WAN, and eth0 for lan[1-4]. Or any other combination. I would say implementing this command to move a slave between masters can come later, so long as we have a default which works for most people. Using every other is clearly better than only using one interface.

Yes, that sounds reasonable. But I do still wonder if this model can be made flexible enough. How about a switch having more CPU ports than external ports (just an imaginary product; I don't know if anyone is crazy enough to make it)? Or what if I'd like to dedicate CPU port eth0 to VLAN 13, while CPU port eth1 handles everything else? With lan0 carrying an 802.1q trunk with both VLAN 13 and more, i.e. a mix of packets for both eth0 and eth1?

Well, I'm being difficult now :) We can probably do fine without being able to express those things. And I realize that I'm a bit too late into any discussion about modelling this.

Thanks again for taking the time to write such a good answer.

Bjørn
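Andrew's proposed default, allocating the user-facing ports to the CPU ports "every other", amounts to a round-robin assignment, and the later `ip link set dev lan4 master eth0` override just rewrites one entry. The sketch below is a hypothetical illustration of that policy with array indices standing in for ports; assign_default_masters and set_master are not DSA functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default: slave port i uses CPU port (i mod n_cpu).
 * master[i] receives the CPU-port index for slave i.
 * Returns -1 if there are no CPU ports to assign to. */
static int assign_default_masters(int *master, size_t n_slaves, size_t n_cpu)
{
	size_t i;

	if (n_cpu == 0)
		return -1;
	for (i = 0; i < n_slaves; i++)
		master[i] = (int)(i % n_cpu);
	return 0;
}

/* Hypothetical override, as a user-space command might request:
 * move one slave to a different CPU port. */
static int set_master(int *master, size_t n_slaves, size_t n_cpu,
		      size_t slave, size_t cpu)
{
	if (slave >= n_slaves || cpu >= n_cpu)
		return -1;
	master[slave] = (int)cpu;
	return 0;
}
```

With five slaves (lan1-lan4, wan) and two CPU ports this yields 0, 1, 0, 1, 0. In Bjørn's hypothetical case of more CPU ports than slaves, the modulo simply leaves some CPU ports unused; per-VLAN steering would need a richer model than a single master per slave.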
Re: 2.6.32.66 tcp regression OOPs
Hi,

On Mon, Jun 01, 2015 at 09:00:21AM +0200, Frans Klaver wrote:
> [cc: Willy Tarreau]
> 
> On Mon, Jun 1, 2015 at 3:26 AM, <starlight.201...@binnacle.cx> wrote:
> > Hello,
> >
> > Apologies if I have submitted to the wrong lists.
> >
> > Encountered a regression in 2.6.32.66 relative to 2.6.32.65.
> > Crash eight minutes after boot.
> >
> > Will respond with additional details if the OOPS is not sufficient.
> >
> > Best Regards
> 
> Did you bisect it?

Eric Dumazet notified me of something possibly similar, due to a mistake I made when backporting a fix by hand. Please apply the following patch to see if it fixes the problem:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5339f066234b..d1e2895bb63c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2136,7 +2136,7 @@ void tcp_send_fin(struct sock *sk)
 	 */
 	if (tskb && (tcp_send_head(sk) || tcp_memory_pressure)) {
 coalesce:
-		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_FIN;
+		TCP_SKB_CB(tskb)->flags |= TCPCB_FLAG_FIN;
 		TCP_SKB_CB(tskb)->end_seq++;
 		tp->write_seq++;
 		if (!tcp_send_head(sk)) {

Thanks,
Willy
Re: [cdc_ncm] guidance and help refactoring cdc_ncm
On Mon, 2015-06-01 at 08:53 +0200, Enrico Mioso wrote:
> A 32-bit version of the driver (talking 32-bit NCM) is here:
> http://www.gstorm.eu/cdc_ncm.c
> I modified the original driver with the help of a very talented friend.
> It works: but there seem to be no real reasons to implement this properly. We
> did this in a consistent effort to understand what was wrong, and here it is.

Well, you are really talking about two different things here.
Do we ever fail because we only do 16 bit as opposed to 32 bit?
Your 32 bit driver looks good, but it raises the question of what to do
if this test:

	if (le16_to_cpu(ctx->ncm_parm.bmNtbFormatsSupported) &
						USB_CDC_NCM_NTB32_SUPPORTED) {

fails. Otherwise it looks ready for merging.

	Regards
		Oliver
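Oliver's open question is what the 32-bit driver should do when a device does not advertise NTB32. One conservative answer is to fall back to NTB16, sketched below. The bit values match USB_CDC_NCM_NTB16_SUPPORTED (bit 0) and USB_CDC_NCM_NTB32_SUPPORTED (bit 1) in include/uapi/linux/usb/cdc.h, and the input is assumed to be already converted with le16_to_cpu(); the choose_ntb_format helper and its policy are assumptions, not what either driver currently does.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as USB_CDC_NCM_NTB16_SUPPORTED / USB_CDC_NCM_NTB32_SUPPORTED. */
#define NTB16_SUPPORTED (1 << 0)
#define NTB32_SUPPORTED (1 << 1)

enum ntb_format { NTB_NONE = 0, NTB_16 = 16, NTB_32 = 32 };

/* Hypothetical policy: use NTB32 only when wanted and advertised,
 * otherwise fall back to NTB16 (which NCM 1.0 expects every device
 * to support); report NTB_NONE if neither bit is set. */
static enum ntb_format choose_ntb_format(uint16_t bmNtbFormatsSupported,
					 int want_32)
{
	if (want_32 && (bmNtbFormatsSupported & NTB32_SUPPORTED))
		return NTB_32;
	if (bmNtbFormatsSupported & NTB16_SUPPORTED)
		return NTB_16;
	return NTB_NONE;
}
```

A fallback like this would let a merged driver prefer 32-bit NTBs where offered without refusing devices that only implement the mandatory 16-bit format.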
[PATCH v2 net] xen: netback: read hotplug script once at start of day.
When we come to tear things down in netback_remove() and generate the uevent it is possible that the xenstore directory has already been removed (details below). In such cases netback_uevent() won't be able to read the hotplug script and will write a xenstore error node. A recent change to the hypervisor exposed this race such that we now sometimes lose it (where apparently we didn't ever before). Instead read the hotplug script configuration during setup and use it for the lifetime of the backend device. The apparently more obvious fix of moving the transition to state=Closed in netback_remove() to after the uevent does not work because it is possible that we are already in state=Closed (in reaction to the guest having disconnected as it shutdown). Being already in Closed means the toolstack is at liberty to start tearing down the xenstore directories. In principal it might be possible to arrange to unregister the device sooner (e.g on transition to Closing) such that xenstore would still be there but this state machine is fragile and prone to anger... A modern Xen system only relies on the hotplug uevent for driver domains, when the backend is in the same domain as the toolstack it will run the necessary setup/teardown directly in the correct sequence wrt xenstore changes. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v2: Move script to backend_info and read/free it in netback_{probe,remove}. DaveM, could this go to all stable trees please. 
---
 drivers/net/xen-netback/xenbus.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3d8dbf5..6380b28 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -34,6 +34,8 @@ struct backend_info {
 	enum xenbus_state frontend_state;
 	struct xenbus_watch hotplug_status_watch;
 	u8 have_hotplug_status_watch:1;
+
+	const char *hotplug_script;
 };
 
 static int connect_rings(struct backend_info *be, struct xenvif_queue *queue);
@@ -238,6 +240,7 @@ static int netback_remove(struct xenbus_device *dev)
 		xenvif_free(be->vif);
 		be->vif = NULL;
 	}
+	kfree(be->hotplug_script);
 	kfree(be);
 	dev_set_drvdata(&dev->dev, NULL);
 	return 0;
@@ -255,6 +258,7 @@ static int netback_probe(struct xenbus_device *dev,
 	struct xenbus_transaction xbt;
 	int err;
 	int sg;
+	const char *script;
 	struct backend_info *be = kzalloc(sizeof(struct backend_info),
 					  GFP_KERNEL);
 	if (!be) {
@@ -347,6 +351,15 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		pr_debug("Error writing multi-queue-max-queues\n");
 
+	script = xenbus_read(XBT_NIL, dev->nodename, "script", NULL);
+	if (IS_ERR(script)) {
+		err = PTR_ERR(script);
+		xenbus_dev_fatal(dev, err, "reading script");
+		goto fail;
+	}
+
+	be->hotplug_script = script;
+
 	err = xenbus_switch_state(dev, XenbusStateInitWait);
 	if (err)
 		goto fail;
@@ -379,22 +392,14 @@ static int netback_uevent(struct xenbus_device *xdev,
 			  struct kobj_uevent_env *env)
 {
 	struct backend_info *be = dev_get_drvdata(&xdev->dev);
-	char *val;
 
-	val = xenbus_read(XBT_NIL, xdev->nodename, "script", NULL);
-	if (IS_ERR(val)) {
-		int err = PTR_ERR(val);
-		xenbus_dev_fatal(xdev, err, "reading script");
-		return err;
-	} else {
-		if (add_uevent_var(env, "script=%s", val)) {
-			kfree(val);
-			return -ENOMEM;
-		}
-		kfree(val);
-	}
+	if (!be)
+		return 0;
+
+	if (add_uevent_var(env, "script=%s", be->hotplug_script))
+		return -ENOMEM;
 
-	if (!be || !be->vif)
+	if (!be->vif)
 		return 0;
 
 	return add_uevent_var(env, "vif=%s", be->vif->dev->name);
--
1.7.10.4
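The fix above boils down to "copy at probe, use until remove". The sketch below is a userspace model of that ordering, not netback code: `struct fake_store`, `struct backend`, `backend_probe()`, `backend_uevent()` and `backend_remove()` are hypothetical stand-ins for xenstore, struct backend_info, netback_probe(), netback_uevent() and netback_remove(), and the script path used in the usage check is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the backend's xenstore directory. */
struct fake_store {
	const char *script; /* NULL once the toolstack has removed the node */
};

/* Hypothetical stand-in for struct backend_info. */
struct backend {
	char *hotplug_script; /* private copy taken at probe time */
};

static char *dup_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Like netback_probe(): read "script" once and keep a private copy. */
static int backend_probe(struct backend *be, const struct fake_store *xs)
{
	if (!xs->script)
		return -1; /* the real driver reports a fatal error here */
	be->hotplug_script = dup_string(xs->script);
	return be->hotplug_script ? 0 : -1;
}

/* Like netback_uevent(): build "script=..." from the cached copy, never
 * from the store, so a directory that has already vanished is harmless. */
static int backend_uevent(const struct backend *be, char *env, size_t len)
{
	int n;

	if (!be || !be->hotplug_script)
		return 0;
	n = snprintf(env, len, "script=%s", be->hotplug_script);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Like netback_remove(): the cached copy lives exactly as long as the backend. */
static void backend_remove(struct backend *be)
{
	assert(be != NULL);
	free(be->hotplug_script);
	be->hotplug_script = NULL;
}
```

In the usage check the store entry is cleared before the uevent is generated, mirroring the toolstack tearing down xenstore first; the uevent still succeeds because it only reads the copy.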
Re: [cdc_ncm] guidance and help refactoring cdc_ncm
Thank you Oliver for the reply.

On Mon, 1 Jun 2015, Oliver Neukum wrote:

==Date: Mon, 1 Jun 2015 09:48:26
==From: Oliver Neukum oneu...@suse.com
==To: Enrico Mioso mrkiko...@gmail.com
==Cc: Greg KH g...@kroah.com, linux-...@vger.kernel.org,
==netdev@vger.kernel.org, you...@gmail.com
==Subject: Re: [cdc_ncm] guidance and help refactoring cdc_ncm
==
==On Mon, 2015-06-01 at 08:53 +0200, Enrico Mioso wrote:
==
== A 32-bit version of the driver (talking 32-bit NCM) is here:
== http://www.gstorm.eu/cdc_ncm.c
== I modified the original driver with the help of a very talented friend.
== It works: but there seem to be no real reasons to implement this properly. We
== did this in a consistent effort to understand what was wrong, and here it is.
==
==Well, you are really talking about two different things here.
==Do we ever fail because we only do 16 bit as opposed to 32 bit?
==Your 32 bit driver looks good, but it raises the question of what to do
==if this test:
==
==if (le16_to_cpu(ctx->ncm_parm.bmNtbFormatsSupported) &
==		USB_CDC_NCM_NTB32_SUPPORTED) {
==
==fails. Otherwise it looks ready for merging.
==
==	Regards
==		Oliver

Oh - I am sorry. In fact, I am talking about two different things here. No, I've seen no device failing because of this (16 vs. 32 bit problem). I was mentioning this only as an additional thing to consider, at least from my side.

We are failing in some cases only because of the position where we put the NDP part of the NCM frame. In fact, this 32-bit driver works when the 16-bit one does, and fails when the 16-bit one fails.

Thank you for your review and consideration. It would be nice to see this code merged, but I think we might need to merge the two drivers in a way that reduces code duplication. That might be work for a second take, though.
Thank you again,
Enrico
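Oliver's open question is what the driver should do when the NTB-32 bit is not set. One possible policy, offered here only as an assumption and not as what cdc_ncm does, is to fall back to 16-bit NTBs instead of failing. The bit positions follow the NCM 1.0 spec (bit 0: 16-bit NTB, bit 1: 32-bit NTB); `NTB16_SUPPORTED`, `NTB32_SUPPORTED` and `choose_ntb_format()` are hypothetical local names, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of bmNtbFormatsSupported per the CDC NCM 1.0 spec (local names). */
#define NTB16_SUPPORTED (1u << 0)
#define NTB32_SUPPORTED (1u << 1)

enum ntb_format { NTB_NONE = 0, NTB_16 = 16, NTB_32 = 32 };

/* Decode a little-endian 16-bit descriptor field, like le16_to_cpu(). */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Prefer 32-bit NTBs when requested and offered; otherwise fall back to
 * 16-bit rather than refusing the device. */
static enum ntb_format choose_ntb_format(const uint8_t raw_formats[2],
					 int want_32bit)
{
	uint16_t formats;

	assert(raw_formats != 0);
	formats = get_le16(raw_formats);
	if (want_32bit && (formats & NTB32_SUPPORTED))
		return NTB_32;
	if (formats & NTB16_SUPPORTED)
		return NTB_16;
	return NTB_NONE;
}
```

Keeping the 16-bit path as the default fallback matches Enrico's observation that 32-bit framing is not what makes the affected devices fail.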
Re: [PATCH v6 2/8] tun: add tun_is_little_endian() helper
On Fri, Apr 24, 2015 at 02:24:38PM +0200, Greg Kurz wrote:
> Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com

Dave, could you please ack merging this through the virtio tree?

> ---
>  drivers/net/tun.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 857dca4..3c3d6c0 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -206,14 +206,19 @@ struct tun_struct {
>  	u32 flow_count;
>  };
>  
> +static inline bool tun_is_little_endian(struct tun_struct *tun)
> +{
> +	return tun->flags & TUN_VNET_LE;
> +}
> +
>  static inline u16 tun16_to_cpu(struct tun_struct *tun, __virtio16 val)
>  {
> -	return __virtio16_to_cpu(tun->flags & TUN_VNET_LE, val);
> +	return __virtio16_to_cpu(tun_is_little_endian(tun), val);
>  }
>  
>  static inline __virtio16 cpu_to_tun16(struct tun_struct *tun, u16 val)
>  {
> -	return __cpu_to_virtio16(tun->flags & TUN_VNET_LE, val);
> +	return __cpu_to_virtio16(tun_is_little_endian(tun), val);
>  }
>  
>  static inline u32 tun_hashfn(u32 rxhash)

___
Virtualization mailing list
virtualizat...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2] xen: netback: fix printf format string warning
drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
        (txreq.offset&~PAGE_MASK) + txreq.size);
        ^

PAGE_MASK's type can vary by arch, so a cast is needed.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
v2: Cast to unsigned long, since PAGE_MASK can vary by arch.
---
 drivers/net/xen-netback/netback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4de46aa..0d25943 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1250,7 +1250,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 				netdev_err(queue->vif->dev,
 					   "txreq.offset: %x, size: %u, end: %lu\n",
 					   txreq.offset, txreq.size,
-					   (txreq.offset&~PAGE_MASK) + txreq.size);
+					   (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
 				xenvif_fatal_tx_err(queue->vif);
 				break;
 			}
--
1.7.10.4
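Why the cast is needed can be checked with C11 `_Generic`, which reports the type an expression really has. The two masks below are hypothetical stand-ins for an architecture whose page mask is an `int` and one whose mask is an `unsigned long`; `TYPE_NAME()`, `is_type()` and `format_end()` are illustrative helpers, not kernel code. With 16-bit operands (the sketch assumes `uint16_t` offset and size) and an `int` mask, the sum is an `int`, which is exactly what the warning complains about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HYP_PAGE_MASK_INT   (~(4096 - 1))   /* type: int */
#define HYP_PAGE_MASK_ULONG (~(4096UL - 1)) /* type: unsigned long */

/* C11 _Generic: name the type of an expression after usual conversions. */
#define TYPE_NAME(x) _Generic((x),            \
	int: "int",                           \
	unsigned int: "unsigned int",         \
	unsigned long: "unsigned long",       \
	default: "other")

static int is_type(const char *name, const char *want)
{
	return strcmp(name, want) == 0;
}

/* Format the end offset as the fixed netdev_err() call does: cast first,
 * so the %lu conversion always receives an unsigned long. */
static int format_end(char *buf, size_t len, uint16_t offset, uint16_t size)
{
	unsigned long end = (unsigned long)(offset & ~HYP_PAGE_MASK_INT) + size;

	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "end: %lu", end);
}
```

The cast makes the argument type independent of how the architecture defines its mask, which is why v2 casts instead of changing the format string.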
Re: [PATCH v6 3/8] macvtap: introduce macvtap_is_little_endian() helper
On Fri, Apr 24, 2015 at 02:24:48PM +0200, Greg Kurz wrote:
> Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com

Dave, could you pls ack merging this through the virtio tree?

> ---
>  drivers/net/macvtap.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 27ecc5c..a2f2958 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -49,14 +49,19 @@ struct macvtap_queue {
>  
>  #define MACVTAP_VNET_LE 0x8000
>  
> +static inline bool macvtap_is_little_endian(struct macvtap_queue *q)
> +{
> +	return q->flags & MACVTAP_VNET_LE;
> +}
> +
>  static inline u16 macvtap16_to_cpu(struct macvtap_queue *q, __virtio16 val)
>  {
> -	return __virtio16_to_cpu(q->flags & MACVTAP_VNET_LE, val);
> +	return __virtio16_to_cpu(macvtap_is_little_endian(q), val);
>  }
>  
>  static inline __virtio16 cpu_to_macvtap16(struct macvtap_queue *q, u16 val)
>  {
> -	return __cpu_to_virtio16(q->flags & MACVTAP_VNET_LE, val);
> +	return __cpu_to_virtio16(macvtap_is_little_endian(q), val);
>  }
>  
>  static struct proto macvtap_proto = {
Re: 2.6.32.66 tcp regression OOPs
Hi,

I found the patch late yesterday and applied it. Running fine now for 12 hours under active load.

Recommend the patch be rolled into the tarball, or a notation added to the release page, as this one has severe consequences.

Thank You!

At 09:49 6/1/2015 +0200, Willy Tarreau wrote:
>Hi,
>
>On Mon, Jun 01, 2015 at 09:00:21AM +0200, Frans Klaver wrote:
>> [cc: Willy Tarreau]
>>
>> On Mon, Jun 1, 2015 at 3:26 AM, starlight.201...@binnacle.cx wrote:
>> > Hello,
>> >
>> > Apologies if I have submitted to the wrong lists.
>> >
>> > Encountered a regression in 2.6.32.66 relative to 2.6.32.65.
>> > Crash eight minutes after boot. Will respond with additional
>> > details if the OOPS is not sufficient.
>> >
>> > Best Regards
>>
>> Did you bisect it?
>
>Eric Dumazet notified me of something possibly similar due to a
>mistake I made when backporting a fix by hand. Please apply the
>following patch to see if it fixes the problem:
>
>diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>index 5339f066234b..d1e2895bb63c 100644
>--- a/net/ipv4/tcp_output.c
>+++ b/net/ipv4/tcp_output.c
>@@ -2136,7 +2136,7 @@ void tcp_send_fin(struct sock *sk)
> 	 */
> 	if (tskb && (tcp_send_head(sk) || tcp_memory_pressure)) {
> coalesce:
>-		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_FIN;
>+		TCP_SKB_CB(tskb)->flags |= TCPCB_FLAG_FIN;
> 		TCP_SKB_CB(tskb)->end_seq++;
> 		tp->write_seq++;
> 		if (!tcp_send_head(sk)) {
>
>Thanks,
>Willy
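The one-line fix restores an invariant of the coalesce path: the FIN flag must be set on the same queued segment (`tskb`) whose `end_seq` is advanced, because FIN consumes one sequence number. The buggy line used `skb`, which on that path does not point at the tail segment. The model below is a heavily simplified, hypothetical sketch (`struct hyp_seg`, `HYP_FLAG_FIN`, `hyp_send_fin_coalesce()`), not the kernel's `sk_buff` handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_FLAG_FIN 0x01u

/* Hypothetical, heavily simplified model of a queued TCP segment. */
struct hyp_seg {
	uint8_t flags;
	uint32_t end_seq;
};

/* Coalesce FIN into the last queued segment: flag, end_seq and
 * write_seq are all updated through the same tail pointer. */
static void hyp_send_fin_coalesce(struct hyp_seg *tail, uint32_t *write_seq)
{
	assert(tail != NULL && write_seq != NULL);
	tail->flags |= HYP_FLAG_FIN;
	tail->end_seq++;  /* FIN occupies one sequence number */
	(*write_seq)++;
}
```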
[PATCH] ipv4/udp: Verify multicast group is ours in udp_v4_early_demux()
From: Shawn Bohrer sboh...@rgmadvisors.com

421b3885bf6d56391297844f43fb7154a6396e12 "udp: ipv4: Add udp early demux" introduced a regression that allowed sockets bound to INADDR_ANY to receive packets from multicast groups that the socket had not joined. For example, a socket that had joined 224.168.2.9 could also receive packets from 225.168.2.9 despite not having joined that group if ip_early_demux is enabled.

Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify that the multicast packet is indeed ours.

Signed-off-by: Shawn Bohrer sboh...@rgmadvisors.com
Reported-by: Yurij M. Plotnikov yurij.plotni...@oktetlabs.ru
---
 net/ipv4/udp.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d10b7e0..17d31f5 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -90,6 +90,7 @@
 #include <linux/socket.h>
 #include <linux/sockios.h>
 #include <linux/igmp.h>
+#include <linux/inetdevice.h>
 #include <linux/in.h>
 #include <linux/errno.h>
 #include <linux/timer.h>
@@ -1959,7 +1960,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
 	struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph;
 	const struct udphdr *uh;
-	struct sock *sk;
+	struct sock *sk = NULL;
 	struct dst_entry *dst;
 	int dif = skb->dev->ifindex;
 
@@ -1971,10 +1972,17 @@ void udp_v4_early_demux(struct sk_buff *skb)
 	uh = udp_hdr(skb);
 
 	if (skb->pkt_type == PACKET_BROADCAST ||
-	    skb->pkt_type == PACKET_MULTICAST)
-		sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
-						   uh->source, iph->saddr, dif);
-	else if (skb->pkt_type == PACKET_HOST)
+	    skb->pkt_type == PACKET_MULTICAST) {
+		struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
+
+		if (in_dev) {
+			int our = ip_check_mc_rcu(in_dev, iph->daddr, iph->saddr,
+						  iph->protocol);
+			if (our)
+				sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
+								   uh->source, iph->saddr, dif);
+		}
+	} else if (skb->pkt_type == PACKET_HOST)
 		sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr,
 					     uh->source, iph->saddr, dif);
 	else
--
1.9.3
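The essence of the fix is "check membership before looking up a socket". The sketch below models that decision in userspace with the 224.168.2.9 / 225.168.2.9 example from the commit message. `struct hyp_in_dev`, `hyp_check_mc()` and `hyp_mcast_early_demux()` are hypothetical stand-ins for the in_device, ip_check_mc_rcu() and the early-demux branch; the real helper also takes the source address and protocol (for source filtering), which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical per-device list of joined IPv4 groups (host byte order). */
struct hyp_in_dev {
	const uint32_t *groups;
	size_t ngroups;
};

/* Stand-in for ip_check_mc_rcu(): has this device joined daddr? */
static bool hyp_check_mc(const struct hyp_in_dev *in_dev, uint32_t daddr)
{
	size_t i;

	assert(in_dev->groups != NULL || in_dev->ngroups == 0);
	for (i = 0; i < in_dev->ngroups; i++)
		if (in_dev->groups[i] == daddr)
			return true;
	return false;
}

/* Only look up a socket when the group is actually joined; -1 means
 * "no early demux", leaving the packet to the normal receive path. */
static int hyp_mcast_early_demux(const struct hyp_in_dev *in_dev,
				 uint32_t daddr, int bound_sock_id)
{
	if (!in_dev || !hyp_check_mc(in_dev, daddr))
		return -1;
	return bound_sock_id; /* stands in for the mcast socket lookup */
}
```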
Re: [PATCH 7/7] mac80211: Switch to new AEAD interface
On Monday, 1 June 2015, 16:35:26, Johannes Berg wrote:

Hi Johannes,

> IOW, I think something like this would make sense:

That definitely looks cleaner :-)

Though, my main concern was just to ensure that the aad length value is not zero.

Ciao
Stephan
[RFC net-next 1/3] net: infra for per-nexthop encap data
Having to add a new interface to apply encap onto a packet is a mechanism that works well today, allowing the setup of the encap to be done separately from the routes out of them, meaning that routing protocols and other user-space apps don't need to do anything special to add routes out of a new type of interface. However, the overhead of creating an interface is high, especially in terms of memory. Therefore, the traditional method won't work very well for large numbers of routes applying encap where there is a low degree of sharing of the encap.

The solution is to introduce a way of defining encap on a per-nexthop basis (i.e. per-route if only one nexthop) through the addition of a new netlink attribute, RTA_ENCAP. The semantics of this attribute are that the data is interpreted according to the output interface type (RTA_OIF) and is opaque to the normal forwarding path. The output interface doesn't have to be defined per-nexthop, but instead represents the way of encapsulating the packet. There could be as few as one per namespace, but more could be created, particularly if they are used to define parameters which are shared by a large number of routes. However, the split of what goes in the encap data and what might be specified via interface attributes is entirely up to the encap-type implementation.

New rtnetlink operations are defined to assist with the management of this data:

- parse_encap for parsing the attribute given through rtnl and either
  sizing the in-memory version (if encap ptr is NULL) or filling in the
  in-memory version. RTA_ENCAP work for IPv4. This operation allows the
  interface to reject invalid encap specified by user-space, and the
  sizing allows the kernel to have a different in-memory implementation
  to the netlink API (which might be optimised for extensibility rather
  than speed of packet forwarding).

- fill_encap for taking the in-memory version of the encap and filling
  in an RTA_ENCAP attribute in a netlink message.
- match_encap for comparing an in-memory version of encap with an
  RTA_ENCAP version, returning 0 if matching or 1 if different.

A new dst operation is also defined to allow encap-type interfaces to retrieve the encap data from their xmit functions and use it for encapsulating the packet and for further forwarding.

Suggested-by: Eric W. Biederman ebied...@xmission.com
Signed-off-by: Robert Shearman rshea...@brocade.com
---
 include/linux/rtnetlink.h      |  7 +++
 include/net/dst.h              | 11 +++
 include/net/dst_ops.h          |  2 ++
 include/net/rtnetlink.h        | 11 +++
 include/uapi/linux/rtnetlink.h |  1 +
 net/core/rtnetlink.c           | 36
 6 files changed, 68 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index a2324fb45cf4..470d822ddd61 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -22,6 +22,13 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
 void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
 		       gfp_t flags);
 
+int rtnl_parse_encap(const struct net_device *dev, const struct nlattr *nla,
+		     void *encap);
+int rtnl_fill_encap(const struct net_device *dev, struct sk_buff *skb,
+		    int encap_len, const void *encap);
+int rtnl_match_encap(const struct net_device *dev, const struct nlattr *nla,
+		     int encap_len, const void *encap);
+
 
 /* RTNL is used as a global lock for all changes to network configuration */
 extern void rtnl_lock(void);
diff --git a/include/net/dst.h b/include/net/dst.h
index 2bc73f8a00a9..df0e6ec18eca 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -506,4 +506,15 @@ static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
 }
 #endif
 
+/* Get encap data for destination */
+static inline int dst_get_encap(struct sk_buff *skb, const void **encap)
+{
+	const struct dst_entry *dst = skb_dst(skb);
+
+	if (!dst || !dst->ops->get_encap)
+		return 0;
+
+	return dst->ops->get_encap(dst, encap);
+}
+
 #endif /* _NET_DST_H */
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index d64253914a6a..97f48cf8ef7d 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -32,6 +32,8 @@ struct dst_ops {
 	struct neighbour *	(*neigh_lookup)(const struct dst_entry *dst,
 						struct sk_buff *skb,
 						const void *daddr);
+	int			(*get_encap)(const struct dst_entry *dst,
+					     const void **encap);
 
 	struct kmem_cache	*kmem_cachep;
 
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 343d922d15c2..3121ade24957 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -95,6 +95,17 @@ struct rtnl_link_ops {
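The parse_encap contract above (size when the output pointer is NULL, fill otherwise) plus match_encap (0 if equal, 1 if different) can be sketched with a plain byte buffer in place of a `struct nlattr`. Everything here is hypothetical: `struct hyp_encap` deliberately uses an in-memory layout (a count followed by the labels) that differs from the attribute payload, to show why the sizing pass exists, and `HYP_MAX_LABELS` is an arbitrary limit for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HYP_MAX_LABELS 2

/* In-memory form: differs from the attribute payload (count + labels). */
struct hyp_encap {
	uint32_t num_labels;
	uint32_t labels[]; /* flexible array member */
};

/* Returns the in-memory size when encap == NULL, fills encap otherwise,
 * or -1 to reject invalid encap from user space. */
static int hyp_parse_encap(const uint32_t *attr, size_t attr_len,
			   struct hyp_encap *encap)
{
	size_t n = attr_len / sizeof(uint32_t);

	if (n == 0 || attr_len % sizeof(uint32_t) || n > HYP_MAX_LABELS)
		return -1;
	if (encap) {
		encap->num_labels = (uint32_t)n;
		memcpy(encap->labels, attr, attr_len);
	}
	return (int)(sizeof(*encap) + attr_len);
}

/* 0 if the stored encap equals the attribute payload, 1 otherwise. */
static int hyp_match_encap(const uint32_t *attr, size_t attr_len,
			   int encap_len, const struct hyp_encap *encap)
{
	assert(encap != NULL);
	if (encap_len < 0 || (size_t)encap_len != sizeof(*encap) + attr_len)
		return 1;
	return memcmp(encap->labels, attr, attr_len) != 0;
}

/* Two-pass use: size first, allocate, then fill. Caller frees. */
static struct hyp_encap *hyp_build_encap(const uint32_t *attr,
					 size_t attr_len, int *len_out)
{
	int len = hyp_parse_encap(attr, attr_len, NULL);
	struct hyp_encap *encap;

	if (len < 0)
		return NULL;
	encap = malloc((size_t)len);
	if (!encap || hyp_parse_encap(attr, attr_len, encap) != len) {
		free(encap);
		return NULL;
	}
	*len_out = len;
	return encap;
}
```

The sizing pass lets the caller reserve exactly the right amount of space (in the RFC, at the end of the nexthop array) before any data is copied.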
[RFC net-next 3/3] mpls: new ipmpls device for encapsulating IP packets as mpls
Allow creating an mpls device for the purposes of encapsulating IP packets with:

  ip link add type ipmpls

This device defines its per-nexthop encapsulation data as a stack of labels, in the same format as for RTA_NEWDST. It uses the encap data which will have been stored in the IP route to encapsulate the packet with that stack of labels, with the last label corresponding to a local label that defines how the packet will be sent out. The device sends packets over loopback to the local MPLS forwarding logic, which performs all of the work.

Stats are implemented, although any error in the sending via the real interface will be handled by the main mpls forwarding code and so not accounted by the interface.

This implementation is based on an alternative earlier implementation by Eric W. Biederman.

Signed-off-by: Robert Shearman rshea...@brocade.com
---
 include/uapi/linux/if_arp.h |   1 +
 net/mpls/Kconfig            |   5 +
 net/mpls/Makefile           |   1 +
 net/mpls/af_mpls.c          |   2 +
 net/mpls/ipmpls.c           | 284
 5 files changed, 293 insertions(+)
 create mode 100644 net/mpls/ipmpls.c

diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
index 4d024d75d64b..17d669fd1781 100644
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -88,6 +88,7 @@
 #define ARPHRD_IEEE80211_RADIOTAP 803	/* IEEE 802.11 + radiotap header */
 #define ARPHRD_IEEE802154	  804
 #define ARPHRD_IEEE802154_MONITOR 805	/* IEEE 802.15.4 network monitor */
+#define ARPHRD_MPLS	806		/* IP and IPv6 over MPLS tunnels */
 
 #define ARPHRD_PHONET	820		/* PhoNet media type		*/
 #define ARPHRD_PHONET_PIPE 821		/* PhoNet pipe header		*/
diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 17bde799c854..5264da94733a 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -27,4 +27,9 @@ config MPLS_ROUTING
 	help
 	 Add support for forwarding of mpls packets.
 
+config MPLS_IPTUNNEL
+	tristate "MPLS: IP over MPLS tunnel support"
+	help
+	 A network device that encapsulates ip packets as mpls
+
 endif # MPLS
diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 65bbe68c72e6..3a93c14b23c5 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile
@@ -3,5 +3,6 @@
 #
 obj-$(CONFIG_NET_MPLS_GSO) += mpls_gso.o
 obj-$(CONFIG_MPLS_ROUTING) += mpls_router.o
+obj-$(CONFIG_MPLS_IPTUNNEL) += ipmpls.o
 
 mpls_router-y := af_mpls.o
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 7b3f732269e4..68bdfbdddfaf 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -615,6 +615,7 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,
 
 	return 0;
 }
+EXPORT_SYMBOL(nla_put_labels);
 
 int nla_get_labels(const struct nlattr *nla,
 		   u32 max_labels, u32 *labels, u32 label[])
@@ -660,6 +661,7 @@ int nla_get_labels(const struct nlattr *nla,
 	*labels = nla_labels;
 	return 0;
 }
+EXPORT_SYMBOL(nla_get_labels);
 
 static int rtm_to_route_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       struct mpls_route_config *cfg)
diff --git a/net/mpls/ipmpls.c b/net/mpls/ipmpls.c
new file mode 100644
index 000000000000..cf6894ae0c61
--- /dev/null
+++ b/net/mpls/ipmpls.c
@@ -0,0 +1,284 @@
+#include <linux/types.h>
+#include <linux/netdevice.h>
+#include <linux/if_vlan.h>
+#include <linux/if_arp.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/module.h>
+#include <linux/mpls.h>
+#include "internal.h"
+
+static LIST_HEAD(ipmpls_dev_list);
+
+#define MAX_NEW_LABELS 2
+
+struct ipmpls_dev_priv {
+	struct net_device *out_dev;
+	struct list_head list;
+	struct net_device *dev;
+};
+
+static netdev_tx_t ipmpls_dev_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ipmpls_dev_priv *priv = netdev_priv(dev);
+	struct net_device *out_dev = priv->out_dev;
+	struct mpls_shim_hdr *hdr;
+	bool bottom_of_stack = true;
+	int len = skb->len;
+	const void *encap;
+	int num_labels;
+	unsigned ttl;
+	const u32 *labels;
+	int ret;
+	int i;
+
+	num_labels = dst_get_encap(skb, &encap) / 4;
+	if (!num_labels)
+		goto drop;
+
+	labels = encap;
+
+	/* Obtain the ttl */
+	if (skb->protocol == htons(ETH_P_IP)) {
+		ttl = ip_hdr(skb)->ttl;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		ttl = ipv6_hdr(skb)->hop_limit;
+	} else if (skb->protocol == htons(ETH_P_MPLS_UC)) {
+		ttl = mpls_entry_decode(mpls_hdr(skb)).ttl;
+		bottom_of_stack = false;
+	} else {
+		goto drop;
+	}
+
+	/* Now that the encap has been retrieved, there's no longer
+	 * any need to keep the dst around so clear it out.
+	 */
+	skb_dst_drop(skb);
+	skb_orphan(skb);
+
+	skb->inner_protocol = skb->protocol;
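The xmit path above picks a TTL from the inner header and tracks whether the pushed labels end the stack (`bottom_of_stack` is false when the payload is already MPLS). The sketch below shows the RFC 3032 label-stack-entry layout (Label 20 bits | TC 3 | S 1 | TTL 8) that such labels end up in. `hyp_mpls_entry_encode()` and `hyp_push_labels()` are hypothetical helpers working in host byte order (real code stores entries with `htonl()`), and treating `out[0]` as the outermost entry is this sketch's own convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 3032 label stack entry: Label(20) | TC(3) | S(1) | TTL(8),
 * returned in host byte order. */
static uint32_t hyp_mpls_entry_encode(uint32_t label, unsigned ttl,
				      unsigned tc, bool bos)
{
	assert(label < (1u << 20) && ttl <= 255 && tc <= 7);
	return (label << 12) | (tc << 9) | ((bos ? 1u : 0u) << 8) | ttl;
}

/* Push n labels, out[0] outermost. Only the innermost entry carries the
 * bottom-of-stack bit, and only when the payload is IP rather than MPLS. */
static void hyp_push_labels(uint32_t *out, const uint32_t *labels, size_t n,
			    unsigned ttl, bool payload_is_ip)
{
	size_t i;

	for (i = 0; i < n; i++) {
		bool bos = payload_is_ip && (i == n - 1);

		out[i] = hyp_mpls_entry_encode(labels[i], ttl, 0, bos);
	}
}
```

For example, label 100 with TTL 64 at the bottom of the stack encodes to 0x00064140.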
Re: [PATCH v2 net-next] vlan: Add GRO support for non hardware accelerated vlan
On 15/06/01 (Mon) 23:12, Eric Dumazet wrote:
> On Mon, 2015-06-01 at 21:55 +0900, Toshiaki Makita wrote:
>
>> @@ -668,6 +753,9 @@ static int __init vlan_proto_init(void)
>>  	if (err < 0)
>>  		goto err5;
>>  
>> +	for (i = 0; i < ARRAY_SIZE(vlan_packet_offloads); i++)
>> +		dev_add_offload(&vlan_packet_offloads[i]);
>> +
>>  	vlan_ioctl_set(vlan_ioctl_handler);
>>  	return 0;
>>
>
> My concern about this is:
>
> This might slow down GRO stack for other traffic, if dev_add_offload()
> for vlan offloads is called after dev_add_offload(&ip_packet_offload) /
> dev_add_offload(&ipv6_packet_offload)

I didn't have that concern because there are already other similar offloads (eth, mpls_uc, mpls_mc). But indeed, they and this could slow down the GRO stack.

This is because list_add_rcu is used, and it inserts at the front of the offload_base list:

void dev_add_offload(struct packet_offload *po)
{
	struct list_head *head = &offload_base;

	spin_lock(&offload_lock);
	list_add_rcu(&po->list, head);
	spin_unlock(&offload_lock);
}

> Can we ensure offload_base contains a sensible order of expected types?

Add priority to packet_offload like nf_hook_ops?
Or have dev_add_offload() prioritize IP and IPV6 over others?

Toshiaki Makita
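Toshiaki's first suggestion, a priority field like nf_hook_ops, can be sketched as a sorted insert in place of the head insertion that list_add_rcu() performs. This is a hypothetical model only: `struct hyp_offload` and its `priority` field do not exist in the kernel, lower values are assumed to be consulted first, and RCU and locking are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offload entry with a priority (lower = consulted first). */
struct hyp_offload {
	const char *name;
	int priority;
	struct hyp_offload *next;
};

/* Keep the list sorted by priority. Equal priorities keep registration
 * order, unlike head insertion where the newest entry is tried first. */
static void hyp_add_offload(struct hyp_offload **head, struct hyp_offload *po)
{
	struct hyp_offload **pp = head;

	assert(po != NULL);
	while (*pp && (*pp)->priority <= po->priority)
		pp = &(*pp)->next;
	po->next = *pp;
	*pp = po;
}
```

With this ordering, registering the vlan offloads late or early makes no difference: IPv4 and IPv6 stay at the front of the list.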
[RFC net-next 2/3] ipv4: storing and retrieval of per-nexthop encap
Parse RTA_ENCAP attribute for one path and multipath routes. The encap length is stored in a field newly added to fib_nh, nh_encap_len, although this is added to a padding hole in the structure so that it doesn't increase the size at all. The encap data itself is stored at the end of the array of nexthops. Whilst this means that retrieval isn't optimal, especially if there are multiple nexthops, this avoids the memory cost of an extra pointer, as well as any potential change to the cache or instruction layout that could cause a performance impact.

Currently, the dst structure allocated to represent the destination of the packet and used for retrieving the encap by the encap-type interface has been grown through the addition of the rt_encap_len and rt_encap fields. This isn't desirable and could be fixed by defining a new destination type with operations copied from the normal case, other than the addition of the get_encap operation.

Signed-off-by: Robert Shearman rshea...@brocade.com
---
 include/net/ip_fib.h     |   2 +
 include/net/route.h      |   3 +
 net/ipv4/fib_frontend.c  |   3 +
 net/ipv4/fib_lookup.h    |   2 +
 net/ipv4/fib_semantics.c | 179 ++-
 net/ipv4/route.c         |  24 +++
 6 files changed, 211 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed0ed45..a06cec5eb3aa 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -44,6 +44,7 @@ struct fib_config {
 	u32			fc_flow;
 	u32			fc_nlflags;
 	struct nl_info		fc_nlinfo;
+	struct nlattr		*fc_encap;
 };
 
 struct fib_info;
@@ -75,6 +76,7 @@ struct fib_nh {
 	struct fib_info		*nh_parent;
 	unsigned int		nh_flags;
 	unsigned char		nh_scope;
+	unsigned char		nh_encap_len;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			nh_weight;
 	int			nh_power;
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03afb6a..e8b58914c4c1 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -64,6 +64,9 @@ struct rtable {
 	/* Miscellaneous cached information */
 	u32			rt_pmtu;
 
+	unsigned int		rt_encap_len;
+	void			*rt_encap;
+
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
 };
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e6e6eb..aa538ab7e3b9 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -656,6 +656,9 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
 		case RTA_TABLE:
 			cfg->fc_table = nla_get_u32(attr);
 			break;
+		case RTA_ENCAP:
+			cfg->fc_encap = attr;
+			break;
 		}
 	}
 
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index c6211ed60b03..003318c51ae8 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -34,6 +34,8 @@ int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
 		  u32 tb_id, unsigned int);
 void rtmsg_fib(int event, __be32 key, struct fib_alias *fa, int dst_len,
 	       u32 tb_id, const struct nl_info *info, unsigned int nlm_flags);
+const void *fib_get_nh_encap(const struct fib_info *fi,
+			     const struct fib_nh *nh);
 
 static inline void fib_result_assign(struct fib_result *res,
 				     struct fib_info *fi)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 28ec3c1823bf..db466b636241 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -257,6 +257,9 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
 	const struct fib_nh *onh = ofi->fib_nh;
 
 	for_nexthops(fi) {
+		const void *onh_encap = fib_get_nh_encap(fi, nh);
+		const void *nh_encap = fib_get_nh_encap(fi, nh);
+
 		if (nh->nh_oif != onh->nh_oif ||
 		    nh->nh_gw != onh->nh_gw ||
 		    nh->nh_scope != onh->nh_scope ||
@@ -266,7 +269,10 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		    nh->nh_tclassid != onh->nh_tclassid ||
 #endif
-		    ((nh->nh_flags ^ onh->nh_flags) & ~RTNH_F_DEAD))
+		    ((nh->nh_flags ^ onh->nh_flags) & ~RTNH_F_DEAD) ||
+		    nh->nh_encap_len != onh->nh_encap_len ||
+		    memcmp(nh_encap, onh_encap, nh->nh_encap_len)
+		    )
 			return -1;
 		onh++;
 	} endfor_nexthops(fi);
@@ -374,6 +380,11 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
 	/* may contain flow and gateway attribute */
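Storing each nexthop's encap back to back after the nexthop array means a nexthop's encap is found by summing the lengths of the ones before it, which is the O(n) retrieval cost the commit message accepts in exchange for not storing a pointer. The sketch models that layout with fixed-size arrays. `struct hyp_nh`, `struct hyp_fib_info`, `hyp_get_nh_encap()` and `hyp_nh_encap_equal()` are hypothetical simplifications, not the RFC's fib_get_nh_encap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct hyp_nh {
	unsigned char encap_len;
};

/* Nexthop array followed by each nexthop's encap bytes, in order. */
struct hyp_fib_info {
	int nhs;
	struct hyp_nh nh[2];
	unsigned char encap_area[64];
};

/* Walk preceding nexthops to find where this one's encap starts. */
static const void *hyp_get_nh_encap(const struct hyp_fib_info *fi, int index)
{
	size_t off = 0;
	int i;

	assert(index >= 0 && index < fi->nhs);
	if (!fi->nh[index].encap_len)
		return NULL;
	for (i = 0; i < index; i++)
		off += fi->nh[i].encap_len;
	return fi->encap_area + off;
}

/* Compare nexthop ia of fi with nexthop ib of ofi, lengths first and then
 * bytes; each side's encap is looked up in its own fib_info. */
static int hyp_nh_encap_equal(const struct hyp_fib_info *fi, int ia,
			      const struct hyp_fib_info *ofi, int ib)
{
	unsigned char len = fi->nh[ia].encap_len;
	const void *a = hyp_get_nh_encap(fi, ia);
	const void *b = hyp_get_nh_encap(ofi, ib);

	if (len != ofi->nh[ib].encap_len)
		return 0;
	return len == 0 || memcmp(a, b, len) == 0;
}
```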
[RFC net-next 0/3] IP imposition of per-nh MPLS encap
In order to be able to function as a Label Edge Router in an MPLS network, it is necessary to be able to take IP packets and impose an MPLS encap and forward them out. The traditional approach of setting up an interface for each tunnel endpoint doesn't scale for the common MPLS use-cases where each IP route tends to be assigned a different label as encap.

The solution suggested here for further discussion is to provide the facility to define encap data on a per-nexthop basis using a new netlink attribute, RTA_ENCAP, which would be opaque to the IPv4/IPv6 forwarding code, but interpreted by the virtual interface assigned to the nexthop.

A new ipmpls interface type is defined to show the use of this facility to allow IP packets to be imposed with an MPLS encap. However, the facility is designed to be general enough to be used by any encapsulation/tunneling mechanism that has similar requirements of high-scale, high-variation-of-encap.

RFC because:
- IPv6 side not implemented
- struct rtable shouldn't be bloated by pointer+uint
- Hasn't been thoroughly tested yet

Robert Shearman (3):
  net: infra for per-nexthop encap data
  ipv4: storing and retrieval of per-nexthop encap
  mpls: new ipmpls device for encapsulating IP packets as mpls

 include/linux/rtnetlink.h      |   7 +
 include/net/dst.h              |  11 ++
 include/net/dst_ops.h          |   2 +
 include/net/ip_fib.h           |   2 +
 include/net/route.h            |   3 +
 include/net/rtnetlink.h        |  11 ++
 include/uapi/linux/if_arp.h    |   1 +
 include/uapi/linux/rtnetlink.h |   1 +
 net/core/rtnetlink.c           |  36 ++
 net/ipv4/fib_frontend.c        |   3 +
 net/ipv4/fib_lookup.h          |   2 +
 net/ipv4/fib_semantics.c       | 179 +-
 net/ipv4/route.c               |  24
 net/mpls/Kconfig               |   5 +
 net/mpls/Makefile              |   1 +
 net/mpls/af_mpls.c             |   2 +
 net/mpls/ipmpls.c              | 284 +
 17 files changed, 572 insertions(+), 2 deletions(-)
 create mode 100644 net/mpls/ipmpls.c

--
2.1.4
Re: 2.6.32.66 tcp regression OOPs
On Mon, Jun 01, 2015 at 11:32:19AM -0400, starlight.201...@binnacle.cx wrote:
> Hi,
>
> I found the patch late yesterday and applied it.
> Running fine now for 12 hours under active load.

Thank you.

> Recommend the patch be rolled into the tarball, or a notation added
> to the release page as this one has severe consequences.

I'll emit 2.6.32.67 with it. I didn't know it was that easy to trigger, and since feedback comes slowly on 2.6.32, I was waiting a bit for more feedback before doing another one.

Thank you!
Willy
Re: [PATCH v2 net-next] vlan: Add GRO support for non hardware accelerated vlan
On Tue, 2015-06-02 at 01:03 +0900, Toshiaki Makita wrote:

> I didn't have that concern because there are already other similar
> offloads (eth, mpls_uc, mpls_mc).
> But indeed, they and this could slow down GRO stack.

Right, but these mpls offloads are not installed on my kernels ;)

And I already checked that eth was installed before IPv4/IPv6, although it might be pure luck.

> Add priority to packet_offload like nf_hook_ops?
> Or have dev_add_offload() prioritize IP and IPV6 over others?

You could also use a CONFIG_NET_VLAN_GRO module, so that only users stuck with non-accelerated vlan need to load it.

But yes, we might take care of this problem at some point.
Re: [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL
On 01/06/15 15:27, Thomas Graf wrote:
> Introduces a new Netlink attribute RTA_TUNNEL which allows routes
> to set tunnel transmit metadata and specify the tunnel endpoint or
> tunnel id on a per route basis. The route must point to a tunnel
> device which understands per skb tunnel metadata and has been put
> into the respective mode.

We've been discussing something similar for the purposes of IP over MPLS, but most of the attributes for IP tunnels aren't relevant for MPLS. It would be great if we could come up with something general enough to serve both purposes. I've just sent a patch series ([RFC net-next 0/3] IP imposition of per-nh MPLS encap) which I believe would allow this.

Thanks,
Rob

> Signed-off-by: Thomas Graf tg...@suug.ch
> ---
>  include/net/ip_fib.h           |  3 +++
>  include/net/ip_tunnels.h       |  1 -
>  include/net/route.h            | 10
>  include/uapi/linux/rtnetlink.h | 16
>  net/ipv4/fib_frontend.c        | 57 ++
>  net/ipv4/fib_semantics.c       | 45 +
>  net/ipv4/route.c               | 30 +-
>  net/openvswitch/vport.h        |  1 +
>  8 files changed, 161 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 54271ed..1cd7cf8 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -22,6 +22,7 @@
>  #include <net/fib_rules.h>
>  #include <net/inetpeer.h>
>  #include <linux/percpu.h>
> +#include <net/ip_tunnels.h>
>  
>  struct fib_config {
>  	u8			fc_dst_len;
> @@ -44,6 +45,7 @@ struct fib_config {
>  	u32			fc_flow;
>  	u32			fc_nlflags;
>  	struct nl_info		fc_nlinfo;
> +	struct ip_tunnel_info	fc_tunnel;
>  };
>  
>  struct fib_info;
> @@ -117,6 +119,7 @@ struct fib_info {
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
>  	int			fib_power;
>  #endif
> +	struct ip_tunnel_info	*fib_tunnel;
>  	struct rcu_head		rcu;
>  	struct fib_nh		fib_nh[0];
>  #define fib_dev		fib_nh[0].nh_dev
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index df8cfd3..b4ab930 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -9,7 +9,6 @@
>  #include <net/dsfield.h>
>  #include <net/gro_cells.h>
>  #include <net/inet_ecn.h>
> -#include <net/ip.h>
>  #include <net/netns/generic.h>
>  #include <net/rtnetlink.h>
>  #include <net/flow.h>
> diff --git a/include/net/route.h b/include/net/route.h
> index 6ede321..dbda603 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -28,6 +28,7 @@
>  #include <net/inetpeer.h>
>  #include <net/flow.h>
>  #include <net/inet_sock.h>
> +#include <net/ip_tunnels.h>
>  #include <linux/in_route.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/rcupdate.h>
> @@ -66,6 +67,7 @@ struct rtable {
>  
>  	struct list_head	rt_uncached;
>  	struct uncached_list	*rt_uncached_list;
> +	struct ip_tunnel_info	*rt_tun_info;
>  };
>  
>  static inline bool rt_is_input_route(const struct rtable *rt)
> @@ -198,6 +200,8 @@ struct in_ifaddr;
>  void fib_add_ifaddr(struct in_ifaddr *);
>  void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
>  
> +int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info);
> +
>  static inline void ip_rt_put(struct rtable *rt)
>  {
>  	/* dst_release() accepts a NULL parameter.
> @@ -317,9 +321,15 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
>  
>  static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
>  {
> +	struct rtable *rt;
> +
>  	if (skb_shinfo(skb)->tun_info)
>  		return skb_shinfo(skb)->tun_info;
>  
> +	rt = skb_rtable(skb);
> +	if (rt)
> +		return rt->rt_tun_info;
> +
>  	return NULL;
>  }
>  
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 17fb02f..1f7aa68 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -286,6 +286,21 @@ enum rt_class_t {
>  
>  /* Routing message attributes */
>  
> +enum rta_tunnel_t {
> +	RTA_TUN_UNSPEC,
> +	RTA_TUN_ID,
> +	RTA_TUN_DST,
> +	RTA_TUN_SRC,
> +	RTA_TUN_TTL,
> +	RTA_TUN_TOS,
> +	RTA_TUN_SPORT,
> +	RTA_TUN_DPORT,
> +	RTA_TUN_FLAGS,
> +	__RTA_TUN_MAX,
> +};
> +
> +#define RTA_TUN_MAX (__RTA_TUN_MAX - 1)
> +
>  enum rtattr_type_t {
>  	RTA_UNSPEC,
>  	RTA_DST,
> @@ -308,6 +323,7 @@ enum rtattr_type_t {
>  	RTA_VIA,
>  	RTA_NEWDST,
>  	RTA_PREF,
> +	RTA_TUNNEL,	/* destination VTEP */
>  	__RTA_MAX
>  };
>  
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 872494e..bfa77a6 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -580,6 +580,57 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
>  	return -EINVAL;
>  }
>  
> +static const struct nla_policy tunnel_policy[RTA_TUN_MAX + 1] = {
> +	[RTA_TUN_ID]		= { .type = NLA_U64 },
> +	[RTA_TUN_DST]		= { .type = NLA_U32 },
Re: [PATCH 7/7] mac80211: Switch to new AEAD interface
On Mon, 2015-06-01 at 15:49 +0200, Stephan Mueller wrote:

> > The contents, now, that's a more interesting question. I believe it can
> > never be all zeroes, since association request frames are not
> > encrypted/protected and thus at least one byte in here must be non-zero.
> > The MAC addresses are also very likely non-zero, but technically
> > 00:00:00:00:00:00 is a valid MAC address I believe.
>
> So, even when having a malicious AP, that value is never zero?
>
> The driver of the question is the following code in the patch set:
>
> +	sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));
> ...
> +	aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);
> ...
> 	crypto_aead_encrypt(aead_req);
>
> When I played around with the aead_request_set_crypt, I saw a crash in the
> scatterlist handling of the crypto API when the first SGL entry has a zero
> length.

Wait, I guess that's a *third* way for this to be zero: a valid pointer but zero-length data?

Oh, no - you're referring to the CCM/GCM cases only, I guess, i.e. this part:

-	sg_init_one(&assoc, &aad[2], be16_to_cpup((__be16 *)aad));
+	sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));

I was looking at GMAC and that has a constant for the length :-)

Ok - here the length is kinda passed as part of the AAD buffer, but this is really just some arcane code that should be fixed to use a proper struct.

The value there, even though it is __be16 and looks like it came from the data, is actually created locally, see ccmp_special_blocks() and gcmp_special_blocks().

johannes

-- 
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] libceph: use kvfree() in ceph_put_page_vector()
Use kvfree() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangt...@163.com>
---
 net/ceph/pagevec.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 096d914..d4f5f22 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -51,10 +51,7 @@ void ceph_put_page_vector(struct page **pages, int num_pages, bool dirty)
 			set_page_dirty_lock(pages[i]);
 		put_page(pages[i]);
 	}
-	if (is_vmalloc_addr(pages))
-		vfree(pages);
-	else
-		kfree(pages);
+	kvfree(pages);
 }
 EXPORT_SYMBOL(ceph_put_page_vector);
-- 
2.3.4
Re: [PATCH 7/7] mac80211: Switch to new AEAD interface
On Thursday, 21 May 2015, 13:20:49, Johannes Berg wrote:

Hi Johannes,

> On Thu, 2015-05-21 at 18:44 +0800, Herbert Xu wrote:
> > This patch makes use of the new AEAD interface which uses a single SG list
> > instead of separate lists for the AD and plain text.
>
> Looks fine - want me to run any tests on it?

Just a short question on ieee80211_aes_ccm_encrypt, ieee80211_aes_ccm_decrypt, ieee80211_aes_gcm_encrypt, ieee80211_aes_gcm_decrypt, ieee80211_aes_gmac: can the aad parameter of these functions be zero?

-- 
Ciao
Stephan
Re: [PATCH 7/7] mac80211: Switch to new AEAD interface
On Monday, 1 June 2015, 15:42:41, Johannes Berg wrote:

Hi Johannes,

> On Mon, 2015-06-01 at 15:21 +0200, Stephan Mueller wrote:
> > Just a short question on ieee80211_aes_ccm_encrypt, ieee80211_aes_ccm_decrypt,
> > ieee80211_aes_gcm_encrypt, ieee80211_aes_gcm_decrypt, ieee80211_aes_gmac:
> > can the aad parameter of these functions be zero?
>
> What do you mean by zero? The pointer itself can clearly never be NULL.

Thanks for clarifying: indeed I mean the value the pointer points to, not the pointer itself :-)

> The contents, now, that's a more interesting question. I believe it can
> never be all zeroes, since association request frames are not
> encrypted/protected and thus at least one byte in here must be non-zero.
> The MAC addresses are also very likely non-zero, but technically
> 00:00:00:00:00:00 is a valid MAC address I believe.

So, even when having a malicious AP, that value is never zero?

The driver of the question is the following code in the patch set:

+	sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));
...
+	aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);
...
	crypto_aead_encrypt(aead_req);

When I played around with aead_request_set_crypt(), I saw a crash in the scatterlist handling of the crypto API when the first SGL entry has a zero length.

Ciao
Stephan
[PATCH net] net/mlx4: fix typo in mlx4_set_vf_mac
Fix typo in mlx4_set_vf_mac.

Signed-off-by: Carol L Soto <cls...@linux.vnet.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2687,7 +2687,7 @@ int mlx4_set_vf_mac(struct mlx4_dev *dev
 	port = mlx4_slaves_closest_port(dev, slave, port);
 	s_info = &priv->mfunc.master.vf_admin[slave].vport[port];
 	s_info->mac = mac;
-	mlx4_info(dev, "default mac on vf %d port %d to %llX will take afect only after vf restart\n",
+	mlx4_info(dev, "default mac on vf %d port %d to %llX will take effect only after vf restart\n",
 		  vf, port, s_info->mac);
 	return 0;
 }
[PATCH net] net/mlx4: double free of dev_vfs
If the user loads mlx4_core with num_vfs greater than the number supported, dev->dev_vfs is freed twice when the driver is unloaded.

Signed-off-by: Carol L Soto <cls...@linux.vnet.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2685,6 +2685,7 @@ disable_sriov:
 free_mem:
 	dev->persist->num_vfs = 0;
 	kfree(dev->dev_vfs);
+	dev->dev_vfs = NULL;
 	return dev_flags & ~MLX4_FLAG_MASTER;
 }
[PATCH net] net/mlx4: need to call close fw if alloc icm is called twice
If mlx4_enable_sriov() is called on an adapter without the MLX4_DEV_CAP_FLAG2_SYS_EQS feature, the ICM allocation function is called twice along this path, and the structures from the first call are never freed.

Signed-off-by: Carol L Soto <cls...@linux.vnet.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2837,6 +2837,7 @@ slave_start:
 						  existing_vfs, reset_flow);
+			mlx4_close_fw(dev);
 			mlx4_cmd_cleanup(dev, MLX4_CMD_CLEANUP_ALL);
 			dev->flags = dev_flags;
 			if (!SRIOV_VALID_STATE(dev->flags)) {
Re: [PATCH v2 net-next] vlan: Add GRO support for non hardware accelerated vlan
On Mon, 2015-06-01 at 21:55 +0900, Toshiaki Makita wrote:

> @@ -668,6 +753,9 @@ static int __init vlan_proto_init(void)
>  	if (err < 0)
>  		goto err5;
>  
> +	for (i = 0; i < ARRAY_SIZE(vlan_packet_offloads); i++)
> +		dev_add_offload(&vlan_packet_offloads[i]);
> +
>  	vlan_ioctl_set(vlan_ioctl_handler);
>  	return 0;

My concern about this is: this might slow down the GRO stack for other traffic if dev_add_offload() for the vlan offloads is called after dev_add_offload(&ip_packet_offload) / dev_add_offload(&ipv6_packet_offload).

This is because list_add_rcu() is used, and it inserts at the front of the offload_base list.

void dev_add_offload(struct packet_offload *po)
{
	struct list_head *head = &offload_base;

	spin_lock(&offload_lock);
	list_add_rcu(&po->list, head);
	spin_unlock(&offload_lock);
}

Can we ensure offload_base contains a sensible order of expected types?
[net-next RFC 04/14] route: Extend flow representation with tunnel key
Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to allow routes to match on tunnel metadata. For now, the tunnel id is added to flowi_tunnel which allows for routes to be bound to specific virtual tunnels. Signed-off-by: Thomas Graf tg...@suug.ch --- include/net/flow.h | 7 +++ include/net/ip_tunnels.h | 10 ++ net/ipv4/route.c | 2 ++ 3 files changed, 19 insertions(+) diff --git a/include/net/flow.h b/include/net/flow.h index 8109a15..c15fb5e 100644 --- a/include/net/flow.h +++ b/include/net/flow.h @@ -19,6 +19,10 @@ #define LOOPBACK_IFINDEX 1 +struct flowi_tunnel { + __be64 tun_id; +}; + struct flowi_common { int flowic_oif; int flowic_iif; @@ -30,6 +34,7 @@ struct flowi_common { #define FLOWI_FLAG_ANYSRC 0x01 #define FLOWI_FLAG_KNOWN_NH0x02 __u32 flowic_secid; + struct flowi_tunnel flowic_tun_key; }; union flowi_uli { @@ -66,6 +71,7 @@ struct flowi4 { #define flowi4_proto __fl_common.flowic_proto #define flowi4_flags __fl_common.flowic_flags #define flowi4_secid __fl_common.flowic_secid +#define flowi4_tun_key __fl_common.flowic_tun_key /* (saddr,daddr) must be grouped, same order as in IP header */ __be32 saddr; @@ -165,6 +171,7 @@ struct flowi { #define flowi_protou.__fl_common.flowic_proto #define flowi_flagsu.__fl_common.flowic_flags #define flowi_secidu.__fl_common.flowic_secid +#define flowi_tun_key u.__fl_common.flowic_tun_key } __attribute__((__aligned__(BITS_PER_LONG/8))); static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4) diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 8b76ba1..df8cfd3 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -12,6 +12,7 @@ #include net/ip.h #include net/netns/generic.h #include net/rtnetlink.h +#include net/flow.h #if IS_ENABLED(CONFIG_IPV6) #include net/ipv6.h @@ -337,6 +338,15 @@ static inline void *ip_tunnel_info_opts(struct ip_tunnel_info *info, return info + 1; } +static inline void ip_tunnel_derive_key(struct sk_buff *skb, + struct flowi_tunnel 
*key) +{ + struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info; + + if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX) + key->tun_id = tun_info->key.tun_id; +} + #endif /* CONFIG_INET */ #endif /* __NET_IP_TUNNELS_H */ diff --git a/net/ipv4/route.c b/net/ipv4/route.c index f605598..6e8e1be 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -109,6 +109,7 @@ #include <linux/kmemleak.h> #endif #include <net/secure_seq.h> +#include <net/ip_tunnels.h> #define RT_FL_TOS(oldflp4) \ ((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK)) @@ -1716,6 +1717,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr, fl4.flowi4_scope = RT_SCOPE_UNIVERSE; fl4.daddr = daddr; fl4.saddr = saddr; + ip_tunnel_derive_key(skb, &fl4.flowi4_tun_key); err = fib_lookup(net, &fl4, &res); if (err != 0) { if (!IN_DEV_FORWARD(in_dev)) -- 2.3.5
[net-next RFC 02/14] ip_tunnel: support per packet tunnel metadata
This allows to attach an ip_tunnel_info metadata structure to skbs via skb_shared_info to represent receive side tunnel information as well as transmit side encapsulation instructions. The new field is added to skb_shared_info as the field is typically immutable after it has been attached. A new flag indicates whether the metadata is meant for receive or transmit. This allows to keep receive metadata attached to the skb all the way through the forwarding path without mistaking it for transmit instructions. The tun_info pointer is thus only released if a packet which has been received on a tunnel is being forwarded to tunnel device again. Since transmit instructions are immutable per flow which attaches them to the skb, a reference count is introduced which allows to reuse the metadata for many packets. Therefore, when a route later on receives the capability to attach tunnel metadata, it will only have to allocate the metadata once and can simply increment the reference counter for each packet that uses that instruction set. Signed-off-by: Thomas Graf tg...@suug.ch --- include/linux/skbuff.h| 1 + include/net/ip_tunnels.h | 45 + net/core/skbuff.c | 8 net/ipv4/ip_tunnel_core.c | 15 +++ 4 files changed, 69 insertions(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 6b41c15..83f9a59 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -323,6 +323,7 @@ struct skb_shared_info { unsigned short gso_segs; unsigned short gso_type; struct sk_buff *frag_list; + struct ip_tunnel_info *tun_info; struct skb_shared_hwtstamps hwtstamps; u32 tskey; __be32 ip6_frag_id; diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 6b9d559..3968705 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -38,10 +38,20 @@ struct ip_tunnel_key { __be16 tp_dst; } __packed __aligned(4); /* Minimize padding. */ +/* Indicates whether the tunnel info structure represents receive + * or transmit tunnel parameters. 
+ */ +enum { + IP_TUNNEL_INFO_RX, + IP_TUNNEL_INFO_TX, +}; + struct ip_tunnel_info { struct ip_tunnel_keykey; const void *options; + atomic_trefcnt; u8 options_len; + u8 mode; }; /* 6rd prefix/relay information */ @@ -284,6 +294,41 @@ static inline void iptunnel_xmit_stats(int err, } } +struct ip_tunnel_info *ip_tunnel_info_alloc(size_t optslen, gfp_t flags); + +static inline void ip_tunnel_info_get(struct ip_tunnel_info *info) +{ + atomic_inc(info-refcnt); +} + +static inline void ip_tunnel_info_put(struct ip_tunnel_info *info) +{ + if (!info) + return; + + if (atomic_dec_and_test(info-refcnt)) + kfree(info); +} + +static inline int skb_attach_tunnel_info(struct sk_buff *skb, +struct ip_tunnel_info *info) +{ + if (skb_unclone(skb, GFP_ATOMIC)) + return -ENOMEM; + + ip_tunnel_info_put(skb_shinfo(skb)-tun_info); + ip_tunnel_info_get(info); + skb_shinfo(skb)-tun_info = info; + + return 0; +} + +static inline void skb_release_tunnel_info(struct sk_buff *skb) +{ + ip_tunnel_info_put(skb_shinfo(skb)-tun_info); + skb_shinfo(skb)-tun_info = NULL; +} + #endif /* CONFIG_INET */ #endif /* __NET_IP_TUNNELS_H */ diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9bac0e6..dbbace2 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -69,6 +69,7 @@ #include net/sock.h #include net/checksum.h #include net/ip6_checksum.h +#include net/ip_tunnels.h #include net/xfrm.h #include asm/uaccess.h @@ -594,6 +595,8 @@ static void skb_release_data(struct sk_buff *skb) uarg-callback(uarg, true); } + ip_tunnel_info_put(shinfo-tun_info); + if (shinfo-frag_list) kfree_skb_list(shinfo-frag_list); @@ -985,6 +988,11 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old) skb_shinfo(new)-gso_size = skb_shinfo(old)-gso_size; skb_shinfo(new)-gso_segs = skb_shinfo(old)-gso_segs; skb_shinfo(new)-gso_type = skb_shinfo(old)-gso_type; + + if (skb_shinfo(old)-tun_info) { + ip_tunnel_info_get(skb_shinfo(old)-tun_info); + skb_shinfo(new)-tun_info = skb_shinfo(old)-tun_info; + 
} } static inline int skb_alloc_rx_flag(const struct sk_buff *skb) diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index 6a51a71..bbd4f91 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -190,3 +190,18 @@ struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev, return tot; } EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64); +
[net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
Rename the tunnel metadata data structures currently internal to OVS and make them generic for use by all IP tunnels. Both structures are kernel internal and will stay that way. Their members are exposed to user space through individual Netlink attributes by OVS. It will therefore be possible to extend/modify these structures without affecting user ABI. Signed-off-by: Thomas Graf tg...@suug.ch --- include/net/ip_tunnels.h | 63 + include/uapi/linux/openvswitch.h | 2 +- net/openvswitch/actions.c| 2 +- net/openvswitch/datapath.h | 5 +-- net/openvswitch/flow.c | 4 +-- net/openvswitch/flow.h | 76 ++-- net/openvswitch/flow_netlink.c | 16 - net/openvswitch/flow_netlink.h | 2 +- net/openvswitch/vport-geneve.c | 17 + net/openvswitch/vport-gre.c | 16 - net/openvswitch/vport-vxlan.c| 18 +- net/openvswitch/vport.c | 30 net/openvswitch/vport.h | 12 +++ 13 files changed, 128 insertions(+), 135 deletions(-) diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index d8214cb..6b9d559 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -22,6 +22,28 @@ /* Keep error state on tunnel for 30 sec */ #define IPTUNNEL_ERR_TIMEO (30*HZ) +/* Used to memset ip_tunnel padding. */ +#define IP_TUNNEL_KEY_SIZE \ + (offsetof(struct ip_tunnel_key, tp_dst) + \ +FIELD_SIZEOF(struct ip_tunnel_key, tp_dst)) + +struct ip_tunnel_key { + __be64 tun_id; + __be32 ipv4_src; + __be32 ipv4_dst; + __be16 tun_flags; + __u8ipv4_tos; + __u8ipv4_ttl; + __be16 tp_src; + __be16 tp_dst; +} __packed __aligned(4); /* Minimize padding. 
*/ + +struct ip_tunnel_info { + struct ip_tunnel_keykey; + const void *options; + u8 options_len; +}; + /* 6rd prefix/relay information */ #ifdef CONFIG_IPV6_SIT_6RD struct ip_tunnel_6rd_parm { @@ -136,6 +158,47 @@ int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *op, int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op, unsigned int num); +static inline void __ip_tunnel_info_init(struct ip_tunnel_info *tun_info, +__be32 saddr, __be32 daddr, +u8 tos, u8 ttl, +__be16 tp_src, __be16 tp_dst, +__be64 tun_id, __be16 tun_flags, +const void *opts, u8 opts_len) +{ + tun_info-key.tun_id = tun_id; + tun_info-key.ipv4_src = saddr; + tun_info-key.ipv4_dst = daddr; + tun_info-key.ipv4_tos = tos; + tun_info-key.ipv4_ttl = ttl; + tun_info-key.tun_flags = tun_flags; + + /* For the tunnel types on the top of IPsec, the tp_src and tp_dst of +* the upper tunnel are used. +* E.g: GRE over IPSEC, the tp_src and tp_port are zero. +*/ + tun_info-key.tp_src = tp_src; + tun_info-key.tp_dst = tp_dst; + + /* Clear struct padding. */ + if (sizeof(tun_info-key) != IP_TUNNEL_KEY_SIZE) + memset((unsigned char *)tun_info-key + IP_TUNNEL_KEY_SIZE, + 0, sizeof(tun_info-key) - IP_TUNNEL_KEY_SIZE); + + tun_info-options = opts; + tun_info-options_len = opts_len; +} + +static inline void ip_tunnel_info_init(struct ip_tunnel_info *tun_info, + const struct iphdr *iph, + __be16 tp_src, __be16 tp_dst, + __be64 tun_id, __be16 tun_flags, + const void *opts, u8 opts_len) +{ + __ip_tunnel_info_init(tun_info, iph-saddr, iph-daddr, + iph-tos, iph-ttl, tp_src, tp_dst, + tun_id, tun_flags, opts, opts_len); +} + #ifdef CONFIG_INET int ip_tunnel_init(struct net_device *dev); diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index bbd49a0..fffe317 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -319,7 +319,7 @@ enum ovs_key_attr { * the accepted length of the array. 
*/ #ifdef __KERNEL__ - OVS_KEY_ATTR_TUNNEL_INFO, /* struct ovs_tunnel_info */ + OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */ #endif __OVS_KEY_ATTR_MAX }; diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index b491c1c..34cad57 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -610,7
Re: [PATCH] ethtool: changes of emac_regs structure accordingly within driver emac_regs structure.
On Mon, 1 June 2015 12:57 +0400 Ben Hutchings <b...@decadent.org.uk> wrote:

> On Thu, 2015-05-21 at 19:09 +0400, Ivan Mikhaylov wrote:
> > In ibm_emac.c in ethtool size of emac structure which passing through to
> > driver is nailed down and not correlating with current emac_regs structure.
> >
> > Signed-off-by: Ivan Mikhaylov <i...@ru.ibm.com>
> [...]
>
> This is not backward-compatible. It ought to be possible to mix and match
> old and new ethtool and driver, except for the EMAC4SYNC case which has
> been broken up until now.
>
> Using the new definition of struct emac_regs, I think the driver and
> ethtool need to agree that the MAC register dump sizes are:
>
> EMAC:      offsetof(struct emac_regs, u1)
> EMAC4:     offsetof(struct emac_regs, u1.emac4) + sizeof(p->u1.emac4)
> EMAC4SYNC: offsetof(struct emac_regs, u1.emac4sync) + sizeof(p->u1.emac4sync)
>
> Ben.
>
> --
> Ben Hutchings
> Reality is just a crutch for people who can't handle science fiction.

Actually, it is backward-compatible: we don't care about the size coming from the driver side. All we do is map the driver structure onto the ethtool structure, and the results will be the same for EMAC and EMAC4:

	struct emac_regs *p = (struct emac_regs *)(hdr + 1);

Also, the sizes you mentioned (112 for EMAC, 116 for EMAC4) can differ from what you say, because they are controlled by the DTS files, where this memory area can be set to something like 0x100 or 0x80. We will still have a problem representing the MII area if this size isn't set correctly in the DTS. ethtool will work the same way whether we have EMAC or EMAC4.

Thank you for your response!
Re: [PATCH 7/7] mac80211: Switch to new AEAD interface
On Mon, 2015-06-01 at 15:21 +0200, Stephan Mueller wrote:

> Just a short question on ieee80211_aes_ccm_encrypt, ieee80211_aes_ccm_decrypt,
> ieee80211_aes_gcm_encrypt, ieee80211_aes_gcm_decrypt, ieee80211_aes_gmac:
> can the aad parameter of these functions be zero?

What do you mean by zero? The pointer itself can clearly never be NULL.

The contents, now, that's a more interesting question. I believe it can never be all zeroes, since association request frames are not encrypted/protected and thus at least one byte in here must be non-zero. The MAC addresses are also very likely non-zero, but technically 00:00:00:00:00:00 is a valid MAC address I believe.

johannes
Re: [PATCH] sctp: fix ASCONF list handling
On Fri, May 29, 2015 at 01:50:37PM -0300, Marcelo Ricardo Leitner wrote: On Fri, May 29, 2015 at 09:17:26AM -0400, Neil Horman wrote: On Thu, May 28, 2015 at 11:46:29AM -0300, Marcelo Ricardo Leitner wrote: On Thu, May 28, 2015 at 10:27:32AM -0300, Marcelo Ricardo Leitner wrote: On Thu, May 28, 2015 at 08:17:27AM -0300, Marcelo Ricardo Leitner wrote: On Thu, May 28, 2015 at 06:15:11AM -0400, Neil Horman wrote: On Wed, May 27, 2015 at 09:52:17PM -0300, mleit...@redhat.com wrote: From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> ->auto_asconf_splist is per namespace and mangled by functions like sctp_setsockopt_auto_asconf(), which don't guarantee any serialization. Also, the call to inet_sk_copy_descendant() was backing up ->auto_asconf_list across the copy but was not honoring ->do_auto_asconf, which could lead to list corruption if it differed between the two sockets. This commit thus fixes the list handling by adding a spinlock to protect against multiple writers and converts the list to be protected by RCU too, so that we don't have a lock inversion issue at sctp_addr_wq_timeout_handler(). And as this list now uses RCU, we cannot do such a backup and restore while copying descendant data anymore, as readers may be traversing the list meanwhile. We fix this by simply ignoring/not copying those fields, placed at the end of struct sctp_sock, so we can just ignore them together with the struct ipv6_pinfo data. For that we create sctp_copy_descendant() so we don't clutter inet_sk_copy_descendant() with SCTP info. The issue was found with a test application that kept flipping sysctl default_auto_asconf on and off. Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> --- include/net/netns/sctp.h | 6 +- include/net/sctp/structs.h | 2 ++ net/sctp/protocol.c | 6 +- net/sctp/socket.c | 39 ++- 4 files changed, 38 insertions(+), 15 deletions(-) diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h index 3573a81815ad9e0efb6ceb721eb066d3726419f0..e080bebb3147af39c8275261f57018eb01e917b0 100644 --- a/include/net/netns/sctp.h +++ b/include/net/netns/sctp.h @@ -30,12 +30,15 @@ struct netns_sctp { struct list_head local_addr_list; struct list_head addr_waitq; struct timer_list addr_wq_timer; - struct list_head auto_asconf_splist; + struct list_head __rcu auto_asconf_splist; You should use the addr_wq_lock here instead of creating a new lock, as that's already used to protect most accesses to the list you are concerned about. Ok, that works too. Though truthfully, that shouldn't be necessary. The list in question is only read in one location and only written in one location. You can likely just rcu-ify, as the write side is in process context and protected by lock_sock. It is necessary, though: the list is not protected by lock_sock, as it resides in the netns_sctp structure, which lock_sock doesn't cover. The write side is in process context, yes, but this list is written in sctp_init_sock(), sctp_destroy_sock() and sctp_setsockopt_auto_asconf(), so one could trigger this either by creating/destroying sockets while default_auto_asconf=1 or just by creating a bunch of sockets and flipping asconf via setsockopt (or a combination of these operations). (I'll point this out in the changelog.) Hmm.. by reusing addr_wq_lock we don't need to rcu-ify the list, as the reader is inside that lock too, so I can just protect auto_asconf_splist writers with addr_wq_lock. Nice, thanks Neil. Cannot really do that.. as that creates a lock inversion between sctp_destroy_sock() (which already holds lock_sock) and sctp_addr_wq_timeout_handler(), which first grabs addr_wq_lock and then locks socket by socket.
Due to that, I'm afraid reusing this lock is not possible, and we should stick with the patch... what do you think? (Though I have to fix the nits in there.) I don't think that's accurate. You are correct that the locks are taken in opposite order, which would imply a lock inversion that could result in deadlock, but we can avoid that by deferring the asconf list removal until after sk_common_release and unlock_sock_bh are called in sctp_close. That will make the lock ordering consistent. Alternatively, we can pre-emptively take the asconf_lock in sctp_close before locking the socket. For
[net-next RFC 11/14] openvswitch: Use regular VXLAN net_device device
This gets rid of all OVS specific VXLAN code in the receive and transmit path by using a VXLAN net_device to represent the vport. Only a small shim layer remains which takes care of handling the VXLAN specific OVS Netlink configuration. Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb() since they are no longer needed. Signed-off-by: Thomas Graf tg...@suug.ch Signed-off-by: Pravin B Shelar pshe...@nicira.com --- drivers/net/vxlan.c| 23 +-- include/net/vxlan.h| 14 +- net/openvswitch/Kconfig| 12 -- net/openvswitch/Makefile | 1 - net/openvswitch/flow_netlink.c | 5 +- net/openvswitch/vport-netdev.c | 176 +- net/openvswitch/vport-vxlan.c | 322 - 7 files changed, 193 insertions(+), 360 deletions(-) delete mode 100644 net/openvswitch/vport-vxlan.c diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 3acab95..b696871 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -74,6 +74,10 @@ static struct rtnl_link_ops vxlan_link_ops; static const u8 all_zeros_mac[ETH_ALEN]; +static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port, +vxlan_rcv_t *rcv, void *data, +bool no_share, u32 flags); + /* per-network namespace private data for this module */ struct vxlan_net { struct list_head vxlan_list; @@ -1020,7 +1024,7 @@ static bool vxlan_group_used(struct vxlan_net *vn, struct vxlan_dev *dev) return false; } -void vxlan_sock_release(struct vxlan_sock *vs) +static void vxlan_sock_release(struct vxlan_sock *vs) { struct sock *sk = vs-sock-sk; struct net *net = sock_net(sk); @@ -1036,7 +1040,6 @@ void vxlan_sock_release(struct vxlan_sock *vs) queue_work(vxlan_wq, vs-del_work); } -EXPORT_SYMBOL_GPL(vxlan_sock_release); /* Update multicast group membership when first VNI on * multicast address is brought up @@ -1761,10 +1764,10 @@ err: } #endif -int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb, - __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df, - __be16 src_port, __be16 dst_port, - struct vxlan_metadata *md, 
bool xnet, u32 vxflags) +static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb, + __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df, + __be16 src_port, __be16 dst_port, + struct vxlan_metadata *md, bool xnet, u32 vxflags) { struct vxlanhdr *vxh; int min_headroom; @@ -1834,7 +1837,6 @@ int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb, ttl, df, src_port, dst_port, xnet, !(vxflags VXLAN_F_UDP_CSUM)); } -EXPORT_SYMBOL_GPL(vxlan_xmit_skb); /* Bypass encapsulation if the destination is local */ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan, @@ -2609,9 +2611,9 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port, return vs; } -struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port, - vxlan_rcv_t *rcv, void *data, - bool no_share, u32 flags) +static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port, +vxlan_rcv_t *rcv, void *data, +bool no_share, u32 flags) { struct vxlan_net *vn = net_generic(net, vxlan_net_id); struct vxlan_sock *vs; @@ -2632,7 +2634,6 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port, return vxlan_socket_create(net, port, rcv, data, flags); } -EXPORT_SYMBOL_GPL(vxlan_sock_add); static int vxlan_dev_configure(struct net *src_net, struct net_device *dev, struct vxlan_config *conf) diff --git a/include/net/vxlan.h b/include/net/vxlan.h index c037b27..d3ce81f 100644 --- a/include/net/vxlan.h +++ b/include/net/vxlan.h @@ -197,19 +197,13 @@ struct vxlan_dev { VXLAN_F_REMCSUM_NOPARTIAL |\ VXLAN_F_FLOW_BASED) -struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port, - vxlan_rcv_t *rcv, void *data, - bool no_share, u32 flags); - struct net_device *vxlan_dev_create(struct net *net, const char *name, u8 name_assign_type, struct vxlan_config *conf); -void vxlan_sock_release(struct vxlan_sock *vs); - -int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb, - __be32 src, __be32 dst, __u8 
tos, __u8 ttl, __be16 df, - __be16 src_port, __be16 dst_port, struct
[net-next RFC 10/14] openvswitch: Abstract vport name through ovs_vport_name()
This allows us to get rid of the get_name() vport op later on.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 net/openvswitch/datapath.c           | 4 ++--
 net/openvswitch/vport-internal_dev.c | 1 -
 net/openvswitch/vport-netdev.c       | 6 --
 net/openvswitch/vport-netdev.h       | 1 -
 net/openvswitch/vport.c              | 4 ++--
 net/openvswitch/vport.h              | 5 +
 6 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c3ecfd4..8986558 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -176,7 +176,7 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
 const char *ovs_dp_name(const struct datapath *dp)
 {
 	struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
-	return vport->ops->get_name(vport);
+	return ovs_vport_name(vport);
 }
 
 static int get_dpifindex(const struct datapath *dp)
@@ -1786,7 +1786,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
 	if (nla_put_u32(skb, OVS_VPORT_ATTR_PORT_NO, vport->port_no) ||
 	    nla_put_u32(skb, OVS_VPORT_ATTR_TYPE, vport->ops->type) ||
 	    nla_put_string(skb, OVS_VPORT_ATTR_NAME,
-			   vport->ops->get_name(vport)))
+			   ovs_vport_name(vport)))
 		goto nla_put_failure;
 
 	ovs_vport_get_stats(vport, &vport_stats);
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index a2c205d..c058bbf 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -242,7 +242,6 @@ static struct vport_ops ovs_internal_vport_ops = {
 	.type		= OVS_VPORT_TYPE_INTERNAL,
 	.create		= internal_dev_create,
 	.destroy	= internal_dev_destroy,
-	.get_name	= ovs_netdev_get_name,
 	.send		= internal_dev_recv,
 };
 
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index cb22051..ef11a41 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -170,11 +170,6 @@ static void netdev_destroy(struct vport *vport)
 	call_rcu(&vport->rcu, free_port_rcu);
 }
 
-const char *ovs_netdev_get_name(const struct vport *vport)
-{
-	return vport->dev->name;
-}
-
 static unsigned int packet_length(const struct sk_buff *skb)
 {
 	unsigned int length = skb->len - ETH_HLEN;
@@ -222,7 +217,6 @@ static struct vport_ops ovs_netdev_vport_ops = {
 	.type		= OVS_VPORT_TYPE_NETDEV,
 	.create		= netdev_create,
 	.destroy	= netdev_destroy,
-	.get_name	= ovs_netdev_get_name,
 	.send		= netdev_send,
 };
 
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 1c52aed..684fb88 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,7 +26,6 @@ struct vport *ovs_netdev_get_vport(struct net_device *dev);
 
-const char *ovs_netdev_get_name(const struct vport *);
 void ovs_netdev_detach_dev(struct vport *);
 
 int __init ovs_netdev_init(void);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index af23ba0..d14f594 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -113,7 +113,7 @@ struct vport *ovs_vport_locate(const struct net *net, const char *name)
 	struct vport *vport;
 
 	hlist_for_each_entry_rcu(vport, bucket, hash_node)
-		if (!strcmp(name, vport->ops->get_name(vport)) &&
+		if (!strcmp(name, ovs_vport_name(vport)) &&
 		    net_eq(ovs_dp_get_net(vport->dp), net))
 			return vport;
 
@@ -226,7 +226,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
 		}
 
 		bucket = hash_bucket(ovs_dp_get_net(vport->dp),
-				     vport->ops->get_name(vport));
+				     ovs_vport_name(vport));
 		hlist_add_head_rcu(&vport->hash_node, bucket);
 		return vport;
 	}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index e05ec68..1a689c2 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -237,6 +237,11 @@ static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
 		skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
 }
 
+static inline const char *ovs_vport_name(struct vport *vport)
+{
+	return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
+}
+
 int ovs_vport_ops_register(struct vport_ops *ops);
 void ovs_vport_ops_unregister(struct vport_ops *ops);
 
-- 
2.3.5
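The fallback in ovs_vport_name() can be modelled outside the kernel. The following is a minimal userspace sketch, assuming heavily reduced stand-in structures: struct net_device, struct vport_ops and struct vport below are hypothetical simplifications, and legacy_name / legacy_get_name() are illustrative helpers, not kernel code.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct net_device {
	char name[16];
};

struct vport;

struct vport_ops {
	/* Legacy per-type callback, only needed while some vports lack a net_device. */
	const char *(*get_name)(const struct vport *vport);
};

struct vport {
	struct net_device *dev;        /* NULL for vports not yet converted */
	const struct vport_ops *ops;
	const char *legacy_name;       /* hypothetical backing store for get_name() */
};

/* Same decision as the patch: use the net_device name when one is attached,
 * otherwise fall back to the per-type get_name() op. */
static inline const char *ovs_vport_name(const struct vport *vport)
{
	return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
}

/* Hypothetical get_name() implementation for a vport without a net_device. */
static const char *legacy_get_name(const struct vport *vport)
{
	return vport->legacy_name;
}
```

Once every vport type carries a dev pointer, the conditional reduces to vport->dev->name and the get_name member can be dropped, which is what the commit message anticipates.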
[net-next RFC 09/14] openvswitch: Move dev pointer into vport itself
This is the first step in representing all OVS vports as regular struct net_devices. Move the net_device pointer into the vport structure itself to get rid of struct netdev_vport.

Signed-off-by: Thomas Graf <tg...@suug.ch>
Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 net/openvswitch/datapath.c           |  7 +--
 net/openvswitch/dp_notify.c          |  5 +--
 net/openvswitch/vport-internal_dev.c | 37 +++-
 net/openvswitch/vport-netdev.c       | 84
 net/openvswitch/vport-netdev.h       | 12 --
 net/openvswitch/vport.h              |  3 +-
 6 files changed, 58 insertions(+), 90 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3315e3a..c3ecfd4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -188,7 +188,7 @@ static int get_dpifindex(const struct datapath *dp)
 
 	local = ovs_vport_rcu(dp, OVSP_LOCAL);
 	if (local)
-		ifindex = netdev_vport_priv(local)->dev->ifindex;
+		ifindex = local->dev->ifindex;
 	else
 		ifindex = 0;
 
@@ -2205,13 +2205,10 @@ static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
 		struct vport *vport;
 
 		hlist_for_each_entry(vport, &dp->ports[i], dp_hash_node) {
-			struct netdev_vport *netdev_vport;
-
 			if (vport->ops->type != OVS_VPORT_TYPE_INTERNAL)
 				continue;
 
-			netdev_vport = netdev_vport_priv(vport);
-			if (dev_net(netdev_vport->dev) == dnet)
+			if (dev_net(vport->dev) == dnet)
 				list_add(&vport->detach_list, head);
 		}
 	}
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..a7a80a6 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,10 @@ void ovs_dp_notify_wq(struct work_struct *work)
 			struct hlist_node *n;
 
 			hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
-				struct netdev_vport *netdev_vport;
-
 				if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
 					continue;
 
-				netdev_vport = netdev_vport_priv(vport);
-				if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
+				if (!(vport->dev->priv_flags & IFF_OVS_DATAPATH))
 					dp_detach_port_notify(vport);
 			}
 		}
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 6a55f71..a2c205d 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -156,49 +156,44 @@ static void do_setup(struct net_device *netdev)
 static struct vport *internal_dev_create(const struct vport_parms *parms)
 {
 	struct vport *vport;
-	struct netdev_vport *netdev_vport;
 	struct internal_dev *internal_dev;
 	int err;
 
-	vport = ovs_vport_alloc(sizeof(struct netdev_vport),
-				&ovs_internal_vport_ops, parms);
+	vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms);
 	if (IS_ERR(vport)) {
 		err = PTR_ERR(vport);
 		goto error;
 	}
 
-	netdev_vport = netdev_vport_priv(vport);
-
-	netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
-					 parms->name, NET_NAME_UNKNOWN,
-					 do_setup);
-	if (!netdev_vport->dev) {
+	vport->dev = alloc_netdev(sizeof(struct internal_dev),
+				  parms->name, NET_NAME_UNKNOWN, do_setup);
+	if (!vport->dev) {
 		err = -ENOMEM;
 		goto error_free_vport;
 	}
 
-	dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
-	internal_dev = internal_dev_priv(netdev_vport->dev);
+	dev_net_set(vport->dev, ovs_dp_get_net(vport->dp));
+	internal_dev = internal_dev_priv(vport->dev);
 	internal_dev->vport = vport;
 
 	/* Restrict bridge port to current netns. */
 	if (vport->port_no == OVSP_LOCAL)
-		netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+		vport->dev->features |= NETIF_F_NETNS_LOCAL;
 
 	rtnl_lock();
-	err = register_netdevice(netdev_vport->dev);
+	err = register_netdevice(vport->dev);
 	if (err)
 		goto error_free_netdev;
 
-	dev_set_promiscuity(netdev_vport->dev, 1);
+	dev_set_promiscuity(vport->dev, 1);
 	rtnl_unlock();
-	netif_start_queue(netdev_vport->dev);
+	netif_start_queue(vport->dev);
[net-next RFC 03/14] vxlan: Flow based tunneling
Allows putting a VXLAN device into a new flow-based mode in which it will populate a tunnel info structure for each packet received. The metadata structure will contain the outer header and tunnel header fields which have been stripped off. Layers further up in the stack such as routing, tc or netfilter can later match on these fields.

On the transmit side, it allows skbs to carry their own encapsulation instructions, thus allowing encapsulation parameters to be set per flow/route.

This prepares the VXLAN device to be steered by the routing subsystem, which will allow supporting encapsulation for a large number of tunnel endpoints and tunnel ids through a single net_device, which improves the scalability of current VXLAN tunnels.

Signed-off-by: Thomas Graf <tg...@suug.ch>
Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 drivers/net/vxlan.c          | 147 ---
 include/linux/skbuff.h       |   1 +
 include/net/ip_tunnels.h     |   8 +++
 include/net/route.h          |   8 +++
 include/net/vxlan.h          |   4 +-
 include/uapi/linux/if_link.h |   1 +
 6 files changed, 146 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 34c519e..d5edba5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1164,10 +1164,12 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+	struct ip_tunnel_info *tun_info = NULL;
 	struct vxlan_sock *vs;
 	struct vxlanhdr *vxh;
 	u32 flags, vni;
-	struct vxlan_metadata md = {0};
+	struct vxlan_metadata _md;
+	struct vxlan_metadata *md = &_md;
 
 	/* Need Vxlan and inner Ethernet header to be present */
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1202,6 +1204,33 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		vni &= VXLAN_VNI_MASK;
 	}
 
+	if (vs->flags & VXLAN_F_FLOW_BASED) {
+		const struct iphdr *iph = ip_hdr(skb);
+
+		/* TODO: Consider optimizing by looking up in flow cache */
+		tun_info = ip_tunnel_info_alloc(sizeof(*md), GFP_ATOMIC);
+		if (!tun_info)
+			goto drop;
+
+		tun_info->key.ipv4_src = iph->saddr;
+		tun_info->key.ipv4_dst = iph->daddr;
+		tun_info->key.ipv4_tos = iph->tos;
+		tun_info->key.ipv4_ttl = iph->ttl;
+		tun_info->key.tp_src = udp_hdr(skb)->source;
+		tun_info->key.tp_dst = udp_hdr(skb)->dest;
+
+		tun_info->mode = IP_TUNNEL_INFO_RX;
+		tun_info->key.tun_flags = TUNNEL_KEY;
+		tun_info->key.tun_id = cpu_to_be64(vni >> 8);
+		if (udp_hdr(skb)->check != 0)
+			tun_info->key.tun_flags |= TUNNEL_CSUM;
+
+		md = ip_tunnel_info_opts(tun_info, sizeof(*md));
+		skb_attach_tunnel_info(skb, tun_info);
+	} else {
+		memset(md, 0, sizeof(*md));
+	}
+
 	/* For backwards compatibility, only allow reserved fields to be
 	 * used by VXLAN extensions if explicitly requested.
 	 */
@@ -1209,13 +1238,16 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		struct vxlanhdr_gbp *gbp;
 
 		gbp = (struct vxlanhdr_gbp *)vxh;
-		md.gbp = ntohs(gbp->policy_id);
+		md->gbp = ntohs(gbp->policy_id);
+
+		if (tun_info)
+			tun_info->key.tun_flags |= TUNNEL_VXLAN_OPT;
 
 		if (gbp->dont_learn)
-			md.gbp |= VXLAN_GBP_DONT_LEARN;
+			md->gbp |= VXLAN_GBP_DONT_LEARN;
 
 		if (gbp->policy_applied)
-			md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+			md->gbp |= VXLAN_GBP_POLICY_APPLIED;
 
 		flags &= ~VXLAN_GBP_USED_BITS;
 	}
@@ -1233,8 +1265,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto bad_flags;
 	}
 
-	md.vni = vxh->vx_vni;
-	vs->rcv(vs, skb, &md);
+	md->vni = vxh->vx_vni;
+	vs->rcv(vs, skb, md);
 	return 0;
 
 drop:
@@ -1254,6 +1286,7 @@ error:
 static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 		      struct vxlan_metadata *md)
 {
+	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
 	struct iphdr *oip = NULL;
 	struct ipv6hdr *oip6 = NULL;
 	struct vxlan_dev *vxlan;
@@ -1263,7 +1296,12 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 	int err = 0;
 	union vxlan_addr *remote_ip;
 
-	vni = ntohl(md->vni) >> 8;
+	/* For flow based devices, map all packets to VNI 0 */
+	if (vs->flags & VXLAN_F_FLOW_BASED)
+		vni = 0;
+	else
+		vni =
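In the flow-based receive path, vni holds the second 32-bit word of the VXLAN header in host order, so the 24-bit VNI sits in its upper bits and vni >> 8 recovers it before it is widened into tun_id. A minimal userspace sketch of that extraction, assuming the 8-byte header layout from RFC 7348; the helper names vxlan_vni_from_hdr() and vxlan_tun_id_from_hdr() are hypothetical:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Byte layout of the 8-byte VXLAN header (RFC 7348):
 *   bytes 0-3: flags (0x08 = "VNI valid") followed by 24 reserved bits
 *   bytes 4-6: 24-bit VNI
 *   byte  7  : reserved
 */
#define VXLAN_HDR_LEN 8

/* Hypothetical helper: returns the 24-bit VNI from a raw VXLAN header. */
static uint32_t vxlan_vni_from_hdr(const uint8_t hdr[VXLAN_HDR_LEN])
{
	uint32_t word;

	/* memcpy avoids an unaligned 32-bit load from the packet buffer. */
	memcpy(&word, hdr + 4, sizeof(word));

	/* Convert to host order; the VNI then occupies bits 31..8, so shift
	 * out the reserved low byte, as "vni >> 8" does in the patch. */
	return ntohl(word) >> 8;
}

/* The patch stores the VNI in a 64-bit tunnel id via cpu_to_be64();
 * this sketch keeps the widened value in host byte order for simplicity. */
static uint64_t vxlan_tun_id_from_hdr(const uint8_t hdr[VXLAN_HDR_LEN])
{
	return (uint64_t)vxlan_vni_from_hdr(hdr);
}
```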
[net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices
This is the first series in a larger effort to bring the scalability and programmability advantages of OVS to the rest of the network stack and to get rid of as much OVS-specific code as possible.

This first series focuses on getting rid of OVS tunnel vports and using regular tunnel net_devices instead. As part of this effort, the routing subsystem is extended with support for flow-based tunneling. In this new tunneling mode, the route is able to match on tunnel information as well as set tunnel encapsulation parameters per route. This makes it possible to perform L3 forwarding for a large number of tunnel endpoints and virtual networks using a single tunnel net_device.

TODO:
 - Geneve support
 - IPv6 support
 - Benchmarks

Pravin Shelar (1):
  openvswitch: Use regular GRE net_device instead of vport

Thomas Graf (13):
  ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
  ip_tunnel: support per packet tunnel metadata
  vxlan: Flow based tunneling
  route: Extend flow representation with tunnel key
  route: Per route tunnel metadata with RTA_TUNNEL
  fib: Add fib rule match on tunnel id
  vxlan: Factor out device configuration
  openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
  openvswitch: Move dev pointer into vport itself
  openvswitch: Abstract vport name through ovs_vport_name()
  openvswitch: Use regular VXLAN net_device device
  vxlan: remove indirect call to vxlan_rcv() and vni member
  arp: Associate ARP requests with tunnel info

 drivers/net/vxlan.c                  | 663 ---
 include/linux/skbuff.h               |   2 +
 include/net/fib_rules.h              |   1 +
 include/net/flow.h                   |   7 +
 include/net/ip_fib.h                 |   3 +
 include/net/ip_tunnels.h             | 127 ++-
 include/net/route.h                  |  18 +
 include/net/vxlan.h                  |  82 -
 include/uapi/linux/fib_rules.h       |   2 +-
 include/uapi/linux/if_link.h         |   1 +
 include/uapi/linux/openvswitch.h     |   2 +-
 include/uapi/linux/rtnetlink.h       |  16 +
 net/core/dev.c                       |   5 +-
 net/core/fib_rules.c                 |  17 +-
 net/core/skbuff.c                    |   8 +
 net/ipv4/arp.c                       |   8 +
 net/ipv4/fib_frontend.c              |  57 +++
 net/ipv4/fib_semantics.c             |  45 +++
 net/ipv4/ip_gre.c                    | 161 -
 net/ipv4/ip_tunnel_core.c            |  15 +
 net/ipv4/route.c                     |  32 +-
 net/openvswitch/Kconfig              |  12 -
 net/openvswitch/Makefile             |   2 -
 net/openvswitch/actions.c            |  10 +-
 net/openvswitch/datapath.c           |  19 +-
 net/openvswitch/datapath.h           |   5 +-
 net/openvswitch/dp_notify.c          |   5 +-
 net/openvswitch/flow.c               |   4 +-
 net/openvswitch/flow.h               |  77 +---
 net/openvswitch/flow_netlink.c       |  78 -
 net/openvswitch/flow_netlink.h       |   3 +-
 net/openvswitch/vport-geneve.c       |  17 +-
 net/openvswitch/vport-gre.c          | 313 -
 net/openvswitch/vport-internal_dev.c |  38 +-
 net/openvswitch/vport-netdev.c       | 271 +++---
 net/openvswitch/vport-netdev.h       |  13 -
 net/openvswitch/vport-vxlan.c        | 322 -
 net/openvswitch/vport.c              |  34 +-
 net/openvswitch/vport.h              |  21 +-
 39 files changed, 1334 insertions(+), 1182 deletions(-)
 delete mode 100644 net/openvswitch/vport-gre.c
 delete mode 100644 net/openvswitch/vport-vxlan.c

-- 
2.3.5
[net-next RFC 07/14] vxlan: Factor out device configuration
This factors the device configuration out of the RTNL newlink API, which allows for in-kernel creation of VXLAN net_devices.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 drivers/net/vxlan.c | 332
 include/net/vxlan.h |  59 ++
 2 files changed, 236 insertions(+), 155 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d5edba5..3acab95 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -54,10 +54,6 @@
 
 #define PORT_HASH_BITS	8
 #define PORT_HASH_SIZE  (1<<PORT_HASH_BITS)
-#define VNI_HASH_BITS	10
-#define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
-#define FDB_HASH_BITS	8
-#define FDB_HASH_SIZE	(1<<FDB_HASH_BITS)
 #define FDB_AGE_DEFAULT 300 /* 5 min */
 #define FDB_AGE_INTERVAL (10 * HZ)	/* rescan interval */
 
@@ -74,6 +70,7 @@ module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 static int vxlan_net_id;
+static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
@@ -84,21 +81,6 @@ struct vxlan_net {
 	spinlock_t	  sock_lock;
 };
 
-union vxlan_addr {
-	struct sockaddr_in sin;
-	struct sockaddr_in6 sin6;
-	struct sockaddr sa;
-};
-
-struct vxlan_rdst {
-	union vxlan_addr	 remote_ip;
-	__be16			 remote_port;
-	u32			 remote_vni;
-	u32			 remote_ifindex;
-	struct list_head	 list;
-	struct rcu_head		 rcu;
-};
-
 /* Forwarding table entry */
 struct vxlan_fdb {
 	struct hlist_node hlist;	/* linked list of entries */
@@ -111,31 +93,6 @@ struct vxlan_fdb {
 	u8		  eth_addr[ETH_ALEN];
 };
 
-/* Pseudo network device */
-struct vxlan_dev {
-	struct hlist_node hlist;	/* vni hash table */
-	struct list_head  next;		/* vxlan's per namespace list */
-	struct vxlan_sock *vn_sock;	/* listening socket */
-	struct net_device *dev;
-	struct net	  *net;		/* netns for packet i/o */
-	struct vxlan_rdst default_dst;	/* default destination */
-	union vxlan_addr  saddr;	/* source address */
-	__be16		  dst_port;
-	__u16		  port_min;	/* source port range */
-	__u16		  port_max;
-	__u8		  tos;		/* TOS override */
-	__u8		  ttl;
-	u32		  flags;	/* VXLAN_F_* in vxlan.h */
-
-	unsigned long	  age_interval;
-	struct timer_list age_timer;
-	spinlock_t	  hash_lock;
-	unsigned int	  addrcnt;
-	unsigned int	  addrmax;
-
-	struct hlist_head fdb_head[FDB_HASH_SIZE];
-};
-
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
 static struct workqueue_struct *vxlan_wq;
@@ -345,7 +302,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
 		goto nla_put_failure;
 
-	if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
+	if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
 	    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
 		goto nla_put_failure;
 	if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
@@ -749,7 +706,8 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 		if (!(flags & NLM_F_CREATE))
 			return -ENOENT;
 
-		if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax)
+		if (vxlan->cfg.addrmax &&
+		    vxlan->addrcnt >= vxlan->cfg.addrmax)
 			return -ENOSPC;
 
 		/* Disallow replace to add a multicast entry */
@@ -835,7 +793,7 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
 			return -EINVAL;
 		*port = nla_get_be16(tb[NDA_PORT]);
 	} else {
-		*port = vxlan->dst_port;
+		*port = vxlan->cfg.dst_port;
 	}
 
 	if (tb[NDA_VNI]) {
@@ -1021,7 +979,7 @@ static bool vxlan_snoop(struct net_device *dev,
 		vxlan_fdb_create(vxlan, src_mac, src_ip,
 				 NUD_REACHABLE,
 				 NLM_F_EXCL|NLM_F_CREATE,
-				 vxlan->dst_port,
+				 vxlan->cfg.dst_port,
 				 vxlan->default_dst.remote_vni,
 				 0, NTF_SELF);
 		spin_unlock(&vxlan->hash_lock);
@@ -1945,7 +1903,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	u32 flags = vxlan->flags;
 
 	if (rdst) {
-		dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
+		dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port;
[net-next RFC 14/14] arp: Associate ARP requests with tunnel info
Since ARP performs its own route lookup, any tunnel metadata returned by that lookup must be attached to the request manually.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 net/ipv4/arp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 933a928..6cf0502 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -489,6 +489,7 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
 	unsigned char *arp_ptr;
 	int hlen = LL_RESERVED_SPACE(dev);
 	int tlen = dev->needed_tailroom;
+	struct rtable *rt;
 
 	/*
 	 *	Allocate a buffer
@@ -577,6 +578,13 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
 	}
 	memcpy(arp_ptr, &dest_ip, 4);
 
+	rt = ip_route_output(dev_net(dev), dest_ip, src_ip, 0, dev->ifindex);
+	if (!IS_ERR(rt)) {
+		if (rt->rt_tun_info)
+			skb_attach_tunnel_info(skb, rt->rt_tun_info);
+		ip_rt_put(rt);
+	}
+
 	return skb;
 
 out:
-- 
2.3.5
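The added hunk follows a lookup, attach, release shape: look up the route, copy any tunnel metadata onto the freshly built request, and drop the route reference whenever the lookup succeeded. A minimal userspace model of that control flow, assuming hypothetical types (tun_meta, route, pkt) in place of struct ip_tunnel_info, struct rtable and struct sk_buff, a NULL route standing in for an IS_ERR() result, and a plain counter standing in for route reference counting:

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct tun_meta {
	unsigned long long tun_id;
};

struct route {
	int refcnt;                 /* reference taken by the lookup */
	struct tun_meta *tun_info;  /* NULL for ordinary, non-tunnel routes */
};

struct pkt {
	struct tun_meta *tun_info;  /* metadata the tunnel device will consume */
};

/* Stand-in for ip_rt_put(): release the reference taken by the lookup. */
static void route_put(struct route *rt)
{
	rt->refcnt--;
}

/* Mirrors the arp_create() hunk. 'rt' is the lookup result; NULL models a
 * failed lookup, which is ignored so the request goes out without metadata. */
static void attach_route_tun_info(struct pkt *p, struct route *rt)
{
	if (!rt)
		return;
	if (rt->tun_info)
		p->tun_info = rt->tun_info;
	route_put(rt);
}
```

This model does not address how the lifetime of the attached metadata is managed once the route reference has been dropped; the kernel side has to handle that inside skb_attach_tunnel_info() and the route code.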
[net-next RFC 06/14] fib: Add fib rule match on tunnel id
This adds the ability to select a routing table based on the tunnel id, which allows maintaining separate routing tables for each virtual tunnel network.

    ip rule add from all tunnel-id 100 lookup 100
    ip rule add from all tunnel-id 200 lookup 200

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/net/fib_rules.h        |  1 +
 include/uapi/linux/fib_rules.h |  2 +-
 net/core/fib_rules.c           | 17 +++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6d67383..822ed1e 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -19,6 +19,7 @@ struct fib_rule {
 	u8			action;
 	/* 3 bytes hole, try to use */
 	u32			target;
+	__be64			tun_id;
 	struct fib_rule __rcu	*ctarget;
 	struct net		*fr_net;
 
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 2b82d7e..96161b8 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -43,7 +43,7 @@ enum {
 	FRA_UNUSED5,
 	FRA_FWMARK,	/* mark */
 	FRA_FLOW,	/* flow/class id */
-	FRA_UNUSED6,
+	FRA_TUN_ID,
 	FRA_SUPPRESS_IFGROUP,
 	FRA_SUPPRESS_PREFIXLEN,
 	FRA_TABLE,	/* Extended table id */
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 9a12668..6da78c9 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -186,6 +186,9 @@ static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
 	if ((rule->mark ^ fl->flowi_mark) & rule->mark_mask)
 		goto out;
 
+	if (rule->tun_id && (rule->tun_id != fl->flowi_tun_key.tun_id))
+		goto out;
+
 	ret = ops->match(rule, fl, flags);
 out:
 	return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -330,6 +333,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 	if (tb[FRA_FWMASK])
 		rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
 
+	if (tb[FRA_TUN_ID])
+		rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+
 	rule->action = frh->action;
 	rule->flags = frh->flags;
 	rule->table = frh_get_table(frh, tb);
@@ -473,6 +479,10 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 		    (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
 			continue;
 
+		if (tb[FRA_TUN_ID] &&
+		    (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
+			continue;
+
 		if (!ops->compare(rule, frh, tb))
 			continue;
 
@@ -535,7 +545,8 @@ static inline size_t fib_rule_nlmsg_size(struct fib_rules_ops *ops,
 			 + nla_total_size(4) /* FRA_SUPPRESS_PREFIXLEN */
 			 + nla_total_size(4) /* FRA_SUPPRESS_IFGROUP */
 			 + nla_total_size(4) /* FRA_FWMARK */
-			 + nla_total_size(4); /* FRA_FWMASK */
+			 + nla_total_size(4) /* FRA_FWMASK */
+			 + nla_total_size(8); /* FRA_TUN_ID */
 
 	if (ops->nlmsg_payload)
 		payload += ops->nlmsg_payload(rule);
@@ -591,7 +602,9 @@ static int fib_nl_fill_rule(struct sk_buff *skb, struct fib_rule *rule,
 	    ((rule->mark_mask || rule->mark) &&
 	     nla_put_u32(skb, FRA_FWMASK, rule->mark_mask)) ||
 	    (rule->target &&
-	     nla_put_u32(skb, FRA_GOTO, rule->target)))
+	     nla_put_u32(skb, FRA_GOTO, rule->target)) ||
+	    (rule->tun_id &&
+	     nla_put_be64(skb, FRA_TUN_ID, rule->tun_id)))
 		goto nla_put_failure;
 
 	if (rule->suppress_ifgroup != -1) {
-- 
2.3.5
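The matching semantics added to fib_rule_match() are: a rule with tun_id == 0 ignores the tunnel id, a non-zero tun_id must equal the flow's tunnel id, and FIB_RULE_INVERT flips the final result, including for rules that failed an earlier selector. A minimal userspace sketch, assuming simplified hypothetical structs (rule_sketch, flow_sketch) and omitting the family-specific ops->match() step, which is treated as always matching:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as FIB_RULE_INVERT in <linux/fib_rules.h>; renamed here so the
 * sketch stays self-contained. */
#define SKETCH_FIB_RULE_INVERT 0x00000002

/* Hypothetical reductions of struct fib_rule and struct flowi. */
struct rule_sketch {
	uint32_t mark, mark_mask;
	uint64_t tun_id;     /* 0 = rule does not select on tunnel id */
	uint32_t flags;
};

struct flow_sketch {
	uint32_t mark;
	uint64_t tun_id;     /* kernel: fl->flowi_tun_key.tun_id (__be64) */
};

static bool rule_matches(const struct rule_sketch *rule,
			 const struct flow_sketch *fl)
{
	bool ret = false;

	if ((rule->mark ^ fl->mark) & rule->mark_mask)
		goto out;

	/* The new selector: only constrain the lookup when a tunnel id is set.
	 * Both sides use the same byte order, so comparing raw big-endian
	 * values for equality, as the patch does, is sufficient. */
	if (rule->tun_id && rule->tun_id != fl->tun_id)
		goto out;

	ret = true;  /* stands in for ops->match() */
out:
	return (rule->flags & SKETCH_FIB_RULE_INVERT) ? !ret : ret;
}
```

With the two example rules from the commit message, a flow carrying tunnel id 100 selects table 100 and one carrying 200 selects table 200, while flows with any other id match neither rule and continue to later rules.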
[net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL
Introduces a new Netlink attribute RTA_TUNNEL which allows routes to set tunnel transmit metadata and specify the tunnel endpoint or tunnel id on a per-route basis. The route must point to a tunnel device which understands per-skb tunnel metadata and has been put into the respective mode.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/net/ip_fib.h           |  3 +++
 include/net/ip_tunnels.h       |  1 -
 include/net/route.h            | 10
 include/uapi/linux/rtnetlink.h | 16
 net/ipv4/fib_frontend.c        | 57 ++
 net/ipv4/fib_semantics.c       | 45 +
 net/ipv4/route.c               | 30 +-
 net/openvswitch/vport.h        |  1 +
 8 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..1cd7cf8 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
 #include <net/fib_rules.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
+#include <net/ip_tunnels.h>
 
 struct fib_config {
 	u8			fc_dst_len;
@@ -44,6 +45,7 @@ struct fib_config {
 	u32			fc_flow;
 	u32			fc_nlflags;
 	struct nl_info		fc_nlinfo;
+	struct ip_tunnel_info	fc_tunnel;
 };
 
 struct fib_info;
@@ -117,6 +119,7 @@ struct fib_info {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			fib_power;
 #endif
+	struct ip_tunnel_info	*fib_tunnel;
 	struct rcu_head		rcu;
 	struct fib_nh		fib_nh[0];
 #define fib_dev		fib_nh[0].nh_dev
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index df8cfd3..b4ab930 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -9,7 +9,6 @@
 #include <net/dsfield.h>
 #include <net/gro_cells.h>
 #include <net/inet_ecn.h>
-#include <net/ip.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/flow.h>
diff --git a/include/net/route.h b/include/net/route.h
index 6ede321..dbda603 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
 #include <net/inetpeer.h>
 #include <net/flow.h>
 #include <net/inet_sock.h>
+#include <net/ip_tunnels.h>
 #include <linux/in_route.h>
 #include <linux/rtnetlink.h>
 #include <linux/rcupdate.h>
@@ -66,6 +67,7 @@ struct rtable {
 
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
+	struct ip_tunnel_info	*rt_tun_info;
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
@@ -198,6 +200,8 @@ struct in_ifaddr;
 void fib_add_ifaddr(struct in_ifaddr *);
 void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
 
+int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info);
+
 static inline void ip_rt_put(struct rtable *rt)
 {
 	/* dst_release() accepts a NULL parameter.
@@ -317,9 +321,15 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
 
 static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
 {
+	struct rtable *rt;
+
 	if (skb_shinfo(skb)->tun_info)
 		return skb_shinfo(skb)->tun_info;
 
+	rt = skb_rtable(skb);
+	if (rt)
+		return rt->rt_tun_info;
+
 	return NULL;
 }
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..1f7aa68 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -286,6 +286,21 @@ enum rt_class_t {
 
 /* Routing message attributes */
 
+enum rta_tunnel_t {
+	RTA_TUN_UNSPEC,
+	RTA_TUN_ID,
+	RTA_TUN_DST,
+	RTA_TUN_SRC,
+	RTA_TUN_TTL,
+	RTA_TUN_TOS,
+	RTA_TUN_SPORT,
+	RTA_TUN_DPORT,
+	RTA_TUN_FLAGS,
+	__RTA_TUN_MAX,
+};
+
+#define RTA_TUN_MAX (__RTA_TUN_MAX - 1)
+
 enum rtattr_type_t {
 	RTA_UNSPEC,
 	RTA_DST,
@@ -308,6 +323,7 @@ enum rtattr_type_t {
 	RTA_VIA,
 	RTA_NEWDST,
 	RTA_PREF,
+	RTA_TUNNEL,	/* destination VTEP */
 	__RTA_MAX
 };
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..bfa77a6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -580,6 +580,57 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	return -EINVAL;
 }
 
+static const struct nla_policy tunnel_policy[RTA_TUN_MAX + 1] = {
+	[RTA_TUN_ID]	= { .type = NLA_U64 },
+	[RTA_TUN_DST]	= { .type = NLA_U32 },
+	[RTA_TUN_SRC]	= { .type = NLA_U32 },
+	[RTA_TUN_TTL]	= { .type = NLA_U8 },
+	[RTA_TUN_TOS]	= { .type = NLA_U8 },
+	[RTA_TUN_SPORT]	= { .type = NLA_U16 },
+	[RTA_TUN_DPORT]	= { .type = NLA_U16 },
+	[RTA_TUN_FLAGS]	= { .type = NLA_U16 },
+};
+
+static int parse_rta_tunnel(struct fib_config *cfg, struct nlattr *attr)
+{
+	struct nlattr
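With this patch, skb_tunnel_info() resolves metadata in a fixed order: per-packet metadata attached to the skb wins, and only otherwise is the metadata of the attached route (rt_tun_info) used. A minimal userspace sketch of that precedence, assuming hypothetical stand-in types (tun_sketch, rt_sketch, pkt_sketch) in place of struct ip_tunnel_info, struct rtable and the skb:

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct ip_tunnel_info, struct rtable and the
 * two places a packet can carry tunnel metadata. */
struct tun_sketch {
	unsigned long long tun_id;
};

struct rt_sketch {
	struct tun_sketch *rt_tun_info;   /* set from RTA_TUNNEL, else NULL */
};

struct pkt_sketch {
	struct tun_sketch *tun_info;      /* like skb_shinfo(skb)->tun_info */
	struct rt_sketch *rt;             /* like skb_rtable(skb), may be NULL */
};

/* Per-packet metadata takes precedence; the route's metadata is the fallback. */
static struct tun_sketch *pkt_tunnel_info(const struct pkt_sketch *p)
{
	if (p->tun_info)
		return p->tun_info;
	if (p->rt)
		return p->rt->rt_tun_info;
	return NULL;
}
```

In this model, metadata attached directly to a packet always overrides whatever a matching route would have supplied.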
[net-next RFC 12/14] vxlan: remove indirect call to vxlan_rcv() and vni member
With the removal of the special treatment of OVS VXLAN vports, the indirect call to vxlan_rcv() can be avoided and the VNI member in vxlan_metadata can be removed.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 drivers/net/vxlan.c | 225 +---
 include/net/vxlan.h |   7 -
 2 files changed, 107 insertions(+), 125 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index b696871..9cc7d5a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -75,7 +75,6 @@ static struct rtnl_link_ops vxlan_link_ops;
 static const u8 all_zeros_mac[ETH_ALEN];
 
 static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
-					 vxlan_rcv_t *rcv, void *data,
 					 bool no_share, u32 flags);
 
 /* per-network namespace private data for this module */
@@ -1122,6 +1121,102 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 	return vh;
 }
 
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md, __u32 vni)
+{
+	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+	struct iphdr *oip = NULL;
+	struct ipv6hdr *oip6 = NULL;
+	struct vxlan_dev *vxlan;
+	struct pcpu_sw_netstats *stats;
+	union vxlan_addr saddr;
+	int err = 0;
+	union vxlan_addr *remote_ip;
+
+	/* For flow based devices, map all packets to VNI 0 */
+	if (vs->flags & VXLAN_F_FLOW_BASED)
+		vni = 0;
+
+	/* Is this VNI defined? */
+	vxlan = vxlan_vs_find_vni(vs, vni);
+	if (!vxlan)
+		goto drop;
+
+	remote_ip = &vxlan->default_dst.remote_ip;
+	skb_reset_mac_header(skb);
+	skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
+	skb->protocol = eth_type_trans(skb, vxlan->dev);
+	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+
+	/* Ignore packet loops (and multicast echo) */
+	if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
+		goto drop;
+
+	/* Re-examine inner Ethernet packet */
+	if (remote_ip->sa.sa_family == AF_INET) {
+		oip = ip_hdr(skb);
+		saddr.sin.sin_addr.s_addr = oip->saddr;
+		saddr.sa.sa_family = AF_INET;
+
+		if (tun_info) {
+			tun_info->key.ipv4_src = oip->saddr;
+			tun_info->key.ipv4_dst = oip->daddr;
+			tun_info->key.ipv4_tos = oip->tos;
+			tun_info->key.ipv4_ttl = oip->ttl;
+		}
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		oip6 = ipv6_hdr(skb);
+		saddr.sin6.sin6_addr = oip6->saddr;
+		saddr.sa.sa_family = AF_INET6;
+
+		/* TODO : Fill IPv6 tunnel info */
+#endif
+	}
+
+	if ((vxlan->flags & VXLAN_F_LEARN) &&
+	    vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
+		goto drop;
+
+	skb_reset_network_header(skb);
+	if (!(vs->flags & VXLAN_F_FLOW_BASED))
+		skb->mark = md->gbp;
+
+	if (oip6)
+		err = IP6_ECN_decapsulate(oip6, skb);
+	if (oip)
+		err = IP_ECN_decapsulate(oip, skb);
+
+	if (unlikely(err)) {
+		if (log_ecn_error) {
+			if (oip6)
+				net_info_ratelimited("non-ECT from %pI6\n",
+						     &oip6->saddr);
+			if (oip)
+				net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+						     &oip->saddr, oip->tos);
+		}
+		if (err > 1) {
+			++vxlan->dev->stats.rx_frame_errors;
+			++vxlan->dev->stats.rx_errors;
+			goto drop;
+		}
+	}
+
+	stats = this_cpu_ptr(vxlan->dev->tstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets++;
+	stats->rx_bytes += skb->len;
+	u64_stats_update_end(&stats->syncp);
+
+	netif_rx(skb);
+
+	return;
+drop:
+	/* Consume bad packet */
+	kfree_skb(skb);
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
@@ -1226,8 +1321,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto bad_flags;
 	}
 
-	md->vni = vxh->vx_vni;
-	vs->rcv(vs, skb, md);
+	vxlan_rcv(vs, skb, md, vni >> 8);
 	return 0;
 
 drop:
@@ -1244,105 +1338,6 @@ error:
 	return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
-		      struct vxlan_metadata *md)
-{
-	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
-	struct iphdr *oip = NULL;
-	struct ipv6hdr *oip6 = NULL;
-	struct
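After this change, vxlan_rcv() receives the VNI directly and, for sockets in flow-based mode, collapses every VNI to 0 before the device lookup, so a single flow-based device bound to VNI 0 receives all traffic while the real VNI travels in the tunnel metadata. A minimal userspace sketch of that dispatch, assuming hypothetical types (vdev_sketch, vsock_sketch), an illustrative flag value, and a linear search standing in for the kernel's hashed vxlan_vs_find_vni():

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_F_FLOW_BASED 0x1u   /* illustrative, not the kernel's value */

struct vdev_sketch {
	uint32_t vni;
	const char *name;
};

struct vsock_sketch {
	uint32_t flags;
	struct vdev_sketch *devs;   /* devices sharing this UDP socket */
	size_t ndevs;
};

/* Linear stand-in for vxlan_vs_find_vni(). */
static struct vdev_sketch *find_vni(const struct vsock_sketch *vs, uint32_t vni)
{
	for (size_t i = 0; i < vs->ndevs; i++)
		if (vs->devs[i].vni == vni)
			return &vs->devs[i];
	return NULL;
}

/* Returns the device a packet with the given VNI is delivered to, or NULL,
 * in which case the kernel drops the packet. */
static struct vdev_sketch *rcv_dispatch(const struct vsock_sketch *vs, uint32_t vni)
{
	/* For flow based devices, map all packets to VNI 0 */
	if (vs->flags & SKETCH_F_FLOW_BASED)
		vni = 0;
	return find_vni(vs, vni);
}
```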
[net-next RFC 13/14] openvswitch: Use regular GRE net_device instead of vport
From: Pravin Shelar <pshe...@nicira.com>

Removes all of the OVS specific GRE code and makes OVS use a GRE net_device.

Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 net/core/dev.c                 |   5 +-
 net/ipv4/ip_gre.c              | 161 -
 net/openvswitch/Makefile       |   1 -
 net/openvswitch/vport-gre.c    | 313 -
 net/openvswitch/vport-netdev.c |   7 +-
 5 files changed, 168 insertions(+), 319 deletions(-)
 delete mode 100644 net/openvswitch/vport-gre.c

diff --git a/net/core/dev.c b/net/core/dev.c
index 594163d..656f3b4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6969,6 +6969,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
+
+	strcpy(dev->name, name);
+	dev->name_assign_type = name_assign_type;
 	setup(dev);
 
 	dev->num_tx_queues = txqs;
@@ -6983,8 +6986,6 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		goto free_all;
 #endif
 
-	strcpy(dev->name, name);
-	dev->name_assign_type = name_assign_type;
 	dev->group = INIT_NETDEV_GROUP;
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5fd7064..b37515e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -25,6 +25,7 @@
 #include <linux/udp.h>
 #include <linux/if_arp.h>
 #include <linux/mroute.h>
+#include <linux/if_vlan.h>
 #include <linux/init.h>
 #include <linux/in6.h>
 #include <linux/inetdevice.h>
@@ -115,6 +116,8 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+#define GRE_TAP_FB_NAME "gretap0"
+
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
 static int ipgre_tunnel_init(struct net_device *dev);
 
@@ -217,7 +220,17 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
+
 		skb_pop_mac_header(skb);
+		if (tunnel->dev == itn->fb_tunnel_dev) {
+			struct ip_tunnel_info *tun_info;
+
+			tun_info = ip_tunnel_info_alloc(0, GFP_ATOMIC);
+
+			/* TODO: setup tun info from tpi */
+			skb_attach_tunnel_info(skb, tun_info);
+		}
+
 		ip_tunnel_rcv(tunnel, skb, tpi, log_ecn_error);
 		return PACKET_RCVD;
 	}
@@ -287,6 +300,135 @@ out:
 	return NETDEV_TX_OK;
 }
 
+/* TODO: share xmit code */
+static inline struct rtable *tunnel_route_lookup(struct net *net,
+						 const struct ip_tunnel_key *key,
+						 u32 mark,
+						 struct flowi4 *fl,
+						 u8 protocol)
+{
+	struct rtable *rt;
+
+	memset(fl, 0, sizeof(*fl));
+	fl->daddr = key->ipv4_dst;
+	fl->saddr = key->ipv4_src;
+	fl->flowi4_tos = RT_TOS(key->ipv4_tos);
+	fl->flowi4_mark = mark;
+	fl->flowi4_proto = protocol;
+
+	rt = ip_route_output_key(net, fl);
+	return rt;
+}
+
+
+/* Returns the least-significant 32 bits of a __be64. */
+static __be32 be64_get_low32(__be64 x)
+{
+#ifdef __BIG_ENDIAN
+	return (__force __be32)x;
+#else
+	return (__force __be32)((__force u64)x >> 32);
+#endif
+}
+
+static __be16 filter_tnl_flags(__be16 flags)
+{
+	return flags & (TUNNEL_CSUM | TUNNEL_KEY);
+}
+
+
+static struct sk_buff *__build_header(struct sk_buff *skb,
+				      const struct ip_tunnel_info *tun_info,
+				      int tunnel_hlen)
+{
+	struct tnl_ptk_info tpi;
+
+	skb = gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM));
+	if (IS_ERR(skb))
+		return skb;
+
+	tpi.flags = filter_tnl_flags(tun_info->key.tun_flags);
+	tpi.proto = htons(ETH_P_TEB);
+	tpi.key = be64_get_low32(tun_info->key.tun_id);
+	tpi.seq = 0;
+	gre_build_header(skb, &tpi, tunnel_hlen);
+
+	return skb;
+}
+
+static netdev_tx_t gre_fb_xmit(struct sk_buff *skb,
+			       struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+	struct ip_tunnel_info *tun_info;
+	const struct ip_tunnel_key *key;
+	struct flowi4 fl;
+	struct rtable *rt;
+	int min_headroom;
+	int tunnel_hlen;
+	__be16 df;
+	int err;
+
+	tun_info = skb_shinfo(skb)->tun_info;
+	if (unlikely(!tun_info)) {
+		err = -EINVAL;
+		goto err_free_skb;
+	}
+
+	key = &tun_info->key;
+
+	rt =
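be64_get_low32() is needed because the GRE key is only 32 bits wide while the tunnel id is a 64-bit big-endian value: the key is the numerically low half, and which half of the in-memory 64-bit word holds it depends on host endianness. A userspace sketch of the same helper, assuming GCC/Clang's predefined __BYTE_ORDER__ macros instead of the kernel's __BIG_ENDIAN, and plain uint64_t/uint32_t holding big-endian bytes in place of __be64/__be32; load_be64_bytes() is a hypothetical test helper:

```c
#include <stdint.h>
#include <string.h>

/* x holds a 64-bit value in big-endian (network) byte order; the result holds
 * its least-significant 32 bits, still in big-endian byte order. */
static uint32_t be64_get_low32_sketch(uint64_t x)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (uint32_t)x;            /* the low word is already the low half */
#else
	return (uint32_t)(x >> 32);    /* bytes 4..7 of the BE value sit in the high word */
#endif
}

/* Builds a big-endian 64-bit value from its 8 wire bytes. */
static uint64_t load_be64_bytes(const uint8_t b[8])
{
	uint64_t x;

	memcpy(&x, b, sizeof(x));
	return x;
}
```

For a tunnel id of 0x00000001deadbeef, the GRE key carries 0xdeadbeef and the upper half is dropped, so only tunnel ids that fit in 32 bits survive GRE encapsulation unchanged.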
[net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
Make use of the new skb tunnel metadata field by allocating a ip_tunnel_info per OVS tunnel set action and then attaching that metadata to each skb that passes the set action. The old egress_tun_info via the OVS_CB() is left in place until all tunnel vports have been converted to the new method. Signed-off-by: Thomas Graf tg...@suug.ch Signed-off-by: Pravin B Shelar pshe...@nicira.com --- net/openvswitch/actions.c | 8 +- net/openvswitch/datapath.c | 8 +++--- net/openvswitch/flow.h | 5 net/openvswitch/flow_netlink.c | 59 +- net/openvswitch/flow_netlink.h | 1 + 5 files changed, 69 insertions(+), 12 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 34cad57..484d965 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -726,7 +726,13 @@ static int execute_set_action(struct sk_buff *skb, { /* Only tunnel set execution is supported without a mask. */ if (nla_type(a) == OVS_KEY_ATTR_TUNNEL_INFO) { - OVS_CB(skb)-egress_tun_info = nla_data(a); + struct ovs_tunnel_info *tun = nla_data(a); + + skb_attach_tunnel_info(skb, tun-info); + + /* FIXME: Remove when all vports have been converted */ + OVS_CB(skb)-egress_tun_info = tun-info; + return 0; } diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 3b90461..3315e3a 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -1004,7 +1004,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info) } ovs_unlock(); - ovs_nla_free_flow_actions(old_acts); + ovs_nla_free_flow_actions_rcu(old_acts); ovs_flow_free(new_flow, false); } @@ -1016,7 +1016,7 @@ err_unlock_ovs: ovs_unlock(); kfree_skb(reply); err_kfree_acts: - kfree(acts); + ovs_nla_free_flow_actions(acts); err_kfree_flow: ovs_flow_free(new_flow, false); error: @@ -1143,7 +1143,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info) if (reply) ovs_notify(dp_flow_genl_family, reply, info); if (old_acts) - ovs_nla_free_flow_actions(old_acts); + 
ovs_nla_free_flow_actions_rcu(old_acts); return 0; @@ -1151,7 +1151,7 @@ err_unlock_ovs: ovs_unlock(); kfree_skb(reply); err_kfree_acts: - kfree(acts); + ovs_nla_free_flow_actions(acts); error: return error; } diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h index cadc6c5..193eab9 100644 --- a/net/openvswitch/flow.h +++ b/net/openvswitch/flow.h @@ -45,6 +45,11 @@ struct sk_buff; #define TUN_METADATA_OPTS(flow_key, opt_len) \ ((void *)((flow_key)-tun_opts + TUN_METADATA_OFFSET(opt_len))) +struct ovs_tunnel_info +{ + struct ip_tunnel_info *info; +}; + #define OVS_SW_FLOW_KEY_METADATA_SIZE \ (offsetof(struct sw_flow_key, recirc_id) + \ FIELD_SIZEOF(struct sw_flow_key, recirc_id)) diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index ecfa530..35086c6 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -1548,11 +1548,45 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log) return sfa; } +static void ovs_nla_free_set_action(const struct nlattr *a) +{ + const struct nlattr *ovs_key = nla_data(a); + struct ovs_tunnel_info *ovs_tun; + + switch (nla_type(ovs_key)) { + case OVS_KEY_ATTR_TUNNEL_INFO: + ovs_tun = nla_data(ovs_key); + ip_tunnel_info_put(ovs_tun-info); + break; + } +} + +void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts) +{ + const struct nlattr *a; + int rem; + + nla_for_each_attr(a, sf_acts-actions, sf_acts-actions_len, rem) { + switch (nla_type(a)) { + case OVS_ACTION_ATTR_SET: + ovs_nla_free_set_action(a); + break; + } + } + + kfree(sf_acts); +} + +static void __ovs_nla_free_flow_actions(struct rcu_head *head) +{ + ovs_nla_free_flow_actions(container_of(head, struct sw_flow_actions, rcu)); +} + /* Schedules 'sf_acts' to be freed after the next RCU grace period. * The caller must hold rcu_read_lock for this to be sensible. 
*/ -void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts) +void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *sf_acts) { - kfree_rcu(sf_acts, rcu); + call_rcu(sf_acts-rcu, __ovs_nla_free_flow_actions); } static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa, @@ -1747,6 +1781,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr, struct sw_flow_match match;
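This patch replaces the kfree_rcu() of the action list with an RCU callback. That callback first walks the actions and calls ip_tunnel_info_put() for every tunnel set action, because each of those actions now owns a reference to an allocated ip_tunnel_info. The userspace sketch below models that ownership rule. The demo_* types and helpers are hypothetical stand-ins, and the RCU deferral is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ip_tunnel_info: just a refcount. */
struct demo_tun_info {
	int refcnt;
};

static struct demo_tun_info *demo_tun_info_alloc(void)
{
	struct demo_tun_info *info = calloc(1, sizeof(*info));

	if (info)
		info->refcnt = 1;
	return info;
}

static void demo_tun_info_get(struct demo_tun_info *info)
{
	info->refcnt++;
}

/* Drops one reference; returns 1 if that freed the object. */
static int demo_tun_info_put(struct demo_tun_info *info)
{
	if (--info->refcnt == 0) {
		free(info);
		return 1;
	}
	return 0;
}

enum demo_action_type { DEMO_ACT_OUTPUT, DEMO_ACT_SET_TUNNEL };

struct demo_action {
	enum demo_action_type type;
	struct demo_tun_info *tun; /* owned reference for DEMO_ACT_SET_TUNNEL */
};

struct demo_actions {
	size_t n;
	struct demo_action acts[]; /* flexible array of actions */
};

/* Walk the action list, release the reference held by each tunnel set
 * action, then free the list itself. Freeing only the container would
 * leak those references. */
static void demo_free_actions(struct demo_actions *sf)
{
	for (size_t i = 0; i < sf->n; i++)
		if (sf->acts[i].type == DEMO_ACT_SET_TUNNEL)
			demo_tun_info_put(sf->acts[i].tun);
	free(sf);
}
```

In the kernel, the same walk runs from call_rcu(), so readers still traversing the old actions under rcu_read_lock() never see a released ip_tunnel_info.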
Re: [PATCH 7/7] mac80211: Switch to new AEAD interface
On Mon, 2015-06-01 at 16:05 +0200, Johannes Berg wrote: Ok - here the length is kinda passed a part of the AAD buffer, but this is really just some arcane code that should be fixed to use a proper struct. The value there, even though it is __be16 and looks like it came from the data, is actually created locally, see ccmp_special_blocks() and gcmp_special_blocks(). IOW, I think something like this would make sense: (but I'll hold it until after Herbert's patches I guess) From 20bd0e92ab0d7ef545687da762228622bcdabeec Mon Sep 17 00:00:00 2001 From: Johannes Berg johannes.b...@intel.com Date: Mon, 1 Jun 2015 16:33:11 +0200 Subject: [PATCH] mac80211: move AAD length out of AAD buffer The code currently passes the AAD buffer as a __be16 with the length, followed by the actual data, but doesn't use a struct or make this explicit in any other way, so it's confusing. Change the code to pass the AAD length explicity outside of the buffer. Reported-by: Stephan Mueller smuel...@chronox.de Signed-off-by: Johannes Berg johannes.b...@intel.com --- net/mac80211/aes_ccm.c | 18 +++--- net/mac80211/aes_ccm.h | 14 ++- net/mac80211/aes_gcm.c | 10 net/mac80211/aes_gcm.h | 6 +++-- net/mac80211/wpa.c | 64 +++--- 5 files changed, 62 insertions(+), 50 deletions(-) diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c index 208df7c0b6ea..b6e2f096127a 100644 --- a/net/mac80211/aes_ccm.c +++ b/net/mac80211/aes_ccm.c @@ -19,9 +19,10 @@ #include key.h #include aes_ccm.h -void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, - u8 *data, size_t data_len, u8 *mic, - size_t mic_len) +void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, + u8 *aad, size_t aad_len, + u8 *data, size_t data_len, + u8 *mic, size_t mic_len) { struct scatterlist assoc, pt, ct[2]; @@ -33,7 +34,7 @@ void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, memset(aead_req, 0, sizeof(aead_req_data)); sg_init_one(pt, data, data_len); - sg_init_one(assoc, aad[2], 
be16_to_cpup((__be16 *)aad)); + sg_init_one(assoc, aad, aad_len); sg_init_table(ct, 2); sg_set_buf(ct[0], data, data_len); sg_set_buf(ct[1], mic, mic_len); @@ -45,9 +46,10 @@ void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, crypto_aead_encrypt(aead_req); } -int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, - u8 *data, size_t data_len, u8 *mic, - size_t mic_len) +int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, + u8 *aad, size_t aad_len, + u8 *data, size_t data_len, + u8 *mic, size_t mic_len) { struct scatterlist assoc, pt, ct[2]; char aead_req_data[sizeof(struct aead_request) + @@ -61,7 +63,7 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, memset(aead_req, 0, sizeof(aead_req_data)); sg_init_one(pt, data, data_len); - sg_init_one(assoc, aad[2], be16_to_cpup((__be16 *)aad)); + sg_init_one(assoc, aad, aad_len); sg_init_table(ct, 2); sg_set_buf(ct[0], data, data_len); sg_set_buf(ct[1], mic, mic_len); diff --git a/net/mac80211/aes_ccm.h b/net/mac80211/aes_ccm.h index 6a73d1e4d186..bfe355e4a680 100644 --- a/net/mac80211/aes_ccm.h +++ b/net/mac80211/aes_ccm.h @@ -15,12 +15,14 @@ struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[], size_t key_len, size_t mic_len); -void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, - u8 *data, size_t data_len, u8 *mic, - size_t mic_len); -int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, - u8 *data, size_t data_len, u8 *mic, - size_t mic_len); +void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, + u8 *aad, size_t aad_len, + u8 *data, size_t data_len, + u8 *mic, size_t mic_len); +int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, + u8 *aad, size_t aad_len, + u8 *data, size_t data_len, + u8 *mic, size_t mic_len); void ieee80211_aes_key_free(struct crypto_aead *tfm); #endif /* AES_CCM_H */ diff --git a/net/mac80211/aes_gcm.c b/net/mac80211/aes_gcm.c index 
fd278bbe1b0d..fb6823c5e381 100644 --- a/net/mac80211/aes_gcm.c +++ b/net/mac80211/aes_gcm.c @@
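The patch changes the AAD calling convention. Previously, the first two bytes of the buffer carried the AAD length as a big-endian __be16, and the associated data started at aad[2], so sg_init_one() was called with &aad[2] and be16_to_cpup(). Now the caller passes aad and aad_len explicitly. The sketch below, which uses hypothetical demo_* names, shows that both conventions describe the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A (pointer, length) pair standing in for what sg_init_one() is given. */
struct demo_buf {
	const uint8_t *ptr;
	size_t len;
};

/* Old convention: aad[0..1] is a big-endian length, data starts at aad[2]. */
static struct demo_buf aad_from_prefixed(const uint8_t *aad)
{
	struct demo_buf b;

	b.len = ((size_t)aad[0] << 8) | aad[1]; /* be16_to_cpup() equivalent */
	b.ptr = &aad[2];
	return b;
}

/* New convention: the caller passes the AAD and its length separately. */
static struct demo_buf aad_from_explicit(const uint8_t *aad, size_t aad_len)
{
	struct demo_buf b = { aad, aad_len };

	return b;
}
```

With the explicit form, the length no longer needs to be stored in the buffer at all. This is the confusion the patch description points out.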
Re: [PATCH] libceph: use kvfree() in ceph_put_page_vector()
On Mon, Jun 1, 2015 at 5:36 PM, Geliang Tang geliangt...@163.com wrote:
> Use kvfree() instead of open-coding it.
>
> Signed-off-by: Geliang Tang geliangt...@163.com
> ---
>  net/ceph/pagevec.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
> index 096d914..d4f5f22 100644
> --- a/net/ceph/pagevec.c
> +++ b/net/ceph/pagevec.c
> @@ -51,10 +51,7 @@ void ceph_put_page_vector(struct page **pages, int num_pages, bool dirty)
>  			set_page_dirty_lock(pages[i]);
>  		put_page(pages[i]);
>  	}
> -	if (is_vmalloc_addr(pages))
> -		vfree(pages);
> -	else
> -		kfree(pages);
> +	kvfree(pages);
>  }
>  EXPORT_SYMBOL(ceph_put_page_vector);

Already fixed in testing, wasn't pushed to linux-next though, sorry!

Thanks,
Ilya
Re: [PATCH 1/1] net: core: 'ethtool' issue with querying phy settings
On Sun, 2015-05-31 at 17:19 -0700, David Miller wrote:
> From: Ben Hutchings b...@decadent.org.uk
> Date: Sun, 31 May 2015 20:54:06 +0100
>
> > On Fri, 2015-05-22 at 16:15 -0400, David Miller wrote:
> > > From: Arun Parameswaran apara...@broadcom.com
> > > Date: Wed, 20 May 2015 14:35:30 -0700
> > >
> > > > When trying to configure the settings for PHY1, using commands like 'ethtool -s eth0 phyad 1 speed 100', the 'ethtool' seems to modify other settings apart from the speed of the PHY1, in the above case.
> > > >
> > > > The ethtool seems to query the settings for PHY0, and use this as the base to apply the new settings to the PHY1. This is causing the other settings of the PHY 1 to be wrongly configured.
> > > >
> > > > The issue is caused by the '__ethtool_get_settings()' API, which gets called because of the 'ETHTOOL_GSET' command, is clearing the 'cmd' pointer (of type 'struct ethtool_cmd') by calling memset. This clears all the parameters (if any) passed for the 'ETHTOOL_GSET' cmd. So the driver's callback is always invoked with 'cmd->phy_address' as '0'.
> > > >
> > > > The '__ethtool_get_settings()' is called from other files in the 'net/core'. So the fix is applied to the 'ethtool_get_settings()' which is only called in the context of the 'ethtool'.
> > > >
> > > > Signed-off-by: Arun Parameswaran apara...@broadcom.com
> > > > Reviewed-by: Ray Jui r...@broadcom.com
> > > > Reviewed-by: Scott Branden sbran...@broadcom.com
> > >
> > > Applied and queued up for -stable, thanks.
> >
> > Please revert this. This is an incompatible API change, not a bug fix.
> >
> > The established semantics are that 'phyad' is filled in by the driver; it is not a parameter to the ETHTOOL_GSET command.
>
> But then how in the world can the user specify specific PHY ADs for a device that will respond to more than one?

ETHTOOL_SSET sets the current PHY address and ETHTOOL_GSET gets it.

If multiple PHYs need to be configured for a single link then the driver should configure them all at the same time rather than making it the administrator's problem.

What we can't support with the current API are:

- multiple physical links behind a single net device (different configuration possible for each link)
- multiple PHYs are needed for a single link, and the driver can't automatically decide which to use (multiple addresses to set)

Ben.

-- 
Ben Hutchings
Power corrupts. Absolute power is kind of neat.
- John Lehman, Secretary of the US Navy 1981-1987
Re: [PATCH] ethtool: changes of emac_regs structure accordingly within driver emac_regs structure.
On Mon, 2015-06-01 at 16:30 +0400, Ivan Mikhaylov wrote: On Mon, 1 June 2015 12:57 +0400 Ben Hutchings b...@decadent.org.uk wrote: On Thu, 2015-05-21 at 19:09 +0400, Ivan Mikhaylov wrote: In ibm_emac.c in ethtool size of emac structure which passing through to driver is nailed down and not correlating with current emac_regs structure. Signed-off-by: Ivan Mikhaylov i...@ru.ibm.com [...] This is not backward-compatible. It ought to be possible to mix and match old and new ethtool and driver, except for the EMAC4SYNC case which has been broken up until now. Using the new definition of struct emac_regs, I think the driver and ethtool need to agree that the MAC register dump sizes are:

EMAC: offsetof(struct emac_regs, u1)
EMAC4: offsetof(struct emac_regs, u1.emac4) + sizeof(p->u1.emac4)
EMAC4SYNC: offsetof(struct emac_regs, u1.emac4sync) + sizeof(p->u1.emac4sync)

Ben. -- Ben Hutchings Reality is just a crutch for people who can't handle science fiction. Actually it is backward-compatible because we don't care about size which is coming from driver side, only what we doing is map of driver structure to ethtool structure and results will be same for emac and emac4. struct emac_regs *p = (struct emac_regs *)(hdr + 1); The following registers won't be printed correctly. Also size which you mentioned (112 emac, 116 emac4) can be different from what you saying cause this managed by dts files where we can set something like 0x100 or 0x80 for this memory area and we will still have problem in representing MII area if this size wasn't set right in dts. Yes, I understand that. However, the in-tree device trees consistently use those as the resource sizes so I think ethtool used to work properly for the machines supported by those. Increasing the size of the MAC register dump is a regression for them. Ben. Ethtool will be work in same way even if we have emac or emac4. Thank you for respond! -- Ben Hutchings Power corrupts. Absolute power is kind of neat.
- John Lehman, Secretary of the US Navy 1981-1987
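Ben's proposal ties each register dump size to the end of the variant-specific member of the u1 union instead of to sizeof(struct emac_regs). As a result, registers appended after the union do not grow the dump for EMAC and EMAC4 parts. The sketch below shows that computation with offsetof(). It uses a **hypothetical** struct layout: the field names and sizes are invented for illustration, and the real struct emac_regs differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL layout for illustration only; not the real struct emac_regs. */
struct demo_emac_regs {
	uint32_t mr0;
	uint32_t mr1;
	uint32_t tmr0;
	union {
		struct { uint32_t r[2]; } emac;
		struct { uint32_t r[3]; } emac4;
		struct { uint32_t r[5]; } emac4sync;
	} u1;
	uint32_t later[4]; /* registers that follow the union */
};

enum demo_emac_kind { DEMO_EMAC, DEMO_EMAC4, DEMO_EMAC4SYNC };

/* Dump size per variant, in the shape of the EMAC/EMAC4/EMAC4SYNC
 * formulas from the thread: stop at the end of that variant's data. */
static size_t demo_emac_dump_size(enum demo_emac_kind kind)
{
	const struct demo_emac_regs *p = NULL; /* only used inside sizeof */

	switch (kind) {
	case DEMO_EMAC:
		return offsetof(struct demo_emac_regs, u1);
	case DEMO_EMAC4:
		return offsetof(struct demo_emac_regs, u1.emac4) +
		       sizeof(p->u1.emac4);
	case DEMO_EMAC4SYNC:
	default:
		return offsetof(struct demo_emac_regs, u1.emac4sync) +
		       sizeof(p->u1.emac4sync);
	}
}
```

If both sides compute the size this way, an old ethtool and a new driver agree on the EMAC and EMAC4 dump lengths even after fields are added at the end of the structure.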
[PATCH net-next 3/5] rocker: install untagged VLAN (vid=0) support for each port
From: Scott Feldman sfel...@gmail.com

On port probe, install untagged VLAN support by default. This is equivalent to running the command:

  bridge vlan add vid 0 dev DEV self

A user could, if they wanted, manually remove untagged support from the port by running the command:

  bridge vlan del vid 0 dev DEV self

But installing it by default on port initialization gives the normal expected behavior.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |   20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 76e2281..bd56273 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3157,6 +3157,8 @@ static int rocker_port_vlan_flood_group(struct rocker_port *rocker_port,
 
 	for (i = 0; i < rocker->port_count; i++) {
 		p = rocker->ports[i];
+		if (!p)
+			continue;
 		if (!rocker_port_is_bridged(p))
 			continue;
 		if (test_bit(ntohs(vlan_id), p->vlan_bitmap)) {
@@ -3216,7 +3218,7 @@ static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
 
 	for (i = 0; i < rocker->port_count; i++) {
 		p = rocker->ports[i];
-		if (test_bit(ntohs(vlan_id), p->vlan_bitmap))
+		if (p && test_bit(ntohs(vlan_id), p->vlan_bitmap))
 			ref++;
 	}
 
@@ -4882,6 +4884,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 	const struct pci_dev *pdev = rocker->pdev;
 	struct rocker_port *rocker_port;
 	struct net_device *dev;
+	u16 untagged_vid = 0;
 	int err;
 
 	dev = alloc_etherdev(sizeof(struct rocker_port));
@@ -4917,16 +4920,27 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 
 	rocker_port_set_learning(rocker_port, SWITCHDEV_TRANS_NONE);
 
-	rocker_port->internal_vlan_id =
-		rocker_port_internal_vlan_id_get(rocker_port, dev->ifindex);
 	err = rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE, 0);
 	if (err) {
 		dev_err(&pdev->dev, "install ig port table failed\n");
 		goto err_port_ig_tbl;
 	}
 
+	rocker_port->internal_vlan_id =
+		rocker_port_internal_vlan_id_get(rocker_port, dev->ifindex);
+
+	err = rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
+				   untagged_vid, 0);
+	if (err) {
+		netdev_err(rocker_port->dev, "install untagged VLAN failed\n");
+		goto err_untagged_vlan;
+	}
+
 	return 0;
 
+err_untagged_vlan:
+	rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE,
+			   ROCKER_OP_FLAG_REMOVE);
 err_port_ig_tbl:
 	unregister_netdev(dev);
 err_register_netdev:
-- 
1.7.10.4
[PATCH net-next 2/5] rocker: cleanup vlan table on error adding vlan
From: Scott Feldman sfel...@gmail.com

Basic housekeeping: if there is an error adding the router MAC for this VLAN, remove the just-installed VLAN table entry to leave the device in the same state as before the failure.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |    7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index ab6d871..76e2281 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4342,7 +4342,12 @@ static int rocker_port_vlan_add(struct rocker_port *rocker_port,
 	if (err)
 		return err;
 
-	return rocker_port_router_mac(rocker_port, trans, 0, htons(vid));
+	err = rocker_port_router_mac(rocker_port, trans, 0, htons(vid));
+	if (err)
+		rocker_port_vlan(rocker_port, trans,
+				 ROCKER_OP_FLAG_REMOVE, vid);
+
+	return err;
 }
 
 static int rocker_port_vlans_add(struct rocker_port *rocker_port,
-- 
1.7.10.4
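The new error path follows the usual "undo what you just did" pattern. If programming the router MAC fails, the VLAN table entry installed a moment earlier is removed before the error is returned. The self-contained sketch below shows that pattern with a hypothetical demo_port device model. None of these names are rocker APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: two tables that must stay consistent. */
struct demo_port {
	bool vlan_installed;
	bool router_mac_installed;
	bool fail_router_mac; /* fault injection for the example */
};

static int demo_port_vlan(struct demo_port *p, bool remove)
{
	p->vlan_installed = !remove;
	return 0;
}

static int demo_port_router_mac(struct demo_port *p)
{
	if (p->fail_router_mac)
		return -EIO;
	p->router_mac_installed = true;
	return 0;
}

/* Install the VLAN entry, then the router MAC; if the second step fails,
 * remove the VLAN entry so the device is left as it was. */
static int demo_port_vlan_add(struct demo_port *p)
{
	int err = demo_port_vlan(p, false);

	if (err)
		return err;

	err = demo_port_router_mac(p);
	if (err)
		demo_port_vlan(p, true); /* roll back the VLAN table entry */

	return err;
}
```

The rollback's own return value is ignored, as in the patch, so the caller sees the original error.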
[PATCH net-next 4/5] rocker: install/remove router MAC for untagged VLAN when joining/leaving bridge
From: Scott Feldman sfel...@gmail.com

When the port joins a bridge, the port's internal VLAN ID needs to change to the bridge's internal VLAN ID. Likewise, when the port leaves the bridge, the internal VLAN ID reverts to the port's original internal VLAN ID. (The device uses the internal VLAN ID to mark untagged pkts internally with some VLAN, which is eventually removed on egress...think PVID.) When the internal VLAN ID changes, we need to update the VLAN table entries and the router MAC entries for IP/IPv6 to reflect the new internal VLAN ID. This patch uses the common rocker_port_vlan_add/del functions to make sure the tables are updated for the current internal VLAN ID.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |   42 --
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index bd56273..3eb3eba 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -5178,41 +5178,49 @@ static bool rocker_port_dev_check(const struct net_device *dev)
 static int rocker_port_bridge_join(struct rocker_port *rocker_port,
 				   struct net_device *bridge)
 {
+	u16 untagged_vid = 0;
 	int err;
 
-	rocker_port_internal_vlan_id_put(rocker_port,
-					 rocker_port->dev->ifindex);
-
-	rocker_port->bridge_dev = bridge;
+	/* Port is joining bridge, so the internal VLAN for the
+	 * port is going to change to the bridge internal VLAN.
+	 * Let's remove untagged VLAN (vid=0) from port and
+	 * re-add once internal VLAN has changed.
+	 */
 
-	/* Use bridge internal VLAN ID for untagged pkts */
-	err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE,
-			       ROCKER_OP_FLAG_REMOVE, 0);
+	err = rocker_port_vlan_del(rocker_port, untagged_vid, 0);
 	if (err)
 		return err;
+
+	rocker_port_internal_vlan_id_put(rocker_port,
+					 rocker_port->dev->ifindex);
 	rocker_port->internal_vlan_id =
 		rocker_port_internal_vlan_id_get(rocker_port, bridge->ifindex);
-	return rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE, 0, 0);
+
+	rocker_port->bridge_dev = bridge;
+
+	return rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
+				    untagged_vid, 0);
 }
 
 static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
 {
+	u16 untagged_vid = 0;
 	int err;
 
-	rocker_port_internal_vlan_id_put(rocker_port,
-					 rocker_port->bridge_dev->ifindex);
-
-	rocker_port->bridge_dev = NULL;
-
-	/* Use port internal VLAN ID for untagged pkts */
-	err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE,
-			       ROCKER_OP_FLAG_REMOVE, 0);
+	err = rocker_port_vlan_del(rocker_port, untagged_vid, 0);
 	if (err)
 		return err;
+
+	rocker_port_internal_vlan_id_put(rocker_port,
+					 rocker_port->bridge_dev->ifindex);
 	rocker_port->internal_vlan_id =
 		rocker_port_internal_vlan_id_get(rocker_port,
 						 rocker_port->dev->ifindex);
-	err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE, 0, 0);
+
+	rocker_port->bridge_dev = NULL;
+
+	err = rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
+				   untagged_vid, 0);
 	if (err)
 		return err;
 
-- 
1.7.10.4
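The ordering in this patch matters because the untagged (vid 0) entries are tied to whatever internal VLAN ID the port holds at the time. They are removed while the old ID is still current, the ID is then swapped, and only after that are they re-added. The sketch below models this with a hypothetical refcounted internal-VLAN pool keyed by ifindex. The pool layout, the 0xf00 base, and the demo_* helpers are assumptions and are not the rocker implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pool: one internal VLAN per ifindex, shared and refcounted. */
#define DEMO_VLAN_BASE  0xf00
#define DEMO_VLAN_SLOTS 8

struct demo_vlan_slot {
	int ifindex;
	int refcnt;
};

static struct demo_vlan_slot demo_pool[DEMO_VLAN_SLOTS];

/* Returns the internal VLAN for @ifindex, allocating one if needed;
 * returns 0 if the pool is exhausted. */
static uint16_t demo_internal_vlan_get(int ifindex)
{
	int free_slot = -1;

	for (int i = 0; i < DEMO_VLAN_SLOTS; i++) {
		if (demo_pool[i].refcnt && demo_pool[i].ifindex == ifindex) {
			demo_pool[i].refcnt++;
			return (uint16_t)(DEMO_VLAN_BASE + i);
		}
		if (!demo_pool[i].refcnt && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return 0;
	demo_pool[free_slot].ifindex = ifindex;
	demo_pool[free_slot].refcnt = 1;
	return (uint16_t)(DEMO_VLAN_BASE + free_slot);
}

static void demo_internal_vlan_put(int ifindex)
{
	for (int i = 0; i < DEMO_VLAN_SLOTS; i++) {
		if (demo_pool[i].refcnt && demo_pool[i].ifindex == ifindex) {
			demo_pool[i].refcnt--;
			return;
		}
	}
}

struct demo_port {
	int ifindex;
	int bridge_ifindex;   /* 0 when not bridged */
	uint16_t internal_vlan;
	uint16_t untagged_on; /* internal VLAN the vid-0 entries use; 0 = none */
};

static void demo_untagged_del(struct demo_port *p) { p->untagged_on = 0; }
static void demo_untagged_add(struct demo_port *p) { p->untagged_on = p->internal_vlan; }

/* Same ordering as rocker_port_bridge_join() after the patch. */
static void demo_bridge_join(struct demo_port *p, int bridge_ifindex)
{
	demo_untagged_del(p);               /* still using the port's own ID */
	demo_internal_vlan_put(p->ifindex);
	p->internal_vlan = demo_internal_vlan_get(bridge_ifindex);
	p->bridge_ifindex = bridge_ifindex;
	demo_untagged_add(p);               /* now using the bridge's ID */
}
```

Because the pool is keyed by the bridge's ifindex, two ports that join the same bridge end up sharing one internal VLAN ID.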
[PATCH net-next 5/5] rocker: remove support for legacy VLAN ndo ops
From: Scott Feldman sfel...@gmail.com

Remove support for the legacy ndo ops .ndo_vlan_rx_add_vid/.ndo_vlan_rx_kill_vid. Rocker will use bridge_setlink/dellink exclusively for VLAN add/del operations.

The legacy ops are needed if the 8021q driver module is used to set up VLANs on the port. But an alternative exists: bridge_setlink/dellink can set up VLANs without depending on the 8021q module. So rocker will switch to the newer setlink/dellink ops. VLANs can be added to or deleted from the port, whether or not the port is bridged, using the bridge commands:

  bridge vlan [add|del] vid VID dev DEV self

(Yes, I agree it's confusing to use the bridge command to set a VLAN on a non-bridged port.)

Using setlink/dellink instead of the legacy ops lets us handle the stacked driver case automatically. It's built-in. setlink also passes additional flags (PVID, egress untagged) that aren't available with the legacy ops.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |   34 +-
 1 file changed, 1 insertion(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 3eb3eba..e3fb97a 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4185,35 +4185,6 @@ static int rocker_port_set_mac_address(struct net_device *dev, void *p)
 	return 0;
 }
 
-static int rocker_port_vlan_rx_add_vid(struct net_device *dev,
-				       __be16 proto, u16 vid)
-{
-	struct rocker_port *rocker_port = netdev_priv(dev);
-	int err;
-
-	err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE, 0, vid);
-	if (err)
-		return err;
-
-	return rocker_port_router_mac(rocker_port, SWITCHDEV_TRANS_NONE,
-				      0, htons(vid));
-}
-
-static int rocker_port_vlan_rx_kill_vid(struct net_device *dev,
-					__be16 proto, u16 vid)
-{
-	struct rocker_port *rocker_port = netdev_priv(dev);
-	int err;
-
-	err = rocker_port_router_mac(rocker_port, SWITCHDEV_TRANS_NONE,
-				     ROCKER_OP_FLAG_REMOVE, htons(vid));
-	if (err)
-		return err;
-
-	return rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE,
-				ROCKER_OP_FLAG_REMOVE, vid);
-}
-
 static int rocker_port_get_phys_port_name(struct net_device *dev,
 					  char *buf, size_t len)
 {
@@ -4235,8 +4206,6 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_stop		= rocker_port_stop,
 	.ndo_start_xmit		= rocker_port_xmit,
 	.ndo_set_mac_address	= rocker_port_set_mac_address,
-	.ndo_vlan_rx_add_vid	= rocker_port_vlan_rx_add_vid,
-	.ndo_vlan_rx_kill_vid	= rocker_port_vlan_rx_kill_vid,
 	.ndo_bridge_getlink	= switchdev_port_bridge_getlink,
 	.ndo_bridge_setlink	= switchdev_port_bridge_setlink,
 	.ndo_bridge_dellink	= switchdev_port_bridge_dellink,
@@ -4908,8 +4877,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 		       NAPI_POLL_WEIGHT);
 	rocker_carrier_init(rocker_port);
 
-	dev->features |= NETIF_F_NETNS_LOCAL |
-			 NETIF_F_HW_VLAN_CTAG_FILTER;
+	dev->features |= NETIF_F_NETNS_LOCAL;
 
 	err = register_netdev(dev);
 	if (err) {
-- 
1.7.10.4
[PATCH net-next 1/5] rocker: zero allocate ports array
From: Scott Feldman sfel...@gmail.com

When allocating the array of rocker port pointers, zero the array values so we can test for !NULL to see whether a port is allocated/registered. We'll need this later, during port probe, when installing untagged VLAN support for each port. It's a long story, but to install a VLAN (vid=0 for untagged, in this case) on a port, we'll need to scan the other ports to see whether the VLAN group for that VLAN has been set up. To scan the other ports, we need to walk the port array.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 36f7edf..ab6d871 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4936,7 +4936,7 @@ static int rocker_probe_ports(struct rocker *rocker)
 	int err;
 
 	alloc_size = sizeof(struct rocker_port *) * rocker->port_count;
-	rocker->ports = kmalloc(alloc_size, GFP_KERNEL);
+	rocker->ports = kzalloc(alloc_size, GFP_KERNEL);
 	if (!rocker->ports)
 		return -ENOMEM;
 	for (i = 0; i < rocker->port_count; i++) {
-- 
1.7.10.4
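Zero-allocating the pointer array lets any walker tell a probed port from an empty slot. Patch 3/5 relies on this with checks such as `if (p && test_bit(...))`. The userspace sketch below uses calloc() in place of kzalloc() and hypothetical demo_* types. It assumes, as the kernel does, that all-bits-zero is a NULL pointer on the target platform.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct rocker_port / struct rocker. */
struct demo_port {
	unsigned long vlan_bitmap; /* bit n set => VLAN n configured (toy: n < 32) */
};

struct demo_switch {
	struct demo_port **ports;
	unsigned int port_count;
};

/* calloc() plays the role of kzalloc(): every slot starts out NULL. */
static int demo_alloc_ports(struct demo_switch *sw, unsigned int count)
{
	sw->ports = calloc(count, sizeof(*sw->ports));
	if (!sw->ports)
		return -1;
	sw->port_count = count;
	return 0;
}

/* Count ports that have @vid set, skipping slots that are not probed yet,
 * like the "if (p && test_bit(...)) ref++;" loop in the series. */
static int demo_count_vlan_users(const struct demo_switch *sw, unsigned int vid)
{
	int ref = 0;

	for (unsigned int i = 0; i < sw->port_count; i++) {
		const struct demo_port *p = sw->ports[i];

		if (p && (p->vlan_bitmap & (1UL << vid)))
			ref++;
	}
	return ref;
}
```

With kmalloc() the unprobed slots would hold garbage, and the same walk could dereference a wild pointer while the earlier ports are still being probed.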
[PATCH net-next 0/5] rocker: enable by default untagged VLAN support
From: Scott Feldman sfel...@gmail.com

This patch set is a follow-up to Simon Horman's RFC patch:

  [PATCH/RFC net-next] rocker: by default accept untagged packets

Now, on port probe, we install untagged VLAN (vid=0) support for each port by default. This is equivalent to the command:

  bridge vlan add vid 0 dev DEV self

Accepting untagged VLAN pkts is a reasonable default, but the user can override it with:

  bridge vlan del vid 0 dev DEV self

With this, we no longer need the 8021q module to install vid=0 when the port interface opens. In fact, we don't need support for the legacy VLAN ndo ops at all, since bridge_setlink/dellink supersede them, so this series removes legacy VLAN ndo ops support from the driver. (The bonding/team drivers support the legacy VLAN ndo ops, but those ops don't fit into the transaction model offered by switchdev. Switching all VLAN functions to bridge_setlink/dellink switchdev support gets us both stacked driver and transaction model support.)

Scott Feldman (5):
  rocker: zero allocate ports array
  rocker: cleanup vlan table on error adding vlan
  rocker: install untagged VLAN (vid=0) support for each port
  rocker: install/remove router MAC for untagged VLAN when joining/leaving bridge
  rocker: remove support for legacy VLAN ndo ops

 drivers/net/ethernet/rocker/rocker.c |  105 --
 1 file changed, 50 insertions(+), 55 deletions(-)

-- 
1.7.10.4
[PATCH net] Netfilter fix for net
Hi David,

The following patch reverts the ebtables chunk that enforces counters that was introduced in the recently applied d26e2c9ffa38 ('Revert netfilter: ensure number of counters is 0 in do_replace()') since this breaks ebtables.

You can pull this change from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

The following changes since commit 9302d7bb0c5cd46be5706859301f18c137b2439f:

  sctp: Fix mangled IPv4 addresses on a IPv6 listening socket (2015-05-27 14:15:26 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to d26e2c9ffa385dd1b646f43c1397ba12af9ed431:

  Revert netfilter: ensure number of counters is 0 in do_replace() (2015-06-01 19:45:47 +0200)

Bernhard Thaler (1):
  Revert netfilter: ensure number of counters is 0 in do_replace()

 net/bridge/netfilter/ebtables.c |    4 ----
 1 file changed, 4 deletions(-)
[PATCH net] Revert netfilter: ensure number of counters is 0 in do_replace()
From: Bernhard Thaler bernhard.tha...@wvnet.at

This partially reverts commit 1086bbe97a07 (netfilter: ensure number of counters is 0 in do_replace()) in net/bridge/netfilter/ebtables.c.

Setting rules with ebtables no longer works with 1086bbe97a07 in place. An error message is printed and no rules are set in the end, e.g.:

  ~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
  Unable to update the kernel. Two possible causes:
  1. Multiple ebtables programs were executing simultaneously. The ebtables
     userspace tool doesn't by default support multiple ebtables programs running

Reverting the ebtables part of 1086bbe97a07 makes this work again.

Signed-off-by: Bernhard Thaler bernhard.tha...@wvnet.at
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/bridge/netfilter/ebtables.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 24c7c96..91180a7 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1117,8 +1117,6 @@ static int do_replace(struct net *net, const void __user *user,
 		return -ENOMEM;
 	if (tmp.num_counters >= INT_MAX / sizeof(struct ebt_counter))
 		return -ENOMEM;
-	if (tmp.num_counters == 0)
-		return -EINVAL;
 
 	tmp.name[sizeof(tmp.name) - 1] = 0;
 
@@ -2161,8 +2159,6 @@ static int compat_copy_ebt_replace_from_user(struct ebt_replace *repl,
 		return -ENOMEM;
 	if (tmp.num_counters >= INT_MAX / sizeof(struct ebt_counter))
 		return -ENOMEM;
-	if (tmp.num_counters == 0)
-		return -EINVAL;
 
 	memcpy(repl, &tmp, offsetof(struct ebt_replace, hook_entry));
 
-- 
1.7.10.4
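After the revert, the only remaining num_counters check in do_replace() is an upper bound. It keeps num_counters * sizeof(struct ebt_counter) below INT_MAX. The reverted hunk had also rejected zero with -EINVAL, and removing that check is what made rule updates from the ebtables tool work again. The sketch below shows the remaining check. The counter struct is a hypothetical stand-in sized like two 64-bit counters.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for struct ebt_counter (two 64-bit counters). */
struct demo_ebt_counter {
	unsigned long long pcnt;
	unsigned long long bcnt;
};

/* The num_counters validation left in do_replace() after the revert:
 * reject counts whose total byte size would not fit in an int. */
static int demo_check_num_counters(unsigned int num_counters)
{
	if (num_counters >= INT_MAX / sizeof(struct demo_ebt_counter))
		return -ENOMEM;
	/* The reverted hunk also returned -EINVAL for num_counters == 0. */
	return 0;
}
```

The check only guards the later allocation of num_counters entries, so a request with zero counters passes.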
Re: [PATCH 0/2] Fix couple of issues with 'ethtool' get/set API's
On 15-05-31 12:59 PM, Ben Hutchings wrote: On Fri, 2015-05-22 at 15:43 -0700, Arun Parameswaran wrote: Hi, The patch fixes 2 issues with 'ethtool' getting/setting parametres in the do_gset() do_sset() API's. I have pushed a patch to the Kernel to fix an issue in the handling of the 'ethtool' commands which got accepted. This Kernel patch was based on Linux v4.1-rc4 and is available in: https://github.com/Broadcom/cygnus-linux/tree/net-core-ethtool-fix-v1 The Kernel was always clearing the command from the 'ethtool' resulting in all operations to deal with PHY0. This prevents querying/setting PHY 1's settings. [...] Each net device can be associated with a single PHY at a time, and the ETHTOOL_GSET implementation should fill in the PHY address in the ethtool_cmd::phy_address field. Where there are multiple PHYs that can be connected to the net device's MAC, an ETHTOOL_SSET operation can be used to change that PHY address. The above can be done by the driver when there is one PHY per MAC. In our case we have multiple PHYs controlled by the same MAC. I should have clarified this earlier, I apologize. When we specify the 'phyad', in the command line, we were expecting the 'ethtool' to fetch/set data for that 'phyad'. This is the intend of the patch. With the patch (in 'ethtool' and Kernel), if 'phyad' is not specified, it will still function as you described above, it will be up to the driver to return the proper 'phyad' and related settings. The ethtool API is not meant for controlling other PHYs that aren't connected to the MAC; if you want to do that then create more net devices for them or use the MDIO ioctls. In the SoC, there are multiple PHYs (in our case there are 2) controlled by the same MAC. We are trying to use 'ethtool' to control both the PHYs connected to the same MAC. Ben. 
Thanks
Arun
Re: [PATCH 0/2] Fix couple of issues with 'ethtool' get/set API's
On Mon, 2015-06-01 at 10:14 -0700, Arun Parameswaran wrote:
> On 15-05-31 12:59 PM, Ben Hutchings wrote:
>> On Fri, 2015-05-22 at 15:43 -0700, Arun Parameswaran wrote:
>>> [...]
>>> The kernel was always clearing the command from 'ethtool', resulting in all operations dealing with PHY0. This prevents querying/setting PHY 1's settings.
>> [...]
>>
>> Each net device can be associated with a single PHY at a time, and the ETHTOOL_GSET implementation should fill in the PHY address in the ethtool_cmd::phy_address field. Where there are multiple PHYs that can be connected to the net device's MAC, an ETHTOOL_SSET operation can be used to change that PHY address.
> The above can be done by the driver when there is one PHY per MAC. In our case we have multiple PHYs controlled by the same MAC. I should have clarified this earlier, I apologize.

I understand that you can have multiple PHYs on the same MDIO bus, but not how the MAC can use them at the same time. Is this hardware-level bonding? Or are multiple PHYs needed for a single link?

> When we specify the 'phyad' on the command line, we were expecting 'ethtool' to fetch/set data for that 'phyad'. This is the intent of the patch.
> With the patch (in 'ethtool' and the kernel), if 'phyad' is not specified, it will still function as you described above: it will be up to the driver to return the proper 'phyad' and related settings.
[...]

But without the patch in ethtool and other programs calling this API (it's not just the ethtool command!), you get random junk as the phy_address. How will you tell whether it's valid or not?

Ben.

-- 
Ben Hutchings
Power corrupts. Absolute power is kind of neat.
- John Lehman, Secretary of the US Navy 1981-1987
Re: [PATCH net] bnx2x: Move statistics implementation into semaphores
From: Yuval Mintz <yuval.mi...@qlogic.com>
Date: Mon, 1 Jun 2015 15:08:18 +0300

> Commit dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the bnx2x locking around statistics state into using a mutex - but the lock is being accessed via a timer which is forbidden.
> [If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about accessing the mutex in interrupt context]
>
> This moves the implementation into using a semaphore [with size '1'] instead.
>
> Signed-off-by: Yuval Mintz <yuval.mi...@qlogic.com>
> Signed-off-by: Ariel Elior <ariel.el...@qlogic.com>

Applied, thank you.
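The underlying constraint is that a kernel mutex must not be taken from timer (softirq) context, whereas a semaphore initialised to 1 can serve as a lock there through a non-sleeping try-acquire. As a rough userspace analogue only (POSIX semaphores rather than the kernel's struct semaphore, and not the actual bnx2x code), a binary semaphore with a try-acquire path for the "must not sleep" caller and a blocking path for everyone else might look like this; all names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

/* Binary semaphore guarding a toy statistics state (hypothetical). */
static sem_t stats_sema;
static unsigned long stats_updates;

int stats_init(void)
{
	/* pshared = 0: shared between threads of this process; initial value 1. */
	return sem_init(&stats_sema, 0, 1);
}

/*
 * Path modelled on a timer callback: it must not sleep, so it only tries
 * to take the semaphore and skips this round if someone else holds it.
 */
bool stats_try_update(void)
{
	if (sem_trywait(&stats_sema) != 0)
		return false;	/* EAGAIN: busy, try again on the next tick */
	stats_updates++;
	sem_post(&stats_sema);
	return true;
}

/* Process-context path: allowed to block until the semaphore is free. */
void stats_lock(void)
{
	while (sem_wait(&stats_sema) != 0 && errno == EINTR)
		;	/* retry if interrupted by a signal */
}

void stats_unlock(void)
{
	sem_post(&stats_sema);
}
```

While a process-context caller holds the lock, the timer-style path simply gives up for that tick instead of sleeping.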
Re: [PATCH 0/2] Fix couple of issues with 'ethtool' get/set API's
On Mon, 2015-06-01 at 12:12 -0700, Arun Parameswaran wrote:
> On 15-06-01 11:07 AM, Ben Hutchings wrote:
>> On Mon, 2015-06-01 at 10:14 -0700, Arun Parameswaran wrote:
>>> On 15-05-31 12:59 PM, Ben Hutchings wrote:
>>>> [...]
>>> The above can be done by the driver when there is one PHY per MAC. In our case we have multiple PHYs controlled by the same MAC. I should have clarified this earlier, I apologize.
>> I understand that you can have multiple PHYs on the same MDIO bus, but not how the MAC can use them at the same time. Is this hardware-level bonding? Or are multiple PHYs needed for a single link?
> We have an internal switch which manages the traffic to the PHYs (ports). There is 1 PHY per external port. The MAC is connected to the internal port of the switch.

Then you should create net devices for those external ports as well as the internal port. If I understand the switchdev API rightly, the external port devices should implement the ethtool {get,set}_settings operations and the ndo_switch_parent_id_get operation. The existing net device should expose only the internal link to the switch (which presumably isn't configurable at all).

[...]
> But this prevents 'ethtool' from being used to get/set data of specific PHYs.

That is fine because it is meant to manage the net device's own link (in this case, the internal port), not other switch ports.

Ben.
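To make the suggested layout concrete: one net device per external switch port, each answering get_settings for its own PHY only, and all of them reporting the same parent switch ID so userspace can tell they belong to one switch. This is a toy userspace model; every type and function name below is hypothetical and is not the switchdev or ethtool API:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-ins; none of these are real kernel types. */
struct toy_switch {
	unsigned char id[8];		/* parent ID shared by all ports */
};

struct toy_port_netdev {
	const char *name;		/* one net device per external port */
	const struct toy_switch *sw;	/* switch this port belongs to */
	unsigned int phy_address;	/* the single PHY behind this port */
};

/* Models a per-port get_settings: report this port's own PHY only. */
unsigned int toy_get_phy_address(const struct toy_port_netdev *dev)
{
	return dev->phy_address;
}

/* Models comparing parent IDs: same ID means same physical switch. */
bool toy_same_switch(const struct toy_port_netdev *a,
		     const struct toy_port_netdev *b)
{
	return memcmp(a->sw->id, b->sw->id, sizeof(a->sw->id)) == 0;
}
```

In this layout, "which PHY?" is answered by choosing the net device, so no per-command PHY address is needed.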
[PATCH 2/2] geneve: allow user to specify TOS info for tunnel frames
Signed-off-by: John W. Linville <linvi...@tuxdriver.com>
---
I have the corresponding iproute2 patch ready, but I am holding it for now to avoid confusion on the list and such...

 drivers/net/geneve.c         | 18 ++++++++++++++----
 include/uapi/linux/if_link.h |  1 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 1675dfdbfa70..78d49d186e05 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -46,6 +46,7 @@ struct geneve_dev {
 	struct geneve_sock *sock;	/* socket used for geneve tunnel */
 	u8 vni[3];		/* virtual network ID for tunnel */
 	u8 ttl;			/* TTL override */
+	u8 tos;			/* TOS override */
 	struct sockaddr_in remote;	/* IPv4 address for link partner */
 	struct list_head next;	/* geneve's per namespace list */
 };
@@ -194,7 +195,12 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* TODO: port min/max limits should be configurable */
 	sport = udp_flow_src_port(dev_net(dev), skb, 0, 0, true);
 
+	tos = geneve->tos;
+	if (tos == 1)
+		tos = ip_tunnel_get_dsfield(iip, skb);
+
 	memset(&fl4, 0, sizeof(fl4));
+	fl4.flowi4_tos = RT_TOS(tos);
 	fl4.daddr = geneve->remote.sin_addr.s_addr;
 	rt = ip_route_output_key(geneve->net, &fl4);
 	if (IS_ERR(rt)) {
@@ -208,9 +214,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto rt_tx_error;
 	}
 
-	/* TODO: tos should be configurable */
-
-	tos = ip_tunnel_ecn_encap(0, iip, skb);
+	tos = ip_tunnel_ecn_encap(tos, iip, skb);
 
 	ttl = geneve->ttl;
 	if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
@@ -300,6 +304,7 @@ static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
 	[IFLA_GENEVE_ID]		= { .type = NLA_U32 },
 	[IFLA_GENEVE_REMOTE]		= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
 	[IFLA_GENEVE_TTL]		= { .type = NLA_U8 },
+	[IFLA_GENEVE_TOS]		= { .type = NLA_U8 },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -370,6 +375,9 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
 	if (data[IFLA_GENEVE_TTL])
 		geneve->ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
 
+	if (data[IFLA_GENEVE_TOS])
+		geneve->tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
+
 	list_add(&geneve->next, &gn->geneve_list);
 	hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
 
@@ -393,6 +401,7 @@ static size_t geneve_get_size(const struct net_device *dev)
 	return nla_total_size(sizeof(__u32)) +	/* IFLA_GENEVE_ID */
 		nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_GENEVE_TTL */
+		nla_total_size(sizeof(__u8)) +	/* IFLA_GENEVE_TOS */
 		0;
 }
 
@@ -409,7 +418,8 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 			    geneve->remote.sin_addr.s_addr))
 		goto nla_put_failure;
 
-	if (nla_put_u8(skb, IFLA_GENEVE_TTL, geneve->ttl))
+	if (nla_put_u8(skb, IFLA_GENEVE_TTL, geneve->ttl) ||
+	    nla_put_u8(skb, IFLA_GENEVE_TOS, geneve->tos))
 		goto nla_put_failure;
 
 	return 0;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6edb8d268d58..ab90c196dde0 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -396,6 +396,7 @@ enum {
 	IFLA_GENEVE_ID,
 	IFLA_GENEVE_REMOTE,
 	IFLA_GENEVE_TTL,
+	IFLA_GENEVE_TOS,
 	__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)
-- 
2.1.0
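As I read the patch, the outer TOS starts from the configured value, where 1 means "inherit the inner packet's DS field", and the ECN bits are then set by ip_tunnel_ecn_encap(); the routing lookup uses RT_TOS() of the same value. A self-contained model of that arithmetic, assuming the ECN rule of the kernel's INET_ECN_encapsulate() (keep the outer DSCP bits, copy the inner ECN codepoint, map inner CE to ECT(0)) and the 0x1E mask used by RT_TOS(). geneve_outer_tos() and geneve_route_tos() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define ECN_MASK	0x03	/* low two bits of the TOS/DS byte */
#define ECN_ECT_0	0x02
#define ECN_CE		0x03
#define TOS_ROUTE_MASK	0x1E	/* same value as IPTOS_TOS_MASK, used by RT_TOS() */

/* Value 1 is the "inherit from the inner header" marker in the patch. */
#define GENEVE_TOS_INHERIT 1

/* Modelled on INET_ECN_encapsulate(): outer DSCP, inner ECN, CE -> ECT(0). */
static uint8_t ecn_encapsulate(uint8_t outer, uint8_t inner)
{
	outer &= (uint8_t)~ECN_MASK;
	outer |= ((inner & ECN_MASK) == ECN_CE) ? ECN_ECT_0 : (inner & ECN_MASK);
	return outer;
}

/* Pick the configured TOS, or the inner DS field when asked to inherit. */
static uint8_t base_tos(uint8_t configured, uint8_t inner_dsfield)
{
	return configured == GENEVE_TOS_INHERIT ? inner_dsfield : configured;
}

/* Hypothetical helper: TOS byte written into the outer IPv4 header. */
uint8_t geneve_outer_tos(uint8_t configured, uint8_t inner_dsfield)
{
	return ecn_encapsulate(base_tos(configured, inner_dsfield), inner_dsfield);
}

/* Hypothetical helper: TOS bits used as the routing key (flowi4_tos). */
uint8_t geneve_route_tos(uint8_t configured, uint8_t inner_dsfield)
{
	return base_tos(configured, inner_dsfield) & TOS_ROUTE_MASK;
}
```

For example, with tos set to 1 and an inner DS field of 0xB8 (DSCP EF, not ECN-capable), the outer header also carries 0xB8, while a fixed tos of 0x10 over a CE-marked inner packet yields 0x12.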
[PATCH 1/2] geneve: allow user to specify TTL for tunnel frames
Signed-off-by: John W. Linville <linvi...@tuxdriver.com>
---
I have the corresponding iproute2 patch ready, but I am holding it for now to avoid confusion on the list and such...

 drivers/net/geneve.c         | 18 ++++++++++++++----
 include/uapi/linux/if_link.h |  1 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index b7eafa4c1a67..1675dfdbfa70 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -44,7 +44,8 @@ struct geneve_dev {
 	struct net *net;	/* netns for packet i/o */
 	struct net_device *dev;	/* netdev for geneve tunnel */
 	struct geneve_sock *sock;	/* socket used for geneve tunnel */
-	u8 vni[3];	/* virtual network ID for tunnel */
+	u8 vni[3];		/* virtual network ID for tunnel */
+	u8 ttl;			/* TTL override */
 	struct sockaddr_in remote;	/* IPv4 address for link partner */
 	struct list_head next;	/* geneve's per namespace list */
 };
@@ -184,7 +185,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct flowi4 fl4;
 	int err;
 	__be16 sport;
-	__u8 tos, ttl = 0;
+	__u8 tos, ttl;
 
 	iip = ip_hdr(skb);
 
@@ -207,11 +208,12 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto rt_tx_error;
 	}
 
-	/* TODO: tos and ttl should be configurable */
+	/* TODO: tos should be configurable */
 
 	tos = ip_tunnel_ecn_encap(0, iip, skb);
 
-	if (IN_MULTICAST(ntohl(fl4.daddr)))
+	ttl = geneve->ttl;
+	if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
 		ttl = 1;
 
 	ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
@@ -297,6 +299,7 @@ static void geneve_setup(struct net_device *dev)
 static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
 	[IFLA_GENEVE_ID]		= { .type = NLA_U32 },
 	[IFLA_GENEVE_REMOTE]		= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
+	[IFLA_GENEVE_TTL]		= { .type = NLA_U8 },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -364,6 +367,9 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
 	if (err)
 		return err;
 
+	if (data[IFLA_GENEVE_TTL])
+		geneve->ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
+
 	list_add(&geneve->next, &gn->geneve_list);
 	hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
 
@@ -386,6 +392,7 @@ static size_t geneve_get_size(const struct net_device *dev)
 {
 	return nla_total_size(sizeof(__u32)) +	/* IFLA_GENEVE_ID */
 		nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE */
+		nla_total_size(sizeof(__u8)) +	/* IFLA_GENEVE_TTL */
 		0;
 }
 
@@ -402,6 +409,9 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 			    geneve->remote.sin_addr.s_addr))
 		goto nla_put_failure;
 
+	if (nla_put_u8(skb, IFLA_GENEVE_TTL, geneve->ttl))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 2ca17d1cff3f..6edb8d268d58 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -395,6 +395,7 @@ enum {
 	IFLA_GENEVE_UNSPEC,
 	IFLA_GENEVE_ID,
 	IFLA_GENEVE_REMOTE,
+	IFLA_GENEVE_TTL,
 	__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)
-- 
2.1.0
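After this patch, the TTL choice in geneve_xmit() comes down to three rules: a configured non-zero TTL wins; otherwise multicast destinations get 1; otherwise the route's hop limit is used. A userspace model of that decision, taking the destination in host byte order and the route hop limit as a plain number (in the kernel it comes from ip4_dst_hoplimit()). geneve_outer_ttl() and the local multicast macro are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Same test as IN_MULTICAST(): 224.0.0.0/4, address in host byte order. */
#define MODEL_IN_MULTICAST(a) (((a) & 0xf0000000u) == 0xe0000000u)

/*
 * Hypothetical helper mirroring the TTL selection in the patch:
 *   ttl = geneve->ttl;
 *   if (!ttl && IN_MULTICAST(ntohl(daddr))) ttl = 1;
 *   ttl = ttl ? : hoplimit;
 */
uint8_t geneve_outer_ttl(uint8_t configured, uint32_t daddr_host,
			 uint8_t route_hoplimit)
{
	uint8_t ttl = configured;

	if (!ttl && MODEL_IN_MULTICAST(daddr_host))
		ttl = 1;	/* default multicast scope: one hop */
	return ttl ? ttl : route_hoplimit;
}
```

Leaving the TTL attribute unset (0) therefore preserves the pre-patch behaviour, so existing configurations are unaffected.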
Re: [PATCH net-next v2] rocker: remove rocker parameter from functions that have rocker_port parameter
On Mon, Jun 01, 2015 at 01:25:04PM +0900, Simon Horman wrote:
> The rocker (switch) of a rocker_port may be trivially obtained from the latter; it seems cleaner not to pass the former to a function when the latter is being passed anyway.

Excellent idea and commonly used in many other hardware drivers.

> rocker_port_rx_proc() is omitted from this change as it is a hot path case.
>
> Signed-off-by: Simon Horman <simon.hor...@netronome.com>
> Acked-by: Scott Feldman <sfel...@gmail.com>

Acked-by: Andy Gospodarek <go...@cumulusnetworks.com>

> ---
> v2
> * Dropped RFC designation
> * Omit rocker_port_rx_proc() from this change as it is a hot path case, as suggested by Scott Feldman
> * Added Scott Feldman's Ack
> ---
>  drivers/net/ethernet/rocker/rocker.c | 115 ++-
>  1 file changed, 45 insertions(+), 70 deletions(-)
>
> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
> index 36f7edfc3c7a..d246647b3653 100644
> --- a/drivers/net/ethernet/rocker/rocker.c
> +++ b/drivers/net/ethernet/rocker/rocker.c
> @@ -1172,11 +1172,11 @@ static void rocker_dma_rings_fini(struct rocker *rocker)
>  	rocker_dma_ring_destroy(rocker, &rocker->cmd_ring);
>  }
>  
> -static int rocker_dma_rx_ring_skb_map(const struct rocker *rocker,
> -				      const struct rocker_port *rocker_port,
> +static int rocker_dma_rx_ring_skb_map(const struct rocker_port *rocker_port,
>  				      struct rocker_desc_info *desc_info,
>  				      struct sk_buff *skb, size_t buf_len)
>  {
> +	const struct rocker *rocker = rocker_port->rocker;
>  	struct pci_dev *pdev = rocker->pdev;
>  	dma_addr_t dma_handle;
>  
> @@ -1201,8 +1201,7 @@ static size_t rocker_port_rx_buf_len(const struct rocker_port *rocker_port)
>  	return rocker_port->dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>  }
>  
> -static int rocker_dma_rx_ring_skb_alloc(const struct rocker *rocker,
> -					const struct rocker_port *rocker_port,
> +static int rocker_dma_rx_ring_skb_alloc(const struct rocker_port *rocker_port,
>  					struct rocker_desc_info *desc_info)
>  {
>  	struct net_device *dev = rocker_port->dev;
> @@ -1219,8 +1218,7 @@ static int rocker_dma_rx_ring_skb_alloc(const struct rocker *rocker,
>  	skb = netdev_alloc_skb_ip_align(dev, buf_len);
>  	if (!skb)
>  		return -ENOMEM;
> -	err = rocker_dma_rx_ring_skb_map(rocker, rocker_port, desc_info,
> -					 skb, buf_len);
> +	err = rocker_dma_rx_ring_skb_map(rocker_port, desc_info, skb, buf_len);
>  	if (err) {
>  		dev_kfree_skb_any(skb);
>  		return err;
> @@ -1257,15 +1255,15 @@ static void rocker_dma_rx_ring_skb_free(const struct rocker *rocker,
>  	dev_kfree_skb_any(skb);
>  }
>  
> -static int rocker_dma_rx_ring_skbs_alloc(const struct rocker *rocker,
> -					 const struct rocker_port *rocker_port)
> +static int rocker_dma_rx_ring_skbs_alloc(const struct rocker_port *rocker_port)
>  {
>  	const struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
> +	const struct rocker *rocker = rocker_port->rocker;
>  	int i;
>  	int err;
>  
>  	for (i = 0; i < rx_ring->size; i++) {
> -		err = rocker_dma_rx_ring_skb_alloc(rocker, rocker_port,
> +		err = rocker_dma_rx_ring_skb_alloc(rocker_port,
>  						   &rx_ring->desc_info[i]);
>  		if (err)
>  			goto rollback;
> @@ -1278,10 +1276,10 @@ rollback:
>  	return err;
>  }
>  
> -static void rocker_dma_rx_ring_skbs_free(const struct rocker *rocker,
> -					 const struct rocker_port *rocker_port)
> +static void rocker_dma_rx_ring_skbs_free(const struct rocker_port *rocker_port)
>  {
>  	const struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
> +	const struct rocker *rocker = rocker_port->rocker;
>  	int i;
>  
>  	for (i = 0; i < rx_ring->size; i++)
> @@ -1327,7 +1325,7 @@ static int rocker_port_dma_rings_init(struct rocker_port *rocker_port)
>  		goto err_dma_rx_ring_bufs_alloc;
>  	}
>  
> -	err = rocker_dma_rx_ring_skbs_alloc(rocker, rocker_port);
> +	err = rocker_dma_rx_ring_skbs_alloc(rocker_port);
>  	if (err) {
>  		netdev_err(rocker_port->dev, "failed to alloc rx dma ring skbs\n");
>  		goto err_dma_rx_ring_skbs_alloc;
> @@ -1353,7 +1351,7 @@ static void rocker_port_dma_rings_fini(struct rocker_port *rocker_port)
>  {
>  	struct rocker *rocker = rocker_port->rocker;
>  
> -	rocker_dma_rx_ring_skbs_free(rocker, rocker_port);
> +	rocker_dma_rx_ring_skbs_free(rocker_port);
>  	rocker_dma_ring_bufs_free(rocker, &rocker_port->rx_ring,
>  				  PCI_DMA_BIDIRECTIONAL);
>  	rocker_dma_ring_destroy(rocker, &rocker_port->rx_ring);
> @@ -1588,22 +1586,20 @@ static
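The refactoring pattern is to derive the parent object from the child's back-pointer instead of passing both, which also removes any chance of a caller passing a mismatched (parent, child) pair; the patch keeps the explicit argument only for rocker_port_rx_proc() on the hot path. A toy illustration with hypothetical types standing in for struct rocker and struct rocker_port (the "key" computed here is arbitrary):

```c
#include <assert.h>

/* Hypothetical parent/child pair standing in for rocker / rocker_port. */
struct parent_dev {
	int pci_id;
};

struct child_port {
	struct parent_dev *parent;	/* back-pointer to the owning device */
	int port_number;
};

/* Before: parent passed alongside the child; callers must keep them consistent. */
int port_key_old(const struct parent_dev *parent, const struct child_port *port)
{
	return parent->pci_id * 100 + port->port_number;
}

/* After: parent derived from the child, as the patch does with rocker_port->rocker. */
int port_key(const struct child_port *port)
{
	const struct parent_dev *parent = port->parent;

	return parent->pci_id * 100 + port->port_number;
}
```

The cost of the new form is a single pointer load per call, which is why only the hot receive path was left alone.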
Re: [PATCH net-next v2 01/14] sfc: Add code to export port_num in netdev->dev_port
From: Shradha Shah <ss...@solarflare.com>
Date: Mon, 1 Jun 2015 14:00:12 +0100

> In the case where we have multiple functions (PFs and VFs), this sysfs entry is useful to identify the physical port corresponding to the function we are interested in.
>
> Signed-off-by: Shradha Shah <ss...@solarflare.com>

This is a low effort change.

You retained all of the error handling changes that were only necessary when you added the new sysfs file, but are completely unnecessary if you're just reporting it via netdev->dev_port.

This is extremely disappointing, because you expect me to put a good effort into reviewing your changes yet you aren't putting that level of effort into the submission itself.
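Once the port number is reported through netdev->dev_port, userspace can read it from the generic dev_port attribute in sysfs (/sys/class/net/<ifname>/dev_port, printed as a decimal number), with no driver-specific sysfs file needed. A minimal sketch; read_dev_port() and parse_dev_port() are hypothetical helper names:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the decimal text sysfs prints for dev_port (e.g. "1\n"); -1 on bad input. */
long parse_dev_port(const char *buf)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno != 0 || end == buf || val < 0)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;
	return val;
}

/* Read /sys/class/net/<ifname>/dev_port; returns -1 if it cannot be read. */
long read_dev_port(const char *ifname)
{
	char path[256];
	char buf[32];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "/sys/class/net/%s/dev_port", ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_dev_port(buf);
}
```

Comparing read_dev_port() across a PF and its VFs would then show which physical port each function sits on, assuming the driver fills in dev_port as the patch intends.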