date:20150601

pull-request: wireless-drivers 2015-06-01

2015-06-01 Thread Kalle Valo

Hi Dave,

here are three more important fixes I'm hoping to get to 4.1 still, I
hope I'm not too late with these. Please let me know if there are any
problems.

Kalle

The following changes since commit aefa441b150279dd8d25658e018898a3fe9a6769:

  Merge tag 'iwlwifi-for-kalle-2015-05-21' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes 
(2015-05-22 10:47:02 +0300)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git 
tags/wireless-drivers-for-davem-2015-06-01

for you to fetch changes up to 38fe44e61a894f1c7b3e60b0614030271070ea53:

  Merge tag 'iwlwifi-for-kalle-2015-05-28' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes 
(2015-05-28 16:28:03 +0300)



iwlwifi:

* fix OTP parsing for 8260
* fix powersave handling for 8260

brcmfmac:

* fix null pointer crash


Arend van Spriel (1):
  brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails

Ilan Peer (1):
  iwlwifi: pcie: fix tracking of cmd_in_flight

Kalle Valo (1):
  Merge tag 'iwlwifi-for-kalle-2015-05-28' of 
https://git.kernel.org/.../iwlwifi/iwlwifi-fixes

Liad Kaufman (1):
  iwlwifi: nvm: fix otp parsing in 8000 hw family

 drivers/net/wireless/brcm80211/brcmfmac/msgbuf.c |   12 +--
 drivers/net/wireless/iwlwifi/iwl-nvm-parse.c |2 +-
 drivers/net/wireless/iwlwifi/pcie/internal.h |6 +++---
 drivers/net/wireless/iwlwifi/pcie/trans.c|4 ++--
 drivers/net/wireless/iwlwifi/pcie/tx.c   |   23 +-
 5 files changed, 20 insertions(+), 27 deletions(-)

-- 
Kalle Valo
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [cdc_ncm] guidance and help refactoring cdc_ncm

2015-06-01 Thread Enrico Mioso

Hello Greg, hello everyone reading.
I am sorry if I wasn't specific enough - I'll be this time! :) Or at least
I'll try.

On Mon, 1 Jun 2015, Greg KH wrote:

==Date: Mon, 1 Jun 2015 02:59:17
==From: Greg KH g...@kroah.com
==To: Enrico Mioso mrkiko...@gmail.com
==Cc: linux-...@vger.kernel.org, netdev@vger.kernel.org,
==Oliver Neukum oli...@neukum.org
==Subject: Re: [cdc_ncm] guidance and help refactoring cdc_ncm
==
==On Sun, May 31, 2015 at 04:37:11PM +0200, Enrico Mioso wrote:
== Hello guys.
== I am writing to you all to ask for help and assistance in refactoring the
== cdc_ncm driver to support newer devices.
== In particular - I would need step-by-step guidance in doing this: or any
== other kind of help would be anyway greatly appreciated.
== 
== 1 - What we need:
==   We would need to refactor the driver to be able to re-order parts of the
==   NCM package itself.
==   In particular, since a single NCM frame is composed of different parts,
==   we would need more flexibility in changing their order.
==
==Do you have hardware that needs this now?  What exactly needs to be done
==here that currently doesn't work?

Yes - there is hardware that currently doesn't work with the existing code.
In particular, I am referring to Huawei 3G / 4G modems:
- Huawei E3131: will not work with some firmware versions, works with other /
  older ones
- Huawei E3372 (LTE modem): will not work.

I received various mail messages from people trying to configure different
devices that aren't working: and the situation is partly confusing since
sometimes devices with very similar product names are pretty different, or
derive from different hardware branches.

Regarding what needs to be done: it's important to note that those devices
follow a USB specification.
The Network Control Model spec, as found here:
http://www.usb.org/developers/docs/devclass_docs/NCM10_012011.zip
aims to provide more efficient networking over USB, for example by batching
frames.
The fundamental packet unit here is the NCM package, which can hold several
Ethernet-style frames. The device and the host will negotiate some frame
characteristics once, as appropriate.

The specification doesn't mandate where some parts of the package should be
placed: in fact, they can be put in somewhat arbitrary places.

This is true for the NDP part: as I understand it, we are currently putting it
near the beginning of the frame.
Some Huawei devices started to mandate that it be at the end of the frame,
and ignore the frame otherwise.

==
== 2 - What might be nice
==   To do so, it would be nice to have the driver queue up frames, sending
==   them out as needed. This already happens to a certain extent, but as I
==   understood the code, the NCM package is created in the process and
==   updated in the meantime.
==   The best thing would be to have the NCM package created only just before
==   sending it out, to achieve the best performance and code readability.
==
==Would this really make things faster?

Probably it would, depending on the setup we are considering. In a
standard setup where those devices are connected to a laptop or a PC with
relatively plentiful resources, this would not make much difference.
But it's not so unusual to see these devices coupled with small
devices: and here this could make a difference in some cases.
But this would not be my main goal: getting things working faster is good, but
I would just like to see them working now: and so I am trying to gather help /
information / guidance / code in general, in case I am able to try this in
the future.

==
== I already contacted some of you privately to get more insight into what
== needs to be done, and to understand better how to organize the effort. I
== unfortunately lack the time to do this right now: and in fact I can't even be
== sure to be able to do this, due to various problems (my thesis, my life in
== general).
== But gathering more information and in general trying to get some help is
== the best thing I feel like doing right now.
== 
== The compelling reasons I find for trying to fix the situation are:
== 1 - The fact these drivers are used in different products integrating or
==   interfacing with 3G/4G technologies.
==
==Is there hardware that has out-of-tree drivers that implement what you
==are referring to here?  Or does someone just want this to make the
==hardware work better?
==
==I think we need more specifics before being able to determine exactly
==what needs to be done.
==
==thanks,
==
==greg k-h
==
Thank you Greg for your precious help.

Once again - some devices will just not work.

There is an out-of-tree vendor driver implementing what I am referring to: it
contains code to work with many different devices from Huawei, but only the
NCM-related parts would be of use in this scenario. Other devices are already
supported and working in the kernel.
The driver can be found here:

Re: 2.6.32.66 tcp regression OOPs

2015-06-01 Thread Frans Klaver

[cc: Willy Tarreau]

On Mon, Jun 1, 2015 at 3:26 AM,  starlight.201...@binnacle.cx wrote:
 Hello,

 Apologies if I have submitted to the wrong lists.

 Encountered a regression in
 2.6.32.66 relative to 2.6.32.65.

 Crash eight minutes after boot.

 Will respond with additional details
 if the OOPS is not sufficient.

 Best Regards


Did you bisect it?

Re: pull-request: mac80211-next 2015-05-29

2015-06-01 Thread Johannes Berg

On Sun, 2015-05-31 at 17:35 -0700, David Miller wrote:
 Pulled, but there was a small conflict in include/net/mac80211.h which seemed
 trivially resolvable.  Take a look and send me a fixup if I got it wrong,
 thanks.

Looks good - it seems I managed to apply two different docbook fixes to
the two different trees ... Talk about being confused :)

johannes



Re: [cdc_ncm] guidance and help refactoring cdc_ncm

2015-06-01 Thread Enrico Mioso

Thank you Oliver, thank you all for reading this thread and for your attention.
For a more detailed discussion of how we got here, you might search for the
threads:
Is this 32-bit NCM?
and
Is this 32-bit NCM?y (follow up).
where the letter y comes from a mistake of mine.

The specification only mandates the position of the NTH (header). The rest
can in general be in any order and position. This will work with most devices:
except, of course, those from Huawei.
Our aggregate looks something like this from my perspective (anyone, please
correct me if I am wrong):
NTH: header
NDP: contains indexing information
ethernet packet 1
ethernet packet 2
...
ethernet packet n;

While it should look like:
NTH: header
ethernet packet 1
ethernet packet 2
...
ethernet packet n;
NDP: contains indexing information

But introducing such a change might break other devices that are now working.
In fact, there are clearly multiple vendors producing NCM devices, as you might
also see by looking at the driver's comments.
So in general, we should be able to dynamically change the way in which the
driver orders things in the package,
and that's why I initially proposed the redesign.

Thank you guys, for the patience and time.
Enrico
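
To make the two layouts above concrete, here is a minimal user-space sketch
that only computes where each part of the aggregate would land. It is not
cdc_ncm code. The names (ntb_plan, plan_layout, ndp_at_end) are hypothetical,
the sizes (12-byte NTH16, NDP16 of 8 bytes plus 4 bytes per entry including a
terminating zero entry) are my reading of the NCM 1.0 spec and should be
treated as assumptions, and alignment and padding are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NTH16_LEN       12u  /* assumed size of the NTH16 header */
#define NDP16_HDR_LEN    8u  /* assumed: signature, length, next index */
#define NDP16_ENTRY_LEN  4u  /* assumed: one (index, length) pair */

struct ntb_plan {
    uint16_t ndp_index;      /* offset of the NDP (what the NTH points to) */
    uint16_t first_datagram; /* offset of the first Ethernet frame */
    uint16_t block_length;   /* total length of the aggregate */
};

/* Size of an NDP16 describing n frames plus the terminating zero entry. */
static uint16_t ndp16_len(unsigned int n)
{
    return (uint16_t)(NDP16_HDR_LEN + NDP16_ENTRY_LEN * (n + 1));
}

/* Compute offsets for n frames, with the NDP either right after the
 * NTH (ndp_at_end == 0) or behind the last frame (ndp_at_end != 0).
 */
static struct ntb_plan plan_layout(const uint16_t *frame_len, unsigned int n,
                                   int ndp_at_end)
{
    struct ntb_plan p;
    uint32_t payload = 0;
    unsigned int i;

    assert(frame_len != NULL || n == 0);
    for (i = 0; i < n; i++)
        payload += frame_len[i];
    /* This sketch only handles aggregates that fit 16-bit offsets. */
    assert(NTH16_LEN + ndp16_len(n) + payload <= UINT16_MAX);

    if (ndp_at_end) {
        p.first_datagram = (uint16_t)NTH16_LEN;
        p.ndp_index = (uint16_t)(NTH16_LEN + payload);
    } else {
        p.ndp_index = (uint16_t)NTH16_LEN;
        p.first_datagram = (uint16_t)(NTH16_LEN + ndp16_len(n));
    }
    p.block_length = (uint16_t)(NTH16_LEN + ndp16_len(n) + payload);
    return p;
}
```

The total length is the same in both cases; only the offset stored in the
header and the position of the frames change, which is why a per-device
switch seems feasible in principle.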

[PATCH v4 21/25] time/posix-clock:Convert to the 64bit methods for k_clock and posix_clock_operations structure

2015-06-01 Thread Baolin Wang

This patch converts the posix clock operations over to the new methods
with the timespec64/itimerspec64 types to make them ready for 2038. It
is based on the ptp patch series.

It also changes the k_clock structure to the 64bit methods, converting
the timespec/itimerspec types to timespec64/itimerspec64.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 drivers/ptp/ptp_clock.c |   26 --
 include/linux/posix-clock.h |   10 +-
 kernel/time/posix-clock.c   |   20 ++--
 3 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index bee8270..8c086e7 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,32 +97,24 @@ static s32 scaled_ppm_to_ppb(long ppm)
 
 /* posix clock implementation */
 
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
 {
    tp->tv_sec = 0;
    tp->tv_nsec = 1;
    return 0;
 }
 
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc,
+                             const struct timespec64 *tp)
 {
    struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-   struct timespec64 ts = timespec_to_timespec64(*tp);
-
-   return ptp->info->settime64(ptp->info, &ts);
+   return ptp->info->settime64(ptp->info, tp);
 }
 
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
 {
    struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-   struct timespec64 ts;
-   int err;
-
-   err = ptp->info->gettime64(ptp->info, &ts);
-   if (!err)
-       *tp = timespec64_to_timespec(ts);
-
-   return err;
+   return ptp->info->gettime64(ptp->info, tp);
 }
 
 static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
@@ -134,8 +126,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
    ops = ptp->info;
 
    if (tx->modes & ADJ_SETOFFSET) {
-       struct timespec ts;
-       ktime_t kt;
+       struct timespec64 ts;
        s64 delta;
 
        ts.tv_sec  = tx->time.tv_sec;
@@ -147,8 +138,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
        if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
            return -EINVAL;
 
-       kt = timespec_to_ktime(ts);
-       delta = ktime_to_ns(kt);
+       delta = timespec64_to_ns(&ts);
        err = ops->adjtime(ops, delta);
    } else if (tx->modes & ADJ_FREQUENCY) {
        s32 ppb = scaled_ppm_to_ppb(tx->freq);
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index 34c4498..fd7e22c 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
 
int  (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
 
-   int  (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
 
-   int  (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_getres)(struct posix_clock *pc, struct timespec64 *ts);
 
int  (*clock_settime)(struct posix_clock *pc,
- const struct timespec *ts);
+ const struct timespec64 *ts);
 
int  (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
 
int  (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
 
void (*timer_gettime)(struct posix_clock *pc,
- struct k_itimer *kit, struct itimerspec *tsp);
+ struct k_itimer *kit, struct itimerspec64 *tsp);
 
int  (*timer_settime)(struct posix_clock *pc,
  struct k_itimer *kit, int flags,
- struct itimerspec *tsp, struct itimerspec *old);
+ struct itimerspec64 *tsp, struct itimerspec64 *old);
/*
 * Optional character device methods:
 */
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ce033c7..e21e4c1 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -297,7 +297,7 @@ out:
return err;
 }
 
-static int pc_clock_gettime(clockid_t id, struct timespec *ts)
+static int pc_clock_gettime(clockid_t id, struct timespec64 *ts)
 {
struct posix_clock_desc cd;
int err;
@@ -316,7 +316,7 @@ static int pc_clock_gettime(clockid_t id, struct timespec *ts)
return err;
 }
 
-static int pc_clock_getres(clockid_t id, struct timespec *ts)
+static int pc_clock_getres(clockid_t id, struct timespec64 *ts)
 {
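
As background for the 2038 motivation in the patch above, the following is a
small user-space sketch, not kernel code. The struct ts64 type and the helper
names are hypothetical stand-ins that mirror the idea of a timespec with a
64-bit seconds field; the conversion does not guard against overflow for very
large second counts.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for a timespec with 64-bit seconds. */
struct ts64 {
    int64_t tv_sec;
    long    tv_nsec;
};

/* Convert to nanoseconds; no overflow handling in this sketch. */
static int64_t ts64_to_ns(const struct ts64 *ts)
{
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < DEMO_NSEC_PER_SEC);
    return ts->tv_sec * DEMO_NSEC_PER_SEC + ts->tv_nsec;
}

/* Would this value survive being stored in a signed 32-bit seconds field? */
static int fits_in_32bit_seconds(const struct ts64 *ts)
{
    return ts->tv_sec >= INT32_MIN && ts->tv_sec <= INT32_MAX;
}
```

A signed 32-bit seconds counter runs out at 2147483647 seconds after the
epoch, which falls in January 2038; one second later only the 64-bit
representation still holds the value.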

Re: [PATCH v2] xen: netback: fix printf format string warning

2015-06-01 Thread Wei Liu

On Mon, Jun 01, 2015 at 11:30:04AM +0100, Ian Campbell wrote:
 drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
 drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects 
 argument of type ‘long unsigned int’, but argument 5 has type ‘int’ 
 [-Wformat=]
 (txreq.offset&~PAGE_MASK) + txreq.size);
 ^
 
 PAGE_MASK's type can vary by arch, so a cast is needed.
 
 Signed-off-by: Ian Campbell ian.campb...@citrix.com

Acked-by: Wei Liu wei.l...@citrix.com

 
 v2: Cast to unsigned long, since PAGE_MASK can vary by arch.
 ---
  drivers/net/xen-netback/netback.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/net/xen-netback/netback.c 
 b/drivers/net/xen-netback/netback.c
 index 4de46aa..0d25943 100644
 --- a/drivers/net/xen-netback/netback.c
 +++ b/drivers/net/xen-netback/netback.c
 @@ -1250,7 +1250,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
     netdev_err(queue->vif->dev,
                "txreq.offset: %x, size: %u, end: %lu\n",
                txreq.offset, txreq.size,
 -              (txreq.offset&~PAGE_MASK) + txreq.size);
 +              (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
     xenvif_fatal_tx_err(queue->vif);
     break;
     }
 -- 
 1.7.10.4
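
The fix quoted above can be tried outside the kernel with a small sketch. It
is a user-space illustration only: DEMO_PAGE_MASK and format_end() are
hypothetical names, and the mask is given an unsigned int type here to stand
in for an architecture where the expression is not already unsigned long.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mask for 4 KiB pages; its type is unsigned int in this
 * sketch, but the real PAGE_MASK type differs between architectures.
 */
#define DEMO_PAGE_MASK (~0xfffU)

/* Format the end offset with %lu. The cast makes the argument unsigned
 * long whatever the type of the mask expression is.
 */
static int format_end(char *out, size_t len, unsigned int offset,
                      unsigned int size)
{
    assert(out != NULL && len > 0);
    return snprintf(out, len, "end: %lu",
                    (unsigned long)(offset & ~DEMO_PAGE_MASK) + size);
}
```

Without the cast, the argument type follows the mask's type, so the same
format string is correct on some architectures and triggers -Wformat on
others.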

Re: [cdc_ncm] guidance and help refactoring cdc_ncm

2015-06-01 Thread Oliver Neukum

On Mon, 2015-06-01 at 13:41 +0200, Enrico Mioso wrote:
 Thank you Oliver, thank you all for reading this thread and the attention.
 For a more detailed discussion and how we got here, you might google for the 
 thread:
 Is this 32-bit NCM?
 and
 Is this 32-bit NCM?y (follow up).
 Where the y letter comes from a mistake of mine.

Having read them it looks like the issues of padding and
sequence numbers are open.

 The specification does only mandate the position of the NTH (header). The
 rest can be in any order and position in general. This will work with most
 devices: except, of course, those from Huawei.

Indeed. And a redesign for crap devices looks like
a bad idea.

 Our aggregate looks something like this from my perspective (anyone correct
 me in case):
 NTH: header
 NDP: contains indexing informations
 ethernet packet 1
 ethernet packet 2
 ...
 ethernet packet n;
 
 While it should look like:
 NTH: header
 ethernet packet 1
 ethernet packet 2
 ...
 ethernet packet n;
 NDP: contains indexing informations
 
 but, when introducing such a change: you might break other devices now
 working.
 In fact, clearly there are multiple vendors producing NCM devices, as you
 might also see by looking at the driver's comments.
 So in general, we should be able to dynamically change the way in which the
 driver orders things in the package.
 and that's why I initially proposed the redesign.

OK, so the NDP needs to be at the end. However in the old thread
you state that this requires the NDP to be built between the
final aggregate and physically transmitting. I think this is a false
choice. You could just as well copy the NDP around provided you reserve
enough space at the end of the skb.

Regards
Oliver
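
The idea of reserving room at the end and copying the NDP there can be
sketched in user space as follows. This is only an illustration under stated
assumptions: struct agg, agg_add() and agg_finish() are hypothetical names, a
plain byte buffer stands in for the skb, and the NDP is assumed to be built
in a separate scratch area while frames are queued.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct agg {
    uint8_t *buf;        /* aggregate buffer (stands in for the skb) */
    size_t cap;          /* total capacity of buf */
    size_t used;         /* bytes taken by header and frames so far */
    size_t ndp_reserve;  /* bytes kept free at the tail for the NDP */
};

/* Append one frame unless it would eat into the reserved tail. */
static int agg_add(struct agg *a, const void *frame, size_t len)
{
    assert(a->cap >= a->ndp_reserve);
    if (a->used > a->cap - a->ndp_reserve ||
        len > a->cap - a->ndp_reserve - a->used)
        return -1;
    memcpy(a->buf + a->used, frame, len);
    a->used += len;
    return 0;
}

/* Copy the finished NDP behind the last frame. Returns the offset the
 * header would have to point to, or -1 if it does not fit.
 */
static long agg_finish(struct agg *a, const void *ndp, size_t ndp_len)
{
    size_t off = a->used;

    if (ndp_len > a->ndp_reserve || a->used > a->cap - a->ndp_reserve)
        return -1;
    memcpy(a->buf + off, ndp, ndp_len);
    a->used += ndp_len;
    return (long)off;
}
```

The frames never move; only the comparatively small NDP is copied once,
right before transmission.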




[PATCH net] bnx2x: Move statistics implementation into semaphores

2015-06-01 Thread Yuval Mintz

Commit dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the
bnx2x locking around statistics state into using a mutex - but the lock
is being accessed from a timer, where taking a mutex is forbidden.

[If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about
accessing the mutex in interrupt context]

This moves the implementation into using a semaphore [with a count of '1']
instead.

Signed-off-by: Yuval Mintz yuval.mi...@qlogic.com
Signed-off-by: Ariel Elior ariel.el...@qlogic.com
---
Hi Dave,

Please consider applying this to `net'.

Thanks,
Yuval
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h   |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  9 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c | 20 ++--
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index a3b0f7a..1f82a04 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1774,7 +1774,7 @@ struct bnx2x {
int stats_state;
 
/* used for synchronization of concurrent threads statistics handling */
-   struct mutex        stats_lock;
+   struct semaphore    stats_lock;
 
/* used by dmae command loader */
struct dmae_command stats_dmae;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index fd52ce9..33501bc 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12054,7 +12054,7 @@ static int bnx2x_init_bp(struct bnx2x *bp)
    mutex_init(&bp->port.phy_mutex);
    mutex_init(&bp->fw_mb_mutex);
    mutex_init(&bp->drv_info_mutex);
-   mutex_init(&bp->stats_lock);
+   sema_init(&bp->stats_lock, 1);
    bp->drv_info_mng_owner = false;
 
    INIT_DELAYED_WORK(&bp->sp_task, bnx2x_sp_task);
@@ -13690,9 +13690,10 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp)
    cancel_delayed_work_sync(&bp->sp_task);
    cancel_delayed_work_sync(&bp->period_task);
 
-   mutex_lock(&bp->stats_lock);
-   bp->stats_state = STATS_STATE_DISABLED;
-   mutex_unlock(&bp->stats_lock);
+   if (!down_timeout(&bp->stats_lock, HZ / 10)) {
+       bp->stats_state = STATS_STATE_DISABLED;
+       up(&bp->stats_lock);
+   }
 
bnx2x_save_statistics(bp);
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
index 266b055..69d699f0 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
@@ -1372,19 +1372,23 @@ void bnx2x_stats_handle(struct bnx2x *bp, enum bnx2x_stats_event event)
     * that context in case someone is in the middle of a transition.
     * For other events, wait a bit until lock is taken.
     */
-   if (!mutex_trylock(&bp->stats_lock)) {
+   if (down_trylock(&bp->stats_lock)) {
        if (event == STATS_EVENT_UPDATE)
            return;
 
        DP(BNX2X_MSG_STATS,
           "Unlikely stats' lock contention [event %d]\n", event);
-       mutex_lock(&bp->stats_lock);
+       if (unlikely(down_timeout(&bp->stats_lock, HZ / 10))) {
+           BNX2X_ERR("Failed to take stats lock [event %d]\n",
+                     event);
+           return;
+       }
    }
 
    bnx2x_stats_stm[state][event].action(bp);
    bp->stats_state = bnx2x_stats_stm[state][event].next_state;
 
-   mutex_unlock(&bp->stats_lock);
+   up(&bp->stats_lock);
 
    if ((event != STATS_EVENT_UPDATE) || netif_msg_timer(bp))
        DP(BNX2X_MSG_STATS, "state %d -> event %d -> state %d\n",
@@ -1970,7 +1974,11 @@ int bnx2x_stats_safe_exec(struct bnx2x *bp,
    /* Wait for statistics to end [while blocking further requests],
     * then run supplied function 'safely'.
     */
-   mutex_lock(&bp->stats_lock);
+   rc = down_timeout(&bp->stats_lock, HZ / 10);
+   if (unlikely(rc)) {
+       BNX2X_ERR("Failed to take statistics lock for safe execution\n");
+       goto out_no_lock;
+   }
 
    bnx2x_stats_comp(bp);
    while (bp->stats_pending && cnt--)
@@ -1988,7 +1996,7 @@ out:
    /* No need to restart statistics - if they're enabled, the timer
     * will restart the statistics.
     */
-   mutex_unlock(&bp->stats_lock);
-
+   up(&bp->stats_lock);
+out_no_lock:
    return rc;
 }
-- 
1.8.3.1
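
The locking pattern in the patch above (try the lock, skip periodic updates
on contention, otherwise wait with a bound) can be tried in user space with
POSIX semaphores. This is an analogue only, not the driver code: struct
stats, stats_init(), stats_handle() and the event names are hypothetical,
sem_trywait() and sem_timedwait() stand in for down_trylock() and
down_timeout(), and the 100 ms bound is an assumed equivalent of HZ / 10.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <semaphore.h>
#include <time.h>

enum stats_event { EV_UPDATE, EV_STOP };  /* hypothetical events */

struct stats {
    sem_t lock;   /* semaphore initialised to a count of 1 */
    int state;
};

static void stats_init(struct stats *s)
{
    int rc = sem_init(&s->lock, 0, 1);

    assert(rc == 0);
    (void)rc;
    s->state = 0;
}

/* Returns 0 if the event was handled, 1 if a periodic update was
 * skipped because the lock was busy, -1 if waiting timed out.
 */
static int stats_handle(struct stats *s, enum stats_event ev, int next_state)
{
    if (sem_trywait(&s->lock) != 0) {
        struct timespec deadline;

        if (ev == EV_UPDATE)
            return 1;   /* the next periodic update will retry */

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100L * 1000L * 1000L;   /* about 100 ms */
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        if (sem_timedwait(&s->lock, &deadline) != 0)
            return -1;
    }

    s->state = next_state;
    sem_post(&s->lock);
    return 0;
}
```

Only the non-blocking attempt is what the periodic path relies on; the
bounded wait is reserved for the other events, mirroring the structure of
bnx2x_stats_handle() in the diff.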


[PATCH net-next v2 14/14] sfc: leak vports if a VF is assigned during PF unload

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

If any VF is assigned as the PF is unloaded, do not attempt to
remove its vport or the vswitch.  These will be removed if the
driver binds to the PF again, as an entity reset occurs during
probe.

A 'force' flag is added to efx_ef10_pci_sriov_disable() to
distinguish between disabling SR-IOV and driver unload.
SR-IOV cannot be disabled if VFs are assigned to guests.

If the PF driver is unloaded while VFs are assigned, the driver
may try to bind to the VF again at a later point if the driver
has been reloaded and the VF returns to the same domain as the PF.
In this case, the PF will not have a VF data structure, so the VF
can check this and drop out of probe early.

In this case, efx->vf_count will be zero but VFs will be present.
The user is advised to remove the VF and re-create it. The check
at the beginning of efx_ef10_pci_sriov_disable() that
efx->vf_count is non-zero is removed to allow SR-IOV to be
disabled in this case. Also, if the PF driver is unloaded, it
will disable SR-IOV to remove these unknown VFs.

By not disabling bus-mastering if VFs are still assigned, the VF
will continue to pass traffic after the PF has been removed.

When using the max_vfs module parameter, if VFs are already
present do not try to initialise any more.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c   | 20 
 drivers/net/ethernet/sfc/ef10_sriov.c | 35 ---
 drivers/net/ethernet/sfc/ef10_sriov.h |  2 ++
 drivers/net/ethernet/sfc/efx.c|  4 +++-
 4 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index e73d7b5..142fa23 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -680,6 +680,24 @@ static int efx_ef10_probe_pf(struct efx_nic *efx)
 static int efx_ef10_probe_vf(struct efx_nic *efx)
 {
int rc;
+   struct pci_dev *pci_dev_pf;
+
+   /* If the parent PF has no VF data structure, it doesn't know about this
+* VF so fail probe.  The VF needs to be re-created.  This can happen
+* if the PF driver is unloaded while the VF is assigned to a guest.
+*/
+   pci_dev_pf = efx->pci_dev->physfn;
+   if (pci_dev_pf) {
+       struct efx_nic *efx_pf = pci_get_drvdata(pci_dev_pf);
+       struct efx_ef10_nic_data *nic_data_pf = efx_pf->nic_data;
+
+       if (!nic_data_pf->vf) {
+           netif_info(efx, drv, efx->net_dev,
+                      "The VF cannot link to its parent PF; "
+                      "please destroy and re-create the VF\n");
+           return -EBUSY;
+       }
+   }
 
rc = efx_ef10_probe(efx);
if (rc)
@@ -697,6 +715,8 @@ static int efx_ef10_probe_vf(struct efx_nic *efx)
        struct efx_ef10_nic_data *nic_data = efx->nic_data;
 
        nic_data_p->vf[nic_data->vf_index].efx = efx;
+       nic_data_p->vf[nic_data->vf_index].pci_dev =
+           efx->pci_dev;
    } else
        netif_info(efx, drv, efx->net_dev,
                   "Could not get the PF id from VF\n");
diff --git a/drivers/net/ethernet/sfc/ef10_sriov.c 
b/drivers/net/ethernet/sfc/ef10_sriov.c
index 41ab18d..6c9b6e4 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.c
+++ b/drivers/net/ethernet/sfc/ef10_sriov.c
@@ -165,6 +165,11 @@ static void efx_ef10_sriov_free_vf_vports(struct efx_nic *efx)
    for (i = 0; i < efx->vf_count; i++) {
        struct ef10_vf *vf = nic_data->vf + i;
 
+       /* If VF is assigned, do not free the vport  */
+       if (vf->pci_dev &&
+           vf->pci_dev->dev_flags & PCI_DEV_FLAGS_ASSIGNED)
+           continue;
+
        if (vf->vport_assigned) {
            efx_ef10_evb_port_assign(efx, EVB_PORT_ID_NULL, i);
            vf->vport_assigned = 0;
@@ -380,7 +385,9 @@ void efx_ef10_vswitching_remove_pf(struct efx_nic *efx)
    efx_ef10_vport_free(efx, nic_data->vport_id);
    nic_data->vport_id = EVB_PORT_ID_ASSIGNED;
 
-   efx_ef10_vswitch_free(efx, nic_data->vport_id);
+   /* Only free the vswitch if no VFs are assigned */
+   if (!pci_vfs_assigned(efx->pci_dev))
+       efx_ef10_vswitch_free(efx, nic_data->vport_id);
 }
 
 void efx_ef10_vswitching_remove_vf(struct efx_nic *efx)
@@ -413,20 +420,22 @@ fail1:
return rc;
 }
 
-static int efx_ef10_pci_sriov_disable(struct efx_nic *efx)
+static int efx_ef10_pci_sriov_disable(struct efx_nic *efx, bool force)
 {
    struct pci_dev *dev = efx->pci_dev;
+   unsigned int vfs_assigned = 0;
 
-   if (!efx->vf_count)
-       return 0;
+   vfs_assigned = pci_vfs_assigned(dev);
 
-   if (pci_vfs_assigned(dev)) {
-       netif_err(efx, drv,

[PATCH net-next v2 13/14] sfc: force removal of VF and vport on driver removal

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

When the driver unloads, force the unbind and removal of any
VFs in the host with the PF.  The PF cannot remove vports and
vswitches if they are still being used by a VF driver, and when
unloading the sfc driver the removal order is not guaranteed,
so the instruction from the PF to the VF to unbind enforces a
suitable ordering so that vswitches and vports can be removed.

As a result of this, manually unbinding the driver from a single
PF will result in all of its VFs in the host also being removed.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10_sriov.c | 9 +
 drivers/net/ethernet/sfc/efx.c| 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ef10_sriov.c 
b/drivers/net/ethernet/sfc/ef10_sriov.c
index 083c534..41ab18d 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.c
+++ b/drivers/net/ethernet/sfc/ef10_sriov.c
@@ -448,11 +448,20 @@ int efx_ef10_sriov_init(struct efx_nic *efx)
 void efx_ef10_sriov_fini(struct efx_nic *efx)
 {
    struct efx_ef10_nic_data *nic_data = efx->nic_data;
+   unsigned int i;
    int rc;
 
    if (!nic_data->vf)
        return;
 
+   /* Remove any VFs in the host */
+   for (i = 0; i < efx->vf_count; ++i) {
+       struct efx_nic *vf_efx = nic_data->vf[i].efx;
+
+       if (vf_efx)
+           vf_efx->pci_dev->driver->remove(vf_efx->pci_dev);
+   }
+
    rc = efx_ef10_pci_sriov_disable(efx);
    if (rc)
        netif_dbg(efx, drv, efx->net_dev,
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index fe3481c..6887871 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2867,7 +2867,8 @@ static void efx_pci_remove_main(struct efx_nic *efx)
 }
 
 /* Final NIC shutdown
- * This is called only at module unload (or hotplug removal).
+ * This is called only at module unload (or hotplug removal).  A PF can call
+ * this on its VFs to ensure they are unbound first.
  */
 static void efx_pci_remove(struct pci_dev *pci_dev)
 {


[PATCH net-next v2 10/14] sfc: suppress vadaptor stats when EVB is not present

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

The raw_mask array is not initialised, so it needs to be
explicitly set to zero in the 'else' branch.

If the EVB capability is not present, a port cannot have multiple
functions so the per-port MAC stats are correct and should match
the corresponding vadaptor stats, so this redundancy can be
removed from the ethtool stats output.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c  | 12 +---
 drivers/net/ethernet/sfc/mcdi_pcol.h |  2 ++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index cb4c972..39d0cf1 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1161,13 +1161,19 @@ static u64 efx_ef10_raw_stat_mask(struct efx_nic *efx)
 
 static void efx_ef10_get_stat_mask(struct efx_nic *efx, unsigned long *mask)
 {
+   struct efx_ef10_nic_data *nic_data = efx->nic_data;
    u64 raw_mask[2];
 
    raw_mask[0] = efx_ef10_raw_stat_mask(efx);
 
-   /* All functions see the vadaptor stats */
-   raw_mask[0] |= ~((1ULL << EF10_STAT_rx_unicast) - 1);
-   raw_mask[1] = (1ULL << (EF10_STAT_COUNT - 63)) - 1;
+   /* Only show vadaptor stats when EVB capability is present */
+   if (nic_data->datapath_caps &
+       (1 << MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN)) {
+       raw_mask[0] |= ~((1ULL << EF10_STAT_rx_unicast) - 1);
+       raw_mask[1] = (1ULL << (EF10_STAT_COUNT - 63)) - 1;
+   } else {
+       raw_mask[1] = 0;
+   }
 
 #if BITS_PER_LONG == 64
mask[0] = raw_mask[0];
diff --git a/drivers/net/ethernet/sfc/mcdi_pcol.h 
b/drivers/net/ethernet/sfc/mcdi_pcol.h
index 181978d..45fca9f 100644
--- a/drivers/net/ethernet/sfc/mcdi_pcol.h
+++ b/drivers/net/ethernet/sfc/mcdi_pcol.h
@@ -5600,6 +5600,8 @@
 #define        MC_CMD_GET_CAPABILITIES_OUT_MCAST_FILTER_CHAINING_WIDTH 1
 #define        MC_CMD_GET_CAPABILITIES_OUT_PM_AND_RXDP_COUNTERS_LBN 27
 #define        MC_CMD_GET_CAPABILITIES_OUT_PM_AND_RXDP_COUNTERS_WIDTH 1
+#define        MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN 30
+#define        MC_CMD_GET_CAPABILITIES_OUT_EVB_WIDTH 1
 /* RxDPCPU firmware id. */
 #define   MC_CMD_GET_CAPABILITIES_OUT_RX_DPCPU_FW_ID_OFST 4
 #define   MC_CMD_GET_CAPABILITIES_OUT_RX_DPCPU_FW_ID_LEN 2
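
The point about the uninitialised second word can be shown with a small
stand-alone sketch. It is not the sfc code: build_stat_mask() and the DEMO_
constants are hypothetical, the indices are made up, and the upper word here
is computed for indices 64 and above, which is this sketch's own arithmetic
rather than a copy of the driver's expression.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_STAT_FIRST_VADAPTOR 40u  /* hypothetical first vadaptor index */
#define DEMO_STAT_COUNT          70u  /* hypothetical number of stats */

/* Fill a two-word bitmask of visible stats. Both words are written on
 * every path, so the caller never reads stale stack contents.
 */
static void build_stat_mask(uint64_t base_mask, int evb_present,
                            uint64_t mask[2])
{
    assert(DEMO_STAT_COUNT > 64 && DEMO_STAT_COUNT < 128);

    mask[0] = base_mask;
    if (evb_present) {
        /* all stats from the first vadaptor index up to bit 63 */
        mask[0] |= ~((1ULL << DEMO_STAT_FIRST_VADAPTOR) - 1);
        /* remaining stats with indices 64 .. DEMO_STAT_COUNT - 1 */
        mask[1] = (1ULL << (DEMO_STAT_COUNT - 64)) - 1;
    } else {
        mask[1] = 0;
    }
}
```

Dropping the else branch would leave mask[1] with whatever the caller's
array happened to contain, which is the situation the commit message
describes.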


[PATCH net-next v2 11/14] sfc: don't update stats on VF when called in atomic context

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

The ifenslave command to set up a bond runs in an atomic
context, and it queries the stats on the devices that are
being enslaved. A VF needs to make an MCDI call to update
its stats, which is not allowed in atomic context.

The releasing of the stats_lock is moved to the beginning of
the VF stats update function so that in_interrupt() can be
used; it must be taken again before returning from this
function.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 39d0cf1..e73d7b5 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1305,11 +1305,24 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
    __le64 *dma_stats;
    int rc;
 
+   spin_unlock_bh(&efx->stats_lock);
+
+   if (in_interrupt()) {
+       /* If in atomic context, cannot update stats.  Just update the
+        * software stats and return so the caller can continue.
+        */
+       spin_lock_bh(&efx->stats_lock);
+       efx_update_sw_stats(efx, stats);
+       return 0;
+   }
+
    efx_ef10_get_stat_mask(efx, mask);
 
    rc = efx_nic_alloc_buffer(efx, &stats_buf, dma_len, GFP_ATOMIC);
-   if (rc)
+   if (rc) {
+       spin_lock_bh(&efx->stats_lock);
        return rc;
+   }
 
    dma_stats = stats_buf.addr;
    dma_stats[MC_CMD_MAC_GENERATION_END] = EFX_MC_STATS_GENERATION_INVALID;
@@ -1320,7 +1333,6 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
    MCDI_SET_DWORD(inbuf, MAC_STATS_IN_DMA_LEN, dma_len);
    MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, EVB_PORT_ID_ASSIGNED);
 
-   spin_unlock_bh(&efx->stats_lock);
    rc = efx_mcdi_rpc_quiet(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
                            NULL, 0, NULL);
    spin_lock_bh(&efx->stats_lock);


[PATCH net-next v2 12/14] sfc: do not allow VFs to be destroyed if assigned to guests

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10_sriov.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef10_sriov.c b/drivers/net/ethernet/sfc/ef10_sriov.c
index cd52454..083c534 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.c
+++ b/drivers/net/ethernet/sfc/ef10_sriov.c
@@ -417,6 +417,15 @@ static int efx_ef10_pci_sriov_disable(struct efx_nic *efx)
 {
struct pci_dev *dev = efx->pci_dev;
 
+   if (!efx->vf_count)
+   return 0;
+
+   if (pci_vfs_assigned(dev)) {
+   netif_err(efx, drv, efx->net_dev, "VFs are assigned to guests; "
+ "please detach them before disabling SR-IOV\n");
+   return -EBUSY;
+   }
+
pci_disable_sriov(dev);
efx_ef10_sriov_free_vf_vswitching(efx);
efx->vf_count = 0;
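
As a rough userspace sketch of the guard added by the hunk above: disabling
is a no-op when no VFs exist, is refused with -EBUSY while any VF is
assigned to a guest, and otherwise tears the VFs down. struct fake_pf and
its `vfs_assigned` flag are hypothetical; in the patch that information
comes from pci_vfs_assigned().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical PF state for the sketch. */
struct fake_pf {
    unsigned int vf_count;   /* number of VFs currently enabled */
    bool vfs_assigned;       /* true while a guest owns at least one VF */
};

static int sriov_disable(struct fake_pf *pf)
{
    /* Nothing enabled: succeed without touching anything. */
    if (!pf->vf_count)
        return 0;

    /* Refuse to pull VFs out from under a running guest. */
    if (pf->vfs_assigned)
        return -EBUSY;

    /* Teardown would happen here. */
    pf->vf_count = 0;
    return 0;
}
```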


[PATCH v4 00/25] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-06-01 Thread Baolin Wang

This patch series changes the 32-bit time types (timespec/itimerspec) to
the 64-bit types (timespec64/itimerspec64), since 32-bit time types will
break in the year 2038.

This patch series introduces new methods with timespec64/itimerspec64 type,
and removes the old ones with timespec/itimerspec type for
posix_clock_operations and k_clock structure.
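
To make the 2038 limit concrete: a signed 32-bit seconds counter tops out at
2147483647, which is 2038-01-19 03:14:07 UTC, and one more second no longer
fits. The sketch below widens such a value and steps past that point. The
struct ts32 and struct ts64 layouts and both helpers are hypothetical names
for illustration; they are not the kernel's timespec or timespec64.

```c
#include <stdint.h>

/* Hypothetical 32-bit and 64-bit time layouts for the sketch. */
struct ts32 {
    int32_t tv_sec;
    int32_t tv_nsec;
};

struct ts64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Widening is lossless: every 32-bit value fits in the 64-bit fields. */
static struct ts64 ts32_to_ts64(struct ts32 in)
{
    struct ts64 out = { .tv_sec = in.tv_sec, .tv_nsec = in.tv_nsec };

    return out;
}

/* Adding seconds in 64 bits does not wrap at the 2038 boundary. */
static struct ts64 ts64_add_seconds(struct ts64 t, int64_t secs)
{
    t.tv_sec += secs;
    return t;
}
```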

Baolin Wang (25):
  time: Introduce struct itimerspec64
  timekeeping: Introduce the current_kernel_time64()
  hrtimer: Introduce hrtimer_get_res64()
  security: Introduce security_settime64()
  time: Introduce the do_sys_settimeofday64()
  posix-timers: Introduce {get,put}_timespec/{get,put}_itimerspec
  posix-timers: Split up timer_gettime()/timer_settime()/clock_settime()/
    clock_gettime()/clock_getres().
  posix-timers: Convert timer_gettime()/timer_settime()/clock_settime()/
    clock_gettime()/clock_getres() to timespec64/itimerspec64.
  mmtimer: Convert to timespec64/itimerspec64
  alarmtimer: Convert to timespec64/itimerspec64
  posix-clock: Convert to timespec64/itimerspec64
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
  posix-cpu-timers: Convert to timespec64/itimerspec64
  k_clock: Remove timespec/itimerspec

 arch/powerpc/include/asm/cputime.h    |   6 +-
 arch/s390/include/asm/cputime.h       |   8 +-
 drivers/char/mmtimer.c                |  36 +++--
 drivers/ptp/ptp_clock.c               |  26 +---
 include/asm-generic/cputime_jiffies.h |  10 +-
 include/asm-generic/cputime_nsecs.h   |   4 +-
 include/linux/cputime.h               |  16 ++
 include/linux/hrtimer.h               |  16 +-
 include/linux/jiffies.h               |  21 ++-
 include/linux/posix-clock.h           |  10 +-
 include/linux/posix-timers.h          |  18 +--
 include/linux/security.h              |  25 +++-
 include/linux/time64.h                |  35 +
 include/linux/timekeeping.h           |  26 +++-
 kernel/time/alarmtimer.c              |  43 +++---
 kernel/time/hrtimer.c                 |  10 +-
 kernel/time/posix-clock.c             |  20 +--
 kernel/time/posix-cpu-timers.c        |  84 ++-
 kernel/time/posix-timers.c            | 259 +
 kernel/time/time.c                    |  20 +--
 kernel/time/timekeeping.c             |   6 +-
 kernel/time/timekeeping.h             |   1 -
 security/commoncap.c                  |   2 +-
 security/security.c                   |   2 +-
 24 files changed, 437 insertions(+), 267 deletions(-)

-- 
1.7.9.5


[PATCH net-next v2 07/14] sfc: DMA the VF stats only when requested

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

Firmware does not support a periodic DMA of vadaptor-stats
on VFs, so only update the stats buffer when stats are
requested (when running ethtool -S or an ip/ifconfig
command that reports stats).
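
A minimal sketch of the split this implies, assuming a per-type refresh hook:
the PF variant simply reads a buffer that firmware keeps fresh, while the VF
variant has to request a DMA each time stats are read. struct fake_nic,
struct fake_nic_type, refresh_pf() and refresh_vf() are hypothetical names,
and the "hardware" counter is just a field here.

```c
struct fake_nic;

/* Hypothetical per-type operations table. */
struct fake_nic_type {
    void (*refresh)(struct fake_nic *nic);
};

struct fake_nic {
    const struct fake_nic_type *type;
    unsigned long dma_requests;        /* on-demand DMA requests issued */
    unsigned long long hw_rx_packets;  /* what the hardware would report */
    unsigned long long rx_packets;     /* driver's copy of the counter */
};

static void refresh_pf(struct fake_nic *nic)
{
    /* PF: the buffer is refreshed periodically, so just read it. */
    nic->rx_packets = nic->hw_rx_packets;
}

static void refresh_vf(struct fake_nic *nic)
{
    /* VF: no periodic DMA, so request one now and then read. */
    nic->dma_requests++;
    nic->rx_packets = nic->hw_rx_packets;
}

static const struct fake_nic_type pf_type = { .refresh = refresh_pf };
static const struct fake_nic_type vf_type = { .refresh = refresh_vf };

/* Common entry point: refresh through the type hook, then report. */
static unsigned long long get_rx_packets(struct fake_nic *nic)
{
    nic->type->refresh(nic);
    return nic->rx_packets;
}
```

With this shape the cost of a VF stats read is paid only by the caller that
asks for stats, which is the behaviour the commit message describes.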

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c  | 149 +--
 drivers/net/ethernet/sfc/mcdi_pcol.h |   4 +-
 2 files changed, 112 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 554aff4..323ca47 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1189,7 +1189,50 @@ static size_t efx_ef10_describe_stats(struct efx_nic *efx, u8 *names)
  mask, names);
 }
 
-static int efx_ef10_try_update_nic_stats(struct efx_nic *efx)
+static size_t efx_ef10_update_stats_common(struct efx_nic *efx, u64 *full_stats,
+  struct rtnl_link_stats64 *core_stats)
+{
+   DECLARE_BITMAP(mask, EF10_STAT_COUNT);
+   struct efx_ef10_nic_data *nic_data = efx->nic_data;
+   u64 *stats = nic_data->stats;
+   size_t stats_count = 0, index;
+
+   efx_ef10_get_stat_mask(efx, mask);
+
+   if (full_stats) {
+   for_each_set_bit(index, mask, EF10_STAT_COUNT) {
+   if (efx_ef10_stat_desc[index].name) {
+   *full_stats++ = stats[index];
+   ++stats_count;
+   }
+   }
+   }
+
+   if (core_stats) {
+   core_stats->rx_packets = stats[EF10_STAT_port_rx_packets];
+   core_stats->tx_packets = stats[EF10_STAT_port_tx_packets];
+   core_stats->rx_bytes = stats[EF10_STAT_port_rx_bytes];
+   core_stats->tx_bytes = stats[EF10_STAT_port_tx_bytes];
+   core_stats->rx_dropped = stats[EF10_STAT_port_rx_nodesc_drops] +
+stats[GENERIC_STAT_rx_nodesc_trunc] +
+stats[GENERIC_STAT_rx_noskb_drops];
+   core_stats->multicast = stats[EF10_STAT_port_rx_multicast];
+   core_stats->rx_length_errors =
+   stats[EF10_STAT_port_rx_gtjumbo] +
+   stats[EF10_STAT_port_rx_length_error];
+   core_stats->rx_crc_errors = stats[EF10_STAT_port_rx_bad];
+   core_stats->rx_frame_errors =
+   stats[EF10_STAT_port_rx_align_error];
+   core_stats->rx_fifo_errors = stats[EF10_STAT_port_rx_overflow];
+   core_stats->rx_errors = (core_stats->rx_length_errors +
+core_stats->rx_crc_errors +
+core_stats->rx_frame_errors);
+   }
+
+   return stats_count;
+}
+
+static int efx_ef10_try_update_nic_stats_pf(struct efx_nic *efx)
 {
struct efx_ef10_nic_data *nic_data = efx->nic_data;
DECLARE_BITMAP(mask, EF10_STAT_COUNT);
@@ -1226,57 +1269,83 @@ static int efx_ef10_try_update_nic_stats(struct efx_nic *efx)
 }
 
 
-static size_t efx_ef10_update_stats(struct efx_nic *efx, u64 *full_stats,
-   struct rtnl_link_stats64 *core_stats)
+static size_t efx_ef10_update_stats_pf(struct efx_nic *efx, u64 *full_stats,
+  struct rtnl_link_stats64 *core_stats)
 {
-   DECLARE_BITMAP(mask, EF10_STAT_COUNT);
-   struct efx_ef10_nic_data *nic_data = efx->nic_data;
-   u64 *stats = nic_data->stats;
-   size_t stats_count = 0, index;
int retry;
 
-   efx_ef10_get_stat_mask(efx, mask);
-
/* If we're unlucky enough to read statistics during the DMA, wait
 * up to 10ms for it to finish (typically takes 500us)
 */
for (retry = 0; retry < 100; ++retry) {
-   if (efx_ef10_try_update_nic_stats(efx) == 0)
+   if (efx_ef10_try_update_nic_stats_pf(efx) == 0)
break;
udelay(100);
}
 
-   if (full_stats) {
-   for_each_set_bit(index, mask, EF10_STAT_COUNT) {
-   if (efx_ef10_stat_desc[index].name) {
-   *full_stats++ = stats[index];
-   ++stats_count;
-   }
-   }
-   }
+   return efx_ef10_update_stats_common(efx, full_stats, core_stats);
+}
 
-   if (core_stats) {
-   core_stats->rx_packets = stats[EF10_STAT_port_rx_packets];
-   core_stats->tx_packets = stats[EF10_STAT_port_tx_packets];
-   core_stats->rx_bytes = stats[EF10_STAT_port_rx_bytes];
-   core_stats->tx_bytes = stats[EF10_STAT_port_tx_bytes];
-   core_stats->rx_dropped = stats[EF10_STAT_port_rx_nodesc_drops] +
-stats[GENERIC_STAT_rx_nodesc_trunc] +
-

[PATCH net-next v2 06/14] sfc: display vadaptor statistics for all interfaces

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

All interfaces will display vadaptor statistics, so set all the
relevant bits in the stats bitmask. Only functions with the
LINKCTRL flag will see other stats, including (per-port) MAC stats.

The vadaptor stats are from rx_unicast to tx_overflow.
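
Since the stat count no longer fits in one 64-bit word, the mask has to span
two words, and on 32-bit builds four. A generic way to set a contiguous range
of bits across two 64-bit words and then split them is sketched below. The
indices 60 and 78 are made-up values for illustration, not the driver's enum
values, and the loop is a deliberately simple substitute for the shift
expressions used in the patch.

```c
#include <stdint.h>

/* Hypothetical layout: stats 0..59 are per-port, 60..77 are per-function. */
enum {
    FIRST_FUNCTION_STAT = 60,
    STAT_COUNT = 78,
};

/* Set every bit in [first, count) across two 64-bit words. */
static void build_raw_mask(uint64_t raw[2], unsigned int first,
                           unsigned int count)
{
    unsigned int i;

    raw[0] = 0;
    raw[1] = 0;
    for (i = first; i < count && i < 128; i++)
        raw[i / 64] |= 1ULL << (i % 64);
}

/* Split the two 64-bit words into four 32-bit words, low half first. */
static void split_mask32(const uint64_t raw[2], uint32_t out[4])
{
    out[0] = (uint32_t)(raw[0] & 0xffffffff);
    out[1] = (uint32_t)(raw[0] >> 32);
    out[2] = (uint32_t)(raw[1] & 0xffffffff);
    out[3] = (uint32_t)(raw[1] >> 32);
}
```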

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c  | 39 
 drivers/net/ethernet/sfc/mcdi_pcol.h | 20 ++
 drivers/net/ethernet/sfc/nic.h   | 18 +
 3 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index a574dd3..554aff4 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1045,6 +1045,24 @@ static const struct efx_hw_stat_desc efx_ef10_stat_desc[EF10_STAT_COUNT] = {
EF10_DMA_STAT(port_rx_dp_streaming_packets, RXDP_STREAMING_PKTS),
EF10_DMA_STAT(port_rx_dp_hlb_fetch, RXDP_HLB_FETCH_CONDITIONS),
EF10_DMA_STAT(port_rx_dp_hlb_wait, RXDP_HLB_WAIT_CONDITIONS),
+   EF10_DMA_STAT(rx_unicast, VADAPTER_RX_UNICAST_PACKETS),
+   EF10_DMA_STAT(rx_unicast_bytes, VADAPTER_RX_UNICAST_BYTES),
+   EF10_DMA_STAT(rx_multicast, VADAPTER_RX_MULTICAST_PACKETS),
+   EF10_DMA_STAT(rx_multicast_bytes, VADAPTER_RX_MULTICAST_BYTES),
+   EF10_DMA_STAT(rx_broadcast, VADAPTER_RX_BROADCAST_PACKETS),
+   EF10_DMA_STAT(rx_broadcast_bytes, VADAPTER_RX_BROADCAST_BYTES),
+   EF10_DMA_STAT(rx_bad, VADAPTER_RX_BAD_PACKETS),
+   EF10_DMA_STAT(rx_bad_bytes, VADAPTER_RX_BAD_BYTES),
+   EF10_DMA_STAT(rx_overflow, VADAPTER_RX_OVERFLOW),
+   EF10_DMA_STAT(tx_unicast, VADAPTER_TX_UNICAST_PACKETS),
+   EF10_DMA_STAT(tx_unicast_bytes, VADAPTER_TX_UNICAST_BYTES),
+   EF10_DMA_STAT(tx_multicast, VADAPTER_TX_MULTICAST_PACKETS),
+   EF10_DMA_STAT(tx_multicast_bytes, VADAPTER_TX_MULTICAST_BYTES),
+   EF10_DMA_STAT(tx_broadcast, VADAPTER_TX_BROADCAST_PACKETS),
+   EF10_DMA_STAT(tx_broadcast_bytes, VADAPTER_TX_BROADCAST_BYTES),
+   EF10_DMA_STAT(tx_bad, VADAPTER_TX_BAD_PACKETS),
+   EF10_DMA_STAT(tx_bad_bytes, VADAPTER_TX_BAD_BYTES),
+   EF10_DMA_STAT(tx_overflow, VADAPTER_TX_OVERFLOW),
 };
 
 #define HUNT_COMMON_STAT_MASK ((1ULL << EF10_STAT_port_tx_bytes) | \
@@ -1125,6 +1143,10 @@ static u64 efx_ef10_raw_stat_mask(struct efx_nic *efx)
u32 port_caps = efx_mcdi_phy_get_caps(efx);
struct efx_ef10_nic_data *nic_data = efx->nic_data;
 
+   if (!(efx->mcdi->fn_flags &
+ 1 << MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_LINKCTRL))
+   return 0;
+
if (port_caps & (1 << MC_CMD_PHY_CAP_40000FDX_LBN))
raw_mask |= HUNT_40G_EXTRA_STAT_MASK;
else
@@ -1139,13 +1161,22 @@ static u64 efx_ef10_raw_stat_mask(struct efx_nic *efx)
 
 static void efx_ef10_get_stat_mask(struct efx_nic *efx, unsigned long *mask)
 {
-   u64 raw_mask = efx_ef10_raw_stat_mask(efx);
+   u64 raw_mask[2];
+
+   raw_mask[0] = efx_ef10_raw_stat_mask(efx);
+
+   /* All functions see the vadaptor stats */
+   raw_mask[0] |= ~((1ULL << EF10_STAT_rx_unicast) - 1);
+   raw_mask[1] = (1ULL << (EF10_STAT_COUNT - 63)) - 1;
 
 #if BITS_PER_LONG == 64
-   mask[0] = raw_mask;
+   mask[0] = raw_mask[0];
+   mask[1] = raw_mask[1];
 #else
-   mask[0] = raw_mask & 0xffffffff;
-   mask[1] = raw_mask >> 32;
+   mask[0] = raw_mask[0] & 0xffffffff;
+   mask[1] = raw_mask[0] >> 32;
+   mask[2] = raw_mask[1] & 0xffffffff;
+   mask[3] = raw_mask[1] >> 32;
 #endif
 }
 
diff --git a/drivers/net/ethernet/sfc/mcdi_pcol.h b/drivers/net/ethernet/sfc/mcdi_pcol.h
index 1e11bb8..0e497b3 100644
--- a/drivers/net/ethernet/sfc/mcdi_pcol.h
+++ b/drivers/net/ethernet/sfc/mcdi_pcol.h
@@ -2896,6 +2896,26 @@
  * descriptor fetch. Valid for EF10 with PM_AND_RXDP_COUNTERS capability only.
  */
 #define  MC_CMD_MAC_RXDP_HLB_WAIT_CONDITIONS  0x48
+#define  MC_CMD_MAC_VADAPTER_RX_DMABUF_START  0x4c /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_UNICAST_PACKETS  0x4c /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_UNICAST_BYTES  0x4d /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_MULTICAST_PACKETS  0x4e /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_MULTICAST_BYTES  0x4f /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_BROADCAST_PACKETS  0x50 /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_BROADCAST_BYTES  0x51 /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_BAD_PACKETS  0x52 /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_BAD_BYTES  0x53 /* enum */
+#define  MC_CMD_MAC_VADAPTER_RX_OVERFLOW  0x54 /* enum */
+#define  MC_CMD_MAC_VADAPTER_TX_DMABUF_START  0x57 /* enum */
+#define  MC_CMD_MAC_VADAPTER_TX_UNICAST_PACKETS  0x57 /* enum */
+#define  MC_CMD_MAC_VADAPTER_TX_UNICAST_BYTES  0x58 /* enum */
+#define  MC_CMD_MAC_VADAPTER_TX_MULTICAST_PACKETS  0x59 /*

[PATCH net-next v2 08/14] sfc: update netdevice statistics to use vadaptor stats

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

The netdevice statistics (in /proc/net/dev) are per-function
stats so they must use the vadaptor stats. Change the use of
MAC stats to vadaptor stats, and remove any statistics that
can only be measured per-port.  All stats that are removed
will be shown as zeroes when these statistics are displayed.
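
A small sketch of that aggregation, assuming simplified structures: packet
totals are the sum of the unicast, multicast and broadcast counters, and
fields that only exist per-port are left at zero. struct vadaptor_stats and
struct link_stats are hypothetical cut-down stand-ins, not the driver's stats
array or the kernel's rtnl_link_stats64.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-function counters. */
struct vadaptor_stats {
    uint64_t rx_unicast, rx_multicast, rx_broadcast;
    uint64_t tx_unicast, tx_multicast, tx_broadcast;
    uint64_t rx_bad, tx_bad;
};

/* Hypothetical subset of the netdevice counters. */
struct link_stats {
    uint64_t rx_packets, tx_packets;
    uint64_t multicast;
    uint64_t rx_crc_errors, rx_errors, tx_errors;
    uint64_t rx_length_errors;   /* per-port only: stays zero */
};

static void fill_link_stats(const struct vadaptor_stats *v,
                            struct link_stats *out)
{
    /* Anything not derivable per-function is reported as zero. */
    memset(out, 0, sizeof(*out));

    out->rx_packets = v->rx_unicast + v->rx_multicast + v->rx_broadcast;
    out->tx_packets = v->tx_unicast + v->tx_multicast + v->tx_broadcast;
    out->multicast = v->rx_multicast;
    out->rx_crc_errors = v->rx_bad;
    out->rx_errors = out->rx_crc_errors;
    out->tx_errors = v->tx_bad;
}
```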

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 41 ++---
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 323ca47..99bf296 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1209,24 +1209,25 @@ static size_t efx_ef10_update_stats_common(struct efx_nic *efx, u64 *full_stats,
}
 
if (core_stats) {
-   core_stats->rx_packets = stats[EF10_STAT_port_rx_packets];
-   core_stats->tx_packets = stats[EF10_STAT_port_tx_packets];
-   core_stats->rx_bytes = stats[EF10_STAT_port_rx_bytes];
-   core_stats->tx_bytes = stats[EF10_STAT_port_tx_bytes];
-   core_stats->rx_dropped = stats[EF10_STAT_port_rx_nodesc_drops] +
-stats[GENERIC_STAT_rx_nodesc_trunc] +
+   core_stats->rx_packets = stats[EF10_STAT_rx_unicast] +
+stats[EF10_STAT_rx_multicast] +
+stats[EF10_STAT_rx_broadcast];
+   core_stats->tx_packets = stats[EF10_STAT_tx_unicast] +
+stats[EF10_STAT_tx_multicast] +
+stats[EF10_STAT_tx_broadcast];
+   core_stats->rx_bytes = stats[EF10_STAT_rx_unicast_bytes] +
+  stats[EF10_STAT_rx_multicast_bytes] +
+  stats[EF10_STAT_rx_broadcast_bytes];
+   core_stats->tx_bytes = stats[EF10_STAT_tx_unicast_bytes] +
+  stats[EF10_STAT_tx_multicast_bytes] +
+  stats[EF10_STAT_tx_broadcast_bytes];
+   core_stats->rx_dropped = stats[GENERIC_STAT_rx_nodesc_trunc] +
 stats[GENERIC_STAT_rx_noskb_drops];
-   core_stats->multicast = stats[EF10_STAT_port_rx_multicast];
-   core_stats->rx_length_errors =
-   stats[EF10_STAT_port_rx_gtjumbo] +
-   stats[EF10_STAT_port_rx_length_error];
-   core_stats->rx_crc_errors = stats[EF10_STAT_port_rx_bad];
-   core_stats->rx_frame_errors =
-   stats[EF10_STAT_port_rx_align_error];
-   core_stats->rx_fifo_errors = stats[EF10_STAT_port_rx_overflow];
-   core_stats->rx_errors = (core_stats->rx_length_errors +
-core_stats->rx_crc_errors +
-core_stats->rx_frame_errors);
+   core_stats->multicast = stats[EF10_STAT_rx_multicast];
+   core_stats->rx_crc_errors = stats[EF10_STAT_rx_bad];
+   core_stats->rx_fifo_errors = stats[EF10_STAT_rx_overflow];
+   core_stats->rx_errors = core_stats->rx_crc_errors;
+   core_stats->tx_errors = stats[EF10_STAT_tx_bad];
}
 
return stats_count;
@@ -1309,7 +1310,7 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
 
MCDI_SET_QWORD(inbuf, MAC_STATS_IN_DMA_ADDR, stats_buf.dma_addr);
MCDI_POPULATE_DWORD_1(inbuf, MAC_STATS_IN_CMD,
- MAC_STATS_IN_DMA, true);
+ MAC_STATS_IN_DMA, 1);
MCDI_SET_DWORD(inbuf, MAC_STATS_IN_DMA_LEN, dma_len);
MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, EVB_PORT_ID_ASSIGNED);
 
@@ -1321,8 +1322,10 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
goto out;
 
generation_end = dma_stats[MC_CMD_MAC_GENERATION_END];
-   if (generation_end == EFX_MC_STATS_GENERATION_INVALID)
+   if (generation_end == EFX_MC_STATS_GENERATION_INVALID) {
+   WARN_ON_ONCE(1);
goto out;
+   }
rmb();
efx_nic_update_stats(efx_ef10_stat_desc, EF10_STAT_COUNT, mask,
 stats, stats_buf.addr, false);


[PATCH net-next v2 03/14] sfc: Implement ndo_gets_phys_port_id() for EF10 VFs

2015-06-01 Thread Shradha Shah

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c   | 11 +++
 drivers/net/ethernet/sfc/ef10_sriov.c | 14 ++
 drivers/net/ethernet/sfc/ef10_sriov.h |  3 +++
 drivers/net/ethernet/sfc/efx.c|  1 +
 drivers/net/ethernet/sfc/net_driver.h |  2 ++
 drivers/net/ethernet/sfc/nic.h|  1 +
 drivers/net/ethernet/sfc/sriov.c  | 11 +++
 drivers/net/ethernet/sfc/sriov.h  |  2 ++
 8 files changed, 45 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 389a45d..714e7cf 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -405,6 +405,16 @@ static int efx_ef10_probe(struct efx_nic *efx)
 
efx_ptp_probe(efx, NULL);
 
+#ifdef CONFIG_SFC_SRIOV
+   if ((efx->pci_dev->physfn) && (!efx->pci_dev->is_physfn)) {
+   struct pci_dev *pci_dev_pf = efx->pci_dev->physfn;
+   struct efx_nic *efx_pf = pci_get_drvdata(pci_dev_pf);
+
+   efx_pf->type->get_mac_address(efx_pf, nic_data->port_id);
+   } else
+#endif
+   ether_addr_copy(nic_data->port_id, efx->net_dev->perm_addr);
+
return 0;
 
 fail5:
@@ -4139,6 +4149,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
.vswitching_probe = efx_ef10_vswitching_probe_vf,
.vswitching_restore = efx_ef10_vswitching_restore_vf,
.vswitching_remove = efx_ef10_vswitching_remove_vf,
+   .sriov_get_phys_port_id = efx_ef10_sriov_get_phys_port_id,
 #endif
.get_mac_address = efx_ef10_get_mac_address_vf,
.set_mac_address = efx_ef10_set_mac_address,
diff --git a/drivers/net/ethernet/sfc/ef10_sriov.c b/drivers/net/ethernet/sfc/ef10_sriov.c
index 3969b1b..cd52454 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.c
+++ b/drivers/net/ethernet/sfc/ef10_sriov.c
@@ -736,3 +736,17 @@ int efx_ef10_sriov_get_vf_config(struct efx_nic *efx, int vf_i,
 
return 0;
 }
+
+int efx_ef10_sriov_get_phys_port_id(struct efx_nic *efx,
+   struct netdev_phys_item_id *ppid)
+{
+   struct efx_ef10_nic_data *nic_data = efx->nic_data;
+
+   if (!is_valid_ether_addr(nic_data->port_id))
+   return -EOPNOTSUPP;
+
+   ppid->id_len = ETH_ALEN;
+   memcpy(ppid->id, nic_data->port_id, ppid->id_len);
+
+   return 0;
+}
diff --git a/drivers/net/ethernet/sfc/ef10_sriov.h b/drivers/net/ethernet/sfc/ef10_sriov.h
index b985576..ffc92a5 100644
--- a/drivers/net/ethernet/sfc/ef10_sriov.h
+++ b/drivers/net/ethernet/sfc/ef10_sriov.h
@@ -54,6 +54,9 @@ int efx_ef10_sriov_get_vf_config(struct efx_nic *efx, int vf_i,
 int efx_ef10_sriov_set_vf_link_state(struct efx_nic *efx, int vf_i,
 int link_state);
 
+int efx_ef10_sriov_get_phys_port_id(struct efx_nic *efx,
+   struct netdev_phys_item_id *ppid);
+
 int efx_ef10_vswitching_probe_pf(struct efx_nic *efx);
 int efx_ef10_vswitching_probe_vf(struct efx_nic *efx);
 int efx_ef10_vswitching_restore_pf(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 9eafa39..fe3481c 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2282,6 +2282,7 @@ static const struct net_device_ops efx_netdev_ops = {
.ndo_set_vf_spoofchk    = efx_sriov_set_vf_spoofchk,
.ndo_get_vf_config      = efx_sriov_get_vf_config,
.ndo_set_vf_link_state  = efx_sriov_set_vf_link_state,
+   .ndo_get_phys_port_id   = efx_sriov_get_phys_port_id,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = efx_netpoll,
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index a468a22..d72f522 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1350,6 +1350,8 @@ struct efx_nic_type {
   struct ifla_vf_info *ivi);
int (*sriov_set_vf_link_state)(struct efx_nic *efx, int vf_i,
   int link_state);
+   int (*sriov_get_phys_port_id)(struct efx_nic *efx,
+ struct netdev_phys_item_id *ppid);
int (*vswitching_probe)(struct efx_nic *efx);
int (*vswitching_restore)(struct efx_nic *efx);
void (*vswitching_remove)(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index db8562e..e146e30 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -524,6 +524,7 @@ struct efx_ef10_nic_data {
unsigned int vport_id;
bool must_probe_vswitching;
unsigned int pf_index;
+   u8 port_id[ETH_ALEN];
 #ifdef CONFIG_SFC_SRIOV
unsigned int vf_index;
struct ef10_vf *vf;
diff --git a/drivers/net/ethernet/sfc/sriov.c b/drivers/net/ethernet/sfc/sriov.c
index 6c5edbd..816c446 100644
--- a/drivers/net/ethernet/sfc/sriov.c

[PATCH net-next v2 05/14] sfc: set the port-id when calling MC_CMD_MAC_STATS

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

The port-id must be known so that the RMON level can be
set for the collection of vadapter stats.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/mcdi_port.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/sfc/mcdi_port.c b/drivers/net/ethernet/sfc/mcdi_port.c
index 9bf04cb..fffc348 100644
--- a/drivers/net/ethernet/sfc/mcdi_port.c
+++ b/drivers/net/ethernet/sfc/mcdi_port.c
@@ -924,6 +924,7 @@ enum efx_stats_action {
 static int efx_mcdi_mac_stats(struct efx_nic *efx,
  enum efx_stats_action action, int clear)
 {
+   struct efx_ef10_nic_data *nic_data = efx->nic_data;
MCDI_DECLARE_BUF(inbuf, MC_CMD_MAC_STATS_IN_LEN);
int rc;
int change = action == EFX_STATS_PULL ? 0 : 1;
@@ -945,6 +946,7 @@ static int efx_mcdi_mac_stats(struct efx_nic *efx,
  MAC_STATS_IN_PERIODIC_NOEVENT, 1,
  MAC_STATS_IN_PERIOD_MS, period);
MCDI_SET_DWORD(inbuf, MAC_STATS_IN_DMA_LEN, dma_len);
+   MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, nic_data->vport_id);
 
rc = efx_mcdi_rpc(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
  NULL, 0, NULL);


[PATCH net-next v2 09/14] sfc: suppress ENOENT error messages from MC_CMD_MAC_STATS

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

MC_CMD_MAC_STATS can be called on a function before a
vadaptor has been created, as the kernel can call into this
through ndo_get_stats/ndo_get_stats64.

If MC_CMD_MAC_STATS is called before the DMA queues have been
setup, so that a vadaptor has not been created yet, firmware
will return ENOENT. This is expected, so suppress the MCDI
error message in this case.
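
The resulting reporting rule is small enough to state as a predicate: stay
quiet on success, and stay quiet on -ENOENT only while no queues are active.
The function name and the plain int queue count below are assumptions made
for the sketch; the patch itself tests an atomic counter inline.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a failed stats request deserves an error message.
 * -ENOENT is expected while no DMA queues are active, so it is only
 * reported once queues exist.
 */
static bool should_report_stats_error(int rc, int active_queues)
{
    if (rc == 0)
        return false;

    return rc != -ENOENT || active_queues > 0;
}
```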

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c  | 11 ---
 drivers/net/ethernet/sfc/mcdi_port.c |  8 ++--
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 99bf296..cb4c972 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1315,11 +1315,16 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, EVB_PORT_ID_ASSIGNED);
 
spin_unlock_bh(&efx->stats_lock);
-   rc = efx_mcdi_rpc(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf), NULL,
- 0, NULL);
+   rc = efx_mcdi_rpc_quiet(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
+   NULL, 0, NULL);
spin_lock_bh(&efx->stats_lock);
-   if (rc)
+   if (rc) {
+   /* Expect ENOENT if DMA queues have not been set up */
+   if (rc != -ENOENT || atomic_read(&efx->active_queues))
+   efx_mcdi_display_error(efx, MC_CMD_MAC_STATS,
+  sizeof(inbuf), NULL, 0, rc);
goto out;
+   }
 
generation_end = dma_stats[MC_CMD_MAC_GENERATION_END];
if (generation_end == EFX_MC_STATS_GENERATION_INVALID) {
diff --git a/drivers/net/ethernet/sfc/mcdi_port.c b/drivers/net/ethernet/sfc/mcdi_port.c
index fffc348..7f295c4 100644
--- a/drivers/net/ethernet/sfc/mcdi_port.c
+++ b/drivers/net/ethernet/sfc/mcdi_port.c
@@ -948,8 +948,12 @@ static int efx_mcdi_mac_stats(struct efx_nic *efx,
MCDI_SET_DWORD(inbuf, MAC_STATS_IN_DMA_LEN, dma_len);
MCDI_SET_DWORD(inbuf, MAC_STATS_IN_PORT_ID, nic_data->vport_id);
 
-   rc = efx_mcdi_rpc(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
- NULL, 0, NULL);
+   rc = efx_mcdi_rpc_quiet(efx, MC_CMD_MAC_STATS, inbuf, sizeof(inbuf),
+   NULL, 0, NULL);
+   /* Expect ENOENT if DMA queues have not been set up */
+   if (rc && (rc != -ENOENT || atomic_read(&efx->active_queues)))
+   efx_mcdi_display_error(efx, MC_CMD_MAC_STATS, sizeof(inbuf),
+  NULL, 0, rc);
return rc;
 }
 


[PATCH net-next v2 04/14] sfc: add port_ prefix to MAC stats

2015-06-01 Thread Shradha Shah

From: Daniel Pieczko dpiec...@solarflare.com

The MAC stats are per-port and will only be displayed on the PF
with control of the link (one per physical port). Vadapter stats
will also be displayed for this PF, so distinguish the MAC stats
by adding a prefix of port_.
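
The displayed names come from stringifying a macro argument, so renaming the
argument renames both the enum index and the string in one step. A
self-contained sketch of that pattern follows. STAT_DESC, the three-entry
enum and is_port_stat() are hypothetical; the driver's own macros take more
arguments than shown here.

```c
#include <string.h>

/* Hypothetical stat indices; the names mirror the renamed scheme. */
enum {
    STAT_port_tx_bytes,
    STAT_port_rx_bytes,
    STAT_rx_unicast,
    STAT_COUNT,
};

struct stat_desc {
    const char *name;
};

/* Token pasting picks the array slot, stringification gives the label. */
#define STAT_DESC(ext_name) [STAT_ ## ext_name] = { #ext_name }

static const struct stat_desc stat_desc[STAT_COUNT] = {
    STAT_DESC(port_tx_bytes),
    STAT_DESC(port_rx_bytes),
    STAT_DESC(rx_unicast),
};

/* Per-port (MAC) stats are recognisable by their port_ prefix. */
static int is_port_stat(const char *name)
{
    return strncmp(name, "port_", 5) == 0;
}
```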

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c  | 251 ++-
 drivers/net/ethernet/sfc/mcdi_pcol.h |   4 +-
 drivers/net/ethernet/sfc/nic.h   | 106 +++
 3 files changed, 182 insertions(+), 179 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 714e7cf..a574dd3 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -990,93 +990,94 @@ static int efx_ef10_reset(struct efx_nic *efx, enum reset_type reset_type)
[GENERIC_STAT_ ## ext_name] = { #ext_name, 0, 0 }
 
 static const struct efx_hw_stat_desc efx_ef10_stat_desc[EF10_STAT_COUNT] = {
-   EF10_DMA_STAT(tx_bytes, TX_BYTES),
-   EF10_DMA_STAT(tx_packets, TX_PKTS),
-   EF10_DMA_STAT(tx_pause, TX_PAUSE_PKTS),
-   EF10_DMA_STAT(tx_control, TX_CONTROL_PKTS),
-   EF10_DMA_STAT(tx_unicast, TX_UNICAST_PKTS),
-   EF10_DMA_STAT(tx_multicast, TX_MULTICAST_PKTS),
-   EF10_DMA_STAT(tx_broadcast, TX_BROADCAST_PKTS),
-   EF10_DMA_STAT(tx_lt64, TX_LT64_PKTS),
-   EF10_DMA_STAT(tx_64, TX_64_PKTS),
-   EF10_DMA_STAT(tx_65_to_127, TX_65_TO_127_PKTS),
-   EF10_DMA_STAT(tx_128_to_255, TX_128_TO_255_PKTS),
-   EF10_DMA_STAT(tx_256_to_511, TX_256_TO_511_PKTS),
-   EF10_DMA_STAT(tx_512_to_1023, TX_512_TO_1023_PKTS),
-   EF10_DMA_STAT(tx_1024_to_15xx, TX_1024_TO_15XX_PKTS),
-   EF10_DMA_STAT(tx_15xx_to_jumbo, TX_15XX_TO_JUMBO_PKTS),
-   EF10_DMA_STAT(rx_bytes, RX_BYTES),
-   EF10_DMA_INVIS_STAT(rx_bytes_minus_good_bytes, RX_BAD_BYTES),
-   EF10_OTHER_STAT(rx_good_bytes),
-   EF10_OTHER_STAT(rx_bad_bytes),
-   EF10_DMA_STAT(rx_packets, RX_PKTS),
-   EF10_DMA_STAT(rx_good, RX_GOOD_PKTS),
-   EF10_DMA_STAT(rx_bad, RX_BAD_FCS_PKTS),
-   EF10_DMA_STAT(rx_pause, RX_PAUSE_PKTS),
-   EF10_DMA_STAT(rx_control, RX_CONTROL_PKTS),
-   EF10_DMA_STAT(rx_unicast, RX_UNICAST_PKTS),
-   EF10_DMA_STAT(rx_multicast, RX_MULTICAST_PKTS),
-   EF10_DMA_STAT(rx_broadcast, RX_BROADCAST_PKTS),
-   EF10_DMA_STAT(rx_lt64, RX_UNDERSIZE_PKTS),
-   EF10_DMA_STAT(rx_64, RX_64_PKTS),
-   EF10_DMA_STAT(rx_65_to_127, RX_65_TO_127_PKTS),
-   EF10_DMA_STAT(rx_128_to_255, RX_128_TO_255_PKTS),
-   EF10_DMA_STAT(rx_256_to_511, RX_256_TO_511_PKTS),
-   EF10_DMA_STAT(rx_512_to_1023, RX_512_TO_1023_PKTS),
-   EF10_DMA_STAT(rx_1024_to_15xx, RX_1024_TO_15XX_PKTS),
-   EF10_DMA_STAT(rx_15xx_to_jumbo, RX_15XX_TO_JUMBO_PKTS),
-   EF10_DMA_STAT(rx_gtjumbo, RX_GTJUMBO_PKTS),
-   EF10_DMA_STAT(rx_bad_gtjumbo, RX_JABBER_PKTS),
-   EF10_DMA_STAT(rx_overflow, RX_OVERFLOW_PKTS),
-   EF10_DMA_STAT(rx_align_error, RX_ALIGN_ERROR_PKTS),
-   EF10_DMA_STAT(rx_length_error, RX_LENGTH_ERROR_PKTS),
-   EF10_DMA_STAT(rx_nodesc_drops, RX_NODESC_DROPS),
+   EF10_DMA_STAT(port_tx_bytes, TX_BYTES),
+   EF10_DMA_STAT(port_tx_packets, TX_PKTS),
+   EF10_DMA_STAT(port_tx_pause, TX_PAUSE_PKTS),
+   EF10_DMA_STAT(port_tx_control, TX_CONTROL_PKTS),
+   EF10_DMA_STAT(port_tx_unicast, TX_UNICAST_PKTS),
+   EF10_DMA_STAT(port_tx_multicast, TX_MULTICAST_PKTS),
+   EF10_DMA_STAT(port_tx_broadcast, TX_BROADCAST_PKTS),
+   EF10_DMA_STAT(port_tx_lt64, TX_LT64_PKTS),
+   EF10_DMA_STAT(port_tx_64, TX_64_PKTS),
+   EF10_DMA_STAT(port_tx_65_to_127, TX_65_TO_127_PKTS),
+   EF10_DMA_STAT(port_tx_128_to_255, TX_128_TO_255_PKTS),
+   EF10_DMA_STAT(port_tx_256_to_511, TX_256_TO_511_PKTS),
+   EF10_DMA_STAT(port_tx_512_to_1023, TX_512_TO_1023_PKTS),
+   EF10_DMA_STAT(port_tx_1024_to_15xx, TX_1024_TO_15XX_PKTS),
+   EF10_DMA_STAT(port_tx_15xx_to_jumbo, TX_15XX_TO_JUMBO_PKTS),
+   EF10_DMA_STAT(port_rx_bytes, RX_BYTES),
+   EF10_DMA_INVIS_STAT(port_rx_bytes_minus_good_bytes, RX_BAD_BYTES),
+   EF10_OTHER_STAT(port_rx_good_bytes),
+   EF10_OTHER_STAT(port_rx_bad_bytes),
+   EF10_DMA_STAT(port_rx_packets, RX_PKTS),
+   EF10_DMA_STAT(port_rx_good, RX_GOOD_PKTS),
+   EF10_DMA_STAT(port_rx_bad, RX_BAD_FCS_PKTS),
+   EF10_DMA_STAT(port_rx_pause, RX_PAUSE_PKTS),
+   EF10_DMA_STAT(port_rx_control, RX_CONTROL_PKTS),
+   EF10_DMA_STAT(port_rx_unicast, RX_UNICAST_PKTS),
+   EF10_DMA_STAT(port_rx_multicast, RX_MULTICAST_PKTS),
+   EF10_DMA_STAT(port_rx_broadcast, RX_BROADCAST_PKTS),
+   EF10_DMA_STAT(port_rx_lt64, RX_UNDERSIZE_PKTS),
+   EF10_DMA_STAT(port_rx_64, RX_64_PKTS),
+   EF10_DMA_STAT(port_rx_65_to_127, RX_65_TO_127_PKTS),
+   EF10_DMA_STAT(port_rx_128_to_255, RX_128_TO_255_PKTS),
+   EF10_DMA_STAT(port_rx_256_to_511, RX_256_TO_511_PKTS),
+

Re: [PATCH v2 net] xen: netback: read hotplug script once at start of day.

2015-06-01 Thread Wei Liu

On Mon, Jun 01, 2015 at 11:30:24AM +0100, Ian Campbell wrote:
> When we come to tear things down in netback_remove() and generate the
> uevent it is possible that the xenstore directory has already been
> removed (details below).
>
> In such cases netback_uevent() won't be able to read the hotplug
> script and will write a xenstore error node.
>
> A recent change to the hypervisor exposed this race such that we now
> sometimes lose it (where apparently we didn't ever before).
>
> Instead read the hotplug script configuration during setup and use it
> for the lifetime of the backend device.
>
> The apparently more obvious fix of moving the transition to
> state=Closed in netback_remove() to after the uevent does not work
> because it is possible that we are already in state=Closed (in
> reaction to the guest having disconnected as it shut down). Being
> already in Closed means the toolstack is at liberty to start tearing
> down the xenstore directories. In principle it might be possible to
> arrange to unregister the device sooner (e.g. on transition to Closing)
> such that xenstore would still be there but this state machine is
> fragile and prone to anger...
>
> A modern Xen system only relies on the hotplug uevent for driver
> domains, when the backend is in the same domain as the toolstack it
> will run the necessary setup/teardown directly in the correct sequence
> wrt xenstore changes.
>
> Signed-off-by: Ian Campbell ian.campb...@citrix.com

Acked-by: Wei Liu wei.l...@citrix.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next v2 01/14] sfc: Add code to export port_num in netdev->dev_port

2015-06-01 Thread Shradha Shah

In the case where we have multiple functions (PFs and VFs), this
sysfs entry is useful to identify the physical port corresponding
to the function we are interested in.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index a547ceb..dacf9f8 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -249,6 +249,7 @@ static int efx_ef10_get_mac_address_vf(struct efx_nic *efx, 
u8 *mac_address)
 static int efx_ef10_probe(struct efx_nic *efx)
 {
struct efx_ef10_nic_data *nic_data;
+   struct net_device *net_dev = efx->net_dev;
int i, rc;
 
/* We can have one VI for each 8K region.  However, until we
@@ -326,6 +327,7 @@ static int efx_ef10_probe(struct efx_nic *efx)
if (rc < 0)
goto fail3;
efx->port_num = rc;
+   net_dev->dev_port = rc;
 
rc = efx->type->get_mac_address(efx, efx->net_dev->perm_addr);
if (rc)
@@ -334,6 +336,7 @@ static int efx_ef10_probe(struct efx_nic *efx)
rc = efx_ef10_get_sysclk_freq(efx);
if (rc < 0)
goto fail3;
+
efx->timer_quantum_ns = 1536000 / rc; /* 1536 cycles */
 
/* Check whether firmware supports bug 35388 workaround.
@@ -341,9 +344,9 @@ static int efx_ef10_probe(struct efx_nic *efx)
 * ask if it's already enabled
 */
rc = efx_mcdi_set_workaround(efx, MC_CMD_WORKAROUND_BUG35388, true);
-   if (rc == 0)
+   if (rc == 0) {
nic_data->workaround_35388 = true;
-   else if (rc == -EPERM) {
+   } else if (rc == -EPERM) {
unsigned int enabled;
 
rc = efx_mcdi_get_workarounds(efx, NULL, &enabled);
@@ -351,9 +354,10 @@ static int efx_ef10_probe(struct efx_nic *efx)
goto fail3;
nic_data->workaround_35388 = enabled &
MC_CMD_GET_WORKAROUNDS_OUT_BUG35388;
-   }
-   else if (rc != -ENOSYS && rc != -ENOENT)
+   } else if (rc != -ENOSYS && rc != -ENOENT) {
goto fail3;
+   }
+
netif_dbg(efx, probe, efx->net_dev,
  "workaround for bug 35388 is %sabled\n",
  nic_data->workaround_35388 ? "en" : "dis");


[PATCH net-next v2 02/14] sfc: Add sysfs entry for flags (link control and primary)

2015-06-01 Thread Shradha Shah

Every adapter has one primary PF, and every port has one link control
PF.

Signed-off-by: Shradha Shah ss...@solarflare.com
---
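
As a rough illustration of what the two show functions in this patch
compute, the userspace sketch below tests a single bit of a flags word
and formats the result the way a sysfs attribute would. The bit
positions and the names FLAG_LINKCTRL_BIT, FLAG_PRIMARY_BIT and
show_flag() are placeholders chosen for the example; they are not the
real MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_* values or driver functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder bit positions, assumed for illustration only. */
#define FLAG_LINKCTRL_BIT 0
#define FLAG_PRIMARY_BIT  1

/*
 * Write "1\n" or "0\n" into buf depending on whether the given bit is
 * set in fn_flags, and return the number of characters written, much
 * like a sysfs show hook would.
 */
static int show_flag(char *buf, size_t len, uint32_t fn_flags,
		     unsigned int bit)
{
	return snprintf(buf, len, "%d\n", (fn_flags & (1u << bit)) ? 1 : 0);
}
```
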
 drivers/net/ethernet/sfc/ef10.c | 58 -
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index dacf9f8..389a45d 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -246,6 +246,34 @@ static int efx_ef10_get_mac_address_vf(struct efx_nic 
*efx, u8 *mac_address)
return 0;
 }
 
+static ssize_t efx_ef10_show_link_control_flag(struct device *dev,
+  struct device_attribute *attr,
+  char *buf)
+{
+   struct efx_nic *efx = pci_get_drvdata(to_pci_dev(dev));
+
+   return sprintf(buf, "%d\n",
+  ((efx->mcdi->fn_flags) &
+   (1 << MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_LINKCTRL))
+  ? 1 : 0);
+}
+
+static ssize_t efx_ef10_show_primary_flag(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   struct efx_nic *efx = pci_get_drvdata(to_pci_dev(dev));
+
+   return sprintf(buf, "%d\n",
+  ((efx->mcdi->fn_flags) &
+   (1 << MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_PRIMARY))
+  ? 1 : 0);
+}
+
+static DEVICE_ATTR(link_control_flag, 0444, efx_ef10_show_link_control_flag,
+  NULL);
+static DEVICE_ATTR(primary_flag, 0444, efx_ef10_show_primary_flag, NULL);
+
 static int efx_ef10_probe(struct efx_nic *efx)
 {
struct efx_ef10_nic_data *nic_data;
@@ -312,30 +340,39 @@ static int efx_ef10_probe(struct efx_nic *efx)
if (rc)
goto fail3;
 
-   rc = efx_ef10_get_pf_index(efx);
+   rc = device_create_file(&efx->pci_dev->dev,
+   &dev_attr_link_control_flag);
if (rc)
goto fail3;
 
+   rc = device_create_file(&efx->pci_dev->dev, &dev_attr_primary_flag);
+   if (rc)
+   goto fail4;
+
+   rc = efx_ef10_get_pf_index(efx);
+   if (rc)
+   goto fail5;
+
rc = efx_ef10_init_datapath_caps(efx);
if (rc < 0)
-   goto fail3;
+   goto fail5;
 
efx->rx_packet_len_offset =
ES_DZ_RX_PREFIX_PKTLEN_OFST - ES_DZ_RX_PREFIX_SIZE;
 
rc = efx_mcdi_port_get_number(efx);
if (rc < 0)
-   goto fail3;
+   goto fail5;
efx->port_num = rc;
net_dev->dev_port = rc;
 
rc = efx->type->get_mac_address(efx, efx->net_dev->perm_addr);
if (rc)
-   goto fail3;
+   goto fail5;
 
rc = efx_ef10_get_sysclk_freq(efx);
if (rc < 0)
-   goto fail3;
+   goto fail5;
 
efx->timer_quantum_ns = 1536000 / rc; /* 1536 cycles */
 
@@ -355,7 +392,7 @@ static int efx_ef10_probe(struct efx_nic *efx)
nic_data->workaround_35388 = enabled &
MC_CMD_GET_WORKAROUNDS_OUT_BUG35388;
} else if (rc != -ENOSYS && rc != -ENOENT) {
-   goto fail3;
+   goto fail5;
}
 
netif_dbg(efx, probe, efx->net_dev,
@@ -364,12 +401,16 @@ static int efx_ef10_probe(struct efx_nic *efx)
 
rc = efx_mcdi_mon_probe(efx);
if (rc && rc != -EPERM)
-   goto fail3;
+   goto fail5;
 
efx_ptp_probe(efx, NULL);
 
return 0;
 
+fail5:
+   device_remove_file(&efx->pci_dev->dev, &dev_attr_primary_flag);
+fail4:
+   device_remove_file(&efx->pci_dev->dev, &dev_attr_link_control_flag);
 fail3:
efx_mcdi_fini(efx);
 fail2:
@@ -612,6 +653,9 @@ static void efx_ef10_remove(struct efx_nic *efx)
if (!nic_data->must_restore_piobufs)
efx_ef10_free_piobufs(efx);
 
+   device_remove_file(&efx->pci_dev->dev, &dev_attr_primary_flag);
+   device_remove_file(&efx->pci_dev->dev, &dev_attr_link_control_flag);
+
efx_mcdi_fini(efx);
efx_nic_free_buffer(efx, &nic_data->mcdi_buf);
kfree(nic_data);


Re: [PATCH 1/7] net: dsa: add new driver for ar8xxx family

2015-06-01 Thread Paul Bolle

Just a nit: a license mismatch.

On Thu, 2015-05-28 at 18:42 -0700, Mathieu Olivari wrote:
 --- /dev/null
 +++ b/drivers/net/dsa/ar8xxx.c

 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 and
 + * only version 2 as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.

This states the license is GPL v2.

 +MODULE_LICENSE("GPL");

And this states, according to include/linux/module.h, that the license
is GPL v2 or later. So I think that either the comment at the top of
this file or the ident used in the MODULE_LICENSE() macro needs to
change.


Paul Bolle
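
To make the mismatch concrete, here is a small userspace sketch that
maps the two idents in question to their meaning. The strings are
paraphrased from the comment in include/linux/module.h as I understand
it at the time of this thread; license_ident_meaning() is a
hypothetical helper for illustration, not a kernel API.

```c
#include <string.h>

/*
 * Meaning of two MODULE_LICENSE() idents, paraphrased from the comment
 * in include/linux/module.h. Other idents exist and are not covered by
 * this sketch.
 */
static const char *license_ident_meaning(const char *ident)
{
	if (strcmp(ident, "GPL") == 0)
		return "GNU Public License v2 or later";
	if (strcmp(ident, "GPL v2") == 0)
		return "GNU Public License v2";
	return "not covered by this sketch";
}
```

Under that reading, a file whose header says "version 2 and only
version 2" would pair with the "GPL v2" ident.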


[Patch v3 30/36] net/mlx4: Cache irq_desc->affinity instead of irq_desc

2015-06-01 Thread Jiang Liu

The field 'affinity' in irq_desc won't change once the irq_desc data
structure is created. So cache irq_desc->affinity instead of irq_desc.
This also helps to hide struct irq_desc from device drivers.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c   |6 +++---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |5 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c 
b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 22da4d0d0f05..a03a01625398 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -31,6 +31,7 @@
  *
  */
 
+#include <linux/irq.h>
 #include <linux/mlx4/cq.h>
 #include <linux/mlx4/qp.h>
 #include <linux/mlx4/cmd.h>
@@ -135,9 +136,8 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct 
mlx4_en_cq *cq,
mdev->dev->caps.num_comp_vectors;
}
 
-   cq->irq_desc =
-   irq_to_desc(mlx4_eq_get_irq(mdev->dev,
-   cq->vector));
+   cq->irq_affinity = irq_get_affinity_mask(
+   mlx4_eq_get_irq(mdev->dev, cq->vector));
} else {
/* For TX we use the same irq per
ring we assigned for the RX*/
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2a77a6b19121..5675febf478e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1044,14 +1044,11 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int 
budget)
/* If we used up all the quota - we're probably not done yet... */
if (done == budget) {
int cpu_curr;
-   const struct cpumask *aff;
 
INC_PERF_COUNTER(priv->pstats.napi_quota);
 
cpu_curr = smp_processor_id();
-   aff = irq_desc_get_irq_data(cq->irq_desc)->affinity;
-
-   if (likely(cpumask_test_cpu(cpu_curr, aff)))
+   if (likely(cpumask_test_cpu(cpu_curr, cq->irq_affinity)))
return budget;
 
/* Current cpu is not according to smp_irq_affinity -
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h 
b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d021f079f181..33d544a1fe84 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -357,7 +357,7 @@ struct mlx4_en_cq {
 #define CQ_USER_PEND (MLX4_EN_CQ_STATE_POLL | MLX4_EN_CQ_STATE_POLL_YIELD)
spinlock_t poll_lock; /* protects from LLS/napi conflicts */
 #endif  /* CONFIG_NET_RX_BUSY_POLL */
-   struct irq_desc *irq_desc;
+   struct cpumask  *irq_affinity;
 };
 
 struct mlx4_en_port_profile {
-- 
1.7.10.4


Re: [PATCH net] xen: netback: read hotplug script once at start of day.

2015-06-01 Thread Ian Campbell

On Fri, 2015-05-29 at 18:38 +0100, Wei Liu wrote:
 On Fri, May 29, 2015 at 05:24:53PM +0100, Ian Campbell wrote:
 [...]
  if (be->vif != NULL)
  return 0;
  @@ -417,12 +409,23 @@ static int backend_create_xenvif(struct backend_info 
  *be)
  return (err < 0) ? err : -EINVAL;
  }
   
  +   script = xenbus_read(XBT_NIL, dev->nodename, "script", NULL);
  +   if (IS_ERR(script)) {
  +   int err = PTR_ERR(script);
  +   xenbus_dev_fatal(dev, err, "reading script");
  +   return err;
  +   }
  +
  vif = xenvif_alloc(&dev->dev, dev->otherend_id, handle);
  if (IS_ERR(vif)) {
  err = PTR_ERR(vif);
  xenbus_dev_fatal(dev, err, "creating interface");
  +   kfree(script);
  return err;
  }
  +
  +   vif->hotplug_script = script;
  +
 
 IMO it's better we make xenvif_alloc accept a new parameter called
 script then allocate vif->hotplug_script there. Then free
 vif->hotplug_script in xenvif_free. This way it's less error prone
 because all memory allocated for vif is managed in proper place -
 xenvif_alloc and xenvif_free.

Well, except the allocation is still in xenbus_read via
backend_create_xenvif, but yes I think that refactoring would be an
improvement.

What about storing it in struct backend_info and setting/restoring in
netback_{probe,remove}? That might be best of all?

Ian.



[RFC] [PATCH] net: socket: Fix the wrong returns for recvmsg and sendmsg

2015-06-01 Thread Junling Zheng

Hi, Greg:

We found that after v3.10.73, recvmsg might return -EFAULT where -EINVAL
was expected.

We tested it with the recvmsg01 testcase from the LTP test suite. It sets
msg->msg_namelen to -1 and the recvmsg syscall returns errno 14, which is
unexpected (errno 22 is expected):

recvmsg014  TFAIL  :  invalid socket length ; returned -1 (expected -1),
errno 14 (expected 22)

Linux mainline does not have this bug because commit 08adb7dab happens to
fix it. However, that commit is too large and complex to be backported to
LTS 3.10.

So, I made the following patch to fix the above problem for LTS 3.10.

Cheers,

Junling



Commit 281c9c36 ("net: compat: Update get_compat_msghdr() to match
copy_msghdr_from_user() behaviour") made get_compat_msghdr() return an
error if msg_sys->msg_namelen was negative, which changed the behaviour
of the recvmsg and sendmsg syscalls on a lib32 system:

Before commit 281c9c36, get_compat_msghdr() would not fail; -EINVAL was
returned later, in move_addr_to_user() or elsewhere, if
msg_sys->msg_namelen was invalid, and the syscall then returned -EINVAL,
which is correct.

Now, when msg_sys->msg_namelen is negative, get_compat_msghdr() fails
and tries to return -EINVAL, but the outer syscall returns -EFAULT
directly, which is unexpected.

This patch captures the return value of get_compat_msghdr(), as is
already done for copy_msghdr_from_user(), and returns that value if
get_compat_msghdr() fails.

Fixes: 281c9c36 (net: compat: Update get_compat_msghdr() to match 
copy_msghdr_from_user() behaviour)
Signed-off-by: Junling Zheng zhengjunl...@huawei.com
Signed-off-by: Hanbing Xu xuhanb...@huawei.com
Cc: Li Zefan lize...@huawei.com
Cc: Al Viro v...@zeniv.linux.org.uk
Cc: David Miller da...@davemloft.net
---
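
The change in control flow can be sketched in plain userspace C. The
three functions below are hypothetical stand-ins, not kernel code:
parse_hdr_sketch() plays the role of get_compat_msghdr(), and the two
callers show the behaviour before and after this patch under the
assumption that the helper rejects a negative name length with -EINVAL.

```c
#include <errno.h>

/* Stand-in for get_compat_msghdr(): rejects a negative name length. */
static int parse_hdr_sketch(int namelen)
{
	return namelen < 0 ? -EINVAL : 0;
}

/* Before: any helper failure is reported as -EFAULT. */
static int syscall_before_sketch(int namelen)
{
	if (parse_hdr_sketch(namelen))
		return -EFAULT;
	return 0;
}

/* After: the helper's own error code is propagated to the caller. */
static int syscall_after_sketch(int namelen)
{
	int err = parse_hdr_sketch(namelen);

	if (err)
		return err;
	return 0;
}
```
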
 net/socket.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index fc90b4f..53b6e41 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1988,14 +1988,12 @@ static int ___sys_sendmsg(struct socket *sock, struct 
msghdr __user *msg,
int err, ctl_len, total_len;
 
err = -EFAULT;
-   if (MSG_CMSG_COMPAT & flags) {
-   if (get_compat_msghdr(msg_sys, msg_compat))
-   return -EFAULT;
-   } else {
+   if (MSG_CMSG_COMPAT & flags)
+   err = get_compat_msghdr(msg_sys, msg_compat);
+   else
err = copy_msghdr_from_user(msg_sys, msg);
-   if (err)
-   return err;
-   }
+   if (err)
+   return err;
 
if (msg_sys->msg_iovlen > UIO_FASTIOV) {
err = -EMSGSIZE;
@@ -2200,14 +2198,12 @@ static int ___sys_recvmsg(struct socket *sock, struct 
msghdr __user *msg,
struct sockaddr __user *uaddr;
int __user *uaddr_len;
 
-   if (MSG_CMSG_COMPAT & flags) {
-   if (get_compat_msghdr(msg_sys, msg_compat))
-   return -EFAULT;
-   } else {
+   if (MSG_CMSG_COMPAT & flags)
+   err = get_compat_msghdr(msg_sys, msg_compat);
+   else
err = copy_msghdr_from_user(msg_sys, msg);
-   if (err)
-   return err;
-   }
+   if (err)
+   return err;
 
if (msg_sys->msg_iovlen > UIO_FASTIOV) {
err = -EMSGSIZE;
-- 
1.8.3.4


Re: [cdc_ncm] guidance and help refactoring cdc_ncm

2015-06-01 Thread Oliver Neukum

On Mon, 2015-06-01 at 10:24 +0200, Enrico Mioso wrote:
 We are failing on some cases only because of the position we put the
 NDP part of the NCM frame. In fact, this 32-bit driver will work when
 the 16 bit one does, and fail when the 16 bit one does.

I think the discussion would benefit from a clearer explanation of
how those devices need an aggregate to look and what our aggregates
look like.

Regards
Oliver



Re: [cdc_ncm] guidance and help refactoring cdc_ncm

2015-06-01 Thread Enrico Mioso


On Mon, 1 Jun 2015, Oliver Neukum wrote:

==Date: Mon, 1 Jun 2015 14:00:22
==From: Oliver Neukum oneu...@suse.com
==To: Enrico Mioso mrkiko...@gmail.com
==Cc: you...@gmail.com, Greg KH g...@kroah.com, linux-...@vger.kernel.org,
==netdev@vger.kernel.org
==Subject: Re: [cdc_ncm] guidance and help refactoring cdc_ncm
==
==On Mon, 2015-06-01 at 13:41 +0200, Enrico Mioso wrote:
== Thank you Oliver, thank you all for reading this thread and the attention.
== For a more detailed discussion and how we got here, you might google for
== the thread:
== Is this 32-bit NCM?
== and
== Is this 32-bit NCM?y (follow up).
== Where the y letter comes from a mistake of mine.
==
==Having read them it looks like the issues of padding and
==sequence numbers are open.
==
== The specification only mandates the position of the NTH (header). The
== rest can be in any order and position in general. This will work with most
== devices: except, of course, those from Huawei.
==
==Indeed. And a redesign for crap devices looks like
==a bad idea.
==
== Our aggregate looks something like this from my perspective (anyone correct
== me in case):
== NTH: header
== NDP: contains indexing information
== ethernet packet 1
== ethernet packet 2
== ...
== ethernet packet n;
== 
== While it should look like:
== NTH: header
== ethernet packet 1
== ethernet packet 2
== ...
== ethernet packet n;
== NDP: contains indexing information
== 
== but, when introducing such a change: you might break other devices now
== working. In fact, clearly there are multiple vendors producing NCM devices,
== as you might also see by looking at the driver's comments.
== So in general, we should be able to dynamically change the way in which the
== driver orders things in the package.
== and that's why I initially proposed the redesign.
==
==OK, so the NDP needs to be at the end. However in the old thread
==you state that this requires the NDP to be built between the
==final aggregate and physically transmitting. I think this is a false
==choice. You could just as well copy the NDP around provided you reserve
==enough space at the end of the skb.
Yes, probably you can do so. I am not against anything at this moment.
==
==  Regards
==  Oliver
==
==
==
==
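
To make the two layouts discussed above concrete, here is a small
userspace sketch that computes where the NDP would sit in a 16-bit NTB
for each placement. It is only an illustration under stated
assumptions: the 12-byte NTH16, the 8-byte NDP16 header and the 4-byte
datagram entries follow my reading of the NCM specification, every
element is simply aligned to 4 bytes, and all names (ntb16_layout(),
NDP_AFTER_NTH, NDP_AT_END and so on) are made up for the example
rather than taken from cdc_ncm.c.

```c
#include <stddef.h>

#define NTH16_LEN      12u  /* 16-bit NTB header */
#define NDP16_HDR_LEN   8u  /* NDP16 signature, length, next index */
#define DPE16_LEN       4u  /* one datagram index/length pair */

enum ndp_placement { NDP_AFTER_NTH, NDP_AT_END };

/* Round n up to the next multiple of 4 (assumed alignment). */
static size_t align4(size_t n)
{
	return (n + 3u) & ~(size_t)3u;
}

/* NDP16 size for n datagrams, including the terminating zero entry. */
static size_t ndp16_len(size_t n)
{
	return NDP16_HDR_LEN + (n + 1u) * DPE16_LEN;
}

/*
 * Return the offset of the NDP inside the block (what the NTH would
 * carry as its NDP index) and store the total block length in
 * *block_len. lens[] holds the length of each of the n datagrams.
 */
static size_t ntb16_layout(enum ndp_placement where, const size_t *lens,
			   size_t n, size_t *block_len)
{
	size_t off = NTH16_LEN;
	size_t ndp = 0;
	size_t i;

	if (where == NDP_AFTER_NTH) {
		ndp = off;
		off += ndp16_len(n);
	}
	for (i = 0; i < n; i++)
		off = align4(off) + lens[i];
	if (where == NDP_AT_END) {
		ndp = align4(off);
		off = ndp + ndp16_len(n);
	}
	*block_len = off;
	return ndp;
}
```

With the NDP at the end, its offset depends on the datagrams in front
of it, so it is only known once the aggregate is complete. That is why
reserving room at the tail of the skb, as suggested above, would be one
way to keep a single code path for both placements.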

[PATCH v2 net-next] vlan: Add GRO support for non hardware accelerated vlan

2015-06-01 Thread Toshiaki Makita

Currently packets with non-hardware-accelerated vlan cannot be handled
by GRO. This causes low performance for 802.1ad and stacked vlan, as their
vlan tags are currently not stripped by hardware.

This patch adds GRO support for non-hardware-accelerated vlan and
improves their receive performance.

Test Environment:
 vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599)

Result:

- Before

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    5233.17

Rx side CPU usage:
  %usr  %sys  %irq %soft %idle
  0.27 58.03  0.00 41.70  0.00

- After

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    7586.85

Rx side CPU usage:
  %usr  %sys  %irq %soft %idle
  0.50 25.83  0.00 59.53 14.14

Signed-off-by: Toshiaki Makita makita.toshi...@lab.ntt.co.jp
---
v2:
- Add compare_vlan_header() as per Eric Dumazet.
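
For readers who want to try the comparison outside the kernel, here is
a userspace sketch of the same idea: treat the 4-byte vlan header as
one 32-bit word and XOR the two words, so that zero means "equal". It
uses memcpy() instead of a pointer cast so that it does not depend on
alignment; struct vlan_hdr_sketch and compare_vlan_header_sketch() are
illustrative names, not the kernel definitions.

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for struct vlan_hdr: two 16-bit fields. */
struct vlan_hdr_sketch {
	uint16_t tci;
	uint16_t encapsulated_proto;
};

/* Returns 0 if both 4-byte headers are identical, non-zero otherwise. */
static uint32_t compare_vlan_header_sketch(const void *h1, const void *h2)
{
	uint32_t a, b;

	/* memcpy() avoids assuming 32-bit alignment of either header. */
	memcpy(&a, h1, sizeof(a));
	memcpy(&b, h2, sizeof(b));
	return a ^ b;
}
```
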

 include/linux/if_vlan.h | 20 +++
 net/8021q/vlan.c| 94 +
 2 files changed, 114 insertions(+)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index a40d298..67ce5bd 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -628,4 +628,24 @@ static inline netdev_features_t vlan_features_check(const 
struct sk_buff *skb,
return features;
 }
 
+/**
+ * compare_vlan_header - Compare two vlan headers
+ * @h1: Pointer to vlan header
+ * @h2: Pointer to vlan header
+ *
+ * Compare two vlan headers, returns 0 if equal.
+ *
+ * Please note that alignment of h1 & h2 are only guaranteed to be 16 bits.
+ */
+static inline unsigned long compare_vlan_header(const struct vlan_hdr *h1,
+   const struct vlan_hdr *h2)
+{
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+   return *(u32 *)h1 ^ *(u32 *)h2;
+#else
+   return ((__force u32)h1->h_vlan_TCI ^ (__force u32)h2->h_vlan_TCI) |
+  ((__force u32)h1->h_vlan_encapsulated_proto ^
+   (__force u32)h2->h_vlan_encapsulated_proto);
+#endif
+}
 #endif /* !(_LINUX_IF_VLAN_H_) */
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 59555f0..9c4f884 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -618,6 +618,90 @@ out:
return err;
 }
 
+static struct sk_buff **vlan_gro_receive(struct sk_buff **head,
+struct sk_buff *skb)
+{
+   struct sk_buff *p, **pp = NULL;
+   struct vlan_hdr *vhdr;
+   unsigned int hlen, off_vlan;
+   const struct packet_offload *ptype;
+   __be16 type;
+   int flush = 1;
+
+   off_vlan = skb_gro_offset(skb);
+   hlen = off_vlan + sizeof(*vhdr);
+   vhdr = skb_gro_header_fast(skb, off_vlan);
+   if (skb_gro_header_hard(skb, hlen)) {
+   vhdr = skb_gro_header_slow(skb, hlen, off_vlan);
+   if (unlikely(!vhdr))
+   goto out;
+   }
+
+   type = vhdr->h_vlan_encapsulated_proto;
+
+   rcu_read_lock();
+   ptype = gro_find_receive_by_type(type);
+   if (!ptype)
+   goto out_unlock;
+
+   flush = 0;
+
+   for (p = *head; p; p = p->next) {
+   struct vlan_hdr *vhdr2;
+
+   if (!NAPI_GRO_CB(p)->same_flow)
+   continue;
+
+   vhdr2 = (struct vlan_hdr *)(p->data + off_vlan);
+   if (compare_vlan_header(vhdr, vhdr2))
+   NAPI_GRO_CB(p)->same_flow = 0;
+   }
+
+   skb_gro_pull(skb, sizeof(*vhdr));
+   skb_gro_postpull_rcsum(skb, vhdr, sizeof(*vhdr));
+   pp = ptype->callbacks.gro_receive(head, skb);
+
+out_unlock:
+   rcu_read_unlock();
+out:
+   NAPI_GRO_CB(skb)->flush |= flush;
+
+   return pp;
+}
+
+static int vlan_gro_complete(struct sk_buff *skb, int nhoff)
+{
+   struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + nhoff);
+   __be16 type = vhdr->h_vlan_encapsulated_proto;
+   struct packet_offload *ptype;
+   int err = -ENOENT;
+
+   rcu_read_lock();
+   ptype = gro_find_complete_by_type(type);
+   if (ptype)
+   err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*vhdr));
+
+   rcu_read_unlock();
+   return err;
+}
+
+static struct packet_offload vlan_packet_offloads[] __read_mostly = {
+   {
+   .type = cpu_to_be16(ETH_P_8021Q),
+   .callbacks = {
+   .gro_receive = vlan_gro_receive,
+   .gro_complete = vlan_gro_complete,
+   },
+   },
+   {
+   .type = cpu_to_be16(ETH_P_8021AD),
+   .callbacks = {
+   .gro_receive =

Re: [PATCH] xen: netback: fix error printf format string.

2015-06-01 Thread Ian Campbell

On Sun, 2015-05-31 at 21:26 -0700, David Miller wrote:
 From: Ian Campbell ian.campb...@citrix.com
 Date: Fri, 29 May 2015 17:22:04 +0100
 
  drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
  drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects 
  argument of type ‘long unsigned int’, but argument 5 has type ‘int’ 
  [-Wformat=]
  (txreq.offset&~PAGE_MASK) + txreq.size);
  ^
  
  txreq.offset and .size are uint16_t fields.
  
  Signed-off-by: Ian Campbell ian.campb...@citrix.com
 
 This may get rid of the compiler warning on your machine, but it creates
 one on mine:
 
 drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
 drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%u’ expects 
 argument of type ‘unsigned int’, but argument 5 has type ‘long unsigned int’ 
 [-Wformat=]
 (txreq.offset&~PAGE_MASK) + txreq.size);
 ^
 
 There is a type involved in this calculation which is arch
 dependent, so you'll need to add a cast or something to
 make this warning go away in all cases.

Ah, I only considered the types txreq.{offset,size} and missed thinking
about PAGE_MASK.

I'll resend with a cast.

Ian.
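
A userspace sketch of the kind of cast meant here, for illustration
only: the expression is forced to one fixed type before it reaches the
format string, so the specifier matches regardless of how the page
mask happens to be typed. SKETCH_PAGE_SIZE, SKETCH_PAGE_MASK and
format_end_offset() are made-up names, and the 4096-byte page is an
assumption; this is not the eventual netback fix.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed 4 KiB page; the mask type stands in for an arch-defined one. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Format (offset within page) + size. The explicit cast pins the
 * argument type to unsigned long so that "%lu" is always correct.
 */
static int format_end_offset(char *buf, size_t len, uint16_t offset,
			     uint16_t size)
{
	unsigned long end = (unsigned long)(offset & ~SKETCH_PAGE_MASK) + size;

	return snprintf(buf, len, "%lu", end);
}
```
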


Re: [PATCH net] xen: netback: read hotplug script once at start of day.

2015-06-01 Thread Wei Liu

On Mon, Jun 01, 2015 at 09:52:45AM +0100, Ian Campbell wrote:
 On Fri, 2015-05-29 at 18:38 +0100, Wei Liu wrote:
  On Fri, May 29, 2015 at 05:24:53PM +0100, Ian Campbell wrote:
  [...]
 if (be->vif != NULL)
 return 0;
   @@ -417,12 +409,23 @@ static int backend_create_xenvif(struct 
   backend_info *be)
 return (err < 0) ? err : -EINVAL;
 }

   + script = xenbus_read(XBT_NIL, dev->nodename, "script", NULL);
   + if (IS_ERR(script)) {
   + int err = PTR_ERR(script);
   + xenbus_dev_fatal(dev, err, "reading script");
   + return err;
   + }
   +
 vif = xenvif_alloc(&dev->dev, dev->otherend_id, handle);
 if (IS_ERR(vif)) {
 err = PTR_ERR(vif);
 xenbus_dev_fatal(dev, err, "creating interface");
   + kfree(script);
 return err;
 }
   +
   + vif->hotplug_script = script;
   +
  
  IMO it's better we make xenvif_alloc accept a new parameter called
  script then allocate vif->hotplug_script there. Then free
  vif->hotplug_script in xenvif_free. This way it's less error prone
  because all memory allocated for vif is managed in proper place -
  xenvif_alloc and xenvif_free.
 
 Well, except the allocation is still in xenbus_read via
 backend_create_xenvif, but yes I think that refactoring would be an
 improvement.
 
 What about storing it in struct backend_info and setting/restoring in
 netback_{probe,remove}? That might be best of all?
 

Yes, that would be best.

Wei.

 Ian.
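
The ownership model agreed on above can be sketched in userspace C:
the script name is copied once at probe time, later consumers read the
cached copy, and remove frees it. Everything here is hypothetical
(struct backend_info_sketch, sketch_probe(), sketch_uevent_script(),
sketch_remove()); the store_value argument stands in for whatever
xenbus_read() would return.

```c
#include <stdlib.h>
#include <string.h>

struct backend_info_sketch {
	char *hotplug_script;	/* owned copy, valid from probe to remove */
};

/* Copy the script name once, while the backing store still exists. */
static int sketch_probe(struct backend_info_sketch *be,
			const char *store_value)
{
	size_t n;

	if (!store_value)
		return -1;
	n = strlen(store_value) + 1;
	be->hotplug_script = malloc(n);
	if (!be->hotplug_script)
		return -1;
	memcpy(be->hotplug_script, store_value, n);
	return 0;
}

/* Later users read the cached copy instead of the store. */
static const char *sketch_uevent_script(const struct backend_info_sketch *be)
{
	return be->hotplug_script;
}

/* Free the copy when the backend goes away. */
static void sketch_remove(struct backend_info_sketch *be)
{
	free(be->hotplug_script);
	be->hotplug_script = NULL;
}
```
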
 

[PATCH net-next] cxgb4: remove unused fn to enable/disable db coalescing

2015-06-01 Thread Hariprasad Shenai

Remove the unused functions cxgb4_enable_db_coalescing() and
cxgb4_disable_db_coalescing().

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 19 ---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |  2 --
 2 files changed, 21 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 4f69b52..974b27c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2069,25 +2069,6 @@ out:
 }
 EXPORT_SYMBOL(cxgb4_sync_txq_pidx);
 
-void cxgb4_disable_db_coalescing(struct net_device *dev)
-{
-   struct adapter *adap;
-
-   adap = netdev2adap(dev);
-   t4_set_reg_field(adap, SGE_DOORBELL_CONTROL_A, NOCOALESCE_F,
-NOCOALESCE_F);
-}
-EXPORT_SYMBOL(cxgb4_disable_db_coalescing);
-
-void cxgb4_enable_db_coalescing(struct net_device *dev)
-{
-   struct adapter *adap;
-
-   adap = netdev2adap(dev);
-   t4_set_reg_field(adap, SGE_DOORBELL_CONTROL_A, NOCOALESCE_F, 0);
-}
-EXPORT_SYMBOL(cxgb4_enable_db_coalescing);
-
 int cxgb4_read_tpte(struct net_device *dev, u32 stag, __be32 *tpte)
 {
struct adapter *adap;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index df34293..14e8110 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -298,8 +298,6 @@ struct sk_buff *cxgb4_pktgl_to_skb(const struct pkt_gl *gl,
   unsigned int skb_len, unsigned int pull_len);
 int cxgb4_sync_txq_pidx(struct net_device *dev, u16 qid, u16 pidx, u16 size);
 int cxgb4_flush_eq_cache(struct net_device *dev);
-void cxgb4_disable_db_coalescing(struct net_device *dev);
-void cxgb4_enable_db_coalescing(struct net_device *dev);
 int cxgb4_read_tpte(struct net_device *dev, u32 stag, __be32 *tpte);
 u64 cxgb4_read_sge_timestamp(struct net_device *dev);
 
-- 
2.3.4


Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.

2015-06-01 Thread Bjørn Mork

Andrew Lunn and...@lunn.ch writes:

 So the ports look like normal ports, and you configure them using the
 normal mechanisms.

 DSA does not use vlans. It uses an additional protocol header which
 the switch supports, to allow the CPU to direct packets out a specific
 port. Similarly, packets coming to the CPU from a port are marked with
 the port they ingressed. This means the ports are completely separated
 by default. When you add interfaces to a bridge, calls are made by the
 bridge code into DSA to setup the switch to hardware bridge the
 interface. And if the switch driver does not support it, software
 bridging is used instead. Unless you know what is going on under the
 hood, you have no idea that eth0 and eth1 are used to carry packets to
 the switch, and that the switch is bridging the interfaces. So it is
 linux concepts, with some hardware acceleration.

Thanks a lot.  This filled most of the blank spots. I should have done
some research into what dsa actually is before posting my question.

 Now back to your question. What is clearly hardware and needs to go
 into device tree is the mapping between switch ports and cpu
 ports. eth0 -- port 6, eth1 -- port 5.

 But i've reconsidered putting into device tree the load balancing of
 which slave ports, lan[1-4], wan, are attached to which master port,
 eth[01]. It should not be in DT. We want a sensible default, which i
 would say is what i had in DT, allocate them every other, but
 implement this in software. And allow the user to move slaves between
 masters, using a user space command. Something like:

 ip link set dev lan4 master eth0

 So if you wish, you can then have eth1 dedicated to WAN, and eth0 for
 lan[1-4]. Or any other combination.

 I would say implementing this command to move a slave between masters
 can come later, so long as we have a default which works for most
 people. Using every other is clearly better than only using one
 interface.

Yes, that sounds reasonable.
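
As I read it, the "every other" default amounts to a round-robin
assignment of slave ports to CPU (master) ports. A minimal sketch of
that reading follows; default_master_for_slave() is a hypothetical
helper and not part of the DSA code.

```c
/*
 * Round-robin default: slave 0 goes to master 0, slave 1 to master 1,
 * slave 2 back to master 0, and so on. With no masters at all the
 * function falls back to index 0 to avoid dividing by zero.
 */
static unsigned int default_master_for_slave(unsigned int slave_index,
					     unsigned int n_masters)
{
	return n_masters ? slave_index % n_masters : 0;
}
```

With two masters (eth0, eth1) and five slaves (lan1-4, wan) this
spreads the slaves 3/2 across the masters.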

But I do still wonder if this model can be made flexible enough. How
about a switch having more CPU ports than external ports (just an
imaginary product - I don't know if anyone is crazy enough to make it)?
Or what if I'd like to dedicate CPU port eth0 to VLAN 13, while CPU port
eth1 handles everything else?  With lan0 carrying an 802.1q trunk with
both VLAN 13 and more, i.e. a mix of packets for both eth0 and eth1?

Well, I'm being difficult now :) We can probably do fine without being
able to express those things.  And I realize that I'm a bit too late
into any discussion about modelling this.

Thanks again for taking the time to write such a good answer.



Bjørn

Re: 2.6.32.66 tcp regression OOPs

2015-06-01 Thread Willy Tarreau

Hi,

On Mon, Jun 01, 2015 at 09:00:21AM +0200, Frans Klaver wrote:
 [cc: Willy Tarreau]
 
 On Mon, Jun 1, 2015 at 3:26 AM,  starlight.201...@binnacle.cx wrote:
  Hello,
 
  Apologies if I have submitted to the wrong lists.
 
  Encountered a regression in
  2.6.32.66 relative to 2.6.32.65.
 
  Crash eight minutes after boot.
 
  Will respond with additional details
  if the OOPS is not sufficient.
 
  Best Regards
 
 
 Did you bisect it?

Eric Dumazet notified me of something possibly similar due to
a mistake I made when backporting a fix by hand.

Please apply the following patch to see if it fixes the problem :

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c  
index 5339f066234b..d1e2895bb63c 100644 
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2136,7 +2136,7 @@ void tcp_send_fin(struct sock *sk)
 */
if (tskb && (tcp_send_head(sk) || tcp_memory_pressure)) {
 coalesce:
-   TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_FIN;
+   TCP_SKB_CB(tskb)->flags |= TCPCB_FLAG_FIN;
TCP_SKB_CB(tskb)->end_seq++;
tp->write_seq++;
if (!tcp_send_head(sk)) {

Thanks,
Willy
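
For readers following along, here is a minimal userspace sketch of what the one-character fix changes. It assumes (the thread does not confirm this) that in the hand-backported code `skb` does not yet point at a valid buffer when the `coalesce:` label is reached, so the FIN flag has to go on the tail segment `tskb`. The names `toy_skb`, `TOY_FLAG_FIN` and `toy_coalesce_fin()` are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAG_FIN 0x01u /* hypothetical stand-in for TCPCB_FLAG_FIN */

/* Hypothetical, stripped-down model of a queued segment. */
struct toy_skb {
	uint8_t  flags;
	uint32_t end_seq;
};

/*
 * Piggy-back the FIN on the last queued segment (tail) instead of
 * allocating a new one. Returns 1 if the FIN was coalesced, 0 if the
 * caller must fall back to sending a separate FIN segment.
 */
static int toy_coalesce_fin(struct toy_skb *tail, uint32_t *write_seq)
{
	assert(write_seq != NULL);

	if (tail == NULL)
		return 0;

	tail->flags |= TOY_FLAG_FIN; /* flag the tail, not an unrelated pointer */
	tail->end_seq++;             /* a FIN consumes one sequence number */
	(*write_seq)++;
	return 1;
}
```

The point of the sketch is only that every update in the coalescing path touches the same segment; mixing two pointers, as in the removed line, is what the patch corrects.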


Re: [cdc_ncm] guidance and help refactoring cdc_ncm

2015-06-01 Thread Oliver Neukum

On Mon, 2015-06-01 at 08:53 +0200, Enrico Mioso wrote:

> A 32-bit version of the driver (talking 32-bit NCM) is here:
> http://www.gstorm.eu/cdc_ncm.c
> I modified the original driver with the help of a very talented friend.
> It works: but there seem to be no real reasons to implement this properly. We
> did this in a consistent effort to understand what was wrong, and here it is.

Well, you are really talking about two different things here.
Do we ever fail because we only do 16 bit as opposed to 32 bit?
Your 32 bit driver looks good, but it raises the question of what to do
if this test:

if (le16_to_cpu(ctx->ncm_parm.bmNtbFormatsSupported) &
USB_CDC_NCM_NTB32_SUPPORTED) {

fails. Otherwise it looks ready for merging.

Regards
Oliver
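
One possible answer to the question above, sketched in userspace C: fall back to 16-bit NTBs when the 32-bit bit is not advertised. This is only an illustration under stated assumptions. The `TOY_*` constants are local copies of the bit positions the NCM specification gives for `bmNtbFormatsSupported`, and `toy_pick_ntb_format()` is a hypothetical helper, not a function from cdc_ncm.c.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the bmNtbFormatsSupported bits so the sketch builds
 * outside the kernel tree. */
#define TOY_NTB16_SUPPORTED (1u << 0)
#define TOY_NTB32_SUPPORTED (1u << 1)

enum toy_ntb_format { TOY_NTB_NONE = 0, TOY_NTB16 = 16, TOY_NTB32 = 32 };

/* Decode a little-endian 16-bit field from raw descriptor bytes. */
static uint16_t toy_le16_to_cpu(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8));
}

/* Prefer 32-bit NTBs when advertised, otherwise fall back to 16-bit. */
static enum toy_ntb_format toy_pick_ntb_format(const uint8_t raw[2])
{
	uint16_t formats = toy_le16_to_cpu(raw);

	if (formats & TOY_NTB32_SUPPORTED)
		return TOY_NTB32;
	if (formats & TOY_NTB16_SUPPORTED)
		return TOY_NTB16;
	return TOY_NTB_NONE;
}
```

Whether a merged driver should prefer 32-bit or stay on 16-bit by default is a policy question the thread leaves open; the sketch only shows that the test can select a format rather than gate the whole driver.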




[PATCH v2 net] xen: netback: read hotplug script once at start of day.

2015-06-01 Thread Ian Campbell

When we come to tear things down in netback_remove() and generate the
uevent it is possible that the xenstore directory has already been
removed (details below).

In such cases netback_uevent() won't be able to read the hotplug
script and will write a xenstore error node.

A recent change to the hypervisor exposed this race such that we now
sometimes lose it (where apparently we didn't ever before).

Instead read the hotplug script configuration during setup and use it
for the lifetime of the backend device.

The apparently more obvious fix of moving the transition to
state=Closed in netback_remove() to after the uevent does not work
because it is possible that we are already in state=Closed (in
reaction to the guest having disconnected as it shut down). Being
already in Closed means the toolstack is at liberty to start tearing
down the xenstore directories. In principle it might be possible to
arrange to unregister the device sooner (e.g. on transition to Closing)
such that xenstore would still be there but this state machine is
fragile and prone to anger...

A modern Xen system only relies on the hotplug uevent for driver
domains, when the backend is in the same domain as the toolstack it
will run the necessary setup/teardown directly in the correct sequence
wrt xenstore changes.

Signed-off-by: Ian Campbell <ian.campb...@citrix.com>
---
v2: Move script to backend_info and read/free it in
netback_{probe,remove}.

DaveM, could this go to all stable trees please.
---
 drivers/net/xen-netback/xenbus.c |   33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3d8dbf5..6380b28 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -34,6 +34,8 @@ struct backend_info {
enum xenbus_state frontend_state;
struct xenbus_watch hotplug_status_watch;
u8 have_hotplug_status_watch:1;
+
+   const char *hotplug_script;
 };
 
 static int connect_rings(struct backend_info *be, struct xenvif_queue *queue);
@@ -238,6 +240,7 @@ static int netback_remove(struct xenbus_device *dev)
 		xenvif_free(be->vif);
 		be->vif = NULL;
 	}
+	kfree(be->hotplug_script);
 	kfree(be);
 	dev_set_drvdata(&dev->dev, NULL);
 	return 0;
@@ -255,6 +258,7 @@ static int netback_probe(struct xenbus_device *dev,
 	struct xenbus_transaction xbt;
 	int err;
 	int sg;
+	const char *script;
 	struct backend_info *be = kzalloc(sizeof(struct backend_info),
 					  GFP_KERNEL);
 	if (!be) {
@@ -347,6 +351,15 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		pr_debug("Error writing multi-queue-max-queues\n");
 
+	script = xenbus_read(XBT_NIL, dev->nodename, "script", NULL);
+	if (IS_ERR(script)) {
+		err = PTR_ERR(script);
+		xenbus_dev_fatal(dev, err, "reading script");
+		goto fail;
+	}
+
+	be->hotplug_script = script;
+
 	err = xenbus_switch_state(dev, XenbusStateInitWait);
 	if (err)
 		goto fail;
@@ -379,22 +392,14 @@ static int netback_uevent(struct xenbus_device *xdev,
 			  struct kobj_uevent_env *env)
 {
 	struct backend_info *be = dev_get_drvdata(&xdev->dev);
-	char *val;
 
-	val = xenbus_read(XBT_NIL, xdev->nodename, "script", NULL);
-	if (IS_ERR(val)) {
-		int err = PTR_ERR(val);
-		xenbus_dev_fatal(xdev, err, "reading script");
-		return err;
-	} else {
-		if (add_uevent_var(env, "script=%s", val)) {
-			kfree(val);
-			return -ENOMEM;
-		}
-		kfree(val);
-	}
+	if (!be)
+		return 0;
+
+	if (add_uevent_var(env, "script=%s", be->hotplug_script))
+		return -ENOMEM;
 
-	if (!be || !be->vif)
+	if (!be->vif)
 		return 0;
 
 	return add_uevent_var(env, "vif=%s", be->vif->dev->name);
-- 
1.7.10.4
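
The pattern in the patch above (read a configuration value once at setup, cache it for the lifetime of the device, free it at teardown) can be sketched in plain userspace C. Everything here is hypothetical: `toy_store` stands in for the xenstore directory, `toy_backend` for `struct backend_info`, and the script path in the usage note is just an example value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the xenstore directory: NULL once torn down. */
struct toy_store {
	const char *script;
};

struct toy_backend {
	char *hotplug_script; /* private copy, owned by the backend */
};

/* Copy a string into freshly allocated memory; NULL on failure. */
static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Setup: read the value once and keep a copy. Returns 0 or -1. */
static int toy_probe(struct toy_backend *be, const struct toy_store *store)
{
	if (!store->script)
		return -1;
	be->hotplug_script = toy_strdup(store->script);
	return be->hotplug_script ? 0 : -1;
}

/* Event time: never touches the store, so a teardown race cannot hurt. */
static const char *toy_uevent_script(const struct toy_backend *be)
{
	return be->hotplug_script;
}

/* Teardown: release the cached copy. */
static void toy_remove(struct toy_backend *be)
{
	free(be->hotplug_script);
	be->hotplug_script = NULL;
}
```

A caller would run `toy_probe()` while the store still exists, after which `toy_uevent_script()` keeps returning the cached path (say `/etc/xen/scripts/vif-bridge`) even once the store entry is gone.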


Re: [cdc_ncm] guidance and help refactoring cdc_ncm

2015-06-01 Thread Enrico Mioso

Thank you Oliver for the reply.


On Mon, 1 Jun 2015, Oliver Neukum wrote:

==Date: Mon, 1 Jun 2015 09:48:26
==From: Oliver Neukum oneu...@suse.com
==To: Enrico Mioso mrkiko...@gmail.com
==Cc: Greg KH g...@kroah.com, linux-...@vger.kernel.org,
==netdev@vger.kernel.org, you...@gmail.com
==Subject: Re: [cdc_ncm] guidance and help refactoring cdc_ncm
==
==On Mon, 2015-06-01 at 08:53 +0200, Enrico Mioso wrote:
==
== A 32-bit version of the driver (talking 32-bit NCM) is here:
== http://www.gstorm.eu/cdc_ncm.c
== I modified the original driver with the help of a very talented friend.
== It works: but there seem to be no real reasons to implement this properly. We
== did this in a consistent effort to understand what was wrong, and here it is.
==
==Well, you are really talking about two different things here.
==Do we ever fail because we only do 16 bit as opposed to 32 bit?
==Your 32 bit driver looks good, but it raises the question of what to do
==if this test:
==
==if (le16_to_cpu(ctx->ncm_parm.bmNtbFormatsSupported) &
==USB_CDC_NCM_NTB32_SUPPORTED) {
==
==fails. Otherwise it looks ready for merging.
==
==  Regards
==  Oliver
==
==
==
==

Oh - I am sorry. In fact, I am talking about two different things here. No, I've
seen no device failing because of this (16 vs. 32 bit problem).
I was mentioning this only as an additional thing to consider, at least from my
side.
We are failing in some cases only because of the position where we put the NDP
part of the NCM frame. In fact, this 32-bit driver will work when the 16 bit one
does, and fail when the 16 bit one does.
Thank you for your review and consideration.
It would be nice to see this code merged - but I think we might need to merge
the two drivers in a way that reduces code duplication. But this might be
work for a second take.
Thank you again,
Enrico

Re: [PATCH v6 2/8] tun: add tun_is_little_endian() helper

2015-06-01 Thread Michael S. Tsirkin

On Fri, Apr 24, 2015 at 02:24:38PM +0200, Greg Kurz wrote:
> Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>

Dave, could you please ack merging this through
the virtio tree?

> ---
>  drivers/net/tun.c |9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 857dca4..3c3d6c0 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -206,14 +206,19 @@ struct tun_struct {
>  	u32 flow_count;
>  };
>
> +static inline bool tun_is_little_endian(struct tun_struct *tun)
> +{
> +	return tun->flags & TUN_VNET_LE;
> +}
> +
>  static inline u16 tun16_to_cpu(struct tun_struct *tun, __virtio16 val)
>  {
> -	return __virtio16_to_cpu(tun->flags & TUN_VNET_LE, val);
> +	return __virtio16_to_cpu(tun_is_little_endian(tun), val);
>  }
>
>  static inline __virtio16 cpu_to_tun16(struct tun_struct *tun, u16 val)
>  {
> -	return __cpu_to_virtio16(tun->flags & TUN_VNET_LE, val);
> +	return __cpu_to_virtio16(tun_is_little_endian(tun), val);
>  }
>
>  static inline u32 tun_hashfn(u32 rxhash)
 

[PATCH v2] xen: netback: fix printf format string warning

2015-06-01 Thread Ian Campbell

drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
        (txreq.offset&~PAGE_MASK) + txreq.size);
        ^

PAGE_MASK's type can vary by arch, so a cast is needed.

Signed-off-by: Ian Campbell <ian.campb...@citrix.com>

v2: Cast to unsigned long, since PAGE_MASK can vary by arch.
---
 drivers/net/xen-netback/netback.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4de46aa..0d25943 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1250,7 +1250,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			netdev_err(queue->vif->dev,
 				   "txreq.offset: %x, size: %u, end: %lu\n",
 				   txreq.offset, txreq.size,
-				   (txreq.offset&~PAGE_MASK) + txreq.size);
+				   (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
 			xenvif_fatal_tx_err(queue->vif);
 			break;
 		}
-- 
1.7.10.4
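
The warning and its fix can be reproduced outside the kernel with a small sketch. The assumptions are loud ones: `TOY_PAGE_SIZE` and `TOY_PAGE_MASK` are local stand-ins with a fixed 4096-byte page, and `toy_format_end()` is a hypothetical helper, not part of netback. The cast mirrors the patch: whatever type the mask arithmetic happens to have on a given architecture, the argument is forced to match `%lu`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096 /* assumed page size for this sketch */
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))

/*
 * Format the end offset of a request within its page. Without the cast
 * the expression's type depends on how the mask is defined, which is
 * what triggers the -Wformat warning on some architectures.
 */
static int toy_format_end(char *buf, size_t len,
			  unsigned int offset, unsigned int size)
{
	return snprintf(buf, len, "end: %lu",
			(unsigned long)(offset & ~TOY_PAGE_MASK) + size);
}
```

For example, an offset of 4196 lies 100 bytes into its page, so with a size of 50 the helper formats an end value of 150.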


Re: [PATCH v6 3/8] macvtap: introduce macvtap_is_little_endian() helper

2015-06-01 Thread Michael S. Tsirkin

On Fri, Apr 24, 2015 at 02:24:48PM +0200, Greg Kurz wrote:
> Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>

Dave, could you please ack merging this through the virtio tree?

> ---
>  drivers/net/macvtap.c |9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 27ecc5c..a2f2958 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -49,14 +49,19 @@ struct macvtap_queue {
>
>  #define MACVTAP_VNET_LE 0x8000
>
> +static inline bool macvtap_is_little_endian(struct macvtap_queue *q)
> +{
> +	return q->flags & MACVTAP_VNET_LE;
> +}
> +
>  static inline u16 macvtap16_to_cpu(struct macvtap_queue *q, __virtio16 val)
>  {
> -	return __virtio16_to_cpu(q->flags & MACVTAP_VNET_LE, val);
> +	return __virtio16_to_cpu(macvtap_is_little_endian(q), val);
>  }
>
>  static inline __virtio16 cpu_to_macvtap16(struct macvtap_queue *q, u16 val)
>  {
> -	return __cpu_to_virtio16(q->flags & MACVTAP_VNET_LE, val);
> +	return __cpu_to_virtio16(macvtap_is_little_endian(q), val);
>  }
>
>  static struct proto macvtap_proto = {
 
Re: 2.6.32.66 tcp regression OOPs

2015-06-01 Thread starlight . 2015q2

Hi,

I found the patch late yesterday and applied it.

Running fine now for 12 hours under active load.

Recommend the patch be rolled into the tarball,
or a notation added to the release page as this
one has severe consequences.

Thank You!


At 09:49 6/1/2015 +0200, Willy Tarreau wrote:
>Hi,
>
>On Mon, Jun 01, 2015 at 09:00:21AM +0200, Frans Klaver wrote:
>> [cc: Willy Tarreau]
>>
>> On Mon, Jun 1, 2015 at 3:26 AM,  <starlight.201...@binnacle.cx> wrote:
>> > Hello,
>> >
>> > Apologies if I have submitted to the wrong lists.
>> >
>> > Encountered a regression in
>> > 2.6.32.66 relative to 2.6.32.65.
>> >
>> > Crash eight minutes after boot.
>> >
>> > Will respond with additional details
>> > if the OOPS is not sufficient.
>> >
>> > Best Regards
>> >
>>
>> Did you bisect it?
>
>Eric Dumazet notified me of something possibly similar due to
>a mistake I made when backporting a fix by hand.
>
>Please apply the following patch to see if it fixes the problem:
>
>diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>index 5339f066234b..d1e2895bb63c 100644
>--- a/net/ipv4/tcp_output.c
>+++ b/net/ipv4/tcp_output.c
>@@ -2136,7 +2136,7 @@ void tcp_send_fin(struct sock *sk)
> 	 */
> 	if (tskb && (tcp_send_head(sk) || tcp_memory_pressure)) {
> coalesce:
>-		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_FIN;
>+		TCP_SKB_CB(tskb)->flags |= TCPCB_FLAG_FIN;
> 		TCP_SKB_CB(tskb)->end_seq++;
> 		tp->write_seq++;
> 		if (!tcp_send_head(sk)) {
>
>Thanks,
>Willy


[PATCH] ipv4/udp: Verify multicast group is ours in udp_v4_early_demux()

2015-06-01 Thread Shawn Bohrer

From: Shawn Bohrer <sboh...@rgmadvisors.com>

421b3885bf6d56391297844f43fb7154a6396e12 "udp: ipv4: Add udp early
demux" introduced a regression that allowed sockets bound to INADDR_ANY
to receive packets from multicast groups that the socket had not joined.
For example a socket that had joined 224.168.2.9 could also receive
packets from 225.168.2.9 despite not having joined that group if
ip_early_demux is enabled.

Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
that the multicast packet is indeed ours.

Signed-off-by: Shawn Bohrer <sboh...@rgmadvisors.com>
Reported-by: Yurij M. Plotnikov <yurij.plotni...@oktetlabs.ru>
---
 net/ipv4/udp.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d10b7e0..17d31f5 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -90,6 +90,7 @@
 #include <linux/socket.h>
 #include <linux/sockios.h>
 #include <linux/igmp.h>
+#include <linux/inetdevice.h>
 #include <linux/in.h>
 #include <linux/errno.h>
 #include <linux/timer.h>
@@ -1959,7 +1960,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
 	struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph;
 	const struct udphdr *uh;
-	struct sock *sk;
+	struct sock *sk = NULL;
 	struct dst_entry *dst;
 	int dif = skb->dev->ifindex;
 
@@ -1971,10 +1972,17 @@ void udp_v4_early_demux(struct sk_buff *skb)
 	uh = udp_hdr(skb);
 
 	if (skb->pkt_type == PACKET_BROADCAST ||
-	    skb->pkt_type == PACKET_MULTICAST)
-		sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
-						   uh->source, iph->saddr, dif);
-	else if (skb->pkt_type == PACKET_HOST)
+	    skb->pkt_type == PACKET_MULTICAST) {
+		struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
+
+		if (in_dev) {
+			int our = ip_check_mc_rcu(in_dev, iph->daddr, iph->saddr,
+						  iph->protocol);
+			if (our)
+				sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
+								   uh->source, iph->saddr, dif);
+		}
+	} else if (skb->pkt_type == PACKET_HOST)
 		sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr,
 					     uh->source, iph->saddr, dif);
 	else
-- 
1.9.3
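
The idea of the fix, checking group membership before the early socket lookup, can be modelled in a few lines of userspace C. This is a sketch under simplifying assumptions: membership is a flat array per interface, source filtering and protocol checks are ignored, and all `toy_*` names are hypothetical rather than kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface list of joined IPv4 multicast groups. */
struct toy_in_device {
	const uint32_t *groups;
	size_t          ngroups;
};

/* Build an IPv4 address in host byte order from dotted-quad parts. */
static uint32_t toy_ip(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | d;
}

/* Return 1 if the interface has joined the group, 0 otherwise. */
static int toy_check_mc(const struct toy_in_device *in_dev, uint32_t group)
{
	size_t i;

	for (i = 0; i < in_dev->ngroups; i++)
		if (in_dev->groups[i] == group)
			return 1;
	return 0;
}

/* Early demux: only attempt the socket lookup for groups that are ours. */
static int toy_early_demux_allowed(const struct toy_in_device *in_dev,
				   uint32_t daddr)
{
	if (in_dev == NULL)
		return 0;
	return toy_check_mc(in_dev, daddr);
}
```

With only 224.168.2.9 joined, a packet to 225.168.2.9 is refused by `toy_early_demux_allowed()`, which is the behaviour the commit message describes as the intended one.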


Re: [PATCH 7/7] mac80211: Switch to new AEAD interface

2015-06-01 Thread Stephan Mueller

On Monday, 1 June 2015, 16:35:26, Johannes Berg wrote:

Hi Johannes,


> IOW, I think something like this would make sense:


That looks definitely cleaner :-)

Though, my main concern was just to ensure that the aad length value is not 
zero.


Ciao
Stephan

[RFC net-next 1/3] net: infra for per-nexthop encap data

2015-06-01 Thread Robert Shearman

Having to add a new interface to apply encap onto a packet is a
mechanism that works well today, allowing the setup of the encap to be
done separately from the routes out of them, meaning that routing
protocols and other user-space apps don't need to do anything special
to add routes out of a new type of interface. However, the overhead of
creating an interface is high, especially in terms of
memory. Therefore, the traditional method won't work very well for
large numbers of routes applying encap where there is a low degree of
sharing of the encap.

The solution is to introduce a way of defining encap on a per-nexthop
basis (i.e. per-route if only one nexthop) through the addition of a
new netlink attribute, RTA_ENCAP. The semantics of this attribute is
that the data is interpreted according to the output interface type
(RTA_OIF) and is opaque to the normal forwarding path. The output
interface doesn't have to be defined per-nexthop, but instead
represents the way of encapsulating the packet. There could be as few
as one per namespace, but more could be created, particularly if they
are used to define parameters which are shared by a large number of
routes. However, the split of what goes in the encap data and what
might be specified via interface attributes is entirely up to the
encap-type implementation.

New rtnetlink operations are defined to assist with the management of
this data:
- parse_encap for parsing the attribute given through rtnl and either
  sizing the in-memory version (if encap ptr is NULL) or filling in the
  in-memory version.  RTA_ENCAP work for IPv4. This operation allows
  the interface to reject invalid encap specified by user-space and the
  sizing allows the kernel to have a different in memory implementation
  to the netlink API (which might be optimised for extensibility rather
  than speed of packet forwarding).
- fill_encap for taking the in-memory version of the encap and filling
  in an RTA_ENCAP attribute in a netlink message.
- match_encap for comparing an in-memory version of encap with an
  RTA_ENCAP version, returning 0 if matching or 1 if different.

A new dst operation is also defined to allow encap-type interfaces to
retrieve the encap data from their xmit functions and use it for
encapsulating the packet and for further forwarding.
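
As an illustration of the size-then-fill convention described for parse_encap and the 0/1 result described for match_encap, here is a self-contained userspace sketch. It is not the code from this series: the `toy_*` functions are hypothetical, and the wire format is assumed to be a flat array of 32-bit values purely for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical parse callback: with encap == NULL it only reports how
 * many bytes the in-memory form needs; otherwise it fills encap.
 * A negative return rejects invalid input from user space.
 */
static int toy_parse_encap(const uint8_t *attr, size_t attr_len, void *encap)
{
	if (attr_len == 0 || attr_len % sizeof(uint32_t) != 0)
		return -1;

	if (encap != NULL)
		memcpy(encap, attr, attr_len);

	return (int)attr_len;
}

/* Return 0 if the attribute matches the stored encap, 1 if different. */
static int toy_match_encap(const uint8_t *attr, size_t attr_len,
			   int encap_len, const void *encap)
{
	if (encap_len < 0 || (size_t)encap_len != attr_len)
		return 1;
	return memcmp(attr, encap, attr_len) != 0;
}
```

A caller would invoke `toy_parse_encap()` once with a NULL buffer to learn the size, allocate that much next to the nexthop, and call it again to fill the buffer. The two-pass shape is what lets the in-memory form differ in size from the netlink form.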

Suggested-by: Eric W. Biederman <ebied...@xmission.com>
Signed-off-by: Robert Shearman <rshea...@brocade.com>
---
 include/linux/rtnetlink.h  |  7 +++
 include/net/dst.h  | 11 +++
 include/net/dst_ops.h  |  2 ++
 include/net/rtnetlink.h| 11 +++
 include/uapi/linux/rtnetlink.h |  1 +
 net/core/rtnetlink.c   | 36 
 6 files changed, 68 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index a2324fb45cf4..470d822ddd61 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -22,6 +22,13 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
 void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
   gfp_t flags);
 
+int rtnl_parse_encap(const struct net_device *dev, const struct nlattr *nla,
+void *encap);
+int rtnl_fill_encap(const struct net_device *dev, struct sk_buff *skb,
+   int encap_len, const void *encap);
+int rtnl_match_encap(const struct net_device *dev, const struct nlattr *nla,
+int encap_len, const void *encap);
+
 
 /* RTNL is used as a global lock for all changes to network configuration  */
 extern void rtnl_lock(void);
diff --git a/include/net/dst.h b/include/net/dst.h
index 2bc73f8a00a9..df0e6ec18eca 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -506,4 +506,15 @@ static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
 }
 #endif
 
+/* Get encap data for destination */
+static inline int dst_get_encap(struct sk_buff *skb, const void **encap)
+{
+	const struct dst_entry *dst = skb_dst(skb);
+
+	if (!dst || !dst->ops->get_encap)
+		return 0;
+
+	return dst->ops->get_encap(dst, encap);
+}
+
 #endif /* _NET_DST_H */
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index d64253914a6a..97f48cf8ef7d 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -32,6 +32,8 @@ struct dst_ops {
struct neighbour *  (*neigh_lookup)(const struct dst_entry *dst,
struct sk_buff *skb,
const void *daddr);
+   int (*get_encap)(const struct dst_entry *dst,
+const void **encap);
 
struct kmem_cache   *kmem_cachep;
 
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 343d922d15c2..3121ade24957 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -95,6 +95,17 @@ struct rtnl_link_ops {

[RFC net-next 3/3] mpls: new ipmpls device for encapsulating IP packets as mpls

2015-06-01 Thread Robert Shearman

Allow creating an mpls device for the purposes of encapsulating IP
packets with:

  ip link add type ipmpls

This device defines its per-nexthop encapsulation data as a stack of
labels, in the same format as for RTA_NEWDST. It uses the encap data
which will have been stored in the IP route to encapsulate the packet
with that stack of labels, with the last label corresponding to a
local label that defines how the packet will be sent out. The device
sends packets over loopback to the local MPLS forwarding logic which
performs all of the work.
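
To make the "stack of labels" concrete, the following userspace sketch encodes label stack entries using the field layout from RFC 3032. It is an illustration only: the `toy_*` helpers are hypothetical, values stay in host byte order, and the payload is assumed to be plain IP so the last pushed label carries the bottom-of-stack bit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One MPLS label stack entry as laid out in RFC 3032:
 * label (20 bits) | traffic class (3 bits) | bottom-of-stack (1 bit) |
 * TTL (8 bits). A sender would convert the result to network byte
 * order before writing it into the packet.
 */
static uint32_t toy_mpls_entry_encode(uint32_t label, uint8_t ttl,
				      uint8_t tc, int bos)
{
	return ((label & 0xFFFFFu) << 12) |
	       ((uint32_t)(tc & 0x7u) << 9) |
	       ((bos ? 1u : 0u) << 8) |
	       ttl;
}

/*
 * Encode a whole stack: labels[0] is outermost, and only the last
 * entry carries the bottom-of-stack bit.
 */
static void toy_mpls_push_stack(uint32_t *out, const uint32_t *labels,
				int num_labels, uint8_t ttl)
{
	int i;

	for (i = 0; i < num_labels; i++)
		out[i] = toy_mpls_entry_encode(labels[i], ttl, 0,
					       i == num_labels - 1);
}
```

When the payload is itself MPLS, the bottom-of-stack bit would have to stay clear on every pushed entry, so a real sender needs that as an input rather than deriving it from position alone.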

Stats are implemented, although any error in the sending via the real
interface will be handled by the main mpls forwarding code and so not
accounted by the interface.

This implementation is based on an alternative earlier implementation
by Eric W. Biederman.

Signed-off-by: Robert Shearman <rshea...@brocade.com>
---
 include/uapi/linux/if_arp.h |   1 +
 net/mpls/Kconfig|   5 +
 net/mpls/Makefile   |   1 +
 net/mpls/af_mpls.c  |   2 +
 net/mpls/ipmpls.c   | 284 
 5 files changed, 293 insertions(+)
 create mode 100644 net/mpls/ipmpls.c

diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
index 4d024d75d64b..17d669fd1781 100644
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -88,6 +88,7 @@
 #define ARPHRD_IEEE80211_RADIOTAP 803	/* IEEE 802.11 + radiotap header */
 #define ARPHRD_IEEE802154	  804
 #define ARPHRD_IEEE802154_MONITOR 805	/* IEEE 802.15.4 network monitor */
+#define ARPHRD_MPLS	806		/* IP and IPv6 over MPLS tunnels */
 
 #define ARPHRD_PHONET  820 /* PhoNet media type*/
 #define ARPHRD_PHONET_PIPE 821 /* PhoNet pipe header   */
diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 17bde799c854..5264da94733a 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -27,4 +27,9 @@ config MPLS_ROUTING
help
 Add support for forwarding of mpls packets.
 
+config MPLS_IPTUNNEL
+	tristate "MPLS: IP over MPLS tunnel support"
+   help
+A network device that encapsulates ip packets as mpls
+
 endif # MPLS
diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 65bbe68c72e6..3a93c14b23c5 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile
@@ -3,5 +3,6 @@
 #
 obj-$(CONFIG_NET_MPLS_GSO) += mpls_gso.o
 obj-$(CONFIG_MPLS_ROUTING) += mpls_router.o
+obj-$(CONFIG_MPLS_IPTUNNEL) += ipmpls.o
 
 mpls_router-y := af_mpls.o
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 7b3f732269e4..68bdfbdddfaf 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -615,6 +615,7 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,
 
return 0;
 }
+EXPORT_SYMBOL(nla_put_labels);
 
 int nla_get_labels(const struct nlattr *nla,
   u32 max_labels, u32 *labels, u32 label[])
@@ -660,6 +661,7 @@ int nla_get_labels(const struct nlattr *nla,
*labels = nla_labels;
return 0;
 }
+EXPORT_SYMBOL(nla_get_labels);
 
 static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
   struct mpls_route_config *cfg)
diff --git a/net/mpls/ipmpls.c b/net/mpls/ipmpls.c
new file mode 100644
index ..cf6894ae0c61
--- /dev/null
+++ b/net/mpls/ipmpls.c
@@ -0,0 +1,284 @@
+#include <linux/types.h>
+#include <linux/netdevice.h>
+#include <linux/if_vlan.h>
+#include <linux/if_arp.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/module.h>
+#include <linux/mpls.h>
+#include "internal.h"
+
+static LIST_HEAD(ipmpls_dev_list);
+
+#define MAX_NEW_LABELS 2
+
+struct ipmpls_dev_priv {
+	struct net_device *out_dev;
+	struct list_head list;
+	struct net_device *dev;
+};
+
+static netdev_tx_t ipmpls_dev_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ipmpls_dev_priv *priv = netdev_priv(dev);
+	struct net_device *out_dev = priv->out_dev;
+	struct mpls_shim_hdr *hdr;
+	bool bottom_of_stack = true;
+	int len = skb->len;
+	const void *encap;
+	int num_labels;
+	unsigned ttl;
+	const u32 *labels;
+	int ret;
+	int i;
+
+	num_labels = dst_get_encap(skb, &encap) / 4;
+	if (!num_labels)
+		goto drop;
+
+	labels = encap;
+
+	/* Obtain the ttl */
+	if (skb->protocol == htons(ETH_P_IP)) {
+		ttl = ip_hdr(skb)->ttl;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		ttl = ipv6_hdr(skb)->hop_limit;
+	} else if (skb->protocol == htons(ETH_P_MPLS_UC)) {
+		ttl = mpls_entry_decode(mpls_hdr(skb)).ttl;
+		bottom_of_stack = false;
+	} else {
+		goto drop;
+	}
+
+	/* Now that the encap has been retrieved, there's no longer
+	 * any need to keep the dst around so clear it out.
+	 */
+	skb_dst_drop(skb);
+	skb_orphan(skb);
+
+	skb->inner_protocol = skb->protocol;
+

Re: [PATCH v2 net-next] vlan: Add GRO support for non hardware accelerated vlan

2015-06-01 Thread Toshiaki Makita


On 15/06/01 (Mon) 23:12, Eric Dumazet wrote:
> On Mon, 2015-06-01 at 21:55 +0900, Toshiaki Makita wrote:
>
>> @@ -668,6 +753,9 @@ static int __init vlan_proto_init(void)
>> 	if (err < 0)
>> 		goto err5;
>>
>> +	for (i = 0; i < ARRAY_SIZE(vlan_packet_offloads); i++)
>> +		dev_add_offload(&vlan_packet_offloads[i]);
>> +
>> 	vlan_ioctl_set(vlan_ioctl_handler);
>> 	return 0;
>
> My concern about this is:
>
> This might slow down GRO stack for other traffic, if dev_add_offload()
> for vlan offloads is called after
> dev_add_offload(&ip_packet_offload) /
> dev_add_offload(&ipv6_packet_offload)

I didn't have that concern because there are already other similar
offloads (eth, mpls_uc, mpls_mc). But indeed, they and this could slow
down GRO stack.

> This is because list_add_rcu is used and this inserts in front of the
> offload_base list.
>
> void dev_add_offload(struct packet_offload *po)
> {
>  struct list_head *head = &offload_base;
>
>  spin_lock(&offload_lock);
>  list_add_rcu(&po->list, head);
>  spin_unlock(&offload_lock);
> }
>
> Can we ensure offload_base contains a sensible order of expected types ?

Add priority to packet_offload like nf_hook_ops?
Or have dev_add_offload() prioritize IP > IPV6 > others?

Toshiaki Makita
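
A rough sketch of the first suggestion above, keeping the offload list ordered by an explicit priority instead of inserting at the head. This is a plain singly linked list with no locking or RCU, and both the `priority` field and `toy_add_offload()` are hypothetical; `struct packet_offload` has no such member in the code under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offload entry with an explicit priority (lower runs first). */
struct toy_offload {
	const char         *name;
	int                 priority;
	struct toy_offload *next;
};

/*
 * Insert po so the list stays sorted by ascending priority. Entries
 * with equal priority keep registration order, so registering late no
 * longer places an entry in front of the common protocols.
 */
static void toy_add_offload(struct toy_offload **head, struct toy_offload *po)
{
	struct toy_offload **pos = head;

	while (*pos != NULL && (*pos)->priority <= po->priority)
		pos = &(*pos)->next;

	po->next = *pos;
	*pos = po;
}
```

With IP at priority 0, IPv6 at 1 and VLAN at 10, the walk order is IP, IPv6, VLAN regardless of the order in which the three are registered.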

[RFC net-next 2/3] ipv4: storing and retrieval of per-nexthop encap

2015-06-01 Thread Robert Shearman

Parse RTA_ENCAP attribute for one path and multipath routes. The encap
length is stored in a newly added field to fib_nh, nh_encap_len,
although this is added to a padding hole in the structure so that it
doesn't increase the size at all. The encap data itself is stored at
the end of the array of nexthops. Whilst this means that retrieval
isn't optimal, especially if there are multiple nexthops, this avoids
the memory cost of an extra pointer, as well as any potential change
to the cache or instruction layout that could cause a performance
impact.

Currently, the dst structure allocated to represent the destination of
the packet and used for retrieving the encap by the encap-type
interface has been grown through the addition of the rt_encap_len and
rt_encap fields. This isn't desirable and could be fixed by defining a
new destination type with operations copied from the normal case,
other than the addition of the get_encap operation.

Signed-off-by: Robert Shearman <rshea...@brocade.com>
---
 include/net/ip_fib.h |   2 +
 include/net/route.h  |   3 +
 net/ipv4/fib_frontend.c  |   3 +
 net/ipv4/fib_lookup.h|   2 +
 net/ipv4/fib_semantics.c | 179 ++-
 net/ipv4/route.c |  24 +++
 6 files changed, 211 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed0ed45..a06cec5eb3aa 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -44,6 +44,7 @@ struct fib_config {
 	u32			fc_flow;
 	u32			fc_nlflags;
 	struct nl_info		fc_nlinfo;
+	struct nlattr *fc_encap;
  };
 
 struct fib_info;
@@ -75,6 +76,7 @@ struct fib_nh {
 	struct fib_info		*nh_parent;
 	unsigned int		nh_flags;
 	unsigned char		nh_scope;
+	unsigned char		nh_encap_len;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			nh_weight;
 	int			nh_power;
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03afb6a..e8b58914c4c1 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -64,6 +64,9 @@ struct rtable {
 	/* Miscellaneous cached information */
 	u32			rt_pmtu;
 
+	unsigned int		rt_encap_len;
+	void			*rt_encap;
+
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
 };
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e6e6eb..aa538ab7e3b9 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -656,6 +656,9 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
 		case RTA_TABLE:
 			cfg->fc_table = nla_get_u32(attr);
 			break;
+		case RTA_ENCAP:
+			cfg->fc_encap = attr;
+			break;
 		}
 	}
 
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index c6211ed60b03..003318c51ae8 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -34,6 +34,8 @@ int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event, u32 tb_id,
  unsigned int);
 void rtmsg_fib(int event, __be32 key, struct fib_alias *fa, int dst_len,
   u32 tb_id, const struct nl_info *info, unsigned int nlm_flags);
+const void *fib_get_nh_encap(const struct fib_info *fi,
+const struct fib_nh *nh);
 
 static inline void fib_result_assign(struct fib_result *res,
 struct fib_info *fi)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 28ec3c1823bf..db466b636241 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -257,6 +257,9 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
 	const struct fib_nh *onh = ofi->fib_nh;
 
 	for_nexthops(fi) {
+		const void *onh_encap = fib_get_nh_encap(fi, nh);
+		const void *nh_encap = fib_get_nh_encap(fi, nh);
+
 		if (nh->nh_oif != onh->nh_oif ||
 		    nh->nh_gw  != onh->nh_gw ||
 		    nh->nh_scope != onh->nh_scope ||
@@ -266,7 +269,10 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		    nh->nh_tclassid != onh->nh_tclassid ||
 #endif
-		    ((nh->nh_flags ^ onh->nh_flags) & ~RTNH_F_DEAD))
+		    ((nh->nh_flags ^ onh->nh_flags) & ~RTNH_F_DEAD) ||
+		    nh->nh_encap_len != onh->nh_encap_len ||
+		    memcmp(nh_encap, onh_encap, nh->nh_encap_len)
+		    )
 			return -1;
 		onh++;
 	} endfor_nexthops(fi);
@@ -374,6 +380,11 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
/* may contain flow and gateway attribute */

[RFC net-next 0/3] IP imposition of per-nh MPLS encap

2015-06-01 Thread Robert Shearman

In order to be able to function as a Label Edge Router in an MPLS
network, it is necessary to be able to take IP packets and impose an
MPLS encap and forward them out. The traditional approach of setting
up an interface for each tunnel endpoint doesn't scale for the
common MPLS use-cases where each IP route tends to be assigned a
different label as encap.

The solution suggested here for further discussion is to provide the
facility to define encap data on a per-nexthop basis using a new
netlink attribute, RTA_ENCAP, which would be opaque to the IPv4/IPv6
forwarding code, but interpreted by the virtual interface assigned to
the nexthop.

A new ipmpls interface type is defined to show the use of this
facility to allow IP packets to be imposed with an MPLS
encap. However, the facility is designed to be general enough to be
used by any encapsulation/tunneling mechanism that has similar
requirements of high-scale, high-variation-of-encap.

RFC because:
 - IPv6 side not implemented
 - struct rtable shouldn't be bloated by pointer+uint
 - Hasn't been thoroughly tested yet

Robert Shearman (3):
  net: infra for per-nexthop encap data
  ipv4: storing and retrieval of per-nexthop encap
  mpls: new ipmpls device for encapsulating IP packets as mpls

 include/linux/rtnetlink.h  |   7 +
 include/net/dst.h  |  11 ++
 include/net/dst_ops.h  |   2 +
 include/net/ip_fib.h   |   2 +
 include/net/route.h|   3 +
 include/net/rtnetlink.h|  11 ++
 include/uapi/linux/if_arp.h|   1 +
 include/uapi/linux/rtnetlink.h |   1 +
 net/core/rtnetlink.c   |  36 ++
 net/ipv4/fib_frontend.c|   3 +
 net/ipv4/fib_lookup.h  |   2 +
 net/ipv4/fib_semantics.c   | 179 +-
 net/ipv4/route.c   |  24 
 net/mpls/Kconfig   |   5 +
 net/mpls/Makefile  |   1 +
 net/mpls/af_mpls.c |   2 +
 net/mpls/ipmpls.c  | 284 +
 17 files changed, 572 insertions(+), 2 deletions(-)
 create mode 100644 net/mpls/ipmpls.c

-- 
2.1.4


Re: 2.6.32.66 tcp regression OOPs

2015-06-01 Thread Willy Tarreau

On Mon, Jun 01, 2015 at 11:32:19AM -0400, starlight.201...@binnacle.cx wrote:
> Hi,
> 
> I found the patch late yesterday and applied it.
> 
> Running fine now for 12 hours under active load.

Thank you.

> Recommend the patch be rolled into the tarball,
> or a notation added to the release page as this
> one has severe consequences.

I'll emit 2.6.32.67 with it. I didn't know it was that easy to trigger
it, and since feedback comes slowly on 2.6.32, I was waiting a bit for
more feedback before doing another one.

Thank you!
Willy


Re: [PATCH v2 net-next] vlan: Add GRO support for non hardware accelerated vlan

2015-06-01 Thread Eric Dumazet

On Tue, 2015-06-02 at 01:03 +0900, Toshiaki Makita wrote:

> I didn't have that concern because there are already other similar 
> offloads (eth, mpls_uc, mpls_mc). But indeed, they and this could slow 
> down GRO stack.

Right, but these mpls offloads are not installed on my kernels ;)

And I already checked that eth was installed before IPv4/IPv6,
although it might be pure luck.



> Add priority to packet_offload like nf_hook_ops?
> Or have dev_add_offload() prioritize IP > IPV6 > others?


You also could use a CONFIG_NET_VLAN_GRO module, so that only
users stuck with non accelerated vlan can load.

But yes, we might take care of this problem at some point.
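
The priority idea raised in the quoted question can be illustrated with a small userspace sketch. This is not kernel code: `prio_offload`, `prio_offload_add()` and the convention that a lower number is checked first are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical, simplified offload entry (the kernel uses struct list_head). */
struct prio_offload {
    const char *name;
    int priority;                 /* assumed: lower value is checked earlier */
    struct prio_offload *next;
};

/*
 * Insert keeping the list sorted by ascending priority. Entries with the
 * same priority keep their registration order, so registration order only
 * matters within one priority level.
 */
void prio_offload_add(struct prio_offload **head, struct prio_offload *po)
{
    struct prio_offload **link = head;

    while (*link && (*link)->priority <= po->priority)
        link = &(*link)->next;

    po->next = *link;
    *link = po;
}
```

With this ordering, registering a vlan entry at priority 10 before IPv4 and IPv6 entries at priority 0 still leaves the two IP entries at the front of the list.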




Re: [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL

2015-06-01 Thread Robert Shearman


On 01/06/15 15:27, Thomas Graf wrote:

> Introduces a new Netlink attribute RTA_TUNNEL which allows routes
> to set tunnel transmit metadata and specify the tunnel endpoint or
> tunnel id on a per route basis. The route must point to a tunnel
> device which understands per skb tunnel metadata and has been put
> into the respective mode.


We've been discussing something similar for the purposes of IP over 
MPLS, but most of the attributes for IP tunnels aren't relevant for 
MPLS. It would be great if we could come up with something general enough 
to serve both purposes. I've just sent a patch series ([RFC net-next 
0/3] IP imposition of per-nh MPLS encap) which I believe would allow this.


Thanks,
Rob



Signed-off-by: Thomas Graf tg...@suug.ch
---
  include/net/ip_fib.h   |  3 +++
  include/net/ip_tunnels.h   |  1 -
  include/net/route.h| 10 
  include/uapi/linux/rtnetlink.h | 16 
  net/ipv4/fib_frontend.c| 57 ++
  net/ipv4/fib_semantics.c   | 45 +
  net/ipv4/route.c   | 30 +-
  net/openvswitch/vport.h|  1 +
  8 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..1cd7cf8 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
  #include <net/fib_rules.h>
  #include <net/inetpeer.h>
  #include <linux/percpu.h>
+#include <net/ip_tunnels.h>

  struct fib_config {
u8  fc_dst_len;
@@ -44,6 +45,7 @@ struct fib_config {
u32 fc_flow;
u32 fc_nlflags;
struct nl_info  fc_nlinfo;
+   struct ip_tunnel_info   fc_tunnel;
   };

  struct fib_info;
@@ -117,6 +119,7 @@ struct fib_info {
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
int fib_power;
  #endif
+   struct ip_tunnel_info   *fib_tunnel;
struct rcu_head rcu;
struct fib_nh   fib_nh[0];
  #define fib_dev   fib_nh[0].nh_dev
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index df8cfd3..b4ab930 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -9,7 +9,6 @@
  #include <net/dsfield.h>
  #include <net/gro_cells.h>
  #include <net/inet_ecn.h>
-#include <net/ip.h>
  #include <net/netns/generic.h>
  #include <net/rtnetlink.h>
  #include <net/flow.h>
diff --git a/include/net/route.h b/include/net/route.h
index 6ede321..dbda603 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
  #include <net/inetpeer.h>
  #include <net/flow.h>
  #include <net/inet_sock.h>
+#include <net/ip_tunnels.h>
  #include <linux/in_route.h>
  #include <linux/rtnetlink.h>
  #include <linux/rcupdate.h>
@@ -66,6 +67,7 @@ struct rtable {

struct list_head rt_uncached;
struct uncached_list *rt_uncached_list;
+   struct ip_tunnel_info   *rt_tun_info;
  };

  static inline bool rt_is_input_route(const struct rtable *rt)
@@ -198,6 +200,8 @@ struct in_ifaddr;
  void fib_add_ifaddr(struct in_ifaddr *);
  void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);

+int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info);
+
  static inline void ip_rt_put(struct rtable *rt)
  {
/* dst_release() accepts a NULL parameter.
@@ -317,9 +321,15 @@ static inline int ip4_dst_hoplimit(const struct dst_entry 
*dst)

  static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
  {
+   struct rtable *rt;
+
if (skb_shinfo(skb)->tun_info)
return skb_shinfo(skb)->tun_info;

+   rt = skb_rtable(skb);
+   if (rt)
+   return rt->rt_tun_info;
+
return NULL;
  }

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..1f7aa68 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -286,6 +286,21 @@ enum rt_class_t {

  /* Routing message attributes */

+enum rta_tunnel_t {
+   RTA_TUN_UNSPEC,
+   RTA_TUN_ID,
+   RTA_TUN_DST,
+   RTA_TUN_SRC,
+   RTA_TUN_TTL,
+   RTA_TUN_TOS,
+   RTA_TUN_SPORT,
+   RTA_TUN_DPORT,
+   RTA_TUN_FLAGS,
+   __RTA_TUN_MAX,
+};
+
+#define RTA_TUN_MAX (__RTA_TUN_MAX - 1)
+
  enum rtattr_type_t {
RTA_UNSPEC,
RTA_DST,
@@ -308,6 +323,7 @@ enum rtattr_type_t {
RTA_VIA,
RTA_NEWDST,
RTA_PREF,
+   RTA_TUNNEL, /* destination VTEP */
__RTA_MAX
  };

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..bfa77a6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -580,6 +580,57 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void 
__user *arg)
return -EINVAL;
  }

+static const struct nla_policy tunnel_policy[RTA_TUN_MAX + 1] = {
+   [RTA_TUN_ID]= { .type = NLA_U64 },
+   [RTA_TUN_DST]   = { .type = NLA_U32 },
+

Re: [PATCH 7/7] mac80211: Switch to new AEAD interface

2015-06-01 Thread Johannes Berg

On Mon, 2015-06-01 at 15:49 +0200, Stephan Mueller wrote:

> > The contents, now, that's a more interesting question. I believe it can
> > never be all zeroes, since association request frames are not
> > encrypted/protected and thus at least one byte in here must be non-zero.
> > The MAC addresses are also very likely non-zero, but technically
> > 00:00:00:00:00:00 is a valid MAC address I believe.
> 
> So, even when having a malicious AP, that value is never zero? The driver of 
> the question is the following code in the patch set:
> 
> +   sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));
> 
> ...
> 
> +   aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);
> 
> ...
> 
> crypto_aead_encrypt(aead_req);
> 
> 
> When I played around with the aead_request_set_crypt, I saw a crash in the 
> scatterlist handling of the crypto API when the first SGL entry has a zero 
> length.

Wait, I guess that's a *third* way for this to be zero: a valid pointer
but zero-length data?

Oh, no - you're referring to the CCM/GCM cases only, I guess, i.e. this
part:

-   sg_init_one(&assoc, &aad[2], be16_to_cpup((__be16 *)aad));
+   sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));

I was looking at GMAC and that has a constant for the length :-)

Ok - here the length is kinda passed as part of the AAD buffer, but this
is really just some arcane code that should be fixed to use a proper
struct. The value there, even though it is __be16 and looks like it came
from the data, is actually created locally, see ccmp_special_blocks()
and gcmp_special_blocks().

johannes


[PATCH] libceph: use kvfree() in ceph_put_page_vector()

2015-06-01 Thread Geliang Tang

Use kvfree() instead of open-coding it.

Signed-off-by: Geliang Tang geliangt...@163.com
---
 net/ceph/pagevec.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 096d914..d4f5f22 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -51,10 +51,7 @@ void ceph_put_page_vector(struct page **pages, int 
num_pages, bool dirty)
set_page_dirty_lock(pages[i]);
put_page(pages[i]);
}
-   if (is_vmalloc_addr(pages))
-   vfree(pages);
-   else
-   kfree(pages);
+   kvfree(pages);
 }
 EXPORT_SYMBOL(ceph_put_page_vector);
 
-- 
2.3.4



Re: [PATCH 7/7] mac80211: Switch to new AEAD interface

2015-06-01 Thread Stephan Mueller

Am Donnerstag, 21. Mai 2015, 13:20:49 schrieb Johannes Berg:

Hi Johannes,

> On Thu, 2015-05-21 at 18:44 +0800, Herbert Xu wrote:
> > This patch makes use of the new AEAD interface which uses a single
> > SG list instead of separate lists for the AD and plain text.
> 
> Looks fine - want me to run any tests on it?

Just a short question on ieee80211_aes_ccm_encrypt, ieee80211_aes_ccm_decrypt, 
ieee80211_aes_gcm_encrypt, ieee80211_aes_gcm_decrypt, ieee80211_aes_gmac: can 
the aad parameter of these functions be zero?

-- 
Ciao
Stephan

Re: [PATCH 7/7] mac80211: Switch to new AEAD interface

2015-06-01 Thread Stephan Mueller

Am Montag, 1. Juni 2015, 15:42:41 schrieb Johannes Berg:

Hi Johannes,

> On Mon, 2015-06-01 at 15:21 +0200, Stephan Mueller wrote:
> > Just a short question on ieee80211_aes_ccm_encrypt,
> > ieee80211_aes_ccm_decrypt, ieee80211_aes_gcm_encrypt,
> > ieee80211_aes_gcm_decrypt, ieee80211_aes_gmac: can the aad parameter of
> > these functions be zero?
> 
> What do you mean by zero? The pointer itself can clearly never be
> NULL.

Thanks for clarifying: indeed I mean the value of the pointer, not the pointer 
itself :-)

> The contents, now, that's a more interesting question. I believe it can
> never be all zeroes, since association request frames are not
> encrypted/protected and thus at least one byte in here must be non-zero.
> The MAC addresses are also very likely non-zero, but technically
> 00:00:00:00:00:00 is a valid MAC address I believe.

So, even when having a malicious AP, that value is never zero? The driver of 
the question is the following code in the patch set:

+   sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));

...

+   aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);

...

crypto_aead_encrypt(aead_req);


When I played around with the aead_request_set_crypt, I saw a crash in the 
scatterlist handling of the crypto API when the first SGL entry has a zero 
length.

Ciao
Stephan

[PATCH net] net/mlx4: fix typo in mlx4_set_vf_mac

2015-06-01 Thread clsoto

fix typo in mlx4_set_vf_mac

Signed-off-by: Carol L Soto cls...@linux.vnet.ibm.com
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2687,7 +2687,7 @@ int mlx4_set_vf_mac(struct mlx4_dev *dev
port = mlx4_slaves_closest_port(dev, slave, port);
s_info = &priv->mfunc.master.vf_admin[slave].vport[port];
s_info->mac = mac;
-   mlx4_info(dev, "default mac on vf %d port %d to %llX will take afect 
only after vf restart\n",
+   mlx4_info(dev, "default mac on vf %d port %d to %llX will take effect 
only after vf restart\n",
  vf, port, s_info->mac);
return 0;
 }


[PATCH net] net/mlx4: double free of dev_vfs

2015-06-01 Thread clsoto

If the user loads mlx4_core with num_vfs greater than supported,
the variable dev->dev_vfs is freed twice when the driver is
unloaded.

Signed-off-by: Carol L Soto cls...@linux.vnet.ibm.com
---
 drivers/net/ethernet/mellanox/mlx4/main.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2685,6 +2685,7 @@ disable_sriov:
 free_mem:
dev->persist->num_vfs = 0;
kfree(dev->dev_vfs);
+   dev->dev_vfs = NULL;
return dev_flags & ~MLX4_FLAG_MASTER;
 }
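
The pattern used by this fix (clear the pointer right after freeing it, so a later cleanup path becomes a harmless no-op) can be shown in a minimal userspace sketch. `demo_dev`, `demo_alloc_vfs()` and `demo_free_vfs()` are hypothetical names for illustration, not mlx4 code, and `free()` stands in for `kfree()`; both accept a NULL pointer.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the device structure; not the mlx4 type. */
struct demo_dev {
    int *dev_vfs;   /* per-VF array owned by the device */
    int  num_vfs;
};

/* Allocate the per-VF array; returns 0 on success, -1 on failure. */
int demo_alloc_vfs(struct demo_dev *dev, int num_vfs)
{
    dev->dev_vfs = calloc((size_t)num_vfs, sizeof(*dev->dev_vfs));
    if (!dev->dev_vfs)
        return -1;
    dev->num_vfs = num_vfs;
    return 0;
}

/*
 * Release the array and clear the pointer. A second call (for example
 * from a later unload path) then passes NULL to free(), which does nothing.
 */
void demo_free_vfs(struct demo_dev *dev)
{
    dev->num_vfs = 0;
    free(dev->dev_vfs);
    dev->dev_vfs = NULL;
}
```

Without the final assignment, the second call would hand the stale pointer to `free()` again, which is the double free described in the patch.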
 


[PATCH net] net/mlx4: need to call close fw if alloc icm is called twice

2015-06-01 Thread clsoto

If mlx4_enable_sriov is called for an adapter that lacks the
MLX4_DEV_CAP_FLAG2_SYS_EQS feature, the ICM allocation function
is called twice on this path without freeing the structures
from the first call.

Signed-off-by: Carol L Soto cls...@linux.vnet.ibm.com
---
 drivers/net/ethernet/mellanox/mlx4/main.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2837,6 +2837,7 @@ slave_start:
  existing_vfs,
  reset_flow);
 
+   mlx4_close_fw(dev);
mlx4_cmd_cleanup(dev, MLX4_CMD_CLEANUP_ALL);
dev-flags = dev_flags;
if (!SRIOV_VALID_STATE(dev->flags)) {


Re: [PATCH v2 net-next] vlan: Add GRO support for non hardware accelerated vlan

2015-06-01 Thread Eric Dumazet

On Mon, 2015-06-01 at 21:55 +0900, Toshiaki Makita wrote:

> @@ -668,6 +753,9 @@ static int __init vlan_proto_init(void)
>    if (err < 0)
>    goto err5;
>  
> + for (i = 0; i < ARRAY_SIZE(vlan_packet_offloads); i++)
> + dev_add_offload(&vlan_packet_offloads[i]);
> +
>    vlan_ioctl_set(vlan_ioctl_handler);
>    return 0;

My concern about this is :

This might slow down the GRO stack for other traffic if dev_add_offload()
for vlan offloads is called after 
dev_add_offload(&ip_packet_offload) /
dev_add_offload(&ipv6_packet_offload)


This is because list_add_rcu is used and this inserts in front of the
offload_base list.

void dev_add_offload(struct packet_offload *po)
{
	struct list_head *head = &offload_base;

	spin_lock(&offload_lock);
	list_add_rcu(&po->list, head);
	spin_unlock(&offload_lock);
}

Can we ensure offload_base contains a sensible order of expected types ?
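
The ordering concern above can be reproduced with a simplified userspace sketch. It models only the "insert at the front" behaviour; `demo_offload`, `demo_add_front()` and `demo_lookup_cost()` are hypothetical names, and the singly linked list is a stand-in for the kernel's `struct list_head`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified offload entry. */
struct demo_offload {
    const char *name;
    struct demo_offload *next;
};

/* Insert at the front of the list, so the newest entry is visited first. */
void demo_add_front(struct demo_offload **head, struct demo_offload *po)
{
    po->next = *head;
    *head = po;
}

/* Number of entries skipped before `name` is found, or -1 if absent. */
int demo_lookup_cost(const struct demo_offload *head, const char *name)
{
    int steps = 0;

    for (; head; head = head->next, steps++)
        if (strcmp(head->name, name) == 0)
            return steps;
    return -1;
}
```

If the entries are registered in the order ip, ipv6, vlan, a lookup for the ip entry has to skip two entries, while the entry registered last is found immediately.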



[net-next RFC 04/14] route: Extend flow representation with tunnel key

2015-06-01 Thread Thomas Graf

Add a new flowi_tunnel structure which is a subset of ip_tunnel_key
to allow routes to match on tunnel metadata. For now, the tunnel id
is added to flowi_tunnel which allows for routes to be bound to
specific virtual tunnels.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/flow.h   |  7 +++
 include/net/ip_tunnels.h | 10 ++
 net/ipv4/route.c |  2 ++
 3 files changed, 19 insertions(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index 8109a15..c15fb5e 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -19,6 +19,10 @@
 
 #define LOOPBACK_IFINDEX   1
 
+struct flowi_tunnel {
+   __be64  tun_id;
+};
+
 struct flowi_common {
int flowic_oif;
int flowic_iif;
@@ -30,6 +34,7 @@ struct flowi_common {
 #define FLOWI_FLAG_ANYSRC  0x01
 #define FLOWI_FLAG_KNOWN_NH 0x02
__u32   flowic_secid;
+   struct flowi_tunnel flowic_tun_key;
 };
 
 union flowi_uli {
@@ -66,6 +71,7 @@ struct flowi4 {
 #define flowi4_proto   __fl_common.flowic_proto
 #define flowi4_flags   __fl_common.flowic_flags
 #define flowi4_secid   __fl_common.flowic_secid
+#define flowi4_tun_key __fl_common.flowic_tun_key
 
/* (saddr,daddr) must be grouped, same order as in IP header */
__be32  saddr;
@@ -165,6 +171,7 @@ struct flowi {
 #define flowi_proto u.__fl_common.flowic_proto
 #define flowi_flags u.__fl_common.flowic_flags
 #define flowi_secid u.__fl_common.flowic_secid
+#define flowi_tun_key  u.__fl_common.flowic_tun_key
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 8b76ba1..df8cfd3 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -12,6 +12,7 @@
 #include <net/ip.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
+#include <net/flow.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -337,6 +338,15 @@ static inline void *ip_tunnel_info_opts(struct 
ip_tunnel_info *info,
return info + 1;
 }
 
+static inline void ip_tunnel_derive_key(struct sk_buff *skb,
+   struct flowi_tunnel *key)
+{
+   struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+
+   if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX)
+   key->tun_id = tun_info->key.tun_id;
+}
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f605598..6e8e1be 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -109,6 +109,7 @@
 #include <linux/kmemleak.h>
 #endif
 #include <net/secure_seq.h>
+#include <net/ip_tunnels.h>
 
 #define RT_FL_TOS(oldflp4) \
((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -1716,6 +1717,7 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
fl4.daddr = daddr;
fl4.saddr = saddr;
+   ip_tunnel_derive_key(skb, &fl4.flowi4_tun_key);
err = fib_lookup(net, &fl4, &res);
if (err != 0) {
if (!IN_DEV_FORWARD(in_dev))
-- 
2.3.5


[net-next RFC 02/14] ip_tunnel: support per packet tunnel metadata

2015-06-01 Thread Thomas Graf

This allows an ip_tunnel_info metadata structure to be attached to skbs
via skb_shared_info to represent receive side tunnel information
as well as transmit side encapsulation instructions.

The new field is added to skb_shared_info as the field is typically
immutable after it has been attached. A new flag indicates whether
the metadata is meant for receive or transmit. This allows receive
metadata to stay attached to the skb all the way through the
forwarding path without being mistaken for transmit instructions. The
tun_info pointer is thus only released if a packet which has been
received on a tunnel is being forwarded to a tunnel device again.

Since transmit instructions are immutable per flow which attaches
them to the skb, a reference count is introduced which allows the
metadata to be reused for many packets. Therefore, when a route later
on receives the capability to attach tunnel metadata, it will only
have to allocate the metadata once and can simply increment the
reference counter for each packet that uses that instruction set.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/linux/skbuff.h|  1 +
 include/net/ip_tunnels.h  | 45 +
 net/core/skbuff.c |  8 
 net/ipv4/ip_tunnel_core.c | 15 +++
 4 files changed, 69 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6b41c15..83f9a59 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -323,6 +323,7 @@ struct skb_shared_info {
unsigned short  gso_segs;
unsigned short  gso_type;
struct sk_buff  *frag_list;
+   struct ip_tunnel_info   *tun_info;
struct skb_shared_hwtstamps hwtstamps;
u32 tskey;
__be32  ip6_frag_id;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 6b9d559..3968705 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -38,10 +38,20 @@ struct ip_tunnel_key {
__be16  tp_dst;
 } __packed __aligned(4); /* Minimize padding. */
 
+/* Indicates whether the tunnel info structure represents receive
+ * or transmit tunnel parameters.
+ */
+enum {
+   IP_TUNNEL_INFO_RX,
+   IP_TUNNEL_INFO_TX,
+};
+
 struct ip_tunnel_info {
struct ip_tunnel_key key;
const void  *options;
+   atomic_t refcnt;
u8  options_len;
+   u8  mode;
 };
 
 /* 6rd prefix/relay information */
@@ -284,6 +294,41 @@ static inline void iptunnel_xmit_stats(int err,
}
 }
 
+struct ip_tunnel_info *ip_tunnel_info_alloc(size_t optslen, gfp_t flags);
+
+static inline void ip_tunnel_info_get(struct ip_tunnel_info *info)
+{
+   atomic_inc(&info->refcnt);
+}
+
+static inline void ip_tunnel_info_put(struct ip_tunnel_info *info)
+{
+   if (!info)
+   return;
+
+   if (atomic_dec_and_test(&info->refcnt))
+   kfree(info);
+}
+
+static inline int skb_attach_tunnel_info(struct sk_buff *skb,
+struct ip_tunnel_info *info)
+{
+   if (skb_unclone(skb, GFP_ATOMIC))
+   return -ENOMEM;
+
+   ip_tunnel_info_put(skb_shinfo(skb)->tun_info);
+   ip_tunnel_info_get(info);
+   skb_shinfo(skb)->tun_info = info;
+
+   return 0;
+}
+
+static inline void skb_release_tunnel_info(struct sk_buff *skb)
+{
+   ip_tunnel_info_put(skb_shinfo(skb)->tun_info);
+   skb_shinfo(skb)->tun_info = NULL;
+}
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9bac0e6..dbbace2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
 #include <net/sock.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
+#include <net/ip_tunnels.h>
 #include <net/xfrm.h>
 
 #include <asm/uaccess.h>
@@ -594,6 +595,8 @@ static void skb_release_data(struct sk_buff *skb)
uarg->callback(uarg, true);
}
 
+   ip_tunnel_info_put(shinfo->tun_info);
+
if (shinfo->frag_list)
kfree_skb_list(shinfo->frag_list);
 
@@ -985,6 +988,11 @@ static void copy_skb_header(struct sk_buff *new, const 
struct sk_buff *old)
skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
+
+   if (skb_shinfo(old)->tun_info) {
+   ip_tunnel_info_get(skb_shinfo(old)->tun_info);
+   skb_shinfo(new)->tun_info = skb_shinfo(old)->tun_info;
+   }
 }
 
 static inline int skb_alloc_rx_flag(const struct sk_buff *skb)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6a51a71..bbd4f91 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -190,3 +190,18 @@ struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct 
net_device *dev,
return tot;
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64);
+
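
The get/put scheme described in this patch can be sketched in userspace with C11 atomics. This is an illustration only: `demo_info` and its helpers are hypothetical, and starting the counter at 1 on allocation is an assumption, since the allocation function is not visible in the excerpt above.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, reduced metadata object with a reference count. */
struct demo_info {
    atomic_int refcnt;
    int tun_id;
};

/* Allocate with one reference held by the caller (assumed convention). */
struct demo_info *demo_info_alloc(int tun_id)
{
    struct demo_info *info = malloc(sizeof(*info));

    if (!info)
        return NULL;
    atomic_init(&info->refcnt, 1);
    info->tun_id = tun_id;
    return info;
}

/* Take an additional reference, e.g. one per packet using the metadata. */
void demo_info_get(struct demo_info *info)
{
    atomic_fetch_add(&info->refcnt, 1);
}

/*
 * Drop a reference; NULL is accepted. Returns 1 if this call dropped the
 * last reference and freed the object, 0 otherwise.
 */
int demo_info_put(struct demo_info *info)
{
    if (!info)
        return 0;
    /* atomic_fetch_sub() returns the value held before the decrement. */
    if (atomic_fetch_sub(&info->refcnt, 1) == 1) {
        free(info);
        return 1;
    }
    return 0;
}
```

One allocation can then be shared by many packets: each user calls `demo_info_get()` when attaching and `demo_info_put()` when releasing, and the memory is freed only by the final put.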

[net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic

2015-06-01 Thread Thomas Graf

Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.

Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/ip_tunnels.h | 63 +
 include/uapi/linux/openvswitch.h |  2 +-
 net/openvswitch/actions.c|  2 +-
 net/openvswitch/datapath.h   |  5 +--
 net/openvswitch/flow.c   |  4 +--
 net/openvswitch/flow.h   | 76 ++--
 net/openvswitch/flow_netlink.c   | 16 -
 net/openvswitch/flow_netlink.h   |  2 +-
 net/openvswitch/vport-geneve.c   | 17 +
 net/openvswitch/vport-gre.c  | 16 -
 net/openvswitch/vport-vxlan.c| 18 +-
 net/openvswitch/vport.c  | 30 
 net/openvswitch/vport.h  | 12 +++
 13 files changed, 128 insertions(+), 135 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d8214cb..6b9d559 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -22,6 +22,28 @@
 /* Keep error state on tunnel for 30 sec */
 #define IPTUNNEL_ERR_TIMEO (30*HZ)
 
+/* Used to memset ip_tunnel padding. */
+#define IP_TUNNEL_KEY_SIZE \
+   (offsetof(struct ip_tunnel_key, tp_dst) +   \
+FIELD_SIZEOF(struct ip_tunnel_key, tp_dst))
+
+struct ip_tunnel_key {
+   __be64  tun_id;
+   __be32  ipv4_src;
+   __be32  ipv4_dst;
+   __be16  tun_flags;
+   __u8 ipv4_tos;
+   __u8 ipv4_ttl;
+   __be16  tp_src;
+   __be16  tp_dst;
+} __packed __aligned(4); /* Minimize padding. */
+
+struct ip_tunnel_info {
+   struct ip_tunnel_key key;
+   const void  *options;
+   u8  options_len;
+};
+
 /* 6rd prefix/relay information */
 #ifdef CONFIG_IPV6_SIT_6RD
 struct ip_tunnel_6rd_parm {
@@ -136,6 +158,47 @@ int ip_tunnel_encap_add_ops(const struct 
ip_tunnel_encap_ops *op,
 int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
unsigned int num);
 
+static inline void __ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+__be32 saddr, __be32 daddr,
+u8 tos, u8 ttl,
+__be16 tp_src, __be16 tp_dst,
+__be64 tun_id, __be16 tun_flags,
+const void *opts, u8 opts_len)
+{
+   tun_info->key.tun_id = tun_id;
+   tun_info->key.ipv4_src = saddr;
+   tun_info->key.ipv4_dst = daddr;
+   tun_info->key.ipv4_tos = tos;
+   tun_info->key.ipv4_ttl = ttl;
+   tun_info->key.tun_flags = tun_flags;
+
+   /* For the tunnel types on the top of IPsec, the tp_src and tp_dst of
+* the upper tunnel are used.
+* E.g: GRE over IPSEC, the tp_src and tp_port are zero.
+*/
+   tun_info->key.tp_src = tp_src;
+   tun_info->key.tp_dst = tp_dst;
+
+   /* Clear struct padding. */
+   if (sizeof(tun_info->key) != IP_TUNNEL_KEY_SIZE)
+   memset((unsigned char *)&tun_info->key + IP_TUNNEL_KEY_SIZE,
+  0, sizeof(tun_info->key) - IP_TUNNEL_KEY_SIZE);
+
+   tun_info->options = opts;
+   tun_info->options_len = opts_len;
+}
+
+static inline void ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+  const struct iphdr *iph,
+  __be16 tp_src, __be16 tp_dst,
+  __be64 tun_id, __be16 tun_flags,
+  const void *opts, u8 opts_len)
+{
+   __ip_tunnel_info_init(tun_info, iph->saddr, iph->daddr,
+ iph->tos, iph->ttl, tp_src, tp_dst,
+ tun_id, tun_flags, opts, opts_len);
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bbd49a0..fffe317 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -319,7 +319,7 @@ enum ovs_key_attr {
 * the accepted length of the array. */
 
 #ifdef __KERNEL__
-   OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ovs_tunnel_info */
+   OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
 #endif
__OVS_KEY_ATTR_MAX
 };
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index b491c1c..34cad57 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -610,7
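
The "clear struct padding" step in __ip_tunnel_info_init() above can be illustrated with a small userspace sketch. `demo_key` is a hypothetical structure chosen because it has tail padding on common ABIs; it is not the kernel's struct ip_tunnel_key, and `DEMO_KEY_SIZE` only mirrors the offsetof-plus-field-size idea behind IP_TUNNEL_KEY_SIZE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: 6 bytes of members, usually padded to 8 bytes. */
struct demo_key {
    uint32_t id;
    uint16_t port;
};

/* Bytes covered by members: offset of the last member plus its size. */
#define DEMO_KEY_SIZE \
    (offsetof(struct demo_key, port) + sizeof(((struct demo_key *)0)->port))

/*
 * Fill in the members, then zero any tail padding so that two keys with
 * equal members also compare equal with memcmp() over sizeof(*key).
 */
void demo_key_init(struct demo_key *key, uint32_t id, uint16_t port)
{
    key->id = id;
    key->port = port;

    if (sizeof(*key) != DEMO_KEY_SIZE)
        memset((unsigned char *)key + DEMO_KEY_SIZE, 0,
               sizeof(*key) - DEMO_KEY_SIZE);
}
```

This matters when the whole key is compared or hashed as a byte array: without the memset, stale bytes in the padding could make two logically identical keys look different.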

Re: [PATCH] ethtool: changes of emac_regs structure accordingly within driver emac_regs structure.

2015-06-01 Thread Ivan Mikhaylov

On Mon, 1 June 2015 12:57 +0400
Ben Hutchings b...@decadent.org.uk wrote:

> On Thu, 2015-05-21 at 19:09 +0400, Ivan Mikhaylov wrote:
> > In ibm_emac.c in ethtool size of emac structure which passing through
> > to driver is nailed down and not correlating with current emac_regs
> > structure.
> > 
> > Signed-off-by: Ivan Mikhaylov i...@ru.ibm.com
> [...]
> 
> This is not backward-compatible.  It ought to be possible to mix and
> match old and new ethtool and driver, except for the EMAC4SYNC case
> which has been broken up until now.
> 
> Using the new definition of struct emac_regs, I think the driver and
> ethtool need to agree that the MAC register dump sizes are:
> 
> EMAC:  offsetof(struct emac_regs, u1)
> EMAC4: offsetof(struct emac_regs, u1.emac4) + sizeof(p->u1.emac4)
> EMAC4SYNC: offsetof(struct emac_regs, u1.emac4sync) +
> sizeof(p->u1.emac4sync)
> 
> Ben.
> 
> -- 
> Ben Hutchings
> Reality is just a crutch for people who can't handle science fiction.

Actually it is backward-compatible, because we don't depend on the size
that comes from the driver side; all we do is map the driver
structure onto the ethtool structure, and the results will be the same
for emac and emac4.

 struct emac_regs *p = (struct emac_regs *)(hdr + 1);

Also, the sizes you mentioned (112 for emac, 116 for emac4) can differ
from what you describe, because they are controlled by the dts files, where
we can set something like 0x100 or 0x80 for this memory area, and we will
still have a problem representing the MII area if this size isn't set
correctly in the dts.

Ethtool will work the same way whether we have emac or emac4.

Thank you for responding!
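
For reference, the size rule proposed in the quoted message (common registers, plus the variant-specific part of the union) can be written out as a small sketch. The layout below is hypothetical and much smaller than the real struct emac_regs; `demo_regs`, `demo_variant` and `demo_dump_size()` are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout for illustration only. */
struct demo_regs {
    uint32_t common[4];                 /* registers shared by all variants */
    union {
        struct { uint32_t a; } emac4;
        struct { uint32_t a; uint32_t b[3]; } emac4sync;
    } u1;
};

enum demo_variant { DEMO_EMAC, DEMO_EMAC4, DEMO_EMAC4SYNC };

/* Dump size per variant: offset of the variant part plus its size. */
size_t demo_dump_size(enum demo_variant v)
{
    switch (v) {
    case DEMO_EMAC:
        return offsetof(struct demo_regs, u1);
    case DEMO_EMAC4:
        return offsetof(struct demo_regs, u1.emac4) +
               sizeof(((struct demo_regs *)0)->u1.emac4);
    case DEMO_EMAC4SYNC:
        return offsetof(struct demo_regs, u1.emac4sync) +
               sizeof(((struct demo_regs *)0)->u1.emac4sync);
    }
    return 0;
}
```

With a rule like this, a reader of the dump could infer the variant from the reported length alone, which is the property the quoted proposal relies on; whether the driver can guarantee those lengths is the point under discussion in this thread.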


Re: [PATCH 7/7] mac80211: Switch to new AEAD interface

2015-06-01 Thread Johannes Berg

On Mon, 2015-06-01 at 15:21 +0200, Stephan Mueller wrote:

> Just a short question on ieee80211_aes_ccm_encrypt, 
> ieee80211_aes_ccm_decrypt, 
> ieee80211_aes_gcm_encrypt, ieee80211_aes_gcm_decrypt, ieee80211_aes_gmac: can 
> the aad parameter of these functions be zero?

What do you mean by zero? The pointer itself can clearly never be
NULL.

The contents, now, that's a more interesting question. I believe it can
never be all zeroes, since association request frames are not
encrypted/protected and thus at least one byte in here must be non-zero.
The MAC addresses are also very likely non-zero, but technically
00:00:00:00:00:00 is a valid MAC address I believe.

johannes



Re: [PATCH] sctp: fix ASCONF list handling

2015-06-01 Thread Neil Horman

On Fri, May 29, 2015 at 01:50:37PM -0300, Marcelo Ricardo Leitner wrote:
 On Fri, May 29, 2015 at 09:17:26AM -0400, Neil Horman wrote:
  On Thu, May 28, 2015 at 11:46:29AM -0300, Marcelo Ricardo Leitner wrote:
   On Thu, May 28, 2015 at 10:27:32AM -0300, Marcelo Ricardo Leitner wrote:
On Thu, May 28, 2015 at 08:17:27AM -0300, Marcelo Ricardo Leitner wrote:
 On Thu, May 28, 2015 at 06:15:11AM -0400, Neil Horman wrote:
  On Wed, May 27, 2015 at 09:52:17PM -0300, mleit...@redhat.com wrote:
   From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
  
   ->auto_asconf_splist is per namespace and mangled by functions like
   sctp_setsockopt_auto_asconf(), which doesn't guarantee any
   serialization.

   Also, the call to inet_sk_copy_descendant() was backing up
   ->auto_asconf_list through the copy but was not honoring
   ->do_auto_asconf, which could lead to list corruption if it was
   different between both sockets.
  
   This commit thus fixes the list handling by adding a spinlock to
   protect against multiple writers and converts the list to be
   protected by RCU too, so that we don't have a lock inversion issue
   at sctp_addr_wq_timeout_handler().

   And as this list now uses RCU, we cannot do such backup and restore
   while copying descendant data anymore, as readers may be traversing
   the list meanwhile. We fix this by simply not copying those fields,
   which are placed at the end of struct sctp_sock, so we can just
   ignore them together with the struct ipv6_pinfo data. For that we
   create sctp_copy_descendant() so we don't clutter
   inet_sk_copy_descendant() with SCTP info.
  
   Issue was found with a test application that kept flipping sysctl
   default_auto_asconf on and off.
  
   Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
   Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
   ---
include/net/netns/sctp.h   |  6 +-
include/net/sctp/structs.h |  2 ++
net/sctp/protocol.c|  6 +-
net/sctp/socket.c  | 39 
   ++-
4 files changed, 38 insertions(+), 15 deletions(-)
  
   diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
   index 
   3573a81815ad9e0efb6ceb721eb066d3726419f0..e080bebb3147af39c8275261f57018eb01e917b0
100644
   --- a/include/net/netns/sctp.h
   +++ b/include/net/netns/sctp.h
   @@ -30,12 +30,15 @@ struct netns_sctp {
 struct list_head local_addr_list;
 struct list_head addr_waitq;
 struct timer_list addr_wq_timer;
   - struct list_head auto_asconf_splist;
   + struct list_head __rcu auto_asconf_splist;
  You should use the addr_wq_lock here instead of creating a new lock,
  as that's already used to protect most accesses to the list you are
  concerned about.

 Ok, that works too.

  Though truthfully, that shouldn't be necessary.  The list in question
  is only read in one location and only written in one location.  You
  can likely just rcu-ify, as the write side is in process context and
  protected by lock_sock.

 It should, it's not protected by lock_sock as this list resides in
 netns_sctp structure, which lock_sock doesn't cover. Write side is in
 process context yes, but this list is written in sctp_init_sock(),
 sctp_destroy_sock() and sctp_setsockopt_auto_asconf(), so one could
 trigger this by either creating/destroying sockets if
 default_auto_asconf=1 or just by creating a bunch of sockets and
 flipping asconf via setsockopt (or a combination of these operations).
 (I'll point this out in the changelog)
   
Hmm.. by reusing addr_wq_lock we don't need to rcu-ify the list, as the
reader is inside that lock too, so I can just protect auto_asconf_splist
writers with addr_wq_lock.
   
Nice, thanks Neil.
  
   Cannot really do that.. as that creates a lock inversion between
   sctp_destroy_sock() (which already holds lock_sock) and
   sctp_addr_wq_timeout_handler(), which first grabs addr_wq_lock and then
   locks socket by socket.
  
   Due to that, I'm afraid reusing this lock is not possible, and we should
   stick with the patch.. what do you think? (though I have to fix the nits
   in there)
  
  I don't think that's accurate.  You are correct in that the locks are
  taken in opposing order, which would imply a lock inversion that could
  result in deadlock, but we can avoid that by deferring the asconf list
  removal until after sk_common_release and unlock_sock_bh is called in
  sctp_close.  That will make the lock ordering consistent.
  Alternatively, we can pre-emptively take the asconf_lock in sctp_close
  before locking the socket.
 
 For
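
The lock-ordering argument in this thread can be illustrated with a small userspace sketch. Everything here is a hypothetical model, not kernel code: list_lock stands in for addr_wq_lock, sock_lock for the per-socket lock, and the three functions loosely model the timer handler, socket creation and socket close. The point is only that every path acquires the two locks in the same order, which is what removes the inversion.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: list_lock models addr_wq_lock, sock_lock
 * models the per-socket lock taken by lock_sock(). */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;
static int asconf_entries;

/* Models the timer handler: list lock first, then the socket lock. */
int timeout_handler_path(void)
{
	int seen;

	pthread_mutex_lock(&list_lock);
	pthread_mutex_lock(&sock_lock);
	assert(asconf_entries >= 0);
	seen = asconf_entries;
	pthread_mutex_unlock(&sock_lock);
	pthread_mutex_unlock(&list_lock);
	return seen;
}

/* Models socket creation with auto-asconf enabled: only the list lock
 * is needed to add an entry. */
void init_path(void)
{
	pthread_mutex_lock(&list_lock);
	asconf_entries++;
	pthread_mutex_unlock(&list_lock);
}

/* Models the "take the list lock before locking the socket" option for
 * close: same order as the timer handler, so no inversion is possible. */
void close_path(void)
{
	pthread_mutex_lock(&list_lock);
	pthread_mutex_lock(&sock_lock);
	if (asconf_entries > 0)
		asconf_entries--;
	pthread_mutex_unlock(&sock_lock);
	pthread_mutex_unlock(&list_lock);
}
```

If close_path() instead took sock_lock first and list_lock second, two threads running close_path() and timeout_handler_path() concurrently could each hold one lock and wait forever for the other.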

[net-next RFC 11/14] openvswitch: Use regular VXLAN net_device device

2015-06-01 Thread Thomas Graf

This gets rid of all OVS specific VXLAN code in the receive and
transmit path by using a VXLAN net_device to represent the vport.
Only a small shim layer remains which takes care of handling the
VXLAN specific OVS Netlink configuration.

Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb()
since they are no longer needed.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 drivers/net/vxlan.c|  23 +--
 include/net/vxlan.h|  14 +-
 net/openvswitch/Kconfig|  12 --
 net/openvswitch/Makefile   |   1 -
 net/openvswitch/flow_netlink.c |   5 +-
 net/openvswitch/vport-netdev.c | 176 +-
 net/openvswitch/vport-vxlan.c  | 322 -
 7 files changed, 193 insertions(+), 360 deletions(-)
 delete mode 100644 net/openvswitch/vport-vxlan.c

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3acab95..b696871 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -74,6 +74,10 @@ static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+vxlan_rcv_t *rcv, void *data,
+bool no_share, u32 flags);
+
 /* per-network namespace private data for this module */
 struct vxlan_net {
struct list_head  vxlan_list;
@@ -1020,7 +1024,7 @@ static bool vxlan_group_used(struct vxlan_net *vn, struct 
vxlan_dev *dev)
return false;
 }
 
-void vxlan_sock_release(struct vxlan_sock *vs)
+static void vxlan_sock_release(struct vxlan_sock *vs)
 {
struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
@@ -1036,7 +1040,6 @@ void vxlan_sock_release(struct vxlan_sock *vs)
 
queue_work(vxlan_wq, &vs->del_work);
 }
-EXPORT_SYMBOL_GPL(vxlan_sock_release);
 
 /* Update multicast group membership when first VNI on
  * multicast address is brought up
@@ -1761,10 +1764,10 @@ err:
 }
 #endif
 
-int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
-  __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-  __be16 src_port, __be16 dst_port,
-  struct vxlan_metadata *md, bool xnet, u32 vxflags)
+static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff 
*skb,
+ __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
+ __be16 src_port, __be16 dst_port,
+ struct vxlan_metadata *md, bool xnet, u32 vxflags)
 {
struct vxlanhdr *vxh;
int min_headroom;
@@ -1834,7 +1837,6 @@ int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, 
struct sk_buff *skb,
   ttl, df, src_port, dst_port, xnet,
   !(vxflags & VXLAN_F_UDP_CSUM));
 }
-EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
 
 /* Bypass encapsulation if the destination is local */
 static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev 
*src_vxlan,
@@ -2609,9 +2611,9 @@ static struct vxlan_sock *vxlan_socket_create(struct net 
*net, __be16 port,
return vs;
 }
 
-struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
- bool no_share, u32 flags)
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+vxlan_rcv_t *rcv, void *data,
+bool no_share, u32 flags)
 {
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
struct vxlan_sock *vs;
@@ -2632,7 +2634,6 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 
port,
 
return vxlan_socket_create(net, port, rcv, data, flags);
 }
-EXPORT_SYMBOL_GPL(vxlan_sock_add);
 
 static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
   struct vxlan_config *conf)
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index c037b27..d3ce81f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -197,19 +197,13 @@ struct vxlan_dev {
 VXLAN_F_REMCSUM_NOPARTIAL |\
 VXLAN_F_FLOW_BASED)
 
-struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
- bool no_share, u32 flags);
-
 struct net_device *vxlan_dev_create(struct net *net, const char *name,
u8 name_assign_type, struct vxlan_config 
*conf);
 
-void vxlan_sock_release(struct vxlan_sock *vs);
-
-int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
-  __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-  __be16 src_port, __be16 dst_port, struct

[net-next RFC 10/14] openvswitch: Abstract vport name through ovs_vport_name()

2015-06-01 Thread Thomas Graf

This allows us to get rid of the get_name() vport op later on.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 net/openvswitch/datapath.c   | 4 ++--
 net/openvswitch/vport-internal_dev.c | 1 -
 net/openvswitch/vport-netdev.c   | 6 --
 net/openvswitch/vport-netdev.h   | 1 -
 net/openvswitch/vport.c  | 4 ++--
 net/openvswitch/vport.h  | 5 +
 6 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c3ecfd4..8986558 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -176,7 +176,7 @@ static inline struct datapath *get_dp(struct net *net, int 
dp_ifindex)
 const char *ovs_dp_name(const struct datapath *dp)
 {
struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
-   return vport->ops->get_name(vport);
+   return ovs_vport_name(vport);
 }
 
 static int get_dpifindex(const struct datapath *dp)
@@ -1786,7 +1786,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, 
struct sk_buff *skb,
if (nla_put_u32(skb, OVS_VPORT_ATTR_PORT_NO, vport->port_no) ||
nla_put_u32(skb, OVS_VPORT_ATTR_TYPE, vport->ops->type) ||
nla_put_string(skb, OVS_VPORT_ATTR_NAME,
-  vport->ops->get_name(vport)))
+  ovs_vport_name(vport)))
goto nla_put_failure;
 
ovs_vport_get_stats(vport, &vport_stats);
diff --git a/net/openvswitch/vport-internal_dev.c 
b/net/openvswitch/vport-internal_dev.c
index a2c205d..c058bbf 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -242,7 +242,6 @@ static struct vport_ops ovs_internal_vport_ops = {
.type   = OVS_VPORT_TYPE_INTERNAL,
.create = internal_dev_create,
.destroy= internal_dev_destroy,
-   .get_name   = ovs_netdev_get_name,
.send   = internal_dev_recv,
 };
 
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index cb22051..ef11a41 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -170,11 +170,6 @@ static void netdev_destroy(struct vport *vport)
call_rcu(&vport->rcu, free_port_rcu);
 }
 
-const char *ovs_netdev_get_name(const struct vport *vport)
-{
-   return vport->dev->name;
-}
-
 static unsigned int packet_length(const struct sk_buff *skb)
 {
unsigned int length = skb->len - ETH_HLEN;
@@ -222,7 +217,6 @@ static struct vport_ops ovs_netdev_vport_ops = {
.type   = OVS_VPORT_TYPE_NETDEV,
.create = netdev_create,
.destroy= netdev_destroy,
-   .get_name   = ovs_netdev_get_name,
.send   = netdev_send,
 };
 
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 1c52aed..684fb88 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,7 +26,6 @@
 
 struct vport *ovs_netdev_get_vport(struct net_device *dev);
 
-const char *ovs_netdev_get_name(const struct vport *);
 void ovs_netdev_detach_dev(struct vport *);
 
 int __init ovs_netdev_init(void);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index af23ba0..d14f594 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -113,7 +113,7 @@ struct vport *ovs_vport_locate(const struct net *net, const 
char *name)
struct vport *vport;
 
hlist_for_each_entry_rcu(vport, bucket, hash_node)
-   if (!strcmp(name, vport->ops->get_name(vport)) &&
+   if (!strcmp(name, ovs_vport_name(vport)) &&
net_eq(ovs_dp_get_net(vport->dp), net))
return vport;
 
@@ -226,7 +226,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
}
 
bucket = hash_bucket(ovs_dp_get_net(vport->dp),
-vport->ops->get_name(vport));
+ovs_vport_name(vport));
hlist_add_head_rcu(&vport->hash_node, bucket);
return vport;
}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index e05ec68..1a689c2 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -237,6 +237,11 @@ static inline void ovs_skb_postpush_rcsum(struct sk_buff 
*skb,
skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
 }
 
+static inline const char *ovs_vport_name(struct vport *vport)
+{
+   return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
+}
+
 int ovs_vport_ops_register(struct vport_ops *ops);
 void ovs_vport_ops_unregister(struct vport_ops *ops);
 
-- 
2.3.5
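
The fallback performed by ovs_vport_name() in the patch above can be sketched in plain userspace C. All demo_* types and functions below are hypothetical stand-ins for the kernel structures; the sketch only shows the pattern of preferring the net_device name when a device is attached and falling back to the per-type get_name() op otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_vport;

/* Hypothetical stand-in for struct vport_ops. */
struct demo_vport_ops {
	const char *(*get_name)(const struct demo_vport *vport);
};

/* Hypothetical stand-in for struct net_device. */
struct demo_netdev {
	char name[16];
};

/* Hypothetical stand-in for struct vport. */
struct demo_vport {
	struct demo_netdev *dev;          /* NULL for vport types without a device */
	const struct demo_vport_ops *ops;
};

/* Mirrors the shape of the helper in the patch: use the device name if a
 * device is attached, otherwise fall back to the ops callback. */
const char *demo_vport_name(const struct demo_vport *vport)
{
	assert(vport != NULL);
	return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
}

/* Example legacy callback for a vport type that has no net_device yet. */
const char *demo_legacy_get_name(const struct demo_vport *vport)
{
	(void)vport;
	return "legacy0";
}
```

Once every vport type carries a device, the fallback branch is never taken, which is presumably what makes removing the get_name() op possible later in the series.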


[net-next RFC 09/14] openvswitch: Move dev pointer into vport itself

2015-06-01 Thread Thomas Graf

This is the first step in representing all OVS vports as regular
struct net_devices. Move the net_device pointer into the vport
structure itself to get rid of struct vport_netdev.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 net/openvswitch/datapath.c   |  7 +--
 net/openvswitch/dp_notify.c  |  5 +--
 net/openvswitch/vport-internal_dev.c | 37 +++-
 net/openvswitch/vport-netdev.c   | 84 
 net/openvswitch/vport-netdev.h   | 12 --
 net/openvswitch/vport.h  |  3 +-
 6 files changed, 58 insertions(+), 90 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3315e3a..c3ecfd4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -188,7 +188,7 @@ static int get_dpifindex(const struct datapath *dp)
 
local = ovs_vport_rcu(dp, OVSP_LOCAL);
if (local)
-   ifindex = netdev_vport_priv(local)->dev->ifindex;
+   ifindex = local->dev->ifindex;
else
ifindex = 0;
 
@@ -2205,13 +2205,10 @@ static void __net_exit list_vports_from_net(struct net 
*net, struct net *dnet,
struct vport *vport;
 
hlist_for_each_entry(vport, &dp->ports[i], 
dp_hash_node) {
-   struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_INTERNAL)
continue;
 
-   netdev_vport = netdev_vport_priv(vport);
-   if (dev_net(netdev_vport->dev) == dnet)
+   if (dev_net(vport->dev) == dnet)
list_add(&vport->detach_list, head);
}
}
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..a7a80a6 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,10 @@ void ovs_dp_notify_wq(struct work_struct *work)
struct hlist_node *n;
 
hlist_for_each_entry_safe(vport, n, &dp->ports[i], 
dp_hash_node) {
-   struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
continue;
 
-   netdev_vport = netdev_vport_priv(vport);
-   if (!(netdev_vport->dev->priv_flags & 
IFF_OVS_DATAPATH))
+   if (!(vport->dev->priv_flags & 
IFF_OVS_DATAPATH))
dp_detach_port_notify(vport);
}
}
diff --git a/net/openvswitch/vport-internal_dev.c 
b/net/openvswitch/vport-internal_dev.c
index 6a55f71..a2c205d 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -156,49 +156,44 @@ static void do_setup(struct net_device *netdev)
 static struct vport *internal_dev_create(const struct vport_parms *parms)
 {
struct vport *vport;
-   struct netdev_vport *netdev_vport;
struct internal_dev *internal_dev;
int err;
 
-   vport = ovs_vport_alloc(sizeof(struct netdev_vport),
-   &ovs_internal_vport_ops, parms);
+   vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms);
if (IS_ERR(vport)) {
err = PTR_ERR(vport);
goto error;
}
 
-   netdev_vport = netdev_vport_priv(vport);
-
-   netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
-parms->name, NET_NAME_UNKNOWN,
-do_setup);
-   if (!netdev_vport->dev) {
+   vport->dev = alloc_netdev(sizeof(struct internal_dev),
+ parms->name, NET_NAME_UNKNOWN, do_setup);
+   if (!vport->dev) {
err = -ENOMEM;
goto error_free_vport;
}
 
-   dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
-   internal_dev = internal_dev_priv(netdev_vport->dev);
+   dev_net_set(vport->dev, ovs_dp_get_net(vport->dp));
+   internal_dev = internal_dev_priv(vport->dev);
internal_dev->vport = vport;
 
/* Restrict bridge port to current netns. */
if (vport->port_no == OVSP_LOCAL)
-   netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+   vport->dev->features |= NETIF_F_NETNS_LOCAL;
 
rtnl_lock();
-   err = register_netdevice(netdev_vport->dev);
+   err = register_netdevice(vport->dev);
if (err)
goto error_free_netdev;
 
-   dev_set_promiscuity(netdev_vport->dev, 1);
+   dev_set_promiscuity(vport->dev, 1);
rtnl_unlock();
-   netif_start_queue(netdev_vport->dev);
+   netif_start_queue(vport->dev);

[net-next RFC 03/14] vxlan: Flow based tunneling

2015-06-01 Thread Thomas Graf

Allows putting a VXLAN device into a new flow-based mode in which it
will populate a tunnel info structure for each packet received. The
metadata structure will contain the outer header and tunnel header
fields which have been stripped off. Layers further up in the stack
such as routing, tc or netfilter can later match on these fields.

On the transmit side, it allows skbs to carry their own encapsulation
instructions, thus allowing encapsulation parameters to be set per
flow/route.

This prepares the VXLAN device to be steered by the routing subsystem,
which will allow supporting encapsulation for a large number of tunnel
endpoints and tunnel ids through a single net_device, improving
the scalability of current VXLAN tunnels.
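
One of the tunnel header fields mentioned above is the VNI. As a rough illustration of how it is recovered from the header, the following userspace sketch converts between the 32-bit on-wire field and the 24-bit VNI. The demo_* function names are hypothetical; the layout assumed here (VNI in the upper 24 bits of a big-endian 32-bit word, low 8 bits reserved) is the one described in RFC 7348, and the shift by 8 corresponds to the one used in the receive path of this patch.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical helper: extract the 24-bit VNI from the 32-bit header
 * field given in network byte order (low 8 bits are reserved). */
uint32_t demo_vni_from_field(uint32_t vx_vni_be)
{
	return ntohl(vx_vni_be) >> 8;
}

/* Hypothetical helper: build the header field from a 24-bit VNI. */
uint32_t demo_field_from_vni(uint32_t vni)
{
	assert(vni < (1u << 24));
	return htonl(vni << 8);
}
```

In flow-based mode the patch stores the resulting value in the tunnel key instead of using it to look up a per-VNI device.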

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 drivers/net/vxlan.c  | 147 ---
 include/linux/skbuff.h   |   1 +
 include/net/ip_tunnels.h |   8 +++
 include/net/route.h  |   8 +++
 include/net/vxlan.h  |   4 +-
 include/uapi/linux/if_link.h |   1 +
 6 files changed, 146 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 34c519e..d5edba5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1164,10 +1164,12 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff 
*skb, struct vxlanhdr *vh,
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+   struct ip_tunnel_info *tun_info = NULL;
struct vxlan_sock *vs;
struct vxlanhdr *vxh;
u32 flags, vni;
-   struct vxlan_metadata md = {0};
+   struct vxlan_metadata _md;
+   struct vxlan_metadata *md = &_md;
 
/* Need Vxlan and inner Ethernet header to be present */
if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1202,6 +1204,33 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct 
sk_buff *skb)
vni = VXLAN_VNI_MASK;
}
 
+   if (vs->flags & VXLAN_F_FLOW_BASED) {
+   const struct iphdr *iph = ip_hdr(skb);
+
+   /* TODO: Consider optimizing by looking up in flow cache */
+   tun_info = ip_tunnel_info_alloc(sizeof(*md), GFP_ATOMIC);
+   if (!tun_info)
+   goto drop;
+
+   tun_info->key.ipv4_src = iph->saddr;
+   tun_info->key.ipv4_dst = iph->daddr;
+   tun_info->key.ipv4_tos = iph->tos;
+   tun_info->key.ipv4_ttl = iph->ttl;
+   tun_info->key.tp_src = udp_hdr(skb)->source;
+   tun_info->key.tp_dst = udp_hdr(skb)->dest;
+
+   tun_info->mode = IP_TUNNEL_INFO_RX;
+   tun_info->key.tun_flags = TUNNEL_KEY;
+   tun_info->key.tun_id = cpu_to_be64(vni >> 8);
+   if (udp_hdr(skb)->check != 0)
+   tun_info->key.tun_flags |= TUNNEL_CSUM;
+
+   md = ip_tunnel_info_opts(tun_info, sizeof(*md));
+   skb_attach_tunnel_info(skb, tun_info);
+   } else {
+   memset(md, 0, sizeof(*md));
+   }
+
/* For backwards compatibility, only allow reserved fields to be
 * used by VXLAN extensions if explicitly requested.
 */
@@ -1209,13 +1238,16 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct 
sk_buff *skb)
struct vxlanhdr_gbp *gbp;
 
gbp = (struct vxlanhdr_gbp *)vxh;
-   md.gbp = ntohs(gbp->policy_id);
+   md->gbp = ntohs(gbp->policy_id);
+
+   if (tun_info)
+   tun_info->key.tun_flags |= TUNNEL_VXLAN_OPT;
 
if (gbp->dont_learn)
-   md.gbp |= VXLAN_GBP_DONT_LEARN;
+   md->gbp |= VXLAN_GBP_DONT_LEARN;
 
if (gbp->policy_applied)
-   md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+   md->gbp |= VXLAN_GBP_POLICY_APPLIED;
 
flags &= ~VXLAN_GBP_USED_BITS;
}
@@ -1233,8 +1265,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct 
sk_buff *skb)
goto bad_flags;
}
 
-   md.vni = vxh->vx_vni;
-   vs->rcv(vs, skb, &md);
+   md->vni = vxh->vx_vni;
+   vs->rcv(vs, skb, md);
return 0;
 
 drop:
@@ -1254,6 +1286,7 @@ error:
 static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
  struct vxlan_metadata *md)
 {
+   struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
struct iphdr *oip = NULL;
struct ipv6hdr *oip6 = NULL;
struct vxlan_dev *vxlan;
@@ -1263,7 +1296,12 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct 
sk_buff *skb,
int err = 0;
union vxlan_addr *remote_ip;
 
-   vni = ntohl(md->vni) >> 8;
+   /* For flow based devices, map all packets to VNI 0 */
+   if (vs->flags & VXLAN_F_FLOW_BASED)
+   vni = 0;
+   else
+   vni =

[net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices

2015-06-01 Thread Thomas Graf

This is the first series in a greater effort to bring the scalability
and programmability advantages of OVS to the rest of the network
stack and to get rid of as much OVS specific code as possible.

This first series focuses on getting rid of OVS tunnel vports and using
regular tunnel net_devices instead. As part of this effort, the
routing subsystem is extended with support for flow based tunneling.
In this new tunneling mode, the route is able to match on tunnel
information as well as set tunnel encapsulation parameters per route.
This allows performing L3 forwarding for a large number of tunnel
endpoints and virtual networks using a single tunnel net_device.

TODO:
 - Geneve support
 - IPv6 support
 - Benchmarks

Pravin Shelar (1):
  openvswitch: Use regular GRE net_device instead of vport

Thomas Graf (13):
  ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
  ip_tunnel: support per packet tunnel metadata
  vxlan: Flow based tunneling
  route: Extend flow representation with tunnel key
  route: Per route tunnel metadata with RTA_TUNNEL
  fib: Add fib rule match on tunnel id
  vxlan: Factor out device configuration
  openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
  openvswitch: Move dev pointer into vport itself
  openvswitch: Abstract vport name through ovs_vport_name()
  openvswitch: Use regular VXLAN net_device device
  vxlan: remove indirect call to vxlan_rcv() and vni member
  arp: Associate ARP requests with tunnel info

 drivers/net/vxlan.c  | 663 ---
 include/linux/skbuff.h   |   2 +
 include/net/fib_rules.h  |   1 +
 include/net/flow.h   |   7 +
 include/net/ip_fib.h |   3 +
 include/net/ip_tunnels.h | 127 ++-
 include/net/route.h  |  18 +
 include/net/vxlan.h  |  82 -
 include/uapi/linux/fib_rules.h   |   2 +-
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/openvswitch.h |   2 +-
 include/uapi/linux/rtnetlink.h   |  16 +
 net/core/dev.c   |   5 +-
 net/core/fib_rules.c |  17 +-
 net/core/skbuff.c|   8 +
 net/ipv4/arp.c   |   8 +
 net/ipv4/fib_frontend.c  |  57 +++
 net/ipv4/fib_semantics.c |  45 +++
 net/ipv4/ip_gre.c| 161 -
 net/ipv4/ip_tunnel_core.c|  15 +
 net/ipv4/route.c |  32 +-
 net/openvswitch/Kconfig  |  12 -
 net/openvswitch/Makefile |   2 -
 net/openvswitch/actions.c|  10 +-
 net/openvswitch/datapath.c   |  19 +-
 net/openvswitch/datapath.h   |   5 +-
 net/openvswitch/dp_notify.c  |   5 +-
 net/openvswitch/flow.c   |   4 +-
 net/openvswitch/flow.h   |  77 +---
 net/openvswitch/flow_netlink.c   |  78 -
 net/openvswitch/flow_netlink.h   |   3 +-
 net/openvswitch/vport-geneve.c   |  17 +-
 net/openvswitch/vport-gre.c  | 313 -
 net/openvswitch/vport-internal_dev.c |  38 +-
 net/openvswitch/vport-netdev.c   | 271 +++---
 net/openvswitch/vport-netdev.h   |  13 -
 net/openvswitch/vport-vxlan.c| 322 -
 net/openvswitch/vport.c  |  34 +-
 net/openvswitch/vport.h  |  21 +-
 39 files changed, 1334 insertions(+), 1182 deletions(-)
 delete mode 100644 net/openvswitch/vport-gre.c
 delete mode 100644 net/openvswitch/vport-vxlan.c

-- 
2.3.5


[net-next RFC 07/14] vxlan: Factor out device configuration

2015-06-01 Thread Thomas Graf

This factors the device configuration out of the RTNL newlink
API, which allows for in-kernel creation of VXLAN net_devices.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vxlan.c | 332 
 include/net/vxlan.h |  59 ++
 2 files changed, 236 insertions(+), 155 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d5edba5..3acab95 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -54,10 +54,6 @@
 
 #define PORT_HASH_BITS 8
 #define PORT_HASH_SIZE  (1<<PORT_HASH_BITS)
-#define VNI_HASH_BITS  10
-#define VNI_HASH_SIZE  (1<<VNI_HASH_BITS)
-#define FDB_HASH_BITS  8
-#define FDB_HASH_SIZE  (1<<FDB_HASH_BITS)
 #define FDB_AGE_DEFAULT 300 /* 5 min */
 #define FDB_AGE_INTERVAL (10 * HZ) /* rescan interval */
 
@@ -74,6 +70,7 @@ module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 static int vxlan_net_id;
+static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
@@ -84,21 +81,6 @@ struct vxlan_net {
spinlock_t sock_lock;
 };
 
-union vxlan_addr {
-   struct sockaddr_in sin;
-   struct sockaddr_in6 sin6;
-   struct sockaddr sa;
-};
-
-struct vxlan_rdst {
-   union vxlan_addr remote_ip;
-   __be16   remote_port;
-   u32  remote_vni;
-   u32  remote_ifindex;
-   struct list_head list;
-   struct rcu_head  rcu;
-};
-
 /* Forwarding table entry */
 struct vxlan_fdb {
struct hlist_node hlist;/* linked list of entries */
@@ -111,31 +93,6 @@ struct vxlan_fdb {
u8eth_addr[ETH_ALEN];
 };
 
-/* Pseudo network device */
-struct vxlan_dev {
-   struct hlist_node hlist;/* vni hash table */
-   struct list_head  next; /* vxlan's per namespace list */
-   struct vxlan_sock *vn_sock; /* listening socket */
-   struct net_device *dev;
-   struct net *net; /* netns for packet i/o */
-   struct vxlan_rdst default_dst;  /* default destination */
-   union vxlan_addr  saddr;/* source address */
-   __be16dst_port;
-   __u16 port_min; /* source port range */
-   __u16 port_max;
-   __u8  tos;  /* TOS override */
-   __u8  ttl;
-   u32   flags;/* VXLAN_F_* in vxlan.h */
-
-   unsigned long age_interval;
-   struct timer_list age_timer;
-   spinlock_t hash_lock;
-   unsigned int  addrcnt;
-   unsigned int  addrmax;
-
-   struct hlist_head fdb_head[FDB_HASH_SIZE];
-};
-
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
 static struct workqueue_struct *vxlan_wq;
@@ -345,7 +302,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct 
vxlan_dev *vxlan,
if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
goto nla_put_failure;
 
-   if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
+   if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
nla_put_be16(skb, NDA_PORT, rdst->remote_port))
goto nla_put_failure;
if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
@@ -749,7 +706,8 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
 
-   if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax)
+   if (vxlan->cfg.addrmax &&
+   vxlan->addrcnt >= vxlan->cfg.addrmax)
return -ENOSPC;
 
/* Disallow replace to add a multicast entry */
@@ -835,7 +793,7 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct 
vxlan_dev *vxlan,
return -EINVAL;
*port = nla_get_be16(tb[NDA_PORT]);
} else {
-   *port = vxlan->dst_port;
+   *port = vxlan->cfg.dst_port;
}
 
if (tb[NDA_VNI]) {
@@ -1021,7 +979,7 @@ static bool vxlan_snoop(struct net_device *dev,
vxlan_fdb_create(vxlan, src_mac, src_ip,
 NUD_REACHABLE,
 NLM_F_EXCL|NLM_F_CREATE,
-vxlan->dst_port,
+vxlan->cfg.dst_port,
 vxlan->default_dst.remote_vni,
 0, NTF_SELF);
spin_unlock(&vxlan->hash_lock);
@@ -1945,7 +1903,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
u32 flags = vxlan->flags;
 
if (rdst) {
-   dst_port = rdst->remote_port ? rdst->remote_port : 
vxlan->dst_port;
+   dst_port = rdst->remote_port ? rdst->remote_port : 
vxlan->cfg.dst_port;

[net-next RFC 14/14] arp: Associate ARP requests with tunnel info

2015-06-01 Thread Thomas Graf

Since ARP performs its own route lookup call, any tunnel metadata
returned by that lookup must be attached manually.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 net/ipv4/arp.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 933a928..6cf0502 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -489,6 +489,7 @@ struct sk_buff *arp_create(int type, int ptype, __be32 
dest_ip,
unsigned char *arp_ptr;
int hlen = LL_RESERVED_SPACE(dev);
int tlen = dev->needed_tailroom;
+   struct rtable *rt;
 
/*
 *  Allocate a buffer
@@ -577,6 +578,13 @@ struct sk_buff *arp_create(int type, int ptype, __be32 
dest_ip,
}
memcpy(arp_ptr, &dest_ip, 4);
 
+   rt = ip_route_output(dev_net(dev), dest_ip, src_ip, 0, dev->ifindex);
+   if (!IS_ERR(rt)) {
+   if (rt->rt_tun_info)
+   skb_attach_tunnel_info(skb, rt->rt_tun_info);
+   ip_rt_put(rt);
+   }
+
return skb;
 
 out:
-- 
2.3.5


[net-next RFC 06/14] fib: Add fib rule match on tunnel id

2015-06-01 Thread Thomas Graf

This adds the ability to select a routing table based on the tunnel
id, which allows maintaining separate routing tables for each virtual
tunnel network.

ip rule add from all tunnel-id 100 lookup 100
ip rule add from all tunnel-id 200 lookup 200

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/fib_rules.h|  1 +
 include/uapi/linux/fib_rules.h |  2 +-
 net/core/fib_rules.c   | 17 +++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6d67383..822ed1e 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -19,6 +19,7 @@ struct fib_rule {
u8  action;
/* 3 bytes hole, try to use */
u32 target;
+   __be64  tun_id;
struct fib_rule __rcu   *ctarget;
struct net  *fr_net;
 
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 2b82d7e..96161b8 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -43,7 +43,7 @@ enum {
FRA_UNUSED5,
FRA_FWMARK, /* mark */
FRA_FLOW,   /* flow/class id */
-   FRA_UNUSED6,
+   FRA_TUN_ID,
FRA_SUPPRESS_IFGROUP,
FRA_SUPPRESS_PREFIXLEN,
FRA_TABLE,  /* Extended table id */
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 9a12668..6da78c9 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -186,6 +186,9 @@ static int fib_rule_match(struct fib_rule *rule, struct 
fib_rules_ops *ops,
if ((rule->mark ^ fl->flowi_mark) & rule->mark_mask)
goto out;
 
+   if (rule->tun_id && (rule->tun_id != fl->flowi_tun_key.tun_id))
+   goto out;
+
ret = ops->match(rule, fl, flags);
 out:
return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -330,6 +333,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct 
nlmsghdr* nlh)
if (tb[FRA_FWMASK])
rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
 
+   if (tb[FRA_TUN_ID])
+   rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+
rule->action = frh->action;
rule->flags = frh->flags;
rule->table = frh_get_table(frh, tb);
@@ -473,6 +479,10 @@ static int fib_nl_delrule(struct sk_buff *skb, struct 
nlmsghdr* nlh)
(rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
continue;
 
+   if (tb[FRA_TUN_ID] &&
+   (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
+   continue;
+
if (!ops->compare(rule, frh, tb))
continue;
 
@@ -535,7 +545,8 @@ static inline size_t fib_rule_nlmsg_size(struct 
fib_rules_ops *ops,
 + nla_total_size(4) /* FRA_SUPPRESS_PREFIXLEN */
 + nla_total_size(4) /* FRA_SUPPRESS_IFGROUP */
 + nla_total_size(4) /* FRA_FWMARK */
-+ nla_total_size(4); /* FRA_FWMASK */
++ nla_total_size(4) /* FRA_FWMASK */
++ nla_total_size(8); /* FRA_TUN_ID */
 
if (ops->nlmsg_payload)
payload += ops->nlmsg_payload(rule);
@@ -591,7 +602,9 @@ static int fib_nl_fill_rule(struct sk_buff *skb, struct 
fib_rule *rule,
((rule-mark_mask || rule-mark) 
 nla_put_u32(skb, FRA_FWMASK, rule-mark_mask)) ||
(rule-target 
-nla_put_u32(skb, FRA_GOTO, rule-target)))
+nla_put_u32(skb, FRA_GOTO, rule-target)) ||
+   (rule-tun_id 
+nla_put_be64(skb, FRA_TUN_ID, rule-tun_id)))
goto nla_put_failure;
 
if (rule-suppress_ifgroup != -1) {
-- 
2.3.5
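
The tunnel ID check that this patch adds to fib_rule_match() can be tried
outside the kernel. What follows is only a minimal userspace sketch: the
types `struct rule` and `struct flow` and the function `rule_match()` are
hypothetical stand-ins for `struct fib_rule`, `struct flowi` and the kernel
function, and the per-family `ops->match()` call is left out. It shows that
a zero `tun_id` acts as a wildcard and that the invert flag is applied last.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fib_rule (only the fields used here). */
struct rule {
	uint32_t mark;
	uint32_t mark_mask;
	uint64_t tun_id;	/* 0 means "match any tunnel ID" */
	bool invert;		/* models the FIB_RULE_INVERT flag */
};

/* Hypothetical stand-in for struct flowi. */
struct flow {
	uint32_t mark;
	uint64_t tun_id;
};

/* Mirrors the control flow of the patched match function. */
bool rule_match(const struct rule *r, const struct flow *f)
{
	bool ret = false;

	/* Existing check: masked firewall mark must agree. */
	if ((r->mark ^ f->mark) & r->mark_mask)
		goto out;

	/* New check: a non-zero rule tun_id must equal the flow's. */
	if (r->tun_id && r->tun_id != f->tun_id)
		goto out;

	ret = true;	/* the kernel would call ops->match() here */
out:
	return r->invert ? !ret : ret;
}
```

One consequence of this design, at least in the sketch, is that a rule
cannot ask specifically for tunnel ID 0, because 0 is the "not set" value.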


[net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL

2015-06-01 Thread Thomas Graf

Introduces a new Netlink attribute RTA_TUNNEL which allows routes
to set tunnel transmit metadata and specify the tunnel endpoint or
tunnel id on a per route basis. The route must point to a tunnel
device which understands per skb tunnel metadata and has been put
into the respective mode.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/ip_fib.h   |  3 +++
 include/net/ip_tunnels.h   |  1 -
 include/net/route.h| 10 
 include/uapi/linux/rtnetlink.h | 16 
 net/ipv4/fib_frontend.c| 57 ++
 net/ipv4/fib_semantics.c   | 45 +
 net/ipv4/route.c   | 30 +-
 net/openvswitch/vport.h|  1 +
 8 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..1cd7cf8 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
 #include <net/fib_rules.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
+#include <net/ip_tunnels.h>
 
 struct fib_config {
u8  fc_dst_len;
@@ -44,6 +45,7 @@ struct fib_config {
u32 fc_flow;
u32 fc_nlflags;
struct nl_info  fc_nlinfo;
+   struct ip_tunnel_info   fc_tunnel;
  };
 
 struct fib_info;
@@ -117,6 +119,7 @@ struct fib_info {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
int fib_power;
 #endif
+   struct ip_tunnel_info   *fib_tunnel;
struct rcu_head rcu;
struct fib_nh   fib_nh[0];
 #define fib_devfib_nh[0].nh_dev
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index df8cfd3..b4ab930 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -9,7 +9,6 @@
 #include <net/dsfield.h>
 #include <net/gro_cells.h>
 #include <net/inet_ecn.h>
-#include <net/ip.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/flow.h>
diff --git a/include/net/route.h b/include/net/route.h
index 6ede321..dbda603 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
 #include <net/inetpeer.h>
 #include <net/flow.h>
 #include <net/inet_sock.h>
+#include <net/ip_tunnels.h>
 #include <linux/in_route.h>
 #include <linux/rtnetlink.h>
 #include <linux/rcupdate.h>
@@ -66,6 +67,7 @@ struct rtable {
 
struct list_headrt_uncached;
struct uncached_list*rt_uncached_list;
+   struct ip_tunnel_info   *rt_tun_info;
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
@@ -198,6 +200,8 @@ struct in_ifaddr;
 void fib_add_ifaddr(struct in_ifaddr *);
 void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
 
+int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info);
+
 static inline void ip_rt_put(struct rtable *rt)
 {
/* dst_release() accepts a NULL parameter.
@@ -317,9 +321,15 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
 
 static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
 {
+	struct rtable *rt;
+
 	if (skb_shinfo(skb)->tun_info)
 		return skb_shinfo(skb)->tun_info;
 
+	rt = skb_rtable(skb);
+	if (rt)
+		return rt->rt_tun_info;
+
 	return NULL;
 }
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..1f7aa68 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -286,6 +286,21 @@ enum rt_class_t {
 
 /* Routing message attributes */
 
+enum rta_tunnel_t {
+   RTA_TUN_UNSPEC,
+   RTA_TUN_ID,
+   RTA_TUN_DST,
+   RTA_TUN_SRC,
+   RTA_TUN_TTL,
+   RTA_TUN_TOS,
+   RTA_TUN_SPORT,
+   RTA_TUN_DPORT,
+   RTA_TUN_FLAGS,
+   __RTA_TUN_MAX,
+};
+
+#define RTA_TUN_MAX (__RTA_TUN_MAX - 1)
+
 enum rtattr_type_t {
RTA_UNSPEC,
RTA_DST,
@@ -308,6 +323,7 @@ enum rtattr_type_t {
RTA_VIA,
RTA_NEWDST,
RTA_PREF,
+   RTA_TUNNEL, /* destination VTEP */
__RTA_MAX
 };
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..bfa77a6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -580,6 +580,57 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
return -EINVAL;
 }
 
+static const struct nla_policy tunnel_policy[RTA_TUN_MAX + 1] = {
+   [RTA_TUN_ID]= { .type = NLA_U64 },
+   [RTA_TUN_DST]   = { .type = NLA_U32 },
+   [RTA_TUN_SRC]   = { .type = NLA_U32 },
+   [RTA_TUN_TTL]   = { .type = NLA_U8 },
+   [RTA_TUN_TOS]   = { .type = NLA_U8 },
+   [RTA_TUN_SPORT] = { .type = NLA_U16 },
+   [RTA_TUN_DPORT] = { .type = NLA_U16 },
+   [RTA_TUN_FLAGS] = { .type = NLA_U16 },
+};
+
+static int parse_rta_tunnel(struct fib_config *cfg, struct nlattr *attr)
+{
+   struct nlattr

[net-next RFC 12/14] vxlan: remove indirect call to vxlan_rcv() and vni member

2015-06-01 Thread Thomas Graf

With the removal of the special treatment of OVS VXLAN vports, the
indirect call to vxlan_rcv() can be avoided and the VNI member
in vxlan_metadata can be removed.
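
Since the VNI now travels as a plain argument, the call site in the diff
shifts `vni` right by 8 bits before passing it on. This matches the VXLAN
header layout in RFC 7348, where the 24-bit VNI is followed by 8 reserved
bits. The following is a small userspace sketch of that extraction; the
helper names `load_be32()` and `vxlan_vni_from_field()` are hypothetical,
and it assumes the kernel converts the field to host byte order before the
shift, which happens outside the hunk shown here.

```c
#include <stdint.h>

/* Decode a 32-bit big-endian value from four raw header bytes. */
uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * The VNI occupies the upper 24 bits of the field; the low 8 bits
 * are reserved, so they are shifted out.
 */
uint32_t vxlan_vni_from_field(const uint8_t field[4])
{
	return load_be32(field) >> 8;
}
```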

Signed-off-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vxlan.c | 225 +---
 include/net/vxlan.h |   7 --
 2 files changed, 107 insertions(+), 125 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index b696871..9cc7d5a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -75,7 +75,6 @@ static struct rtnl_link_ops vxlan_link_ops;
 static const u8 all_zeros_mac[ETH_ALEN];
 
 static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
-vxlan_rcv_t *rcv, void *data,
 bool no_share, u32 flags);
 
 /* per-network namespace private data for this module */
@@ -1122,6 +1121,102 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 	return vh;
 }
 
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md, __u32 vni)
+{
+	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+	struct iphdr *oip = NULL;
+	struct ipv6hdr *oip6 = NULL;
+	struct vxlan_dev *vxlan;
+	struct pcpu_sw_netstats *stats;
+	union vxlan_addr saddr;
+	int err = 0;
+	union vxlan_addr *remote_ip;
+
+	/* For flow based devices, map all packets to VNI 0 */
+	if (vs->flags & VXLAN_F_FLOW_BASED)
+		vni = 0;
+
+	/* Is this VNI defined? */
+	vxlan = vxlan_vs_find_vni(vs, vni);
+	if (!vxlan)
+		goto drop;
+
+	remote_ip = &vxlan->default_dst.remote_ip;
+	skb_reset_mac_header(skb);
+	skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
+	skb->protocol = eth_type_trans(skb, vxlan->dev);
+	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+
+	/* Ignore packet loops (and multicast echo) */
+	if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
+		goto drop;
+
+	/* Re-examine inner Ethernet packet */
+	if (remote_ip->sa.sa_family == AF_INET) {
+		oip = ip_hdr(skb);
+		saddr.sin.sin_addr.s_addr = oip->saddr;
+		saddr.sa.sa_family = AF_INET;
+
+		if (tun_info) {
+			tun_info->key.ipv4_src = oip->saddr;
+			tun_info->key.ipv4_dst = oip->daddr;
+			tun_info->key.ipv4_tos = oip->tos;
+			tun_info->key.ipv4_ttl = oip->ttl;
+		}
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		oip6 = ipv6_hdr(skb);
+		saddr.sin6.sin6_addr = oip6->saddr;
+		saddr.sa.sa_family = AF_INET6;
+
+		/* TODO : Fill IPv6 tunnel info */
+#endif
+	}
+
+	if ((vxlan->flags & VXLAN_F_LEARN) &&
+	    vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
+		goto drop;
+
+	skb_reset_network_header(skb);
+	if (!(vs->flags & VXLAN_F_FLOW_BASED))
+		skb->mark = md->gbp;
+
+	if (oip6)
+		err = IP6_ECN_decapsulate(oip6, skb);
+	if (oip)
+		err = IP_ECN_decapsulate(oip, skb);
+
+	if (unlikely(err)) {
+		if (log_ecn_error) {
+			if (oip6)
+				net_info_ratelimited("non-ECT from %pI6\n",
+						     &oip6->saddr);
+			if (oip)
+				net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+						     &oip->saddr, oip->tos);
+		}
+		if (err > 1) {
+			++vxlan->dev->stats.rx_frame_errors;
+			++vxlan->dev->stats.rx_errors;
+			goto drop;
+		}
+	}
+
+	stats = this_cpu_ptr(vxlan->dev->tstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets++;
+	stats->rx_bytes += skb->len;
+	u64_stats_update_end(&stats->syncp);
+
+	netif_rx(skb);
+
+	return;
+drop:
+	/* Consume bad packet */
+	kfree_skb(skb);
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
@@ -1226,8 +1321,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto bad_flags;
 	}
 
-	md->vni = vxh->vx_vni;
-	vs->rcv(vs, skb, md);
+	vxlan_rcv(vs, skb, md, vni >> 8);
 	return 0;
 
 drop:
@@ -1244,105 +1338,6 @@ error:
return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
- struct vxlan_metadata *md)
-{
-   struct ip_tunnel_info *tun_info = skb_shinfo(skb)-tun_info;
-   struct iphdr *oip = NULL;
-   struct ipv6hdr *oip6 = NULL;
-   struct

[net-next RFC 13/14] openvswitch: Use regular GRE net_device instead of vport

2015-06-01 Thread Thomas Graf

From: Pravin Shelar pshe...@nicira.com

Removes all of the OVS-specific GRE code and makes OVS use a
GRE net_device.

Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 net/core/dev.c |   5 +-
 net/ipv4/ip_gre.c  | 161 -
 net/openvswitch/Makefile   |   1 -
 net/openvswitch/vport-gre.c| 313 -
 net/openvswitch/vport-netdev.c |   7 +-
 5 files changed, 168 insertions(+), 319 deletions(-)
 delete mode 100644 net/openvswitch/vport-gre.c

diff --git a/net/core/dev.c b/net/core/dev.c
index 594163d..656f3b4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6969,6 +6969,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
+
+	strcpy(dev->name, name);
+	dev->name_assign_type = name_assign_type;
 	setup(dev);
 
 	dev->num_tx_queues = txqs;
@@ -6983,8 +6986,6 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		goto free_all;
 #endif
 
-	strcpy(dev->name, name);
-	dev->name_assign_type = name_assign_type;
 	dev->group = INIT_NETDEV_GROUP;
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5fd7064..b37515e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -25,6 +25,7 @@
 #include <linux/udp.h>
 #include <linux/if_arp.h>
 #include <linux/mroute.h>
+#include <linux/if_vlan.h>
 #include <linux/init.h>
 #include <linux/in6.h>
 #include <linux/inetdevice.h>
@@ -115,6 +116,8 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+#define GRE_TAP_FB_NAME "gretap0"
+
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
 static int ipgre_tunnel_init(struct net_device *dev);
 
@@ -217,7 +220,17 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
+
 		skb_pop_mac_header(skb);
+		if (tunnel->dev == itn->fb_tunnel_dev) {
+			struct ip_tunnel_info *tun_info;
+
+			tun_info = ip_tunnel_info_alloc(0, GFP_ATOMIC);
+
+			/* TODO: setup tun info from tpi */
+			skb_attach_tunnel_info(skb, tun_info);
+		}
+
 		ip_tunnel_rcv(tunnel, skb, tpi, log_ecn_error);
 		return PACKET_RCVD;
 	}
@@ -287,6 +300,135 @@ out:
return NETDEV_TX_OK;
 }
 
+/* TODO: share xmit code */
+static inline struct rtable *tunnel_route_lookup(struct net *net,
+						 const struct ip_tunnel_key *key,
+						 u32 mark,
+						 struct flowi4 *fl,
+						 u8 protocol)
+{
+	struct rtable *rt;
+
+	memset(fl, 0, sizeof(*fl));
+	fl->daddr = key->ipv4_dst;
+	fl->saddr = key->ipv4_src;
+	fl->flowi4_tos = RT_TOS(key->ipv4_tos);
+	fl->flowi4_mark = mark;
+	fl->flowi4_proto = protocol;
+
+	rt = ip_route_output_key(net, fl);
+	return rt;
+}
+
+
+/* Returns the least-significant 32 bits of a __be64. */
+static __be32 be64_get_low32(__be64 x)
+{
+#ifdef __BIG_ENDIAN
+	return (__force __be32)x;
+#else
+	return (__force __be32)((__force u64)x >> 32);
+#endif
+}
+
+static __be16 filter_tnl_flags(__be16 flags)
+{
+	return flags & (TUNNEL_CSUM | TUNNEL_KEY);
+}
+
+
+static struct sk_buff *__build_header(struct sk_buff *skb,
+				      const struct ip_tunnel_info *tun_info,
+				      int tunnel_hlen)
+{
+	struct tnl_ptk_info tpi;
+
+	skb = gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM));
+	if (IS_ERR(skb))
+		return skb;
+
+	tpi.flags = filter_tnl_flags(tun_info->key.tun_flags);
+	tpi.proto = htons(ETH_P_TEB);
+	tpi.key = be64_get_low32(tun_info->key.tun_id);
+	tpi.seq = 0;
+	gre_build_header(skb, &tpi, tunnel_hlen);
+
+	return skb;
+}
+
+static netdev_tx_t gre_fb_xmit(struct sk_buff *skb,
+			       struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+	struct ip_tunnel_info *tun_info;
+	const struct ip_tunnel_key *key;
+	struct flowi4 fl;
+	struct rtable *rt;
+	int min_headroom;
+	int tunnel_hlen;
+	__be16 df;
+	int err;
+
+	tun_info = skb_shinfo(skb)->tun_info;
+	if (unlikely(!tun_info)) {
+		err = -EINVAL;
+		goto err_free_skb;
+	}
+
+	key = &tun_info->key;
+
+	rt =

[net-next RFC 08/14] openvswitch: Allocate attach ip_tunnel_info for tunnel set action

2015-06-01 Thread Thomas Graf

Make use of the new skb tunnel metadata field by allocating an
ip_tunnel_info per OVS tunnel set action and then attaching that
metadata to each skb that passes the set action.

The old egress_tun_info via the OVS_CB() is left in place until
all tunnel vports have been converted to the new method.
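
Because every set action now holds a pointer to an ip_tunnel_info, freeing
an action list can no longer be a bare kfree(): each held reference has to
be dropped first, which is what the reworked ovs_nla_free_flow_actions() in
the diff does via ip_tunnel_info_put(). The following is a userspace sketch
of that ownership rule only. All names in it (`struct tun_info`,
`struct flow_actions`, `flow_actions_add_tunnel()` and so on) are
hypothetical, the counter is a plain int rather than an atomic, and RCU
deferral is not modelled.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_SET_ACTIONS 4

/* Hypothetical refcounted metadata object. */
struct tun_info {
	int refcnt;
};

/* Hypothetical action list that holds references to tun_info objects. */
struct flow_actions {
	struct tun_info *tun[MAX_SET_ACTIONS];
	size_t n;
};

struct tun_info *tun_info_alloc(void)
{
	struct tun_info *t = calloc(1, sizeof(*t));

	if (t)
		t->refcnt = 1;	/* the creator owns the first reference */
	return t;
}

void tun_info_get(struct tun_info *t)
{
	t->refcnt++;
}

/* Returns 1 if this call dropped the last reference and freed the object. */
int tun_info_put(struct tun_info *t)
{
	if (--t->refcnt > 0)
		return 0;
	free(t);
	return 1;
}

/* The action list takes its own reference. Returns 0, or -1 if full. */
int flow_actions_add_tunnel(struct flow_actions *acts, struct tun_info *t)
{
	if (acts->n >= MAX_SET_ACTIONS)
		return -1;
	tun_info_get(t);
	acts->tun[acts->n++] = t;
	return 0;
}

/* Drop every reference the list holds, then free the list itself. */
void flow_actions_free(struct flow_actions *acts)
{
	for (size_t i = 0; i < acts->n; i++)
		tun_info_put(acts->tun[i]);
	free(acts);
}
```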

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 net/openvswitch/actions.c  |  8 +-
 net/openvswitch/datapath.c |  8 +++---
 net/openvswitch/flow.h |  5 
 net/openvswitch/flow_netlink.c | 59 +-
 net/openvswitch/flow_netlink.h |  1 +
 5 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 34cad57..484d965 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -726,7 +726,13 @@ static int execute_set_action(struct sk_buff *skb,
 {
 	/* Only tunnel set execution is supported without a mask. */
 	if (nla_type(a) == OVS_KEY_ATTR_TUNNEL_INFO) {
-		OVS_CB(skb)->egress_tun_info = nla_data(a);
+		struct ovs_tunnel_info *tun = nla_data(a);
+
+		skb_attach_tunnel_info(skb, tun->info);
+
+		/* FIXME: Remove when all vports have been converted */
+		OVS_CB(skb)->egress_tun_info = tun->info;
+
 		return 0;
 	}
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3b90461..3315e3a 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1004,7 +1004,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
}
ovs_unlock();
 
-   ovs_nla_free_flow_actions(old_acts);
+   ovs_nla_free_flow_actions_rcu(old_acts);
ovs_flow_free(new_flow, false);
}
 
@@ -1016,7 +1016,7 @@ err_unlock_ovs:
ovs_unlock();
kfree_skb(reply);
 err_kfree_acts:
-   kfree(acts);
+   ovs_nla_free_flow_actions(acts);
 err_kfree_flow:
ovs_flow_free(new_flow, false);
 error:
@@ -1143,7 +1143,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (reply)
 		ovs_notify(&dp_flow_genl_family, reply, info);
 	if (old_acts)
-		ovs_nla_free_flow_actions(old_acts);
+		ovs_nla_free_flow_actions_rcu(old_acts);
 
return 0;
 
@@ -1151,7 +1151,7 @@ err_unlock_ovs:
ovs_unlock();
kfree_skb(reply);
 err_kfree_acts:
-   kfree(acts);
+   ovs_nla_free_flow_actions(acts);
 error:
return error;
 }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index cadc6c5..193eab9 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -45,6 +45,11 @@ struct sk_buff;
 #define TUN_METADATA_OPTS(flow_key, opt_len) \
 	((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
 
+struct ovs_tunnel_info
+{
+   struct ip_tunnel_info   *info;
+};
+
 #define OVS_SW_FLOW_KEY_METADATA_SIZE  \
(offsetof(struct sw_flow_key, recirc_id) +  \
FIELD_SIZEOF(struct sw_flow_key, recirc_id))
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ecfa530..35086c6 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1548,11 +1548,45 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
 	return sfa;
 }
 
+static void ovs_nla_free_set_action(const struct nlattr *a)
+{
+	const struct nlattr *ovs_key = nla_data(a);
+	struct ovs_tunnel_info *ovs_tun;
+
+	switch (nla_type(ovs_key)) {
+	case OVS_KEY_ATTR_TUNNEL_INFO:
+		ovs_tun = nla_data(ovs_key);
+		ip_tunnel_info_put(ovs_tun->info);
+		break;
+	}
+}
+
+void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+{
+	const struct nlattr *a;
+	int rem;
+
+	nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
+		switch (nla_type(a)) {
+		case OVS_ACTION_ATTR_SET:
+			ovs_nla_free_set_action(a);
+			break;
+		}
+	}
+
+	kfree(sf_acts);
+}
+
+static void __ovs_nla_free_flow_actions(struct rcu_head *head)
+{
+	ovs_nla_free_flow_actions(container_of(head, struct sw_flow_actions, rcu));
+}
+
 /* Schedules 'sf_acts' to be freed after the next RCU grace period.
  * The caller must hold rcu_read_lock for this to be sensible. */
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *sf_acts)
 {
-	kfree_rcu(sf_acts, rcu);
+	call_rcu(&sf_acts->rcu, __ovs_nla_free_flow_actions);
 }
 
 static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
@@ -1747,6 +1781,7 @@ static int validate_and_copy_set_tun(const struct nlattr 
*attr,
struct sw_flow_match match;

Re: [PATCH 7/7] mac80211: Switch to new AEAD interface

2015-06-01 Thread Johannes Berg

On Mon, 2015-06-01 at 16:05 +0200, Johannes Berg wrote:

 Ok - here the length is kinda passed as part of the AAD buffer, but this
 is really just some arcane code that should be fixed to use a proper
 struct. The value there, even though it is __be16 and looks like it came
 from the data, is actually created locally, see ccmp_special_blocks()
 and gcmp_special_blocks().

IOW, I think something like this would make sense:

(but I'll hold it until after Herbert's patches I guess)
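
For readers who do not have the mac80211 code at hand, the two calling
conventions being compared can be sketched in plain C. This is an
illustration only: `struct aad_view`, `aad_from_prefixed()` and
`aad_from_explicit()` are hypothetical names, not mac80211 functions. The
first helper models the current layout (a big-endian 16-bit length in the
first two bytes, data from byte 2 on); the second models passing the length
as a separate argument.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the associated data as handed to the cipher. */
struct aad_view {
	const uint8_t *data;
	size_t len;
};

/*
 * Current convention: buf[0..1] hold a big-endian length and the
 * actual associated data starts at buf[2].
 */
struct aad_view aad_from_prefixed(const uint8_t *buf)
{
	struct aad_view v;

	v.len = ((size_t)buf[0] << 8) | buf[1];
	v.data = &buf[2];
	return v;
}

/* Proposed convention: the caller passes the length explicitly. */
struct aad_view aad_from_explicit(const uint8_t *aad, size_t aad_len)
{
	struct aad_view v;

	v.data = aad;
	v.len = aad_len;
	return v;
}
```

Both helpers describe the same bytes; the second simply makes the length
visible in the function signature instead of hiding it in the buffer.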

From 20bd0e92ab0d7ef545687da762228622bcdabeec Mon Sep 17 00:00:00 2001
From: Johannes Berg johannes.b...@intel.com
Date: Mon, 1 Jun 2015 16:33:11 +0200
Subject: [PATCH] mac80211: move AAD length out of AAD buffer

The code currently passes the AAD buffer as a __be16 with the
length, followed by the actual data, but doesn't use a struct
or make this explicit in any other way, so it's confusing.

Change the code to pass the AAD length explicitly outside of
the buffer.

Reported-by: Stephan Mueller smuel...@chronox.de
Signed-off-by: Johannes Berg johannes.b...@intel.com
---
 net/mac80211/aes_ccm.c | 18 +++---
 net/mac80211/aes_ccm.h | 14 ++-
 net/mac80211/aes_gcm.c | 10 
 net/mac80211/aes_gcm.h |  6 +++--
 net/mac80211/wpa.c | 64 +++---
 5 files changed, 62 insertions(+), 50 deletions(-)

diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c
index 208df7c0b6ea..b6e2f096127a 100644
--- a/net/mac80211/aes_ccm.c
+++ b/net/mac80211/aes_ccm.c
@@ -19,9 +19,10 @@
 #include "key.h"
 #include "aes_ccm.h"
 
-void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
-			       u8 *data, size_t data_len, u8 *mic,
-			       size_t mic_len)
+void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0,
+			       u8 *aad, size_t aad_len,
+			       u8 *data, size_t data_len,
+			       u8 *mic, size_t mic_len)
 {
 	struct scatterlist assoc, pt, ct[2];
 
@@ -33,7 +34,7 @@ void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
 	memset(aead_req, 0, sizeof(aead_req_data));
 
 	sg_init_one(&pt, data, data_len);
-	sg_init_one(&assoc, &aad[2], be16_to_cpup((__be16 *)aad));
+	sg_init_one(&assoc, aad, aad_len);
 	sg_init_table(ct, 2);
 	sg_set_buf(&ct[0], data, data_len);
 	sg_set_buf(&ct[1], mic, mic_len);
@@ -45,9 +46,10 @@ void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
 	crypto_aead_encrypt(aead_req);
 }
 
-int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic,
-			      size_t mic_len)
+int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0,
+			      u8 *aad, size_t aad_len,
+			      u8 *data, size_t data_len,
+			      u8 *mic, size_t mic_len)
 {
 	struct scatterlist assoc, pt, ct[2];
 	char aead_req_data[sizeof(struct aead_request) +
@@ -61,7 +63,7 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
 	memset(aead_req, 0, sizeof(aead_req_data));
 
 	sg_init_one(&pt, data, data_len);
-	sg_init_one(&assoc, &aad[2], be16_to_cpup((__be16 *)aad));
+	sg_init_one(&assoc, aad, aad_len);
 	sg_init_table(ct, 2);
 	sg_set_buf(&ct[0], data, data_len);
 	sg_set_buf(&ct[1], mic, mic_len);
diff --git a/net/mac80211/aes_ccm.h b/net/mac80211/aes_ccm.h
index 6a73d1e4d186..bfe355e4a680 100644
--- a/net/mac80211/aes_ccm.h
+++ b/net/mac80211/aes_ccm.h
@@ -15,12 +15,14 @@
 struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[],
size_t key_len,
size_t mic_len);
-void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
-  u8 *data, size_t data_len, u8 *mic,
-  size_t mic_len);
-int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
- u8 *data, size_t data_len, u8 *mic,
- size_t mic_len);
+void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0,
+  u8 *aad, size_t aad_len,
+  u8 *data, size_t data_len,
+  u8 *mic, size_t mic_len);
+int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0,
+ u8 *aad, size_t aad_len,
+ u8 *data, size_t data_len,
+ u8 *mic, size_t mic_len);
 void ieee80211_aes_key_free(struct crypto_aead *tfm);
 
 #endif /* AES_CCM_H */
diff --git a/net/mac80211/aes_gcm.c b/net/mac80211/aes_gcm.c
index fd278bbe1b0d..fb6823c5e381 100644
--- a/net/mac80211/aes_gcm.c
+++ b/net/mac80211/aes_gcm.c
@@

Re: [PATCH] libceph: use kvfree() in ceph_put_page_vector()

2015-06-01 Thread Ilya Dryomov

On Mon, Jun 1, 2015 at 5:36 PM, Geliang Tang geliangt...@163.com wrote:
 Use kvfree() instead of open-coding it.

 Signed-off-by: Geliang Tang geliangt...@163.com
 ---
  net/ceph/pagevec.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

 diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
 index 096d914..d4f5f22 100644
 --- a/net/ceph/pagevec.c
 +++ b/net/ceph/pagevec.c
 @@ -51,10 +51,7 @@ void ceph_put_page_vector(struct page **pages, int 
 num_pages, bool dirty)
 set_page_dirty_lock(pages[i]);
 put_page(pages[i]);
 }
 -   if (is_vmalloc_addr(pages))
 -   vfree(pages);
 -   else
 -   kfree(pages);
 +   kvfree(pages);
  }
  EXPORT_SYMBOL(ceph_put_page_vector);

Already fixed in testing, wasn't pushed to linux-next though, sorry!

Thanks,

Ilya

Re: [PATCH 1/1] net: core: 'ethtool' issue with querying phy settings

2015-06-01 Thread Ben Hutchings

On Sun, 2015-05-31 at 17:19 -0700, David Miller wrote:
 From: Ben Hutchings b...@decadent.org.uk
 Date: Sun, 31 May 2015 20:54:06 +0100

  On Fri, 2015-05-22 at 16:15 -0400, David Miller wrote:
  From: Arun Parameswaran apara...@broadcom.com
  Date: Wed, 20 May 2015 14:35:30 -0700

   When trying to configure the settings for PHY1, using commands
   like 'ethtool -s eth0 phyad 1 speed 100', the 'ethtool' seems to
   modify other settings apart from the speed of the PHY1, in the
   above case.

   The ethtool seems to query the settings for PHY0, and use this
   as the base to apply the new settings to the PHY1. This is
   causing the other settings of the PHY 1 to be wrongly
   configured.

   The issue is caused by the '_ethtool_get_settings()' API, which
   gets called because of the 'ETHTOOL_GSET' command, is clearing
   the 'cmd' pointer (of type 'struct ethtool_cmd') by calling
   memset. This clears all the parameters (if any) passed for the
   'ETHTOOL_GSET' cmd. So the driver's callback is always invoked
   with 'cmd-phy_address' as '0'.

   The '_ethtool_get_settings()' is called from other files in the
   'net/core'. So the fix is applied to the 'ethtool_get_settings()'
   which is only called in the context of the 'ethtool'.

   Signed-off-by: Arun Parameswaran apara...@broadcom.com
   Reviewed-by: Ray Jui r...@broadcom.com
   Reviewed-by: Scott Branden sbran...@broadcom.com

  Applied and queued up for -stable, thanks.

  Please revert this.  This is an incompatible API change, not a bug fix.
  The established semantics are that 'phyad' is filled in by the driver;
  it is not a parameter to the ETHTOOL_GSET command.

 But then how in the world can the user specify specific PHY ADs for
 a device that will respond to more than one?

ETHTOOL_SSET sets the current PHY address and ETHTOOL_GSET gets it.

If multiple PHYs need to be configured for a single link then the driver
should configure them all at the same time rather than making it the
administrator's problem.

What we can't support with the current API are:
- multiple physical links behind a single net device (different
  configuration possible for each link)
- multiple PHYs are needed for a single link, and the driver can't
  automatically decide which to use (multiple addresses to set)
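
The get/set semantics described above can be modelled in a few lines of
userspace C. This is a toy model under stated assumptions, not the ethtool
API: `struct link_settings`, `struct net_dev`, `get_settings()` and
`set_settings()` are hypothetical, and a real driver may treat the PHY
address differently. It shows only the point under discussion, namely that
"get" discards whatever PHY address the caller put in the request and
reports the current one, while "set" is what changes it.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PHYS 2

/* Hypothetical, heavily reduced stand-in for struct ethtool_cmd. */
struct link_settings {
	uint32_t speed;
	uint8_t phy_address;
};

/* Hypothetical device with two PHYs and a notion of "current PHY". */
struct net_dev {
	uint8_t cur_phy;
	uint32_t phy_speed[NUM_PHYS];
};

/*
 * "Get": the request is cleared first, so a caller-supplied
 * phy_address is not an input; the driver fills it in.
 */
void get_settings(const struct net_dev *dev, struct link_settings *cmd)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->phy_address = dev->cur_phy;
	cmd->speed = dev->phy_speed[dev->cur_phy];
}

/* "Set": selects the current PHY and applies the new speed to it. */
int set_settings(struct net_dev *dev, const struct link_settings *cmd)
{
	if (cmd->phy_address >= NUM_PHYS)
		return -1;
	dev->cur_phy = cmd->phy_address;
	dev->phy_speed[dev->cur_phy] = cmd->speed;
	return 0;
}
```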

Ben.

-- 
Ben Hutchings
Power corrupts.  Absolute power is kind of neat.
   - John Lehman, Secretary of the US Navy 1981-1987


Re: [PATCH] ethtool: changes of emac_regs structure accordingly within driver emac_regs structure.

2015-06-01 Thread Ben Hutchings

On Mon, 2015-06-01 at 16:30 +0400, Ivan Mikhaylov wrote:
 On Mon, 1 June 2015 12:57 +0400
 Ben Hutchings b...@decadent.org.uk wrote:
 
 On Thu, 2015-05-21 at 19:09 +0400, Ivan Mikhaylov wrote:
  In ibm_emac.c in ethtool size of emac structure which passing through
  to driver is nailed down and not correlating with current emac_regs
  structure.
  
  Signed-off-by: Ivan Mikhaylov i...@ru.ibm.com
 [...]
 
 This is not backward-compatible.  It ought to be possible to mix and
 match old and new ethtool and driver, except for the EMAC4SYNC case
 which has been broken up until now.
 
 Using the new definition of struct emac_regs, I think the driver and
 ethtool need to agree that the MAC register dump sizes are:
 
 EMAC:  offsetof(struct emac_regs, u1)
 EMAC4: offsetof(struct emac_regs, u1.emac4) + sizeof(p-u1.emac4)
 EMAC4SYNC: offsetof(struct emac_regs, u1.emac4sync) +
 sizeof(p-u1.emac4sync)
 
 Ben.
 
 -- 
 Ben Hutchings
 Reality is just a crutch for people who can't handle science fiction.
 
 Actually it is backward-compatible because we don't care about size
 which is coming from driver side, only what we doing is map of driver
 structure to ethtool structure and results will be same
 for emac and emac4.
 
  struct emac_regs *p = (struct emac_regs *)(hdr + 1);

The following registers won't be printed correctly.

 Also size which you mentioned (112 emac, 116 emac4) can be different
 from what you saying cause this managed by dts files where we can set
 something like 0x100 or 0x80 for this memory area and we will still
 have problem in representing MII area if this size wasn't set right
 in dts.

Yes, I understand that.  However, the in-tree device trees consistently
use those as the resource sizes so I think ethtool used to work properly
for the machines supported by those.  Increasing the size of the MAC
register dump is a regression for them.
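
To make the size rule quoted earlier in this message concrete, here is a
small sketch of how the three dump sizes fall out of offsetof() and
sizeof(). The struct layout is hypothetical: `struct emac_regs_sketch` is
not the real struct emac_regs, its common part is sized so that the EMAC
and EMAC4 results come out as the 112 and 116 bytes mentioned in this
thread, and the EMAC4SYNC part is an arbitrary placeholder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a common block followed by a per-variant union. */
struct emac_regs_sketch {
	uint32_t common[28];			/* 112 bytes shared by all variants */
	union {
		struct { uint32_t regs[1]; } emac4;	/* 4 bytes */
		struct { uint32_t regs[5]; } emac4sync;	/* placeholder size */
	} u1;
};

enum emac_kind { KIND_EMAC, KIND_EMAC4, KIND_EMAC4SYNC };

/* Dump size = offset of the variant's block + size of that block. */
size_t emac_dump_size(enum emac_kind kind)
{
	const struct emac_regs_sketch *p = NULL;	/* only used inside sizeof */

	switch (kind) {
	case KIND_EMAC4:
		return offsetof(struct emac_regs_sketch, u1.emac4) +
		       sizeof(p->u1.emac4);
	case KIND_EMAC4SYNC:
		return offsetof(struct emac_regs_sketch, u1.emac4sync) +
		       sizeof(p->u1.emac4sync);
	case KIND_EMAC:
	default:
		return offsetof(struct emac_regs_sketch, u1);
	}
}
```

With sizes derived this way, an old EMAC or EMAC4 dump keeps its length
even if the struct later grows a larger union member.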

Ben.

 Ethtool will be work in same way even if we have emac or emac4.
 
 Thank you for respond!
 

-- 
Ben Hutchings
Power corrupts.  Absolute power is kind of neat.
   - John Lehman, Secretary of the US Navy 1981-1987



[PATCH net-next 3/5] rocker: install untagged VLAN (vid=0) support for each port

2015-06-01 Thread sfeldma

From: Scott Feldman sfel...@gmail.com

On port probe, install by default untagged VLAN support.  This is
equivalent to running the command:

bridge vlan add vid 0 dev DEV self

A user could, if they wanted, manually remove untagged support from the
port by running the command:

bridge vlan del vid 0 dev DEV self

But installing it by default on port initialization gives the normal
expected behavior.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |   20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 76e2281..bd56273 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3157,6 +3157,8 @@ static int rocker_port_vlan_flood_group(struct rocker_port *rocker_port,
 
 	for (i = 0; i < rocker->port_count; i++) {
 		p = rocker->ports[i];
+		if (!p)
+			continue;
 		if (!rocker_port_is_bridged(p))
 			continue;
 		if (test_bit(ntohs(vlan_id), p->vlan_bitmap)) {
@@ -3216,7 +3218,7 @@ static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
 
 	for (i = 0; i < rocker->port_count; i++) {
 		p = rocker->ports[i];
-		if (test_bit(ntohs(vlan_id), p->vlan_bitmap))
+		if (p && test_bit(ntohs(vlan_id), p->vlan_bitmap))
 			ref++;
 	}
 
@@ -4882,6 +4884,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 	const struct pci_dev *pdev = rocker->pdev;
 	struct rocker_port *rocker_port;
 	struct net_device *dev;
+	u16 untagged_vid = 0;
 	int err;
 
 	dev = alloc_etherdev(sizeof(struct rocker_port));
@@ -4917,16 +4920,27 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 
 	rocker_port_set_learning(rocker_port, SWITCHDEV_TRANS_NONE);
 
-	rocker_port->internal_vlan_id =
-		rocker_port_internal_vlan_id_get(rocker_port, dev->ifindex);
 	err = rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE, 0);
 	if (err) {
 		dev_err(&pdev->dev, "install ig port table failed\n");
 		goto err_port_ig_tbl;
 	}
 
+	rocker_port->internal_vlan_id =
+		rocker_port_internal_vlan_id_get(rocker_port, dev->ifindex);
+
+	err = rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
+				   untagged_vid, 0);
+	if (err) {
+		netdev_err(rocker_port->dev, "install untagged VLAN failed\n");
+		goto err_untagged_vlan;
+	}
+
 	return 0;
 
+err_untagged_vlan:
+	rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE,
+			   ROCKER_OP_FLAG_REMOVE);
 err_port_ig_tbl:
 	unregister_netdev(dev);
 err_register_netdev:
-- 
1.7.10.4
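
Besides installing the untagged VLAN, the diff above adds NULL checks where
the driver walks rocker->ports[]. A plausible reading, and it is only an
inference from the diff, is that VLAN setup now runs during probe, when
later slots of the port array have not been filled in yet. The pattern can
be sketched in userspace as follows; `struct port`, `port_set_vlan()`,
`port_has_vlan()` and `vlan_ref_count()` are hypothetical names and the
bitmap is a simplified stand-in for the kernel's vlan_bitmap.

```c
#include <stddef.h>
#include <stdint.h>

#define VLAN_N_VID 4096

/* Hypothetical port with one membership bit per VLAN ID. */
struct port {
	uint8_t vlan_bitmap[VLAN_N_VID / 8];
};

void port_set_vlan(struct port *p, unsigned int vid)
{
	p->vlan_bitmap[vid / 8] |= (uint8_t)(1u << (vid % 8));
}

int port_has_vlan(const struct port *p, unsigned int vid)
{
	return (p->vlan_bitmap[vid / 8] >> (vid % 8)) & 1;
}

/*
 * Count the ports that carry 'vid'. Slots that are still NULL
 * (ports not probed yet) are skipped instead of dereferenced.
 */
int vlan_ref_count(struct port *const ports[], size_t port_count,
		   unsigned int vid)
{
	int ref = 0;

	for (size_t i = 0; i < port_count; i++) {
		const struct port *p = ports[i];

		if (p && port_has_vlan(p, vid))
			ref++;
	}
	return ref;
}
```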


[PATCH net-next 2/5] rocker: cleanup vlan table on error adding vlan

2015-06-01 Thread sfeldma

From: Scott Feldman sfel...@gmail.com

Basic housekeeping: if there is an error adding the router MAC for this
vlan, remove the just-installed VLAN table entry to leave the device in the
same state as before the failure.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index ab6d871..76e2281 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4342,7 +4342,12 @@ static int rocker_port_vlan_add(struct rocker_port *rocker_port,
 	if (err)
 		return err;
 
-	return rocker_port_router_mac(rocker_port, trans, 0, htons(vid));
+	err = rocker_port_router_mac(rocker_port, trans, 0, htons(vid));
+	if (err)
+		rocker_port_vlan(rocker_port, trans,
+				 ROCKER_OP_FLAG_REMOVE, vid);
+
+	return err;
 }
 
 static int rocker_port_vlans_add(struct rocker_port *rocker_port,
-- 
1.7.10.4
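
The cleanup rule this patch applies (undo step one if step two fails) can
be shown with a toy model. Everything in the sketch is hypothetical:
`struct port_tables` and the four functions merely stand in for the VLAN
table and router MAC operations of the driver, and the `fail_router_mac`
flag exists only to force the error path.

```c
#include <stdbool.h>

/* Hypothetical model of the two table entries a VLAN add installs. */
struct port_tables {
	bool vlan_entry;
	bool router_mac_entry;
	bool fail_router_mac;	/* test knob: make step two fail */
};

int install_vlan_entry(struct port_tables *t)
{
	t->vlan_entry = true;
	return 0;
}

void remove_vlan_entry(struct port_tables *t)
{
	t->vlan_entry = false;
}

int install_router_mac(struct port_tables *t)
{
	if (t->fail_router_mac)
		return -1;
	t->router_mac_entry = true;
	return 0;
}

/* Two-step add that leaves no VLAN entry behind if step two fails. */
int port_vlan_add(struct port_tables *t)
{
	int err;

	err = install_vlan_entry(t);
	if (err)
		return err;

	err = install_router_mac(t);
	if (err)
		remove_vlan_entry(t);	/* roll back step one */

	return err;
}
```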


[PATCH net-next 4/5] rocker: install/remove router MAC for untagged VLAN when joining/leaving bridge

2015-06-01 Thread sfeldma

From: Scott Feldman <sfel...@gmail.com>

When the port joins a bridge, the port's internal VLAN ID needs to change
to the bridge's internal VLAN ID.  Likewise, when leaving the bridge, the
internal VLAN ID reverts to the port's original internal VLAN ID.  (The
internal VLAN ID is used by the device to internally mark untagged pkts with
some VLAN, which will eventually be removed on egress...think PVID).  When
the internal VLAN ID changes, we need to update the VLAN table entries and
the router MAC entries for IP/IPv6 to reflect the new internal VLAN ID.

This patch makes use of the common rocker_port_vlan_add/del functions to
make sure the tables are updated for the current internal VLAN ID.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |   42 --
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index bd56273..3eb3eba 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -5178,41 +5178,49 @@ static bool rocker_port_dev_check(const struct net_device *dev)
 static int rocker_port_bridge_join(struct rocker_port *rocker_port,
   struct net_device *bridge)
 {
+   u16 untagged_vid = 0;
int err;
 
-   rocker_port_internal_vlan_id_put(rocker_port,
-rocker_port->dev->ifindex);
-
-   rocker_port->bridge_dev = bridge;
+   /* Port is joining bridge, so the internal VLAN for the
+* port is going to change to the bridge internal VLAN.
+* Let's remove untagged VLAN (vid=0) from port and
+* re-add once internal VLAN has changed.
+*/
 
-   /* Use bridge internal VLAN ID for untagged pkts */
-   err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE,
-  ROCKER_OP_FLAG_REMOVE, 0);
+   err = rocker_port_vlan_del(rocker_port, untagged_vid, 0);
if (err)
return err;
+
+   rocker_port_internal_vlan_id_put(rocker_port,
+rocker_port->dev->ifindex);
rocker_port->internal_vlan_id =
rocker_port_internal_vlan_id_get(rocker_port, bridge->ifindex);
-   return rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE, 0, 0);
+
+   rocker_port->bridge_dev = bridge;
+
+   return rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
+   untagged_vid, 0);
 }
 
 static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
 {
+   u16 untagged_vid = 0;
int err;
 
-   rocker_port_internal_vlan_id_put(rocker_port,
-rocker_port->bridge_dev->ifindex);
-
-   rocker_port->bridge_dev = NULL;
-
-   /* Use port internal VLAN ID for untagged pkts */
-   err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE,
-  ROCKER_OP_FLAG_REMOVE, 0);
+   err = rocker_port_vlan_del(rocker_port, untagged_vid, 0);
if (err)
return err;
+
+   rocker_port_internal_vlan_id_put(rocker_port,
+rocker_port->bridge_dev->ifindex);
rocker_port->internal_vlan_id =
rocker_port_internal_vlan_id_get(rocker_port,
 rocker_port->dev->ifindex);
-   err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE, 0, 0);
+
+   rocker_port->bridge_dev = NULL;
+
+   err = rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
+  untagged_vid, 0);
if (err)
return err;
 
-- 
1.7.10.4
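
The ordering this patch settles on (remove the untagged VLAN, swap the internal VLAN ID, re-add the untagged VLAN) can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct fake_port`, the `fake_*` helpers and the `FAKE_INTERNAL_VLAN_BASE + ifindex` mapping are hypothetical and stand in for the driver's internal VLAN ID get/put and VLAN add/del functions, with reference counting and error handling left out.

```c
#include <assert.h>

#define FAKE_INTERNAL_VLAN_BASE 0x0f00	/* hypothetical base for internal IDs */

struct fake_port {
	int ifindex;		/* the port's own ifindex */
	int bridge_ifindex;	/* 0 while the port is not bridged */
	int internal_vlan_id;	/* ID currently marking untagged traffic */
	int untagged_vlan;	/* internal ID the vid=0 entries use, 0 if none */
};

/* Hypothetical mapping from an ifindex to an internal VLAN ID. */
static int fake_internal_vlan_id_get(int ifindex)
{
	return FAKE_INTERNAL_VLAN_BASE + ifindex;
}

/* Remove the simulated vid=0 table entries. */
static void fake_untagged_del(struct fake_port *p)
{
	p->untagged_vlan = 0;
}

/* Install the simulated vid=0 entries for the current internal VLAN ID. */
static void fake_untagged_add(struct fake_port *p)
{
	p->untagged_vlan = p->internal_vlan_id;
}

/* Probe time: use the port's own internal VLAN ID. */
static void fake_port_init(struct fake_port *p, int ifindex)
{
	assert(p);
	p->ifindex = ifindex;
	p->bridge_ifindex = 0;
	p->internal_vlan_id = fake_internal_vlan_id_get(ifindex);
	fake_untagged_add(p);
}

/* Join: drop entries for the old ID before switching to the bridge's ID. */
static void fake_bridge_join(struct fake_port *p, int bridge_ifindex)
{
	fake_untagged_del(p);
	p->internal_vlan_id = fake_internal_vlan_id_get(bridge_ifindex);
	p->bridge_ifindex = bridge_ifindex;
	fake_untagged_add(p);
}

/* Leave: same sequence, reverting to the port's own internal VLAN ID. */
static void fake_bridge_leave(struct fake_port *p)
{
	fake_untagged_del(p);
	p->internal_vlan_id = fake_internal_vlan_id_get(p->ifindex);
	p->bridge_ifindex = 0;
	fake_untagged_add(p);
}
```

Deleting before the ID changes matters because the table entries are keyed on the internal VLAN ID; once the ID has been swapped, the old entries could no longer be found through the port's current state.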


[PATCH net-next 5/5] rocker: remove support for legacy VLAN ndo ops

2015-06-01 Thread sfeldma

From: Scott Feldman <sfel...@gmail.com>

Remove support for legacy ndo ops
.ndo_vlan_rx_add_vid/.ndo_vlan_rx_kill_vid.  Rocker will use
bridge_setlink/dellink exclusively for VLAN add/del operations.

The legacy ops are needed if using the 8021q driver module to set up VLANs on
the port.  But an alternative exists in using bridge_setlink/dellink to
set up VLANs, which doesn't depend on the 8021q module.  So rocker will switch
to the newer setlink/dellink ops.  VLANs can be added to/deleted from the port,
regardless of whether the port is bridged or not, using the bridge commands:

bridge vlan [add|del] vid VID dev DEV self

(Yes, I agree it's confusing to use the bridge command to set a VLAN on a
non-bridged port).

Using setlink/dellink over legacy ops lets us handle the stacked driver
case automatically.  It's built-in.  setlink also passes additional flags
(PVID, egress untagged) that aren't available with the legacy ops.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |   34 +-
 1 file changed, 1 insertion(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 3eb3eba..e3fb97a 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4185,35 +4185,6 @@ static int rocker_port_set_mac_address(struct net_device *dev, void *p)
return 0;
 }
 
-static int rocker_port_vlan_rx_add_vid(struct net_device *dev,
-  __be16 proto, u16 vid)
-{
-   struct rocker_port *rocker_port = netdev_priv(dev);
-   int err;
-
-   err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE, 0, vid);
-   if (err)
-   return err;
-
-   return rocker_port_router_mac(rocker_port, SWITCHDEV_TRANS_NONE,
- 0, htons(vid));
-}
-
-static int rocker_port_vlan_rx_kill_vid(struct net_device *dev,
-   __be16 proto, u16 vid)
-{
-   struct rocker_port *rocker_port = netdev_priv(dev);
-   int err;
-
-   err = rocker_port_router_mac(rocker_port, SWITCHDEV_TRANS_NONE,
-ROCKER_OP_FLAG_REMOVE, htons(vid));
-   if (err)
-   return err;
-
-   return rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE,
-   ROCKER_OP_FLAG_REMOVE, vid);
-}
-
 static int rocker_port_get_phys_port_name(struct net_device *dev,
  char *buf, size_t len)
 {
@@ -4235,8 +4206,6 @@ static const struct net_device_ops rocker_port_netdev_ops = {
.ndo_stop   = rocker_port_stop,
.ndo_start_xmit = rocker_port_xmit,
.ndo_set_mac_address= rocker_port_set_mac_address,
-   .ndo_vlan_rx_add_vid= rocker_port_vlan_rx_add_vid,
-   .ndo_vlan_rx_kill_vid   = rocker_port_vlan_rx_kill_vid,
.ndo_bridge_getlink = switchdev_port_bridge_getlink,
.ndo_bridge_setlink = switchdev_port_bridge_setlink,
.ndo_bridge_dellink = switchdev_port_bridge_dellink,
@@ -4908,8 +4877,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
   NAPI_POLL_WEIGHT);
rocker_carrier_init(rocker_port);
 
-   dev->features |= NETIF_F_NETNS_LOCAL |
-NETIF_F_HW_VLAN_CTAG_FILTER;
+   dev->features |= NETIF_F_NETNS_LOCAL;
 
err = register_netdev(dev);
if (err) {
-- 
1.7.10.4


[PATCH net-next 1/5] rocker: zero allocate ports array

2015-06-01 Thread sfeldma

From: Scott Feldman <sfel...@gmail.com>

When allocating the array of rocker port pointers, zero the array values so
we can test for !NULL to see if a port is allocated/registered.  We'll need
this later when installing untagged VLAN support for each port, during port
probe.  It's a long story, but to install a VLAN (vid=0 for untagged, in
this case) on a port, we'll need to scan other ports to see if the VLAN
group for that VLAN has been set up.  To scan the other ports, we need to
walk the port array.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 36f7edf..ab6d871 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4936,7 +4936,7 @@ static int rocker_probe_ports(struct rocker *rocker)
int err;
 
alloc_size = sizeof(struct rocker_port *) * rocker->port_count;
-   rocker->ports = kmalloc(alloc_size, GFP_KERNEL);
+   rocker->ports = kzalloc(alloc_size, GFP_KERNEL);
if (!rocker->ports)
return -ENOMEM;
for (i = 0; i < rocker->port_count; i++) {
-- 
1.7.10.4
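
Why the zeroing matters when walking the port array can be shown with a small userspace analogue. This is a sketch under assumptions, not the driver code: `calloc()` stands in for `kzalloc()`, and `struct fake_port`, `ports_alloc()` and `any_port_has_group()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-port state; the real struct rocker_port is much larger. */
struct fake_port {
	int vlan_group_ready;	/* nonzero once this port has set up the group */
};

/* Allocate a zero-filled array of port pointers (calloc as a kzalloc analogue). */
static struct fake_port **ports_alloc(size_t count)
{
	return calloc(count, sizeof(struct fake_port *));
}

/*
 * Walk the array and report whether any already-probed port has the group.
 * Slots that were never filled are NULL because the array was zeroed, so
 * they can be skipped safely; with an uninitialized array this loop would
 * dereference garbage pointers.
 */
static int any_port_has_group(struct fake_port **ports, size_t count)
{
	size_t i;

	assert(ports);

	for (i = 0; i < count; i++) {
		if (!ports[i])
			continue;	/* port not allocated/registered yet */
		if (ports[i]->vlan_group_ready)
			return 1;
	}
	return 0;
}
```

During probe only the ports handled so far have non-NULL slots, which is exactly the situation the scan has to cope with.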


[PATCH net-next 0/5] rocker: enable by default untagged VLAN support

2015-06-01 Thread sfeldma

From: Scott Feldman <sfel...@gmail.com>

This patch set is a followup to Simon Horman's RFC patch:

   [PATCH/RFC net-next] rocker: by default accept untagged packets

Now, on port probe, we install untagged VLAN (vid=0) support for each port
as the default.  This is equivalent to the command:

   bridge vlan add vid 0 dev DEV self

Accepting untagged VLAN pkts is a reasonable default, but the user could
override this with:

   bridge vlan del vid 0 dev DEV self

With this, we no longer need the 8021q module to install vid=0 when the port
interface opens.  In fact, we don't need support for legacy VLAN ndo ops at all
since they're superseded by bridge_setlink/dellink.  So remove legacy VLAN ndo
ops support in the driver.  (The legacy VLAN ndo ops are supported by
bonding/team drivers, but don't fit into the transaction model offered by
switchdev, so switching all VLAN functions to bridge_setlink/dellink switchdev
support gets us stacked driver + transaction model support).

Scott Feldman (5):
  rocker: zero allocate ports array
  rocker: cleanup vlan table on error adding vlan
  rocker: install untagged VLAN (vid=0) support for each port
  rocker: install/remove router MAC for untagged VLAN when
joining/leaving bridge
  rocker: remove support for legacy VLAN ndo ops

 drivers/net/ethernet/rocker/rocker.c |  105 --
 1 file changed, 50 insertions(+), 55 deletions(-)

-- 
1.7.10.4


[PATCH net] Netfilter fix for net

2015-06-01 Thread Pablo Neira Ayuso

Hi David,

The following patch reverts the ebtables chunk that enforces counters that was
introduced in the recently applied d26e2c9ffa38 ('Revert "netfilter: ensure
number of counters is 0 in do_replace()"') since this breaks ebtables.

You can pull this change from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 9302d7bb0c5cd46be5706859301f18c137b2439f:

  sctp: Fix mangled IPv4 addresses on a IPv6 listening socket (2015-05-27 14:15:26 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to d26e2c9ffa385dd1b646f43c1397ba12af9ed431:

  Revert "netfilter: ensure number of counters is 0 in do_replace()" (2015-06-01 19:45:47 +0200)


Bernhard Thaler (1):
  Revert "netfilter: ensure number of counters is 0 in do_replace()"

 net/bridge/netfilter/ebtables.c |4 
 1 file changed, 4 deletions(-)

[PATCH net] Revert "netfilter: ensure number of counters is 0 in do_replace()"

2015-06-01 Thread Pablo Neira Ayuso

From: Bernhard Thaler <bernhard.tha...@wvnet.at>

This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of
counters is 0 in do_replace()") in net/bridge/netfilter/ebtables.c.

Setting rules with ebtables does not work any more with 1086bbe97a07 in place.

There is an error message and no rules set in the end.

e.g.

~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
   userspace tool doesn't by default support multiple ebtables programs
running

Reverting the ebtables part of 1086bbe97a07 makes this work again.

Signed-off-by: Bernhard Thaler <bernhard.tha...@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
---
 net/bridge/netfilter/ebtables.c |4 
 1 file changed, 4 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 24c7c96..91180a7 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1117,8 +1117,6 @@ static int do_replace(struct net *net, const void __user *user,
return -ENOMEM;
if (tmp.num_counters >= INT_MAX / sizeof(struct ebt_counter))
return -ENOMEM;
-   if (tmp.num_counters == 0)
-   return -EINVAL;
 
tmp.name[sizeof(tmp.name) - 1] = 0;
 
@@ -2161,8 +2159,6 @@ static int compat_copy_ebt_replace_from_user(struct ebt_replace *repl,
return -ENOMEM;
if (tmp.num_counters >= INT_MAX / sizeof(struct ebt_counter))
return -ENOMEM;
-   if (tmp.num_counters == 0)
-   return -EINVAL;
 
memcpy(repl, &tmp, offsetof(struct ebt_replace, hook_entry));
 
-- 
1.7.10.4


Re: [PATCH 0/2] Fix couple of issues with 'ethtool' get/set API's

2015-06-01 Thread Arun Parameswaran

On 15-05-31 12:59 PM, Ben Hutchings wrote:
> On Fri, 2015-05-22 at 15:43 -0700, Arun Parameswaran wrote:
>> Hi,
>> The patch fixes 2 issues with 'ethtool' getting/setting parameters in
>> the do_gset() do_sset() API's.
>>
>> I have pushed a patch to the Kernel to fix an issue in the handling of
>> the 'ethtool' commands which got accepted.
>> This Kernel patch was based on Linux v4.1-rc4 and is available in:
>> https://github.com/Broadcom/cygnus-linux/tree/net-core-ethtool-fix-v1
>>
>> The Kernel was always clearing the command from the 'ethtool' resulting
>> in all operations to deal with PHY0. This prevents querying/setting
>> PHY 1's settings.
> [...]
>
> Each net device can be associated with a single PHY at a time, and the
> ETHTOOL_GSET implementation should fill in the PHY address in the
> ethtool_cmd::phy_address field.  Where there are multiple PHYs that can
> be connected to the net device's MAC, an ETHTOOL_SSET operation can be
> used to change that PHY address.
>
The above can be done by the driver when there is one PHY per MAC. In our
case we have multiple PHYs controlled by the same MAC. I should have
clarified this earlier, I apologize.

When we specify the 'phyad' on the command line, we were expecting
'ethtool' to fetch/set data for that 'phyad'. This is the intent of the
patch.

With the patch (in 'ethtool' and Kernel), if 'phyad' is not specified, it
will still function as you described above, it will be up to the driver to
return the proper 'phyad' and related settings.

> The ethtool API is not meant for controlling other PHYs that aren't
> connected to the MAC; if you want to do that then create more net
> devices for them or use the MDIO ioctls.
>
In the SoC, there are multiple PHYs (in our case there are 2) controlled
by the same MAC. We are trying to use 'ethtool' to control both the PHYs
connected to the same MAC.

 
> Ben.
>

Thanks
Arun

Re: [PATCH 0/2] Fix couple of issues with 'ethtool' get/set API's

2015-06-01 Thread Ben Hutchings

On Mon, 2015-06-01 at 10:14 -0700, Arun Parameswaran wrote:
> On 15-05-31 12:59 PM, Ben Hutchings wrote:
> > On Fri, 2015-05-22 at 15:43 -0700, Arun Parameswaran wrote:
> >> Hi,
> >> The patch fixes 2 issues with 'ethtool' getting/setting parameters in
> >> the do_gset() do_sset() API's.
> >>
> >> I have pushed a patch to the Kernel to fix an issue in the handling of
> >> the 'ethtool' commands which got accepted.
> >> This Kernel patch was based on Linux v4.1-rc4 and is available in:
> >> https://github.com/Broadcom/cygnus-linux/tree/net-core-ethtool-fix-v1
> >>
> >> The Kernel was always clearing the command from the 'ethtool' resulting
> >> in all operations to deal with PHY0. This prevents querying/setting
> >> PHY 1's settings.
> > [...]
> >
> > Each net device can be associated with a single PHY at a time, and the
> > ETHTOOL_GSET implementation should fill in the PHY address in the
> > ethtool_cmd::phy_address field.  Where there are multiple PHYs that can
> > be connected to the net device's MAC, an ETHTOOL_SSET operation can be
> > used to change that PHY address.
> >
> The above can be done by the driver when there is one PHY per MAC. In our
> case we have multiple PHYs controlled by the same MAC. I should have
> clarified this earlier, I apologize.

I understand that you can have multiple PHYs on the same MDIO bus, but
not how the MAC can use them at the same time.  Is this hardware level
bonding?  Or are multiple PHYs needed for a single link?

> When we specify the 'phyad' on the command line, we were expecting
> 'ethtool' to fetch/set data for that 'phyad'. This is the intent of the
> patch.
>
> With the patch (in 'ethtool' and Kernel), if 'phyad' is not specified, it
> will still function as you described above, it will be up to the driver to
> return the proper 'phyad' and related settings.
[...]

But without the patch in ethtool and other programs calling this API
(it's not just the ethtool command!), you get random junk as the
phy_address.  How will you tell whether it's valid or not?

Ben.

-- 
Ben Hutchings
Power corrupts.  Absolute power is kind of neat.
   - John Lehman, Secretary of the US Navy 1981-1987



Re: [PATCH net] bnx2x: Move statistics implementation into semaphores

2015-06-01 Thread David Miller

From: Yuval Mintz <yuval.mi...@qlogic.com>
Date: Mon, 1 Jun 2015 15:08:18 +0300

> Commit dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the
> bnx2x locking around statistics state into using a mutex - but the lock
> is being accessed via a timer which is forbidden.
>
> [If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about
> accessing the mutex in interrupt context]
>
> This moves the implementation into using a semaphore [with size '1']
> instead.
>
> Signed-off-by: Yuval Mintz <yuval.mi...@qlogic.com>
> Signed-off-by: Ariel Elior <ariel.el...@qlogic.com>

Applied, thank you.

Re: [PATCH 0/2] Fix couple of issues with 'ethtool' get/set API's

2015-06-01 Thread Ben Hutchings

On Mon, 2015-06-01 at 12:12 -0700, Arun Parameswaran wrote:
> On 15-06-01 11:07 AM, Ben Hutchings wrote:
> > On Mon, 2015-06-01 at 10:14 -0700, Arun Parameswaran wrote:
> >> On 15-05-31 12:59 PM, Ben Hutchings wrote:
> >>> On Fri, 2015-05-22 at 15:43 -0700, Arun Parameswaran wrote:
> >>>> Hi,
> >>>> The patch fixes 2 issues with 'ethtool' getting/setting parameters in
> >>>> the do_gset() do_sset() API's.
> >>>>
> >>>> I have pushed a patch to the Kernel to fix an issue in the handling of
> >>>> the 'ethtool' commands which got accepted.
> >>>> This Kernel patch was based on Linux v4.1-rc4 and is available in:
> >>>> https://github.com/Broadcom/cygnus-linux/tree/net-core-ethtool-fix-v1
> >>>>
> >>>> The Kernel was always clearing the command from the 'ethtool' resulting
> >>>> in all operations to deal with PHY0. This prevents querying/setting
> >>>> PHY 1's settings.
> >>> [...]
> >>>
> >>> Each net device can be associated with a single PHY at a time, and the
> >>> ETHTOOL_GSET implementation should fill in the PHY address in the
> >>> ethtool_cmd::phy_address field.  Where there are multiple PHYs that can
> >>> be connected to the net device's MAC, an ETHTOOL_SSET operation can be
> >>> used to change that PHY address.
> >>>
> >> The above can be done by the driver when there is one PHY per MAC. In our
> >> case we have multiple PHYs controlled by the same MAC. I should have
> >> clarified this earlier, I apologize.
> >
> > I understand that you can have multiple PHYs on the same MDIO bus, but
> > not how the MAC can use them at the same time.  Is this hardware level
> > bonding?  Or are multiple PHYs needed for a single link?
> >
> We have an internal switch which manages the traffic to the PHY's (ports).
> There is 1 PHY per external port.
> The MAC is connected to the internal port of the switch.

Then you should create net devices for those external ports as well as
the internal port.

If I understand the switchdev API rightly, the external port devices
should implement the ethtool {get,set}_settings operations and the
ndo_switch_parent_id_get operation.  The existing net device should
expose only the internal link to the switch (which presumably isn't
configurable at all).

[...]
> But this prevents the 'ethtool' from being used to get/set data of
> specific PHY's.

That is fine because it is meant to manage the net device's own link (in
this case, the internal port), not other switch ports.

Ben.

-- 
Ben Hutchings
Power corrupts.  Absolute power is kind of neat.
   - John Lehman, Secretary of the US Navy 1981-1987



[PATCH 2/2] geneve: allow user to specify TOS info for tunnel frames

2015-06-01 Thread John W. Linville

Signed-off-by: John W. Linville <linvi...@tuxdriver.com>
---
I have the corresponding iproute2 patch ready, but I am holding it
for now to avoid confusion on the list and such...

 drivers/net/geneve.c         | 18 ++++++++++++++----
 include/uapi/linux/if_link.h |  1 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 1675dfdbfa70..78d49d186e05 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -46,6 +46,7 @@ struct geneve_dev {
struct geneve_sock *sock;   /* socket used for geneve tunnel */
u8 vni[3];  /* virtual network ID for tunnel */
u8 ttl; /* TTL override */
+   u8 tos; /* TOS override */
struct sockaddr_in remote;  /* IPv4 address for link partner */
struct list_head   next;/* geneve's per namespace list */
 };
@@ -194,7 +195,12 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
/* TODO: port min/max limits should be configurable */
sport = udp_flow_src_port(dev_net(dev), skb, 0, 0, true);
 
+   tos = geneve->tos;
+   if (tos == 1)
+   tos = ip_tunnel_get_dsfield(iip, skb);
+
memset(&fl4, 0, sizeof(fl4));
+   fl4.flowi4_tos = RT_TOS(tos);
fl4.daddr = geneve->remote.sin_addr.s_addr;
rt = ip_route_output_key(geneve->net, &fl4);
if (IS_ERR(rt)) {
@@ -208,9 +214,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
goto rt_tx_error;
}
 
-   /* TODO: tos should be configurable */
-
-   tos = ip_tunnel_ecn_encap(0, iip, skb);
+   tos = ip_tunnel_ecn_encap(tos, iip, skb);
 
ttl = geneve->ttl;
if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
@@ -300,6 +304,7 @@ static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
[IFLA_GENEVE_ID]= { .type = NLA_U32 },
[IFLA_GENEVE_REMOTE]= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
[IFLA_GENEVE_TTL]   = { .type = NLA_U8 },
+   [IFLA_GENEVE_TOS]   = { .type = NLA_U8 },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -370,6 +375,9 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
if (data[IFLA_GENEVE_TTL])
geneve->ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
 
+   if (data[IFLA_GENEVE_TOS])
+   geneve->tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
+
list_add(&geneve->next, &gn->geneve_list);
 
hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
@@ -393,6 +401,7 @@ static size_t geneve_get_size(const struct net_device *dev)
return nla_total_size(sizeof(__u32)) +  /* IFLA_GENEVE_ID */
nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE */
nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TTL */
+   nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TOS */
0;
 }
 
@@ -409,7 +418,8 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
geneve->remote.sin_addr.s_addr))
goto nla_put_failure;
 
-   if (nla_put_u8(skb, IFLA_GENEVE_TTL, geneve->ttl))
+   if (nla_put_u8(skb, IFLA_GENEVE_TTL, geneve->ttl) ||
+   nla_put_u8(skb, IFLA_GENEVE_TOS, geneve->tos))
goto nla_put_failure;
 
return 0;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6edb8d268d58..ab90c196dde0 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -396,6 +396,7 @@ enum {
IFLA_GENEVE_ID,
IFLA_GENEVE_REMOTE,
IFLA_GENEVE_TTL,
+   IFLA_GENEVE_TOS,
__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)
-- 
2.1.0


[PATCH 1/2] geneve: allow user to specify TTL for tunnel frames

2015-06-01 Thread John W. Linville

Signed-off-by: John W. Linville <linvi...@tuxdriver.com>
---
I have the corresponding iproute2 patch ready, but I am holding it
for now to avoid confusion on the list and such...

 drivers/net/geneve.c         | 18 ++++++++++++++----
 include/uapi/linux/if_link.h |  1 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index b7eafa4c1a67..1675dfdbfa70 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -44,7 +44,8 @@ struct geneve_dev {
struct net *net;/* netns for packet i/o */
struct net_device  *dev;/* netdev for geneve tunnel */
struct geneve_sock *sock;   /* socket used for geneve tunnel */
-   u8 vni[3];  /* virtual network ID for tunnel */
+   u8 vni[3];  /* virtual network ID for tunnel */
+   u8 ttl; /* TTL override */
struct sockaddr_in remote;  /* IPv4 address for link partner */
struct list_head   next;/* geneve's per namespace list */
 };
@@ -184,7 +185,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
struct flowi4 fl4;
int err;
__be16 sport;
-   __u8 tos, ttl = 0;
+   __u8 tos, ttl;
 
iip = ip_hdr(skb);
 
@@ -207,11 +208,12 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
goto rt_tx_error;
}
 
-   /* TODO: tos and ttl should be configurable */
+   /* TODO: tos should be configurable */
 
tos = ip_tunnel_ecn_encap(0, iip, skb);
 
-   if (IN_MULTICAST(ntohl(fl4.daddr)))
+   ttl = geneve->ttl;
+   if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
ttl = 1;
 
ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
@@ -297,6 +299,7 @@ static void geneve_setup(struct net_device *dev)
 static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
[IFLA_GENEVE_ID]= { .type = NLA_U32 },
[IFLA_GENEVE_REMOTE]= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
+   [IFLA_GENEVE_TTL]   = { .type = NLA_U8 },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -364,6 +367,9 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
if (err)
return err;
 
+   if (data[IFLA_GENEVE_TTL])
+   geneve->ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
+
list_add(&geneve->next, &gn->geneve_list);
 
hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
@@ -386,6 +392,7 @@ static size_t geneve_get_size(const struct net_device *dev)
 {
return nla_total_size(sizeof(__u32)) +  /* IFLA_GENEVE_ID */
nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE */
+   nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TTL */
0;
 }
 
@@ -402,6 +409,9 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
geneve->remote.sin_addr.s_addr))
goto nla_put_failure;
 
+   if (nla_put_u8(skb, IFLA_GENEVE_TTL, geneve->ttl))
+   goto nla_put_failure;
+
return 0;
 
 nla_put_failure:
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 2ca17d1cff3f..6edb8d268d58 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -395,6 +395,7 @@ enum {
IFLA_GENEVE_UNSPEC,
IFLA_GENEVE_ID,
IFLA_GENEVE_REMOTE,
+   IFLA_GENEVE_TTL,
__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)
-- 
2.1.0


Re: [PATCH net-next v2] rocker: remove rocker parameter from functions that have rocker_port parameter

2015-06-01 Thread Andy Gospodarek

On Mon, Jun 01, 2015 at 01:25:04PM +0900, Simon Horman wrote:
> The rocker (switch) of a rocker_port may be trivially obtained from
> the latter; it seems cleaner not to pass the former to a function when
> the latter is being passed anyway.
Excellent idea and commonly used in many other hardware drivers.

>
> rocker_port_rx_proc() is omitted from this change as it is a hot path case.
>
> Signed-off-by: Simon Horman <simon.hor...@netronome.com>
> Acked-by: Scott Feldman <sfel...@gmail.com>
Acked-by: Andy Gospodarek <go...@cumulusnetworks.com>

>
> ---
> v2
> * Dropped RFC designation
> * Omit rocker_port_rx_proc() from this change as it is a hot path case,
>   as suggested by Scott Feldman
> * Added Scott Feldman's Ack
> ---
>  drivers/net/ethernet/rocker/rocker.c | 115 ++-
>  1 file changed, 45 insertions(+), 70 deletions(-)
>
> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
> index 36f7edfc3c7a..d246647b3653 100644
> --- a/drivers/net/ethernet/rocker/rocker.c
> +++ b/drivers/net/ethernet/rocker/rocker.c
> @@ -1172,11 +1172,11 @@ static void rocker_dma_rings_fini(struct rocker *rocker)
>   rocker_dma_ring_destroy(rocker, &rocker->cmd_ring);
>  }
>  
> -static int rocker_dma_rx_ring_skb_map(const struct rocker *rocker,
> -   const struct rocker_port *rocker_port,
> +static int rocker_dma_rx_ring_skb_map(const struct rocker_port *rocker_port,
> struct rocker_desc_info *desc_info,
> struct sk_buff *skb, size_t buf_len)
>  {
> + const struct rocker *rocker = rocker_port->rocker;
>   struct pci_dev *pdev = rocker->pdev;
>   dma_addr_t dma_handle;
>  
> @@ -1201,8 +1201,7 @@ static size_t rocker_port_rx_buf_len(const struct rocker_port *rocker_port)
>   return rocker_port->dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>  }
>  
> -static int rocker_dma_rx_ring_skb_alloc(const struct rocker *rocker,
> - const struct rocker_port *rocker_port,
> +static int rocker_dma_rx_ring_skb_alloc(const struct rocker_port *rocker_port,
>   struct rocker_desc_info *desc_info)
>  {
>   struct net_device *dev = rocker_port->dev;
> @@ -1219,8 +1218,7 @@ static int rocker_dma_rx_ring_skb_alloc(const struct rocker *rocker,
>   skb = netdev_alloc_skb_ip_align(dev, buf_len);
>   if (!skb)
>   return -ENOMEM;
> - err = rocker_dma_rx_ring_skb_map(rocker, rocker_port, desc_info,
> -  skb, buf_len);
> + err = rocker_dma_rx_ring_skb_map(rocker_port, desc_info, skb, buf_len);
>   if (err) {
>   dev_kfree_skb_any(skb);
>   return err;
> @@ -1257,15 +1255,15 @@ static void rocker_dma_rx_ring_skb_free(const struct rocker *rocker,
>   dev_kfree_skb_any(skb);
>  }
>  
> -static int rocker_dma_rx_ring_skbs_alloc(const struct rocker *rocker,
> -  const struct rocker_port *rocker_port)
> +static int rocker_dma_rx_ring_skbs_alloc(const struct rocker_port *rocker_port)
>  {
>   const struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
> + const struct rocker *rocker = rocker_port->rocker;
>   int i;
>   int err;
>  
>   for (i = 0; i < rx_ring->size; i++) {
> - err = rocker_dma_rx_ring_skb_alloc(rocker, rocker_port,
> + err = rocker_dma_rx_ring_skb_alloc(rocker_port,
>  &rx_ring->desc_info[i]);
>   if (err)
>   goto rollback;
> @@ -1278,10 +1276,10 @@ rollback:
>   return err;
>  }
>  
> -static void rocker_dma_rx_ring_skbs_free(const struct rocker *rocker,
> -  const struct rocker_port *rocker_port)
> +static void rocker_dma_rx_ring_skbs_free(const struct rocker_port *rocker_port)
>  {
>   const struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
> + const struct rocker *rocker = rocker_port->rocker;
>   int i;
>  
>   for (i = 0; i < rx_ring->size; i++)
> @@ -1327,7 +1325,7 @@ static int rocker_port_dma_rings_init(struct rocker_port *rocker_port)
>   goto err_dma_rx_ring_bufs_alloc;
>   }
>  
> - err = rocker_dma_rx_ring_skbs_alloc(rocker, rocker_port);
> + err = rocker_dma_rx_ring_skbs_alloc(rocker_port);
>   if (err) {
>   netdev_err(rocker_port->dev, "failed to alloc rx dma ring skbs\n");
>   goto err_dma_rx_ring_skbs_alloc;
> @@ -1353,7 +1351,7 @@ static void rocker_port_dma_rings_fini(struct rocker_port *rocker_port)
>  {
>   struct rocker *rocker = rocker_port->rocker;
>  
> - rocker_dma_rx_ring_skbs_free(rocker, rocker_port);
> + rocker_dma_rx_ring_skbs_free(rocker_port);
>   rocker_dma_ring_bufs_free(rocker, &rocker_port->rx_ring,
> PCI_DMA_BIDIRECTIONAL);
>   rocker_dma_ring_destroy(rocker, &rocker_port->rx_ring);
> @@ -1588,22 +1586,20 @@ static

Re: [PATCH net-next v2 01/14] sfc: Add code to export port_num in netdev->dev_port

2015-06-01 Thread David Miller

From: Shradha Shah <ss...@solarflare.com>
Date: Mon, 1 Jun 2015 14:00:12 +0100

> In the case where we have multiple functions (PFs and VFs), this
> sysfs entry is useful to identify the physical port corresponding
> to the function we are interested in.
>
> Signed-off-by: Shradha Shah <ss...@solarflare.com>

This is a low effort change.

You retained all of the error handling changes that were only necessary when
you added the new sysfs file, but are completely unnecessary if you're
just reporting it via netdev->dev_port.

This is extremely disappointing, because you expect me to put a good effort
into reviewing your changes yet you aren't putting that level of effort into
the submission itself.
