[PATCH] net: phy: sfp: remove sfp_mutex's definition

2018-10-11 Thread Sebastian Andrzej Siewior
The sfp_mutex variable is defined but never used in this file, not even
in the commit that introduced it.

Remove sfp_mutex, it has no purpose.

Cc: Andrew Lunn 
Cc: Florian Fainelli 
Cc: "David S. Miller" 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/net/phy/sfp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 6e13b8832bc7d..fd8bb998ae52d 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -163,8 +163,6 @@ static const enum gpiod_flags gpio_flags[] = {
 /* Give this long for the PHY to reset. */
 #define T_PHY_RESET_MS 50
 
-static DEFINE_MUTEX(sfp_mutex);
-
 struct sff_data {
unsigned int gpios;
bool (*module_supported)(const struct sfp_eeprom_id *id);
-- 
2.19.1



locking in wimax/i2400m

2018-06-11 Thread Sebastian Andrzej Siewior
I tried to figure out if the URB-completion handler uses any locking and
stumbled here.

i2400m_pm_notifier() is called from process context. This function
invokes i2400m_fw_cache() + i2400m_fw_uncache(). Both functions do
spin_lock(&i2400m->rx_lock);
while in other places (say i2400mu_rxd()) it does
spin_lock_irqsave(&i2400m->rx_lock, flags);

So what do I miss? Is this lock never used in interrupt context and
lockdep didn't complain or did nobody try suspend with this driver
before?
From what I can tell i2400m_dev_bootstrap() has the same locking
problem.
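
For illustration, the pattern I would expect if the lock really is taken
from interrupt context as well -- only a sketch with the usual i2400m
naming, not a tested patch:

 /* process-context side: disable interrupts while holding the lock,
  * otherwise the URB-completion / IRQ path can deadlock on the same CPU
  */
 unsigned long flags;

 spin_lock_irqsave(&i2400m->rx_lock, flags);
 /* ... cache / uncache the firmware ... */
 spin_unlock_irqrestore(&i2400m->rx_lock, flags);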

While here, I noticed that i2400m_fw_cache() does use GFP_ATOMIC which
should be GFP_KERNEL since the context can't be atomic at this point
(there is even request_firmware() later on).

I am also curious why there is NULL and ~0 because it does not seem to
make a difference in i2400m_fw_uncache() but I'm more interested in
the locking bits :)

Sebastian


Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning

2018-05-04 Thread Sebastian Andrzej Siewior
On 2018-05-04 20:51:32 [+0200], Peter Zijlstra wrote:
> On Fri, May 04, 2018 at 08:45:39PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2018-05-04 20:32:49 [+0200], Peter Zijlstra wrote:
> > > On Fri, May 04, 2018 at 07:51:44PM +0200, Sebastian Andrzej Siewior wrote:
> > > > From: Anna-Maria Gleixner <anna-ma...@linutronix.de>
> > > > 
> > > > The warning in ieee802154_rx() and ieee80211_rx_napi() is there to 
> > > > ensure
> > > > the softirq context for the subsequent netif_receive_skb() call. 
> > > 
> > > That's not in fact what it does though; so while that might indeed be
> > > the intent that's not what it does.
> > 
> > It was introduced in commit d20ef63d3246 ("mac80211: document
> > ieee80211_rx() context requirement"):
> > 
> > mac80211: document ieee80211_rx() context requirement
> > 
> > ieee80211_rx() must be called with softirqs disabled
> 
> softirqs disabled, ack that is exactly what it checks.
> 
> But afaict the assertion you introduced tests that we are _in_ softirq
> context, which is not the same.

indeed, now it clicked. Given what I wrote in the cover letter, would you
be in favour of (a proper) lockdep_assert_BH_disabled() or the cheaper
local_bh_enable() (assuming the network folks don't mind the cheaper
version)?
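
To spell out the difference, a rough sketch using the existing preempt
counter helpers (not the exact lockdep internals):

 /* what the old WARN_ON_ONCE(softirq_count() == 0) accepts:
  * either serving a softirq or inside a local_bh_disable() section
  */
 bool bh_disabled = softirq_count() != 0;

 /* what lockdep_assert_in_softirq() effectively demands:
  * actually serving a softirq right now
  */
 bool in_softirq_handler = in_serving_softirq();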

Sebastian


Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning

2018-05-04 Thread Sebastian Andrzej Siewior
On 2018-05-04 20:32:49 [+0200], Peter Zijlstra wrote:
> On Fri, May 04, 2018 at 07:51:44PM +0200, Sebastian Andrzej Siewior wrote:
> > From: Anna-Maria Gleixner <anna-ma...@linutronix.de>
> > 
> > The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
> > the softirq context for the subsequent netif_receive_skb() call. 
> 
> That's not in fact what it does though; so while that might indeed be
> the intent that's not what it does.

It was introduced in commit d20ef63d3246 ("mac80211: document
ieee80211_rx() context requirement"):

mac80211: document ieee80211_rx() context requirement

ieee80211_rx() must be called with softirqs disabled
since the networking stack requires this for netif_rx()
and some code in mac80211 can assume that it can not
be processing its own tasklet and this call at the same
time.

It may be possible to remove this requirement after a
careful audit of mac80211 and doing any needed locking
improvements in it along with disabling softirqs around
netif_rx(). An alternative might be to push all packet
processing to process context in mac80211, instead of
to the tasklet, and add other synchronisation.

Sebastian


[RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning

2018-05-04 Thread Sebastian Andrzej Siewior
From: Anna-Maria Gleixner <anna-ma...@linutronix.de>

The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
the softirq context for the subsequent netif_receive_skb() call. The check
could be moved into the netif_receive_skb() function to prevent all calling
functions from implementing the checks on their own. Use the lockdep variant for
softirq context check. While at it, add a lockdep based check for irq
enabled as mentioned in the comment above netif_receive_skb().

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/core/dev.c | 3 +++
 net/mac80211/rx.c  | 2 --
 net/mac802154/rx.c | 2 --
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index af0558b00c6c..7b4b8611cc21 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4750,6 +4750,9 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
  */
 int netif_receive_skb(struct sk_buff *skb)
 {
+   lockdep_assert_irqs_enabled();
+   lockdep_assert_in_softirq();
+
trace_netif_receive_skb_entry(skb);
 
return netif_receive_skb_internal(skb);
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 03102aff0953..653332d81c17 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -4324,8 +4324,6 @@ void ieee80211_rx_napi(struct ieee80211_hw *hw, struct 
ieee80211_sta *pubsta,
struct ieee80211_supported_band *sband;
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
 
-   WARN_ON_ONCE(softirq_count() == 0);
-
if (WARN_ON(status->band >= NUM_NL80211_BANDS))
goto drop;
 
diff --git a/net/mac802154/rx.c b/net/mac802154/rx.c
index 4dcf6e18563a..66916c270efc 100644
--- a/net/mac802154/rx.c
+++ b/net/mac802154/rx.c
@@ -258,8 +258,6 @@ void ieee802154_rx(struct ieee802154_local *local, struct 
sk_buff *skb)
 {
u16 crc;
 
-   WARN_ON_ONCE(softirq_count() == 0);
-
if (local->suspended)
goto drop;
 
-- 
2.17.0



[RFC PATCH 0/2] Introduce assert_in_softirq()

2018-05-04 Thread Sebastian Andrzej Siewior
ieee80211_rx_napi() has a check to ensure that it is invoked in softirq
context / with BH disabled. It is there because it invokes
netif_receive_skb() which has this requirement.
On -RT this check does not work as expected so there is always this
warning.
Tree wide there are two users of this check: ieee80211_rx_napi() and
ieee802154_rx(). This approach introduces assert_in_softirq() which does
the check if lockdep is enabled. This check could then become a nop on
-RT.
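
What "become a nop on -RT" could look like, as a rough sketch on top of
the lockdep variant from patch 1/2 (the RT config symbol here is my
assumption, not part of the series):

 #if defined(CONFIG_LOCKDEP) && !defined(CONFIG_PREEMPT_RT_FULL)
 # define lockdep_assert_in_softirq()				\
	WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
		  !current->softirq_context,			\
		  "Not in softirq context as expected\n")
 #else
 # define lockdep_assert_in_softirq() do { } while (0)
 #endif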

As an alternative, netif_receive_skb() (or ieee80211_rx_napi()) could do
local_bh_disable() / local_bh_enable() unconditionally. The _disable()
part is very cheap. The _enable() part is more expensive because it
includes a function call. We could avoid that jump in the likely case
when BH was already disabled by something like:

 static inline void local_bh_enable(void)
 {
 if (softirq_count() == SOFTIRQ_DISABLE_OFFSET)
 __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
 else
 preempt_count_sub(SOFTIRQ_DISABLE_OFFSET);
 }

Which would make bh_enable() cheaper for everyone.

Sebastian




[RFC PATCH 1/2] lockdep: Add a assert_in_softirq()

2018-05-04 Thread Sebastian Andrzej Siewior
From: Anna-Maria Gleixner <anna-ma...@linutronix.de>

Instead of warning directly on the wrong context, check whether softirq
context is set. This check could be a nop on RT.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 include/linux/lockdep.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..59363c3047c6 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -608,11 +608,17 @@ do {  
\
  "IRQs not disabled as expected\n");   \
} while (0)
 
+#define lockdep_assert_in_softirq()do {\
+   WARN_ONCE(debug_locks && !current->lockdep_recursion && \
+ !current->softirq_context,\
+ "Not in softirq context as expected\n");  \
+   } while (0)
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
+# define lockdep_assert_in_softirq() do { } while (0)
 #endif
 
 #ifdef CONFIG_LOCKDEP
-- 
2.17.0



Re: [PATCH 2/4] net: 3com: 3c59x: Move boomerang/vortex conditional into function

2018-05-04 Thread Sebastian Andrzej Siewior
On 2018-05-04 17:17:47 [+0200], To netdev@vger.kernel.org wrote:
> From: Anna-Maria Gleixner 
> 
> If vp->full_bus_master_tx is set, vp->full_bus_master_rx is set as well
> (see vortex_probe1()). Therefore the conditionals for the decision if
> boomerang or vortex ISR is executed have the same result. Instead of
> repeating the explicit conditional execution of the boomerang/vortex ISR,
> move it into an own function.
> 
> No functional change.
> 
> Cc: Steffen Klassert 

Steffen, this email address is still listed in the MAINTAINERS file for
the driver and rejects emails.

Sebastian


[PATCH 1/4] net: u64_stats_sync: Remove functions without user

2018-05-04 Thread Sebastian Andrzej Siewior
From: Anna-Maria Gleixner <anna-ma...@linutronix.de>

Commit 67db3e4bfbc9 ("tcp: no longer hold ehash lock while calling
tcp_get_info()") removes the only users of u64_stats_update_end/begin_raw()
without removing the functions from the header file.

Remove no longer used functions.
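
For context, the non-raw variants that stay are used like this -- a
minimal sketch, not taken from any particular driver:

 struct pcpu_stats {
	u64			rx_bytes;
	struct u64_stats_sync	syncp;
 };

 unsigned int start;
 u64 bytes;

 /* writer side: on 32bit SMP this brackets the update with a seqcount */
 u64_stats_update_begin(&stats->syncp);
 stats->rx_bytes += len;
 u64_stats_update_end(&stats->syncp);

 /* reader side: retry until a consistent 64bit value was seen */
 do {
	start = u64_stats_fetch_begin(&stats->syncp);
	bytes = stats->rx_bytes;
 } while (u64_stats_fetch_retry(&stats->syncp, start));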

Cc: Eric Dumazet <eduma...@google.com>
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 include/linux/u64_stats_sync.h | 14 --
 1 file changed, 14 deletions(-)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 07ee0f84a46c..a27604f99ed0 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -112,20 +112,6 @@ u64_stats_update_end_irqrestore(struct u64_stats_sync 
*syncp,
 #endif
 }
 
-static inline void u64_stats_update_begin_raw(struct u64_stats_sync *syncp)
-{
-#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
-   raw_write_seqcount_begin(&syncp->seq);
-#endif
-}
-
-static inline void u64_stats_update_end_raw(struct u64_stats_sync *syncp)
-{
-#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
-   raw_write_seqcount_end(&syncp->seq);
-#endif
-}
-
 static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync 
*syncp)
 {
 #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
-- 
2.17.0



[PATCH 4/4] net: 3com: 3c59x: irq save variant of ISR

2018-05-04 Thread Sebastian Andrzej Siewior
From: Anna-Maria Gleixner <anna-ma...@linutronix.de>

When vortex_boomerang_interrupt() is invoked from vortex_tx_timeout() or
poll_vortex() interrupts must be disabled. This detaches the interrupt
disable logic from locking which requires patching for PREEMPT_RT.

The advantage of avoiding spin_lock_irqsave() in the interrupt handler is
minimal, but converting it removes all the extra code for callers which
do not come from interrupt context.

Cc: Steffen Klassert <klass...@mathematik.tu-chemnitz.de>
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 drivers/net/ethernet/3com/3c59x.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index fdafe25da153..cabbe227bb98 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -839,10 +839,7 @@ MODULE_PARM_DESC(use_mmio, "3c59x: use memory-mapped PCI 
I/O resource (0-1)");
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void poll_vortex(struct net_device *dev)
 {
-   unsigned long flags;
-   local_irq_save(flags);
vortex_boomerang_interrupt(dev->irq, dev);
-   local_irq_restore(flags);
 }
 #endif
 
@@ -1904,15 +1901,7 @@ static void vortex_tx_timeout(struct net_device *dev)
pr_err("%s: Interrupt posted but not delivered --"
   " IRQ blocked by another device?\n", dev->name);
/* Bad idea here.. but we might as well handle a few events. */
-   {
-   /*
-* Block interrupts because vortex_interrupt does a 
bare spin_lock()
-*/
-   unsigned long flags;
-   local_irq_save(flags);
-   vortex_boomerang_interrupt(dev->irq, dev);
-   local_irq_restore(flags);
-   }
+   vortex_boomerang_interrupt(dev->irq, dev);
}
 
if (vortex_debug > 0)
@@ -2516,16 +2505,17 @@ vortex_boomerang_interrupt(int irq, void *dev_id)
 {
struct net_device *dev = dev_id;
struct vortex_private *vp = netdev_priv(dev);
+   unsigned long flags;
irqreturn_t ret;
 
-   spin_lock(&vp->lock);
+   spin_lock_irqsave(&vp->lock, flags);
 
if (vp->full_bus_master_rx)
ret = _boomerang_interrupt(dev->irq, dev);
else
ret = _vortex_interrupt(dev->irq, dev);
 
-   spin_unlock(&vp->lock);
+   spin_unlock_irqrestore(&vp->lock, flags);
 
return ret;
 }
-- 
2.17.0



[PATCH 3/4] net: 3com: 3c59x: Pull locking out of ISR

2018-05-04 Thread Sebastian Andrzej Siewior
From: Anna-Maria Gleixner <anna-ma...@linutronix.de>

Locking is done in the same way in _vortex_interrupt() and
_boomerang_interrupt(). To prevent duplication, move the locking into the
calling vortex_boomerang_interrupt() function.

No functional change.

Cc: Steffen Klassert <klass...@mathematik.tu-chemnitz.de>
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 drivers/net/ethernet/3com/3c59x.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 0cfdb07f3e59..fdafe25da153 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -2273,7 +2273,6 @@ _vortex_interrupt(int irq, struct net_device *dev)
unsigned int bytes_compl = 0, pkts_compl = 0;
 
ioaddr = vp->ioaddr;
-   spin_lock(&vp->lock);
 
status = ioread16(ioaddr + EL3_STATUS);
 
@@ -2371,7 +2370,6 @@ _vortex_interrupt(int irq, struct net_device *dev)
pr_debug("%s: exiting interrupt, status %4.4x.\n",
   dev->name, status);
 handler_exit:
-   spin_unlock(&vp->lock);
return IRQ_RETVAL(handled);
 }
 
@@ -2392,12 +2390,6 @@ _boomerang_interrupt(int irq, struct net_device *dev)
 
ioaddr = vp->ioaddr;
 
-
-   /*
-* It seems dopey to put the spinlock this early, but we could race 
against vortex_tx_timeout
-* and boomerang_start_xmit
-*/
-   spin_lock(&vp->lock);
vp->handling_irq = 1;
 
status = ioread16(ioaddr + EL3_STATUS);
@@ -2516,7 +2508,6 @@ _boomerang_interrupt(int irq, struct net_device *dev)
   dev->name, status);
 handler_exit:
vp->handling_irq = 0;
-   spin_unlock(&vp->lock);
return IRQ_RETVAL(handled);
 }
 
@@ -2525,11 +2516,18 @@ vortex_boomerang_interrupt(int irq, void *dev_id)
 {
struct net_device *dev = dev_id;
struct vortex_private *vp = netdev_priv(dev);
+   irqreturn_t ret;
+
+   spin_lock(&vp->lock);
 
if (vp->full_bus_master_rx)
-   return _boomerang_interrupt(dev->irq, dev);
+   ret = _boomerang_interrupt(dev->irq, dev);
else
-   return _vortex_interrupt(dev->irq, dev);
+   ret = _vortex_interrupt(dev->irq, dev);
+
+   spin_unlock(&vp->lock);
+
+   return ret;
 }
 
 static int vortex_rx(struct net_device *dev)
-- 
2.17.0



[PATCH 2/4] net: 3com: 3c59x: Move boomerang/vortex conditional into function

2018-05-04 Thread Sebastian Andrzej Siewior
From: Anna-Maria Gleixner <anna-ma...@linutronix.de>

If vp->full_bus_master_tx is set, vp->full_bus_master_rx is set as well
(see vortex_probe1()). Therefore the conditionals for the decision if
boomerang or vortex ISR is executed have the same result. Instead of
repeating the explicit conditional execution of the boomerang/vortex ISR,
move it into an own function.

No functional change.

Cc: Steffen Klassert <klass...@mathematik.tu-chemnitz.de>
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 drivers/net/ethernet/3com/3c59x.c | 34 ++-
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 36c8950dbd2d..0cfdb07f3e59 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -765,8 +765,9 @@ static netdev_tx_t boomerang_start_xmit(struct sk_buff *skb,
struct net_device *dev);
 static int vortex_rx(struct net_device *dev);
 static int boomerang_rx(struct net_device *dev);
-static irqreturn_t vortex_interrupt(int irq, void *dev_id);
-static irqreturn_t boomerang_interrupt(int irq, void *dev_id);
+static irqreturn_t vortex_boomerang_interrupt(int irq, void *dev_id);
+static irqreturn_t _vortex_interrupt(int irq, struct net_device *dev);
+static irqreturn_t _boomerang_interrupt(int irq, struct net_device *dev);
 static int vortex_close(struct net_device *dev);
 static void dump_tx_ring(struct net_device *dev);
 static void update_stats(void __iomem *ioaddr, struct net_device *dev);
@@ -838,10 +839,9 @@ MODULE_PARM_DESC(use_mmio, "3c59x: use memory-mapped PCI 
I/O resource (0-1)");
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void poll_vortex(struct net_device *dev)
 {
-   struct vortex_private *vp = netdev_priv(dev);
unsigned long flags;
local_irq_save(flags);
-   (vp->full_bus_master_rx ? boomerang_interrupt:vortex_interrupt)(dev->irq,dev);
+   vortex_boomerang_interrupt(dev->irq, dev);
local_irq_restore(flags);
 }
 #endif
@@ -1729,8 +1729,7 @@ vortex_open(struct net_device *dev)
dma_addr_t dma;
 
/* Use the now-standard shared IRQ implementation. */
-   if ((retval = request_irq(dev->irq, vp->full_bus_master_rx ?
-   boomerang_interrupt : vortex_interrupt, IRQF_SHARED, dev->name, dev))) {
+   if ((retval = request_irq(dev->irq, vortex_boomerang_interrupt, IRQF_SHARED, dev->name, dev))) {
pr_err("%s: Could not reserve IRQ %d\n", dev->name, dev->irq);
goto err;
}
@@ -1911,10 +1910,7 @@ static void vortex_tx_timeout(struct net_device *dev)
 */
unsigned long flags;
local_irq_save(flags);
-   if (vp->full_bus_master_tx)
-   boomerang_interrupt(dev->irq, dev);
-   else
-   vortex_interrupt(dev->irq, dev);
+   vortex_boomerang_interrupt(dev->irq, dev);
local_irq_restore(flags);
}
}
@@ -2267,9 +2263,8 @@ boomerang_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
  */
 
 static irqreturn_t
-vortex_interrupt(int irq, void *dev_id)
+_vortex_interrupt(int irq, struct net_device *dev)
 {
-   struct net_device *dev = dev_id;
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr;
int status;
@@ -2386,9 +2381,8 @@ vortex_interrupt(int irq, void *dev_id)
  */
 
 static irqreturn_t
-boomerang_interrupt(int irq, void *dev_id)
+_boomerang_interrupt(int irq, struct net_device *dev)
 {
-   struct net_device *dev = dev_id;
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr;
int status;
@@ -2526,6 +2520,18 @@ boomerang_interrupt(int irq, void *dev_id)
return IRQ_RETVAL(handled);
 }
 
+static irqreturn_t
+vortex_boomerang_interrupt(int irq, void *dev_id)
+{
+   struct net_device *dev = dev_id;
+   struct vortex_private *vp = netdev_priv(dev);
+
+   if (vp->full_bus_master_rx)
+   return _boomerang_interrupt(dev->irq, dev);
+   else
+   return _vortex_interrupt(dev->irq, dev);
+}
+
 static int vortex_rx(struct net_device *dev)
 {
struct vortex_private *vp = netdev_priv(dev);
-- 
2.17.0



3c59x patches and the removal of an unused function

2018-05-04 Thread Sebastian Andrzej Siewior
The first patch removes an unused function. The goal of the remaining three
patches is to get rid of the local_irq_save() usage in the driver, which
benefits -RT.
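
As a quick sketch of what the conversion in patches 3 and 4 boils down to:

 /* before: interrupt disabling and locking are two separate steps */
 local_irq_save(flags);
 vortex_boomerang_interrupt(dev->irq, dev);	/* takes spin_lock(&vp->lock) */
 local_irq_restore(flags);

 /* after: the lock disables interrupts itself; on -RT the spinlock
  * becomes a sleeping lock and no hard irq-off section is left behind
  */
 spin_lock_irqsave(&vp->lock, flags);
 /* ... interrupt work ... */
 spin_unlock_irqrestore(&vp->lock, flags);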

Sebastian



[PATCH REPOST] xen/9pfs: don't inclide rwlock.h directly.

2018-05-04 Thread Sebastian Andrzej Siewior
rwlock.h should not be included directly; linux/spinlock.h should be
included instead. One thing the direct include does is break the RT build.

Cc: Eric Van Hensbergen <eri...@gmail.com>
Cc: Ron Minnich <rminn...@sandia.gov>
Cc: Latchesar Ionkov <lu...@ionkov.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: v9fs-develo...@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/9p/trans_xen.c |1 -
 1 file changed, 1 deletion(-)

--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -38,7 +38,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 


[PATCH] 8139too: Use disable_irq_nosync() in rtl8139_poll_controller()

2018-05-02 Thread Sebastian Andrzej Siewior
From: Ingo Molnar <mi...@elte.hu>

Use disable_irq_nosync() instead of disable_irq() as this might be
called in atomic context with netpoll.

Signed-off-by: Ingo Molnar <mi...@elte.hu>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 drivers/net/ethernet/realtek/8139too.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
index d24b47b8e0b2..d118da5a10a2 100644
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -2224,7 +2224,7 @@ static void rtl8139_poll_controller(struct net_device 
*dev)
struct rtl8139_private *tp = netdev_priv(dev);
const int irq = tp->pci_dev->irq;
 
-   disable_irq(irq);
+   disable_irq_nosync(irq);
rtl8139_interrupt(irq, dev);
enable_irq(irq);
 }
-- 
2.17.0



Re: [RFC] please clarify local_irq_disable() in pcpu_freelist_populate()

2017-10-29 Thread Sebastian Andrzej Siewior
On 2017-10-27 19:18:40 [-0700], Alexei Starovoitov wrote:
> pcpu_freelist_push() is called by bpf programs from atomic context.

so raw would still be correct because the content is locked.

> lockdep thinks that __pcpu_freelist_push() can be called recursively
> in the middle of pcpu_freelist_populate's loop and will deadlock
> which is not the case here. That's why local_irq_save() is there.
> Just to silence lockdep.

do you mind giving me bunch of test-cases so I can test it myself?

> While developing pcpu_freelist I've benchmarked many different
> approaches. Some of the numbers are in
> commit 6c9059817432 ("bpf: pre-allocate hash map elements")
> iirc I passed on llist, since llist_del_first still needs a lock,
> so doesn't really help.

Okay. I will look that up.

Sebastian


[RFC] please clarify local_irq_disable() in pcpu_freelist_populate()

2017-10-27 Thread Sebastian Andrzej Siewior
Hi,

while looking at other things here I stumbled at this in
kernel/bpf/percpu_freelist.c:

|void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
|u32 nr_elems)
|{
…
|/* disable irq to workaround lockdep false positive
| * in bpf usage pcpu_freelist_populate() will never race
| * with pcpu_freelist_push()
| */
|local_irq_save(flags); 
|for_each_possible_cpu(cpu) {
|again:
|head = per_cpu_ptr(s->freelist, cpu);
|__pcpu_freelist_push(head, buf); 
…
|}
|local_irq_restore(flags);
|}

and then we have

| static inline void __pcpu_freelist_push(struct pcpu_freelist_head *head,
| struct pcpu_freelist_node *node)
| {
| raw_spin_lock(&head->lock);
| node->next = head->first;
| head->first = node;
| raw_spin_unlock(&head->lock);
| }

I don't see how any of this can race with pcpu_freelist_push():

|void pcpu_freelist_push(struct pcpu_freelist *s,
|struct pcpu_freelist_node *node)
|{
|struct pcpu_freelist_head *head = this_cpu_ptr(s->freelist);
|
|__pcpu_freelist_push(head, node);
|}

I *think* the problem is using this_cpu_ptr() in non-atomic context
which splats a warning CONFIG_DEBUG_PREEMPT and has nothing todo with
lockdep. However pcpu_freelist_populate() is not using
pcpu_freelist_push() so I remain clueless.
__pcpu_freelist_push() adds an item (node) to the list head (head) and
this head is protected with a spin_lock.

I *think* pcpu_freelist_push() can use raw_cpu_ptr() instead and the
local_irq_save() can go away (with __pcpu_freelist_push() using a
raw_spin_lock_irqsave() instead).
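
As a sketch of what I mean (untested, just to show the idea):

 static inline void __pcpu_freelist_push(struct pcpu_freelist_head *head,
					 struct pcpu_freelist_node *node)
 {
	unsigned long flags;

	/* the list lock now disables interrupts itself */
	raw_spin_lock_irqsave(&head->lock, flags);
	node->next = head->first;
	head->first = node;
	raw_spin_unlock_irqrestore(&head->lock, flags);
 }

 void pcpu_freelist_push(struct pcpu_freelist *s,
			 struct pcpu_freelist_node *node)
 {
	/* raw_cpu_ptr(): no preemption check, the lock provides the protection */
	__pcpu_freelist_push(raw_cpu_ptr(s->freelist), node);
 }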

On the other hand, using llist instead would probably eliminate the need
for the lock in ->head since llist_add() and llist_del_first() is
lockless and serve the same purpose.

Sebastian


[PATCH 3/3] xen/9pfs: don't inclide rwlock.h directly.

2017-10-05 Thread Sebastian Andrzej Siewior
rwlock.h should not be included directly; linux/spinlock.h should be
included instead. One thing the direct include does is break the RT build.

Cc: Eric Van Hensbergen <eri...@gmail.com>
Cc: Ron Minnich <rminn...@sandia.gov>
Cc: Latchesar Ionkov <lu...@ionkov.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: v9fs-develo...@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/9p/trans_xen.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 6ad3e043c617..02c6c467a99c 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -38,7 +38,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.14.2



Re: 915f3e3f76 ("mac80211_hwsim: Replace bogus hrtimer clockid"): BUG: kernel reboot-without-warning in test stage

2017-09-05 Thread Sebastian Andrzej Siewior
On 2017-09-05 09:12:40 [+0200], Thomas Gleixner wrote:
> Sorry, no. That bisect is completely bogus. The commit in question merily
> replaces the unsupported clockid with a valid one.

The bisect is correct. It just has problems expressing itself properly. So
the table says:

| WARNING:at_kernel/time/hrtimer.c:#hrtimer_init  | 7   | ||   |

| BUG:kernel_reboot-without-warning_in_test_stage | 0   | 12  | 6  | 6 |


which means _before_ your commit it counted a warning in hrtimer_init()
(an unsupported clock id was used). With your commit, the warning was
gone and I *think* the userland then printed
"BUG:kernel_reboot-without-warning_in_test_stage" because it had no
warning.
It seems that the bot learned to live with that warning which was around
for more than three years. Now that you removed it, it seems to be a
mistake to do so because nobody complained about it so far.

> Thanks,
> 
>   tglx

Sebastian


[PATCH 24/25 v2] net/cdc_ncm: Replace tasklet with softirq hrtimer

2017-09-05 Thread Sebastian Andrzej Siewior
From: Thomas Gleixner <t...@linutronix.de>

The bh tasklet is used to run the work triggered by the hrtimer
(cdc_ncm_tx_timer_cb) in softirq context. This can also be achieved
without the tasklet by using CLOCK_MONOTONIC_SOFT as the hrtimer base.
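
For reference, this is the deferral pattern that goes away -- hrtimer
callbacks normally fire in hard interrupt context, so the tx path had to
be bounced through a tasklet to run with softirq/BH semantics
(simplified from the code removed below):

 static enum hrtimer_restart cdc_ncm_tx_timer_cb(struct hrtimer *timer)
 {
	struct cdc_ncm_ctx *ctx = container_of(timer, struct cdc_ncm_ctx, tx_timer);

	if (!atomic_read(&ctx->stop))
		tasklet_schedule(&ctx->bh);	/* run cdc_ncm_txpath_bh() in softirq */
	return HRTIMER_NORESTART;
 }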

Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Oliver Neukum <oli...@neukum.org>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: linux-...@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
[bigeasy: using usb_get_intfdata() as suggested by Bjørn Mork]
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
On 2017-08-31 15:57:04 [+0200], Bjørn Mork wrote:
> I believe the struct usbnet pointer is redundant.  We already have lots
> of pointers back and forth here.  This should work, but is not tested:
> 
>   struct usbnet *dev = usb_get_intfdata(ctx->control):

I think so, too. Still untested as I don't have a working gadget around.

v1…v2: Updated as suggested by Bjørn and added Greg's Acked-by.

 drivers/net/usb/cdc_ncm.c   | 36 +++-
 include/linux/usb/cdc_ncm.h |  1 -
 2 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 8f572b9f3625..42f7bd90e6a4 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -61,7 +61,6 @@ static bool prefer_mbim;
 module_param(prefer_mbim, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(prefer_mbim, "Prefer MBIM setting on dual NCM/MBIM 
functions");
 
-static void cdc_ncm_txpath_bh(unsigned long param);
 static void cdc_ncm_tx_timeout_start(struct cdc_ncm_ctx *ctx);
 static enum hrtimer_restart cdc_ncm_tx_timer_cb(struct hrtimer *hr_timer);
 static struct usb_driver cdc_ncm_driver;
@@ -777,10 +776,8 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct 
usb_interface *intf, u8 data_
if (!ctx)
return -ENOMEM;
 
-   hrtimer_init(&ctx->tx_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer_init(&ctx->tx_timer, CLOCK_MONOTONIC_SOFT, HRTIMER_MODE_REL);
ctx->tx_timer.function = &cdc_ncm_tx_timer_cb;
-   ctx->bh.data = (unsigned long)dev;
-   ctx->bh.func = cdc_ncm_txpath_bh;
atomic_set(&ctx->stop, 0);
spin_lock_init(&ctx->mtx);
 
@@ -967,10 +964,7 @@ void cdc_ncm_unbind(struct usbnet *dev, struct 
usb_interface *intf)
 
atomic_set(&ctx->stop, 1);

-   if (hrtimer_active(&ctx->tx_timer))
-   hrtimer_cancel(&ctx->tx_timer);
-
-   tasklet_kill(&ctx->bh);
+   hrtimer_cancel(&ctx->tx_timer);
 
/* handle devices with combined control and data interface */
if (ctx->control == ctx->data)
@@ -1348,20 +1342,9 @@ static void cdc_ncm_tx_timeout_start(struct cdc_ncm_ctx 
*ctx)
HRTIMER_MODE_REL);
 }
 
-static enum hrtimer_restart cdc_ncm_tx_timer_cb(struct hrtimer *timer)
+static void cdc_ncm_txpath_bh(struct cdc_ncm_ctx *ctx)
 {
-   struct cdc_ncm_ctx *ctx =
-   container_of(timer, struct cdc_ncm_ctx, tx_timer);
-
-   if (!atomic_read(&ctx->stop))
-   tasklet_schedule(&ctx->bh);
-   return HRTIMER_NORESTART;
-}
-
-static void cdc_ncm_txpath_bh(unsigned long param)
-{
-   struct usbnet *dev = (struct usbnet *)param;
-   struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)dev->data[0];
+   struct usbnet *dev = usb_get_intfdata(ctx->control);
 
spin_lock_bh(&ctx->mtx);
if (ctx->tx_timer_pending != 0) {
@@ -1379,6 +1362,17 @@ static void cdc_ncm_txpath_bh(unsigned long param)
}
 }
 
+static enum hrtimer_restart cdc_ncm_tx_timer_cb(struct hrtimer *timer)
+{
+   struct cdc_ncm_ctx *ctx =
+   container_of(timer, struct cdc_ncm_ctx, tx_timer);
+
+   if (!atomic_read(&ctx->stop))
+   cdc_ncm_txpath_bh(ctx);
+
+   return HRTIMER_NORESTART;
+}
+
 struct sk_buff *
 cdc_ncm_tx_fixup(struct usbnet *dev, struct sk_buff *skb, gfp_t flags)
 {
diff --git a/include/linux/usb/cdc_ncm.h b/include/linux/usb/cdc_ncm.h
index 1a59699cf82a..62b506fddf8d 100644
--- a/include/linux/usb/cdc_ncm.h
+++ b/include/linux/usb/cdc_ncm.h
@@ -92,7 +92,6 @@
 struct cdc_ncm_ctx {
struct usb_cdc_ncm_ntb_parameters ncm_parm;
struct hrtimer tx_timer;
-   struct tasklet_struct bh;
 
const struct usb_cdc_ncm_desc *func_desc;
const struct usb_cdc_mbim_desc *mbim_desc;
-- 
2.14.1



[PATCH] rxrpc: remove unused static variables

2017-06-27 Thread Sebastian Andrzej Siewior
The last user of rxrpc_security_methods and rxrpc_security_sem was removed
in 648af7fca159 ("rxrpc: Absorb the rxkad security module"). This was
noticed by kbuild test robot for the -RT tree but is also true for !RT.

Reported-by: kbuild test robot <fengguang...@intel.com>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: David Howells <dhowe...@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/rxrpc/security.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c
index 7d921e56e715..13df56a738e5 100644
--- a/net/rxrpc/security.c
+++ b/net/rxrpc/security.c
@@ -19,9 +19,6 @@
 #include 
 #include "ar-internal.h"
 
-static LIST_HEAD(rxrpc_security_methods);
-static DECLARE_RWSEM(rxrpc_security_sem);
-
 static const struct rxrpc_security *rxrpc_security_types[] = {
[RXRPC_SECURITY_NONE]   = &rxrpc_no_security,
 #ifdef CONFIG_RXKAD
-- 
2.13.1



[PATCH 1/2] net/core: use local_bh_disable() in netif_rx_ni()

2017-06-16 Thread Sebastian Andrzej Siewior
In 2004 [0] netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to avoid running the BH in
parallel.
All this can be avoided by putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.

[0] Make netif_rx_ni preempt-safe
http://oss.sgi.com/projects/netdev/archive/2004-10/msg02211.html

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/core/dev.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b8d6dd9e8b5c..b1f8a89322bd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3928,11 +3928,9 @@ int netif_rx_ni(struct sk_buff *skb)
 
trace_netif_rx_ni_entry(skb);
 
-   preempt_disable();
+   local_bh_disable();
err = netif_rx_internal(skb);
-   if (local_softirq_pending())
-   do_softirq();
-   preempt_enable();
+   local_bh_enable();
 
return err;
 }
-- 
2.11.0



[PATCH 2/2] net/core: remove explicit do_softirq() from busy_poll_stop()

2017-06-16 Thread Sebastian Andrzej Siewior
Since commit 217f69743681 ("net: busy-poll: allow preemption in
sk_busy_loop()") there is an explicit do_softirq() invocation after
local_bh_enable() has been invoked.
I don't understand why we need this because local_bh_enable() will
invoke do_softirq() once the softirq counter reached zero and we have
softirq-related work pending.
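
Roughly what the last local_bh_enable() does, heavily simplified from
kernel/softirq.c, which is why the explicit do_softirq() looks redundant
to me:

 /* sketch of __local_bh_enable_ip() for the final enable */
 preempt_count_sub(SOFTIRQ_DISABLE_OFFSET - 1);
 if (!in_interrupt() && local_softirq_pending())
	do_softirq();		/* pending softirqs already run here */
 preempt_count_dec();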

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/core/dev.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b1f8a89322bd..447d37fe89e6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5281,8 +5281,6 @@ static void busy_poll_stop(struct napi_struct *napi, void 
*have_poll_lock)
if (rc == BUSY_POLL_BUDGET)
__napi_schedule(napi);
local_bh_enable();
-   if (local_softirq_pending())
-   do_softirq();
 }
 
 void napi_busy_loop(unsigned int napi_id,
-- 
2.11.0



Re: [PATCH] net/core: remove explicit do_softirq() from busy_poll_stop()

2017-05-27 Thread Sebastian Andrzej Siewior
On 2017-05-22 14:26:44 [-0700], Eric Dumazet wrote:
> On Mon, May 22, 2017 at 12:26 PM, Sebastian Andrzej Siewior
> <bige...@linutronix.de> wrote:
> > Since commit 217f69743681 ("net: busy-poll: allow preemption in
> > sk_busy_loop()") there is an explicit do_softirq() invocation after
> > local_bh_enable() has been invoked.
> > I don't understand why we need this because local_bh_enable() will
> > invoke do_softirq() once the softirq counter reached zero and we have
> > softirq-related work pending.
> >
> > Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
> > ---
> >  net/core/dev.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index fca407b4a6ea..e84eb0ec5529 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5199,8 +5199,6 @@ static void busy_poll_stop(struct napi_struct *napi, 
> > void *have_poll_lock)
> > if (rc == BUSY_POLL_BUDGET)
> > __napi_schedule(napi);
> > local_bh_enable();
> > -   if (local_softirq_pending())
> > -   do_softirq();
> >  }
> 
> preemption is disabled.
so? This patch:

diff --git a/init/main.c b/init/main.c
--- a/init/main.c
+++ b/init/main.c
@@ -1001,6 +1001,21 @@ static int __ref kernel_init(void *unused)
  "See Linux Documentation/admin-guide/init.rst for guidance.");
 }
 
+static void delay_thingy_func(struct work_struct *x)
+{
+   preempt_disable();
+   local_bh_disable();
+   pr_err("one %s\n", current->comm);
+   raise_softirq(TASKLET_SOFTIRQ);
+   pr_err("two %s\n", current->comm);
+   local_bh_enable();
+   pr_err("three %s\n", current->comm);
+   preempt_enable();
+   pr_err("four %s\n", current->comm);
+}
+
+static DECLARE_DELAYED_WORK(delay_thingy, delay_thingy_func);
+
 static noinline void __init kernel_init_freeable(void)
 {
/*
@@ -1038,6 +1053,7 @@ static noinline void __init kernel_init_freeable(void)
 
do_basic_setup();
 
+   schedule_delayed_work(_thingy, HZ * 5);
/* Open the /dev/console on the rootfs, this should never fail */
if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
pr_err("Warning: unable to open an initial console.\n");
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4e09821f9d9e..b8dcb9dc5692 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -500,6 +500,7 @@ static __latent_entropy void tasklet_action(struct 
softirq_action *a)
 {
struct tasklet_struct *list;
 
+   pr_err("%s() %s\n", __func__, current->comm);
local_irq_disable();
list = __this_cpu_read(tasklet_vec.head);
__this_cpu_write(tasklet_vec.head, NULL);

gives me this output:
[7.132439] one kworker/4:2
[7.132806] two kworker/4:2
[7.133120] softirq: tasklet_action() kworker/4:2
[7.133623] three kworker/4:2
[7.133940] four kworker/4:2

which means after the last local_bh_enable() we already invoke the
raised softirq handler. It does not matter that we are in a
preempt_disable() region. 

> 
> Look at netif_rx_ni() for a similar construct.

correct, this is old and it is already patched in -RT. And I have no
clue why this is required, because netif_rx_internal() itself does
preempt_disable() / get_cpu() so this one already disables preemption.
Looking at it, I *think* you want local_bh_disable() instead of
preempt_disable() and do_softirq() removed, too.

Let me browse at the museum a little bit… ahhh, here -> "[PATCH] Make
netif_rx_ni preempt-safe" [0]. There we got the preempt_disable() from
which protects against parallel invocations of do_softirq() in user
context. And we only do the check for pending softirqs (and invoke
do_softirq()) because netif_rx() sets the pending bits without raising
the softirq and it is done in a BH-enabled section. And in interrupt
context we check for those in the interrupt-exit path. So in this case
we have to do it manually.

[0] http://oss.sgi.com/projects/netdev/archive/2004-10/msg02211.html

> What exact problem do you have with existing code, that is worth
> adding this change ?

I need to workaround the non-existing do_softirq() function in RT and my
current workaround is the patch I posted. I don't see the need for the
two lines. And it seems that the other construct in netif_rx_ni() can be
simplified / removed, too.

> Thanks.

Sebastian


[PATCH] net/core: remove explicit do_softirq() from busy_poll_stop()

2017-05-22 Thread Sebastian Andrzej Siewior
Since commit 217f69743681 ("net: busy-poll: allow preemption in
sk_busy_loop()") there is an explicit do_softirq() invocation after
local_bh_enable() has been invoked.
I don't understand why we need this because local_bh_enable() will
invoke do_softirq() once the softirq counter reached zero and we have
softirq-related work pending.

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/core/dev.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fca407b4a6ea..e84eb0ec5529 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5199,8 +5199,6 @@ static void busy_poll_stop(struct napi_struct *napi, void 
*have_poll_lock)
if (rc == BUSY_POLL_BUDGET)
__napi_schedule(napi);
local_bh_enable();
-   if (local_softirq_pending())
-   do_softirq();
 }
 
 void napi_busy_loop(unsigned int napi_id,
-- 
2.11.0



[PATCH] net/iucv: use explicit clean up labels in iucv_init()

2016-11-24 Thread Sebastian Andrzej Siewior
Ursula suggested using explicit labels for cleanup in the error path
instead of one `out_free' label which handles multiple exits.
Since the previous patch has already been applied, here is a follow-up patch.

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/iucv/iucv.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index f0d6afc5d4a9..8f7ef167c45a 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -2038,16 +2038,16 @@ static int __init iucv_init(void)
rc = cpuhp_setup_state(CPUHP_NET_IUCV_PREPARE, "net/iucv:prepare",
   iucv_cpu_prepare, iucv_cpu_dead);
if (rc)
-   goto out_free;
+   goto out_dev;
rc = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "net/iucv:online",
   iucv_cpu_online, iucv_cpu_down_prep);
if (rc < 0)
-   goto out_free;
+   goto out_prep;
iucv_online = rc;
 
rc = register_reboot_notifier(&iucv_reboot_notifier);
if (rc)
-   goto out_free;
+   goto out_remove_hp;
ASCEBC(iucv_error_no_listener, 16);
ASCEBC(iucv_error_no_memory, 16);
ASCEBC(iucv_error_pathid, 16);
@@ -2061,11 +2061,11 @@ static int __init iucv_init(void)
 
 out_reboot:
unregister_reboot_notifier(&iucv_reboot_notifier);
-out_free:
-   if (iucv_online)
-   cpuhp_remove_state(iucv_online);
+out_remove_hp:
+   cpuhp_remove_state(iucv_online);
+out_prep:
cpuhp_remove_state(CPUHP_NET_IUCV_PREPARE);
-
+out_dev:
root_device_unregister(iucv_root);
 out_int:
unregister_external_irq(EXT_IRQ_IUCV, iucv_external_interrupt);
-- 
2.10.2



[PATCH 12/20 v2] net/iucv: Convert to hotplug state machine

2016-11-24 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the
callbacks on the already online CPUs. The smp function calls in the
online/downprep callbacks are not required as the callback is guaranteed to
be invoked on the upcoming/outgoing cpu.

Cc: Ursula Braun <ubr...@linux.vnet.ibm.com>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: linux-s...@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
v1…v2: Use explicit labels for clean up in iucv_init() as suggested by
   Ursula.

 include/linux/cpuhotplug.h |1 
 net/iucv/iucv.c|  122 -
 2 files changed, 47 insertions(+), 76 deletions(-)
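
The core of the conversion is the registration sketched here: the cpuhp
core invokes the prepare/online callbacks for the already online CPUs,
so the manual for_each_online_cpu() loop and the notifier go away (full
context is in the diff below):

 /* one static state for the per-cpu allocations ... */
 rc = cpuhp_setup_state(CPUHP_NET_IUCV_PREPARE, "net/iucv:prepare",
			iucv_cpu_prepare, iucv_cpu_dead);
 /* ... and one dynamic state for enabling/disabling the interface per cpu */
 rc = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "net/iucv:online",
			iucv_cpu_online, iucv_cpu_down_prep);
 iucv_online = rc;	/* dynamic states return the allocated state number */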

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -63,6 +63,7 @@ enum cpuhp_state {
CPUHP_X86_THERM_PREPARE,
CPUHP_X86_CPUID_PREPARE,
CPUHP_X86_MSR_PREPARE,
+   CPUHP_NET_IUCV_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -639,7 +639,7 @@ static void iucv_disable(void)
put_online_cpus();
 }
 
-static void free_iucv_data(int cpu)
+static int iucv_cpu_dead(unsigned int cpu)
 {
kfree(iucv_param_irq[cpu]);
iucv_param_irq[cpu] = NULL;
@@ -647,9 +647,10 @@ static void free_iucv_data(int cpu)
iucv_param[cpu] = NULL;
kfree(iucv_irq_data[cpu]);
iucv_irq_data[cpu] = NULL;
+   return 0;
 }
 
-static int alloc_iucv_data(int cpu)
+static int iucv_cpu_prepare(unsigned int cpu)
 {
/* Note: GFP_DMA used to get memory below 2G */
iucv_irq_data[cpu] = kmalloc_node(sizeof(struct iucv_irq_data),
@@ -671,58 +672,38 @@ static int alloc_iucv_data(int cpu)
return 0;
 
 out_free:
-   free_iucv_data(cpu);
+   iucv_cpu_dead(cpu);
return -ENOMEM;
 }
 
-static int iucv_cpu_notify(struct notifier_block *self,
-unsigned long action, void *hcpu)
+static int iucv_cpu_online(unsigned int cpu)
+{
+   if (!iucv_path_table)
+   return 0;
+   iucv_declare_cpu(NULL);
+   return 0;
+}
+
+static int iucv_cpu_down_prep(unsigned int cpu)
 {
cpumask_t cpumask;
-   long cpu = (long) hcpu;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
-   if (alloc_iucv_data(cpu))
-   return notifier_from_errno(-ENOMEM);
-   break;
-   case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   free_iucv_data(cpu);
-   break;
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   case CPU_DOWN_FAILED:
-   case CPU_DOWN_FAILED_FROZEN:
-   if (!iucv_path_table)
-   break;
-   smp_call_function_single(cpu, iucv_declare_cpu, NULL, 1);
-   break;
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   if (!iucv_path_table)
-   break;
-   cpumask_copy(&cpumask, &iucv_buffer_cpumask);
-   cpumask_clear_cpu(cpu, &cpumask);
-   if (cpumask_empty(&cpumask))
-   /* Can't offline last IUCV enabled cpu. */
-   return notifier_from_errno(-EINVAL);
-   smp_call_function_single(cpu, iucv_retrieve_cpu, NULL, 1);
-   if (cpumask_empty(&iucv_irq_cpumask))
-   smp_call_function_single(
-   cpumask_first(&iucv_buffer_cpumask),
-   iucv_allow_cpu, NULL, 1);
-   break;
-   }
-   return NOTIFY_OK;
-}
+   if (!iucv_path_table)
+   return 0;
 
-static struct notifier_block __refdata iucv_cpu_notifier = {
-   .notifier_call = iucv_cpu_notify,
-};
+   cpumask_copy(&cpumask, &iucv_buffer_cpumask);
+   cpumask_clear_cpu(cpu, &cpumask);
+   if (cpumask_empty(&cpumask))
+   /* Can't offline last IUCV enabled cpu. */
+   return -EINVAL;
+
+   iucv_retrieve_cpu(NULL);
+   if (!cpumask_empty(&iucv_irq_cpumask))
+   return 0;
+   smp_call_function_single(cpumask_first(&iucv_buffer_cpumask),
+iucv_allow_cpu, NULL, 1);
+   return 0;
+}
 
 /**
  * iucv_sever_pathid
@@ -2027,6 +2008,7 @@ struct iucv_interface iucv_if = {
 };
 EXPORT_SYMBOL(iucv_if);
 
+static enum cpuhp_state iucv_online;
 /**
  * iucv_init
  *
@@ -2035,7 +2017,6 @@ EXPORT_SYMBOL(iucv_if);
 static int __init iucv_init(void)
 {
int rc;
-   int cpu;
 
if (!MACHINE_IS_VM) {
rc = -EPROTONOSUPPORT;
@@ -2054,23 +2035,19 @@ static int __init iucv_init(void)
goto out_int;
}
 
-   cpu_notifier_register_begin();
-
-   for_each_online_cpu(cpu) {
-   if (alloc_iucv_data(cpu)) {
-   

Re: [PATCH 12/20] net/iucv: Convert to hotplug state machine

2016-11-24 Thread Sebastian Andrzej Siewior
On 2016-11-23 19:04:16 [+0100], Ursula Braun wrote:
> Sebastian,
Hallo Ursula,

> your patch looks good to me. I run successfully some small tests with it.
> I want to suggest a small change in iucv_init() to keep the uniform technique
> of undo labels below. Do you agree?

So what you ask for is:

diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index f0d6afc5d4a9..8f7ef167c45a 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -2038,16 +2038,16 @@ static int __init iucv_init(void)
rc = cpuhp_setup_state(CPUHP_NET_IUCV_PREPARE, "net/iucv:prepare",
   iucv_cpu_prepare, iucv_cpu_dead);
if (rc)
-   goto out_free;
+   goto out_dev;
rc = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "net/iucv:online",
   iucv_cpu_online, iucv_cpu_down_prep);
if (rc < 0)
-   goto out_free;
+   goto out_prep;
iucv_online = rc;
 
rc = register_reboot_notifier(&iucv_reboot_notifier);
if (rc)
-   goto out_free;
+   goto out_remove_hp;
ASCEBC(iucv_error_no_listener, 16);
ASCEBC(iucv_error_no_memory, 16);
ASCEBC(iucv_error_pathid, 16);
@@ -2061,11 +2061,11 @@ static int __init iucv_init(void)
 
 out_reboot:
unregister_reboot_notifier(&iucv_reboot_notifier);
-out_free:
-   if (iucv_online)
-   cpuhp_remove_state(iucv_online);
+out_remove_hp:
+   cpuhp_remove_state(iucv_online);
+out_prep:
cpuhp_remove_state(CPUHP_NET_IUCV_PREPARE);
-
+out_dev:
root_device_unregister(iucv_root);
 out_int:
unregister_external_irq(EXT_IRQ_IUCV, iucv_external_interrupt);

This is your change including the removal of the `if' check in the
`out_remove' label which got renamed to `out_remove_hp'.
Of course, I agree with this change and a proper patch will follow in a
few hours if nobody objects.

> Kind regards, Ursula

Sebastian


[PATCH 12/20] net/iucv: Convert to hotplug state machine

2016-11-17 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the
callbacks on the already online CPUs. The smp function calls in the
online/downprep callbacks are not required as the callback is guaranteed to
be invoked on the upcoming/outgoing cpu.

Cc: Ursula Braun <ubr...@linux.vnet.ibm.com>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: linux-s...@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 include/linux/cpuhotplug.h |   1 +
 net/iucv/iucv.c| 118 +
 2 files changed, 45 insertions(+), 74 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index fd5598b8353a..69abf2c09f6c 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -63,6 +63,7 @@ enum cpuhp_state {
CPUHP_X86_THERM_PREPARE,
CPUHP_X86_CPUID_PREPARE,
CPUHP_X86_MSR_PREPARE,
+   CPUHP_NET_IUCV_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index 88a2a3ba4212..f0d6afc5d4a9 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -639,7 +639,7 @@ static void iucv_disable(void)
put_online_cpus();
 }
 
-static void free_iucv_data(int cpu)
+static int iucv_cpu_dead(unsigned int cpu)
 {
kfree(iucv_param_irq[cpu]);
iucv_param_irq[cpu] = NULL;
@@ -647,9 +647,10 @@ static void free_iucv_data(int cpu)
iucv_param[cpu] = NULL;
kfree(iucv_irq_data[cpu]);
iucv_irq_data[cpu] = NULL;
+   return 0;
 }
 
-static int alloc_iucv_data(int cpu)
+static int iucv_cpu_prepare(unsigned int cpu)
 {
/* Note: GFP_DMA used to get memory below 2G */
iucv_irq_data[cpu] = kmalloc_node(sizeof(struct iucv_irq_data),
@@ -671,58 +672,38 @@ static int alloc_iucv_data(int cpu)
return 0;
 
 out_free:
-   free_iucv_data(cpu);
+   iucv_cpu_dead(cpu);
return -ENOMEM;
 }
 
-static int iucv_cpu_notify(struct notifier_block *self,
-unsigned long action, void *hcpu)
+static int iucv_cpu_online(unsigned int cpu)
 {
-   cpumask_t cpumask;
-   long cpu = (long) hcpu;
-
-   switch (action) {
-   case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
-   if (alloc_iucv_data(cpu))
-   return notifier_from_errno(-ENOMEM);
-   break;
-   case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   free_iucv_data(cpu);
-   break;
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   case CPU_DOWN_FAILED:
-   case CPU_DOWN_FAILED_FROZEN:
-   if (!iucv_path_table)
-   break;
-   smp_call_function_single(cpu, iucv_declare_cpu, NULL, 1);
-   break;
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   if (!iucv_path_table)
-   break;
-   cpumask_copy(&cpumask, &iucv_buffer_cpumask);
-   cpumask_clear_cpu(cpu, &cpumask);
-   if (cpumask_empty(&cpumask))
-   /* Can't offline last IUCV enabled cpu. */
-   return notifier_from_errno(-EINVAL);
-   smp_call_function_single(cpu, iucv_retrieve_cpu, NULL, 1);
-   if (cpumask_empty(&iucv_irq_cpumask))
-   smp_call_function_single(
-   cpumask_first(&iucv_buffer_cpumask),
-   iucv_allow_cpu, NULL, 1);
-   break;
-   }
-   return NOTIFY_OK;
+   if (!iucv_path_table)
+   return 0;
+   iucv_declare_cpu(NULL);
+   return 0;
 }
 
-static struct notifier_block __refdata iucv_cpu_notifier = {
-   .notifier_call = iucv_cpu_notify,
-};
+static int iucv_cpu_down_prep(unsigned int cpu)
+{
+   cpumask_t cpumask;
+
+   if (!iucv_path_table)
+   return 0;
+
+   cpumask_copy(&cpumask, &iucv_buffer_cpumask);
+   cpumask_clear_cpu(cpu, &cpumask);
+   if (cpumask_empty(&cpumask))
+   /* Can't offline last IUCV enabled cpu. */
+   return -EINVAL;
+
+   iucv_retrieve_cpu(NULL);
+   if (!cpumask_empty(&iucv_irq_cpumask))
+   return 0;
+   smp_call_function_single(cpumask_first(&iucv_buffer_cpumask),
+iucv_allow_cpu, NULL, 1);
+   return 0;
+}
 
 /**
  * iucv_sever_pathid
@@ -2027,6 +2008,7 @@ struct iucv_interface iucv_if = {
 };
 EXPORT_SYMBOL(iucv_if);
 
+static enum cpuhp_state iucv_online;
 /**
  * iucv_init
  *
@@ -2035,7 +2017,6 @@ EXPORT_SYMBOL(iucv_if);
 static int __init iucv_init(void)
 {
int rc;
-   int cpu;
 
if (!MACHINE_IS_VM) {
rc = -EPROTONOSUPPORT;
@@ -2054,23 +2035,19 @@ static int __init iucv_init(void)
goto out_int;
}
 
-  

[PATCH 08/25] net/dev: Convert to hotplug state machine

2016-11-03 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine.

Cc: "David S. Miller" <da...@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 include/linux/cpuhotplug.h |  1 +
 net/core/dev.c | 16 ++--
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 31c58f6ec3c6..394eb7ed53be 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -36,6 +36,7 @@ enum cpuhp_state {
CPUHP_PERCPU_CNT_DEAD,
CPUHP_RADIX_DEAD,
CPUHP_PAGE_ALLOC_DEAD,
+   CPUHP_NET_DEV_DEAD,
CPUHP_WORKQUEUE_PREP,
CPUHP_POWER_NUMA_PREPARE,
CPUHP_HRTIMERS_PREPARE,
diff --git a/net/core/dev.c b/net/core/dev.c
index 4bc19a164ba5..71693729bdd5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7947,18 +7947,13 @@ int dev_change_net_namespace(struct net_device *dev, 
struct net *net, const char
 }
 EXPORT_SYMBOL_GPL(dev_change_net_namespace);
 
-static int dev_cpu_callback(struct notifier_block *nfb,
-   unsigned long action,
-   void *ocpu)
+static int dev_cpu_dead(unsigned int oldcpu)
 {
struct sk_buff **list_skb;
struct sk_buff *skb;
-   unsigned int cpu, oldcpu = (unsigned long)ocpu;
+   unsigned int cpu;
struct softnet_data *sd, *oldsd;
 
-   if (action != CPU_DEAD && action != CPU_DEAD_FROZEN)
-   return NOTIFY_OK;
-
local_irq_disable();
cpu = smp_processor_id();
sd = &per_cpu(softnet_data, cpu);
@@ -8008,10 +8003,9 @@ static int dev_cpu_callback(struct notifier_block *nfb,
input_queue_head_incr(oldsd);
}
 
-   return NOTIFY_OK;
+   return 0;
 }
 
-
 /**
  * netdev_increment_features - increment feature set by one
  * @all: current feature set
@@ -8345,7 +8339,9 @@ static int __init net_dev_init(void)
open_softirq(NET_TX_SOFTIRQ, net_tx_action);
open_softirq(NET_RX_SOFTIRQ, net_rx_action);
 
-   hotcpu_notifier(dev_cpu_callback, 0);
+   rc = cpuhp_setup_state_nocalls(CPUHP_NET_DEV_DEAD, "net/dev:dead",
+  NULL, dev_cpu_dead);
+   WARN_ON(rc < 0);
dst_subsys_init();
rc = 0;
 out:
-- 
2.10.2



[PATCH 09/25] net/flowcache: Convert to hotplug state machine

2016-11-03 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine. Use multi state support to avoid
custom list handling for the multiple instances.

Cc: "David S. Miller" <da...@davemloft.net>
Cc: Steffen Klassert <steffen.klass...@secunet.com>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 include/linux/cpuhotplug.h |  1 +
 include/net/flow.h |  1 +
 include/net/flowcache.h|  2 +-
 net/core/flow.c| 60 --
 net/xfrm/xfrm_policy.c |  1 +
 5 files changed, 30 insertions(+), 35 deletions(-)
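
The multi-instance flavour used here, sketched: the state is set up once,
every flow_cache instance is then hooked in via its embedded hlist_node,
and the callbacks get the node back to find the instance (simplified; the
state name string is an assumption on my side):

 /* callbacks take the instance node in addition to the cpu */
 static int flow_cache_cpu_up_prep(unsigned int cpu, struct hlist_node *node);
 static int flow_cache_cpu_dead(unsigned int cpu, struct hlist_node *node);

 /* once, at subsystem init */
 cpuhp_setup_state_multi(CPUHP_NET_FLOW_PREPARE, "net/flow:prepare",
			 flow_cache_cpu_up_prep, flow_cache_cpu_dead);

 /* per flow_cache instance; runs the up_prep callback on all online CPUs */
 cpuhp_state_add_instance(CPUHP_NET_FLOW_PREPARE, &fc->node);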

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 394eb7ed53be..86b940f19df8 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -56,6 +56,7 @@ enum cpuhp_state {
CPUHP_ARM_SHMOBILE_SCU_PREPARE,
CPUHP_SH_SH3X_PREPARE,
CPUHP_BLK_MQ_PREPARE,
+   CPUHP_NET_FLOW_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/include/net/flow.h b/include/net/flow.h
index 035aa7716967..2e386bd6ee63 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -239,6 +239,7 @@ struct flow_cache_object *flow_cache_lookup(struct net *net,
void *ctx);
 int flow_cache_init(struct net *net);
 void flow_cache_fini(struct net *net);
+void flow_cache_hp_init(void);
 
 void flow_cache_flush(struct net *net);
 void flow_cache_flush_deferred(struct net *net);
diff --git a/include/net/flowcache.h b/include/net/flowcache.h
index c8f665ec6e0d..9caf3bfc8d2d 100644
--- a/include/net/flowcache.h
+++ b/include/net/flowcache.h
@@ -17,7 +17,7 @@ struct flow_cache_percpu {
 struct flow_cache {
u32 hash_shift;
struct flow_cache_percpu __percpu *percpu;
-   struct notifier_block   hotcpu_notifier;
+   struct hlist_node   node;
int low_watermark;
int high_watermark;
struct timer_list   rnd_timer;
diff --git a/net/core/flow.c b/net/core/flow.c
index 3937b1b68d5b..841fd7f87b30 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -419,28 +419,20 @@ static int flow_cache_cpu_prepare(struct flow_cache *fc, 
int cpu)
return 0;
 }
 
-static int flow_cache_cpu(struct notifier_block *nfb,
- unsigned long action,
- void *hcpu)
+static int flow_cache_cpu_up_prep(unsigned int cpu, struct hlist_node *node)
 {
-   struct flow_cache *fc = container_of(nfb, struct flow_cache,
-   hotcpu_notifier);
-   int res, cpu = (unsigned long) hcpu;
+   struct flow_cache *fc = hlist_entry_safe(node, struct flow_cache, node);
+
+   return flow_cache_cpu_prepare(fc, cpu);
+}
+
+static int flow_cache_cpu_dead(unsigned int cpu, struct hlist_node *node)
+{
+   struct flow_cache *fc = hlist_entry_safe(node, struct flow_cache, node);
struct flow_cache_percpu *fcp = per_cpu_ptr(fc->percpu, cpu);
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
-   res = flow_cache_cpu_prepare(fc, cpu);
-   if (res)
-   return notifier_from_errno(res);
-   break;
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   __flow_cache_shrink(fc, fcp, 0);
-   break;
-   }
-   return NOTIFY_OK;
+   __flow_cache_shrink(fc, fcp, 0);
+   return 0;
 }
 
 int flow_cache_init(struct net *net)
@@ -467,18 +459,8 @@ int flow_cache_init(struct net *net)
if (!fc->percpu)
return -ENOMEM;
 
-   cpu_notifier_register_begin();
-
-   for_each_online_cpu(i) {
-   if (flow_cache_cpu_prepare(fc, i))
-   goto err;
-   }
-   fc->hotcpu_notifier = (struct notifier_block){
-   .notifier_call = flow_cache_cpu,
-   };
-   __register_hotcpu_notifier(&fc->hotcpu_notifier);
-
-   cpu_notifier_register_done();
+   if (cpuhp_state_add_instance(CPUHP_NET_FLOW_PREPARE, &fc->node))
+   goto err;
 
setup_timer(&fc->rnd_timer, flow_cache_new_hashrnd,
(unsigned long) fc);
@@ -494,8 +476,6 @@ int flow_cache_init(struct net *net)
fcp->hash_table = NULL;
}
 
-   cpu_notifier_register_done();
-
free_percpu(fc->percpu);
fc->percpu = NULL;
 
@@ -509,7 +489,8 @@ void flow_cache_fini(struct net *net)
struct flow_cache *fc = &net->xfrm.flow_cache_global;
 
del_timer_sync(&fc->rnd_timer);
-   unregister_hotcpu_notifier(>hotcpu_notifier);
+
+   cpuhp_state_remove_instance_nocalls(CPUHP_NET_FLOW_

[PATCH] rxrpc: remove unused static variables

2016-10-21 Thread Sebastian Andrzej Siewior
The rxrpc_security_methods and rxrpc_security_sem user has been removed
in 648af7fca159 ("rxrpc: Absorb the rxkad security module"). This was
noticed by kbuild test robot for the -RT tree but is also true for !RT.

Reported-by: kbuild test robot <fengguang...@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/rxrpc/security.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c
index 814d285ff802..d4d088e9be85 100644
--- a/net/rxrpc/security.c
+++ b/net/rxrpc/security.c
@@ -19,9 +19,6 @@
 #include 
 #include "ar-internal.h"
 
-static LIST_HEAD(rxrpc_security_methods);
-static DECLARE_RWSEM(rxrpc_security_sem);
-
 static const struct rxrpc_security *rxrpc_security_types[] = {
[RXRPC_SECURITY_NONE]   = _no_security,
 #ifdef CONFIG_RXKAD
-- 
2.9.3



Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD

2016-09-15 Thread Sebastian Andrzej Siewior
On 2016-09-09 15:46:44 [+0300], Grygorii Strashko wrote:
> 
> It looks like scheduler playing ping-pong between CPUs with threaded irqs 
> irq/354-355.
> And seems this might be the case - if I pin both threaded IRQ handlers to CPU0
> I can see better latency and netperf improvement
> cyclictest -m -Sp98 -q  -D4m
> T: 0 ( 1318) P:98 I:1000 C: 24 Min:  9 Act:   14 Avg:   15 Max:  
> 42
> T: 1 ( 1319) P:98 I:1500 C: 159909 Min:  9 Act:   14 Avg:   16 Max:  
> 39
> 
> if I arrange hwirqs  and pin pin both threaded IRQ handlers on CPU1 
> I can observe more less similar results as with this patch. 

so no patch then.

> with this change i do not see  "NOHZ: local_softirq_pending 80" any more 
> Tested-by: Grygorii Strashko  

okay. So I need to think about what to do about this. Either this or trying to
run the "higher" softirq first but this could break things.
Thanks for the confirmation.

> > - having the hard-IRQ and IRQ-thread on the same CPU might help, too. It
> >   is not strictly required but saves a few cycles if you don't have to
> >   perform cross CPU wake ups and migrate task forth and back. The latter
> >   happens at prio 99.
> 
> I've experimented with this and it improves netperf and I also followed 
> instructions from [1].
> But seems messed ti pin threaded irqs to cpu.
> [1] 
> https://www.osadl.org/Real-time-Ethernet-UDP-worst-case-roun.qa-farm-rt-ethernet-udp-monitor.0.html

There is irq_thread() => irq_thread_check_affinity(). It might not work
as expected on ARM but it makes sense for the thread to follow the
affinity mask of the HW irq.

> > - I am not sure NAPI works as expected. I would assume so. There is IRQ
> >   354 and 355 which fire after each other. One would be enough I guess.
> >   And they seem to be short living / fire often. If NAPI works then it
> >   should put an end to it and push it to the softirq thread.
> >   If you have IRQ-pacing support I suggest to use something like 10ms or
> >   so. That means your ping response will go from <= 1ms to 10ms in the
> >   worst case but since you process more packets at a time your
> >   throughput should increase.
> >   If I count this correct, it too you alsmost 4ms from "raise SCHED" to
> >   "try process SCHED" and most of the time was spent in 35[45] hard irq,
> >   raise NET_RX or cross wakeup the IRQ thread.
> 
> The question I have to dial with is why switching to RT cause so significant
> netperf drop (without additional tunning) comparing to vanilla - ~120% for 
> 256K and ~200% for 128K windows?

You have a sched / thread ping/pong. That is one thing. !RT with
threaded irqs should show similar problems. The higher latency is caused
by the migration thread.

> It's of course expected to see netperf drop, but I assume not so significant 
> :(
> And I can't find any reports or statistic related to this. Does the same 
> happen on x86?

It should. Maybe at a lower level if it handles migration more
effectively. There is this watchdog thread (for instance) which tries to
detect lockups and runs at P99. It causes "worse" cyclictest numbers on
x86 and on ARM but on ARM this is more visible than on x86.

Sebastian


Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD

2016-09-08 Thread Sebastian Andrzej Siewior
On 2016-08-19 17:29:16 [+0300], Grygorii Strashko wrote:
> I've collected trace before first occurrence of  "NOHZ: local_softirq_pending 
> 80"
> 
>  irq/354-4848400-85[000]90.639460: irq_handler_entry:irq=19 
> name=arch_timer
>iperf-1284  [001]90.639474: softirq_raise:vec=1 
> [action=TIMER]
>iperf-1284  [001]90.639486: irq_handler_exit: irq=19 
> ret=handled
>  irq/354-4848400-85[000]90.639488: softirq_raise:vec=7 
> [action=SCHED]
here we raise the SCHED softirq

>iperf-1284  [001]90.639490: sched_waking: 
> comm=ksoftirqd/1 pid=21 prio=120 target_cpu=001
>  irq/354-4848400-85[000]90.639492: softirq_raise:vec=1 
> [action=TIMER]
and a timer
>iperf-1284  [001]90.639499: sched_wakeup: 
> ksoftirqd/1:21 [120] success=1 CPU:001
>iperf-1284  [001]90.639504: sched_waking: 
> comm=ktimersoftd/1 pid=20 prio=98 target_cpu=001
>  irq/354-4848400-85[000]90.639505: irq_handler_exit: irq=19 
> ret=handled

okay. so softirq here I come

>iperf-1284  [001]90.639512: sched_wakeup: 
> ktimersoftd/1:20 [98] success=1 CPU:001
>iperf-1284  [001]90.639526: sched_stat_runtime:   comm=iperf 
> pid=1284 runtime=50752 [ns] vruntime=2105322850 [ns]
>iperf-1284  [001]90.639537: sched_switch: iperf:1284 
> [120] R ==> irq/355-4848400:86 [49]
>  irq/355-4848400-86[001]90.639545: softirq_raise:vec=3 
> [action=NET_RX]
>  irq/355-4848400-86[001]90.639556: softirq_entry:vec=3 
> [action=NET_RX]
>  irq/355-4848400-86[001]90.639589: softirq_exit: vec=3 
> [action=NET_RX]
>  irq/355-4848400-86[001]90.639614: sched_switch: 
> irq/355-4848400:86 [49] S ==> ktimersoftd/1:20 [98]
>ktimersoftd/1-20[001]90.639625: softirq_entry:vec=1 
> [action=TIMER]
>ktimersoftd/1-20[001]90.639637: sched_waking: 
> comm=rcu_preempt pid=8 prio=98 target_cpu=001
>ktimersoftd/1-20[001]90.639646: sched_wakeup: 
> rcu_preempt:8 [98] success=1 CPU:001
>ktimersoftd/1-20[001]90.639663: softirq_exit: vec=1 
> [action=TIMER]
>ktimersoftd/1-20[001]90.639679: sched_switch: 
> ktimersoftd/1:20 [98] S ==> rcu_preempt:8 [98]
>  rcu_preempt-8 [001]90.639722: sched_switch: 
> rcu_preempt:8 [98] S ==> ksoftirqd/1:21 [120]
>  ksoftirqd/1-21[001]90.639740: sched_stat_runtime:   
> comm=ksoftirqd/1 pid=21 runtime=25539 [ns] vruntime=29960463828 [ns]
>  ksoftirqd/1-21[001]90.639750: sched_switch: 
> ksoftirqd/1:21 [120] S ==> iperf:1284 [120]
>  irq/354-4848400-85[000]90.639878: irq_handler_entry:irq=355 
> name=48484000.ethernet
wait, no

[ lots of NET_RX wake ups ]
>  irq/354-4848400-85[000]90.640560: softirq_exit: vec=3 
> [action=NET_RX]
ach. NET_RX

>  irq/354-4848400-85[000]90.640568: softirq_raise:vec=3 
> [action=NET_RX]
>  irq/354-4848400-85[000]90.640579: softirq_entry:vec=3 
> [action=NET_RX]
>  irq/354-4848400-85[000]90.640606: irq_handler_entry:irq=355 
> name=48484000.ethernet
>  irq/354-4848400-85[000]90.640608: irq_handler_exit: irq=355 
> ret=handled

[ more of those ]

>  irq/354-4848400-85[000]90.642393: softirq_exit: vec=3 
> [action=NET_RX]
>  irq/354-4848400-85[000]90.642419: sched_switch: 
> irq/354-4848400:85 [49] S ==> rcuc/0:11 [98]

We don't serve TIMER & SCHED because those two are pushed to the
ksoftirqd thread(s). So we mostly keep doing NET_RX and now we switch to
the next best thing which is RCU.

>   rcuc/0-11[000]90.642430: irq_handler_entry:irq=354 
> name=48484000.ethernet
but not for long.

>   rcuc/0-11[000]90.642432: irq_handler_exit: irq=354 
> ret=handled
>   rcuc/0-11[000]90.642435: sched_waking: 
> comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
>   rcuc/0-11[000]90.642442: sched_migrate_task:   
> comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1
>   rcuc/0-11[000]90.642453: sched_wakeup: 
> irq/354-4848400:85 [49] success=1 CPU:001
>iperf-1284  [001]90.642462: sched_stat_runtime:   comm=iperf 
> pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns]
>   rcuc/0-11[000]90.642464: irq_handler_entry:irq=355 
> name=48484000.ethernet
>   rcuc/0-11[000]90.642466: irq_handler_exit: irq=355 
> ret=handled
>   rcuc/0-11[000]90.642469: sched_waking: 
> comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
>iperf-1284  [001]90.642473: sched_switch: iperf:1284 
> [120] R ==> irq/354-4848400:85 [49]
>  irq/354-4848400-85[001]90.642481: softirq_raise:vec=3 
> 

Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD

2016-09-08 Thread Sebastian Andrzej Siewior
On 2016-08-12 18:58:21 [+0300], Grygorii Strashko wrote:
> Hi Sebastian,
Hi Grygorii,

> Thankds for comment. You're right:
> irq_thread()->irq_forced_thread_fn()->local_bh_enable()
> 
> but wouldn't here two wake_up_process() calls any way,
> plus preempt_check_resched_rt() in napi_schedule().

Usually you prefer BH handling in the IRQ-thread because it runs at
higher priority and is not interrupted by a SCHED_OTHER process. And you
can assign it a higher priority if it should be preferred over another
interrupt. However, if the processing of the interrupt is taking too
much time (like that ping flood, a lot of network traffic) then we push
it to the softirq thread. If you do this now unconditionally in the
SCHED_OTHER softirq thread then you take away all the `good' things we
had (like processing important packets at higher priority as long as
nobody floods us). Plus you share this thread with everything else that
runs in there.

> >> And, as result, get benefits from the following improvements (tested
> >> on am57xx-evm):
> >>
> >> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
> >>seen any more. Now these warnings can be seen once iperf is started.
> >># iperf -c $IPERFHOST -w 128K  -d -t 60
> > 
> > Do you also see "sched: RT throttling activated"? Because I don't see
> > otherwise why this should pop up.
> 
> I've reverted my patch an did requested experiments (some additional info 
> below).
> 
> I do not see "sched: RT throttling activated" :(

That is okay. However if you aim for throughput you might want to switch
away from NO_HZ (and deactivate the software watchdog which runs at
prio 99 if enabled).

> root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m
> [1] 1301
> # /dev/cpu_dma_latency set to 0us
> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 
> 12 14:03:59 EEST 2016 armv7l GNU/Linux
> 
…
> [1]+  Done./net_perf.sh

I can't parse this. But that local_softirq_pending() warning might
contribute to lower numbers.

> === before, no net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1288) P:98 I:250 C: 96 Min:  8 Act:9 Avg:8 Max:  33
> T: 1 ( 1289) P:98 I:250 C: 959929 Min:  7 Act:   11 Avg:9 Max:  26

> === after, no net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1301) P:98 I:250 C: 96 Min:  7 Act:9 Avg:8 Max:  22
> T: 1 ( 1302) P:98 I:250 C: 959914 Min:  7 Act:   11 Avg:8 Max:  28

I think those two should be more or less equal since the change should
have no impact on "no net load", or do I miss something?

> === before, with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1400) P:98 I:250 C: 96 Min:  8 Act:   25 Avg:   18 Max:  83
> T: 1 ( 1401) P:98 I:250 C: 959801 Min:  7 Act:   27 Avg:   17 Max:  48
> 
> 
> === after, with net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1358) P:98 I:250 C: 96 Min:  8 Act:   11 Avg:   14 Max:  42
> T: 1 ( 1359) P:98 I:250 C: 959743 Min:  7 Act:   18 Avg:   15 Max:  36

So the max value dropped by ~50% with your patch. Interesting. What I
remember from testing is that once you had, say, one hour of hackbench
running then after that, the extra network traffic didn't contribute
much (if at all) to the max value.
That said it is hard to believe that one extra context switch
contributes about 40us to the max value on CPU0.

> > What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
> > in net/core/dev.c and chrt the priority of you network interrupt
> > handlers to SCHED_OTHER priority?
> 
> = without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER
> 
> with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1325) P:98 I:1000 C: 24 Min:  8 Act:   22 Avg:   17 Max:  
> 51
> T: 1 ( 1326) P:98 I:1500 C: 159981 Min:  8 Act:   15 Avg:   15 Max:  
> 39
> 
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1307) P:98 I:250 C: 96 Min:  7 Act:   13 Avg:   16 Max:  50
> T: 1 ( 1308) P:98 I:250 C: 959819 Min:  8 Act:   12 Avg:   14 Max:  37
> 
> and net parformance is better:
> root@am57xx-evm:~# ps -A | grep 4848
>82 ?00:00:00 irq/354-4848400
>83 ?00:00:00 irq/355-4848400
> root@am57xx-evm:~# chrt -o -p 0 82
> root@am57xx-evm:~# chrt -o -p 0 83
> ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m -i250 -d0
> [1] 1298
> # /dev/cpu_dma_latency set to 0us
> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #95 SMP PREEMPT RT Fri Aug 
> 12 16:20:42 EEST 2016 armv7l GNU/Linux

So that looks nice, doesn't it?

Sebastian


Re: [RT PATCH 2/2] net: add a lock around icmp_sk()

2016-09-01 Thread Sebastian Andrzej Siewior
On 2016-08-31 09:37:43 [-0700], Eric Dumazet wrote:
> > --- a/net/ipv4/icmp.c
> > +++ b/net/ipv4/icmp.c
> > @@ -216,12 +219,14 @@ static inline struct sock *icmp_xmit_lock(struct net 
> > *net)
> >  
> > local_bh_disable();
> >  
> > +   local_lock(icmp_sk_lock);
> 
> Deadlock alert ?
> Please read the comment few lines after, explaining why we have to use
> spin_trylock().
> Or maybe I should double check what is local_lock() in RT

On !RT local_lock() is preempt_disable().
On RT local_lock() is a per-CPU sleeping spinlock which can be taken
recursively by the owner.

On a "normal" ping reply we have
- icmp_reply() (and likewise icmp_send())
  - icmp_xmit_lock()
    - local_lock()
  - icmp_push_reply()
    - local_lock()

so that works. I didn't manage to resolve a possible dst_link_failure()
path but I would assume it is like the above (where the owner takes the
same local_lock() multiple times).
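
To make the nesting explicit, a minimal sketch of the pattern (the
function names are made up, only the locallock API from the patch is
real):

static DEFINE_LOCAL_IRQ_LOCK(example_lock);

static void example_push_reply(void)
{
	local_lock(example_lock);	/* same owner: nests on RT instead of deadlocking */
	/* touch the per-CPU data */
	local_unlock(example_lock);
}

static void example_reply(void)
{
	local_lock(example_lock);	/* !RT: preempt_disable(); RT: per-CPU sleeping lock */
	example_push_reply();		/* nested local_lock() by the same owner */
	local_unlock(example_lock);
}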

Sebastian


Re: [RT PATCH 1/2] net: add back the missing serialization in ip_send_unicast_reply()

2016-08-31 Thread Sebastian Andrzej Siewior
On 2016-08-31 12:15:53 [-0400], Steven Rostedt wrote:
> > @@ -689,10 +691,13 @@ static void tcp_v4_send_reset(const struct sock *sk, 
> > struct sk_buff *skb)
> >  offsetof(struct inet_timewait_sock, tw_bound_dev_if));
> >  
> > arg.tos = ip_hdr(skb)->tos;
> > +
> > +   local_lock(tcp_sk_lock);
> 
> Interesting that I noticed in mainline, they have:
> 
>   local_bh_disable();
> 
> here.
> 
> I'm surprised we don't have a local_lock_bh() or something to that
> effect.

Turning local_bh_disable() into local_lock_bh(). One side effect would
be that the network driver will be flushed out / waited for completion
during socket write (due to the spin_lock_bh()). Not sure how much fun
all this will bring.
We could try this…
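
Something like this - purely a sketch of the idea, there is no
local_lock_bh() in the tree and the names are made up:

#define local_lock_bh(lvar)				\
	do {						\
		/* on -RT this waits for a softirq currently running on this CPU */ \
		local_bh_disable();			\
		local_lock(lvar);			\
	} while (0)

#define local_unlock_bh(lvar)				\
	do {						\
		local_unlock(lvar);			\
		local_bh_enable();			\
	} while (0)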

> -- Steve

Sebastian


[RT PATCH 1/2] net: add back the missing serialization in ip_send_unicast_reply()

2016-08-31 Thread Sebastian Andrzej Siewior
Some time ago Sami Pietikäinen reported a crash on -RT in
ip_send_unicast_reply() which was later fixed by Nicholas Mc Guire
(v3.12.8-rt11). Later (v3.18.8) the code was reworked and I dropped the
patch. As it turns out it was a mistake.
I have reports that the same crash is possible with a similar backtrace.
It seems that vanilla protects access to this_cpu_ptr() via
local_bh_disable(). This does not work on -RT since we can have
NET_RX and NET_TX running in parallel on the same CPU.
This brings back the old locks.

|Unable to handle kernel NULL pointer dereference at virtual address 0010
|PC is at __ip_make_skb+0x198/0x3e8
|[] (__ip_make_skb) from [] 
(ip_push_pending_frames+0x20/0x40)
|[] (ip_push_pending_frames) from [] 
(ip_send_unicast_reply+0x210/0x22c)
|[] (ip_send_unicast_reply) from [] 
(tcp_v4_send_reset+0x190/0x1c0)
|[] (tcp_v4_send_reset) from [] (tcp_v4_do_rcv+0x22c/0x288)
|[] (tcp_v4_do_rcv) from [] (release_sock+0xb4/0x150)
|[] (release_sock) from [] (tcp_close+0x240/0x454)
|[] (tcp_close) from [] (inet_release+0x74/0x7c)
|[] (inet_release) from [] (sock_release+0x30/0xb0)
|[] (sock_release) from [] (sock_close+0x1c/0x24)
|[] (sock_close) from [] (__fput+0xe8/0x20c)
|[] (__fput) from [] (fput+0x18/0x1c)
|[] (fput) from [] (task_work_run+0xa4/0xb8)
|[] (task_work_run) from [] (do_work_pending+0xd0/0xe4)
|[] (do_work_pending) from [] (work_pending+0xc/0x20)
|Code: e3530001 8a01 e3a00040 ea11 (e5973010)

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/ipv4/tcp_ipv4.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ad450509029b..c5521d1f1263 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include <linux/locallock.h>
 
 #include 
 #include 
@@ -565,6 +566,7 @@ void tcp_v4_send_check(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(tcp_v4_send_check);
 
+static DEFINE_LOCAL_IRQ_LOCK(tcp_sk_lock);
 /*
  * This routine will send an RST to the other tcp.
  *
@@ -689,10 +691,13 @@ static void tcp_v4_send_reset(const struct sock *sk, 
struct sk_buff *skb)
 offsetof(struct inet_timewait_sock, tw_bound_dev_if));
 
arg.tos = ip_hdr(skb)->tos;
+
+   local_lock(tcp_sk_lock);
ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
  skb, &TCP_SKB_CB(skb)->header.h4.opt,
  ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
  , arg.iov[0].iov_len);
+   local_unlock(tcp_sk_lock);
 
TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
TCP_INC_STATS_BH(net, TCP_MIB_OUTRSTS);
@@ -774,10 +779,12 @@ static void tcp_v4_send_ack(struct net *net,
if (oif)
arg.bound_dev_if = oif;
arg.tos = tos;
+   local_lock(tcp_sk_lock);
ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
  skb, &TCP_SKB_CB(skb)->header.h4.opt,
  ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
  , arg.iov[0].iov_len);
+   local_unlock(tcp_sk_lock);
 
TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
 }
-- 
2.9.3



[RT PATCH 2/2] net: add a lock around icmp_sk()

2016-08-31 Thread Sebastian Andrzej Siewior
It looks like the this_cpu_ptr() access in icmp_sk() is protected with
local_bh_disable(). To avoid missing serialization in -RT I am adding
here a local lock. No crash has been observed, this is just a precaution.

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/ipv4/icmp.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index c1f1d5030d37..63731fd6af3e 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -78,6 +78,7 @@
 #include 
 #include 
 #include 
+#include <linux/locallock.h>
 #include 
 #include 
 #include 
@@ -205,6 +206,8 @@ static const struct icmp_control 
icmp_pointers[NR_ICMP_TYPES+1];
  *
  * On SMP we have one ICMP socket per-cpu.
  */
+static DEFINE_LOCAL_IRQ_LOCK(icmp_sk_lock);
+
 static struct sock *icmp_sk(struct net *net)
 {
return *this_cpu_ptr(net->ipv4.icmp_sk);
@@ -216,12 +219,14 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
 
local_bh_disable();
 
+   local_lock(icmp_sk_lock);
sk = icmp_sk(net);
 
if (unlikely(!spin_trylock(&sk->sk_lock.slock))) {
/* This can happen if the output path signals a
 * dst_link_failure() for an outgoing ICMP packet.
 */
+   local_unlock(icmp_sk_lock);
local_bh_enable();
return NULL;
}
@@ -231,6 +236,7 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
 static inline void icmp_xmit_unlock(struct sock *sk)
 {
spin_unlock_bh(&sk->sk_lock.slock);
+   local_unlock(icmp_sk_lock);
 }
 
 int sysctl_icmp_msgs_per_sec __read_mostly = 1000;
@@ -359,6 +365,7 @@ static void icmp_push_reply(struct icmp_bxm *icmp_param,
struct sock *sk;
struct sk_buff *skb;
 
+   local_lock(icmp_sk_lock);
sk = icmp_sk(dev_net((*rt)->dst.dev));
if (ip_append_data(sk, fl4, icmp_glue_bits, icmp_param,
   icmp_param->data_len+icmp_param->head_len,
@@ -381,6 +388,7 @@ static void icmp_push_reply(struct icmp_bxm *icmp_param,
skb->ip_summed = CHECKSUM_NONE;
ip_push_pending_frames(sk, fl4);
}
+   local_unlock(icmp_sk_lock);
 }
 
 /*
-- 
2.9.3



[PATCH 08/16] net: mvneta: Convert to hotplug state machine

2016-08-18 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Cc: Thomas Petazzoni <thomas.petazz...@free-electrons.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 drivers/net/ethernet/marvell/mvneta.c | 244 +-
 include/linux/cpuhotplug.h|   1 +
 2 files changed, 149 insertions(+), 96 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 8e4252dd9a9d..b20a2459c109 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -382,7 +382,8 @@ struct mvneta_port {
struct mvneta_rx_queue *rxqs;
struct mvneta_tx_queue *txqs;
struct net_device *dev;
-   struct notifier_block cpu_notifier;
+   struct hlist_node node_online;
+   struct hlist_node node_dead;
int rxq_def;
/* Protect the access to the percpu interrupt registers,
 * ensuring that the configuration remains coherent.
@@ -573,6 +574,7 @@ struct mvneta_rx_queue {
int next_desc_to_proc;
 };
 
+static enum cpuhp_state online_hpstate;
 /* The hardware supports eight (8) rx queues, but we are only allowing
  * the first one to be used. Therefore, let's just allocate one queue.
  */
@@ -3313,101 +3315,101 @@ static void mvneta_percpu_elect(struct mvneta_port 
*pp)
}
 };
 
-static int mvneta_percpu_notifier(struct notifier_block *nfb,
- unsigned long action, void *hcpu)
+static int mvneta_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
-   struct mvneta_port *pp = container_of(nfb, struct mvneta_port,
- cpu_notifier);
-   int cpu = (unsigned long)hcpu, other_cpu;
+   int other_cpu;
+   struct mvneta_port *pp = hlist_entry_safe(node, struct mvneta_port,
+ node_online);
struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
 
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   case CPU_DOWN_FAILED:
-   case CPU_DOWN_FAILED_FROZEN:
-   spin_lock(&pp->lock);
-   /* Configuring the driver for a new CPU while the
-* driver is stopping is racy, so just avoid it.
-*/
-   if (pp->is_stopped) {
-   spin_unlock(&pp->lock);
-   break;
+
+   spin_lock(&pp->lock);
+   /* Configuring the driver for a new CPU while the
+* driver is stopping is racy, so just avoid it.
+*/
+   if (pp->is_stopped) {
+   spin_unlock(&pp->lock);
+   return 0;
+   }
+   netif_tx_stop_all_queues(pp->dev);
+
+   /* We have to synchronise on tha napi of each CPU
+* except the one just being waked up
+*/
+   for_each_online_cpu(other_cpu) {
+   if (other_cpu != cpu) {
+   struct mvneta_pcpu_port *other_port =
+   per_cpu_ptr(pp->ports, other_cpu);
+
+   napi_synchronize(&other_port->napi);
}
-   netif_tx_stop_all_queues(pp->dev);
-
-   /* We have to synchronise on tha napi of each CPU
-* except the one just being waked up
-*/
-   for_each_online_cpu(other_cpu) {
-   if (other_cpu != cpu) {
-   struct mvneta_pcpu_port *other_port =
-   per_cpu_ptr(pp->ports, other_cpu);
-
-   napi_synchronize(&other_port->napi);
-   }
-   }
-
-   /* Mask all ethernet port interrupts */
-   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
-   napi_enable(&port->napi);
-
-
-   /* Enable per-CPU interrupts on the CPU that is
-* brought up.
-*/
-   mvneta_percpu_enable(pp);
-
-   /* Enable per-CPU interrupt on the one CPU we care
-* about.
-*/
-   mvneta_percpu_elect(pp);
-
-   /* Unmask all ethernet port interrupts */
-   on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK,
-   MVNETA_CAUSE_PHY_STATUS_CHANGE |
-   MVNETA_CAUSE_LINK_CHANGE |
-   MVNETA_CAUSE_PSC_SYNC_CHANGE);
-   netif_tx_start_all_queues(pp->dev);
-   spin_unlock(>lock);
-   break;
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   netif_tx_stop_all_queues(pp->dev);
-   /* Thanks to this lock we are sure that any pending
-* cpu elect

[PATCH 6/6] net: virtio-net: Convert to hotplug state machine

2016-08-12 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine.
The driver supports multiple instances and therefore the new
cpuhp_state_add_instance_nocalls() infrastructure is used. The driver
currently uses get_online_cpus() to avoid missing a CPU hotplug event
while invoking virtnet_set_affinity(). This could be avoided by using
cpuhp_state_add_instance() variant which holds the hotplug lock and
invokes the callback during registration. This is more or less a 1:1
conversion of the current code.

Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: virtualizat...@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 drivers/net/virtio_net.c   | 110 +++--
 include/linux/cpuhotplug.h |   1 +
 2 files changed, 87 insertions(+), 24 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1b5f531eeb25..e9be88cd76c1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -138,8 +138,9 @@ struct virtnet_info {
/* Does the affinity hint is set for virtqueues? */
bool affinity_hint_set;
 
-   /* CPU hot plug notifier */
-   struct notifier_block nb;
+   /* CPU hotplug instances for online & dead */
+   struct hlist_node node;
+   struct hlist_node node_dead;
 
/* Control VQ buffers: protected by the rtnl lock */
struct virtio_net_ctrl_hdr ctrl_hdr;
@@ -1237,25 +1238,53 @@ static void virtnet_set_affinity(struct virtnet_info 
*vi)
vi->affinity_hint_set = true;
 }
 
-static int virtnet_cpu_callback(struct notifier_block *nfb,
-   unsigned long action, void *hcpu)
+static int virtnet_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
-   struct virtnet_info *vi = container_of(nfb, struct virtnet_info, nb);
+   struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
+  node);
+   virtnet_set_affinity(vi);
+   return 0;
+}
 
-   switch(action & ~CPU_TASKS_FROZEN) {
-   case CPU_ONLINE:
-   case CPU_DOWN_FAILED:
-   case CPU_DEAD:
-   virtnet_set_affinity(vi);
-   break;
-   case CPU_DOWN_PREPARE:
-   virtnet_clean_affinity(vi, (long)hcpu);
-   break;
-   default:
-   break;
-   }
+static int virtnet_cpu_dead(unsigned int cpu, struct hlist_node *node)
+{
+   struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
+  node_dead);
+   virtnet_set_affinity(vi);
+   return 0;
+}
 
-   return NOTIFY_OK;
+static int virtnet_cpu_down_prep(unsigned int cpu, struct hlist_node *node)
+{
+   struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
+  node);
+
+   virtnet_clean_affinity(vi, cpu);
+   return 0;
+}
+
+static enum cpuhp_state virtionet_online;
+
+static int virtnet_cpu_notif_add(struct virtnet_info *vi)
+{
+   int ret;
+
+   ret = cpuhp_state_add_instance_nocalls(virtionet_online, &vi->node);
+   if (ret)
+   return ret;
+   ret = cpuhp_state_add_instance_nocalls(CPUHP_VIRT_NET_DEAD,
+  &vi->node_dead);
+   if (!ret)
+   return ret;
+   cpuhp_state_remove_instance_nocalls(virtionet_online, &vi->node);
+   return ret;
+}
+
+static void virtnet_cpu_notif_remove(struct virtnet_info *vi)
+{
+   cpuhp_state_remove_instance_nocalls(virtionet_online, &vi->node);
+   cpuhp_state_remove_instance_nocalls(CPUHP_VIRT_NET_DEAD,
+   &vi->node_dead);
 }
 
 static void virtnet_get_ringparam(struct net_device *dev,
@@ -1879,8 +1908,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 
virtio_device_ready(vdev);
 
-   vi->nb.notifier_call = &virtnet_cpu_callback;
-   err = register_hotcpu_notifier(&vi->nb);
+   err = virtnet_cpu_notif_add(vi);
if (err) {
pr_debug("virtio_net: registering cpu notifier failed\n");
goto free_unregister_netdev;
@@ -1934,7 +1962,7 @@ static void virtnet_remove(struct virtio_device *vdev)
 {
struct virtnet_info *vi = vdev->priv;
 
-   unregister_hotcpu_notifier(&vi->nb);
+   virtnet_cpu_notif_remove(vi);
 
/* Make sure no work handler is accessing the device. */
flush_work(&vi->config_work);
@@ -1953,7 +1981,7 @@ static int virtnet_freeze(struct virtio_device *vdev)
struct virtnet_info *vi = vdev->priv;
int i;
 
-   unregister_hotcpu_notifier(&vi->nb);
+   virtnet_cpu_notif_remove(vi);
 
/* Make sure no work handler is accessing the device */
flush_work(&vi->config_work);
@@ -1997,7 +2025,7 @@ static int virtnet_restore(struct virtio_device *vdev)

Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD

2016-08-12 Thread Sebastian Andrzej Siewior
On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
> where this IRQs are forced threaded:
>  rx-irq
>   |- schedule threaded rx-irq handler
> ...
>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>  |- napi_schedule()
>   |- __raise_softirq_irqoff()
>  |- wakeup_proper_softirq()
> ...
>   napi

This should not be the default path. The default should be napi running
in the context of the threaded rx-irq handler once the handler is done.
The wakeup_proper_softirq() part is only done if napi thinks that the
callback function runs for too long. So in *that* case you continue
NAPI in the softirq-thread which runs at SCHED_OTHER.
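
Roughly (simplified from kernel/irq/manage.c, not a verbatim quote) the
forced-threaded handler does:

static irqreturn_t irq_forced_thread_fn(struct irq_desc *desc,
					struct irqaction *action)
{
	irqreturn_t ret;

	local_bh_disable();
	ret = action->thread_fn(action->irq, action->dev_id);
	/* cpsw_rx_interrupt() -> napi_schedule() happened in thread_fn() */
	irq_finalize_oneshot(desc, action);
	local_bh_enable();	/* the pending NET_RX runs here, in this thread */
	return ret;
}

so the NET_RX raised by napi_schedule() is normally processed right in
the irq thread, at the irq thread's RT priority.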

> after:
>  rx-irq
>   |- cpsw_rx_interrupt()
>  |- napi_schedule()
>   |- irq_exit()
>  |- invoke_softirq()
>  |- wakeup_softirqd()
> ...
>   napi

Since you schedule the softirq from an IRQ-off region / without a
process context you force the softirq to run in the thread at
SCHED_OTHER priority.

> And, as result, get benefits from the following improvements (tested
> on am57xx-evm):
> 
> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>seen any more. Now these warnings can be seen once iperf is started.
># iperf -c $IPERFHOST -w 128K  -d -t 60

Do you also see "sched: RT throttling activated"? Because I don't see
otherwise why this should pop up.

> 2) latency reduction when cyclictest is run in parallel with network load
>  where net_perf.sh is:
>iperf -c $IPERFHOST -w 8K-d -t 60
>iperf -c $IPERFHOST -w 16K   -d -t 60
>iperf -c $IPERFHOST -w 32K   -d -t 60
>iperf -c $IPERFHOST -w 64K   -d -t 60
>iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> before:
> T: 0 ( 1326) P:98 I:1000 C: 24 Min:  8 Act:   13 Avg:   18 Max:  
> 70
> T: 1 ( 1327) P:98 I:1500 C: 159981 Min:  9 Act:   15 Avg:   16 Max:  
> 43
> after:
> T: 0 ( 1331) P:98 I:1000 C: 24 Min:  8 Act:   15 Avg:   14 Max:  
> 51
> T: 1 ( 1332) P:98 I:1500 C: 159953 Min:  8 Act:   16 Avg:   15 Max:  
> 33

-d 0 to have I: set to the same value.
What does -i 250 say?
And without network load we are where we were at "after" values?

What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
in net/core/dev.c and chrt the priority of you network interrupt
handlers to SCHED_OTHER priority?

> 3) network performance increase
> 
> win, KMbits/s
>   before  after   %
> 8K354 350.3   0.0
> 16K   412 551 33.7
> 32K   423 659.5   55.9
> 64K   436 728.3   67.0
> 128K  537 845 57.4

How close are the after numbers to !RT?

Sebastian


Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-10 Thread Sebastian Andrzej Siewior
On 06/10/2016 06:11 PM, Steven Rostedt wrote:
>> It is true that in RT we don't have such a limit like in !RT. You would
>> need to use __raise_softirq_irqoff_ksoft() instead the normal or +
>> wakeup() since you may have timers pending and those need to go to the
>> "other" ksoftirqd.
>> But then I don't see much change. ksoftirqd runs now at SCHED_OTHER so
>> it will end up on the CPU right away unless there other tasks that need
>> the CPU. So the scheduler will balance it the same way. The only change
>> will be that softirqs which are processed in context of any application
>> for more than two jiffies will be moved to ksoftirqd. This could be a
>> win.
> 
> We actually triggered a starvation due to this. I was just seeing if
> Alison hit the same issue we did in our tests.

Okay. Didn't get this information from him. But this is only because
the softirqs are not running in ksoftirqd, right?

> -- Steve
> 
Sebastian


Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-10 Thread Sebastian Andrzej Siewior
* Steven Rostedt | 2016-06-04 07:11:31 [-0400]:

>From: Steven Rostedt 
>Date: Tue, 5 Jan 2016 14:53:09 -0500
>Subject: [PATCH] softirq: Perform softirqs in local_bh_enable() for a limited
> amount of time
>
>To prevent starvation of tasks like ksoftirqd, if the task that is
>processing its softirqs takes more than 2 jiffies to do so, and the
>softirqs are constantly being re-added, then defer the processing to
>ksoftirqd.

I'm not sure about this. Alison didn't scream "yes it solves my problem"
and I am not sure what it is.
It is true that in RT we don't have such a limit as in !RT. You would
need to use __raise_softirq_irqoff_ksoft() instead of the normal one (or
add a wakeup()) since you may have timers pending and those need to go to
the "other" ksoftirqd.
But then I don't see much change. ksoftirqd now runs at SCHED_OTHER so
it will end up on the CPU right away unless there are other tasks that need
the CPU. So the scheduler will balance it the same way. The only change
will be that softirqs which are processed in context of any application
for more than two jiffies will be moved to ksoftirqd. This could be a
win.
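
For reference, the !RT limit I mean is roughly this (mainline
__do_softirq(), heavily abbreviated from memory):

	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;	/* 2ms */
	int max_restart = MAX_SOFTIRQ_RESTART;

restart:
	/* ... handle all pending softirq vectors ... */

	pending = local_softirq_pending();
	if (pending) {
		if (time_before(jiffies, end) && !need_resched() &&
		    --max_restart)
			goto restart;

		/* been at it long enough: defer the rest to ksoftirqd */
		wakeup_softirqd();
	}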

>Signed-off-by: Steven Rostedt 
>Signed-off-by: Luis Claudio R. Goncalves 

Sebastian


Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-10 Thread Sebastian Andrzej Siewior
* Steven Rostedt | 2016-05-26 19:56:41 [-0400]:

>For example:
>
>   
>
>   napi_schedule_prep()
>test_and_set_bit(NAPI_STATE_SCHED, >state)
>
>   
>
>   sk_busy_loop()
>
>  do {
>   rc = busy_poll()
>   ret = napi_schedule_prep()
>return !test_and_set_bit(NAPI_STATE_SCHED, >state)
> 
>   if (!ret) return 0
>   
>  } while (...) /* for ever */
>

No, I don't see the busy loop. The while() is here:
|while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
|  !need_resched() && !busy_loop_timeout(end_time));

and this seems to be the case since v3.11 where it was introduced (but
now it moved to dev.c). So even if there is no busy_poll() and
napi_schedule_prep() returns 0 our cycles here are limited by
busy_loop_timeout().

Sebastian


Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-09 Thread Sebastian Andrzej Siewior
* Alison Chaiken | 2016-06-07 17:19:43 [-0700]:

>Sorry to be obscure; I had applied that patch to v4.1.6-rt5.

Using the latest is often not a bad choice compared to the random tree
you have here.

>> What I remember from testing the two patches on am335x was that before a
>> ping flood on gbit froze the serial console but with them it the ping
>> flood was not noticed.
>
>I compiled a kernel from upstream d060a36 "Merge branch
>'ti-linux-4.1.y' of git.ti.com:ti-linux-kernel/ti-linux-kernel into
>ti-rt-linux-4.1.y" which is unpatched except for using a
>board-appropriate device-tree.The serial console is responsive
>with all our RT userspace applications running alongside a rapid
>external ping.   However, our main event loop misses frequently as
>soon as ping faster than 'ping -i 0.0002' is run.mpstat shows that
>the sum of the hard IRQ rates in a second is equal precisely to the
>NET_RX rate, which is ~3400/s.   Does the fact that 3400 < (1/0.0002)
>already mean that some packets are dropped?   ftrace shows that

Not necessarily. The ping command reports how many packets were
received. It is possible that the sender was not able to send that many
packets _or_ the receiver was able to process more packets during a
single interrupt.

>cpsw_rx_poll() is called even when there is essentially no network
>traffic, so I'm not sure how to tell if NAPI is working as intended.

You should see an invocation of __raise_softirq_irqoff_ksoft() and then
cpsw's poll function should run in "ksoftirqd/" context instead of in the
context of the task it runs in now.

>I tried running the wakeup_rt tracer, but it loads the system too
>much. With ftrace capturing IRQ, scheduler and net events, we're
>writing out markers into the trace buffer when the event loop makes
>its deadline and then when it misses so that we can compare the normal
>and long-latency intervals, but there doesn't appear to be a smoking
>gun in the difference between the two.

You would need to figure out what adds the latency. My understanding is
that your RT application is doing CAN traffic and is not meeting the
deadline. So you drop CAN packets in the end?

>Thanks for all your help,
>Alison

Sebastian


Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-07 Thread Sebastian Andrzej Siewior
* Alison Chaiken | 2016-06-05 08:16:58 [-0700]:

>I did try that patch, but it hasn't made much difference.   Let me
>back up and restate the problem I'm trying to solve, which is that a
>DRA7X OMAP5 SOC system running a patched 4.1.18-ti-rt kernel has a
>main event loop in user space that misses latency deadlines under the
>test condition where I ping-flood it from another box.   While in
>production, the system would not be expected to support high rates of
>network traffic, but the instability with the ping-flood makes me
>wonder if there aren't underlying configuration problems.
>
>We've applied Sebastian's commit "softirq: split timer softirqs out of
>ksoftirqd," which improved event loop stability substantially when we

Why did you apply that one? You have 4.1.18-ti-rt so I don't know how
that works but v4.1.15-rt18 had this patch included. Also "net: provide
a way to delegate processing a softirq to ksoftirqd" should be applied
(which is also part of v4.1.15-rt18).

>left ksoftirqd running at userspace default but elevated ktimersoftd.
> That made me think that focusing on the softirqs was pertinent.

Before that explicit "delegation" to ksoftirq within NAPI it was likely
that the NAPI callback was never interrupted and continued on the "next"
softirq.

>priority) starts having problems, I see that the hard IRQ associated
>with the ethernet device uses about 35% of one core, which seems
>awfully high if the NAPI has triggered a switch to polling.  I vaguely

Try the patch above, it is likely your NAPI was never interrupted.

>recall David Miller saying in the "threadable napi poll loop"
>discussion that accounting was broken for net IRQs, so perhaps that
>number is misleading.   mpstat shows that the NET_RX softirqs run on
>the same core where we've pinned the ethernet IRQ, so you might hope
>that userspace might be able to run happily on the other one.
>
>What I see in ftrace while watching scheduler and IRQ events is that
>the userspace application is yielding to ethernet or CAN IRQs, which
>also raise NET_RX.In the following,  ping-flood is running, and
>irq/343 is the ethernet one:
>
> userspace_application-4767  [000] dn.h1..  4196.422318: irq_handler_entry: 
> irq=347 name=can1
> userspace_application-4767  [000] dn.h1..  4196.422319: irq_handler_exit: 
> irq=347 ret=handled
> userspace_application-4767  [000] dn.h2..  4196.422321: sched_waking: 
> comm=irq/347-can1 pid=2053 prio=28 target_cpu=000
> irq/343-4848400-874   [001] 112  4196.422323: softirq_entry: vec=3 
> [action=NET_RX]
> userspace_application-4767  [000] dn.h3..  4196.422325: sched_wakeup: 
> comm=irq/347-can1 pid=2053 prio=28 target_cpu=000
> irq/343-4848400-874   [001] 112  4196.422328: napi_poll: napi poll on 
> napi struct edd5f560 for device eth0
> irq/343-4848400-874   [001] 112  4196.422329: softirq_exit: vec=3 
> [action=NET_RX]
> userspace_application-4767  [000] dn..3..  4196.422332: sched_stat_runtime: 
> comm=userspace_application pid=4767 runtime=22448 [ns] vruntime=338486919642 
> [ns]
> userspace_application-4767  [000] d...3..  4196.422336: sched_switch: 
> prev_comm=userspace_application prev_pid=4767 prev_prio=120 prev_state=R ==> 
> next_comm=irq/347-can1 next_pid=2053 next_prio=28
> irq/343-4848400-874   [001] d...3..  4196.422339: sched_switch: 
> prev_comm=irq/343-4848400 prev_pid=874 prev_prio=47 prev_state=S ==> 
> next_comm=irq/344-4848400 next_pid=875 next_prio=47

What I remember from testing the two patches on am335x was that before,
a ping flood on gbit froze the serial console, but with them the ping
flood was not noticed.

>Thanks again for the patches,
>Alison Chaiken
>Peloton Technology

Sebastian


Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-02 Thread Sebastian Andrzej Siewior
* Steven Rostedt | 2016-05-26 19:56:41 [-0400]:

>[ Alison, can you try this patch ]

Alison, did you try it?

Sebastian


Re: Softirq priority inversion from "softirq: reduce latencies"

2016-03-07 Thread Sebastian Andrzej Siewior
On 02/27/2016 07:19 PM, Peter Hurley wrote:
> Hi Eric,

Hi Peter,

> Because both the uart driver (omap8250) and the dmaengine driver
> (edma) were (relatively) new, we assumed there was some race between
> starting a new rx DMA and processing the previous one.

Now, after digesting the whole thread: I complained about this a long
while ago. After you start RX-DMA the DMA engine is not programmed
immediately; the programming is deferred into a softirq/tasklet. This is
not the case for continuous DMA transfers - those are programmed right away.
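
For reference, a slave RX submission from a UART driver looks roughly
like this (illustrative sketch, the callback/param names are made up);
the point is that dmaengine_submit() + dma_async_issue_pending() only
queue the descriptor and a driver like edma may not touch the hardware
before its tasklet runs:

	struct dma_async_tx_descriptor *desc;

	desc = dmaengine_prep_slave_single(rxchan, dma_addr, len,
					   DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT);
	if (!desc)
		return -EBUSY;

	desc->callback = example_rx_dma_complete;	/* made-up completion handler */
	desc->callback_param = priv;

	dmaengine_submit(desc);
	dma_async_issue_pending(rxchan);	/* HW programming may still be deferred */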

I don't remember finding a reason why this simple programming has
to be deferred and can't happen immediately, as is the case for the
continuous DMA transfers. So I skipped that. RX-DMA in UART was working
well but for some reason omap's MMC-card driver refused to work.

Sebastian


Re: Softirq priority inversion from "softirq: reduce latencies"

2016-03-07 Thread Sebastian Andrzej Siewior
On 02/29/2016 05:58 AM, Mike Galbraith wrote:
> WRT -rt: if dma tasklets really do have hard (ish) constraints, -rt
> recently "broke" in the same way.. of all softirqs which are deferred
> to kthread context, due to a recent change, only timer/hrtimer are
> executed at realtime priority by default.

no. All softirqs are invoked in the context of the current process that
triggered the softirq invocation. If NAPI goes on for too long (or another
softirq can't be executed in this context) it will continue in
ksoftirqd. And this thread runs at a normal priority like it does in
mainline.
I aligned it with mainline.

>   -Mike
> 
Sebastian


[PATCH RT 1/2] net: provide a way to delegate processing a softirq to ksoftirqd

2016-01-20 Thread Sebastian Andrzej Siewior
If NET_RX uses up all of its budget it moves the following NAPI
invocations into `ksoftirqd`. On -RT it does not do so. Instead it
raises the NET_RX softirq in its current context again.

In order to get closer to mainline's behaviour this patch provides
__raise_softirq_irqoff_ksoft() which raises the softirq in ksoftirqd.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 include/linux/interrupt.h |  8 
 kernel/softirq.c  | 13 +
 net/core/dev.c|  2 +-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 79a9622b5a38..655cee096aed 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -465,6 +465,14 @@ extern void thread_do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
 extern void __raise_softirq_irqoff(unsigned int nr);
+#ifdef CONFIG_PREEMPT_RT_FULL
+extern void __raise_softirq_irqoff_ksoft(unsigned int nr);
+#else
+static inline void __raise_softirq_irqoff_ksoft(unsigned int nr)
+{
+   __raise_softirq_irqoff(nr);
+}
+#endif
 
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index f4c2e679a7d7..e83f38cf 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -641,6 +641,19 @@ void __raise_softirq_irqoff(unsigned int nr)
 }
 
 /*
+ * Same as __raise_softirq_irqoff() but will process them in ksoftirqd
+ */
+void __raise_softirq_irqoff_ksoft(unsigned int nr)
+{
+   if (WARN_ON_ONCE(!__this_cpu_read(ksoftirqd)))
+   return;
+   trace_softirq_raise(nr);
+   or_softirq_pending(1UL << nr);
+   __this_cpu_read(ksoftirqd)->softirqs_raised |= (1U << nr);
+   wakeup_softirqd();
+}
+
+/*
  * This function must run with irqs disabled!
  */
 void raise_softirq_irqoff(unsigned int nr)
diff --git a/net/core/dev.c b/net/core/dev.c
index f4475ccbc19b..13a55d0df151 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4955,7 +4955,7 @@ static void net_rx_action(struct softirq_action *h)
list_splice_tail(&repoll, &list);
list_splice(&list, &sd->poll_list);
if (!list_empty(&sd->poll_list))
-   __raise_softirq_irqoff(NET_RX_SOFTIRQ);
+   __raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);
 
net_rps_action_and_irq_enable(sd);
 }
-- 
2.7.0.rc3



[PATCH] net: rds: don't pretend to use cpu notifiers

2015-11-27 Thread Sebastian Andrzej Siewior
It looks like an attempt to use a CPU notifier here which was never
completed. Nobody tried to wire it up completely since 2k9. So I unwind
this code and get rid of everything not required. Oh look! 19 lines were
removed while the code still does the same thing.

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 net/rds/page.c | 31 ++-
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/net/rds/page.c b/net/rds/page.c
index 9005a2c920ee..5a14e6d6a926 100644
--- a/net/rds/page.c
+++ b/net/rds/page.c
@@ -179,37 +179,18 @@ int rds_page_remainder_alloc(struct scatterlist *scat, 
unsigned long bytes,
 }
 EXPORT_SYMBOL_GPL(rds_page_remainder_alloc);
 
-static int rds_page_remainder_cpu_notify(struct notifier_block *self,
-unsigned long action, void *hcpu)
+void rds_page_exit(void)
 {
-   struct rds_page_remainder *rem;
-   long cpu = (long)hcpu;
+   unsigned int cpu;
 
-   rem = &per_cpu(rds_page_remainders, cpu);
+   for_each_possible_cpu(cpu) {
+   struct rds_page_remainder *rem;
 
-   rdsdebug("cpu %ld action 0x%lx\n", cpu, action);
+   rem = &per_cpu(rds_page_remainders, cpu);
+   rdsdebug("cpu %u\n", cpu);
 
-   switch (action) {
-   case CPU_DEAD:
if (rem->r_page)
__free_page(rem->r_page);
rem->r_page = NULL;
-   break;
}
-
-   return 0;
-}
-
-static struct notifier_block rds_page_remainder_nb = {
-   .notifier_call = rds_page_remainder_cpu_notify,
-};
-
-void rds_page_exit(void)
-{
-   int i;
-
-   for_each_possible_cpu(i)
-   rds_page_remainder_cpu_notify(&rds_page_remainder_nb,
- (unsigned long)CPU_DEAD,
- (void *)(long)i);
 }
-- 
2.6.2
