date:20190528

[PATCH net-next] net: avoid indirect calls in L4 checksum calculation

2019-05-28 Thread Matteo Croce

Commit 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up
indirect calls of builtin") introduces some macros to avoid doing
indirect calls.

Use these helpers to remove two indirect calls in the L4 checksum
calculation for devices which don't have hardware support for it.

As a test, I generated packets with pktgen out to a dummy interface
with HW checksumming disabled, so that the checksum is calculated in
software for every sent packet.
The packet rate measured with an i7-6700K CPU and a single pktgen
thread rose from 6143 to 6608 Kpps, an increase of 7.5%.
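The idea behind the helpers from commit 283c16a2dfd3 is to compare the function pointer against one or two expected builtins and, on a match, emit a direct call instead of an indirect one. Below is a minimal userspace sketch of that idea, not the kernel's own macros: the SKETCH_ macro names and the two update functions are hypothetical stand-ins for csum_partial_ext() and sctp_csum_update(), and the kernel versions additionally wrap the comparison in likely().

```c
/*
 * Sketch of the INDIRECT_CALL_1()/INDIRECT_CALL_2() idea: if the pointer
 * matches a known function, call that function directly; otherwise fall
 * back to the indirect call through the pointer.
 */
#define SKETCH_INDIRECT_CALL_1(f, f1, ...) \
	((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))
/* Check f2 first, then f1, then fall back to the indirect call. */
#define SKETCH_INDIRECT_CALL_2(f, f2, f1, ...) \
	((f) == (f2) ? f2(__VA_ARGS__) : \
	 SKETCH_INDIRECT_CALL_1(f, f1, __VA_ARGS__))

typedef unsigned int (*update_fn)(const unsigned char *buf, int len,
				  unsigned int sum);

/* Hypothetical stand-in for csum_partial_ext(): additive sum. */
static unsigned int add_update(const unsigned char *buf, int len,
			       unsigned int sum)
{
	for (int i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/* Hypothetical stand-in for sctp_csum_update(): XOR of all bytes. */
static unsigned int xor_update(const unsigned char *buf, int len,
			       unsigned int sum)
{
	for (int i = 0; i < len; i++)
		sum ^= buf[i];
	return sum;
}

/* Same shape as CSUM_UPDATE(ops->update, ...) in the patch. */
static unsigned int run_update(update_fn f, const unsigned char *buf,
			       int len, unsigned int sum)
{
	return SKETCH_INDIRECT_CALL_2(f, add_update, xor_update, buf, len, sum);
}
```

Because the common case is a predictable compare-and-branch followed by a direct call, it avoids the retpoline cost of an indirect call while still behaving correctly for any other ops->update.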

Suggested-by: Davide Caratti 
Signed-off-by: Matteo Croce 
---
 net/core/skbuff.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e89be6282693..a24a7ef55ce9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -76,9 +77,22 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "datagram.h"
 
+#if IS_ENABLED(CONFIG_IP_SCTP)
+#define CSUM_UPDATE(f, ...) \
+   INDIRECT_CALL_2(f, csum_partial_ext, sctp_csum_update, __VA_ARGS__)
+#define CSUM_COMBINE(f, ...) \
+   INDIRECT_CALL_2(f, csum_block_add_ext, sctp_csum_combine, __VA_ARGS__)
+#else
+#define CSUM_UPDATE(f, ...) \
+   INDIRECT_CALL_1(f, csum_partial_ext, __VA_ARGS__)
+#define CSUM_COMBINE(f, ...) \
+   INDIRECT_CALL_1(f, csum_block_add_ext, __VA_ARGS__)
+#endif
+
 struct kmem_cache *skbuff_head_cache __ro_after_init;
 static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
 #ifdef CONFIG_SKB_EXTENSIONS
@@ -2507,7 +2521,7 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
if (copy > 0) {
if (copy > len)
copy = len;
-   csum = ops->update(skb->data + offset, copy, csum);
+   csum = CSUM_UPDATE(ops->update, skb->data + offset, copy, csum);
if ((len -= copy) == 0)
return csum;
offset += copy;
@@ -2534,9 +2548,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
  frag->page_offset + offset - start,
  copy, p, p_off, p_len, copied) {
vaddr = kmap_atomic(p);
-   csum2 = ops->update(vaddr + p_off, p_len, 0);
+   csum2 = CSUM_UPDATE(ops->update, vaddr + p_off, p_len, 0);
kunmap_atomic(vaddr);
-   csum = ops->combine(csum, csum2, pos, p_len);
+   csum = CSUM_COMBINE(ops->combine, csum, csum2, pos, p_len);
pos += p_len;
}
 
@@ -2559,7 +2573,7 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
copy = len;
csum2 = __skb_checksum(frag_iter, offset - start,
   copy, 0, ops);
-   csum = ops->combine(csum, csum2, pos, copy);
+   csum = CSUM_COMBINE(ops->combine, csum, csum2, pos, copy);
if ((len -= copy) == 0)
return csum;
offset += copy;
-- 
2.21.0

Re: [RFC][PATCH 0/7] Mount, FS, Block and Keyrings notifications

2019-05-28 Thread Greg KH

On Tue, May 28, 2019 at 05:01:47PM +0100, David Howells wrote:
> Things I want to avoid:
> 
>  (1) Introducing features that make the core VFS dependent on the network
>  stack or networking namespaces (ie. usage of netlink).
> 
>  (2) Dumping all this stuff into dmesg and having a daemon that sits there
>  parsing the output and distributing it as this then puts the
>  responsibility for security into userspace and makes handling
>  namespaces tricky.  Further, dmesg might not exist or might be
>  inaccessible inside a container.
> 
>  (3) Letting users see events they shouldn't be able to see.

How are you handling namespaces then?  Are they determined by the
namespace of the process that opened the original device handle, or the
namespace that made the new syscall for the events to "start flowing"?

Am I missing the logic that determines this in the patches, or is that
not implemented yet?

thanks,

greg k-h

[PATCH] ARM: xor-neon: Replace GNUC checks with CONFIG_CC_IS_GCC

2019-05-28 Thread Nathan Chancellor

Currently, when compiling this code with clang, the following warning is
emitted:

CC  arch/arm/lib/xor-neon.o
  arch/arm/lib/xor-neon.c:33:2: warning: This code requires at least
  version 4.6 of GCC [-W#warnings]

This is because clang poses as GCC 4.2.1 with its __GNUC__ conditionals
for glibc compatibility[1]:

$ echo | clang -dM -E -x c /dev/null | grep GNUC | awk '{print $2" "$3}'
__GNUC_MINOR__ 2
__GNUC_PATCHLEVEL__ 1
__GNUC_STDC_INLINE__ 1
__GNUC__ 4

As pointed out by Ard Biesheuvel and Arnd Bergmann in an earlier
thread[2], the oldest version of GCC that is currently supported is gcc
4.6 after commit cafa0010cd51 ("Raise the minimum required gcc version
to 4.6") so we do not need to check for anything older anymore.

However, just removing the version check is not enough to silence clang
because it does not recognize '#pragma GCC optimize':

  arch/arm/lib/xor-neon.c:25:13: warning: unknown pragma ignored
  [-Wunknown-pragmas]
  #pragma GCC optimize "tree-vectorize"

Looking into it further, -ftree-vectorize (which '#pragma GCC optimize
"tree-vectorize"' enables) is an alias in clang for -fvectorize[3],
which according to the documentation is on by default[4] (at least at
-O2 or -Os).

Just add the pragma when compiling with GCC so that clang does not
unnecessarily warn.
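A userspace sketch of the resulting guard pattern follows. CONFIG_CC_IS_GCC is a Kconfig symbol that only exists inside the kernel build, so here it is approximated (as an assumption) by "__GNUC__ defined and __clang__ not defined"; the extra __clang__ test is needed because clang also defines __GNUC__, as shown above. The xor_block() function is a hypothetical stand-in for the xor routines in xor-neon.c.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Only GCC sees the GCC-specific pragma; clang skips it and relies on
 * its own loop vectorizer, which is on by default at -O2.
 */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize "tree-vectorize"
#endif

/* XOR src into dst word by word: a loop a vectorizer can turn into SIMD. */
static void xor_block(uint64_t *dst, const uint64_t *src, size_t words)
{
	for (size_t i = 0; i < words; i++)
		dst[i] ^= src[i];
}
```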

[1]: https://reviews.llvm.org/D51011#1206981
[2]: 
https://lore.kernel.org/lkml/cak8p3a3njtcgfd2dq9kbhp8dpxf6s-ulfeu6acayc4sdi+2...@mail.gmail.com/
[3]: 
https://github.com/llvm/llvm-project/blob/eafe8ef6f2b44ba/clang/include/clang/Driver/Options.td#L1729
[4]: https://llvm.org/docs/Vectorizers.html#usage

Link: https://github.com/ClangBuiltLinux/linux/issues/496
Reported-by: Nick Desaulniers 
Signed-off-by: Nathan Chancellor 
---
 arch/arm/lib/xor-neon.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index c691b901092f..d532bc072ee4 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -22,15 +22,8 @@ MODULE_LICENSE("GPL");
  * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
  * NEON instructions.
  */
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
+#ifdef CONFIG_CC_IS_GCC
 #pragma GCC optimize "tree-vectorize"
-#else
-/*
- * While older versions of GCC do not generate incorrect code, they fail to
- * recognize the parallel nature of these functions, and emit plain ARM code,
- * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
- */
-#warning This code requires at least version 4.6 of GCC
 #endif
 
 #pragma GCC diagnostic ignored "-Wunused-variable"
-- 
2.22.0.rc1

[PATCH net-next 3/5] net: dsa: mv88e6xxx: Let taggers specify a can_timestamp function

2019-05-28 Thread Vladimir Oltean

The newly introduced function is called on both the RX and TX paths.

The boolean returned by port_txtstamp should only be false if the
driver tried to timestamp the skb but failed.

Currently there is some logic in the mv88e6xxx driver that determines
whether it should timestamp frames or not.

This is wasteful, because if the decision is to not timestamp them, then
DSA will have cloned an skb and freed it immediately afterwards.

Additionally, other drivers (such as sja1105) may have other hardware criteria
for timestamping frames on RX, and the default conditions for
timestamping a frame are too restrictive.  When RX timestamping is
enabled, the sja1105 hardware emits a follow-up frame containing the
timestamp for every trapped link-local frame.  Then the link-local frame
is queued up inside the port_rxtstamp callback where it waits for its
follow-up meta frame to come.  But only a subset of the link-local
frames will pass through DSA's default filter for port_rxtstamp, so the
rest of the link-local traffic would still receive a meta frame but
would not get timestamped.

Since the state machine of waiting for meta frames is implemented in the
tagger rcv function for sja1105, it is difficult to know which frames
will pass through DSA's later filter and which won't.  And since
timestamping more frames than just PTP does no harm, just implement a
callback for sja1105 that will say that all link-local traffic will be
timestamped on RX.
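A minimal sketch of such a "timestamp all link-local traffic on RX" policy, working on a bare destination MAC address. The names is_link_local() and sketch_can_timestamp() and the signature are assumptions for illustration; the real callback's signature is not visible in this (truncated) patch. IEEE 802.1D reserves 01:80:C2:00:00:00 through 01:80:C2:00:00:0F as link-local group addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for 01:80:C2:00:00:0X, the IEEE 802.1D link-local range. */
static bool is_link_local(const uint8_t *da)
{
	static const uint8_t base[5] = { 0x01, 0x80, 0xC2, 0x00, 0x00 };

	for (int i = 0; i < 5; i++)
		if (da[i] != base[i])
			return false;
	return (da[5] & 0xF0) == 0;
}

/*
 * Hypothetical can_timestamp hook: every trapped link-local frame gets a
 * meta follow-up frame from the switch, so accept all of them instead of
 * filtering down to PTP event messages.
 */
static bool sketch_can_timestamp(const uint8_t *dest_mac)
{
	return is_link_local(dest_mac);
}
```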

PTP classification on the skb is still performed, but its result is now
saved in DSA_SKB_CB, so drivers can reuse it without classifying again.

The mv88e6xxx driver was also modified to use the new generic
DSA_SKB_CB(skb)->ptp_type instead of its own, custom SKB_PTP_TYPE(skb).
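The DSA_SKB_CB(skb)->ptp_type approach follows the usual control-buffer pattern: a per-layer struct is overlaid on the skb's scratch area through a cast macro. The following is a userspace mock of that pattern only; mock_skb, mock_dsa_skb_cb and the helper names are hypothetical, and the mock block holds just ptp_type because the other members of DSA's control block are not shown here.

```c
/* Mock skb with a 48-byte, 8-byte-aligned control buffer. */
struct mock_skb {
	_Alignas(8) char cb[48];
};

/* Hypothetical per-skb DSA control block; only ptp_type is modelled. */
struct mock_dsa_skb_cb {
	unsigned int ptp_type;
};

#define MOCK_DSA_SKB_CB(skb) ((struct mock_dsa_skb_cb *)((skb)->cb))

/* The overlay must fit in the scratch area. */
_Static_assert(sizeof(struct mock_dsa_skb_cb) <=
	       sizeof(((struct mock_skb *)0)->cb), "cb overflow");

/* Generic code classifies once and stores the result... */
static void mock_classify(struct mock_skb *skb, unsigned int type)
{
	MOCK_DSA_SKB_CB(skb)->ptp_type = type;
}

/* ...and a driver helper reads it back instead of taking an argument. */
static unsigned int mock_driver_ptp_type(struct mock_skb *skb)
{
	return MOCK_DSA_SKB_CB(skb)->ptp_type;
}
```

This is why parse_ptp_header() and friends in the diff no longer need a "type" parameter.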

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/mv88e6xxx/hwtstamp.c | 25 +
 drivers/net/dsa/mv88e6xxx/hwtstamp.h |  4 ++--
 include/net/dsa.h|  6 --
 net/dsa/dsa.c| 25 +++--
 net/dsa/slave.c  | 20 ++--
 5 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/hwtstamp.c b/drivers/net/dsa/mv88e6xxx/hwtstamp.c
index a17c16a2ab78..3295ad10818f 100644
--- a/drivers/net/dsa/mv88e6xxx/hwtstamp.c
+++ b/drivers/net/dsa/mv88e6xxx/hwtstamp.c
@@ -20,8 +20,6 @@
 #include "ptp.h"
 #include 
 
-#define SKB_PTP_TYPE(__skb) (*(unsigned int *)((__skb)->cb))
-
 static int mv88e6xxx_port_ptp_read(struct mv88e6xxx_chip *chip, int port,
   int addr, u16 *data, int len)
 {
@@ -216,8 +214,9 @@ int mv88e6xxx_port_hwtstamp_get(struct dsa_switch *ds, int port,
 }
 
 /* Get the start of the PTP header in this skb */
-static u8 *parse_ptp_header(struct sk_buff *skb, unsigned int type)
+static u8 *parse_ptp_header(struct sk_buff *skb)
 {
+   unsigned int type = DSA_SKB_CB(skb)->ptp_type;
u8 *data = skb_mac_header(skb);
unsigned int offset = 0;
 
@@ -249,7 +248,7 @@ static u8 *parse_ptp_header(struct sk_buff *skb, unsigned int type)
  * or NULL if the caller should not.
  */
 static u8 *mv88e6xxx_should_tstamp(struct mv88e6xxx_chip *chip, int port,
-  struct sk_buff *skb, unsigned int type)
+  struct sk_buff *skb)
 {
struct mv88e6xxx_port_hwtstamp *ps = &chip->port_hwtstamp[port];
u8 *hdr;
@@ -257,7 +256,7 @@ static u8 *mv88e6xxx_should_tstamp(struct mv88e6xxx_chip *chip, int port,
if (!chip->info->ptp_support)
return NULL;
 
-   hdr = parse_ptp_header(skb, type);
+   hdr = parse_ptp_header(skb);
if (!hdr)
return NULL;
 
@@ -278,8 +277,7 @@ static int mv88e6xxx_ts_valid(u16 status)
 
 static int seq_match(struct sk_buff *skb, u16 ts_seqid)
 {
-   unsigned int type = SKB_PTP_TYPE(skb);
-   u8 *hdr = parse_ptp_header(skb, type);
+   u8 *hdr = parse_ptp_header(skb);
__be16 *seqid;
 
seqid = (__be16 *)(hdr + OFF_PTP_SEQUENCE_ID);
@@ -367,7 +365,7 @@ static int is_pdelay_resp(u8 *msgtype)
 }
 
 bool mv88e6xxx_port_rxtstamp(struct dsa_switch *ds, int port,
-struct sk_buff *skb, unsigned int type)
+struct sk_buff *skb)
 {
struct mv88e6xxx_port_hwtstamp *ps;
struct mv88e6xxx_chip *chip;
@@ -379,12 +377,10 @@ bool mv88e6xxx_port_rxtstamp(struct dsa_switch *ds, int port,
if (ps->tstamp_config.rx_filter != HWTSTAMP_FILTER_PTP_V2_EVENT)
return false;
 
-   hdr = mv88e6xxx_should_tstamp(chip, port, skb, type);
+   hdr = mv88e6xxx_should_tstamp(chip, port, skb);
if (!hdr)
return false;
 
-   SKB_PTP_TYPE(skb) = type;
-
if (is_pdelay_resp(hdr))
skb_queue_tail(&ps->rx_queue2, skb);
else
@@ -503,17 +499,14 @@ long mv88e6xxx_hwtstamp_work(struct ptp_clock_info *ptp)
 }
 
 bool mv88e6xxx_port_txtstamp(struct dsa_switch *ds,

[PATCH net-next 0/5] PTP support for the SJA1105 DSA driver

2019-05-28 Thread Vladimir Oltean

This patchset adds the following:

 - A timecounter/cyclecounter based PHC for the free-running
   timestamping clock of this switch.

 - A state machine implemented in the DSA tagger for SJA1105, which
   keeps track of metadata follow-up Ethernet frames (the switch's way
   of transmitting RX timestamps).

 - Some common-sense on whether or not frames should be timestamped was
   taken out of the mv88e6xxx driver (the only other DSA driver with PTP
   support) and moved to the generic framework.  An option was also
   added for drivers to override these common-sense decisions, and
   timestamp some more frames.  This was the path of least resistance
   after implementing the aforementioned state machine - metadata
   follow-up frames need to be tracked anyway even if only to discard
   them and not pass them up the network stack.  And since the switch
   can't just be told to timestamp only what the kernel wants (PTP
   frames), simply use all the timestamps it provides.

 - A generic helper in the timecounter/cyclecounter code for
   reconstructing partial PTP timestamps, such as those generated by the
   SJA1105.

Not all is rosy, though.

PTP timestamping will only work when the ports are bridged. Otherwise,
the metadata follow-up frames holding RX timestamps won't be received
because they will be blocked by the master port's MAC filter. Linuxptp
tries to put the net device in ALLMULTI/PROMISC mode, but DSA doesn't
pass this on to the master port, which does the actual reception.
The master port is put in promiscuous mode when the slave ports are
enslaved to a bridge.

Also, even with software-corrected timestamps, one can observe a
negative path delay reported by linuxptp:

ptp4l[55.600]: master offset  8 s2 freq  +83677 path delay -2390
ptp4l[56.600]: master offset 17 s2 freq  +83688 path delay -2391
ptp4l[57.601]: master offset  6 s2 freq  +83682 path delay -2391
ptp4l[58.601]: master offset -1 s2 freq  +83677 path delay -2391

Without investigating too deeply, this appears to be introduced by the
correction applied by linuxptp to t4 (t4c: corrected master rxtstamp)
during the path delay estimation process (removing the correction makes
the path delay positive).  This does not appear to have an obvious
negative effect upon the synchronization.
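For context on how a timestamp correction can produce a negative value: in the IEEE 1588 delay request-response mechanism the mean path delay is commonly computed as ((t2 - t1) + (t4 - t3)) / 2, so a correction that pulls t4 back too far drives the result below zero. A sketch with made-up timestamps, assuming linuxptp follows this textbook formula; its correction-field handling is not modelled.

```c
#include <stdint.h>

/*
 * t1 = master TX of Sync,     t2 = slave RX of Sync,
 * t3 = slave TX of Delay_Req, t4 = master RX of Delay_Req.
 * The clock offset cancels out; only asymmetry and timestamp errors remain.
 */
static int64_t mean_path_delay(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
	return ((t2 - t1) + (t4 - t3)) / 2;
}
```

With t1 = 0, t2 = 1000, t3 = 5000, t4 = 6000 the delay is 1000; over-correcting t4 by 6000 to 0 yields -2000.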

Lastly, clock manipulations on the actual hardware PTP clock will have
to be implemented anyway, for the TTEthernet block and the time-based
ingress policer.

Vladimir Oltean (5):
  timecounter: Add helper for reconstructing partial timestamps
  net: dsa: sja1105: Add support for the PTP clock
  net: dsa: mv88e6xxx: Let taggers specify a can_timestamp function
  net: dsa: sja1105: Add support for PTP timestamping
  net: dsa: sja1105: Increase priority of CPU-trapped frames

 drivers/net/dsa/mv88e6xxx/hwtstamp.c  |  25 +-
 drivers/net/dsa/mv88e6xxx/hwtstamp.h  |   4 +-
 drivers/net/dsa/sja1105/Kconfig   |   7 +
 drivers/net/dsa/sja1105/Makefile  |   1 +
 drivers/net/dsa/sja1105/sja1105.h |  30 ++
 .../net/dsa/sja1105/sja1105_dynamic_config.c  |   2 +
 drivers/net/dsa/sja1105/sja1105_main.c| 272 -
 drivers/net/dsa/sja1105/sja1105_ptp.c | 357 ++
 drivers/net/dsa/sja1105/sja1105_ptp.h |  48 +++
 drivers/net/dsa/sja1105/sja1105_spi.c |  28 ++
 .../net/dsa/sja1105/sja1105_static_config.c   |  59 +++
 .../net/dsa/sja1105/sja1105_static_config.h   |  10 +
 include/linux/dsa/sja1105.h   |  15 +
 include/linux/timecounter.h   |   7 +
 include/net/dsa.h |   6 +-
 kernel/time/timecounter.c |  33 ++
 net/dsa/dsa.c |  25 +-
 net/dsa/slave.c   |  20 +-
 net/dsa/tag_sja1105.c | 135 ++-
 19 files changed, 1043 insertions(+), 41 deletions(-)
 create mode 100644 drivers/net/dsa/sja1105/sja1105_ptp.c
 create mode 100644 drivers/net/dsa/sja1105/sja1105_ptp.h

-- 
2.17.1

[PATCH net-next 5/5] net: dsa: sja1105: Increase priority of CPU-trapped frames

2019-05-28 Thread Vladimir Oltean

Without noticing any particular issue, this patch ensures that
management traffic is treated with the maximum priority on RX by the
switch.  This is generally desirable, as the driver keeps a state
machine that waits for metadata follow-up frames as soon as a management
frame is received.  Increasing the priority helps expedite the reception
(and further reconstruction) of the RX timestamp to the driver after the
MAC has generated it.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index ce516615536d..3bd250e4e070 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -380,7 +380,7 @@ static int sja1105_init_general_params(struct sja1105_private *priv)
.mirr_ptacu = 0,
.switchid = priv->ds->index,
/* Priority queue for link-local frames trapped to CPU */
-   .hostprio = 0,
+   .hostprio = 7,
.mac_fltres1 = SJA1105_LINKLOCAL_FILTER_A,
.mac_flt1= SJA1105_LINKLOCAL_FILTER_A_MASK,
.incl_srcpt1 = true,
-- 
2.17.1

[PATCH net-next 1/5] timecounter: Add helper for reconstructing partial timestamps

2019-05-28 Thread Vladimir Oltean

Some PTP hardware offers a 64-bit free-running counter whose snapshots
are used for timestamping, but only makes part of that snapshot
available as timestamps (low-order bits).

In that case, timecounter/cyclecounter users must bring the cyclecounter
and timestamps to the same bit width, and they currently have two
options for doing so:

- Trim the higher bits of the timecounter itself to the number of bits
  of the timestamps.  This might work for some setups, but if the
  timecounter then wraps around often (~10 times per second), this puts
  additional strain on the system, which must read the clock that often
  just to avoid missing the wraparounds.

- Reconstruct the timestamp by racing to read the PTP time within one
  wraparound cycle since the timestamp was generated.  This is
  preferable when the wraparound time is small (do a time-critical
  readout once vs doing it periodically), and it has no drawback even
  when the wraparound is comfortably sized.
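The reconstruction in the second option can be illustrated with a userspace mirror of the logic in cyclecounter_reconstruct() below. The function name is hypothetical, and the counter reading is passed in as `now` instead of calling cc->read(), so the sketch can be tested in isolation. Like the patch, it uses <=, so a timestamp whose low-order bits equal the counter's is treated as exactly one wraparound old.

```c
#include <stdint.h>

/*
 * `mask` plays the role of partial_tstamp_mask, e.g. (1ULL << 24) - 1
 * for hardware that only reports 24 low-order bits.
 */
static uint64_t reconstruct_partial(uint64_t now, uint64_t ts_partial,
				    uint64_t mask)
{
	/* High-order bits from the current counter, low bits from hardware. */
	uint64_t ts = (now & ~mask) | ts_partial;

	/*
	 * If the counter's low bits are not greater than the partial
	 * timestamp, the counter wrapped since the timestamp was taken:
	 * step back one wraparound period.
	 */
	if ((now & mask) <= ts_partial)
		ts -= mask + 1;

	return ts;
}
```

With an 8-bit mask and now = 0x1234, a partial timestamp of 0x30 becomes 0x1230, while 0xF0 becomes 0x11F0 (taken before the last wrap).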

Signed-off-by: Vladimir Oltean 
---
 include/linux/timecounter.h |  7 +++
 kernel/time/timecounter.c   | 33 +
 2 files changed, 40 insertions(+)

diff --git a/include/linux/timecounter.h b/include/linux/timecounter.h
index 2496ad4cfc99..03eab1f3bb9c 100644
--- a/include/linux/timecounter.h
+++ b/include/linux/timecounter.h
@@ -30,6 +30,9 @@
  * by the implementor and user of specific instances of this API.
  *
  * @read:  returns the current cycle value
+ * @partial_tstamp_mask:bitmask in case the hardware emits timestamps
+ * which only capture low-order bits of the full
+ * counter, and should be reconstructed.
  * @mask:  bitmask for two's complement
  * subtraction of non 64 bit counters,
  * see CYCLECOUNTER_MASK() helper macro
@@ -38,6 +41,7 @@
  */
 struct cyclecounter {
u64 (*read)(const struct cyclecounter *cc);
+   u64 partial_tstamp_mask;
u64 mask;
u32 mult;
u32 shift;
@@ -136,4 +140,7 @@ extern u64 timecounter_read(struct timecounter *tc);
 extern u64 timecounter_cyc2time(struct timecounter *tc,
u64 cycle_tstamp);
 
+extern u64 cyclecounter_reconstruct(const struct cyclecounter *cc,
+   u64 ts_partial);
+
 #endif
diff --git a/kernel/time/timecounter.c b/kernel/time/timecounter.c
index 85b98e727306..d4657d64e38d 100644
--- a/kernel/time/timecounter.c
+++ b/kernel/time/timecounter.c
@@ -97,3 +97,36 @@ u64 timecounter_cyc2time(struct timecounter *tc,
return nsec;
 }
 EXPORT_SYMBOL_GPL(timecounter_cyc2time);
+
+/**
+ * cyclecounter_reconstruct - reconstructs @ts_partial
+ * @cc:Pointer to cycle counter.
+ * @ts_partial:Typically RX or TX NIC timestamp, provided by hardware as
+ * the lower @partial_tstamp_mask bits of the cycle counter,
+ * sampled at the time the timestamp was collected.
+ * To reconstruct into a full @mask bit-wide timestamp, the
+ * cycle counter is read and the high-order bits (up to @mask) are
+ * filled in.
+ * Must be called within one wraparound of @partial_tstamp_mask
+ * bits of the cycle counter.
+ */
+u64 cyclecounter_reconstruct(const struct cyclecounter *cc, u64 ts_partial)
+{
+   u64 ts_reconstructed;
+   u64 cycle_now;
+
+   cycle_now = cc->read(cc);
+
+   ts_reconstructed = (cycle_now & ~cc->partial_tstamp_mask) |
+   ts_partial;
+
+   /* Check lower bits of current cycle counter against the timestamp.
+* If the current cycle counter is lower than the partial timestamp,
+* then wraparound surely occurred and must be accounted for.
+*/
+   if ((cycle_now & cc->partial_tstamp_mask) <= ts_partial)
+   ts_reconstructed -= (cc->partial_tstamp_mask + 1);
+
+   return ts_reconstructed;
+}
+EXPORT_SYMBOL_GPL(cyclecounter_reconstruct);
-- 
2.17.1

Re: linux-next: Fixes tags need some work in the sound-asoc tree

2019-05-28 Thread Stephen Rothwell

Hi Pierre-Louis,

On Tue, 28 May 2019 17:22:40 -0500 Pierre-Louis Bossart 
 wrote:
>
> On 5/28/19 4:56 PM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > In commit
> > 
> >be1b577d0178 ("ASoC: SOF: Intel: hda: fix the hda init chip")
> > 
> > Fixes tag
> > 
> >Fixes: 8a300c8fb17 ("ASoC: SOF: Intel: Add HDA controller for Intel DSP")
> 
> Sorry about that, not sure how I managed to add an off-by-one in all 
> these tags. Checkpatch.pl --strict did not report any issues, something 
> must be broken either in my setup or the script.
> Not sure how I can fix this now?

It's not worth the rebase necessary to fix them.  Just use it as a
learning experience.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH net-next 0/2] net: stmmac: dwmac-meson: update with SPDX Licence identifier

2019-05-28 Thread David Miller

From: Neil Armstrong 
Date: Mon, 27 May 2019 15:46:21 +0200

> Update the SPDX Licence identifier for the Amlogic Meson6 and Meson8 dwmac
> glue drivers.

Series applied.

Re: [PATCH net-next] net: mvpp2: cls: Remove unnessesary check in mvpp2_ethtool_cls_rule_ins

2019-05-28 Thread David Miller

From: YueHaibing 
Date: Mon, 27 May 2019 21:46:46 +0800

> Fix smatch warning:
> 
> drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c:1236
>  mvpp2_ethtool_cls_rule_ins() warn: unsigned 'info->fs.location' is never less than zero.
> 
> 'info->fs.location' is u32 type, never less than zero.
> 
> Signed-off-by: YueHaibing 

This doesn't apply to net-next.
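The smatch warning boils down to the fact that an unsigned value can never be negative, so only the upper bound of the location needs checking. A minimal sketch; check_rule_location() and max_rules are hypothetical names, not the mvpp2 code:

```c
#include <errno.h>
#include <stdint.h>

/*
 * `location` is unsigned, so a "location < 0" test is always false
 * (the case smatch flags); the upper bound is the only real check.
 */
static int check_rule_location(uint32_t location, uint32_t max_rules)
{
	if (location >= max_rules)
		return -EINVAL;
	return 0;
}
```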

Re: [PATCH v2 08/10] Input: elan_i2c - export true width/height

2019-05-28 Thread Sean O'Brien

We do still use a maxed out major axis as a signal for a palm in the touchscreen
logic, but I'm not too concerned because if that axis is maxed out, the contact
should probably be treated as a palm anyway...

I'm more concerned with this affecting our gesture detection for the
touchpad. It looks like this change would cause all contacts to be reported
as some percentage bigger than they are currently. Can you give me an idea
of how big that percentage is?
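Since the old and new sizes in the diff below are the same multiple of mk_x (mk_x * (width_x - 90) versus mk_x * width_x), the relative increase depends only on the per-trace width: 90 / (width - 90). A small sketch of that arithmetic; the function name is hypothetical, it assumes width > 90, and typical width values depend on the device and are not given in this thread.

```c
#define ETP_FWIDTH_REDUCE 90

/* Percentage growth of a reported contact once the reduction is dropped. */
static double contact_size_increase_pct(int width)
{
	return 100.0 * ETP_FWIDTH_REDUCE / (width - ETP_FWIDTH_REDUCE);
}
```

For example, a per-trace width of 990 gives a 10% increase, while a width of 180 doubles the reported size.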

On Tue, May 28, 2019 at 11:13 AM Harry Cutts  wrote:
>
> On Mon, 27 May 2019 at 18:21, Dmitry Torokhov  
> wrote:
> >
> > Hi Benjamin, KT,
> >
> > On Mon, May 27, 2019 at 11:55:01AM +0800, 廖崇榮 wrote:
> > > Hi
> > >
> > > -Original Message-
> > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > > Sent: Friday, May 24, 2019 5:37 PM
> > > To: Dmitry Torokhov; KT Liao; Rob Herring; Aaron Ma; Hans de Goede
> > > Cc: open list:HID CORE LAYER; lkml; devicet...@vger.kernel.org
> > > Subject: Re: [PATCH v2 08/10] Input: elan_i2c - export true width/height
> > >
> > > On Tue, May 21, 2019 at 3:28 PM Benjamin Tissoires 
> > >  wrote:
> > > >
> > > > The width/height is actually in the same unit as X and Y. So we
> > > > should not tamper with the data, but just set the proper resolution, so
> > > > that userspace can correctly detect which touch is a palm or a finger.
> > > >
> > > > Signed-off-by: Benjamin Tissoires 
> > > >
> > > > --
> > > >
> > > > new in v2
> > > > ---
> > > >  drivers/input/mouse/elan_i2c_core.c | 11 ---
> > > >  1 file changed, 4 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c
> > > > index 7ff044c6cd11..6f4feedb7765 100644
> > > > --- a/drivers/input/mouse/elan_i2c_core.c
> > > > +++ b/drivers/input/mouse/elan_i2c_core.c
> > > > @@ -45,7 +45,6 @@
> > > >  #define DRIVER_NAME"elan_i2c"
> > > >  #define ELAN_VENDOR_ID 0x04f3
> > > >  #define ETP_MAX_PRESSURE   255
> > > > -#define ETP_FWIDTH_REDUCE  90
> > > >  #define ETP_FINGER_WIDTH   15
> > > >  #define ETP_RETRY_COUNT3
> > > >
> > > > @@ -915,12 +914,8 @@ static void elan_report_contact(struct elan_tp_data *data,
> > > > return;
> > > > }
> > > >
> > > > -   /*
> > > > -* To avoid treating large finger as palm, let's reduce the
> > > > -* width x and y per trace.
> > > > -*/
> > > > -   area_x = mk_x * (data->width_x - ETP_FWIDTH_REDUCE);
> > > > -   area_y = mk_y * (data->width_y - ETP_FWIDTH_REDUCE);
> > > > +   area_x = mk_x * data->width_x;
> > > > +   area_y = mk_y * data->width_y;
> > > >
> > > > major = max(area_x, area_y);
> > > > minor = min(area_x, area_y);
> > > > @@ -1123,8 +1118,10 @@ static int elan_setup_input_device(struct elan_tp_data *data)
> > > >  ETP_MAX_PRESSURE, 0, 0);
> > > > input_set_abs_params(input, ABS_MT_TOUCH_MAJOR, 0,
> > > >  ETP_FINGER_WIDTH * max_width, 0, 0);
> > > > +   input_abs_set_res(input, ABS_MT_TOUCH_MAJOR, data->x_res);
> > > > input_set_abs_params(input, ABS_MT_TOUCH_MINOR, 0,
> > > >  ETP_FINGER_WIDTH * min_width, 0, 0);
> > > > +   input_abs_set_res(input, ABS_MT_TOUCH_MINOR, data->y_res);
> > >
> > > I had a chat with Peter on Wednesday, and he mentioned that this is 
> > > dangerous as Major/Minor are max/min of the width and height. And given 
> > > that we might have 2 different resolutions, we would need to do some 
> > > computation in the kernel to ensure the data is correct with respect to 
> > > the resolution.
> > >
> > > TL;DR: I don't think we should export the resolution there :(
> > >
> > > KT, should I drop the patch entirely, or is there a strong argument for 
> > > keeping the ETP_FWIDTH_REDUCE around?
> > > I suggest you apply the patch; I have no idea why ETP_FWIDTH_REDUCE
> > > existed.
> > > Our FW team knows nothing about ETP_FWIDTH_REDUCE either.
> > >
> > > The only side effect will be on Chromebooks, because this computation
> > > has stayed in the ChromeOS kernel for four years.
> > > Chrome's finger/palm threshold may be different from other Linux
> > > distributions.
> > > We will discuss it with Google once the patch is picked up by Chrome,
> > > if it causes something to go wrong.
> >
> > Chrome has logic where a contact with maximum major/minor is treated as a
> > palm, so here the driver (which originally came from Chrome OS)
> > artificially reduces the contact size to ensure that palm rejection
> > logic does not trigger.
> >
> > I'm adding Harry to confirm whether we are still using this logic and to
> > see if we can adjust it to be something else.
>
> I'm not very familiar with our touchpad code, so adding Sean O'Brien, who is.

Re: [PATCH] staging: rtl8723bs: Add missing blank lines

2019-05-28 Thread Fabio Lima

On Wed, May 22, 2019 at 06:41, Dan Carpenter
 wrote:
>
> On Tue, May 21, 2019 at 09:46:55PM -0300, Fabio Lima wrote:
> > This patch resolves the following warning from checkpatch.pl
> > WARNING: Missing a blank line after declarations
> >
> > Signed-off-by: Fabio Lima 
> > ---
> >  drivers/staging/rtl8723bs/core/rtw_debug.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/staging/rtl8723bs/core/rtw_debug.c b/drivers/staging/rtl8723bs/core/rtw_debug.c
> > index 9f8446ccf..853362381 100644
> > --- a/drivers/staging/rtl8723bs/core/rtw_debug.c
> > +++ b/drivers/staging/rtl8723bs/core/rtw_debug.c
> > @@ -382,6 +382,7 @@ ssize_t proc_set_roam_tgt_addr(struct file *file, const char __user *buffer, siz
> >   if (buffer && !copy_from_user(tmp, buffer, sizeof(tmp))) {
> >
> >   int num = sscanf(tmp, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", addr, addr+1, addr+2, addr+3, addr+4, addr+5);
> > +
> >   if (num == 6)
> >   memcpy(adapter->mlmepriv.roam_tgt_addr, addr, ETH_ALEN);
> >
>
> I'm sorry but this function is really such nonsense.  Can you send a
> patch to re-write it instead?
>
> drivers/staging/rtl8723bs/core/rtw_debug.c
>371  ssize_t proc_set_roam_tgt_addr(struct file *file, const char __user *buffer, size_t count, loff_t *pos, void *data)
>372  {
>373  struct net_device *dev = data;
>374  struct adapter *adapter = (struct adapter *)rtw_netdev_priv(dev);
>375
>376  char tmp[32];
>377  u8 addr[ETH_ALEN];
>378
>379  if (count < 1)
>
> This check is silly.  I guess the safest thing is to change it to:
> if (count < sizeof(tmp))
>
>380  return -EFAULT;
>
> It should be return -EINVAL;
>
>381
>382  if (buffer && !copy_from_user(tmp, buffer, sizeof(tmp))) {
>
> Remove the check for if the user passes a NULL buffer, because that's
> already handled in copy_from_user().  Return -EFAULT if copy_from_user()
> fails.
>
> if (copy_from_user(tmp, buffer, sizeof(tmp)))
> return -EFAULT;
>
>
>383
>
> Extra blank line.
>
>384  int num = sscanf(tmp, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", addr, addr+1, addr+2, addr+3, addr+4, addr+5);
>
> You will need to move the num declaration to the start of the function.
>
>385  if (num == 6)
>386  memcpy(adapter->mlmepriv.roam_tgt_addr, addr, ETH_ALEN);
>
> If num != 6 then return -EINVAL;
>
>387
>388  DBG_871X("set roam_tgt_addr to "MAC_FMT"\n", MAC_ARG(adapter->mlmepriv.roam_tgt_addr));
>389  }
>390
>391  return count;
>392  }
>
> regards,
> dan carpenter

Thanks for your feedback.
This is my first patch and I will send the second patch with
modifications that you suggest.

Fabio Lima
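A userspace sketch of what the suggested rewrite could look like. memcpy() stands in for copy_from_user(), and the global roam_tgt_addr array stands in for adapter->mlmepriv.roam_tgt_addr; both are assumptions for illustration. One deliberate swap: instead of requiring count >= sizeof(tmp), this version copies at most sizeof(tmp) - 1 bytes and NUL-terminates the buffer before sscanf(), which the original code never did.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for adapter->mlmepriv.roam_tgt_addr. */
static unsigned char roam_tgt_addr[ETH_ALEN];

static ssize_t set_roam_tgt_addr(const char *buffer, size_t count)
{
	char tmp[32];
	unsigned char addr[ETH_ALEN];
	size_t len;
	int num;

	if (count < 1)
		return -EINVAL;

	/* Bounded copy plus NUL terminator so sscanf() cannot overrun. */
	len = count < sizeof(tmp) - 1 ? count : sizeof(tmp) - 1;
	memcpy(tmp, buffer, len);	/* copy_from_user() in the kernel */
	tmp[len] = '\0';

	num = sscanf(tmp, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
		     &addr[0], &addr[1], &addr[2],
		     &addr[3], &addr[4], &addr[5]);
	if (num != ETH_ALEN)
		return -EINVAL;

	memcpy(roam_tgt_addr, addr, ETH_ALEN);
	return count;
}
```

In the kernel version, a failing copy_from_user() would return -EFAULT, as Dan describes.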

Re: [PATCH v5 0/2] Fix issues with vmalloc flush flag

2019-05-28 Thread David Miller

From: Rick Edgecombe 
Date: Mon, 27 May 2019 14:10:56 -0700

> These two patches address issues with the recently added
> VM_FLUSH_RESET_PERMS vmalloc flag.
> 
> Patch 1 addresses an issue that could cause a crash after other
> architectures besides x86 rely on this path.
> 
> Patch 2 addresses an issue where in a rare case strange arguments
> could be provided to flush_tlb_kernel_range(). 

Another situation that would cause trouble on sparc64 just occurred to
me: the address range of the main kernel image ending up being passed
to flush_tlb_kernel_range().

That would flush the locked kernel mapping and crash the kernel
instantly in a completely non-recoverable way.

Re: [PATCH net-next] hinic: fix a bug in set rx mode

2019-05-28 Thread David Miller

From: Xue Chaojing 
Date: Mon, 27 May 2019 22:10:05 +

> In set_rx_mode, __dev_mc_sync and netdev_for_each_mc_addr will
> repeatedly set the multicast MAC address, so we delete this loop.
> 
> Signed-off-by: Xue Chaojing 

Applied.

Re: [PATCH net] Documentation: net-sysfs: Remove duplicate PHY device documentation

2019-05-28 Thread David Miller

From: Florian Fainelli 
Date: Mon, 27 May 2019 19:06:38 -0700

> Both sysfs-bus-mdio and sysfs-class-net-phydev contain the same
> duplicated information. There is not currently any MDIO bus specific
> attribute, but there are PHY device (struct phy_device) specific
> attributes. Use the more precise description from sysfs-bus-mdio and
> carry that over to sysfs-class-net-phydev.
> 
> Fixes: 86f22d04dfb5 ("net: sysfs: Document PHY device sysfs attributes")
> Signed-off-by: Florian Fainelli 

Applied, thanks.

Re: [PATCH net-next 00/12] code optimizations & bugfixes for HNS3 driver

2019-05-28 Thread David Miller

From: Huazhong Tan 
Date: Tue, 28 May 2019 17:02:50 +0800

> This patch-set includes code optimizations and bugfixes for the HNS3
> ethernet controller driver.
> 
> [patch 1/12] fixes a compile warning reported by kbuild test robot.
> 
> [patch 2/12] fixes HNS3_RXD_GRO_SIZE_M macro definition error.
> 
> [patch 3/12] adds a debugfs command to dump firmware information.
> 
> [patch 4/12 - 10/12] adds some code optimizations and cleanups for
> reset and driver unloading.
> 
> [patch 11/12 - 12/12] adds two bugfixes.

Series applied, thanks.

[PATCH v5 0/3] Qualcomm QCS404 PCIe support

2019-05-28 Thread Bjorn Andersson

This series adds support for the PCIe controller in the Qualcomm QCS404
platform.

Bjorn Andersson (3):
  PCI: qcom: Use clk_bulk API for 2.4.0 controllers
  dt-bindings: PCI: qcom: Add QCS404 to the binding
  PCI: qcom: Add QCS404 PCIe controller support

 .../devicetree/bindings/pci/qcom,pcie.txt |  25 +++-
 drivers/pci/controller/dwc/pcie-qcom.c| 113 --
 2 files changed, 75 insertions(+), 63 deletions(-)

-- 
2.18.0

Re: [v7 PATCH 2/2] mm: vmscan: correct some vmscan counters for THP swapout

2019-05-28 Thread Huang, Ying

Yang Shi  writes:

> Since commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after
> swapped out"), THP can be swapped out as a whole.  But nr_reclaimed
> and some other vm counters still get inc'ed by one even though a whole
> THP (512 pages) gets swapped out.
>
> This doesn't make much sense for memory reclaim.  For example, direct
> reclaim may just need to reclaim SWAP_CLUSTER_MAX pages, and reclaiming
> one THP could fulfill it.  But if nr_reclaimed is not increased correctly,
> direct reclaim may waste time reclaiming more pages,
> SWAP_CLUSTER_MAX * 512 pages in the worst case.
>
> It may also cause pgsteal_{kswapd|direct} to be greater than
> pgscan_{kswapd|direct}, like the below:
>
> pgsteal_kswapd 122933
> pgsteal_direct 26600225
> pgscan_kswapd 174153
> pgscan_direct 14678312
>
> nr_reclaimed and nr_scanned must be fixed in parallel otherwise it would
> break some page reclaim logic, e.g.
>
> vmpressure: this looks at the scanned/reclaimed ratio so it won't
> change semantics as long as scanned & reclaimed are fixed in parallel.
>
> compaction/reclaim: compaction wants a certain number of physical pages
> freed up before going back to compacting.
>
> kswapd priority raising: kswapd raises priority if we scan fewer pages
> than the reclaim target (which itself is obviously expressed in order-0
> pages). As a result, kswapd can falsely raise its aggressiveness even
> when it's making great progress.
>
> Other than nr_scanned and nr_reclaimed, some other counters, e.g.
> pgactivate, nr_skipped, nr_ref_keep and nr_unmap_fail need to be fixed
> too since they are user visible via cgroup, /proc/vmstat or trace
> points, otherwise they would be underreported.
>
> When isolating pages from LRUs, nr_taken has been accounted in base
> page, but nr_scanned and nr_skipped are still accounted in THP.  It
> doesn't make much sense either, since it may cause the trace points to
> underreport the numbers as well.
>
> So account those counters in base pages instead of accounting a THP as
> one page.
>
> nr_dirty, nr_unqueued_dirty, nr_congested and nr_writeback are used by
> file cache, so they are not impacted by THP swap.
>
> This change may result in lower steal/scan ratio in some cases since
> THP may get split during page reclaim, then a part of tail pages get
> reclaimed instead of the whole 512 pages, but nr_scanned is accounted
> by 512, particularly for direct reclaim.  But this should not be a
> significant issue.
>
> Cc: "Huang, Ying" 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Mel Gorman 
> Cc: "Kirill A . Shutemov" 
> Cc: Hugh Dickins 
> Cc: Shakeel Butt 
> Cc: Hillf Danton 
> Signed-off-by: Yang Shi 

Looks good to me!  Thanks for your effort!

Reviewed-by: "Huang, Ying" 

Best Regards,
Huang, Ying
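To make the accounting argument in the quoted commit message concrete, here is a small userspace model (not kernel code). It assumes 4 KiB base pages and 2 MiB THPs, so HPAGE_PMD_NR is 512, and reuses the kernel's SWAP_CLUSTER_MAX value of 32; struct fake_page and reclaim_until() are hypothetical names for illustration only. It counts how many entries a reclaim pass takes before reaching a SWAP_CLUSTER_MAX target, depending on whether a THP is counted as 1 page or as 512.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SWAP_CLUSTER_MAX 32UL  /* same value as the kernel's reclaim batch */
#define HPAGE_PMD_NR     512UL /* assumes 4 KiB base pages and 2 MiB THPs */

/* Hypothetical page descriptor: only records whether it is a THP. */
struct fake_page {
	bool is_thp;
};

/* How many base pages one reclaimed entry accounts for. */
static unsigned long nr_base_pages(const struct fake_page *p)
{
	return p->is_thp ? HPAGE_PMD_NR : 1;
}

/*
 * Reclaim entries from 'pages' until nr_reclaimed reaches 'target'.
 * per_base == true:  count in base pages (behaviour after the patch).
 * per_base == false: count every entry, THP or not, as one page.
 * Returns the number of entries actually reclaimed.
 */
static size_t reclaim_until(const struct fake_page *pages, size_t n,
			    unsigned long target, bool per_base,
			    unsigned long *nr_reclaimed)
{
	size_t i;

	*nr_reclaimed = 0;
	for (i = 0; i < n && *nr_reclaimed < target; i++)
		*nr_reclaimed += per_base ? nr_base_pages(&pages[i]) : 1;
	return i;
}
```

With base-page accounting a single THP satisfies the target; counting each THP as one page makes the loop take 32 THPs, i.e. SWAP_CLUSTER_MAX * 512 = 16384 base pages, the worst case described above.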

[PATCH v5 1/3] PCI: qcom: Use clk_bulk API for 2.4.0 controllers

2019-05-28 Thread Bjorn Andersson

Before introducing the QCS404 platform, which uses the same PCIe
controller as IPQ4019, migrate this to use the bulk clock API, in order
to make the error paths slightly cleaner.

Acked-by: Stanimir Varbanov 
Reviewed-by: Niklas Cassel 
Reviewed-by: Vinod Koul 
Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Renamed "err_clks" label
- Picked up Vinod's r-b and Stanimir's a-b

 drivers/pci/controller/dwc/pcie-qcom.c | 53 --
 1 file changed, 16 insertions(+), 37 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 0ed235d560e3..23dc01212508 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -112,10 +112,10 @@ struct qcom_pcie_resources_2_3_2 {
struct regulator_bulk_data supplies[QCOM_PCIE_2_3_2_MAX_SUPPLY];
 };
 
+#define QCOM_PCIE_2_4_0_MAX_CLOCKS 3
 struct qcom_pcie_resources_2_4_0 {
-   struct clk *aux_clk;
-   struct clk *master_clk;
-   struct clk *slave_clk;
+   struct clk_bulk_data clks[QCOM_PCIE_2_4_0_MAX_CLOCKS];
+   int num_clks;
struct reset_control *axi_m_reset;
struct reset_control *axi_s_reset;
struct reset_control *pipe_reset;
@@ -638,18 +638,17 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie *pcie)
struct qcom_pcie_resources_2_4_0 *res = &pcie->res.v2_4_0;
struct dw_pcie *pci = pcie->pci;
struct device *dev = pci->dev;
+   int ret;
 
-   res->aux_clk = devm_clk_get(dev, "aux");
-   if (IS_ERR(res->aux_clk))
-   return PTR_ERR(res->aux_clk);
+   res->clks[0].id = "aux";
+   res->clks[1].id = "master_bus";
+   res->clks[2].id = "slave_bus";
 
-   res->master_clk = devm_clk_get(dev, "master_bus");
-   if (IS_ERR(res->master_clk))
-   return PTR_ERR(res->master_clk);
+   res->num_clks = 3;
 
-   res->slave_clk = devm_clk_get(dev, "slave_bus");
-   if (IS_ERR(res->slave_clk))
-   return PTR_ERR(res->slave_clk);
+   ret = devm_clk_bulk_get(dev, res->num_clks, res->clks);
+   if (ret < 0)
+   return ret;
 
res->axi_m_reset = devm_reset_control_get_exclusive(dev, "axi_m");
if (IS_ERR(res->axi_m_reset))
@@ -719,9 +718,7 @@ static void qcom_pcie_deinit_2_4_0(struct qcom_pcie *pcie)
reset_control_assert(res->axi_m_sticky_reset);
reset_control_assert(res->pwr_reset);
reset_control_assert(res->ahb_reset);
-   clk_disable_unprepare(res->aux_clk);
-   clk_disable_unprepare(res->master_clk);
-   clk_disable_unprepare(res->slave_clk);
+   clk_bulk_disable_unprepare(res->num_clks, res->clks);
 }
 
 static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
@@ -850,23 +847,9 @@ static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
 
usleep_range(1, 12000);
 
-   ret = clk_prepare_enable(res->aux_clk);
-   if (ret) {
-   dev_err(dev, "cannot prepare/enable iface clock\n");
-   goto err_clk_aux;
-   }
-
-   ret = clk_prepare_enable(res->master_clk);
-   if (ret) {
-   dev_err(dev, "cannot prepare/enable core clock\n");
-   goto err_clk_axi_m;
-   }
-
-   ret = clk_prepare_enable(res->slave_clk);
-   if (ret) {
-   dev_err(dev, "cannot prepare/enable phy clock\n");
-   goto err_clk_axi_s;
-   }
+   ret = clk_bulk_prepare_enable(res->num_clks, res->clks);
+   if (ret)
+   goto err_clks;
 
/* enable PCIe clocks and resets */
val = readl(pcie->parf + PCIE20_PARF_PHY_CTRL);
@@ -891,11 +874,7 @@ static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
 
return 0;
 
-err_clk_axi_s:
-   clk_disable_unprepare(res->master_clk);
-err_clk_axi_m:
-   clk_disable_unprepare(res->aux_clk);
-err_clk_aux:
+err_clks:
reset_control_assert(res->ahb_reset);
 err_rst_ahb:
reset_control_assert(res->pwr_reset);
-- 
2.18.0
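The cleaner error path in this patch comes from the bulk helper's all-or-nothing behaviour: if enabling one clock fails, the ones already enabled are turned off again before the error is returned. The following userspace sketch models that behaviour with a hypothetical struct fake_clk; it collapses prepare and enable into a single step and is not the clk framework's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for one entry of a struct clk_bulk_data array. */
struct fake_clk {
	const char *id;
	bool fail;    /* force the enable step to fail, for testing */
	bool enabled;
};

static int fake_clk_enable(struct fake_clk *clk)
{
	if (clk->fail)
		return -EIO;
	clk->enabled = true;
	return 0;
}

static void fake_clk_disable(struct fake_clk *clk)
{
	clk->enabled = false;
}

/* Enable all clocks or none: on failure, undo the earlier ones in reverse. */
static int fake_clk_bulk_enable(int num, struct fake_clk *clks)
{
	int i, ret;

	for (i = 0; i < num; i++) {
		ret = fake_clk_enable(&clks[i]);
		if (ret) {
			while (--i >= 0)
				fake_clk_disable(&clks[i]);
			return ret;
		}
	}
	return 0;
}
```

Because the helper cleans up after itself, the caller never has to track which clocks were on, which is why a single err_clks label can replace the three per-clock labels.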

[PATCH v5 3/3] PCI: qcom: Add QCS404 PCIe controller support

2019-05-28 Thread Bjorn Andersson

The QCS404 platform contains a PCIe controller of version 2.4.0 and a
Qualcomm PCIe2 PHY. The driver already supports version 2.4.0, for the
IPQ4019, but this support touches clocks and resets related to the PHY
as well, and there's no upstream driver for the PHY.

On QCS404 we must initialize the PHY, so a separate PHY driver is
implemented to take care of this and the controller driver is updated to
not require the PHY related resources. This is done by relying on the
fact that operations in both the clock and reset framework are nops when
passed NULL, so we can isolate this change to only the get_resource
function.

For QCS404 we also need to enable the AHB (iface) clock, in order to
access the register space of the controller, but as this is not part of
the IPQ4019 DT binding this is only added for new users of the 2.4.0
controller.
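The two ideas above (NULL handles being no-ops, and IPQ4019 getting one clock fewer) can be modelled in userspace as follows. struct fake_reset, fake_reset_assert() and num_clks_for() are hypothetical; the real driver uses reset_control_assert() and of_device_is_compatible(), and the latter matches against the node's whole compatible list rather than doing a single string compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct reset_control. */
struct fake_reset {
	bool asserted;
};

/* A NULL handle means "not used on this platform", so asserting it is a no-op. */
static int fake_reset_assert(struct fake_reset *rst)
{
	if (!rst)
		return 0;
	rst->asserted = true;
	return 0;
}

/* IPQ4019's binding has no "iface" clock, so it uses 3 clocks; others use 4. */
static int num_clks_for(const char *compatible)
{
	bool is_ipq = strcmp(compatible, "qcom,pcie-ipq4019") == 0;

	return is_ipq ? 3 : 4;
}
```

Since the PHY-related handles are simply left NULL on QCS404, the init and deinit paths can keep calling the assert/deassert helpers unconditionally.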

Acked-by: Stanimir Varbanov 
Reviewed-by: Niklas Cassel 
Reviewed-by: Vinod Koul 
Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Picked up Vinod's r-b and Stanimir's a-b

 drivers/pci/controller/dwc/pcie-qcom.c | 64 +++---
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 23dc01212508..da5dd3639a49 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -112,7 +112,7 @@ struct qcom_pcie_resources_2_3_2 {
struct regulator_bulk_data supplies[QCOM_PCIE_2_3_2_MAX_SUPPLY];
 };
 
-#define QCOM_PCIE_2_4_0_MAX_CLOCKS 3
+#define QCOM_PCIE_2_4_0_MAX_CLOCKS 4
 struct qcom_pcie_resources_2_4_0 {
struct clk_bulk_data clks[QCOM_PCIE_2_4_0_MAX_CLOCKS];
int num_clks;
@@ -638,13 +638,16 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie *pcie)
struct qcom_pcie_resources_2_4_0 *res = &pcie->res.v2_4_0;
struct dw_pcie *pci = pcie->pci;
struct device *dev = pci->dev;
+   bool is_ipq = of_device_is_compatible(dev->of_node, "qcom,pcie-ipq4019");
int ret;
 
res->clks[0].id = "aux";
res->clks[1].id = "master_bus";
res->clks[2].id = "slave_bus";
+   res->clks[3].id = "iface";
 
-   res->num_clks = 3;
+   /* qcom,pcie-ipq4019 is defined without "iface" */
+   res->num_clks = is_ipq ? 3 : 4;
 
ret = devm_clk_bulk_get(dev, res->num_clks, res->clks);
if (ret < 0)
@@ -658,27 +661,33 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie *pcie)
if (IS_ERR(res->axi_s_reset))
return PTR_ERR(res->axi_s_reset);
 
-   res->pipe_reset = devm_reset_control_get_exclusive(dev, "pipe");
-   if (IS_ERR(res->pipe_reset))
-   return PTR_ERR(res->pipe_reset);
-
-   res->axi_m_vmid_reset = devm_reset_control_get_exclusive(dev,
-"axi_m_vmid");
-   if (IS_ERR(res->axi_m_vmid_reset))
-   return PTR_ERR(res->axi_m_vmid_reset);
-
-   res->axi_s_xpu_reset = devm_reset_control_get_exclusive(dev,
-   "axi_s_xpu");
-   if (IS_ERR(res->axi_s_xpu_reset))
-   return PTR_ERR(res->axi_s_xpu_reset);
-
-   res->parf_reset = devm_reset_control_get_exclusive(dev, "parf");
-   if (IS_ERR(res->parf_reset))
-   return PTR_ERR(res->parf_reset);
-
-   res->phy_reset = devm_reset_control_get_exclusive(dev, "phy");
-   if (IS_ERR(res->phy_reset))
-   return PTR_ERR(res->phy_reset);
+   if (is_ipq) {
+   /*
+* These resources relates to the PHY or are secure clocks, but
+* are controlled here for IPQ4019
+*/
+   res->pipe_reset = devm_reset_control_get_exclusive(dev, "pipe");
+   if (IS_ERR(res->pipe_reset))
+   return PTR_ERR(res->pipe_reset);
+
+   res->axi_m_vmid_reset = devm_reset_control_get_exclusive(dev,
+   "axi_m_vmid");
+   if (IS_ERR(res->axi_m_vmid_reset))
+   return PTR_ERR(res->axi_m_vmid_reset);
+
+   res->axi_s_xpu_reset = devm_reset_control_get_exclusive(dev,
+   "axi_s_xpu");
+   if (IS_ERR(res->axi_s_xpu_reset))
+   return PTR_ERR(res->axi_s_xpu_reset);
+
+   res->parf_reset = devm_reset_control_get_exclusive(dev, "parf");
+   if (IS_ERR(res->parf_reset))
+   return PTR_ERR(res->parf_reset);
+
+   res->phy_reset = devm_reset_control_get_exclusive(dev, "phy");
+   if (IS_ERR(res->phy_reset))
+   return PTR_ERR(res->phy_reset);
+   }
 
res->axi_m_sticky_reset = devm_reset_control_get_exclusive(dev,

[PATCH v5 2/3] dt-bindings: PCI: qcom: Add QCS404 to the binding

2019-05-28 Thread Bjorn Andersson

The Qualcomm QCS404 platform contains a PCIe controller, add this to the
Qualcomm PCI binding document. The controller is the same version as the
one used in IPQ4019, but the PHY part is described separately, hence the
difference in clocks and resets.

Reviewed-by: Rob Herring 
Reviewed-by: Vinod Koul 
Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Picked up Vinod's r-b

 .../devicetree/bindings/pci/qcom,pcie.txt | 25 +--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/qcom,pcie.txt b/Documentation/devicetree/bindings/pci/qcom,pcie.txt
index 1fd703bd73e0..ada80b01bf0c 100644
--- a/Documentation/devicetree/bindings/pci/qcom,pcie.txt
+++ b/Documentation/devicetree/bindings/pci/qcom,pcie.txt
@@ -10,6 +10,7 @@
- "qcom,pcie-msm8996" for msm8996 or apq8096
- "qcom,pcie-ipq4019" for ipq4019
- "qcom,pcie-ipq8074" for ipq8074
+   - "qcom,pcie-qcs404" for qcs404
 
 - reg:
Usage: required
@@ -116,6 +117,15 @@
- "ahb" AHB clock
- "aux" Auxiliary clock
 
+- clock-names:
+   Usage: required for qcs404
+   Value type: 
+   Definition: Should contain the following entries
+   - "iface"   AHB clock
+   - "aux" Auxiliary clock
+   - "master_bus"  AXI Master clock
+   - "slave_bus"   AXI Slave clock
+
 - resets:
Usage: required
Value type: 
@@ -167,6 +177,17 @@
- "ahb" AHB Reset
- "axi_m_sticky"AXI Master Sticky reset
 
+- reset-names:
+   Usage: required for qcs404
+   Value type: 
+   Definition: Should contain the following entries
+   - "axi_m"   AXI Master reset
+   - "axi_s"   AXI Slave reset
+   - "axi_m_sticky"AXI Master Sticky reset
+   - "pipe_sticky" PIPE sticky reset
+   - "pwr" PWR reset
+   - "ahb" AHB reset
+
 - power-domains:
Usage: required for apq8084 and msm8996/apq8096
Value type: 
@@ -195,12 +216,12 @@
Definition: A phandle to the PCIe endpoint power supply
 
 - phys:
-   Usage: required for apq8084
+   Usage: required for apq8084 and qcs404
Value type: 
Definition: List of phandle(s) as listed in phy-names property
 
 - phy-names:
-   Usage: required for apq8084
+   Usage: required for apq8084 and qcs404
Value type: 
Definition: Should contain "pciephy"
 
-- 
2.18.0

Re: [PATCH v3 1/3] PCI: qcom: Use clk_bulk API for 2.4.0 controllers

2019-05-28 Thread Bjorn Andersson

On Thu 16 May 02:14 PDT 2019, Stanimir Varbanov wrote:

> Hi Bjorn,
> 
> On 5/2/19 3:19 AM, Bjorn Andersson wrote:
> > Before introducing the QCS404 platform, which uses the same PCIe
> > controller as IPQ4019, migrate this to use the bulk clock API, in order
> > to make the error paths slightly cleaner.
> > 
> > Acked-by: Stanimir Varbanov 
> > Reviewed-by: Niklas Cassel 
> > Signed-off-by: Bjorn Andersson 
> > ---
> > 
> > Changes since v2:
> > - Defined QCOM_PCIE_2_4_0_MAX_CLOCKS
> > 
> >  drivers/pci/controller/dwc/pcie-qcom.c | 49 --
> >  1 file changed, 14 insertions(+), 35 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> > index 0ed235d560e3..d740cbe0e56d 100644
> > --- a/drivers/pci/controller/dwc/pcie-qcom.c
> > +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> > @@ -112,10 +112,10 @@ struct qcom_pcie_resources_2_3_2 {
> > struct regulator_bulk_data supplies[QCOM_PCIE_2_3_2_MAX_SUPPLY];
> >  };
> >  
> > +#define QCOM_PCIE_2_4_0_MAX_CLOCKS 3
> >  struct qcom_pcie_resources_2_4_0 {
> > -   struct clk *aux_clk;
> > -   struct clk *master_clk;
> > -   struct clk *slave_clk;
> > +   struct clk_bulk_data clks[QCOM_PCIE_2_4_0_MAX_CLOCKS];
> > +   int num_clks;
> > struct reset_control *axi_m_reset;
> > struct reset_control *axi_s_reset;
> > struct reset_control *pipe_reset;
> > @@ -638,18 +638,17 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie *pcie)
> > struct qcom_pcie_resources_2_4_0 *res = &pcie->res.v2_4_0;
> > struct dw_pcie *pci = pcie->pci;
> > struct device *dev = pci->dev;
> > +   int ret;
> >  
> > -   res->aux_clk = devm_clk_get(dev, "aux");
> > -   if (IS_ERR(res->aux_clk))
> > -   return PTR_ERR(res->aux_clk);
> > +   res->clks[0].id = "aux";
> > +   res->clks[1].id = "master_bus";
> > +   res->clks[2].id = "slave_bus";
> >  
> > -   res->master_clk = devm_clk_get(dev, "master_bus");
> > -   if (IS_ERR(res->master_clk))
> > -   return PTR_ERR(res->master_clk);
> > +   res->num_clks = 3;
> 
> Use the new fresh define QCOM_PCIE_2_4_0_MAX_CLOCKS?
> 

As I replace it in patch 3/3 with a value different from "max clocks", I
don't think it makes sense to use the define here. So I'm leaving this
as is.

> >  
> > -   res->slave_clk = devm_clk_get(dev, "slave_bus");
> > -   if (IS_ERR(res->slave_clk))
> > -   return PTR_ERR(res->slave_clk);
> > +   ret = devm_clk_bulk_get(dev, res->num_clks, res->clks);
> > +   if (ret < 0)
> > +   return ret;
> >  
> > res->axi_m_reset = devm_reset_control_get_exclusive(dev, "axi_m");
> > if (IS_ERR(res->axi_m_reset))
> > @@ -719,9 +718,7 @@ static void qcom_pcie_deinit_2_4_0(struct qcom_pcie *pcie)
> > reset_control_assert(res->axi_m_sticky_reset);
> > reset_control_assert(res->pwr_reset);
> > reset_control_assert(res->ahb_reset);
> > -   clk_disable_unprepare(res->aux_clk);
> > -   clk_disable_unprepare(res->master_clk);
> > -   clk_disable_unprepare(res->slave_clk);
> > +   clk_bulk_disable_unprepare(res->num_clks, res->clks);
> >  }
> >  
> >  static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
> > @@ -850,23 +847,9 @@ static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
> >  
> > usleep_range(1, 12000);
> >  
> > -   ret = clk_prepare_enable(res->aux_clk);
> > -   if (ret) {
> > -   dev_err(dev, "cannot prepare/enable iface clock\n");
> > +   ret = clk_bulk_prepare_enable(res->num_clks, res->clks);
> > +   if (ret)
> > goto err_clk_aux;
> 
> Maybe you have to change the name of the label too?
> 

Updated this and posted v5. Should be good to be merged now.

Thanks for your reviews!

Regards,
Bjorn

Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket

2019-05-28 Thread Jason Wang




On 2019/5/29 12:45 AM, Stefano Garzarella wrote:

On Wed, May 15, 2019 at 10:48:44AM +0800, Jason Wang wrote:

On 2019/5/15 12:35 AM, Stefano Garzarella wrote:

On Tue, May 14, 2019 at 11:25:34AM +0800, Jason Wang wrote:

On 2019/5/14 1:23 AM, Stefano Garzarella wrote:

On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:

On 2019/5/10 8:58 PM, Stefano Garzarella wrote:

+static struct virtio_vsock_buf *
+virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
+{
+   struct virtio_vsock_buf *buf;
+
+   if (pkt->len == 0)
+   return NULL;
+
+   buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+   if (!buf)
+   return NULL;
+
+   /* If the buffer in the virtio_vsock_pkt is full, we can move it to
+* the new virtio_vsock_buf avoiding the copy, because we are sure that
+* we are not using more memory than what is counted by the credit mechanism.
+*/
+   if (zero_copy && pkt->len == pkt->buf_len) {
+   buf->addr = pkt->buf;
+   pkt->buf = NULL;
+   } else {

Is the copy still needed if we're just a few bytes short? We met a similar issue
in virtio-net, and virtio-net solves it by always copying the first 128 bytes of
big packets.

See receive_big()

I see, it is more sophisticated.
IIUC, virtio-net allocates a sk_buff with 128 bytes of buffer, then copies the
first 128 bytes, then adds the buffer used to receive the packet as a frag to
the skb.

Yes, and the point is that if the packet is smaller than 128 bytes, the pages will
be recycled.



So it avoids the overhead of allocating a large buffer. I got it.

Just out of curiosity, why is the threshold 128 bytes?


From its name (GOOD_COPY_LEN), I think it's just a value that won't lose much
performance, e.g. the size of two cachelines.


Jason, Stefan,
since I'm removing the patches to increase the buffers to 64 KiB and I'm
adding a threshold for small packets, I would simplify this patch,
removing the new buffer allocation and copying small packets into the
buffers already queued (if there is space).
In this way, I should solve the issue of 1 byte packets.

Do you think this could be better?



I think so.

Thanks




Thanks,
Stefano
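Stefano's proposal, copying small packets into a buffer that is already queued when it has room and otherwise handing the receive buffer over without a copy, can be sketched in userspace as below. GOOD_COPY_LEN is the virtio-net threshold mentioned in the thread; struct rx_queue, rx_queue_packet(), MAX_QUEUED and the RX_* return codes are hypothetical, and real code would operate on virtio_vsock_pkt buffers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GOOD_COPY_LEN 128 /* copy threshold, value borrowed from virtio-net */
#define MAX_QUEUED    16  /* hypothetical queue depth */

/* Return codes of the hypothetical rx_queue_packet(). */
#define RX_COPIED 0    /* payload appended to the tail buffer; caller reuses 'data' */
#define RX_MOVED  1    /* 'data' now belongs to the queue (no copy) */
#define RX_FULL   (-1) /* queue full; caller keeps 'data' */

/* Hypothetical stand-in for a queued receive buffer. */
struct rx_buf {
	char *data;
	size_t len; /* bytes of payload stored */
	size_t cap; /* buffer size (buf_len in the patch) */
};

struct rx_queue {
	struct rx_buf bufs[MAX_QUEUED];
	size_t count;
};

static int rx_queue_packet(struct rx_queue *q, char *data, size_t len, size_t cap)
{
	assert(len <= cap);

	/* Small packet: try to append it to the last queued buffer. */
	if (len <= GOOD_COPY_LEN && q->count > 0) {
		struct rx_buf *tail = &q->bufs[q->count - 1];

		if (tail->cap - tail->len >= len) {
			memcpy(tail->data + tail->len, data, len);
			tail->len += len;
			return RX_COPIED;
		}
	}

	if (q->count == MAX_QUEUED)
		return RX_FULL;

	/* Otherwise hand the whole buffer over, avoiding the copy. */
	q->bufs[q->count].data = data;
	q->bufs[q->count].len = len;
	q->bufs[q->count].cap = cap;
	q->count++;
	return RX_MOVED;
}
```

With this scheme a 1-byte packet occupies one byte of an existing buffer instead of pinning a whole receive buffer of its own.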

Re: [PATCH v3 0/2] Qualcomm PCIe2 PHY

2019-05-28 Thread Bjorn Andersson

On Wed 01 May 17:14 PDT 2019, Bjorn Andersson wrote:

> The Qualcomm PCIe2 PHY is based on design from Synopsys and found in
> several different platforms where the QMP PHY isn't used.
> 

Kishon, any feedback on this or would you be willing to pick it up?

Regards,
Bjorn

> Bjorn Andersson (2):
>   dt-bindings: phy: Add binding for Qualcomm PCIe2 PHY
>   phy: qcom: Add Qualcomm PCIe2 PHY driver
> 
>  .../bindings/phy/qcom-pcie2-phy.txt   |  42 +++
>  drivers/phy/qualcomm/Kconfig  |   8 +
>  drivers/phy/qualcomm/Makefile |   1 +
>  drivers/phy/qualcomm/phy-qcom-pcie2.c | 331 ++
>  4 files changed, 382 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/phy/qcom-pcie2-phy.txt
>  create mode 100644 drivers/phy/qualcomm/phy-qcom-pcie2.c
> 
> -- 
> 2.18.0
>

Re: [PATCH v2] qcom: apr: Make apr callbacks in non-atomic context

2019-05-28 Thread Bjorn Andersson

On Fri 08 Feb 09:55 PST 2019, Srinivas Kandagatla wrote:

> APR communication with DSP is not atomic in nature.
> It's request-response in nature. Trying to pretend that these are atomic
> and invoking apr client callbacks directly under atomic/irq context has
> endless issues with the soundcard. It makes more sense to convert these
> to nonatomic calls. This also converts all the dais to be nonatomic.
> 
> All the callbacks are now invoked as part of rx work queue.
> 
> Signed-off-by: Srinivas Kandagatla 
> Reviewed-by: Bjorn Andersson 

Picked up

Thanks,
Bjorn

> ---
> Changes since v1:
>  - flush and destroy work queue after removing the device
>to avoid active communication from device. suggested by Bjorn.
> 
>  drivers/soc/qcom/apr.c | 74 +++---
>  1 file changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/soc/qcom/apr.c b/drivers/soc/qcom/apr.c
> index 74f8b9607daa..039e3aa6f5e0 100644
> --- a/drivers/soc/qcom/apr.c
> +++ b/drivers/soc/qcom/apr.c
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -17,8 +18,18 @@ struct apr {
>   struct rpmsg_endpoint *ch;
>   struct device *dev;
>   spinlock_t svcs_lock;
> + spinlock_t rx_lock;
>   struct idr svcs_idr;
>   int dest_domain_id;
> + struct workqueue_struct *rxwq;
> + struct work_struct rx_work;
> + struct list_head rx_list;
> +};
> +
> +struct apr_rx_buf {
> + struct list_head node;
> + int len;
> + uint8_t buf[];
>  };
>  
>  /**
> @@ -62,11 +73,7 @@ static int apr_callback(struct rpmsg_device *rpdev, void *buf,
>int len, void *priv, u32 addr)
>  {
>   struct apr *apr = dev_get_drvdata(&rpdev->dev);
> - uint16_t hdr_size, msg_type, ver, svc_id;
> - struct apr_device *svc = NULL;
> - struct apr_driver *adrv = NULL;
> - struct apr_resp_pkt resp;
> - struct apr_hdr *hdr;
> + struct apr_rx_buf *abuf;
>   unsigned long flags;
>  
>   if (len <= APR_HDR_SIZE) {
> @@ -75,6 +82,34 @@ static int apr_callback(struct rpmsg_device *rpdev, void *buf,
>   return -EINVAL;
>   }
>  
> + abuf = kzalloc(sizeof(*abuf) + len, GFP_ATOMIC);
> + if (!abuf)
> + return -ENOMEM;
> +
> + abuf->len = len;
> + memcpy(abuf->buf, buf, len);
> +
> + spin_lock_irqsave(&apr->rx_lock, flags);
> + list_add_tail(&abuf->node, &apr->rx_list);
> + spin_unlock_irqrestore(&apr->rx_lock, flags);
> +
> + queue_work(apr->rxwq, &apr->rx_work);
> +
> + return 0;
> +}
> +
> +
> +static int apr_do_rx_callback(struct apr *apr, struct apr_rx_buf *abuf)
> +{
> + uint16_t hdr_size, msg_type, ver, svc_id;
> + struct apr_device *svc = NULL;
> + struct apr_driver *adrv = NULL;
> + struct apr_resp_pkt resp;
> + struct apr_hdr *hdr;
> + unsigned long flags;
> + void *buf = abuf->buf;
> + int len = abuf->len;
> +
>   hdr = buf;
>   ver = APR_HDR_FIELD_VER(hdr->hdr_field);
>   if (ver > APR_PKT_VER + 1)
> @@ -132,6 +167,23 @@ static int apr_callback(struct rpmsg_device *rpdev, void *buf,
>   return 0;
>  }
>  
> +static void apr_rxwq(struct work_struct *work)
> +{
> + struct apr *apr = container_of(work, struct apr, rx_work);
> + struct apr_rx_buf *abuf, *b;
> + unsigned long flags;
> +
> + if (!list_empty(&apr->rx_list)) {
> + list_for_each_entry_safe(abuf, b, &apr->rx_list, node) {
> + apr_do_rx_callback(apr, abuf);
> + spin_lock_irqsave(&apr->rx_lock, flags);
> + list_del(&abuf->node);
> + spin_unlock_irqrestore(&apr->rx_lock, flags);
> + kfree(abuf);
> + }
> + }
> +}
> +
>  static int apr_device_match(struct device *dev, struct device_driver *drv)
>  {
>   struct apr_device *adev = to_apr_device(dev);
> @@ -285,6 +337,14 @@ static int apr_probe(struct rpmsg_device *rpdev)
>   dev_set_drvdata(dev, apr);
>   apr->ch = rpdev->ept;
>   apr->dev = dev;
> + apr->rxwq = create_singlethread_workqueue("qcom_apr_rx");
> + if (!apr->rxwq) {
> + dev_err(apr->dev, "Failed to start Rx WQ\n");
> + return -ENOMEM;
> + }
> + INIT_WORK(&apr->rx_work, apr_rxwq);
> + INIT_LIST_HEAD(&apr->rx_list);
> + spin_lock_init(&apr->rx_lock);
>   spin_lock_init(&apr->svcs_lock);
>   idr_init(&apr->svcs_idr);
>   of_register_apr_devices(dev);
> @@ -303,7 +363,11 @@ static int apr_remove_device(struct device *dev, void *null)
>  
>  static void apr_remove(struct rpmsg_device *rpdev)
>  {
> + struct apr *apr = dev_get_drvdata(&rpdev->dev);
> +
>   device_for_each_child(&rpdev->dev, NULL, apr_remove_device);
> + flush_workqueue(apr->rxwq);
> + destroy_workqueue(apr->rxwq);
>  }
>  
>  /*
> -- 
> 2.20.1
>
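The pattern this patch adopts, copying each message in the atomic rpmsg callback and handling it later from a workqueue, can be sketched in userspace with a mutex standing in for the spinlock and an explicit drain call standing in for the work item. All names here are hypothetical. One design choice differs from the patch: the sketch detaches the whole list under the lock before running callbacks, so no callback runs with the lock held and new messages can still be queued meanwhile.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of struct apr_rx_buf: one copied message. */
struct rx_msg {
	struct rx_msg *next;
	size_t len;
	unsigned char buf[]; /* flexible array member, as in the patch */
};

/* Hypothetical queue; the mutex plays the role of apr->rx_lock. */
struct rx_queue {
	pthread_mutex_t lock;
	struct rx_msg *head;
	struct rx_msg *tail;
};

typedef void (*rx_cb)(const unsigned char *buf, size_t len, void *arg);

/* Producer side (apr_callback() in the patch): copy and append, nothing else. */
static int rx_enqueue(struct rx_queue *q, const void *data, size_t len)
{
	struct rx_msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return -1;
	m->next = NULL;
	m->len = len;
	memcpy(m->buf, data, len);

	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/*
 * Consumer side (the work item): detach the whole list under the lock,
 * then run the callback on each message with the lock dropped, so the
 * callback is free to block. Returns the number of messages handled.
 */
static size_t rx_drain(struct rx_queue *q, rx_cb cb, void *arg)
{
	struct rx_msg *m, *next;
	size_t n = 0;

	pthread_mutex_lock(&q->lock);
	m = q->head;
	q->head = NULL;
	q->tail = NULL;
	pthread_mutex_unlock(&q->lock);

	for (; m; m = next) {
		next = m->next;
		cb(m->buf, m->len, arg);
		free(m);
		n++;
	}
	return n;
}

/* Example callback: add up the payload sizes it is handed. */
static void sum_lengths(const unsigned char *buf, size_t len, void *arg)
{
	(void)buf;
	*(size_t *)arg += len;
}
```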

[PATCH] signal/ptrace: Don't leak uninitialized kernel memory with PTRACE_PEEK_SIGINFO

2019-05-28 Thread Eric W. Biederman



Recently syzbot in conjunction with KMSAN reported that
ptrace_peek_siginfo can copy an uninitialized siginfo to userspace.
Inspecting ptrace_peek_siginfo confirms this.

The problem is that off, when initialized from args.off, can be
initialized to a negative value.  At that point the "if (off >= 0)"
test to see if off became negative fails because off started out
negative.

Prevent the core problem by adding a variable found that is only true
if a siginfo is found and copied to a temporary in preparation for
being copied to userspace.

Prevent args.off from being truncated when it is assigned to off by
testing that args.off is <= the maximum possible value of off.  Convert
off to an unsigned long so that we should not have to truncate
args.off, so that we have well-defined overflow behavior (if we add
another check we won't risk fighting undefined compiler behavior), and
so that we have a type whose maximum value is easy to test for.

Cc: Andrei Vagin 
Cc: sta...@vger.kernel.org
Reported-by: syzbot+0d602a1b0d8c95bdf...@syzkaller.appspotmail.com
Fixes: 84c751bd4aeb ("ptrace: add ability to retrieve signals without removing from a queue (v4)")
Signed-off-by: "Eric W. Biederman" 
---

Comments?
Concerns?

Otherwise I will queue this up and send it to Linus.

 kernel/ptrace.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 6f357f4fc859..4c2b24a885d3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -704,6 +704,10 @@ static int ptrace_peek_siginfo(struct task_struct *child,
if (arg.nr < 0)
return -EINVAL;
 
+   /* Ensure arg.off fits in an unsigned */
+   if (arg.off > ULONG_MAX)
+   return 0;
+
if (arg.flags & PTRACE_PEEKSIGINFO_SHARED)
pending = &child->signal->shared_pending;
else
@@ -711,18 +715,20 @@ static int ptrace_peek_siginfo(struct task_struct *child,
 
for (i = 0; i < arg.nr; ) {
kernel_siginfo_t info;
-   s32 off = arg.off + i;
+   unsigned long off = arg.off + i;
+   bool found = false;
 
spin_lock_irq(&child->sighand->siglock);
list_for_each_entry(q, &pending->list, list) {
if (!off--) {
+   found = true;
copy_siginfo(&info, &q->info);
break;
}
}
spin_unlock_irq(&child->sighand->siglock);
 
-   if (off >= 0) /* beyond the end of the list */
+   if (!found) /* beyond the end of the list */
break;
 
 #ifdef CONFIG_COMPAT
-- 
2.21.0.dirty
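The failure mode described in the commit message can be reproduced outside the kernel with a plain linked list. peek_old() mirrors the old s32 logic and peek_new() the patched logic with an unsigned offset and an explicit found flag; both functions and struct node are hypothetical stand-ins for ptrace_peek_siginfo() and the sigqueue list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued siginfo entry. */
struct node {
	int value;
	struct node *next;
};

/* Old logic: a signed 32-bit offset and "off >= 0" as the not-found test. */
static bool peek_old(const struct node *head, int32_t off, int *out)
{
	const struct node *q;

	for (q = head; q; q = q->next) {
		if (!off--) {
			*out = q->value;
			break;
		}
	}
	/* A negative starting off never reaches 0 and stays negative, so this
	 * reports "found" even though *out was never written. */
	return !(off >= 0);
}

/* Patched logic: unsigned offset plus an explicit found flag. */
static bool peek_new(const struct node *head, unsigned long off, int *out)
{
	const struct node *q;
	bool found = false;

	for (q = head; q; q = q->next) {
		if (!off--) {
			found = true;
			*out = q->value;
			break;
		}
	}
	return found;
}
```

With a negative starting offset, peek_old() finishes the walk with off still negative and reports success without ever writing *out, which corresponds to the uninitialized copy KMSAN flagged; peek_new() reports "not found" for the same input.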

[UPSTREAM KERNEL] mm/zsmalloc.c: Add module parameter malloc_force_movable

2019-05-28 Thread Hui Zhu

zswap compresses swap pages into a dynamically allocated RAM-based
memory pool.  The memory pool can be zbud, z3fold or zsmalloc.
All of them allocate unmovable pages, which increases the number
of unmovable page blocks and is bad for anti-fragmentation.

zsmalloc supports page migration if movable pages are requested:
handle = zs_malloc(zram->mem_pool, comp_len,
GFP_NOIO | __GFP_HIGHMEM |
__GFP_MOVABLE);

This commit adds the module parameter malloc_force_movable to enable
or disable forcing zs_malloc to allocate blocks with gfp
__GFP_HIGHMEM | __GFP_MOVABLE (disabled by default).
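A minimal userspace sketch of what the parameter does to the allocation mask, assuming it simply ORs the two flags into whatever mask the caller passed. The SKETCH_GFP_* bit values, the malloc_force_movable variable and effective_gfp() are hypothetical; the real __GFP_* constants are defined in include/linux/gfp.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real __GFP_* flags live in include/linux/gfp.h. */
#define SKETCH_GFP_NOIO    0x1u
#define SKETCH_GFP_HIGHMEM 0x2u
#define SKETCH_GFP_MOVABLE 0x4u

/* Stand-in for the malloc_force_movable module parameter (default off). */
static bool malloc_force_movable;

/* Mask a zs_malloc()-like allocator would use for the caller's gfp. */
static unsigned int effective_gfp(unsigned int gfp)
{
	if (malloc_force_movable)
		gfp |= SKETCH_GFP_HIGHMEM | SKETCH_GFP_MOVABLE;
	return gfp;
}
```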

The following is a test log from a PC that has 8G of memory and 2G of swap.

When it disabled:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4410183 usecs = 601836 KB/s
2717908992 bytes / 4524375 usecs = 586646 KB/s
2717908992 bytes / 4558583 usecs = 582244 KB/s
2717908992 bytes / 4824261 usecs = 550179 KB/s
348046 usecs to free memory
401680 usecs to free memory
369660 usecs to free memory
180867 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order      0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      1      1      0      2      1      1      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable     13     11     10     11     10      6      7      3      1      0      0
Node    0, zone    DMA32, type      Movable     36     26     39     40     37     36     24     29     14   6767
Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable   7744   7519   6900   5964   4583   2878   1346448146      1      0
Node    0, zone   Normal, type      Movable    645   1930   1685   1339   1020   670363210106310399
Node    0, zone   Normal, type  Reclaimable     53   70116     48     13      0      0      0      0      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
Node 0, zone      DMA         1700           00
Node 0, zone    DMA32            4       165020           00
Node 0, zone   Normal          947         1469          150           00

When it enabled:
~# echo 1 > /sys/module/zsmalloc/parameters/malloc_force_movable
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4779235 usecs = 555362 KB/s
2717908992 bytes / 4856673 usecs = 546507 KB/s
2717908992 bytes / 4920079 usecs = 539464 KB/s
2717908992 bytes / 4935505 usecs = 537778 KB/s
354839 usecs to free memory
368167 usecs to free memory
355460 usecs to free memory
385452 usecs to free memory
/home/teawater/kernel/vm-scalability# cat

Re: [PATCH] ARM: dts: aspeed: g4: add video engine support

2019-05-28 Thread Andrew Jeffery




On Mon, 27 May 2019, at 20:58, Alexander Filippov wrote:
> Add a node to describe the video engine and VGA scratch registers on
> AST2400.
> 
> These changes were copied from aspeed-g5.dtsi
> 
> Signed-off-by: Alexander Filippov 

Ugh, I should really sort out the bmc-misc stuff; I don't like to see it propagate
in its current form. That's not your problem though, and I hope to address it in
the near future.

For the OpenBMC kernel tree:

Acked-by: Andrew Jeffery 

> ---
>  arch/arm/boot/dts/aspeed-g4.dtsi | 62 
>  1 file changed, 62 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/aspeed-g4.dtsi b/arch/arm/boot/dts/aspeed-g4.dtsi
> index 6011692df15a..adc1804918df 100644
> --- a/arch/arm/boot/dts/aspeed-g4.dtsi
> +++ b/arch/arm/boot/dts/aspeed-g4.dtsi
> @@ -168,6 +168,10 @@
>   compatible = "aspeed,g4-pinctrl";
>   };
>  
> + vga_scratch: scratch {
> + compatible = "aspeed,bmc-misc";
> + };
> +
>   p2a: p2a-control {
>   compatible = "aspeed,ast2400-p2a-ctrl";
>   status = "disabled";
> @@ -195,6 +199,16 @@
>   reg = <0x1e72 0x8000>;  // 32K
>   };
>  
> + video: video@1e70 {
> + compatible = "aspeed,ast2400-video-engine";
> + reg = <0x1e70 0x1000>;
> + clocks = <&syscon ASPEED_CLK_GATE_VCLK>,
> +  <&syscon ASPEED_CLK_GATE_ECLK>;
> + clock-names = "vclk", "eclk";
> + interrupts = <7>;
> + status = "disabled";
> + };
> +
>   gpio: gpio@1e78 {
>   #gpio-cells = <2>;
>   gpio-controller;
> @@ -1408,6 +1422,54 @@
>   };
>  };
>  
> +&vga_scratch {
> + dac_mux {
> + offset = <0x2c>;
> + bit-mask = <0x3>;
> + bit-shift = <16>;
> + };
> + vga0 {
> + offset = <0x50>;
> + bit-mask = <0xffffffff>;
> + bit-shift = <0>;
> + };
> + vga1 {
> + offset = <0x54>;
> + bit-mask = <0xffffffff>;
> + bit-shift = <0>;
> + };
> + vga2 {
> + offset = <0x58>;
> + bit-mask = <0xffffffff>;
> + bit-shift = <0>;
> + };
> + vga3 {
> + offset = <0x5c>;
> + bit-mask = <0xffffffff>;
> + bit-shift = <0>;
> + };
> + vga4 {
> + offset = <0x60>;
> + bit-mask = <0xffffffff>;
> + bit-shift = <0>;
> + };
> + vga5 {
> + offset = <0x64>;
> + bit-mask = <0xffffffff>;
> + bit-shift = <0>;
> + };
> + vga6 {
> + offset = <0x68>;
> + bit-mask = <0xffffffff>;
> + bit-shift = <0>;
> + };
> + vga7 {
> + offset = <0x6c>;
> + bit-mask = <0xffffffff>;
> + bit-shift = <0>;
> + };
> +};
> +
>  &sio_regs {
>   sio_2b {
>   offset = <0xf0>;
> -- 
> 2.20.1
> 
>

Re: [RFC PATCH 0/3] Make deferred split shrinker memcg aware

2019-05-28 Thread David Rientjes

On Tue, 28 May 2019, Yang Shi wrote:

> 
> I got some reports from our internal application team about memcg OOM.
> Even though the application has been killed by the OOM killer, there are
> still a lot of THPs resident, and page reclaim doesn't reclaim them at all.
> 
> Some investigation shows they are on the deferred split queue. memcg direct
> reclaim can't shrink them since the THP deferred split shrinker is not memcg
> aware, which may cause premature OOM in memcg.  The issue can be
> reproduced easily with the test below:
> 

Right, we've also encountered this.  I talked to Kirill about it a week or 
so ago where the suggestion was to split all compound pages on the 
deferred split queues under the presence of even memory pressure.

That breaks cgroup isolation and perhaps unfairly penalizes workloads that 
are running attached to other memcg hierarchies that are not under 
pressure because their compound pages are now split as a side effect.  
There is a benefit to keeping these compound pages around while not under 
memory pressure if all pages are subsequently mapped again.

> $ cgcreate -g memory:thp
> $ echo 4G > /sys/fs/cgroup/memory/thp/memory.limit_in_bytes
> $ cgexec -g memory:thp ./transhuge-stress 4000
> 
> transhuge-stress comes from kernel selftest.
> 
> It is easy to hit OOM, but there are still a lot of THPs on the deferred split
> queue; memcg direct reclaim can't touch them since the deferred split
> shrinker is not memcg aware.
> 

Yes, we have seen this on at least 4.15 as well.

> Make the deferred split shrinker memcg aware by introducing a per-memcg deferred
> split queue.  A THP should be on the per-memcg deferred split queue if it
> belongs to a memcg, and on the per-node queue otherwise.  When the page is
> migrated to another memcg, it will be moved to the target memcg's deferred
> split queue too.
> 
> Also, remove the THP from the deferred split queue in page free before memcg
> uncharge, so that the page's memcg information is still available.
> 
> Reuse the second tail page's deferred_list for per memcg list since the same
> THP can't be on multiple deferred split queues at the same time.
> 
> Remove THP specific destructor since it is not used anymore with memcg aware
> THP shrinker (Please see the commit log of patch 2/3 for the details).
> 
> Make deferred split shrinker not depend on memcg kmem since it is not slab.
> It doesn't make sense to not shrink THP even though memcg kmem is disabled.
> 
> With the above change, the test demonstrated above doesn't trigger OOM anymore,
> even with cgroup.memory=nokmem.
> 

I'm curious if your internal applications team is also asking for 
statistics on how much memory can be freed if the deferred split queues 
can be shrunk?  We have applications that monitor their own memory usage 
through memcg stats or usage and proactively try to reduce that usage when 
it is growing too large.  The deferred split queues have significantly 
increased both memcg usage and rss when they've upgraded kernels.

How are your applications monitoring how much memory from deferred split 
queues can be freed on memory pressure?  Any thoughts on providing it as a 
memcg stat?

Thanks!

Re: [PATCH -next] EDAC: aspeed: Remove set but not used variable 'np'

2019-05-28 Thread Andrew Jeffery




On Sun, 26 May 2019, at 00:12, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/edac/aspeed_edac.c: In function aspeed_probe:
> drivers/edac/aspeed_edac.c:284:22: warning: variable np set but not used [-Wunused-but-set-variable]
> 
> It is never used and can be removed.
> 
> Signed-off-by: YueHaibing 

Reviewed-by: Andrew Jeffery 

> ---
>  drivers/edac/aspeed_edac.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/edac/aspeed_edac.c b/drivers/edac/aspeed_edac.c
> index 11833c0a5d07..5634437bb39d 100644
> --- a/drivers/edac/aspeed_edac.c
> +++ b/drivers/edac/aspeed_edac.c
> @@ -281,15 +281,11 @@ static int aspeed_probe(struct platform_device *pdev)
>   struct device *dev = &pdev->dev;
>   struct edac_mc_layer layers[2];
>   struct mem_ctl_info *mci;
> - struct device_node *np;
>   struct resource *res;
>   void __iomem *regs;
>   u32 reg04;
>   int rc;
>  
> - /* setup regmap */
> - np = dev->of_node;
> -
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   if (!res)
>   return -ENOENT;
> -- 
> 2.17.1
> 
> 
>

RE: [EXT] Re: Issue: regmap: use debugfs even when no device

2019-05-28 Thread Andy Duan

From: Mark Brown  Sent: Tuesday, May 28, 2019 9:27 PM
> On Tue, May 28, 2019 at 02:20:15AM +, Andy Duan wrote:
> 
> > So on i.MX8MM/8QM/8QXP platforms, we caught an issue where a user dumping
> > regmap registers while the device is unpowered causes a system hang.
> > Maybe reverting the patch is more reasonable?
> 
> This is an issue with or without a device - you can have the same issue with
> devices that are powered off.  Typically where power is dynamic the driver
> will use a register cache so the registers are always available.

Correct, regmap without a device also has this issue when the power is off, because
regmap doesn't implement runtime PM for the device, although the device driver may
implement runtime PM for it.

So how should regmap manage the clock and power when registers are accessed through debugfs?

Andy

[PATCH] dm-init: fix 2 incorrect use of kstrndup()

2019-05-28 Thread Gen Zhang

In drivers/md/dm-init.c, kstrndup() is incorrectly used twice.

It should be: char *kstrndup(const char *s, size_t max, gfp_t gfp);

Signed-off-by: Gen Zhang 
---
diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
index 352e803..526e261 100644
--- a/drivers/md/dm-init.c
+++ b/drivers/md/dm-init.c
@@ -140,8 +140,8 @@ static char __init *dm_parse_table_entry(struct dm_device *dev, char *str)
return ERR_PTR(-EINVAL);
}
/* target_args */
-   dev->target_args_array[n] = kstrndup(field[3], GFP_KERNEL,
-DM_MAX_STR_SIZE);
+   dev->target_args_array[n] = kstrndup(field[3], DM_MAX_STR_SIZE,
+   GFP_KERNEL);
if (!dev->target_args_array[n])
return ERR_PTR(-ENOMEM);
 
@@ -275,7 +275,7 @@ static int __init dm_init_init(void)
DMERR("Argument is too big. Limit is %d\n", DM_MAX_STR_SIZE);
return -EINVAL;
}
-   str = kstrndup(create, GFP_KERNEL, DM_MAX_STR_SIZE);
+   str = kstrndup(create, DM_MAX_STR_SIZE, GFP_KERNEL);
if (!str)
return -ENOMEM;
 
---

[PATCH] wd719x: pass GFP_ATOMIC instead of GFP_KERNEL

2019-05-28 Thread Hariprasad Kelam

wd719x_chip_init() is called with interrupts disabled (under
spin_lock_irqsave()), so it must use GFP_ATOMIC instead
of GFP_KERNEL.

Issue identified by coccicheck

Signed-off-by: Hariprasad Kelam 
---
 drivers/scsi/wd719x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/wd719x.c b/drivers/scsi/wd719x.c
index c2f4006..f300fd7 100644
--- a/drivers/scsi/wd719x.c
+++ b/drivers/scsi/wd719x.c
@@ -319,7 +319,7 @@ static int wd719x_chip_init(struct wd719x *wd)
 
if (!wd->fw_virt)
wd->fw_virt = dma_alloc_coherent(&wd->pdev->dev, wd->fw_size,
-&wd->fw_phys, GFP_KERNEL);
+&wd->fw_phys, GFP_ATOMIC);
if (!wd->fw_virt) {
ret = -ENOMEM;
goto wd719x_init_end;
-- 
2.7.4

[v4, PATCH] net: stmmac: add support for hash table size 128/256 in dwmac4

2019-05-28 Thread Biao Huang

1. Get the hash table size from the HW feature register, and add support
for larger hash tables (128/256 bins) in dwmac4.
2. Only clear the GMAC_PACKET_FILTER bits used in this function,
to avoid side effects on the functions controlled by other bits.

Signed-off-by: Biao Huang 
---
 drivers/net/ethernet/stmicro/stmmac/common.h  |    7 +--
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |    4 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c |   49 -
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |    1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    4 ++
 5 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 1961fe9..26bbcd8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -335,6 +335,7 @@ struct dma_features {
/* 802.3az - Energy-Efficient Ethernet (EEE) */
unsigned int eee;
unsigned int av;
+   unsigned int hash_tb_sz;
unsigned int tsoen;
/* TX and RX csum */
unsigned int tx_coe;
@@ -428,9 +429,9 @@ struct mac_device_info {
struct mii_regs mii;/* MII register Addresses */
struct mac_link link;
void __iomem *pcsr; /* vpointer to device CSRs */
-   int multicast_filter_bins;
-   int unicast_filter_entries;
-   int mcast_bits_log2;
+   unsigned int multicast_filter_bins;
+   unsigned int unicast_filter_entries;
+   unsigned int mcast_bits_log2;
unsigned int rx_csum;
unsigned int pcs;
unsigned int pmt;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 01c1089..a37e09b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -18,8 +18,7 @@
 /*  MAC registers */
 #define GMAC_CONFIG 0x00000000
 #define GMAC_PACKET_FILTER 0x0008
-#define GMAC_HASH_TAB_0_31 0x0010
-#define GMAC_HASH_TAB_32_63 0x0014
+#define GMAC_HASH_TAB(x)   (0x10 + x * 4)
 #define GMAC_RX_FLOW_CTRL  0x0090
 #define GMAC_QX_TX_FLOW_CTRL(x) (0x70 + x * 4)
 #define GMAC_TXQ_PRTY_MAP0 0x98
@@ -184,6 +183,7 @@ enum power_event {
 #define GMAC_HW_FEAT_MIISEL BIT(0)
 
 /* MAC HW features1 bitmap */
+#define GMAC_HW_HASH_TB_SZ GENMASK(25, 24)
 #define GMAC_HW_FEAT_AVSEL BIT(20)
 #define GMAC_HW_TSOEN  BIT(18)
 #define GMAC_HW_TXFIFOSIZE GENMASK(10, 6)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 5e98da4..2544cff 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -403,41 +403,50 @@ static void dwmac4_set_filter(struct mac_device_info *hw,
  struct net_device *dev)
 {
void __iomem *ioaddr = (void __iomem *)dev->base_addr;
-   unsigned int value = 0;
+   int numhashregs = (hw->multicast_filter_bins >> 5);
+   int mcbitslog2 = hw->mcast_bits_log2;
+   unsigned int value;
+   int i;
 
+   value = readl(ioaddr + GMAC_PACKET_FILTER);
+   value &= ~GMAC_PACKET_FILTER_HMC;
+   value &= ~GMAC_PACKET_FILTER_HPF;
+   value &= ~GMAC_PACKET_FILTER_PCF;
+   value &= ~GMAC_PACKET_FILTER_PM;
+   value &= ~GMAC_PACKET_FILTER_PR;
if (dev->flags & IFF_PROMISC) {
value = GMAC_PACKET_FILTER_PR | GMAC_PACKET_FILTER_PCF;
} else if ((dev->flags & IFF_ALLMULTI) ||
-   (netdev_mc_count(dev) > HASH_TABLE_SIZE)) {
+  (netdev_mc_count(dev) > hw->multicast_filter_bins)) {
/* Pass all multi */
-   value = GMAC_PACKET_FILTER_PM;
-   /* Set the 64 bits of the HASH tab. To be updated if taller
-* hash table is used
-*/
-   writel(0xffffffff, ioaddr + GMAC_HASH_TAB_0_31);
-   writel(0xffffffff, ioaddr + GMAC_HASH_TAB_32_63);
+   value |= GMAC_PACKET_FILTER_PM;
+   /* Set all the bits of the HASH tab */
+   for (i = 0; i < numhashregs; i++)
+   writel(0xffffffff, ioaddr + GMAC_HASH_TAB(i));
} else if (!netdev_mc_empty(dev)) {
-   u32 mc_filter[2];
+   u32 mc_filter[8];
struct netdev_hw_addr *ha;
 
/* Hash filter for multicast */
-   value = GMAC_PACKET_FILTER_HMC;
+   value |= GMAC_PACKET_FILTER_HMC;
 
memset(mc_filter, 0, sizeof(mc_filter));
netdev_for_each_mc_addr(ha, dev) {
-   /* The upper 6 bits of the calculated CRC are used to
-* index the content of the Hash Table Reg 0 and

[v4, PATCH] add some features in stmmac

2019-05-28 Thread Biao Huang

Changes in v4:
retain the reverse xmas tree ordering.

Changes in v3:
rewrite the patch based on the series in
https://patchwork.ozlabs.org/project/netdev/list/?series=109699

Changes in v2:
1. use reverse Christmas tree order in dwmac4_set_filter.
2. remove the clause 45 patch, waiting for the cl45 patch from Boon Leong.

v1:
This series adds some features to the stmmac driver.
1. add support for hash table size 128/256
2. add mdio clause 45 access from mac device for dwmac4.

Biao Huang (1): 
  net: stmmac: add support for hash table size 128/256 in dwmac4

 drivers/net/ethernet/stmicro/stmmac/common.h  |    7 +--
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |    4 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c |   49 -
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |    1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    4 ++
 5 files changed, 40 insertions(+), 25 deletions(-)

--  
1.7.9.5

Re: [PATCH] perf: Fix oops when kthread execs user process

2019-05-28 Thread Michael Ellerman

Peter Zijlstra  writes:
> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
>> When a kthread calls call_usermodehelper() the steps are:
>>   1. allocate current->mm
>>   2. load_elf_binary()
>>   3. populate current->thread.regs
>> 
>> While doing this, interrupts are not disabled. If there is a perf
>> interrupt in the middle of this process (i.e. step 1 has completed
>> but step 3 has not yet been reached) and perf tries to read userspace
>> regs, the kernel oopses.
>> 
>> Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
>> pt_regs are not set.
>> 
>> See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
>> user process") for details.
>
> Why the hell do we set current->mm before it is complete? Note that
> normally exec() builds the new mm before attaching it, see exec_mmap()
> in flush_old_exec().
>
> Also, why did those PPC folks 'fix' this in isolation? And why didn't
> you Cc them?

We just assumed it was our bug, 'cause we have plenty of those :)

cheers

Re: [PATCH RESEND 2/7] csky: entry: Remove unneeded need_resched() loop

2019-05-28 Thread Guo Ren

Thx Valentin,

You are right. Approved.

Best Regards
 Guo Ren

On Tue, May 28, 2019 at 11:48:43AM +0100, Valentin Schneider wrote:
> Since the enabling and disabling of IRQs within preempt_schedule_irq()
> is contained in a need_resched() loop, we don't need the outer arch
> code loop.
> 
> Signed-off-by: Valentin Schneider 
> Cc: Guo Ren 
> ---
>  arch/csky/kernel/entry.S | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/csky/kernel/entry.S b/arch/csky/kernel/entry.S
> index a7e84bd8..679afbcc2001 100644
> --- a/arch/csky/kernel/entry.S
> +++ b/arch/csky/kernel/entry.S
> @@ -292,11 +292,7 @@ ENTRY(csky_irq)
>   ldw r8, (r9, TINFO_FLAGS)
>   btsti   r8, TIF_NEED_RESCHED
>   bf  2f
> -1:
>   jbsr    preempt_schedule_irq    /* irq en/disable is done inside */
> - ldw r7, (r9, TINFO_FLAGS)   /* get new tasks TI_FLAGS */
> - btsti   r7, TIF_NEED_RESCHED
> - bt  1b  /* go again */
>  #endif
>  2:
>   jmpi    ret_from_exception
> -- 
> 2.20.1
>

[PATCH] wcd9335: fix a incorrect use of kstrndup()

2019-05-28 Thread Gen Zhang

In wcd9335_codec_enable_dec(), 'widget_name' is allocated by kstrndup().
However, according to the kstrndup() documentation: "Note: Use kmemdup_nul()
instead if the size is known exactly." So kmemdup_nul() should be used here
instead of kstrndup().

Signed-off-by: Gen Zhang 
---
diff --git a/sound/soc/codecs/wcd9335.c b/sound/soc/codecs/wcd9335.c
index a04a7ce..85737fe 100644
--- a/sound/soc/codecs/wcd9335.c
+++ b/sound/soc/codecs/wcd9335.c
@@ -2734,7 +2734,7 @@ static int wcd9335_codec_enable_dec(struct snd_soc_dapm_widget *w,
char *dec;
u8 hpf_coff_freq;
 
-   widget_name = kstrndup(w->name, 15, GFP_KERNEL);
+   widget_name = kmemdup_nul(w->name, 15, GFP_KERNEL);
if (!widget_name)
return -ENOMEM;
 
---

[PATCH] intel_menlow: avoid null pointer deference error

2019-05-28 Thread Young Xiao

Fix a NULL pointer dereference in acpi_driver_data() when device is
NULL (device was dereferenced before being checked). Only set cdev and
check it after making sure that device is not NULL.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/platform/x86/intel_menlow.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/intel_menlow.c b/drivers/platform/x86/intel_menlow.c
index 77eb870..28feb5c 100644
--- a/drivers/platform/x86/intel_menlow.c
+++ b/drivers/platform/x86/intel_menlow.c
@@ -180,9 +180,13 @@ static int intel_menlow_memory_add(struct acpi_device *device)
 
 static int intel_menlow_memory_remove(struct acpi_device *device)
 {
-   struct thermal_cooling_device *cdev = acpi_driver_data(device);
+   struct thermal_cooling_device *cdev;
+
+   if (!device)
+   return -EINVAL;
 
-   if (!device || !cdev)
+   cdev = acpi_driver_data(device);
+   if (!cdev)
return -EINVAL;
 
sysfs_remove_link(&device->dev.kobj, "thermal_cooling");
-- 
2.7.4

Re: [PATCH v2 2/7] drivers/soc: Add Aspeed XDMA Engine Driver

2019-05-28 Thread Andrew Jeffery




On Sat, 25 May 2019, at 01:39, Eddie James wrote:
> 
> On 5/21/19 7:02 AM, Arnd Bergmann wrote:
> > On Mon, May 20, 2019 at 10:19 PM Eddie James  wrote:
> >> diff --git a/include/uapi/linux/aspeed-xdma.h b/include/uapi/linux/aspeed-xdma.h
> >> new file mode 100644
> >> index 000..2a4bd13
> >> --- /dev/null
> >> +++ b/include/uapi/linux/aspeed-xdma.h
> >> @@ -0,0 +1,26 @@
> >> +/* SPDX-License-Identifier: GPL-2.0+ */
> >> +/* Copyright IBM Corp 2019 */
> >> +
> >> +#ifndef _UAPI_LINUX_ASPEED_XDMA_H_
> >> +#define _UAPI_LINUX_ASPEED_XDMA_H_
> >> +
> >> +#include 
> >> +
> >> +/*
> >> + * aspeed_xdma_op
> >> + *
> >> + * upstream: boolean indicating the direction of the DMA operation; upstream
> >> + *   means a transfer from the BMC to the host
> >> + *
> >> + * host_addr: the DMA address on the host side, typically configured by PCI
> >> + *subsystem
> >> + *
> >> + * len: the size of the transfer in bytes; it should be a multiple of 16 bytes
> >> + */
> >> +struct aspeed_xdma_op {
> >> +   __u32 upstream;
> >> +   __u64 host_addr;
> >> +   __u32 len;
> >> +};
> >> +
> >> +#endif /* _UAPI_LINUX_ASPEED_XDMA_H_ */
> > If this is a user space interface, please remove the holes in the
> > data structure.
> 
> 
> Surely it's 4-byte aligned and there won't be holes??

__u64 is 8-byte aligned, so you have a hole after upstream.

Easiest just to put upstream after len?

Andrew

Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"

2019-05-28 Thread David Rientjes

On Fri, 24 May 2019, Andrea Arcangeli wrote:

> > > We are going in circles, *yes* there is a problem for potential swap
> > > storms today because of the poor interaction between memory compaction
> > > and directed reclaim but this is a result of a poor API that does not
> > > allow userspace to specify that its workload really will span multiple
> > > sockets so faulting remotely is the best course of action.  The fix is
> > > not to cause regressions for others who have implemented a userspace
> > > stack that is based on the past 3+ years of long standing behavior or
> > > for specialized workloads where it is known that it spans multiple
> > > sockets so we want some kind of different behavior.  We need to provide
> > > a clear and stable API to define these terms for the page allocator
> > > that is independent of any global setting of thp enabled, defrag,
> > > zone_reclaim_mode, etc.  It's workload dependent.
> > 
> > um, who is going to do this work?
> 
> That's a good question. It's not going to be a simple patch to
> backport to -stable: it'll be intrusive and it will affect
> mm/page_alloc.c significantly, so it will have heavy rejects. I wouldn't
> consider it -stable material, at least in the short term; it will
> require some testing.
> 

Hi Andrea,

I'm not sure what patch you're referring to, unfortunately.  The above 
comment was referring to APIs that are made available to userspace to 
define when to fault locally vs remotely and what the preference should be 
for any form of compaction or reclaim to achieve that.  Today we have 
global enabling options, global defrag settings, enabling prctls, and 
madvise options.  The point it makes is that whether a specific workload 
fits into a single socket is workload dependent and thus we are left with 
prctls and madvise options.  The prctl either enables thp or it doesn't, 
it is not interesting here; the madvise is overloaded in four different 
ways (enabling, stalling at fault, collapsability, defrag) so it's not 
surprising that continuing to overload it for existing users will cause 
undesired results.  It makes an argument that we need a clear and stable 
means of defining the behavior, not changing the 4+ year behavior and 
giving those who regress no workaround.

> This is why applying a simple fix that avoids the swap storms (and the
> swap-less pathological THP regression for vfio device assignment GUP
> pinning) is preferable before adding an alloc_pages_multi_order (or
> equivalent) so that it'll be the allocator that will decide when
> exactly to fallback from 2M to 4k depending on the NUMA distance and
> memory availability during the zonelist walk. The basic idea is to
> call alloc_pages just once (not first for 2M and then for 4k) and
> alloc_pages will decide which page "order" to return.
> 

The commit description doesn't mention the swap storms that you're trying 
to fix, it's probably better to describe that again and why it is not 
beneficial to swap unless an entire pageblock can become free or memory 
compaction has indicated that additional memory freeing would allow 
migration to make an entire pageblock free.  I understand that's an 
invasive code change, but merging this patch changes the 4+ year behavior 
that started here:

commit 077fcf116c8c2bd7ee9487b645aa3b50368db7e1
Author: Aneesh Kumar K.V 
Date:   Wed Feb 11 15:27:12 2015 -0800

mm/thp: allocate transparent hugepages on local node

And that commit's description describes quite well the regression that we 
encounter if we remove __GFP_THISNODE here.  That's because the access 
latency regression is much more substantial than what was reported for 
Naples in your changelog.

In the interest of making forward progress, can we agree that swapping 
from the local node *never* makes sense unless we can show that an entire 
pageblock can become free or that it enables memory compaction to migrate 
memory that can make an entire pageblock free?  Are you reporting swap 
storms for the local node when one of these is true?

> > Implementing a new API doesn't help existing userspace which is hurting
> > from the problem which this patch addresses.
> 
> Yes, we can't change all apps that may not fit in a single NUMA
> node. Currently it's unsafe to turn "transparent_hugepages/defrag =
> always" or the bad behavior can then materialize also outside of
> MADV_HUGEPAGE. Those apps that use MADV_HUGEPAGE on their long lived
> allocations (i.e. guest physical memory) like qemu are affected even
> with the default "defrag = madvise". Those apps have been using
> MADV_HUGEPAGE for more than 3 years and they are widely used and open
> source of course.
> 

I continue to reiterate that the 4+ year long standing behavior of 
MADV_HUGEPAGE is overloaded; you are anticipating a specific behavior for 
workloads that do not fit in a single NUMA node whereas other users 
developed in the past four years are anticipating a different behavior.  
I'm trying to

Re: [PATCH v2 1/3] KVM: x86: add support for user wait instructions

2019-05-28 Thread Tao Xu




On 29/05/2019 09:24, Paolo Bonzini wrote:

On 24/05/19 09:56, Tao Xu wrote:

+7.19 KVM_CAP_ENABLE_USR_WAIT_PAUSE
+
+Architectures: x86
+Parameters: args[0] whether feature should be enabled or not
+
+With this capability enabled, a VM can use UMONITOR, UMWAIT and TPAUSE
+instructions. If the instruction causes a delay, the amount of
+time delayed is called here the physical delay. The physical delay is
+first computed by determining the virtual delay (the time to delay
relative to the VM’s timestamp counter). Without this capability, UMONITOR, UMWAIT
and TPAUSE cause an invalid-opcode exception (#UD).
+


There is no need to make it a capability.  You can just check the guest
CPUID and see if it includes X86_FEATURE_WAITPKG.

Paolo



Thank you Paolo, but I have another question. Would it be appropriate to
enable X86_FEATURE_WAITPKG when QEMU uses "-overcommit cpu-pm=on"? Or
should X86_FEATURE_WAITPKG only be enabled when QEMU adds the feature with
"-cpu host,+waitpkg"? The user wait instructions are wait or pause
instructions that may be executed at any privilege level, but
IA32_UMWAIT_CONTROL can be used to limit the maximum wait time.

Re: [PATCH net-next 1/5] timecounter: Add helper for reconstructing partial timestamps

2019-05-28 Thread John Stultz

On Tue, May 28, 2019 at 4:58 PM Vladimir Oltean  wrote:
>
> Some PTP hardware offers a 64-bit free-running counter whose snapshots
> are used for timestamping, but only makes part of that snapshot
> available as timestamps (low-order bits).
>
> In that case, timecounter/cyclecounter users must bring the cyclecounter
> and timestamps to the same bit width, and they currently have two
> options of doing so:
>
> - Trim the higher bits of the timecounter itself to the number of bits
>   of the timestamps.  This might work for some setups, but if the
>   wraparound of the timecounter in this case becomes high (~10 times per
>   second) then this causes additional strain on the system, which must
>   read the clock that often just to avoid missing the wraparounds.
>
> - Reconstruct the timestamp by racing to read the PTP time within one
>   wraparound cycle since the timestamp was generated.  This is
>   preferable when the wraparound time is small (do a time-critical
>   readout once vs doing it periodically), and it has no drawback even
>   when the wraparound is comfortably sized.
>
> Signed-off-by: Vladimir Oltean 
> ---
>  include/linux/timecounter.h |  7 +++
>  kernel/time/timecounter.c   | 33 +
>  2 files changed, 40 insertions(+)
>
> diff --git a/include/linux/timecounter.h b/include/linux/timecounter.h
> index 2496ad4cfc99..03eab1f3bb9c 100644
> --- a/include/linux/timecounter.h
> +++ b/include/linux/timecounter.h
> @@ -30,6 +30,9 @@
>   * by the implementor and user of specific instances of this API.
>   *
>   * @read:  returns the current cycle value
> + * @partial_tstamp_mask:bitmask in case the hardware emits timestamps
> + * which only capture low-order bits of the full
> + * counter, and should be reconstructed.
>   * @mask:  bitmask for two's complement
>   * subtraction of non 64 bit counters,
>   * see CYCLECOUNTER_MASK() helper macro
> @@ -38,6 +41,7 @@
>   */
>  struct cyclecounter {
> u64 (*read)(const struct cyclecounter *cc);
> +   u64 partial_tstamp_mask;
> u64 mask;
> u32 mult;
> u32 shift;
> @@ -136,4 +140,7 @@ extern u64 timecounter_read(struct timecounter *tc);
>  extern u64 timecounter_cyc2time(struct timecounter *tc,
> u64 cycle_tstamp);
>
> +extern u64 cyclecounter_reconstruct(const struct cyclecounter *cc,
> +   u64 ts_partial);
> +
>  #endif
> diff --git a/kernel/time/timecounter.c b/kernel/time/timecounter.c
> index 85b98e727306..d4657d64e38d 100644
> --- a/kernel/time/timecounter.c
> +++ b/kernel/time/timecounter.c
> @@ -97,3 +97,36 @@ u64 timecounter_cyc2time(struct timecounter *tc,
> return nsec;
>  }
>  EXPORT_SYMBOL_GPL(timecounter_cyc2time);
> +
> +/**
> + * cyclecounter_reconstruct - reconstructs @ts_partial
> + * @cc: Pointer to cycle counter.
> + * @ts_partial: Typically RX or TX NIC timestamp, provided by hardware as
> + * the lower @partial_tstamp_mask bits of the cycle counter,
> + * sampled at the time the timestamp was collected.
> + * To reconstruct into a full @mask bit-wide timestamp, the
> + * cycle counter is read and the high-order bits (up to @mask) are
> + * filled in.
> + * Must be called within one wraparound of @partial_tstamp_mask
> + * bits of the cycle counter.
> + */
> +u64 cyclecounter_reconstruct(const struct cyclecounter *cc, u64 ts_partial)
> +{
> +   u64 ts_reconstructed;
> +   u64 cycle_now;
> +
> +   cycle_now = cc->read(cc);
> +
> +   ts_reconstructed = (cycle_now & ~cc->partial_tstamp_mask) |
> +   ts_partial;
> +
> +   /* Check lower bits of current cycle counter against the timestamp.
> +* If the current cycle counter is lower than the partial timestamp,
> +* then wraparound surely occurred and must be accounted for.
> +*/
> +   if ((cycle_now & cc->partial_tstamp_mask) <= ts_partial)
> +   ts_reconstructed -= (cc->partial_tstamp_mask + 1);
> +
> +   return ts_reconstructed;
> +}
> +EXPORT_SYMBOL_GPL(cyclecounter_reconstruct);

Hrm. Is this actually generic? Would it make more sense to have the
specific implementations with this quirk implement this in their
read() handler? If not, why?

thanks
-john

[PATCH] falcon: pass valid pointer from ef4_enqueue_unwind.

2019-05-28 Thread Young Xiao

The bytes_compl and pkts_compl pointers passed to ef4_dequeue_buffers
cannot be NULL. Add a paranoid warning to check this condition and fix
the one case where they were NULL.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/net/ethernet/sfc/falcon/tx.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/falcon/tx.c b/drivers/net/ethernet/sfc/falcon/tx.c
index c5059f4..ed89bc6 100644
--- a/drivers/net/ethernet/sfc/falcon/tx.c
+++ b/drivers/net/ethernet/sfc/falcon/tx.c
@@ -69,6 +69,7 @@ static void ef4_dequeue_buffer(struct ef4_tx_queue *tx_queue,
}
 
if (buffer->flags & EF4_TX_BUF_SKB) {
+   EF4_WARN_ON_PARANOID(!pkts_compl || !bytes_compl);
(*pkts_compl)++;
(*bytes_compl) += buffer->skb->len;
dev_consume_skb_any((struct sk_buff *)buffer->skb);
@@ -271,12 +272,14 @@ static int ef4_tx_map_data(struct ef4_tx_queue *tx_queue, struct sk_buff *skb)
 static void ef4_enqueue_unwind(struct ef4_tx_queue *tx_queue)
 {
struct ef4_tx_buffer *buffer;
+   unsigned int bytes_compl = 0;
+   unsigned int pkts_compl = 0;
 
/* Work backwards until we hit the original insert pointer value */
while (tx_queue->insert_count != tx_queue->write_count) {
--tx_queue->insert_count;
buffer = __ef4_tx_queue_get_insert_buffer(tx_queue);
-   ef4_dequeue_buffer(tx_queue, buffer, NULL, NULL);
+   ef4_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
}
 }
 
-- 
2.7.4

[PATCH net-next v3 4/5] net: stmmac: add xPCS functions for device with DWMACv5.1

2019-05-28 Thread Voon Weifeng

From: Ong Boon Leong 

This adds support for devices that have the v5.10 IP and also use
xPCS as the MMD. It can easily be enabled for other products that
integrate xPCS but do not use the v5.00 IP.

Reviewed-by: Chuah Kim Tatt 
Reviewed-by: Voon Weifeng 
Reviewed-by: Kweh Hock Leong 
Reviewed-by: Baoli Zhang 
Signed-off-by: Ong Boon Leong 
Signed-off-by: Voon Weifeng 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 33 ++
 drivers/net/ethernet/stmicro/stmmac/hwif.c        | 41 ++-
 drivers/net/ethernet/stmicro/stmmac/hwif.h        |  2 ++
 3 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index b4bb5629de38..34f05068142e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -801,6 +801,39 @@ static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x,
.flex_pps_config = dwmac5_flex_pps_config,
 };
 
+const struct stmmac_ops dwmac510_xpcs_ops = {
+   .core_init = dwmac4_core_init,
+   .set_mac = stmmac_dwmac4_set_mac,
+   .rx_ipc = dwmac4_rx_ipc_enable,
+   .rx_queue_enable = dwmac4_rx_queue_enable,
+   .rx_queue_prio = dwmac4_rx_queue_priority,
+   .tx_queue_prio = dwmac4_tx_queue_priority,
+   .rx_queue_routing = dwmac4_rx_queue_routing,
+   .prog_mtl_rx_algorithms = dwmac4_prog_mtl_rx_algorithms,
+   .prog_mtl_tx_algorithms = dwmac4_prog_mtl_tx_algorithms,
+   .set_mtl_tx_queue_weight = dwmac4_set_mtl_tx_queue_weight,
+   .map_mtl_to_dma = dwmac4_map_mtl_dma,
+   .config_cbs = dwmac4_config_cbs,
+   .dump_regs = dwmac4_dump_regs,
+   .host_irq_status = dwmac4_irq_status,
+   .host_mtl_irq_status = dwmac4_irq_mtl_status,
+   .flow_ctrl = dwmac4_flow_ctrl,
+   .pmt = dwmac4_pmt,
+   .set_umac_addr = dwmac4_set_umac_addr,
+   .get_umac_addr = dwmac4_get_umac_addr,
+   .set_eee_mode = dwmac4_set_eee_mode,
+   .reset_eee_mode = dwmac4_reset_eee_mode,
+   .set_eee_timer = dwmac4_set_eee_timer,
+   .set_eee_pls = dwmac4_set_eee_pls,
+   .debug = dwmac4_debug,
+   .set_filter = dwmac4_set_filter,
+   .safety_feat_config = dwmac5_safety_feat_config,
+   .safety_feat_irq_status = dwmac5_safety_feat_irq_status,
+   .safety_feat_dump = dwmac5_safety_feat_dump,
+   .rxp_config = dwmac5_rxp_config,
+   .flex_pps_config = dwmac5_flex_pps_config,
+};
+
 int dwmac4_setup(struct stmmac_priv *priv)
 {
struct mac_device_info *mac = priv->hw;
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 81b966a8261b..f1cb3ce165e5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -73,11 +73,13 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
bool gmac;
bool gmac4;
bool xgmac;
+   bool has_xpcs;
u32 min_id;
const struct stmmac_regs_off regs;
const void *desc;
const void *dma;
const void *mac;
+   const void *xpcs;
const void *hwtimestamp;
const void *mode;
const void *tc;
@@ -89,6 +91,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.gmac = false,
.gmac4 = false,
.xgmac = false,
+   .has_xpcs = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -97,6 +100,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.desc = NULL,
.dma = _dma_ops,
.mac = _ops,
+   .xpcs = NULL,
.hwtimestamp = _ptp,
.mode = NULL,
.tc = NULL,
@@ -106,6 +110,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.gmac = true,
.gmac4 = false,
.xgmac = false,
+   .has_xpcs = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -114,6 +119,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.desc = NULL,
.dma = _dma_ops,
.mac = _ops,
+   .xpcs = NULL,
.hwtimestamp = _ptp,
.mode = NULL,
.tc = NULL,
@@ -123,6 +129,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.gmac = false,
.gmac4 = true,
.xgmac = false,
+   .has_xpcs = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -130,6 +137,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
},
.desc = _desc_ops,
.dma = _dma_ops,
+

[PATCH net-next v3 3/5] net: stmmac: add xpcs function hooks into main driver and ethtool

2019-05-28 Thread Voon Weifeng

From: Ong Boon Leong 

With the xPCS functions now ready, we hook them into the main driver and
ethtool logic. To differentiate between the EQoS MAC PCS and the DWC
Ethernet xPCS, we introduce 'has_xpcs' in the platform data as a means to
indicate whether the GbE controller includes an xPCS.

To support platform-specific C37 AN PCS mode selection for MII MMD,
we introduce 'pcs_mode' in platform data.

The basic framework for xPCS interrupt handling is implemented too.

Reviewed-by: Chuah Kim Tatt 
Reviewed-by: Voon Weifeng 
Reviewed-by: Kweh Hock Leong 
Reviewed-by: Baoli Zhang 
Signed-off-by: Ong Boon Leong 
Signed-off-by: Voon Weifeng 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h   |   2 +
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  50 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 152 -
 include/linux/stmmac.h |   2 +
 4 files changed, 158 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index dd95d959c1ce..0b8460a4a220 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -36,6 +36,7 @@ struct stmmac_resources {
const char *mac;
int wol_irq;
int lpi_irq;
+   int xpcs_irq;
int irq;
 };
 
@@ -168,6 +169,7 @@ struct stmmac_priv {
int clk_csr;
struct timer_list eee_ctrl_timer;
int lpi_irq;
+   int xpcs_irq;
int eee_enabled;
int eee_active;
int tx_lpi_timer;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index e09522c5509a..f0815d196147 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -28,6 +28,7 @@
 
 #include "stmmac.h"
 #include "dwmac_dma.h"
+#include "dwxpcs.h"
 
 #define REG_SPACE_SIZE 0x1060
 #define MAC100_ETHTOOL_NAME "st_mac100"
@@ -277,7 +278,8 @@ static int stmmac_ethtool_get_link_ksettings(struct net_device *dev,
struct phy_device *phy = dev->phydev;
 
if (priv->hw->pcs & STMMAC_PCS_RGMII ||
-   priv->hw->pcs & STMMAC_PCS_SGMII) {
+   priv->hw->pcs & STMMAC_PCS_SGMII ||
+   priv->plat->pcs_mode == AN_CTRL_PCS_MD_C37_1000BASEX) {
struct rgmii_adv adv;
u32 supported, advertising, lp_advertising;
 
@@ -294,6 +296,11 @@ static int stmmac_ethtool_get_link_ksettings(struct net_device *dev,
if (stmmac_pcs_get_adv_lp(priv, priv->ioaddr, &adv))
return -EOPNOTSUPP; /* should never happen indeed */
 
+   /* Getting ADV & LPA is only applicable to 1000BASE-X C37.
+* For MAC side SGMII AN, get ADV & LPA from the PHY.
+*/
+   stmmac_xpcs_get_adv_lp(priv, dev, &adv, priv->plat->pcs_mode);
+
/* Encoding of PSE bits is defined in 802.3z, 37.2.1.4 */
 
ethtool_convert_link_mode_to_legacy_u32(
@@ -376,22 +383,23 @@ static int stmmac_ethtool_get_link_ksettings(struct net_device *dev,
int rc;
 
if (priv->hw->pcs & STMMAC_PCS_RGMII ||
-   priv->hw->pcs & STMMAC_PCS_SGMII) {
-   u32 mask = ADVERTISED_Autoneg | ADVERTISED_Pause;
-
+   priv->hw->pcs & STMMAC_PCS_SGMII ||
+   priv->plat->pcs_mode == AN_CTRL_PCS_MD_C37_1000BASEX) {
/* Only support ANE */
if (cmd->base.autoneg != AUTONEG_ENABLE)
return -EINVAL;
 
-   mask &= (ADVERTISED_1000baseT_Half |
-   ADVERTISED_1000baseT_Full |
-   ADVERTISED_100baseT_Half |
-   ADVERTISED_100baseT_Full |
-   ADVERTISED_10baseT_Half |
-   ADVERTISED_10baseT_Full);
-
mutex_lock(&priv->lock);
stmmac_pcs_ctrl_ane(priv, priv->ioaddr, 1, priv->hw->ps, 0);
+
+   /* For 1000BASE-X C37 AN, the speed is always 1000Mbps, and we only
+* support FD, which is set by default in SR_MII_AN_ADV
+* during XPCS init, so we don't need to set FD again.
+* For SGMII C37 AN, we let the user change link settings
+* through the PHY since it is MAC side SGMII.
+*/
+   stmmac_xpcs_ctrl_ane(priv, dev, 1, 0);
+
mutex_unlock(&priv->lock);
 
return 0;
@@ -457,6 +465,16 @@ static void stmmac_ethtool_gregs(struct net_device *dev,
pause->autoneg = 1;
if (!adv_lp.pause)
return;
+   } else if (priv->plat->pcs_mode == AN_CTRL_PCS_MD_C37_1000BASEX &&
+  !stmmac_xpcs_get_adv_lp(priv, netdev, &adv_lp,
+  priv->plat->pcs_mode)) {
+   /* DW xPCS 1000BASE-X C37 AN mode only because for MAC side
+

[PATCH net-next v3 1/5] net: stmmac: enable clause 45 mdio support

2019-05-28 Thread Voon Weifeng

From: Kweh Hock Leong 

DWMAC4 is capable of clause 45 MDIO communication.
This patch enables the feature in stmmac_mdio_write() and
stmmac_mdio_read(), following the phy_write_mmd() and
phy_read_mmd() mdiobus read/write implementation format.
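To make the Clause 45 encoding concrete, here is a minimal userspace sketch of how a C45 regnum is packed by an mdiobus caller and then split by stmmac_mdio_c45_setup() into the GMAC4 MDIO address and data words. The constant values are assumptions: MII_ADDR_C45, MII_DEVADDR_C45_SHIFT and MII_REGADDR_C45_MASK are taken to match include/linux/phy.h of this era, and the register field (bits 20:16) is assumed to match the GMAC4 layout set up in dwmac4_setup(). GOC, clock and PHY-address bits are left out, and c45_regnum() and gmac4_c45_words() are hypothetical helpers, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror include/linux/phy.h. */
#define MII_ADDR_C45             (1u << 30)
#define MII_DEVADDR_C45_SHIFT    16
#define MII_REGADDR_C45_MASK     0xffffu

/* Assumed GMAC4 MDIO address-register layout (hypothetical copy). */
#define MII_BUSY                 0x0001u
#define MII_GMAC4_C45E           (1u << 1)
#define GMAC4_REG_SHIFT          16
#define GMAC4_REG_MASK           (0x1fu << GMAC4_REG_SHIFT) /* bits 20:16 */
#define MII_GMAC4_REG_ADDR_SHIFT 16

/* Hypothetical: build the regnum an mdiobus caller passes for C45. */
static uint32_t c45_regnum(uint32_t devad, uint32_t reg)
{
	assert(devad < 32); /* MMD device addresses are 5 bits wide */
	return MII_ADDR_C45 | (devad << MII_DEVADDR_C45_SHIFT) |
	       (reg & MII_REGADDR_C45_MASK);
}

/* Hypothetical model of the words stmmac_mdio_read() would program. */
static void gmac4_c45_words(uint32_t phyreg, uint32_t *val, uint32_t *data)
{
	*val = MII_BUSY;
	*data = 0;
	/* Clause 22 path: low bits of phyreg land in bits 20:16. */
	*val |= (phyreg << GMAC4_REG_SHIFT) & GMAC4_REG_MASK;

	/* stmmac_mdio_c45_setup(): flag C45, replace the C22 register
	 * field with the MMD (devad), move the register into data. */
	*val |= MII_GMAC4_C45E;
	*val &= ~GMAC4_REG_MASK;
	*val |= ((phyreg >> MII_DEVADDR_C45_SHIFT) << GMAC4_REG_SHIFT) &
		GMAC4_REG_MASK;
	*data |= (phyreg & MII_REGADDR_C45_MASK) << MII_GMAC4_REG_ADDR_SHIFT;
}
```

For example, MMD 31 with register 0x8001 packs to 0x401F8001; the model then yields an address word of 0x001F0003 (devad in bits 20:16, C45E and BUSY set) and a data word of 0x80010000 (register address in bits 31:16). Clearing reg_mask before inserting the devad matters because the Clause 22 path has already stored stray register bits there.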

Reviewed-by: Li, Yifan 
Signed-off-by: Kweh Hock Leong 
Signed-off-by: Ong Boon Leong 
Signed-off-by: Weifeng Voon 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 40 ++-
 include/linux/phy.h   |  2 ++
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index bdd351597b55..c3d8f1d145ec 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -34,11 +34,27 @@
 
 #define MII_BUSY 0x0001
 #define MII_WRITE 0x0002
+#define MII_DATA_MASK GENMASK(15, 0)
 
 /* GMAC4 defines */
 #define MII_GMAC4_GOC_SHIFT 2
+#define MII_GMAC4_REG_ADDR_SHIFT   16
 #define MII_GMAC4_WRITE (1 << MII_GMAC4_GOC_SHIFT)
 #define MII_GMAC4_READ (3 << MII_GMAC4_GOC_SHIFT)
+#define MII_GMAC4_C45E BIT(1)
+
+static void stmmac_mdio_c45_setup(struct stmmac_priv *priv, int phyreg,
+ u32 *val, u32 *data)
+{
+   unsigned int reg_shift = priv->hw->mii.reg_shift;
+   unsigned int reg_mask = priv->hw->mii.reg_mask;
+
+   *val |= MII_GMAC4_C45E;
+   *val &= ~reg_mask;
+   *val |= ((phyreg >> MII_DEVADDR_C45_SHIFT) << reg_shift) & reg_mask;
+
+   *data |= (phyreg & MII_REGADDR_C45_MASK) << MII_GMAC4_REG_ADDR_SHIFT;
+}
 
 /* XGMAC defines */
 #define MII_XGMAC_SADDR BIT(18)
@@ -165,22 +181,26 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
struct stmmac_priv *priv = netdev_priv(ndev);
unsigned int mii_address = priv->hw->mii.addr;
unsigned int mii_data = priv->hw->mii.data;
-   u32 v;
-   int data;
u32 value = MII_BUSY;
+   int data = 0;
+   u32 v;
 
value |= (phyaddr << priv->hw->mii.addr_shift)
& priv->hw->mii.addr_mask;
value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
& priv->hw->mii.clk_csr_mask;
-   if (priv->plat->has_gmac4)
+   if (priv->plat->has_gmac4) {
value |= MII_GMAC4_READ;
+   if (phyreg & MII_ADDR_C45)
+   stmmac_mdio_c45_setup(priv, phyreg, &value, &data);
+   }
 
if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
   100, 1))
return -EBUSY;
 
+   writel(data, priv->ioaddr + mii_data);
writel(value, priv->ioaddr + mii_address);
 
if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
@@ -188,7 +208,7 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
return -EBUSY;
 
/* Read the data from the MII data register */
-   data = (int)readl(priv->ioaddr + mii_data);
+   data = (int)readl(priv->ioaddr + mii_data) & MII_DATA_MASK;
 
return data;
 }
@@ -208,8 +228,9 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
struct stmmac_priv *priv = netdev_priv(ndev);
unsigned int mii_address = priv->hw->mii.addr;
unsigned int mii_data = priv->hw->mii.data;
-   u32 v;
u32 value = MII_BUSY;
+   int data = phydata;
+   u32 v;
 
value |= (phyaddr << priv->hw->mii.addr_shift)
& priv->hw->mii.addr_mask;
@@ -217,10 +238,13 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
 
value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
& priv->hw->mii.clk_csr_mask;
-   if (priv->plat->has_gmac4)
+   if (priv->plat->has_gmac4) {
value |= MII_GMAC4_WRITE;
-   else
+   if (phyreg & MII_ADDR_C45)
+   stmmac_mdio_c45_setup(priv, phyreg, &value, &data);
+   } else {
value |= MII_WRITE;
+   }
 
/* Wait until any existing MII operation is complete */
if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
@@ -228,7 +252,7 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
return -EBUSY;
 
/* Set the MII address register to write */
-   writel(phydata, priv->ioaddr + mii_data);
+   writel(data, priv->ioaddr + mii_data);
writel(value, priv->ioaddr + mii_address);
 
/* Wait until any existing MII operation is complete */
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 073fb151b5a9..d3daac8ec686 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -198,6 +198,8 @@

[PATCH net-next v3 5/5] net: stmmac: add EHL SGMII 1Gbps PCI info and PCI ID

2019-05-28 Thread Voon Weifeng

Add the EHL SGMII 1Gbps PCI ID. Each MII type and speed combination
has a different PCI ID.

Signed-off-by: Voon Weifeng 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 111 +++
 1 file changed, 111 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index 7cbc01f316fa..f2225c1eafc2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -23,6 +23,7 @@
 #include 
 
 #include "stmmac.h"
+#include "dwxpcs.h"
 
 /*
  * This struct is used to associate PCI Function of MAC controller on a board,
@@ -118,6 +119,113 @@ static int stmmac_default_data(struct pci_dev *pdev,
.setup = stmmac_default_data,
 };
 
+static int ehl_common_data(struct pci_dev *pdev,
+  struct plat_stmmacenet_data *plat)
+{
+   int i;
+
+   plat->bus_id = 1;
+   plat->phy_addr = 0;
+   plat->clk_csr = 5;
+   plat->has_gmac = 0;
+   plat->has_gmac4 = 1;
+   plat->xpcs_phy_addr = 0x16;
+   plat->pcs_mode = AN_CTRL_PCS_MD_C37_SGMII;
+   plat->force_sf_dma_mode = 0;
+   plat->tso_en = 1;
+
+   plat->rx_queues_to_use = 8;
+   plat->tx_queues_to_use = 8;
+   plat->rx_sched_algorithm = MTL_RX_ALGORITHM_SP;
+
+   for (i = 0; i < plat->rx_queues_to_use; i++) {
+   plat->rx_queues_cfg[i].mode_to_use = MTL_QUEUE_DCB;
+   plat->rx_queues_cfg[i].chan = i;
+
+   /* Disable Priority config by default */
+   plat->rx_queues_cfg[i].use_prio = false;
+
+   /* Disable RX queues routing by default */
+   plat->rx_queues_cfg[i].pkt_route = 0x0;
+   }
+
+   for (i = 0; i < plat->tx_queues_to_use; i++) {
+   plat->tx_queues_cfg[i].mode_to_use = MTL_QUEUE_DCB;
+
+   /* Disable Priority config by default */
+   plat->tx_queues_cfg[i].use_prio = false;
+   }
+
+   plat->tx_sched_algorithm = MTL_TX_ALGORITHM_WRR;
+   plat->tx_queues_cfg[0].weight = 0x09;
+   plat->tx_queues_cfg[1].weight = 0x0A;
+   plat->tx_queues_cfg[2].weight = 0x0B;
+   plat->tx_queues_cfg[3].weight = 0x0C;
+   plat->tx_queues_cfg[4].weight = 0x0D;
+   plat->tx_queues_cfg[5].weight = 0x0E;
+   plat->tx_queues_cfg[6].weight = 0x0F;
+   plat->tx_queues_cfg[7].weight = 0x10;
+
+   plat->mdio_bus_data->phy_reset = NULL;
+   plat->mdio_bus_data->phy_mask = 0;
+
+   plat->dma_cfg->pbl = 32;
+   plat->dma_cfg->pblx8 = true;
+   plat->dma_cfg->fixed_burst = 0;
+   plat->dma_cfg->mixed_burst = 0;
+   plat->dma_cfg->aal = 0;
+
+   plat->axi = devm_kzalloc(&pdev->dev, sizeof(*plat->axi),
+GFP_KERNEL);
+   if (!plat->axi)
+   return -ENOMEM;
+   plat->axi->axi_lpi_en = 0;
+   plat->axi->axi_xit_frm = 0;
+   plat->axi->axi_wr_osr_lmt = 0;
+   plat->axi->axi_rd_osr_lmt = 2;
+   plat->axi->axi_blen[0] = 4;
+   plat->axi->axi_blen[1] = 8;
+   plat->axi->axi_blen[2] = 16;
+
+   /* Set default value for multicast hash bins */
+   plat->multicast_filter_bins = HASH_TABLE_SIZE;
+
+   /* Set default value for unicast filter entries */
+   plat->unicast_filter_entries = 1;
+
+   /* Set the maxmtu to a default of JUMBO_LEN */
+   plat->maxmtu = JUMBO_LEN;
+
+   /* Set 32KB fifo size as the advertised fifo size in
+* the HW features is not the same as the HW implementation
+*/
+   plat->tx_fifo_size = 32768;
+   plat->rx_fifo_size = 32768;
+
+   return 0;
+}
+
+static int ehl_sgmii1g_data(struct pci_dev *pdev,
+   struct plat_stmmacenet_data *plat)
+{
+   int ret;
+
+   /* Set common default data first */
+   ret = ehl_common_data(pdev, plat);
+
+   if (ret)
+   return ret;
+
+   plat->interface = PHY_INTERFACE_MODE_SGMII;
+   plat->has_xpcs = 1;
+
+   return 0;
+}
+
+static struct stmmac_pci_info ehl_sgmii1g_pci_info = {
+   .setup = ehl_sgmii1g_data,
+};
+
 static const struct stmmac_pci_func_data galileo_stmmac_func_data[] = {
{
.func = 6,
@@ -290,6 +398,7 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
res.addr = pcim_iomap_table(pdev)[i];
res.wol_irq = pdev->irq;
res.irq = pdev->irq;
+   res.xpcs_irq = 0;
 
return stmmac_dvr_probe(&pdev->dev, plat, &res);
 }
@@ -359,6 +468,7 @@ static int __maybe_unused stmmac_pci_resume(struct device *dev)
 
 #define STMMAC_QUARK_ID  0x0937
 #define STMMAC_DEVICE_ID 0x1108
+#define STMMAC_EHL_SGMII1G_ID   0x4b31
 
 #define STMMAC_DEVICE(vendor_id, dev_id, info) {   \
PCI_VDEVICE(vendor_id, dev_id), \
@@ -369,6 +479,7 @@ static int __maybe_unused stmmac_pci_resume(struct device *dev)
STMMAC_DEVICE(STMMAC, STMMAC_DEVICE_ID, stmmac_pci_info),
STMMAC_DEVICE(STMICRO,

Re: [PATCH v5 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper

2019-05-28 Thread Lu Baolu


Hi,

On 5/28/19 7:50 PM, Eric Auger wrote:

Several call sites are about to check whether a device belongs
to the PCI sub-hierarchy of a candidate PCI-PCI bridge.
Introduce a helper to perform that check.



This looks good to me.

Reviewed-by: Lu Baolu 

Best regards,
Baolu



Signed-off-by: Eric Auger 
---
  drivers/iommu/intel-iommu.c | 37 +
  1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5ec8b5bd308f..879f11c82b05 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -736,12 +736,39 @@ static int iommu_dummy(struct device *dev)
return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
  }
  
+/* is_downstream_to_pci_bridge - test if a device belongs to the
+ * PCI sub-hierarchy of a candidate PCI-PCI bridge
+ *
+ * @dev: candidate PCI device belonging to @bridge PCI sub-hierarchy
+ * @bridge: the candidate PCI-PCI bridge
+ *
+ * Return: true if @dev belongs to @bridge PCI sub-hierarchy
+ */
+static bool
+is_downstream_to_pci_bridge(struct device *dev, struct device *bridge)
+{
+   struct pci_dev *pdev, *pbridge;
+
+   if (!dev_is_pci(dev) || !dev_is_pci(bridge))
+   return false;
+
+   pdev = to_pci_dev(dev);
+   pbridge = to_pci_dev(bridge);
+
+   if (pbridge->subordinate &&
+   pbridge->subordinate->number <= pdev->bus->number &&
+   pbridge->subordinate->busn_res.end >= pdev->bus->number)
+   return true;
+
+   return false;
+}
+
  static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
  {
struct dmar_drhd_unit *drhd = NULL;
struct intel_iommu *iommu;
struct device *tmp;
-   struct pci_dev *ptmp, *pdev = NULL;
+   struct pci_dev *pdev = NULL;
u16 segment = 0;
int i;
  
@@ -787,13 +814,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
goto out;
}
  
-   if (!pdev || !dev_is_pci(tmp))
-   continue;
-
-   ptmp = to_pci_dev(tmp);
-   if (ptmp->subordinate &&
-   ptmp->subordinate->number <= pdev->bus->number &&
-   ptmp->subordinate->busn_res.end >= pdev->bus->number)
+   if (is_downstream_to_pci_bridge(dev, tmp))
goto got_pdev;
}
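The bus-number range test that is_downstream_to_pci_bridge() factors out can be modeled in userspace as below. This is a sketch under assumptions: struct bus_model and struct dev_model are hypothetical stand-ins that keep only the pci_bus fields the helper reads (number and busn_res.end), and the dev_is_pci() checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct pci_bus fields used. */
struct bus_model {
	unsigned char number;   /* like pci_bus->number (secondary bus) */
	unsigned char busn_end; /* like pci_bus->busn_res.end */
};

/* Hypothetical stand-in for struct pci_dev. */
struct dev_model {
	const struct bus_model *bus;         /* bus the device sits on */
	const struct bus_model *subordinate; /* NULL unless it is a bridge */
};

/* Same range test as the helper: @dev is below @bridge if its bus
 * number falls inside the bridge's [secondary, subordinate] window. */
static bool downstream_of(const struct dev_model *dev,
			  const struct dev_model *bridge)
{
	const struct bus_model *sub = bridge->subordinate;

	assert(dev != NULL && dev->bus != NULL);
	if (sub == NULL)
		return false; /* not a bridge, nothing below it */
	return sub->number <= dev->bus->number &&
	       sub->busn_end >= dev->bus->number;
}
```

A bridge whose secondary bus is 2 and whose bus window ends at 5 claims a device on bus 3, but not one on bus 6 or on the root bus 0; a non-bridge device (no subordinate bus) never claims anything.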

[PATCH net-next v3 2/5] net: stmmac: introducing support for DWC xPCS logics

2019-05-28 Thread Voon Weifeng

From: Ong Boon Leong 

xPCS is the DWC Ethernet Physical Coding Sublayer, which may be integrated
into a GbE controller that uses the DWC EQoS MAC controller. An example
HW configuration is shown below:

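  <------------ GbE Controller ------------->|<---- External PHY chip ---->

  +----------+         +----+    +---+               +--------------+
  |   EQoS   | <-GMII->| DW |<-->|PHY| <-- SGMII --> | External GbE |
  |   MAC    |         |xPCS|    |IF |               |   PHY Chip   |
  +----------+         +----+    +---+               +--------------+
       ^                 ^                                  ^
       |                 |                                  |
       +-----------------+---------------MDIO---------------+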

xPCS is a Clause 45 MDIO Manageable Device (MMD), so we need a way to
differentiate it from the external PHY chip that is discovered over MDIO.
Therefore, xpcs_phy_addr is introduced in the stmmac platform data
(plat_stmmacenet_data) to differentiate the xPCS from 'phy_addr', which
belongs to the external PHY.

Basic functionality for initializing the xPCS and configuring
auto-negotiation (AN), loopback, link status, AN advertisement and link
partner ability is implemented. The implementation supports C37 AN for
1000BASE-X and SGMII (MAC side SGMII only).

Tested-by: Tan, Tee Min 
Reviewed-by: Voon Weifeng 
Reviewed-by: Kweh Hock Leong 
Signed-off-by: Ong Boon Leong 
Signed-off-by: Voon Weifeng 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile |   2 +-
 drivers/net/ethernet/stmicro/stmmac/common.h |   1 +
 drivers/net/ethernet/stmicro/stmmac/dwxpcs.c | 198 +++
 drivers/net/ethernet/stmicro/stmmac/dwxpcs.h |  51 +++
 drivers/net/ethernet/stmicro/stmmac/hwif.h   |  19 +++
 include/linux/stmmac.h   |   1 +
 6 files changed, 271 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxpcs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxpcs.h

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
index c529c21e9bdd..57ca648fae4e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -6,7 +6,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o ring_mode.o  \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
  stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o dwxgmac2_descs.o \
- $(stmmac-y)
+ dwxpcs.o $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 272b9ca66314..67d03a5a21af 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -419,6 +419,7 @@ struct mii_regs {
 
 struct mac_device_info {
const struct stmmac_ops *mac;
+   const struct stmmac_xpcs *xpcs;
const struct stmmac_desc_ops *desc;
const struct stmmac_dma_ops *dma;
const struct stmmac_mode_ops *mode;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxpcs.c b/drivers/net/ethernet/stmicro/stmmac/dwxpcs.c
new file mode 100644
index ..081d3631afd2
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxpcs.c
@@ -0,0 +1,198 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019, Intel Corporation.
+ * DWC Ethernet Physical Coding Sublayer
+ */
+#include 
+#include 
+#include "dwxpcs.h"
+#include "stmmac.h"
+
+/* DW xPCS mdiobus_read and mdiobus_write helper functions */
+#define xpcs_read(dev, reg) \
+   mdiobus_read(priv->mii, xpcs_phy_addr, \
+MII_ADDR_C45 | (reg) | \
+((dev) << MII_DEVADDR_C45_SHIFT))
+#define xpcs_write(dev, reg, val) \
+   mdiobus_write(priv->mii, xpcs_phy_addr, \
+ MII_ADDR_C45 | (reg) | \
+ ((dev) << MII_DEVADDR_C45_SHIFT), val)
+
+static void dw_xpcs_init(struct net_device *ndev, int pcs_mode)
+{
+   struct stmmac_priv *priv = netdev_priv(ndev);
+   int xpcs_phy_addr = priv->plat->xpcs_phy_addr;
+   int phydata;
+
+   if (pcs_mode == AN_CTRL_PCS_MD_C37_SGMII) {
+   /* For AN for SGMII mode, the settings are :-
+* 1) VR_MII_AN_CTRL Bit(2:1)[PCS_MODE] = 10b (SGMII AN)
+* 2) VR_MII_AN_CTRL Bit(3) [TX_CONFIG] = 0b (MAC side SGMII)
+*DW xPCS used with DW EQoS MAC is always MAC
+*side SGMII.
+* 3) VR_MII_AN_CTRL Bit(0) [AN_INTR_EN] = 1b (AN Interrupt
+*enabled)
+* 4) VR_MII_DIG_CTRL1 Bit(9) [MAC_AUTO_SW] = 1b (Automatic
+*speed mode change after SGMII AN complete)
+* Note: Since it is MAC side SGMII, there is no need to set
+

[PATCH net-next v3 0/5] net: stmmac: enable EHL SGMII

2019-05-28 Thread Voon Weifeng

This patch set enables the Ethernet controller
(DW Ethernet QoS and DW Ethernet PCS) with an SGMII interface on Elkhart Lake.
The DW Ethernet PCS is the Physical Coding Sublayer that sits between the
Ethernet MAC and PHY, and it is accessed over MDIO Clause 45.

Kweh Hock Leong (1):
  net: stmmac: enable clause 45 mdio support

Ong Boon Leong (3):
  net: stmmac: introducing support for DWC xPCS logics
  net: stmmac: add xpcs function hooks into main driver and ethtool
  net: stmmac: add xPCS functions for device with DWMACv5.1

Voon Weifeng (1):
  net: stmmac: add EHL SGMII 1Gbps PCI info and PCI ID

 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |   1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  |  33 
 drivers/net/ethernet/stmicro/stmmac/dwxpcs.c   | 198 +
 drivers/net/ethernet/stmicro/stmmac/dwxpcs.h   |  51 ++
 drivers/net/ethernet/stmicro/stmmac/hwif.c |  41 -
 drivers/net/ethernet/stmicro/stmmac/hwif.h |  21 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac.h   |   2 +
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  50 --
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 152 
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |  40 -
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c   | 111 
 include/linux/phy.h|   2 +
 include/linux/stmmac.h |   3 +
 14 files changed, 649 insertions(+), 58 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxpcs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxpcs.h

-- 
Changelog v2:
*Added support for the C37 AN for 1000BASE-X and SGMII (MAC side SGMII only)
*removed and submitted the fix patch to net
 "net: stmmac: dma channel control register need to be init first"
*Squash the following 2 patches and move it to the end of the patch set:
 "net: stmmac: add EHL SGMII 1Gbps platform data and PCI ID"
 "net: stmmac: add xPCS platform data for EHL"
Changelog v3:
*Applied reverse Christmas tree ordering to local variable declarations
1.9.1

[PATCH] sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD

2019-05-28 Thread Young Xiao

The PERF_EVENT_IOC_PERIOD ioctl command can be used to change the
sample period of a running perf_event. However, when calculating
the next event period, the new period is currently only taken into
account after the previous one has overflowed.

This patch changes the calculation of the remaining event ticks so that
they are offset if the period has changed.

See commit 3581fe0ef37c ("ARM: 7556/1: perf: fix updated event period in
response to PERF_EVENT_IOC_PERIOD") for details.
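As a worked sketch of the new arithmetic: the ticks already consumed are last_period - left, so rebasing gives left = period - (last_period - left), after which the existing wrap-around checks apply. The function below is a hypothetical userspace model of that part of sparc_perf_event_set_period(); the local64 accessors, MAX_PERIOD clamping and the hardware counter write are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: returns the events left until the next overflow. */
static int64_t next_left(int64_t left, int64_t last_period, int64_t period)
{
	assert(period > 0);

	/* New check: rebase onto the new period using the ticks
	 * already consumed (last_period - left). */
	if (period != last_period)
		left = period - (last_period - left);

	/* Existing logic: more than a full period overdue, restart. */
	if (left <= -period)
		left = period;

	/* Existing logic: overdue by less than a period, carry over. */
	if (left <= 0)
		left += period;

	return left;
}
```

For instance, with last_period = 1000 and 700 ticks consumed (left = 300), switching to period = 500 gives -200 after rebasing, which the existing left <= 0 branch wraps to 300 ticks until the next overflow. Without the rebase, the stale left = 300 would simply be kept.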

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 arch/sparc/kernel/perf_event.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 6de7c68..a58ae9c 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -891,6 +891,10 @@ static int sparc_perf_event_set_period(struct perf_event *event,
s64 period = hwc->sample_period;
int ret = 0;
 
+   /* The period may have been changed by PERF_EVENT_IOC_PERIOD */
+   if (unlikely(period != hwc->last_period))
+   left = period - (hwc->last_period - left);
+
if (unlikely(left <= -period)) {
left = period;
local64_set(&hwc->period_left, left);
-- 
2.7.4

[PATCH net-next v2] net: stmmac: Switch to devm_alloc_etherdev_mqs

2019-05-28 Thread Jisheng Zhang

Make use of devm_alloc_etherdev_mqs() to simplify the code.

Signed-off-by: Jisheng Zhang 
---
Since V1:
 - fix the build error, sorry, my bad.

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a87ec70b19f1..4defdcb4f237 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4243,9 +4243,8 @@ int stmmac_dvr_probe(struct device *device,
u32 queue, maxq;
int ret = 0;
 
-   ndev = alloc_etherdev_mqs(sizeof(struct stmmac_priv),
- MTL_MAX_TX_QUEUES,
- MTL_MAX_RX_QUEUES);
+   ndev = devm_alloc_etherdev_mqs(device, sizeof(struct stmmac_priv),
+  MTL_MAX_TX_QUEUES, MTL_MAX_RX_QUEUES);
if (!ndev)
return -ENOMEM;
 
@@ -4277,8 +4276,7 @@ int stmmac_dvr_probe(struct device *device,
priv->wq = create_singlethread_workqueue("stmmac_wq");
if (!priv->wq) {
dev_err(priv->device, "failed to create workqueue\n");
-   ret = -ENOMEM;
-   goto error_wq;
+   return -ENOMEM;
}
 
INIT_WORK(&priv->service_task, stmmac_service_task);
@@ -4434,8 +4432,6 @@ int stmmac_dvr_probe(struct device *device,
}
 error_hw_init:
destroy_workqueue(priv->wq);
-error_wq:
-   free_netdev(ndev);
 
return ret;
 }
@@ -4472,7 +4468,6 @@ int stmmac_dvr_remove(struct device *dev)
stmmac_mdio_unregister(ndev);
destroy_workqueue(priv->wq);
mutex_destroy(&priv->lock);
-   free_netdev(ndev);
 
return 0;
 }
-- 
2.20.1

Re: [PATCH net-next] net: stmmac: Switch to devm_alloc_etherdev_mqs

2019-05-28 Thread Jisheng Zhang

On Tue, 28 May 2019 11:07:53 -0700 David Miller wrote:

> 
> You never even tried to compiled this patch.
> 

Oops, my bad. I patched another branch and tested the patch, but when I
manually applied it to the net-next tree, I made a mistake. Sorry.

Re: [RFC PATCH 0/3] Make deferred split shrinker memcg aware

2019-05-28 Thread Yang Shi

On 5/29/19 9:22 AM, David Rientjes wrote:

On Tue, 28 May 2019, Yang Shi wrote:


I got some reports from our internal application team about memcg OOM.
Even though the application had been killed by the OOM killer, a lot of
THPs still remained, and page reclaim didn't reclaim them at all.

Some investigation showed they are on the deferred split queue. Memcg
direct reclaim can't shrink them since the THP deferred split shrinker is
not memcg aware, which may cause premature OOM in a memcg. The issue can be
reproduced easily by the test below:


Right, we've also encountered this.  I talked to Kirill about it a week or
so ago, and the suggestion was to split all compound pages on the
deferred split queues in the presence of any memory pressure.

That breaks cgroup isolation and perhaps unfairly penalizes workloads that
are running attached to other memcg hierarchies that are not under
pressure because their compound pages are now split as a side effect.
There is a benefit to keeping these compound pages around while not under
memory pressure if all pages are subsequently mapped again.


Yes, I do agree. I tried other approaches too; it seems making the deferred
split queue per-memcg is the optimal one.

$ cgcreate -g memory:thp
$ echo 4G > /sys/fs/cgroup/memory/thp/memory/limit_in_bytes
$ cgexec -g memory:thp ./transhuge-stress 4000

transhuge-stress comes from kernel selftest.

It is easy to hit OOM, but there are still a lot of THPs on the deferred
split queue; memcg direct reclaim can't touch them since the deferred split
shrinker is not memcg aware.


Yes, we have seen this on at least 4.15 as well.


Make the deferred split shrinker memcg aware by introducing a per-memcg
deferred split queue.  A THP should be on either the per-node or, if it
belongs to a memcg, the per-memcg deferred split queue.  When the page is
moved to another memcg, it will be moved to the target memcg's deferred
split queue too.

Also, move the removal of a THP from the deferred split queue in page free
to before memcg uncharge, so that the page's memcg information is still available.

Reuse the second tail page's deferred_list for per memcg list since the same
THP can't be on multiple deferred split queues at the same time.

Remove THP specific destructor since it is not used anymore with memcg aware
THP shrinker (Please see the commit log of patch 2/3 for the details).

Make the deferred split shrinker not depend on memcg kmem, since it is not slab.
It doesn't make sense not to shrink THPs just because memcg kmem is disabled.

With the above changes, the test shown above no longer triggers OOM,
even with cgroup.memory=nokmem.


I'm curious if your internal applications team is also asking for
statistics on how much memory can be freed if the deferred split queues
can be shrunk?  We have applications that monitor their own memory usage


No, but this reminds me. The THPs on deferred split queue should be 
accounted into available memory too.

through memcg stats or usage and proactively try to reduce that usage when
it is growing too large.  The deferred split queues have significantly
increased both memcg usage and rss when they've upgraded kernels.

How are your applications monitoring how much memory from deferred split
queues can be freed on memory pressure?  Any thoughts on providing it as a
memcg stat?


I don't think they have such a monitor. I saw that rss_huge was abnormal in
the memcg stat even after the application was killed by OOM, so I realized
the deferred split queue may play a role here.


The memcg stat doesn't have counters for available memory like the global
vmstat does. It may be better to have such statistics, or to extend
reclaimable "slab" to shrinkable/reclaimable "memory".

Thanks!

[PATCH 1/1] Revert "drivers: thermal: tsens: Add new operation to check if a sensor is enabled"

2019-05-28 Thread Eduardo Valentin

This reverts commit 3e6a8fb3308419129c7a52de6eb42feef5a919a0.

Cc: Andy Gross 
Cc: David Brown 
Cc: Amit Kucheria 
Cc: Zhang Rui 
Cc: Daniel Lezcano 
Suggested-by: Amit Kucheria 
Reported-by: Andy Gross 
Signed-off-by: Eduardo Valentin 
---

Added this for the next -rc, as requested.

 drivers/thermal/qcom/tsens-common.c | 14 --
 drivers/thermal/qcom/tsens-v0_1.c   |  1 -
 drivers/thermal/qcom/tsens-v2.c |  1 -
 drivers/thermal/qcom/tsens.c|  5 -
 drivers/thermal/qcom/tsens.h|  1 -
 5 files changed, 22 deletions(-)

diff --git a/drivers/thermal/qcom/tsens-common.c b/drivers/thermal/qcom/tsens-common.c
index 928e8e8..528df88 100644
--- a/drivers/thermal/qcom/tsens-common.c
+++ b/drivers/thermal/qcom/tsens-common.c
@@ -64,20 +64,6 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 *p1,
}
 }
 
-bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id)
-{
-   u32 val;
-   int ret;
-
-   if ((hw_id > (priv->num_sensors - 1)) || (hw_id < 0))
-   return -EINVAL;
-   ret = regmap_field_read(priv->rf[SENSOR_EN], &val);
-   if (ret)
-   return ret;
-
-   return val & (1 << hw_id);
-}
-
 static inline int code_to_degc(u32 adc_code, const struct tsens_sensor *s)
 {
int degc, num, den;
diff --git a/drivers/thermal/qcom/tsens-v0_1.c b/drivers/thermal/qcom/tsens-v0_1.c
index a319283..6f26fad 100644
--- a/drivers/thermal/qcom/tsens-v0_1.c
+++ b/drivers/thermal/qcom/tsens-v0_1.c
@@ -334,7 +334,6 @@ static const struct reg_field tsens_v0_1_regfields[MAX_REGFIELDS] = {
/* CTRL_OFFSET */
[TSENS_EN] = REG_FIELD(SROT_CTRL_OFF, 0,  0),
[TSENS_SW_RST] = REG_FIELD(SROT_CTRL_OFF, 1,  1),
-   [SENSOR_EN]= REG_FIELD(SROT_CTRL_OFF, 3, 13),
 
/* - TM -- */
/* INTERRUPT ENABLE */
diff --git a/drivers/thermal/qcom/tsens-v2.c b/drivers/thermal/qcom/tsens-v2.c
index 1099069..0a4f2b8 100644
--- a/drivers/thermal/qcom/tsens-v2.c
+++ b/drivers/thermal/qcom/tsens-v2.c
@@ -44,7 +44,6 @@ static const struct reg_field tsens_v2_regfields[MAX_REGFIELDS] = {
/* CTRL_OFF */
[TSENS_EN] = REG_FIELD(SROT_CTRL_OFF,0,  0),
[TSENS_SW_RST] = REG_FIELD(SROT_CTRL_OFF,1,  1),
-   [SENSOR_EN]= REG_FIELD(SROT_CTRL_OFF,3, 18),
 
/* - TM -- */
/* INTERRUPT ENABLE */
diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
index 36b0b52..0627d86 100644
--- a/drivers/thermal/qcom/tsens.c
+++ b/drivers/thermal/qcom/tsens.c
@@ -85,11 +85,6 @@ static int tsens_register(struct tsens_priv *priv)
struct thermal_zone_device *tzd;
 
for (i = 0;  i < priv->num_sensors; i++) {
-   if (!is_sensor_enabled(priv, priv->sensor[i].hw_id)) {
-   dev_err(priv->dev, "sensor %d: disabled\n",
-   priv->sensor[i].hw_id);
-   continue;
-   }
priv->sensor[i].priv = priv;
priv->sensor[i].id = i;
tzd = devm_thermal_zone_of_sensor_register(priv->dev, i,
diff --git a/drivers/thermal/qcom/tsens.h b/drivers/thermal/qcom/tsens.h
index eefe384..2fd9499 100644
--- a/drivers/thermal/qcom/tsens.h
+++ b/drivers/thermal/qcom/tsens.h
@@ -315,7 +315,6 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 *pt1, u32 *pt2, u32 mo
 int init_common(struct tsens_priv *priv);
 int get_temp_tsens_valid(struct tsens_priv *priv, int i, int *temp);
 int get_temp_common(struct tsens_priv *priv, int i, int *temp);
-bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id);
 
 /* TSENS target */
 extern const struct tsens_plat_data data_8960;
-- 
2.1.4

Re: [PATCH] thermal: tsens: Remove unnecessary comparison of unsigned integer with < 0

2019-05-28 Thread Eduardo Valentin

Gustavo,

On Mon, May 27, 2019 at 11:08:25AM -0500, Gustavo A. R. Silva wrote:
> There is no need to compare hw_id with < 0 because such comparison
> of an unsigned value is always false.
> 
> Fix this by removing such comparison.


Thanks for fixing this, but we had to revert the commit that introduced
this issue, so this patch is no longer applicable.

> 
> Addresses-Coverity-ID: 1445440 ("Unsigned compared against 0")
> Fixes: 3e6a8fb33084 ("drivers: thermal: tsens: Add new operation to check if a sensor is enabled")
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/thermal/qcom/tsens-common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/qcom/tsens-common.c b/drivers/thermal/qcom/tsens-common.c
> index 928e8e81ba69..94878ad35464 100644
> --- a/drivers/thermal/qcom/tsens-common.c
> +++ b/drivers/thermal/qcom/tsens-common.c
> @@ -69,7 +69,7 @@ bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id)
>   u32 val;
>   int ret;
>  
> - if ((hw_id > (priv->num_sensors - 1)) || (hw_id < 0))
> + if (hw_id > priv->num_sensors - 1)
>   return -EINVAL;
>   ret = regmap_field_read(priv->rf[SENSOR_EN], &val);
>   if (ret)
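A short userspace illustration of the point in the quoted commit message. The helper names `sensor_index_valid()` and `bool_from_error()` are hypothetical. For a `u32`, `hw_id < 0` can never be true, and a negative `int` converted to `u32` becomes a huge value, so an upper-bound test alone already rejects it. Writing that test as `hw_id < num_sensors` also avoids the wraparound of `num_sensors - 1` when an unsigned `num_sensors` is 0 (assumed unsigned here). The second helper shows another problem in the reverted function: it returned `-EINVAL` from a `bool` function, and any nonzero value converts to `true`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds check: with an unsigned hw_id, "hw_id < 0" is always
 * false, and a negative int converted to uint32_t is a huge value, so the
 * upper-bound test alone rejects it. */
static bool sensor_index_valid(uint32_t hw_id, uint32_t num_sensors)
{
	return hw_id < num_sensors;
}

/* Illustrates the pitfall in the reverted helper: returning an error code
 * from a bool function turns it into true ("sensor enabled"). */
static bool bool_from_error(void)
{
	return -22; /* -EINVAL: any nonzero value converts to true */
}
```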

Re: [PATCH] kernel/sys.c: fix possible spectre-v1 in do_prlimit()

2019-05-28 Thread Dianzhang Chen

Hi,

Although the misprediction is eventually detected and the speculatively
executed instructions are discarded, not all of their effects are undone;
the cache state, for example, is not. During speculative execution, the
code

rlim = tsk->signal->rlim + resource;	// use resource as index
...
*old_rlim = *rlim;

may load secret data into the cache, and an attacker can then use a
side-channel attack to find out what that data is.


Virtually any observable effect of speculatively executed code can be
leveraged to create the covert channel that leaks sensitive
information[1].


A general form of Spectre v1 would be [1]:

if (x < array1_size) {
	y = array1[x];
	// do something using y that is
	// observable when speculatively
	// executed
}


[1] https://spectreattack.com/spectre.pdf

On Tue, May 28, 2019 at 3:10 PM, Cyrill Gorcunov wrote:
>
> On Tue, May 28, 2019 at 10:37:10AM +0800, Dianzhang Chen wrote:
> > Hi,
> > When I reply to your email, I always get 'Message rejected' from
> > Gmail (from all the recipients). I still don't know how to deal
> > with it, so I am replying to your email here:
>
> Hi! This is weird. Next time simply reply to LKML (I CC'ed it back).
>
> > Because of speculative execution, the attacker can bypass the bounds
> > check `if (resource >= RLIM_NLIMITS)`.
>
> And then the misprediction gets detected and the execution is dropped.
> So I still don't see a problem here, since we don't leak info even in
> such a case.
>
> That said, I don't mind this patch, but for the sake of code
> clarity rather than because of a Spectre issue, since Spectre has
> nothing to do with it here.
>
> > As for array_index_nospec(index, size), it clamps the index to
> > the range [0, size), so an attacker can't exploit speculative
> > execution to push the index out of the range [0, size).
> >
> >
> > For more detail, please check the link below:
> >
> > https://github.com/torvalds/linux/commit/f3804203306e098dae9ca51540fcd5eb700d7f40
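The clamping that array_index_nospec() performs can be sketched in userspace as follows. It mirrors the branch-free generic mask computation in the kernel's include/linux/nospec.h, but it is a simplified illustration, not the kernel code: architectures such as x86 override the mask with inline assembly, and the kernel also hides the value from the optimizer. `NLIMITS` and the helper names are hypothetical. The sketch assumes `index` and `size` do not exceed `LONG_MAX` and that right-shifting a negative `long` is arithmetic, as it is with GCC and Clang.

```c
#include <limits.h>

#define NLIMITS 16UL /* hypothetical stand-in for RLIM_NLIMITS */

/* Returns ~0UL when index < size and 0 otherwise, without a conditional
 * branch the CPU could mispredict. If index < size, neither index nor
 * (size - 1 - index) has its top bit set, so the complement is negative
 * and the arithmetic shift yields all ones. If index >= size, the
 * subtraction wraps and sets the top bit, so the result is 0. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp the index: an in-range index is unchanged, an out-of-range one
 * becomes 0, so even a speculatively executed load stays in bounds. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```

In do_prlimit() the idea would be to apply the clamp right after the bounds check, e.g. `resource = array_index_nospec(resource, RLIM_NLIMITS);`, before `resource` is used as an index.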

[PATCH] pinctrl: ns2: Fix potential NULL dereference

2019-05-28 Thread Young Xiao

platform_get_resource() may fail and return NULL, so check its return
value to avoid a NULL pointer dereference a bit later in the code.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/pinctrl/bcm/pinctrl-ns2-mux.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pinctrl/bcm/pinctrl-ns2-mux.c b/drivers/pinctrl/bcm/pinctrl-ns2-mux.c
index 4b5cf0e..2bf6af7 100644
--- a/drivers/pinctrl/bcm/pinctrl-ns2-mux.c
+++ b/drivers/pinctrl/bcm/pinctrl-ns2-mux.c
@@ -1048,6 +1048,8 @@ static int ns2_pinmux_probe(struct platform_device *pdev)
return PTR_ERR(pinctrl->base0);
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+   if (!res)
+   return -EINVAL;
pinctrl->base1 = devm_ioremap_nocache(&pdev->dev, res->start,
resource_size(res));
if (!pinctrl->base1) {
-- 
2.7.4

Re: [PATCH 1/3] mm: thp: make deferred split shrinker memcg aware

2019-05-28 Thread Yang Shi





On 5/28/19 10:42 PM, Kirill Tkhai wrote:

Hi, Yang,

On 28.05.2019 15:44, Yang Shi wrote:

Currently the THP deferred split shrinker is not memcg aware, which may cause
premature OOM with some configurations. For example, the test below easily
runs into premature OOM:

$ cgcreate -g memory:thp
$ echo 4G > /sys/fs/cgroup/memory/thp/memory.limit_in_bytes
$ cgexec -g memory:thp transhuge-stress 4000

transhuge-stress comes from the kernel selftests.

OOM is easy to hit even though there are still a lot of THPs on the deferred
split queue: memcg direct reclaim can't touch them because the deferred
split shrinker is not memcg aware.

Make the deferred split shrinker memcg aware by introducing a per-memcg
deferred split queue. A THP is on the per-memcg deferred split queue if it
belongs to a memcg, and on the per-node queue otherwise. When the page is
migrated to another memcg, it is moved to the target memcg's deferred
split queue too.

Also, when the page is freed, remove the THP from the deferred split queue
before the memcg uncharge so that the page's memcg information is still available.

Reuse the second tail page's deferred_list for per memcg list since the
same THP can't be on multiple deferred split queues.

Cc: Kirill Tkhai 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: "Kirill A . Shutemov" 
Cc: Hugh Dickins 
Cc: Shakeel Butt 
Signed-off-by: Yang Shi 
---
  include/linux/huge_mm.h|  24 ++
  include/linux/memcontrol.h |   6 ++
  include/linux/mm_types.h   |   7 +-
  mm/huge_memory.c   | 182 +
  mm/memcontrol.c|  20 +
  mm/swap.c  |   4 +
  6 files changed, 194 insertions(+), 49 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c15..f6d1cde 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -250,6 +250,26 @@ static inline bool thp_migration_supported(void)
return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
  }
  
+static inline struct list_head *page_deferred_list(struct page *page)
+{
+   /*
+* Global deferred list in the second tail pages is occupied by
+* compound_head.
+*/
+   return &page[2].deferred_list;
+}
+
+static inline struct list_head *page_memcg_deferred_list(struct page *page)
+{
+   /*
+* Memcg deferred list in the second tail pages is occupied by
+* compound_head.
+*/
+   return &page[2].memcg_deferred_list;
+}
+
+extern void del_thp_from_deferred_split_queue(struct page *);
+
  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
  #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
  #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -368,6 +388,10 @@ static inline bool thp_migration_supported(void)
  {
return false;
  }
+
+static inline void del_thp_from_deferred_split_queue(struct page *page)
+{
+}
  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
  
  #endif /* _LINUX_HUGE_MM_H */

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bc74d6a..9ff5fab 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -316,6 +316,12 @@ struct mem_cgroup {
struct list_head event_list;
spinlock_t event_list_lock;
  
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   struct list_head split_queue;
+   unsigned long split_queue_len;
+   spinlock_t split_queue_lock;
+#endif
+
struct mem_cgroup_per_node *nodeinfo[0];
/* WARNING: nodeinfo must be the last member here */
  };
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8ec38b1..405f5e6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -139,7 +139,12 @@ struct page {
struct {/* Second tail page of compound page */
unsigned long _compound_pad_1;  /* compound_head */
unsigned long _compound_pad_2;
-   struct list_head deferred_list;
+   union {
+   /* Global THP deferred split list */
+   struct list_head deferred_list;
+   /* Memcg THP deferred split list */
+   struct list_head memcg_deferred_list;

Why do we need two names for this list entry?

To me it looks redundant: it does not give additional information,
but it leads to duplication (and we have two helpers page_deferred_list()
and page_memcg_deferred_list() instead of one).


Yes, kind of. Actually, I was also wondering whether this is worth it. My
point is that it may improve code readability: we can tell which split
queue (per node or per memcg) is being manipulated just from the
name of the list.


If most people think this is unnecessary, I'm definitely OK with
keeping just one name.





+   };
};
struct {/* Page table pages */
unsigned long _pt_pad_1;/* compound_head */
diff --git
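Kirill's point that both names refer to the same storage can be checked with a small userspace sketch. The struct below is a simplified, hypothetical stand-in for the second-tail-page layout in the quoted mm_types.h hunk. Because both members live in one anonymous union (C11), `deferred_list` and `memcg_deferred_list` alias the same `list_head`; the two names only document which queue a caller means.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Simplified stand-in for the "second tail page" part of struct page. */
struct tail_page {
	unsigned long _compound_pad_1; /* compound_head */
	unsigned long _compound_pad_2;
	union {
		struct list_head deferred_list;       /* per-node queue */
		struct list_head memcg_deferred_list; /* per-memcg queue */
	};
};
```

Since a THP can be on only one deferred split queue at a time, sharing the storage is safe either way; the question in the thread is purely about readability.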

Re: [PATCH -next] drivers: thermal: tsens: Change hw_id type to int in is_sensor_enabled

2019-05-28 Thread Eduardo Valentin

YueHaibing,

On Mon, May 27, 2019 at 09:41:24PM +0800, YueHaibing wrote:
> Sensor hw_id is of type int rather than u32, so is_sensor_enabled
> should use int for the comparison. This fixes the smatch warning:
> 
> drivers/thermal/qcom/tsens-common.c:72 is_sensor_enabled() warn: unsigned 'hw_id' is never less than zero.
> 
> Fixes: 3e6a8fb33084 ("drivers: thermal: tsens: Add new operation to check if a sensor is enabled")

Thanks for the patch, but we had to revert this commit because it was
causing some issues, so your patch is no longer applicable.

> Signed-off-by: YueHaibing 

Thank you anyway.

> ---
>  drivers/thermal/qcom/tsens-common.c | 2 +-
>  drivers/thermal/qcom/tsens.h| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/thermal/qcom/tsens-common.c b/drivers/thermal/qcom/tsens-common.c
> index 928e8e81ba69..5df4eed84535 100644
> --- a/drivers/thermal/qcom/tsens-common.c
> +++ b/drivers/thermal/qcom/tsens-common.c
> @@ -64,7 +64,7 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 *p1,
>   }
>  }
>  
> -bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id)
> +bool is_sensor_enabled(struct tsens_priv *priv, int hw_id)
>  {
>   u32 val;
>   int ret;
> diff --git a/drivers/thermal/qcom/tsens.h b/drivers/thermal/qcom/tsens.h
> index eefe3844fb4e..15264806f6a8 100644
> --- a/drivers/thermal/qcom/tsens.h
> +++ b/drivers/thermal/qcom/tsens.h
> @@ -315,7 +315,7 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 *pt1, u32 *pt2, u32 mo
>  int init_common(struct tsens_priv *priv);
>  int get_temp_tsens_valid(struct tsens_priv *priv, int i, int *temp);
>  int get_temp_common(struct tsens_priv *priv, int i, int *temp);
> -bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id);
> +bool is_sensor_enabled(struct tsens_priv *priv, int hw_id);
>  
>  /* TSENS target */
>  extern const struct tsens_plat_data data_8960;

Re: [PATCH] arm64: dts: ls1028a: Add Thermal Monitor Unit node

2019-05-28 Thread Eduardo Valentin

On Thu, Apr 25, 2019 at 04:26:40PM +0800, Yuantian Tang wrote:
> The Thermal Monitoring Unit (TMU) monitors and reports the
> temperature from 2 remote temperature measurement sites
> located on ls1028a chip.
> Add TMU dts node to enable this feature.
> 
> Signed-off-by: Yuantian Tang 

I don't see anything wrong from a thermal standpoint.

Acked-by: Eduardo Valentin 

Please get this via your arch tree maintainer to avoid merge conflicts.

> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi |  114
>  1 files changed, 114 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> index b045812..a25f5fc 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> @@ -29,6 +29,7 @@
>   clocks = < 1 0>;
>   next-level-cache = <>;
>   cpu-idle-states = <_PH20>;
> + #cooling-cells = <2>;
>   };
>  
>   cpu1: cpu@1 {
> @@ -39,6 +40,7 @@
>   clocks = < 1 0>;
>   next-level-cache = <>;
>   cpu-idle-states = <_PH20>;
> + #cooling-cells = <2>;
>   };
>  
>   l2: l2-cache {
> @@ -398,6 +400,118 @@
>   status = "disabled";
>   };
>  
> + tmu: tmu@1f0 {
> + compatible = "fsl,qoriq-tmu";
> + reg = <0x0 0x1f8 0x0 0x1>;
> + interrupts = <0 23 0x4>;
> + fsl,tmu-range = <0xb 0xa0026 0x80048 0x70061>;
> + fsl,tmu-calibration = <0x 0x0024
> +0x0001 0x002b
> +0x0002 0x0031
> +0x0003 0x0038
> +0x0004 0x003f
> +0x0005 0x0045
> +0x0006 0x004c
> +0x0007 0x0053
> +0x0008 0x0059
> +0x0009 0x0060
> +0x000a 0x0066
> +0x000b 0x006d
> +
> +0x0001 0x001c
> +0x00010001 0x0024
> +0x00010002 0x002c
> +0x00010003 0x0035
> +0x00010004 0x003d
> +0x00010005 0x0045
> +0x00010006 0x004d
> +0x00010007 0x0045
> +0x00010008 0x005e
> +0x00010009 0x0066
> +0x0001000a 0x006e
> +
> +0x0002 0x0018
> +0x00020001 0x0022
> +0x00020002 0x002d
> +0x00020003 0x0038
> +0x00020004 0x0043
> +0x00020005 0x004d
> +0x00020006 0x0058
> +0x00020007 0x0063
> +0x00020008 0x006e
> +
> +0x0003 0x0010
> +0x00030001 0x001c
> +0x00030002 0x0029
> +0x00030003 0x0036
> +0x00030004 0x0042
> +0x00030005 0x004f
> +0x00030006 0x005b
> +0x00030007 0x0068>;
> + little-endian;
> + #thermal-sensor-cells = <1>;
> + };
> +
> + thermal-zones {
> + core-cluster {
> + polling-delay-passive = <1000>;
> + polling-delay = <5000>;
> + thermal-sensors = < 0>;
> +
> + trips {
> + core_cluster_alert: core-cluster-alert {
> +

[net-next:master 161/171] drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:924:6: sparse: sparse: symbol 'hclge_dbg_get_m7_stats_info' was not declared. Should it be static?

2019-05-28 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   602e0f295a91813c9a15938f2a292b9c60a416d9
commit: 33a90e2f20e6c455889a0f41857692221172a5ae [161/171] net: hns3: add support for dump firmware statistics by debugfs
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
git checkout 33a90e2f20e6c455889a0f41857692221172a5ae
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot 


sparse warnings: (new ones prefixed by >>)

   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:32:17: sparse: sparse: cast from restricted __le32
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:564:31: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:598:39: sparse: sparse: incorrect type in assignment (different base types) @@expected unsigned int @@got restricted __le32unsigned int @@
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:598:39: sparse:    expected unsigned int
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:598:39: sparse:    got restricted __le32 [usertype] qs_bit_map
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:833:30: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:840:33: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:841:30: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:842:31: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:844:33: sparse: sparse: restricted __le16 degrades to integer
>> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:924:6: sparse: sparse: symbol 'hclge_dbg_get_m7_stats_info' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[RFC PATCH net-next] net: hns3: hclge_dbg_get_m7_stats_info() can be static

2019-05-28 Thread kbuild test robot



Fixes: 33a90e2f20e6 ("net: hns3: add support for dump firmware statistics by debugfs")
Signed-off-by: kbuild test robot 
---
 hclge_debugfs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
index ed1f533..4fbed47a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
@@ -921,7 +921,7 @@ static void hclge_dbg_dump_rst_info(struct hclge_dev *hdev)
 hdev->rst_stats.reset_cnt);
 }
 
-void hclge_dbg_get_m7_stats_info(struct hclge_dev *hdev)
+static void hclge_dbg_get_m7_stats_info(struct hclge_dev *hdev)
 {
struct hclge_desc *desc_src, *desc_tmp;
struct hclge_get_m7_bd_cmd *req;

[GIT PULL] tracing: Avoid memory leak in predicate_parse()

2019-05-28 Thread Steven Rostedt



Linus,

This fixes a memory leak from the error path in the event filter logic.


Please pull the latest trace-v5.2-rc2 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git trace-v5.2-rc2

Tag SHA1: 0658b13d1bfd40bda1c2bd1ef3738857e1bf4000
Head SHA1: dfb4a6f2191a80c8b790117d0ff592fd712d3296


Tomas Bortoli (1):
  tracing: Avoid memory leak in predicate_parse()


 kernel/trace/trace_events_filter.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)
---
commit dfb4a6f2191a80c8b790117d0ff592fd712d3296
Author: Tomas Bortoli 
Date:   Tue May 28 17:43:38 2019 +0200

tracing: Avoid memory leak in predicate_parse()

In case of errors, predicate_parse() goes to the out_free label
to free memory and to return an error code.

However, predicate_parse() does not free the predicates of the
temporary prog_stack array, thereby leaking them.

Link: http://lkml.kernel.org/r/20190528154338.29976-1-tomasbort...@gmail.com

Cc: sta...@vger.kernel.org
Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster")
Reported-by: syzbot+6b8e0fb820e570c59...@syzkaller.appspotmail.com
Signed-off-by: Tomas Bortoli 
[ Added protection around freeing prog_stack[i].pred ]
Signed-off-by: Steven Rostedt (VMware) 

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index d3e59312ef40..5079d1db3754 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -428,7 +428,7 @@ predicate_parse(const char *str, int nr_parens, int nr_preds,
op_stack = kmalloc_array(nr_parens, sizeof(*op_stack), GFP_KERNEL);
if (!op_stack)
return ERR_PTR(-ENOMEM);
-   prog_stack = kmalloc_array(nr_preds, sizeof(*prog_stack), GFP_KERNEL);
+   prog_stack = kcalloc(nr_preds, sizeof(*prog_stack), GFP_KERNEL);
if (!prog_stack) {
parse_error(pe, -ENOMEM, 0);
goto out_free;
@@ -579,7 +579,11 @@ predicate_parse(const char *str, int nr_parens, int nr_preds,
 out_free:
kfree(op_stack);
kfree(inverts);
-   kfree(prog_stack);
+   if (prog_stack) {
+   for (i = 0; prog_stack[i].pred; i++)
+   kfree(prog_stack[i].pred);
+   kfree(prog_stack);
+   }
return ERR_PTR(ret);
 }
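The fix relies on zeroed allocation: because prog_stack now comes from kcalloc(), every slot that was never filled has a NULL `pred`, so the error path can free entries until it reaches the first NULL. A userspace sketch of the same pattern follows, with calloc()/free() standing in for kcalloc()/kfree(); the struct and `parse_then_fail()` are hypothetical. Unlike the patch, the cleanup loop is also bounded by the array length, as a defensive choice in case every slot is in use.

```c
#include <stdlib.h>

struct prog_entry {
	int *pred; /* heap-allocated predicate, NULL if never set */
};

/* Hypothetical parser: fills up to `filled` slots, then "fails" and takes
 * the error path. Returns the number of predicates freed, or -1 if the
 * array itself could not be allocated. */
static int parse_then_fail(size_t nr_preds, size_t filled)
{
	struct prog_entry *prog_stack;
	size_t i;
	int freed = 0;

	/* calloc() zeroes the array, so unused slots have pred == NULL. */
	prog_stack = calloc(nr_preds, sizeof(*prog_stack));
	if (!prog_stack)
		return -1;

	for (i = 0; i < filled && i < nr_preds; i++) {
		prog_stack[i].pred = malloc(sizeof(int));
		if (!prog_stack[i].pred)
			break; /* allocation failed: this slot stays NULL */
	}

	/* Error path: free each allocated predicate, then the array. */
	for (i = 0; i < nr_preds && prog_stack[i].pred; i++) {
		free(prog_stack[i].pred);
		freed++;
	}
	free(prog_stack);
	return freed;
}
```

With kmalloc_array() instead of kcalloc(), the unused slots would hold garbage and the NULL-terminated walk would not be safe, which is why the patch changes both the allocation and the cleanup.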

[PATCH v2 net-next] net: mvpp2: cls: Remove unnecessary check in mvpp2_ethtool_cls_rule_ins

2019-05-28 Thread YueHaibing

Fix the smatch warning:

drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c:1236 mvpp2_ethtool_cls_rule_ins() warn: unsigned 'info->fs.location' is never less than zero.

'info->fs.location' is of type u32, so it is never less than zero.

Signed-off-by: YueHaibing 
---
v2: rework patch based on net-next
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
index bd19a910dc90..e1c90adb2982 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
@@ -1300,8 +1300,7 @@ int mvpp2_ethtool_cls_rule_ins(struct mvpp2_port *port,
struct mvpp2_ethtool_fs *efs, *old_efs;
int ret = 0;
 
-   if (info->fs.location >= MVPP2_N_RFS_ENTRIES_PER_FLOW ||
-   info->fs.location < 0)
+   if (info->fs.location >= MVPP2_N_RFS_ENTRIES_PER_FLOW)
return -EINVAL;
 
efs = kzalloc(sizeof(*efs), GFP_KERNEL);
-- 
2.20.1

Re: [PATCH RESEND V13 2/5] thermal: of-thermal: add API for getting sensor ID from DT

2019-05-28 Thread Eduardo Valentin

On Tue, May 28, 2019 at 02:06:18PM +0800, anson.hu...@nxp.com wrote:
> From: Anson Huang 
> 
> On some platforms like i.MX8QXP, the thermal driver needs the real
> HW sensor ID from the DT thermal zone, because the HW sensor ID is
> used to get the temperature from the SCU firmware; the virtual
> sensor ID running from 0 to N is NOT used at all. This patch adds
> a new API, thermal_zone_of_get_sensor_id(), to get the sensor ID
> from a DT thermal zone's node.
> 
> Signed-off-by: Anson Huang 
> ---
> Changes since V12:
>   - adjust the second parameter of the thermal_zone_of_get_sensor_id() API
>     so that callers no longer need to pass the of_phandle_args structure
>     and put sensor_specs.np manually; also move the sensor node device
>     check inside this API to make it easier to use;

What happened to using the nxp,resource-id property in your driver?
Why do we need this as an API in of-thermal? What other drivers may
benefit from this?

Regardless, this patch needs to document the new API under
Documentation/

> ---
>  drivers/thermal/of-thermal.c | 66 +---
>  include/linux/thermal.h  | 10 +++
>  2 files changed, 60 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index dc5093b..a53792b 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -449,6 +449,54 @@ thermal_zone_of_add_sensor(struct device_node *zone,
>  }
>  
>  /**
> + * thermal_zone_of_get_sensor_id - get sensor ID from a DT thermal zone
> + * @tz_np: a valid thermal zone device node.
> + * @sensor_np: a sensor node of a valid sensor device.
> + * @id: a sensor ID pointer will be passed back.
> + *
> + * This function will get sensor ID from a given thermal zone node, use
> + * "thermal-sensors" as list name, and get sensor ID from first phandle's
> + * argument.
> + *
> + * Return: 0 on success, proper error code otherwise.
> + */
> +
> +int thermal_zone_of_get_sensor_id(struct device_node *tz_np,
> +   struct device_node *sensor_np,
> +   u32 *id)
> +{
> + struct of_phandle_args sensor_specs;
> + int ret;
> +
> + ret = of_parse_phandle_with_args(tz_np,
> +  "thermal-sensors",
> +  "#thermal-sensor-cells",
> +  0,
> +  &sensor_specs);
> + if (ret)
> + return ret;
> +
> + if (sensor_specs.np != sensor_np) {
> + of_node_put(sensor_specs.np);
> + return -ENODEV;
> + }
> +
> + if (sensor_specs.args_count >= 1) {
> + *id = sensor_specs.args[0];
> + WARN(sensor_specs.args_count > 1,
> +  "%pOFn: too many cells in sensor specifier %d\n",
> +  sensor_specs.np, sensor_specs.args_count);
> + } else {
> + *id = 0;
> + }
> +
> + of_node_put(sensor_specs.np);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(thermal_zone_of_get_sensor_id);
> +
> +/**
>   * thermal_zone_of_sensor_register - registers a sensor to a DT thermal zone
>   * @dev: a valid struct device pointer of a sensor device. Must contain
>   *   a valid .of_node, for the sensor node.
> @@ -499,36 +547,22 @@ thermal_zone_of_sensor_register(struct device *dev, int sensor_id, void *data,
>   sensor_np = of_node_get(dev->of_node);
>  
>   for_each_available_child_of_node(np, child) {
> - struct of_phandle_args sensor_specs;
>   int ret, id;
>  
>   /* For now, thermal framework supports only 1 sensor per zone */
> - ret = of_parse_phandle_with_args(child, "thermal-sensors",
> -  "#thermal-sensor-cells",
> -  0, &sensor_specs);
> + ret = thermal_zone_of_get_sensor_id(child, sensor_np, &id);
>   if (ret)
>   continue;
>  
> - if (sensor_specs.args_count >= 1) {
> - id = sensor_specs.args[0];
> - WARN(sensor_specs.args_count > 1,
> -  "%pOFn: too many cells in sensor specifier %d\n",
> -  sensor_specs.np, sensor_specs.args_count);
> - } else {
> - id = 0;
> - }
> -
> - if (sensor_specs.np == sensor_np && id == sensor_id) {
> + if (id == sensor_id) {
>   tzd = thermal_zone_of_add_sensor(child, sensor_np,
>data, ops);
>   if (!IS_ERR(tzd))
>   tzd->ops->set_mode(tzd, THERMAL_DEVICE_ENABLED);
>  
> - of_node_put(sensor_specs.np);
>   of_node_put(child);
>   goto exit;
>   }
> -
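The ID-extraction rule in the proposed thermal_zone_of_get_sensor_id() (the first cell of the specifier if there is one, otherwise 0, with a warning about extra cells) can be sketched without the DT machinery. The struct is a hypothetical, trimmed-down stand-in for `struct of_phandle_args`; matching against `sensor_np` and the `of_node_put()` calls are left out, and fprintf() stands in for WARN().

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_PHANDLE_ARGS 16 /* same value as in include/linux/of.h */

/* Hypothetical stand-in for the parsed "thermal-sensors" specifier. */
struct phandle_args_stub {
	int args_count;
	uint32_t args[MAX_PHANDLE_ARGS];
};

/* Returns the sensor ID encoded in the specifier: the first cell when
 * #thermal-sensor-cells >= 1, otherwise 0. */
static uint32_t sensor_id_from_specifier(const struct phandle_args_stub *spec)
{
	if (spec->args_count >= 1) {
		if (spec->args_count > 1)
			fprintf(stderr, "too many cells in sensor specifier %d\n",
				spec->args_count);
		return spec->args[0];
	}
	return 0;
}
```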

Re: [PATCH] lib: test_overflow: Avoid tainting the kernel and fix wrap size

2019-05-28 Thread Kees Cook

On Tue, May 28, 2019 at 04:40:06PM -0700, Joe Perches wrote:
> On Tue, 2019-05-28 at 15:51 -0700, Kees Cook wrote:
> > This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to
> > avoid tainting the kernel. Additionally fixes up the math on wrap size
> > to be architecture and page size agnostic.
> []
> > diff --git a/lib/test_overflow.c b/lib/test_overflow.c
> []
> > @@ -486,16 +486,17 @@ static int __init test_overflow_shift(void)
> []
> > +#define alloc_GFP   (GFP_KERNEL | __GFP_NOWARN)
> []
> > +#define alloc110(alloc, arg, sz) alloc(arg, sz, alloc_GFP | __GFP_NOWARN)
> 
> seems redundant.

Whoops. Missed that one. Fixing...

-- 
Kees Cook

Re: [PATCH] thermal/drivers/of: Add a get_temp_id callback function

2019-05-28 Thread Eduardo Valentin

On Thu, May 23, 2019 at 07:48:56PM -0700, Andrey Smirnov wrote:
> On Mon, Apr 29, 2019 at 9:51 AM Daniel Lezcano
>  wrote:
> >
> > On 24/04/2019 01:08, Daniel Lezcano wrote:
> > > On 23/04/2019 17:44, Eduardo Valentin wrote:
> > >> Hello,
> > >>
> > >> On Tue, Apr 16, 2019 at 07:22:03PM +0200, Daniel Lezcano wrote:
> > >>> Currently when we register a sensor, we specify the sensor id and a data
> > >>> pointer to be passed when the get_temp function is called. However the
> > >>> sensor_id is not passed to the get_temp callback forcing the driver to
> > >>> do extra allocation and adding back pointer to find out from the sensor
> > >>> information the driver data and then back to the sensor id.
> > >>>
> > >>> Add a new callback get_temp_id() which will be called if set. It will
> > >>> call the get_temp_id() with the sensor id.
> > >>>
> > >>> That will be more consistent with the registering function.
> > >>
> > >> I still do not understand why we need to have a get_id callback.
> > >> The use cases I have seen so far, which I have been intentionally rejecting, are
> > >> mainly solvable by creating other compatible entries. And really, if you
> > >> have, say a bandgap, chip that supports multiple sensors, but on
> > >> SoC version A it has 5 sensors, and on SoC version B it has only 4,
> > >> or on SoC version C, it has 5 but they are either logically located
> > >> in different places (gpu vs iva regions), these are all cases in which
> > >> you want a different compatible!
> > >>
> > >> Do you mind sharing why you need a get sensor id callback?
> > >
> > > It is not a get sensor id callback, it is a get_temp callback which pass
> > > the sensor id.
> > >
> > > Across the different drivers, a common pattern is that there is a
> > > structure for the driver, then a structure for the sensor. When the
> > > get_temp is called, the callback needs info from the sensor structure
> > > and from the driver structure, so a back pointer to the driver structure
> > > is added in the sensor structure.
> >

Do you mind sending a patch showing how one could convert an existing
driver to use this new API?

> > Hi Eduardo,
> >
> > does the explanation clarify the purpose of this change?
> >
> 
> Eduardo, did you ever have a chance to revisit this thread? I would
> really like to make some progress on this one to unblock my i.MX8MQ
> hwmon series.

The problem I have with this patch is that it adds an API which resides
only in of-thermal. Growing DT-only APIs makes of-thermal diverge from
the thermal core and platform drivers.

Besides, this patch needs to document the API in Documentation/

> 
> Thanks,
> Andrey Smirnov
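To make the back-pointer pattern Daniel describes concrete, here is a userspace sketch with hypothetical types and callbacks (`struct drv`, `struct sensor`, `get_temp()`, `get_temp_id()`); it is not the of-thermal API, and the exact signature of the proposed callback is assumed. With a data-only callback, each sensor needs its own struct carrying a back pointer to the driver; with a callback that also receives the sensor ID, the driver struct can be passed directly.

```c
#define NUM_SENSORS 4

/* Driver-wide state; fake readings stand in for register accesses. */
struct drv {
	int temps_mc[NUM_SENSORS]; /* millidegrees Celsius */
};

/* Pattern 1: per-sensor struct with a back pointer to the driver. */
struct sensor {
	struct drv *drv; /* back pointer */
	int id;
};

static int get_temp(void *data, int *temp)
{
	struct sensor *s = data;

	*temp = s->drv->temps_mc[s->id];
	return 0;
}

/* Pattern 2 (the proposal): the core passes the sensor ID, so the driver
 * struct itself can be the callback data and no wrapper is needed. */
static int get_temp_id(void *data, int id, int *temp)
{
	struct drv *d = data;

	if (id < 0 || id >= NUM_SENSORS)
		return -1;
	*temp = d->temps_mc[id];
	return 0;
}
```

A sketch like this is also roughly what a driver conversion would look like: the per-sensor wrapper and its back pointer disappear, at the cost of an extra, DT-specific callback in the ops.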

Re: [PATCH v3 2/4] mtd: rawnand: Add Macronix MX25F0A NAND controller

2019-05-28 Thread masonccyang



Hi Miquel,

> > > > > > +static void mxic_nand_select_chip(struct nand_chip *chip, int chipnr)
> > > > > 
> > > > > _select_target() is preferred now 
> > > > 
> > > > Do you mean I implement mxic_nand_select_target() to control #CS ?
> > > > 
> > > > If so, I need to call mxic_nand_select_target( ) to control #CS ON
> > > > and then #CS OFF in _exec_op(), because nand_select_target() in nand_base.c
> > > > is still calling chip->legacy.select_chip?
> > > 
> > > You must forget about the ->select_chip() callback. Now it should be
> > > handled directly from the controller driver. Please have a look at the
> > > commit pointed against the marvell_nand.c driver.
> > 
> > I have no Marvell NFC datasheet and have one question.
> > 
> > In marvell_nand.c, there is no xxx_deselect_target() or anything
> > like it that drives #CS OFF.
> > marvell_nfc_select_target() seems to always keep the #CS of one
> > chip or die low.
> > 
> > Is that right?
> 
> Yes, AFAIR there is no "de-assert" mechanism in this controller.
> 
> > 
> > How can all #CS lines be kept high so that the NAND enters
> > low-power standby mode if the driver doesn't use "legacy.select_chip()"?
> 
> See commit 02b4a52604a4 ("mtd: rawnand: Make ->select_chip() optional
> when ->exec_op() is implemented") which states:
> 
> "When [->select_chip() is] not implemented, the core is assuming
>the CS line is automatically asserted/deasserted by the driver
>->exec_op() implementation."
> 
> Of course, the above is right only when the controller driver supports
> the ->exec_op() interface. 

Currently, it seems that we would get incorrect data and failed
operations, due to CS toggling at the wrong time, if the CS line is
controlled in ->exec_op(), e.g.:

1) nand_onfi_detect() calls nand_exec_op() twice, via
nand_read_param_page_op() and nand_read_data_op().

2) nand_write_page_xxx() calls nand_exec_op() several times, via
nand_prog_page_begin_op(), nand_write_data_op() and
nand_prog_page_end_op().


Should we consider adding a CS-line control hook to struct nand_controller,
i.e.:

struct nand_controller {
 struct mutex lock;
 const struct nand_controller_ops *ops;
+  void (*select_chip)(struct nand_chip *chip, int cs);
};

to replace legacy.select_chip()?


And patch nand_select_target() and nand_deselect_target() as follows:

void nand_select_target(struct nand_chip *chip, unsigned int cs)
{
	/*
	 * cs should always lie between 0 and chip->numchips, when that's not
	 * the case it's a bug and the caller should be fixed.
	 */
	if (WARN_ON(cs > chip->numchips))
		return;

	chip->cur_cs = cs;

+	if (chip->controller->select_chip)
+		chip->controller->select_chip(chip, cs);
+
	if (chip->legacy.select_chip)
		chip->legacy.select_chip(chip, cs);
}

void nand_deselect_target(struct nand_chip *chip)
{
+	if (chip->controller->select_chip)
+		chip->controller->select_chip(chip, -1);
+
	if (chip->legacy.select_chip)
		chip->legacy.select_chip(chip, -1);

	chip->cur_cs = -1;
}
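The dispatch proposed above can be modeled in userspace to show the intended CS# behaviour: the line is driven once in select and released once in deselect, so several exec_op() calls in between run with CS# held. This is a sketch under stated assumptions; `struct demo_controller`, `demo_select_target()` and the other `demo_` names are hypothetical stand-ins, not the rawnand API, and the range check here uses a 0-based `cs < numchips` test.

```c
#include <assert.h>
#include <stddef.h>

struct demo_chip;

/* Hypothetical controller-level CS hook, modeled on the proposal above. */
struct demo_controller {
	void (*select_chip)(struct demo_chip *chip, int cs);
};

struct demo_chip {
	struct demo_controller *controller;
	int numchips;
	int cur_cs;
	int hw_cs_line; /* CS# currently driven low; -1 means all CS# high */
};

static void demo_select_target(struct demo_chip *chip, unsigned int cs)
{
	/* 0-based range check in this model; out of range is a caller bug */
	if (cs >= (unsigned int)chip->numchips)
		return;

	chip->cur_cs = (int)cs;
	if (chip->controller->select_chip)
		chip->controller->select_chip(chip, (int)cs);
}

static void demo_deselect_target(struct demo_chip *chip)
{
	if (chip->controller->select_chip)
		chip->controller->select_chip(chip, -1);
	chip->cur_cs = -1;
}

/* Example controller callback: cs == -1 releases every CS# (standby). */
static void demo_ctrl_select_chip(struct demo_chip *chip, int cs)
{
	chip->hw_cs_line = cs;
}
```

Because only select/deselect touch `hw_cs_line` in this model, the CS# state does not depend on how many operations run between them.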


> 
> So if you think it is not too time consuming and worth the trouble to
> assert/deassert the CS at each operation, you may do it in your driver.
> 
> 
> Thanks,
> Miquèl

thanks & best regards,
Mason

CONFIDENTIALITY NOTE:

This e-mail and any attachments may contain confidential information 
and/or personal data, which is protected by applicable laws. Please be 
reminded that duplication, disclosure, distribution, or use of this e-mail 
(and/or its attachments) or any part thereof is prohibited. If you receive 
this e-mail in error, please notify us immediately and delete this mail as 
well as its attachment(s) from your system. In addition, please be 
informed that collection, processing, and/or use of personal data is 
prohibited unless expressly permitted by personal data protection laws. 
Thank you for your attention and cooperation.

Macronix International Co., Ltd.

=

[PATCH v2] lib: test_overflow: Avoid tainting the kernel and fix wrap size

2019-05-28 Thread Kees Cook

This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to
avoid tainting the kernel. Additionally fixes up the math on wrap size
to be architecture and page size agnostic.
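The page-size dependence of the old TEST_SIZE is easy to see by evaluating both definitions for a couple of PAGE_SHIFT values (12 for 4K pages, 16 for 64K pages). The helpers below are illustrative only; the test itself uses the macros directly.

```c
#include <assert.h>

/* Old definition, (9 << PAGE_SHIFT): scales with the page size. */
static unsigned long old_test_size(unsigned int page_shift)
{
	return 9UL << page_shift;
}

/* New definition, (5 * 4096): a fixed byte count on every architecture. */
static unsigned long new_test_size(void)
{
	return 5UL * 4096UL;
}
```

With 4K pages the old value is 36864 bytes, with 64K pages it is 589824 bytes, while the new value is always 20480 bytes.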

Reported-by: Randy Dunlap 
Suggested-by: Rasmus Villemoes 
Fixes: ca90800a91ba ("test_overflow: Add memory allocation overflow tests")
Signed-off-by: Kees Cook 
---
v2: fix leftover __GFP_NOWARN (joe)
---
 lib/test_overflow.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/test_overflow.c b/lib/test_overflow.c
index fc680562d8b6..7a4b6f6c5473 100644
--- a/lib/test_overflow.c
+++ b/lib/test_overflow.c
@@ -486,16 +486,17 @@ static int __init test_overflow_shift(void)
  * Deal with the various forms of allocator arguments. See comments above
  * the DEFINE_TEST_ALLOC() instances for mapping of the "bits".
  */
-#define alloc010(alloc, arg, sz) alloc(sz, GFP_KERNEL)
-#define alloc011(alloc, arg, sz) alloc(sz, GFP_KERNEL, NUMA_NO_NODE)
+#define alloc_GFP   (GFP_KERNEL | __GFP_NOWARN)
+#define alloc010(alloc, arg, sz) alloc(sz, alloc_GFP)
+#define alloc011(alloc, arg, sz) alloc(sz, alloc_GFP, NUMA_NO_NODE)
 #define alloc000(alloc, arg, sz) alloc(sz)
 #define alloc001(alloc, arg, sz) alloc(sz, NUMA_NO_NODE)
-#define alloc110(alloc, arg, sz) alloc(arg, sz, GFP_KERNEL)
+#define alloc110(alloc, arg, sz) alloc(arg, sz, alloc_GFP)
 #define free0(free, arg, ptr)   free(ptr)
 #define free1(free, arg, ptr)   free(arg, ptr)
 
-/* Wrap around to 8K */
-#define TEST_SIZE  (9 << PAGE_SHIFT)
+/* Wrap around to 16K */
+#define TEST_SIZE  (5 * 4096)
 
 #define DEFINE_TEST_ALLOC(func, free_func, want_arg, want_gfp, want_node)\
 static int __init test_ ## func (void *arg)\
-- 
2.17.1


-- 
Kees Cook

Re: [PATCH -next] EDAC: aspeed: Remove set but not used variable 'np'

2019-05-28 Thread Stefan Schaeckeler (sschaeck)

On  Tuesday, May 28, 2019 at 6:27 PM, Andrew Jeffery wrote:
> On Sun, 26 May 2019, at 00:12, YueHaibing wrote:
> > Fixes gcc '-Wunused-but-set-variable' warning:
> >
> > drivers/edac/aspeed_edac.c: In function aspeed_probe:
> > drivers/edac/aspeed_edac.c:284:22: warning: variable np set but not
> > used [-Wunused-but-set-variable]
> >
> > It is never used and can be removed.
> >
> > Signed-off-by: YueHaibing 
>
> Reviewed-by: Andrew Jeffery 

Reviewed-by: Stefan Schaeckeler 

> > ---
> >  drivers/edac/aspeed_edac.c | 4 
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/edac/aspeed_edac.c b/drivers/edac/aspeed_edac.c
> > index 11833c0a5d07..5634437bb39d 100644
> > --- a/drivers/edac/aspeed_edac.c
> > +++ b/drivers/edac/aspeed_edac.c
> > @@ -281,15 +281,11 @@ static int aspeed_probe(struct platform_device *pdev)
> > struct device *dev = &pdev->dev;
> > struct edac_mc_layer layers[2];
> > struct mem_ctl_info *mci;
> > -   struct device_node *np;
> > struct resource *res;
> > void __iomem *regs;
> > u32 reg04;
> > int rc;
> >
> > -   /* setup regmap */
> > -   np = dev->of_node;
> > -
> > res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > if (!res)
> > return -ENOENT;
> > --
> > 2.17.1

Re: [PATCH] cpumask: Remove error message and backtrace on out-of-memory condition

2019-05-28 Thread Andrew Morton

On Mon, 27 May 2019 14:29:58 +0200 Geert Uytterhoeven  
wrote:

> There is no need to print an error message and backtrace if
> kmalloc_node() fails, as the memory allocation core already takes care
> of that.
> 
> ...
>
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -114,13 +114,6 @@ bool alloc_cpumask_var_node(cpumask_var_t *mask, gfp_t 
> flags, int node)
>  {
>   *mask = kmalloc_node(cpumask_size(), flags, node);
>  
> -#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> - if (!*mask) {
> - printk(KERN_ERR "=> alloc_cpumask_var: failed!\n");
> - dump_stack();
> - }
> -#endif
> -
>   return *mask != NULL;
>  }
>  EXPORT_SYMBOL(alloc_cpumask_var_node);

Well, not really - as it stands CONFIG_DEBUG_PER_CPU_MAPS=y can override a
caller's __GFP_NOWARN.
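The override can be made concrete: the debug-only message prints even when the caller passed __GFP_NOWARN. If such a message were kept, one option would be to test the flag first. A userspace sketch, where `DEMO_GFP_NOWARN`, `demo_alloc_mask()` and its failure switch are hypothetical stand-ins for __GFP_NOWARN and alloc_cpumask_var_node(); the patch under discussion removes the message instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag bit standing in for the kernel's __GFP_NOWARN. */
#define DEMO_GFP_NOWARN 0x1u

static int demo_warnings; /* counts debug messages that would be printed */

/*
 * Allocation wrapper whose debug path respects the caller's request for
 * silence: a failure is reported only when DEMO_GFP_NOWARN is not set.
 */
static bool demo_alloc_mask(void **mask, size_t size, unsigned int flags,
			    bool simulate_failure)
{
	*mask = simulate_failure ? NULL : malloc(size);
	if (*mask == NULL && !(flags & DEMO_GFP_NOWARN))
		demo_warnings++; /* stands in for printk() + dump_stack() */
	return *mask != NULL;
}
```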

I wonder if anyone ever sets CONFIG_DEBUG_PER_CPU_MAPS any more...

Re: [PATCH 3/4] vsock/virtio: fix flush of works during the .remove()

2019-05-28 Thread Jason Wang




On 2019/5/28 6:56 PM, Stefano Garzarella wrote:

We flush all pending works before calling vdev->config->reset(vdev),
but other works can be queued before vdev->config->del_vqs(vdev),
so we add another flush after it to avoid a use-after-free.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Stefano Garzarella 
---
  net/vmw_vsock/virtio_transport.c | 23 +--
  1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e694df10ab61..ad093ce96693 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -660,6 +660,15 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
return ret;
  }
  
+static void virtio_vsock_flush_works(struct virtio_vsock *vsock)
+{
+   flush_work(&vsock->loopback_work);
+   flush_work(&vsock->rx_work);
+   flush_work(&vsock->tx_work);
+   flush_work(&vsock->event_work);
+   flush_work(&vsock->send_pkt_work);
+}
+
  static void virtio_vsock_remove(struct virtio_device *vdev)
  {
struct virtio_vsock *vsock = vdev->priv;
@@ -668,12 +677,6 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
mutex_lock(_virtio_vsock_mutex);
the_virtio_vsock = NULL;
  
-   flush_work(&vsock->loopback_work);
-   flush_work(&vsock->rx_work);
-   flush_work(&vsock->tx_work);
-   flush_work(&vsock->event_work);
-   flush_work(&vsock->send_pkt_work);
-
-
/* Reset all connected sockets when the device disappear */
vsock_for_each_connected_socket(virtio_vsock_reset_sock);
  
@@ -690,6 +693,9 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
vsock->event_run = false;
mutex_unlock(&vsock->event_lock);
  
+   /* Flush all pending works */
+   virtio_vsock_flush_works(vsock);
+
/* Flush all device writes and interrupts, device will not use any
 * more buffers.
 */
@@ -726,6 +732,11 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
/* Delete virtqueues and flush outstanding callbacks if any */
vdev->config->del_vqs(vdev);
  
+   /* Other works can be queued before 'config->del_vqs()', so we flush
+    * all works before to free the vsock object to avoid use after free.
+    */
+   virtio_vsock_flush_works(vsock);



Some questions after a quick glance:

1) It looks to me that the work could be queued from the path of
vsock_transport_cancel_pkt(). Is that synchronized here?


2) If we decide to flush after del_vqs(), are tx_run/rx_run/event_run
still needed? It looks to me like that would already be covered, except
that we need to flush rx_work at the end, since send_pkt_work can requeue rx_work.
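Point 2 is about ordering. A single-threaded model makes it visible: if send_pkt_work can requeue rx_work, flushing rx_work before send_pkt_work can leave rx_work pending. This sketch ignores concurrency entirely (real workqueues run asynchronously), and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each work item is just a "pending" flag. */
enum demo_work { DEMO_RX, DEMO_TX, DEMO_SEND_PKT, DEMO_NR_WORKS };

struct demo_vsock {
	bool pending[DEMO_NR_WORKS];
};

/* Running a work clears its flag; send_pkt work requeues rx (point 2). */
static void demo_run_work(struct demo_vsock *v, enum demo_work w)
{
	v->pending[w] = false;
	if (w == DEMO_SEND_PKT)
		v->pending[DEMO_RX] = true;
}

/* flush_work()-like: if the work is pending, run it to completion. */
static void demo_flush(struct demo_vsock *v, enum demo_work w)
{
	if (v->pending[w])
		demo_run_work(v, w);
}

static bool demo_any_pending(const struct demo_vsock *v)
{
	for (int i = 0; i < DEMO_NR_WORKS; i++)
		if (v->pending[i])
			return true;
	return false;
}
```

Flushing in the order rx, tx, send_pkt leaves rx pending in this model, whereas send_pkt, tx, rx leaves nothing pending.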


Thanks



+
kfree(vsock);
mutex_unlock(_virtio_vsock_mutex);
  }

Re: linux-next: Fixes tag needs some work in the cifs tree

2019-05-28 Thread Murphy Zhou

On Fri, May 24, 2019 at 10:14 PM Steve French  wrote:
>
> fixed and repushed to cifs-2.6.git for-next

Thanks!

[resend including mail lists]

>
> On Thu, May 23, 2019 at 11:27 PM Stephen Rothwell  
> wrote:
> >
> > Hi all,
> >
> > In commit
> >
> >   f875253b5fe6 ("fs/cifs/smb2pdu.c: fix buffer free in SMB2_ioctl_free")
> >
> > Fixes tag
> >
> >   Fixes: 2c87d6a ("cifs: Allocate memory for all iovs in smb2_ioctl")
> >
> > has these problem(s):
> >
> >   - SHA1 should be at least 12 digits long
> > Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
> > or later) just making sure it is not set (or set to "auto").
> >
> > --
> > Cheers,
> > Stephen Rothwell
>
>
>
> --
> Thanks,
>
> Steve
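A hypothetical checker for the rule quoted above (an abbreviated SHA1 of at least 12 hex digits after `Fixes: `, followed by the quoted subject) might look like the following. It is a sketch only and is not the script used for linux-next.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a "Fixes: <sha> (\"subject\")" line carries an
 * abbreviated SHA1 of 12 to 40 hex digits followed by a space.
 */
static bool demo_fixes_sha_ok(const char *line)
{
	static const char prefix[] = "Fixes: ";
	size_t n = 0;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;
	line += sizeof(prefix) - 1;

	while (isxdigit((unsigned char)line[n]))
		n++;

	return n >= 12 && n <= 40 && line[n] == ' ';
}
```

The tag reported above has only 7 hex digits, so this check rejects it.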

RE: [EXT] Re: [PATCH] arm64: dts: ls1028a: Add Thermal Monitor Unit node

2019-05-28 Thread Andy Tang

> -Original Message-
> From: Eduardo Valentin 
> Sent: 2019年5月29日 10:54
> To: Andy Tang 
> Cc: shawn...@kernel.org; Leo Li ;
> robh...@kernel.org; mark.rutl...@arm.com;
> linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux...@vger.kernel.org;
> daniel.lezc...@linaro.org; rui.zh...@intel.com
> Subject: [EXT] Re: [PATCH] arm64: dts: ls1028a: Add Thermal Monitor Unit
> node
> 
> Caution: EXT Email
> 
> On Thu, Apr 25, 2019 at 04:26:40PM +0800, Yuantian Tang wrote:
> > The Thermal Monitoring Unit (TMU) monitors and reports the temperature
> > from 2 remote temperature measurement sites located on ls1028a chip.
> > Add TMU dts node to enable this feature.
> >
> > Signed-off-by: Yuantian Tang 
> 
> I don't see anything wrong from a thermal standpoint.
> 
> Acked-by: Eduardo Valentin 
> 
> Please get this via your arch tree maintainer to avoid merge conflicts.
Thanks for your review.
The only concern for the arch tree maintainer is that "cooling-maps" is a
required property, so I have to add cooling-maps for each zone.
Since there are two thermal zones but only one cooling device (cpufreq),
I have to use cpufreq as the cooling device twice, which may cause
conflicting cooling decisions. This will get worse when we have 7 thermal
zones.
This makes me think that maybe "cooling-maps" should become an optional
property. That way, we could attach cooling devices to specific thermal
zones and leave the zones without cooling devices to take the default
action, which is to reset or power off the SoC.
What's your opinion about this?

BR,
Andy

> 
> > ---
> >  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi |  114
> > 
> >  1 files changed, 114 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> > b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> > index b045812..a25f5fc 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> > @@ -29,6 +29,7 @@
> >   clocks = < 1 0>;
> >   next-level-cache = <>;
> >   cpu-idle-states = <_PH20>;
> > + #cooling-cells = <2>;
> >   };
> >
> >   cpu1: cpu@1 {
> > @@ -39,6 +40,7 @@
> >   clocks = < 1 0>;
> >   next-level-cache = <>;
> >   cpu-idle-states = <_PH20>;
> > + #cooling-cells = <2>;
> >   };
> >
> >   l2: l2-cache {
> > @@ -398,6 +400,118 @@
> >   status = "disabled";
> >   };
> >
> > + tmu: tmu@1f0 {
> > + compatible = "fsl,qoriq-tmu";
> > + reg = <0x0 0x1f8 0x0 0x1>;
> > + interrupts = <0 23 0x4>;
> > + fsl,tmu-range = <0xb 0xa0026 0x80048 0x70061>;
> > + fsl,tmu-calibration = <0x 0x0024
> > +0x0001 0x002b
> > +0x0002 0x0031
> > +0x0003 0x0038
> > +0x0004 0x003f
> > +0x0005 0x0045
> > +0x0006 0x004c
> > +0x0007 0x0053
> > +0x0008 0x0059
> > +0x0009 0x0060
> > +0x000a 0x0066
> > +0x000b 0x006d
> > +
> > +0x0001 0x001c
> > +0x00010001 0x0024
> > +0x00010002 0x002c
> > +0x00010003 0x0035
> > +0x00010004 0x003d
> > +0x00010005 0x0045
> > +0x00010006 0x004d
> > +0x00010007 0x0045
> > +0x00010008 0x005e
> > +0x00010009 0x0066
> > +0x0001000a 0x006e
> > +
> > +0x0002 0x0018
> > +0x00020001 0x0022
> > +0x00020002 0x002d
> > +0x00020003 0x0038
> > +

ebpf trace doesn't work during cpu hotplug

2019-05-28 Thread Ming Lei

Hi,

It looks like eBPF tracing doesn't work during CPU hotplug; see the following trace:

1) trace two functions called during CPU unplug via bcc/trace

/usr/share/bcc/tools/trace -T 'takedown_cpu "%d", arg1'  'take_cpu_down'

2) put cpu7 offline via:

echo 0 > /sys/devices/system/cpu/cpu7/online

3) only the trace for 'takedown_cpu' is dumped by bcc/trace:

TIME     PID    TID    COMM    FUNC             -
03:23:17 733    733    bash    takedown_cpu     7

The missing trace for 'take_cpu_down' is never shown, even after
CPU7 is brought back online.

take_cpu_down is called via stop_machine_cpuslocked.

Thanks,
Ming Lei

[PATCH] ASoC: cs42xx8: Fix build error with CONFIG_GPIOLIB is not set

2019-05-28 Thread shengjiu . wang

From: Shengjiu Wang 

config: x86_64-randconfig-x000201921-201921
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
make ARCH=x86_64

sound/soc/codecs/cs42xx8.c: In function ‘cs42xx8_probe’:
sound/soc/codecs/cs42xx8.c:472:25: error: implicit declaration of function 
‘devm_gpiod_get_optional’; did you mean ‘devm_clk_get_optional’? 
[-Werror=implicit-function-declaration]
  cs42xx8->gpiod_reset = devm_gpiod_get_optional(dev, "reset",
 ^~~
 devm_clk_get_optional
sound/soc/codecs/cs42xx8.c:473:8: error: ‘GPIOD_OUT_HIGH’ undeclared (first use 
in this function); did you mean ‘GPIOF_INIT_HIGH’?
GPIOD_OUT_HIGH);
^~
GPIOF_INIT_HIGH
sound/soc/codecs/cs42xx8.c:473:8: note: each undeclared identifier is reported 
only once for each function it appears in
sound/soc/codecs/cs42xx8.c:477:2: error: implicit declaration of function 
‘gpiod_set_value_cansleep’; did you mean ‘gpio_set_value_cansleep’? 
[-Werror=implicit-function-declaration]
  gpiod_set_value_cansleep(cs42xx8->gpiod_reset, 0);
  ^~~~
  gpio_set_value_cansleep

Fixes: bfe95dfa4dac ("ASoC: cs42xx8: Add reset gpio handling")
Reported-by: kbuild test robot 
Signed-off-by: Shengjiu Wang 
---
 sound/soc/codecs/cs42xx8.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/cs42xx8.c b/sound/soc/codecs/cs42xx8.c
index 3e8dbf63adbe..3bbc62322dfe 100644
--- a/sound/soc/codecs/cs42xx8.c
+++ b/sound/soc/codecs/cs42xx8.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
-- 
2.21.0

RE: [PATCH RESEND V13 2/5] thermal: of-thermal: add API for getting sensor ID from DT

2019-05-28 Thread Anson Huang

Hi, Eduardo

> -Original Message-
> From: Eduardo Valentin 
> Sent: Wednesday, May 29, 2019 11:02 AM
> To: Anson Huang 
> Cc: robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org;
> s.ha...@pengutronix.de; ker...@pengutronix.de; feste...@gmail.com;
> catalin.mari...@arm.com; will.dea...@arm.com; rui.zh...@intel.com;
> daniel.lezc...@linaro.org; Aisheng Dong ;
> ulf.hans...@linaro.org; Peng Fan ; Daniel Baluta
> ; maxime.rip...@bootlin.com; o...@lixom.net;
> ja...@amarulasolutions.com; horms+rene...@verge.net.au; Leonard Crestez
> ; bjorn.anders...@linaro.org;
> dingu...@kernel.org; enric.balle...@collabora.com;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; linux...@vger.kernel.org; dl-linux-imx  i...@nxp.com>
> Subject: Re: [PATCH RESEND V13 2/5] thermal: of-thermal: add API for getting
> sensor ID from DT
> 
> On Tue, May 28, 2019 at 02:06:18PM +0800, anson.hu...@nxp.com wrote:
> > From: Anson Huang 
> >
> > On some platforms like i.MX8QXP, the thermal driver needs a real HW
> > sensor ID from DT thermal zone, the HW sensor ID is used to get
> > temperature from SCU firmware, and the virtual sensor ID starting from
> > 0 to N is NOT used at all, this patch adds new API
> > thermal_zone_of_get_sensor_id() to provide the feature of getting
> > sensor ID from DT thermal zone's node.
> >
> > Signed-off-by: Anson Huang 
> > ---
> > Changes since V12:
> > - adjust the second parameter of the thermal_zone_of_get_sensor_id() API,
> >   so the caller no longer needs to pass the of_phandle_args structure and
> >   put sensor_specs.np manually; also move the sensor node device check
> >   inside this API to make it easier to use;
> 
> What happened to using nxp,resource-id property in your driver?
> Why do we need this as an API in of-thermal? What other drivers may benefit
> from this?
> 
> Regardless, this patch needs to document the new API under Documentation/

Rob had a different opinion about this property; he thought it was
unnecessary (see the discussion linked below). That is why I need to add an
API to get the resource ID from the phandle argument.
I am quite confused now: which approach should we adopt?

https://patchwork.kernel.org/patch/10831397/

Thanks,
Anson
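The decision logic of the proposed thermal_zone_of_get_sensor_id(), once of_parse_phandle_with_args() has filled in the specifier, can be modeled in userspace. This sketch assumes a simplified stand-in for struct of_phandle_args (`struct demo_phandle_args`, hypothetical) and omits DT parsing and node reference counting; it only shows the node match, the first-cell ID, and the zero default.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define DEMO_MAX_ARGS 16

/* Hypothetical stand-in for struct of_phandle_args. */
struct demo_phandle_args {
	const void *np; /* node the phandle resolved to */
	int args_count;
	unsigned int args[DEMO_MAX_ARGS];
};

/*
 * Reject a specifier that points at a different sensor node, take the
 * first cell as the ID, warn on extra cells, default to 0 with no cells.
 */
static int demo_sensor_id(const struct demo_phandle_args *spec,
			  const void *sensor_np, unsigned int *id)
{
	if (spec->np != sensor_np)
		return -ENODEV;

	if (spec->args_count >= 1) {
		*id = spec->args[0];
		if (spec->args_count > 1)
			fprintf(stderr, "too many cells in sensor specifier %d\n",
				spec->args_count);
	} else {
		*id = 0;
	}
	return 0;
}
```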

> 
> > ---
> >  drivers/thermal/of-thermal.c | 66 +---
> 
> >  include/linux/thermal.h  | 10 +++
> >  2 files changed, 60 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/thermal/of-thermal.c
> > b/drivers/thermal/of-thermal.c index dc5093b..a53792b 100644
> > --- a/drivers/thermal/of-thermal.c
> > +++ b/drivers/thermal/of-thermal.c
> > @@ -449,6 +449,54 @@ thermal_zone_of_add_sensor(struct device_node
> > *zone,  }
> >
> >  /**
> > + * thermal_zone_of_get_sensor_id - get sensor ID from a DT thermal zone
> > + * @tz_np: a valid thermal zone device node.
> > + * @sensor_np: a sensor node of a valid sensor device.
> > + * @id: a sensor ID pointer will be passed back.
> > + *
> > + * This function will get sensor ID from a given thermal zone node, use
> > + * "thermal-sensors" as list name, and get sensor ID from first phandle's
> > + * argument.
> > + * Return: 0 on success, proper error code otherwise.
> > + */
> > +
> > +int thermal_zone_of_get_sensor_id(struct device_node *tz_np,
> > + struct device_node *sensor_np,
> > + u32 *id)
> > +{
> > +   struct of_phandle_args sensor_specs;
> > +   int ret;
> > +
> > +   ret = of_parse_phandle_with_args(tz_np,
> > +"thermal-sensors",
> > +"#thermal-sensor-cells",
> > +0,
> > +&sensor_specs);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (sensor_specs.np != sensor_np) {
> > +   of_node_put(sensor_specs.np);
> > +   return -ENODEV;
> > +   }
> > +
> > +   if (sensor_specs.args_count >= 1) {
> > +   *id = sensor_specs.args[0];
> > +   WARN(sensor_specs.args_count > 1,
> > +"%pOFn: too many cells in sensor specifier %d\n",
> > +sensor_specs.np, sensor_specs.args_count);
> > +   } else {
> > +   *id = 0;
> > +   }
> > +
> > +   of_node_put(sensor_specs.np);
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(thermal_zone_of_get_sensor_id);
> > +
> > +/**
> >   * thermal_zone_of_sensor_register - registers a sensor to a DT thermal zone
> >   * @dev: a valid struct device pointer of a sensor device. Must contain
> >   *   a valid .of_node, for the sensor node.
> > @@ -499,36 +547,22 @@ thermal_zone_of_sensor_register(struct device *dev, int sensor_id, void *data,
> > sensor_np = of_node_get(dev->of_node);
> >
> > for_each_available_child_of_node(np, child) {
> > -   struct of_phandle_args sensor_specs;
> > int ret, id;
> >
> > /*

[PATCH] NFC: microread/pn544: Fix possible null pointer dereference error

2019-05-28 Thread Young Xiao

pn544_hci_i2c_irq_thread_fn() and microread_i2c_irq_thread_fn() access
phy->hdev, but pn544_hci_i2c_probe() and microread_i2c_probe() request the
IRQ before calling pn544_hci_probe()/microread_probe(), which is what
initializes phy->hdev. An early interrupt can therefore see it uninitialized.

Therefore, call xxx_probe() before request_threaded_irq(), and add a
phy->hdev guard in the xxx_i2c_irq_thread_fn() functions.
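The guard added to the threaded handlers can be illustrated with a userspace model. It assumes simplified stand-ins (`struct demo_phy`, `struct demo_hdev`, `demo_irq_thread_fn()`, all hypothetical) and omits the WARN_ON_ONCE() the real handlers also emit; it only shows that the handler declines the interrupt until hdev exists.

```c
#include <assert.h>
#include <stddef.h>

enum demo_irqreturn { DEMO_IRQ_NONE, DEMO_IRQ_HANDLED };

struct demo_hdev {
	int frames_received;
};

struct demo_phy {
	int irq;
	struct demo_hdev *hdev; /* NULL until the core probe has run */
};

static enum demo_irqreturn demo_irq_thread_fn(int irq, void *phy_id)
{
	struct demo_phy *phy = phy_id;

	/* Guard: without hdev there is nobody to hand the frame to. */
	if (!phy || !phy->hdev || irq != phy->irq)
		return DEMO_IRQ_NONE;

	phy->hdev->frames_received++; /* stands in for reading/delivering a frame */
	return DEMO_IRQ_HANDLED;
}
```

With the reordered probe, hdev is assigned before the IRQ is requested, so the guard acts as a defensive check rather than the primary fix.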

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/nfc/microread/i2c.c | 19 +++
 drivers/nfc/pn544/i2c.c | 16 
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/drivers/nfc/microread/i2c.c b/drivers/nfc/microread/i2c.c
index 1806d20..80fc6d5 100644
--- a/drivers/nfc/microread/i2c.c
+++ b/drivers/nfc/microread/i2c.c
@@ -212,7 +212,7 @@ static irqreturn_t microread_i2c_irq_thread_fn(int irq, 
void *phy_id)
struct sk_buff *skb = NULL;
int r;
 
-   if (!phy || irq != phy->i2c_dev->irq) {
+   if (!phy || !phy->hdev || irq != phy->i2c_dev->irq) {
WARN_ON_ONCE(1);
return IRQ_NONE;
}
@@ -257,6 +257,12 @@ static int microread_i2c_probe(struct i2c_client *client,
 
i2c_set_clientdata(client, phy);
phy->i2c_dev = client;
+   r = microread_probe(phy, _phy_ops, LLC_SHDLC_NAME,
+   MICROREAD_I2C_FRAME_HEADROOM,
+   MICROREAD_I2C_FRAME_TAILROOM,
+   MICROREAD_I2C_LLC_MAX_PAYLOAD, &phy->hdev);
+   if (r < 0)
+   return r;
 
r = request_threaded_irq(client->irq, NULL, microread_i2c_irq_thread_fn,
 IRQF_TRIGGER_RISING | IRQF_ONESHOT,
@@ -266,21 +272,10 @@ static int microread_i2c_probe(struct i2c_client *client,
return r;
}
 
-   r = microread_probe(phy, _phy_ops, LLC_SHDLC_NAME,
-   MICROREAD_I2C_FRAME_HEADROOM,
-   MICROREAD_I2C_FRAME_TAILROOM,
-   MICROREAD_I2C_LLC_MAX_PAYLOAD, &phy->hdev);
-   if (r < 0)
-   goto err_irq;
 
nfc_info(&client->dev, "Probed\n");
 
return 0;
-
-err_irq:
-   free_irq(client->irq, phy);
-
-   return r;
 }
 
 static int microread_i2c_remove(struct i2c_client *client)
diff --git a/drivers/nfc/pn544/i2c.c b/drivers/nfc/pn544/i2c.c
index d0207f8..c9694c8 100644
--- a/drivers/nfc/pn544/i2c.c
+++ b/drivers/nfc/pn544/i2c.c
@@ -496,7 +496,7 @@ static irqreturn_t pn544_hci_i2c_irq_thread_fn(int irq, 
void *phy_id)
struct sk_buff *skb = NULL;
int r;
 
-   if (!phy || irq != phy->i2c_dev->irq) {
+   if (!phy || !phy->hdev || irq != phy->i2c_dev->irq) {
WARN_ON_ONCE(1);
return IRQ_NONE;
}
@@ -924,6 +924,13 @@ static int pn544_hci_i2c_probe(struct i2c_client *client,
 
pn544_hci_i2c_platform_init(phy);
 
+   r = pn544_hci_probe(phy, _phy_ops, LLC_SHDLC_NAME,
+   PN544_I2C_FRAME_HEADROOM, PN544_I2C_FRAME_TAILROOM,
+   PN544_HCI_I2C_LLC_MAX_PAYLOAD,
+   pn544_hci_i2c_fw_download, &phy->hdev);
+   if (r < 0)
+   return r;
+
r = devm_request_threaded_irq(&client->dev, client->irq, NULL,
  pn544_hci_i2c_irq_thread_fn,
  IRQF_TRIGGER_RISING | IRQF_ONESHOT,
@@ -933,13 +940,6 @@ static int pn544_hci_i2c_probe(struct i2c_client *client,
return r;
}
 
-   r = pn544_hci_probe(phy, _phy_ops, LLC_SHDLC_NAME,
-   PN544_I2C_FRAME_HEADROOM, PN544_I2C_FRAME_TAILROOM,
-   PN544_HCI_I2C_LLC_MAX_PAYLOAD,
-   pn544_hci_i2c_fw_download, &phy->hdev);
-   if (r < 0)
-   return r;
-
return 0;
 }
 
-- 
2.7.4

[PATCH 1/1] Drivers: hv: vmbus: Break out ISA independent parts of mshyperv.h

2019-05-28 Thread Michael Kelley

Break out parts of mshyperv.h that are ISA independent into a
separate file in include/asm-generic. This move facilitates
ARM64 code reusing these definitions and avoids code
duplication. No functionality or behavior is changed.
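The x86 header now defines hv_signal_eom() as a write to HV_X64_MSR_EOM, which suggests the moved vmbus_signal_eom() calls that hook instead of wrmsrl() directly (the asm-generic file itself is not shown here). A userspace sketch of that split, assuming the hook only counts calls and C11 atomics stand in for cmpxchg() and mb(); every `demo_` name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HVMSG_NONE 0u

struct demo_hv_message {
	_Atomic uint32_t message_type;
	bool msg_pending;
};

static int demo_eom_writes; /* counts simulated EOM register writes */

/* Per-architecture hook; on x86 the patch maps hv_signal_eom() to an MSR write. */
#define demo_hv_signal_eom() do { demo_eom_writes++; } while (0)

/* Architecture-independent part, modeled on vmbus_signal_eom(). */
static void demo_vmbus_signal_eom(struct demo_hv_message *msg,
				  uint32_t old_msg_type)
{
	uint32_t expected = old_msg_type;

	/* Free the slot only if it still holds the message we consumed. */
	if (!atomic_compare_exchange_strong(&msg->message_type, &expected,
					    DEMO_HVMSG_NONE))
		return;

	/* Mirrors mb(): order the slot release before reading msg_pending. */
	atomic_thread_fence(memory_order_seq_cst);

	if (msg->msg_pending)
		demo_hv_signal_eom();
}
```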

Signed-off-by: Michael Kelley 
---
 MAINTAINERS |   1 +
 arch/x86/include/asm/mshyperv.h | 147 +---
 include/asm-generic/mshyperv.h  | 182 
 3 files changed, 187 insertions(+), 143 deletions(-)
 create mode 100644 include/asm-generic/mshyperv.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cf2a5b7..521192d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7308,6 +7308,7 @@ F:net/vmw_vsock/hyperv_transport.c
 F: include/clocksource/hyperv_timer.h
 F: include/linux/hyperv.h
 F: include/uapi/linux/hyperv.h
+F: include/asm-generic/mshyperv.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f4fa8a9..2a793bf 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -3,84 +3,15 @@
 #define _ASM_X86_MSHYPER_H
 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 
-#define VP_INVAL   U32_MAX
-
-struct ms_hyperv_info {
-   u32 features;
-   u32 misc_features;
-   u32 hints;
-   u32 nested_features;
-   u32 max_vp_index;
-   u32 max_lp_index;
-};
-
-extern struct ms_hyperv_info ms_hyperv;
-
-
 typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
void *data);
 
-/*
- * Generate the guest ID.
- */
-
-static inline  __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version,
-  __u64 d_info2)
-{
-   __u64 guest_id = 0;
-
-   guest_id = (((__u64)HV_LINUX_VENDOR_ID) << 48);
-   guest_id |= (d_info1 << 48);
-   guest_id |= (kernel_version << 16);
-   guest_id |= d_info2;
-
-   return guest_id;
-}
-
-
-/* Free the message slot and signal end-of-message if required */
-static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
-{
-   /*
-* On crash we're reading some other CPU's message page and we need
-* to be careful: this other CPU may already had cleared the header
-* and the host may already had delivered some other message there.
-* In case we blindly write msg->header.message_type we're going
-* to lose it. We can still lose a message of the same type but
-* we count on the fact that there can only be one
-* CHANNELMSG_UNLOAD_RESPONSE and we don't care about other messages
-* on crash.
-*/
-   if (cmpxchg(&msg->header.message_type, old_msg_type,
-   HVMSG_NONE) != old_msg_type)
-   return;
-
-   /*
-* Make sure the write to MessageType (ie set to
-* HVMSG_NONE) happens before we read the
-* MessagePending and EOMing. Otherwise, the EOMing
-* will not deliver any more messages since there is
-* no empty slot
-*/
-   mb();
-
-   if (msg->header.message_flags.msg_pending) {
-   /*
-* This will cause message queue rescan to
-* possibly deliver another msg from the
-* hypervisor
-*/
-   wrmsrl(HV_X64_MSR_EOM, 0);
-   }
-}
-
 #define hv_init_timer(timer, tick) \
wrmsrl(HV_X64_MSR_STIMER0_COUNT + (2*timer), tick)
 #define hv_init_timer_config(timer, val) \
@@ -97,6 +28,8 @@ static inline void vmbus_signal_eom(struct hv_message *msg, 
u32 old_msg_type)
 
 #define hv_get_vp_index(index) rdmsrl(HV_X64_MSR_VP_INDEX, index)
 
+#define hv_signal_eom() wrmsrl(HV_X64_MSR_EOM, 0)
+
 #define hv_get_synint_state(int_num, val) \
rdmsrl(HV_X64_MSR_SINT0 + int_num, val)
 #define hv_set_synint_state(int_num, val) \
@@ -122,13 +55,6 @@ static inline void vmbus_signal_eom(struct hv_message *msg, 
u32 old_msg_type)
 #define trace_hyperv_callback_vector hyperv_callback_vector
 #endif
 void hyperv_vector_handler(struct pt_regs *regs);
-void hv_setup_vmbus_irq(void (*handler)(void));
-void hv_remove_vmbus_irq(void);
-
-void hv_setup_kexec_handler(void (*handler)(void));
-void hv_remove_kexec_handler(void);
-void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs));
-void hv_remove_crash_handler(void);
 
 /*
  * Routines for stimer0 Direct Mode handling.
@@ -136,8 +62,6 @@ static inline void vmbus_signal_eom(struct hv_message *msg, 
u32 old_msg_type)
  */
 void hv_stimer0_vector_handler(struct pt_regs *regs);
 void hv_stimer0_callback_vector(void);
-int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void));
-void hv_remove_stimer0_irq(int irq);
 
 static inline void hv_enable_stimer0_percpu_irq(int irq) {}
 static inline void hv_disable_stimer0_percpu_irq(int irq) {}
@@ -282,14 +206,6 @@ static inline u64

Re: [PATCH v2] mm/swap: Fix release_pages() when releasing devmap pages

2019-05-28 Thread Ira Weiny

On Mon, May 27, 2019 at 05:01:07PM +0200, Michal Hocko wrote:
> On Fri 24-05-19 10:36:56, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > Device pages can be more than type MEMORY_DEVICE_PUBLIC.
> > 
> > Handle all device pages within release_pages()
> > 
> > This was found via code inspection while determining if release_pages()
> > and the new put_user_pages() could be interchangeable.
> 
> Please expand more about who is such a user and why does it use
> release_pages rather than put_*page API.

Sorry for not being more clear. The error was discovered while discussing a
proposal to change a use of release_pages() to put_user_pages() [1].

[1] 
https://lore.kernel.org/lkml/20190523172852.ga27...@iweiny-desk2.sc.intel.com/

In that thread John was saying that release_pages() was functionally equivalent
to a loop around put_page().  He also suggested implementing put_user_pages()
by using release_pages().  On the surface they did not seem the same to me so I
did a deep dive to make sure they were and found this error.

>
> The above changelog doesn't
> really help understanding what is the actual problem. I also do not
> understand the fix and a failure mode from release_pages is just scary.

This does not make release_pages() fail. The fix addresses the fact that not
all devmap pages are of the "public" type, so before this change, devmap pages
of other types were not accounted for correctly.

The discussion about put_devmap_managed_page() "failing" is not about it
failing directly but rather about how these pages should be accounted for.
Only devmap pages which require pagemap ops (specifically page_free()) require
put_devmap_managed_page() processing. Because of the optimized locking in
release_pages(), the zone device check is required to release the lock even if
put_devmap_managed_page() does not handle the put.
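The control flow described above can be sketched as a userspace model: every zone-device page drops the lock first, and only pages that put_devmap_managed_page() actually consumes skip the generic put. The page model, the counters, and the assumption that the generic path leaves the LRU lock held are all simplifications; `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model: only the properties release_pages() checks. */
struct demo_page {
	bool zone_device;
	bool has_page_free_op; /* pagemap needs ->page_free() processing */
	int refcount;
};

struct demo_stats {
	int lock_drops;
	int devmap_handled;
	int generic_puts;
};

/* Stand-in for put_devmap_managed_page(): true only if it took the ref. */
static bool demo_put_devmap_managed_page(struct demo_page *p)
{
	if (!p->has_page_free_op)
		return false;
	p->refcount--;
	return true;
}

static void demo_release_pages(struct demo_page **pages, int nr,
			       struct demo_stats *st)
{
	bool locked = false;

	for (int i = 0; i < nr; i++) {
		struct demo_page *page = pages[i];

		if (page->zone_device) {
			if (locked) { /* drop the LRU lock for any zone-device page */
				locked = false;
				st->lock_drops++;
			}
			if (demo_put_devmap_managed_page(page)) {
				st->devmap_handled++;
				continue;
			}
			/* Not handled: fall through to the generic put. */
		}
		page->refcount--; /* put_page_testzero() path */
		st->generic_puts++;
		locked = true; /* model assumption: generic path took the lock */
	}
}
```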

> It is basically impossible to handle the error case. So what is going on
> here?

I think what has happened is that the code in release_pages() and put_page()
diverged at some point. I think a clean-up in this area is worthwhile, but I
don't see a way to do it at the moment that would be any cleaner than what is
there, so I've refrained from doing so.

Does this help?  Would you like to roll a V3 with some of this in the commit
message?

Ira

>
>
>
> 
> > Cc: Jérôme Glisse 
> > Cc: Michal Hocko 
> > Reviewed-by: Dan Williams 
> > Reviewed-by: John Hubbard 
> > Signed-off-by: Ira Weiny 
> > 
> > ---
> > Changes from V1:
> > Add comment clarifying that put_devmap_managed_page() can still
> > fail.
> > Add Reviewed-by tags.
> > 
> >  mm/swap.c | 11 +++
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 9d0432baddb0..f03b7b4bfb4f 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -740,15 +740,18 @@ void release_pages(struct page **pages, int nr)
> > if (is_huge_zero_page(page))
> > continue;
> >  
> > -   /* Device public page can not be huge page */
> > -   if (is_device_public_page(page)) {
> > +   if (is_zone_device_page(page)) {
> > if (locked_pgdat) {
> > spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >flags);
> > locked_pgdat = NULL;
> > }
> > -   put_devmap_managed_page(page);
> > -   continue;
> > +   /*
> > +* zone-device-pages can still fail here and will
> > +* therefore need put_page_testzero()
> > +*/
> > +   if (put_devmap_managed_page(page))
> > +   continue;
> > }
> >  
> > page = compound_head(page);
> > -- 
> > 2.20.1
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

Re: [RFC PATCH v5 16/16] dcache: Add CONFIG_DCACHE_SMO

2019-05-28 Thread Tobin C. Harding

On Tue, May 21, 2019 at 02:05:38AM +, Roman Gushchin wrote:
> On Tue, May 21, 2019 at 11:31:18AM +1000, Tobin C. Harding wrote:
> > On Tue, May 21, 2019 at 12:57:47AM +, Roman Gushchin wrote:
> > > On Mon, May 20, 2019 at 03:40:17PM +1000, Tobin C. Harding wrote:
> > > > In an attempt to make the SMO patchset as non-invasive as possible add a
> > > > config option CONFIG_DCACHE_SMO (under "Memory Management options") for
> > > > enabling SMO for the DCACHE.  Without this option the dcache constructor is
> > > > used but no other code is built in, with this option enabled slab
> > > > mobility is enabled and the isolate/migrate functions are built in.
> > > > 
> > > > Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache via
> > > > Slab Movable Objects infrastructure.
> > > 
> > > Hm, isn't it better to make it a static branch? Or basically anything
> > > that allows switching on the fly?
> > 
> > If that is wanted, turning SMO on and off per cache, we can probably do
> > this in the SMO code in SLUB.
> 
> Not necessarily per cache, but without recompiling the kernel.
> > 
> > > It seems that the cost of just building it in shouldn't be that high.
> > > And the question if the defragmentation worth the trouble is so much
> > > easier to answer if it's possible to turn it on and off without rebooting.
> > 
> > If the question is 'is defragmentation worth the trouble for the
> > dcache', I'm not sure having SMO turned off helps answer that question.
> > If one doesn't shrink the dentry cache there should be very little
> > overhead in having SMO enabled.  So if one wants to explore this
> > question then they can turn on the config option.  Please correct me if
> > I'm wrong.
> 
> The problem with a config option is that it's hard to switch over.
> 
> So just to test your changes in production a new kernel should be built,
> tested and rolled out to a representative set of machines (which can be
> measured in thousands of machines). Then if results are questionable,
> it should be rolled back.
> 
> What you're actually guarding is the kmem_cache_setup_mobility() call,
> which can perfectly well be avoided using a boot option, for example. Turning
> it on and off fully dynamically isn't that hard either.

Hi Roman,

I've added a boot parameter to SLUB so that admins can enable/disable
SMO at boot time system wide.  Then for each object that implements SMO
(currently XArray and dcache) I've also added a boot parameter to
enable/disable SMO for that cache specifically (these depend on SMO
being enabled system wide).

All three boot parameters default to 'off'; I've added a config option
to default each to 'on'.

I've got a little more testing to do on another part of the set then the
PATCH version is coming at you :)

This is more a courtesy email than a request for comment, but please
feel free to shout if you don't like the method outlined above.

Fully dynamic configuration is not currently possible because the SMO
implementation does not support disabling mobility for a cache once it
is turned on; a bit of extra logic would need to be added and some state
stored. I'm not sure it warrants that at the moment, but it can easily be
added later if wanted.  Maybe Christoph will give his opinion on this.
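For illustration, the two-level scheme described above (a system-wide switch plus a per-cache switch that only takes effect when the global one is on) might be modelled as below. The parameter names slub_smo and dcache_smo are hypothetical, and in the kernel this parsing would be done with early_param() or __setup() rather than by hand; `dflt` stands in for the config option that flips the default to 'on'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Look up "name=on" / "name=off" in a space-separated command line.
 * Returns dflt when the parameter is absent or has another value. */
static bool param_enabled(const char *cmdline, const char *name, bool dflt)
{
	size_t len = strlen(name);
	const char *p = cmdline;

	while (*p) {
		while (*p == ' ')
			p++;
		const char *end = strchr(p, ' ');
		size_t tok = end ? (size_t)(end - p) : strlen(p);

		if (tok > len && strncmp(p, name, len) == 0 && p[len] == '=') {
			const char *val = p + len + 1;
			size_t vlen = tok - len - 1;

			if (vlen == 2 && strncmp(val, "on", 2) == 0)
				return true;
			if (vlen == 3 && strncmp(val, "off", 3) == 0)
				return false;
		}
		p += tok;
	}
	return dflt;
}

/* Per-cache SMO only counts if the system-wide switch is on as well. */
static bool cache_smo_enabled(const char *cmdline, const char *cache_param,
			      bool dflt)
{
	return param_enabled(cmdline, "slub_smo", dflt) &&
	       param_enabled(cmdline, cache_param, dflt);
}
```

For example, "slub_smo=off dcache_smo=on" leaves dcache SMO disabled even when the config default is 'on'.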

thanks,
Tobin.

Re: [ext4] 079f9927c7: ltp.mmap16.fail

2019-05-28 Thread Theodore Ts'o

On Wed, May 29, 2019 at 10:52:56AM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: 079f9927c7bfa026d963db1455197159ebe5b534 ("ext4: gracefully handle ext4_break_layouts() failure during truncate")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

Jan --- this is the old version of your patch, which I had dropped
before sending a push request to Linus.  However, I forgot to reset
the dev branch so it still had the old patch on it, and so it got
picked up in linux-next.  Apologies for the confusion.

I've reset the dev branch on ext4.git, and the new version of your
patch will show up there shortly, as I start reviewing patches for the
next merge window.

Cheers,

- Ted

> <<>>
> tag=mmap16 stime=1559078706
> cmdline="mmap16"
> contacts=""
> analysis=exit
> <<>>
> mke2fs 1.43.4 (31-Jan-2017)
> mmap16  0  TINFO  :  Using test device LTP_DEV='/dev/loop0'
> mmap16  0  TINFO  :  Formatting /dev/loop0 with ext4 opts='-b 1024' extra opts='10240'
> mmap16  1  TFAIL  :  mmap16.c:85: Bug is reproduced!
> <<>>
> initiation_status="ok"
> duration=8 termination_type=exited termination_id=1 corefile=no
> cutime=11 cstime=345
> <<>>

kernel BUG at mm/swap_state.c:170!

2019-05-28 Thread Mikhail Gavrilov

Hi folks.
I observed a kernel panic after updating to git tag 5.2-rc2.
This crash happens under memory pressure when swap is being used.

Unfortunately, journalctl saved only this:

May 29 08:02:02 localhost.localdomain kernel: page:e9095823 refcount:1 mapcount:1 mapping:8f3ffeb36949 index:0x625002ab2
May 29 08:02:02 localhost.localdomain kernel: anon
May 29 08:02:02 localhost.localdomain kernel: flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
May 29 08:02:02 localhost.localdomain kernel: raw: 0017fffe00080034 e90944640888 e90956e208c8 8f3ffeb36949
May 29 08:02:02 localhost.localdomain kernel: raw: 000625002ab2  0001 8f41aeeff000
May 29 08:02:02 localhost.localdomain kernel: page dumped because: VM_BUG_ON_PAGE(entry != page)
May 29 08:02:02 localhost.localdomain kernel: page->mem_cgroup:8f41aeeff000
May 29 08:02:02 localhost.localdomain kernel: [ cut here ]
May 29 08:02:02 localhost.localdomain kernel: kernel BUG at mm/swap_state.c:170!

--
Best Regards,
Mike Gavrilov.

Re: [PATCH] x86/fpu: Use fault_in_pages_writeable() for pre-faulting

2019-05-28 Thread Andrew Morton

On Sun, 26 May 2019 19:33:25 +0200 Sebastian Andrzej Siewior wrote:

> From: Hugh Dickins 
> 
> Since commit
> 
>   d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")

Please add this as a

Fixes: d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")

line so that anyone who backports d9c9ce34ed5c8 has a chance of finding
this patch also.
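As a rough illustration of the expected trailer format (a commit id of at least 12 hex characters, then the quoted subject in parentheses, all on a single line), a hypothetical checker could look like this. It is only a sketch and not a replacement for checkpatch.pl.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Rough check of a "Fixes:" trailer: Fixes: <sha> ("<subject>"),
 * with at least 12 hex digits of the commit id, on one line. */
static bool fixes_tag_ok(const char *line)
{
	const char *p;
	size_t hex = 0;

	if (strncmp(line, "Fixes: ", 7) != 0 || strchr(line, '\n'))
		return false;
	p = line + 7;
	while (isxdigit((unsigned char)p[hex]))
		hex++;
	if (hex < 12)
		return false;
	p += hex;
	if (strncmp(p, " (\"", 3) != 0)
		return false;
	size_t n = strlen(p);
	return n >= 5 && strcmp(p + n - 2, "\")") == 0;
}
```

The wrapped form of the tag (split across two lines) is rejected, which is exactly what makes it hard for backporters to find.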

Re: [PATCH] perf: Fix oops when kthread execs user process

2019-05-28 Thread Michael Ellerman

Will Deacon  writes:
> On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
>> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
>> > When a kthread calls call_usermodehelper() the steps are:
>> >   1. allocate current->mm
>> >   2. load_elf_binary()
>> >   3. populate current->thread.regs
>> > 
>> > While doing this, interrupts are not disabled. If there is a perf
>> > interrupt in the middle of this process (i.e. step 1 has completed
>> > but not yet reached to step 3) and if perf tries to read userspace
>> > regs, kernel oops.
>
> This seems to be because pt_regs(current) gives NULL for kthreads on Power.

Right, we've done that since roughly forever in copy_thread():

int copy_thread(unsigned long clone_flags, unsigned long usp,
		unsigned long kthread_arg, struct task_struct *p)
{
	...
	/* Copy registers */
	sp -= sizeof(struct pt_regs);
	childregs = (struct pt_regs *) sp;
	if (unlikely(p->flags & PF_KTHREAD)) {
		/* kernel thread */
		memset(childregs, 0, sizeof(struct pt_regs));
		childregs->gpr[1] = sp + sizeof(struct pt_regs);
		...
		p->thread.regs = NULL;	/* no user register state */

See commit from 2002:

https://github.com/mpe/linux-fullhistory/commit/c0a96c0918d21d8a99270e94d9c4a4a322d04581#diff-edb76bfcc84905163f34d24d2aad3f3aR187

> From the initial report [1], it doesn't look like the mm isn't initialised,
> but rather that we're dereferencing a NULL pt_regs pointer somehow for the
> current task (see previous comment). I don't see how that can happen on
> arm64, given that we put the pt_regs on the kernel stack which is allocated
> during fork.

We have the regs on the stack too (see above), but we're explicitly
NULL'ing the link from task->thread.

Looks like on arm64 and x86 there is no link from task->thread, instead
you get from task to pt_regs via task_stack_page().

That actually seems potentially fishy given the comment on
task_stack_page() about the stack going away for exiting tasks. We
should probably be NULL'ing the regs pointer in free_thread_stack() or
similar. Though that race mustn't be happening because other arches
would see it.

Or are we just wrong and kthreads should have non-NULL regs? I can't
find another arch that does the same as us.
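A small model of the situation described above: on powerpc a kernel thread's thread.regs is NULL, so any path that samples user registers while a kthread is in the middle of exec'ing a user binary has to cope with that. The types below are stand-ins, not the kernel's struct pt_regs or struct task_struct, and the guard only shows the defensive check, not whichever fix is eventually chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; these are not the kernel's definitions. */
struct model_regs { unsigned long gpr[32]; unsigned long nip; };
struct model_thread { struct model_regs *regs; };
struct model_task { bool kthread; struct model_thread thread; };

/* Mirrors the powerpc convention quoted above: kernel threads get no
 * user register state, so thread.regs stays NULL until exec sets it. */
static void model_copy_thread(struct model_task *p,
			      struct model_regs *stack_regs)
{
	p->thread.regs = p->kthread ? NULL : stack_regs;
}

/* A sampling path that tolerates the window in which a kthread is
 * exec'ing a user binary but thread.regs is not populated yet. */
static bool sample_user_ip(const struct model_task *t, unsigned long *ip)
{
	if (!t->thread.regs)
		return false;   /* no user regs: skip instead of oopsing */
	*ip = t->thread.regs->nip;
	return true;
}
```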

cheers

[PATCH v8 3/3] i2c-ocores: sifive: add polling mode workaround for FU540-C000 SoC.

2019-05-28 Thread Sagar Shrikant Kadam

The i2c-ocores driver already has a polling mode interface, but it needs
a workaround for the FU540 chipset on the HiFive Unleashed board (RevA00).
There is an erratum in the FU540 chip that prevents interrupt-driven I2C
transfers from working, and the I2C controller's interrupt bit cannot be
cleared once set. Because of this, the existing I2C polling mode
interface added in mainline earlier doesn't work, and the CPU stalls
indefinitely whenever an I2C transfer is initiated.

Ref:
commit dd7dbf0eb090 ("i2c: ocores: refactor setup for polling")

The workaround under OCORES_FLAG_BROKEN_IRQ is specific to the
FU540-C000 SoC.

The probe function identifies a SiFive device based on the device node
and enables the workaround.

Signed-off-by: Sagar Shrikant Kadam 
---
 drivers/i2c/busses/i2c-ocores.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
index b334fa2..4117f1a 100644
--- a/drivers/i2c/busses/i2c-ocores.c
+++ b/drivers/i2c/busses/i2c-ocores.c
@@ -35,6 +35,7 @@ struct ocores_i2c {
int iobase;
u32 reg_shift;
u32 reg_io_width;
+   unsigned long flags;
wait_queue_head_t wait;
struct i2c_adapter adap;
struct i2c_msg *msg;
@@ -84,6 +85,8 @@ struct ocores_i2c {
 #define TYPE_GRLIB 1
 #define TYPE_SIFIVE_REV0   2
 
+#define OCORES_FLAG_BROKEN_IRQ BIT(1) /* Broken IRQ for FU540-C000 SoC */
+
 static void oc_setreg_8(struct ocores_i2c *i2c, int reg, u8 value)
 {
iowrite8(value, i2c->base + (reg << i2c->reg_shift));
@@ -236,9 +239,12 @@ static irqreturn_t ocores_isr(int irq, void *dev_id)
struct ocores_i2c *i2c = dev_id;
u8 stat = oc_getreg(i2c, OCI2C_STATUS);
 
-   if (!(stat & OCI2C_STAT_IF))
+   if (i2c->flags & OCORES_FLAG_BROKEN_IRQ) {
+   if ((stat & OCI2C_STAT_IF) && !(stat & OCI2C_STAT_BUSY))
+   return IRQ_NONE;
+   } else if (!(stat & OCI2C_STAT_IF)) {
return IRQ_NONE;
-
+   }
ocores_process(i2c, stat);
 
return IRQ_HANDLED;
@@ -353,6 +359,11 @@ static void ocores_process_polling(struct ocores_i2c *i2c)
ret = ocores_isr(-1, i2c);
if (ret == IRQ_NONE)
break; /* all messages have been transferred */
+   else {
+   if (i2c->flags & OCORES_FLAG_BROKEN_IRQ)
+   if (i2c->state == STATE_DONE)
+   break;
+   }
}
 }
 
@@ -595,6 +606,7 @@ static int ocores_i2c_probe(struct platform_device *pdev)
 {
struct ocores_i2c *i2c;
struct ocores_i2c_platform_data *pdata;
+   const struct of_device_id *match;
struct resource *res;
int irq;
int ret;
@@ -677,6 +689,14 @@ static int ocores_i2c_probe(struct platform_device *pdev)
irq = platform_get_irq(pdev, 0);
if (irq == -ENXIO) {
ocores_algorithm.master_xfer = ocores_xfer_polling;
+
+   /*
+* Set in OCORES_FLAG_BROKEN_IRQ to enable workaround for
+* FU540-C000 SoC in polling mode.
+*/
+   match = of_match_node(ocores_i2c_match, pdev->dev.of_node);
+   if (match && (long)match->data == TYPE_SIFIVE_REV0)
+   i2c->flags |= OCORES_FLAG_BROKEN_IRQ;
} else {
if (irq < 0)
return irq;
-- 
1.9.1
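To see why the extra exit condition matters, here is a userspace model of the polling loop with an interrupt flag that can never be cleared. STAT_IF and STAT_BUSY mirror OCI2C_STAT_IF and OCI2C_STAT_BUSY, but their values, the byte-counting "transfer", and the iteration cap are assumptions of this sketch; the real driver also waits on the status register between iterations.

```c
#include <assert.h>
#include <stdbool.h>

#define STAT_IF   0x01  /* interrupt flag (assumed bit value) */
#define STAT_BUSY 0x40  /* bus busy (assumed bit value) */

enum model_irq { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };
enum model_state { MODEL_STATE_BUSY, MODEL_STATE_DONE };

struct model_i2c {
	bool broken_irq;          /* models OCORES_FLAG_BROKEN_IRQ */
	int bytes_left;           /* work left in the current transfer */
	enum model_state state;
	unsigned char stat;       /* fake status register */
};

/* Simplified ocores_isr(): with the erratum IF stays set forever, so
 * "IF set and not BUSY" is treated as "nothing more to do". */
static enum model_irq model_isr(struct model_i2c *i2c)
{
	unsigned char stat = i2c->stat;

	if (i2c->broken_irq) {
		if ((stat & STAT_IF) && !(stat & STAT_BUSY))
			return MODEL_IRQ_NONE;
	} else if (!(stat & STAT_IF)) {
		return MODEL_IRQ_NONE;
	}
	/* ocores_process(): move one byte, finish when none are left */
	if (--i2c->bytes_left <= 0) {
		i2c->state = MODEL_STATE_DONE;
		i2c->stat &= ~STAT_BUSY;  /* bus released, IF stays stuck */
	}
	return MODEL_IRQ_HANDLED;
}

/* Simplified ocores_process_polling(); returns the iteration count, or
 * -1 if it hits the cap (i.e. the real loop would stall). */
static int model_poll(struct model_i2c *i2c, int cap)
{
	for (int i = 1; i <= cap; i++) {
		if (model_isr(i2c) == MODEL_IRQ_NONE)
			return i;
		if (i2c->broken_irq && i2c->state == MODEL_STATE_DONE)
			return i;
	}
	return -1;
}
```

Without the flag, the stuck IF bit keeps the model "handling" interrupts after the transfer is done, which corresponds to the stall described in the commit message.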

[PATCH v8 1/3] dt-bindings: i2c: extend existing opencore bindings.

2019-05-28 Thread Sagar Shrikant Kadam

Reformat the compatible strings so that each line lists one valid
combination. Add FU540-C000 specific device tree bindings to the
existing i2c-ocores file. This device is available on the HiFive
Unleashed Rev A00 board. Move interrupts to the optional property list,
since this property can be optional.

The FU540-C000 SoC from SiFive has a reimplementation of the OpenCores
I2C block.

The DT compatible string for this IP is present in the HDL and is available at:
https://github.com/sifive/sifive-blocks/blob/master/src/main/scala/devices/i2c/I2C.scala#L73

Signed-off-by: Sagar Shrikant Kadam 
---
 Documentation/devicetree/bindings/i2c/i2c-ocores.txt | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/i2c/i2c-ocores.txt b/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
index 17bef9a..6b25a80 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
@@ -1,9 +1,13 @@
 Device tree configuration for i2c-ocores
 
 Required properties:
-- compatible  : "opencores,i2c-ocores" or "aeroflexgaisler,i2cmst"
+- compatible  : "opencores,i2c-ocores"
+"aeroflexgaisler,i2cmst"
+"sifive,fu540-c000-i2c", "sifive,i2c0"
+For Opencore based I2C IP block reimplemented in
+FU540-C000 SoC. Please refer to sifive-blocks-ip-versioning.txt
+for additional details.
 - reg : bus address start and address range size of device
-- interrupts  : interrupt number
 - clocks  : handle to the controller clock; see the note below.
 Mutually exclusive with opencores,ip-clock-frequency
 - opencores,ip-clock-frequency: frequency of the controller clock in Hz;
@@ -12,6 +16,7 @@ Required properties:
 - #size-cells : should be <0>
 
 Optional properties:
+- interrupts  : interrupt number.
 - clock-frequency : frequency of bus clock in Hz; see the note below.
 Defaults to 100 KHz when the property is not specified
 - reg-shift   : device register offsets are shifted by this value
-- 
1.9.1

[PATCH v8 2/3] i2c-ocores: sifive: add support for i2c device on FU540-c000 SoC.

2019-05-28 Thread Sagar Shrikant Kadam

Update the device ID table for the OpenCores I2C master based
re-implementation used in the FU540-C000 chipset on the HiFive Unleashed
platform.

The device IDs include a SoC-specific compatible (sifive,fu540-c000-i2c)
for chip-specific tweaks and a SiFive IP-block compatible (sifive,i2c0)
for the generic programming model.

Signed-off-by: Sagar Shrikant Kadam 
---
 drivers/i2c/busses/i2c-ocores.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
index c3dabee..b334fa2 100644
--- a/drivers/i2c/busses/i2c-ocores.c
+++ b/drivers/i2c/busses/i2c-ocores.c
@@ -82,6 +82,7 @@ struct ocores_i2c {
 
 #define TYPE_OCORES0
 #define TYPE_GRLIB 1
+#define TYPE_SIFIVE_REV0   2
 
 static void oc_setreg_8(struct ocores_i2c *i2c, int reg, u8 value)
 {
@@ -462,6 +463,14 @@ static u32 ocores_func(struct i2c_adapter *adap)
.compatible = "aeroflexgaisler,i2cmst",
.data = (void *)TYPE_GRLIB,
},
+   {
+   .compatible = "sifive,fu540-c000-i2c",
+   .data = (void *)TYPE_SIFIVE_REV0,
+   },
+   {
+   .compatible = "sifive,i2c0",
+   .data = (void *)TYPE_SIFIVE_REV0,
+   },
{},
 };
 MODULE_DEVICE_TABLE(of, ocores_i2c_match);
-- 
1.9.1
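Roughly how the two new compatible strings map onto TYPE_SIFIVE_REV0: the sketch walks a node's compatible list from most to least specific and returns the data of the first matching table entry. This only approximates of_match_node(), which scores matches; the table is a local copy modelled on ocores_i2c_match, and model_match_type() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TYPE_OCORES      0
#define TYPE_GRLIB       1
#define TYPE_SIFIVE_REV0 2

struct model_match { const char *compatible; long data; };

static const struct model_match model_table[] = {
	{ "opencores,i2c-ocores",   TYPE_OCORES },
	{ "aeroflexgaisler,i2cmst", TYPE_GRLIB },
	{ "sifive,fu540-c000-i2c",  TYPE_SIFIVE_REV0 },
	{ "sifive,i2c0",            TYPE_SIFIVE_REV0 },
	{ NULL, 0 },
};

/* Walk the node's compatible list in order (most specific first) and
 * return the data of the first table entry that matches, or -1. */
static long model_match_type(const char *const *compat, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (const struct model_match *m = model_table; m->compatible; m++)
			if (strcmp(compat[i], m->compatible) == 0)
				return m->data;
	return -1;
}
```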

[PATCH v8 0/3] Extend dt bindings to support I2C on sifive devices and a fix broken IRQ in polling mode.

2019-05-28 Thread Sagar Shrikant Kadam

The patch is based on mainline v5.2-rc1 and extends the DT bindings for the
OpenCores-based I2C IP block reimplemented in the FU540 SoC, available on
the HiFive Unleashed board (Rev A00). It also provides a workaround for the
broken IRQ, which affects the I2C polling mode interface already available
in mainline, for FU540-C000 chipsets.

The polling mode workaround patch fixes the CPU stall that occurs whenever
I2C transfers are initiated.

This workaround checks whether it's an FU540 chipset based on device tree
information, and checks the OpenCores IF (interrupt flag) and BUSY flags to
break out of the polling loop upon completion of a transfer.

To test the patch, a PMOD-AD2 sensor is connected to the HiFive Unleashed
board over the J1 connector, and an appropriate device node is added to the
board-specific device tree as per the information provided in the
dt-bindings in Documentation/devicetree/bindings/i2c/i2c-ocores.txt.
Without this workaround, the CPU stalls indefinitely.

Busybox i2c utilities used to verify the workaround: i2cdetect, i2cdump,
i2cset, i2cget


Patch History:
V7<->V8:
-Incorporated review comments for cosmetic changes like space, comma and
 period (.)

V6<->V7:
-Rectified space and tab issue in dt bindings strings.
-Implemented workaround based on i2c->flags, as per review comment on v6.

V5<->V6:
-Incorporated suggestions on v5 patch as follows:
-Reformatted compatibility strings in dt doc with one valid combination on
 each line.
-Removed interrupt-parents from optional property list.
-With rebase to v5.2-rc1, the v5 variant of the polling workaround PATCH
 becomes incompatible.
 Until kernel v5.1 the polling mode was enabled based on i2c->flags, whereas
 in kernel v5.2-rc1 polling mode is set as the master transfer algorithm at
 probe time itself, and the i2c->flags checks are removed.
-Modified v5 to check for SiFive device type in polling function and include
 the workaround/fix for broken IRQ.

V4<->V5:
-Removed unnecessary checks of OCORES_FLAG_BROKEN_IRQ.

V3<->V4:
-Incorporated suggestions on v3 patch as follows:
-OCORES_FLAG_BROKEN_IRQ BIT position rectified.
-Updated BROKEN_IRQ flag checks such that if a SiFive device (FU540-C000) is
 identified, then polling mode is used as the IRQ is broken.

V2<->V3:
-Incorporated review comments on v2 patch as follows:
-Rectified compatibility string sequence with the most specific one first
 (dt bindings).
-Moved interrupts and interrupt-parent under optional property list
 (dt-bindings).
-Updated reference to sifive-blocks-ip-versioning.txt and URL to IP
 repository used (dt-bindings).
-Removed example for i2c0 device node from binding doc (dt-bindings).
-Included sifive,i2c0 device under compatibility table in i2c-ocores driver
 (i2c-ocores).
-Updated polling mode hooks for SoC specific fix to handle broken IRQ
 (i2c-ocores).

V1<->V2:
-Incorporate review comments from Andrew
-Extend dt bindings into i2c-ocores.txt instead of adding new file
-Rename SIFIVE_FLAG_POLL to OCORES_FLAG_BROKEN_IRQ

V1:
-Update dt bindings for sifive i2c devices
-Fix broken IRQ affecting i2c polling mode interface.

Sagar Shrikant Kadam (3):
  dt-bindings: i2c: extend existing opencore bindings.
  i2c-ocores: sifive: add support for i2c device on FU540-c000 SoC.
  i2c-ocores: sifive: add polling mode workaround for FU540-C000 SoC.

 .../devicetree/bindings/i2c/i2c-ocores.txt |  9 --
 drivers/i2c/busses/i2c-ocores.c| 33 --
 2 files changed, 38 insertions(+), 4 deletions(-)

-- 
1.9.1

Re: [PATCH net-next 1/5] timecounter: Add helper for reconstructing partial timestamps

2019-05-28 Thread Richard Cochran

On Tue, May 28, 2019 at 07:14:22PM -0700, John Stultz wrote:
> Hrm. Is this actually generic? Would it make more sense to have the
> specific implementations with this quirk implement this in their
> read() handler? If not, why?

Strongly agree that this workaround should stay in the driver.  After
all, we do not want to encourage HW designers to continue in this way.

Thanks,
Richard
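For reference, the usual way to rebuild a full timestamp from a truncated hardware value is to pick the candidate closest to a recent full clock reading. A driver-local helper in that spirit might look like the sketch below; the name and signature are hypothetical (not the helper from the patch under review), and it assumes the real event happened within half a wrap period of `ref`.

```c
#include <assert.h>
#include <stdint.h>

/* Reconstruct a full timestamp from its low `bits` bits, assuming the
 * true value lies within half a wrap period of `ref` (a recent full
 * clock reading). */
static uint64_t reconstruct_ts(uint64_t ref, uint64_t partial,
			       unsigned int bits)
{
	uint64_t mask = (bits >= 64) ? UINT64_MAX : ((1ULL << bits) - 1);
	uint64_t delta = (partial - (ref & mask)) & mask;

	if (delta > mask / 2)           /* closer going backwards */
		return ref - ((mask - delta) + 1);
	return ref + delta;
}
```

With 16 valid bits, a partial value of 0x0010 read just after ref = 0x1234FFF0 resolves to 0x12350010, i.e. across the wrap.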

Re: [PATCH net-next 3/5] net: dsa: mv88e6xxx: Let taggers specify a can_timestamp function

2019-05-28 Thread Richard Cochran

On Wed, May 29, 2019 at 02:56:25AM +0300, Vladimir Oltean wrote:
> The newly introduced function is called on both the RX and TX paths.

NAK on this patch.
 
> The boolean returned by port_txtstamp should only return false if the
> driver tried to timestamp the skb but failed.

So you say.
 
> Currently there is some logic in the mv88e6xxx driver that determines
> whether it should timestamp frames or not.
> 
> This is wasteful, because if the decision is to not timestamp them, then
> DSA will have cloned an skb and freed it immediately afterwards.

No, it isn't wasteful.  Look at the tests in that driver to see why.
 
> Additionally other drivers (sja1105) may have other hardware criteria
> for timestamping frames on RX, and the default conditions for
> timestamping a frame are too restrictive.

I'm sorry, but we won't change the frame just for one device that has
design issues.

Please put device specific workarounds into its driver.

Thanks,
Richard

[PATCH] amd64-agp: fix arbitrary kernel memory writes

2019-05-28 Thread Young Xiao

pg_start is copied from userspace on the AGPIOC_BIND and AGPIOC_UNBIND ioctl
commands of agp_ioctl() and passed to agpioc_bind_wrap().  As the comment
says, (pg_start + mem->page_count) may wrap in the AGPIOC_BIND case, and it
is not checked at all in the AGPIOC_UNBIND case.  As a result, a user with
sufficient privileges (usually the "video" group) may trigger either a
local DoS or privilege escalation.

See commit 194b3da873fd ("agp: fix arbitrary kernel memory writes")
for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/char/agp/amd64-agp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
index c69e39f..5daa0e3 100644
--- a/drivers/char/agp/amd64-agp.c
+++ b/drivers/char/agp/amd64-agp.c
@@ -60,7 +60,8 @@ static int amd64_insert_memory(struct agp_memory *mem, off_t pg_start, int type)
 
/* Make sure we can fit the range in the gatt table. */
/* FIXME: could wrap */
-   if (((unsigned long)pg_start + mem->page_count) > num_entries)
+   if (((pg_start + mem->page_count) > num_entries) ||
+   ((pg_start + mem->page_count) < pg_start))
return -EINVAL;
 
j = pg_start;
-- 
2.7.4
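An overflow-safe form of this kind of bounds check avoids computing start + count altogether. The helper below is a hypothetical sketch, not the code in either agp driver; note that converting a negative off_t pg_start to unsigned long also makes it fail the check.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Return true if [start, start + count) fits in a table of num_entries
 * slots. Comparing against num_entries - count never computes
 * start + count, so wraparound cannot defeat the check. */
static bool range_fits(unsigned long start, unsigned long count,
		       unsigned long num_entries)
{
	return count <= num_entries && start <= num_entries - count;
}
```

A naive `start + count > num_entries` test accepts start = ULONG_MAX, count = 2, because the sum wraps to 1; range_fits() rejects it.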

Re: [PATCH net-next 0/5] PTP support for the SJA1105 DSA driver

2019-05-28 Thread Richard Cochran

On Wed, May 29, 2019 at 02:56:22AM +0300, Vladimir Oltean wrote:
> Not all is rosy, though.

You can sure say that again!
 
> PTP timestamping will only work when the ports are bridged. Otherwise,
> the metadata follow-up frames holding RX timestamps won't be received
> because they will be blocked by the master port's MAC filter. Linuxptp
> tries to put the net device in ALLMULTI/PROMISC mode,

Untrue.

> but DSA doesn't
> pass this on to the master port, which does the actual reception.
> The master port is put in promiscuous mode when the slave ports are
> enslaved to a bridge.
> 
> Also, even with software-corrected timestamps, one can observe a
> negative path delay reported by linuxptp:
> 
> ptp4l[55.600]: master offset  8 s2 freq  +83677 path delay -2390
> ptp4l[56.600]: master offset 17 s2 freq  +83688 path delay -2391
> ptp4l[57.601]: master offset  6 s2 freq  +83682 path delay -2391
> ptp4l[58.601]: master offset -1 s2 freq  +83677 path delay -2391
> 
> Without investigating too deeply, this appears to be introduced by the
> correction applied by linuxptp to t4 (t4c: corrected master rxtstamp)
> during the path delay estimation process (removing the correction makes
> the path delay positive).

No.  The root cause is the time stamps delivered by the hardware or
your driver.  That needs to be addressed before going forward.
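For context, the end-to-end delay mechanism in IEEE 1588 estimates the mean path delay as ((t2 - t1) + (t4 - t3)) / 2, so an error in a single RX timestamp shifts the result by half that error and can easily drive it negative. The helpers below just encode those two formulas; the values used with them are made up.

```c
#include <assert.h>
#include <stdint.h>

/* End-to-end delay request-response (IEEE 1588):
 *   t1 = master sends Sync,      t2 = slave receives Sync,
 *   t3 = slave sends Delay_Req,  t4 = master receives Delay_Req.
 * All values in nanoseconds. */
static int64_t mean_path_delay(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
	return ((t2 - t1) + (t4 - t3)) / 2;
}

static int64_t offset_from_master(int64_t t1, int64_t t2, int64_t delay)
{
	return (t2 - t1) - delay;
}
```

On a symmetric 1500 ns link (t1 = 0, t2 = 1500, t3 = 10000, t4 = 11500) the delay is 1500 and the offset 0; if t2 is stamped 5000 ns early, the computed delay becomes -1000.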

Thanks,
Richard

[RFC PATCH v3] rtl8xxxu: Improve TX performance of RTL8723BU on rtl8xxxu driver

2019-05-28 Thread Chris Chiu

We have 3 laptops that connect to Wi-Fi using the same RTL8723BU.
The PCI VID/PID of the wifi chip is 10EC:B720, which is supported.
They all have the same problem with the in-kernel rtl8xxxu driver: iperf
(as a client to an ethernet-connected server) gets ~1Mbps, even though
the signal strength is reported as around -40dBm, which is quite good.
From the wireshark capture, the tx rate for each data and qos data packet
is only 1Mbps. With the driver from https://github.com/lwfinger/rtl8723bu,
the same iperf test gets ~12 Mbps or more, with the signal strength again
reported around -40dBm. That's why we want to improve the in-kernel driver.

After reading the source code of the rtl8xxxu driver and Larry's, the
major difference is that Larry's driver has a watchdog which will keep
monitoring the signal quality and updating the rate mask just like the
rtl8xxxu_gen2_update_rate_mask() does if signal quality changes.
And this kind of watchdog also exists in rtlwifi driver of some specific
chips, ex rtl8192ee, rtl8188ee, rtl8723ae, rtl8821ae...etc. They have
the same member function named dm_watchdog and will invoke the
corresponding dm_refresh_rate_adaptive_mask to adjust the tx rate
mask.
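As an illustration of what such a watchdog might compute on each run, here is a sketch that maps the reported signal strength to one of the RTL8XXXU_RATR_STA_* levels from the patch, with a little hysteresis, and only requests a new rate mask when the level changes. The dBm thresholds and hysteresis are assumptions, not values taken from rtlwifi or Larry's driver; in the driver the periodic part would be the ra_watchdog delayed_work, rescheduled with schedule_delayed_work().

```c
#include <assert.h>

#define RTL8XXXU_RATR_STA_INIT 0
#define RTL8XXXU_RATR_STA_HIGH 1
#define RTL8XXXU_RATR_STA_MID  2
#define RTL8XXXU_RATR_STA_LOW  3

/* Hypothetical thresholds (dBm) and hysteresis margin. */
#define HIGH_THRESH (-50)
#define LOW_THRESH  (-70)
#define HYST        3

static int rssi_to_level(int prev_level, int signal_dbm)
{
	int high = HIGH_THRESH, low = LOW_THRESH;

	/* Make it harder to leave the current level than to enter it. */
	if (prev_level == RTL8XXXU_RATR_STA_HIGH)
		high -= HYST;
	else if (prev_level == RTL8XXXU_RATR_STA_LOW)
		low += HYST;

	if (signal_dbm > high)
		return RTL8XXXU_RATR_STA_HIGH;
	if (signal_dbm > low)
		return RTL8XXXU_RATR_STA_MID;
	return RTL8XXXU_RATR_STA_LOW;
}

/* The watchdog only needs to push a new rate mask on a level change. */
static int needs_rate_mask_update(int *level, int signal_dbm)
{
	int next = rssi_to_level(*level, signal_dbm);
	int changed = next != *level;

	*level = next;
	return changed;
}
```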

With this commit, the tx rate of each data and qos data packet will
be 39Mbps (MCS4) with 0xF0 as the tx rate mask. Bits 20 to 23
correspond to MCS4 to MCS7. This means the firmware still picks
the lowest rate from the rate mask, which explains why the tx rate of
data and qos data is always the lowest 1Mbps: the default rate mask
passed is always 0xFFF, which ranges over the basic CCK, OFDM,
and MCS rates. However, with Larry's driver, the tx rate observed from
wireshark under the same conditions is almost 65Mbps or 72Mbps.

I believe the firmware of RTL8723BU may need a fix. And I think we
can still bring in the dm_watchdog as rtlwifi does to improve things
from the driver side. Please leave your valuable comments on my commits
and suggest what I can do better, or suggest any better idea
to fix this. Thanks.

Signed-off-by: Chris Chiu 
---


Notes:
  v2:
   - Fix errors and warnings reported by checkpatch.pl
   - Replace data structure rate_adaptive by 2 member variables
   - Make rtl8xxxu_wireless_mode non-static
   - Runs refresh_rate_mask() only in station mode
  v3:
   - Remove ugly rtl8xxxu_watchdog data structure
   - Make sure only one vif exists


 .../net/wireless/realtek/rtl8xxxu/rtl8xxxu.h  |  49 ++
 .../realtek/rtl8xxxu/rtl8xxxu_8723b.c | 145 ++
 .../wireless/realtek/rtl8xxxu/rtl8xxxu_core.c |  80 +-
 3 files changed, 273 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
index 8828baf26e7b..42e9227f4d19 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
@@ -1195,6 +1195,44 @@ struct rtl8723bu_c2h {
 
 struct rtl8xxxu_fileops;
 
+/*mlme related.*/
+enum wireless_mode {
+   WIRELESS_MODE_UNKNOWN = 0,
+   /* Sub-Element */
+   WIRELESS_MODE_B = BIT(0),
+   WIRELESS_MODE_G = BIT(1),
+   WIRELESS_MODE_A = BIT(2),
+   WIRELESS_MODE_N_24G = BIT(3),
+   WIRELESS_MODE_N_5G = BIT(4),
+   WIRELESS_AUTO = BIT(5),
+   WIRELESS_MODE_AC = BIT(6),
+   WIRELESS_MODE_MAX = 0x7F,
+};
+
+/* from rtlwifi/wifi.h */
+enum ratr_table_mode_new {
+   RATEID_IDX_BGN_40M_2SS = 0,
+   RATEID_IDX_BGN_40M_1SS = 1,
+   RATEID_IDX_BGN_20M_2SS_BN = 2,
+   RATEID_IDX_BGN_20M_1SS_BN = 3,
+   RATEID_IDX_GN_N2SS = 4,
+   RATEID_IDX_GN_N1SS = 5,
+   RATEID_IDX_BG = 6,
+   RATEID_IDX_G = 7,
+   RATEID_IDX_B = 8,
+   RATEID_IDX_VHT_2SS = 9,
+   RATEID_IDX_VHT_1SS = 10,
+   RATEID_IDX_MIX1 = 11,
+   RATEID_IDX_MIX2 = 12,
+   RATEID_IDX_VHT_3SS = 13,
+   RATEID_IDX_BGN_3SS = 14,
+};
+
+#define RTL8XXXU_RATR_STA_INIT 0
+#define RTL8XXXU_RATR_STA_HIGH 1
+#define RTL8XXXU_RATR_STA_MID  2
+#define RTL8XXXU_RATR_STA_LOW  3
+
 struct rtl8xxxu_priv {
struct ieee80211_hw *hw;
struct usb_device *udev;
@@ -1299,6 +1337,14 @@ struct rtl8xxxu_priv {
u8 pi_enabled:1;
u8 no_pape:1;
u8 int_buf[USB_INTR_CONTENT_LENGTH];
+   u8 ratr_index;
+   u8 rssi_level;
+   /*
+* Single virtual interface permitted since the driver supports STATION
+* mode only.
+*/
+   struct ieee80211_vif *vif;
+   struct delayed_work ra_watchdog;
 };
 
 struct rtl8xxxu_rx_urb {
@@ -1335,6 +1381,8 @@ struct rtl8xxxu_fileops {
  bool ht40);
void (*update_rate_mask) (struct rtl8xxxu_priv *priv,
  u32 ramask, int sgi);
+   void (*refresh_rate_mask) (struct rtl8xxxu_priv *priv, int signal,
+  struct ieee80211_sta *sta);
void (*report_connect) (struct rtl8xxxu_priv *priv,
u8 macid, bool

Re: [RFC 1/7] mm: introduce MADV_COOL

2019-05-28 Thread Michal Hocko

On Wed 29-05-19 10:40:33, Hillf Danton wrote:
> 
> On Wed, 29 May 2019 00:11:15 +0800 Michal Hocko wrote:
> > On Tue 28-05-19 23:38:11, Hillf Danton wrote:
> > > 
> > > In short, I prefer to skip IO mappings, since userspace can pass any kind
> > > of address range and it may well cover an IO mapping. Things can also get
> > > out of control if we reclaim some IO pages while the underlying device is,
> > > for instance, trying to fill data into any of them.
> > 
> > What do you mean by IO pages, and what is the actual problem?
> > 
> IO pages are the backing-store pages of a mapping whose vm_flags has
> VM_IO set, and the comment in mm/memory.c says:
> /*
>  * Physically remapped pages are special. Tell the
>  * rest of the world about it:
>  *   VM_IO tells people not to look at these pages
>  *  (accesses can have side effects).
> 

OK, thanks for the clarification of the first part of the question. Now
to the second and the more important one. What is the actual concern?
AFAIK those pages shouldn't be on the LRU list. If they are, then they
should be safe to reclaim; otherwise we would have a problem when
reclaiming them under normal memory pressure. Why is this madvise any
different?
-- 
Michal Hocko
SUSE Labs

Re: [PATCH v5 0/2] Fix issues with vmalloc flush flag

2019-05-28 Thread Edgecombe, Rick P

On Tue, 2019-05-28 at 17:23 -0700, David Miller wrote:
> From: Rick Edgecombe 
> Date: Mon, 27 May 2019 14:10:56 -0700
> 
> > These two patches address issues with the recently added
> > VM_FLUSH_RESET_PERMS vmalloc flag.
> > 
> > Patch 1 addresses an issue that could cause a crash once
> > architectures other than x86 rely on this path.
> > 
> > Patch 2 addresses an issue where in a rare case strange arguments
> > could be provided to flush_tlb_kernel_range(). 
> 
> Another situation that would cause trouble on sparc64 just occurred to
> me: the address range of the main kernel image ending up being passed
> to flush_tlb_kernel_range().
> 
> That would flush the locked kernel mapping and crash the kernel
> instantly in a completely non-recoverable way.

Hmm, I haven't received the logs from Meelis that will show the real
ranges being passed into flush_tlb_kernel_range() on sparc, but it
should be flushing a range spanning from the modules to the direct map.
It looks like the kernel is at the very bottom of the address space, so
not included. Or do you mean the pages that hold the kernel text on the
direct map?

But regardless of this new code, DEBUG_PAGEALLOC hangs with the first
vmalloc free/unmap. That should be just flushing a single allocation in
the vmalloc range.

If it is somehow catching a locked entry though... Are there any sparc
flush mechanisms that could be used in vmalloc that won't touch locked
entries? Peter Z was pointing out that flush_tlb_all() might be more
appropriate for vmalloc anyway.

RE: [PATCH RESEND 2/5] ARM: dts: imx7d-sdb: Assign corresponding power supply for LDOs

2019-05-28 Thread Anson Huang

Hi, Leonard

> -----Original Message-----
> From: Leonard Crestez
> Sent: Wednesday, May 29, 2019 3:24 AM
> To: Anson Huang 
> Cc: robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org;
> s.ha...@pengutronix.de; ker...@pengutronix.de; feste...@gmail.com;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; dl-linux-imx 
> Subject: Re: [PATCH RESEND 2/5] ARM: dts: imx7d-sdb: Assign corresponding
> power supply for LDOs
> 
> On 12.05.2019 12:57, Anson Huang wrote:
> > On the i.MX7D SDB board, sw2 supplies the 1p0d/1p2 LDOs. This patch
> > assigns the corresponding power supply to the 1p0d/1p2 LDOs to avoid
> > the confusing log below:
> >
> > vdd1p0d: supplied by regulator-dummy
> > vdd1p2: supplied by regulator-dummy
> >
> > With this patch, the power supply is more accurate:
> >
> > vdd1p0d: supplied by SW2
> > vdd1p2: supplied by SW2
> >
> > diff --git a/arch/arm/boot/dts/imx7d-sdb.dts
> > b/arch/arm/boot/dts/imx7d-sdb.dts
> >
> > +_1p0d {
> > +   vin-supply = <_reg>;
> > +};
> > +
> > +_1p2 {
> > +   vin-supply = <_reg>;
> > +};
> 
> It's not clear why, but this patch breaks imx7d-sdb boot. I checked two
> boards: one in a board farm and one on my desk.

Thanks for reporting this issue; I can reproduce it now. A quick debug shows
that, with this patch, setting reg_1p0d's voltage to 1.0V changes SW2's
voltage to 1.5V. The expected voltage is 1.8V, so 1.5V causes the board to
reset. The patch below fixes the issue, but I am still checking whether it is
the best fix. Once I have figured it out, I will send out a fix patch for review:

+++ b/arch/arm/boot/dts/imx7d-sdb.dts
@@ -267,6 +267,7 @@
regulator-max-microvolt = <185>;
regulator-boot-on;
regulator-always-on;
+   regulator-max-step-microvolt = <25000>;
};

Thanks,
Anson

> 
> --
> Regards,
> Leonard
