Re: [PATCH v4 1/2] rcu/tree: Add basic support for kfree_rcu() batching

2019-08-16 Thread Joel Fernandes
On Fri, Aug 16, 2019 at 10:20:23PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 17, 2019 at 12:30:24AM -0400, Joel Fernandes wrote:
> > On Fri, Aug 16, 2019 at 08:56:37PM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 16, 2019 at 09:32:23PM -0400, Joel Fernandes wrote:
> > > > Hi Paul,
> > > > 
> > > > On Fri, Aug 16, 2019 at 3:16 PM Paul E. McKenney 
> > > >  wrote:
> > > > > > > Hello, Joel,
> > > > > > >
> > > > > > > I reworked the commit log as follows, but was then unsuccessful in
> > > > > > > working out which -rcu commit to apply it to.  Could you please
> > > > > > > tell me what commit to apply this to?  (Once applied, git 
> > > > > > > cherry-pick
> > > > > > > is usually pretty good about handling minor conflicts.)
> > > > > >
> > > > > > It was originally based on v5.3-rc2
> > > > > >
> > > > > > I was able to apply it just now to the rcu -dev branch and I pushed 
> > > > > > it here:
> > > > > > https://github.com/joelagnel/linux-kernel.git (branch paul-dev)
> > > > > >
> > > > > > Let me know if any other issues, thanks for the change log rework!
> > > > >
> > > > > Pulled and cherry-picked, thank you!
> > > > >
> > > > > Just for grins, I also  pushed out a from-joel.2019.08.16a showing the
> > > > > results of the pull.  If you pull that branch, then run something like
> > > > > "gitk v5.3-rc2..", and then do the same with branch "dev", comparing 
> > > > > the
> > > > > two might illustrate some of the reasons for the current restrictions
> > > > > on pull requests and trees subject to rebase.
> > > > 
> > > > Right, I did the compare and see what you mean. I guess sending any
> > > > future pull requests against Linux -next would be the best option?
> > > 
> > > Hmmm...  You really want to send some pull requests, don't you?  ;-)
> > 
> > I would be lying if I said I don't have the itch to ;-)
> > 
> > > Suppose you had sent that pull request against Linux -next or v5.2
> > > or wherever.  What would happen next, given the high probability of a
> > > conflict with someone else's patch?  What would the result look like?
> > 
> > One hopes that the tools are able to automatically resolve the resolution,
> > however adequate re-inspection of the resulting code and testing it would be
> > needed in either case, to ensure the conflict resolution (whether manual or
> > automatic) happened correctly.
> 
> I didn't ask you to hope.  I instead asked you what tell me what would
> actually happen.  ;-)
> 
> You could actually try this by randomly grouping the patches in -rcu
> (say, placing every third patch into one of three groups), generating
> separate pull requests, and then merging the pull requests together.
> Then you wouldn't have to hope.  You could instead look at it in (say)
> gitk after the pieces were put together.

So you take whatever is worked on in 'dev' and create separate branches out
of them, then merge them together later?

I have seen you doing these tricks and would love to get ideas from your
experiences on these.

> > IIUC, this usually depends on the maintainer's preference on which branch to
> > send patches against.
> > 
> > Are you saying -rcu's dev branch is still the best option to send patches
> > against, even though it is rebased often?
> 
> Sounds like we might need to discuss this face to face.

Yes, let us talk for sure at plumbers, thank you so much!

(Also I sent a patch just now to fix that xchg() issue).

 - Joel



[PATCH] media: cpia2_usb: fix memory leaks

2019-08-16 Thread Wenwen Wang
In submit_urbs(), 'cam->sbuf[i].data' is allocated through kmalloc_array().
However, it is not deallocated if the following allocation for urbs fails.
To fix this issue, free 'cam->sbuf[i].data' if usb_alloc_urb() fails.

Signed-off-by: Wenwen Wang 
---
 drivers/media/usb/cpia2/cpia2_usb.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/media/usb/cpia2/cpia2_usb.c 
b/drivers/media/usb/cpia2/cpia2_usb.c
index 17468f7..3ab80a7 100644
--- a/drivers/media/usb/cpia2/cpia2_usb.c
+++ b/drivers/media/usb/cpia2/cpia2_usb.c
@@ -676,6 +676,10 @@ static int submit_urbs(struct camera_data *cam)
if (!urb) {
for (j = 0; j < i; j++)
usb_free_urb(cam->sbuf[j].urb);
+   for (j = 0; j < NUM_SBUF; j++) {
+   kfree(cam->sbuf[j].data);
+   cam->sbuf[j].data = NULL;
+   }
return -ENOMEM;
}
 
-- 
2.7.4



Re: [PATCH v4 1/2] rcu/tree: Add basic support for kfree_rcu() batching

2019-08-16 Thread Paul E. McKenney
On Sat, Aug 17, 2019 at 12:30:24AM -0400, Joel Fernandes wrote:
> On Fri, Aug 16, 2019 at 08:56:37PM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 16, 2019 at 09:32:23PM -0400, Joel Fernandes wrote:
> > > Hi Paul,
> > > 
> > > On Fri, Aug 16, 2019 at 3:16 PM Paul E. McKenney  
> > > wrote:
> > > > > > Hello, Joel,
> > > > > >
> > > > > > I reworked the commit log as follows, but was then unsuccessful in
> > > > > > working out which -rcu commit to apply it to.  Could you please
> > > > > > tell me what commit to apply this to?  (Once applied, git 
> > > > > > cherry-pick
> > > > > > is usually pretty good about handling minor conflicts.)
> > > > >
> > > > > It was originally based on v5.3-rc2
> > > > >
> > > > > I was able to apply it just now to the rcu -dev branch and I pushed 
> > > > > it here:
> > > > > https://github.com/joelagnel/linux-kernel.git (branch paul-dev)
> > > > >
> > > > > Let me know if any other issues, thanks for the change log rework!
> > > >
> > > > Pulled and cherry-picked, thank you!
> > > >
> > > > Just for grins, I also  pushed out a from-joel.2019.08.16a showing the
> > > > results of the pull.  If you pull that branch, then run something like
> > > > "gitk v5.3-rc2..", and then do the same with branch "dev", comparing the
> > > > two might illustrate some of the reasons for the current restrictions
> > > > on pull requests and trees subject to rebase.
> > > 
> > > Right, I did the compare and see what you mean. I guess sending any
> > > future pull requests against Linux -next would be the best option?
> > 
> > Hmmm...  You really want to send some pull requests, don't you?  ;-)
> 
> I would be lying if I said I don't have the itch to ;-)
> 
> > Suppose you had sent that pull request against Linux -next or v5.2
> > or wherever.  What would happen next, given the high probability of a
> > conflict with someone else's patch?  What would the result look like?
> 
> One hopes that the tools are able to automatically resolve the resolution,
> however adequate re-inspection of the resulting code and testing it would be
> needed in either case, to ensure the conflict resolution (whether manual or
> automatic) happened correctly.

I didn't ask you to hope.  I instead asked you what tell me what would
actually happen.  ;-)

You could actually try this by randomly grouping the patches in -rcu
(say, placing every third patch into one of three groups), generating
separate pull requests, and then merging the pull requests together.
Then you wouldn't have to hope.  You could instead look at it in (say)
gitk after the pieces were put together.

And there are more questions.  For example, how would this affect testing
given issues involving both RCU and other pieces of the kernel?

> IIUC, this usually depends on the maintainer's preference on which branch to
> send patches against.
> 
> Are you saying -rcu's dev branch is still the best option to send patches
> against, even though it is rebased often?

Sounds like we might need to discuss this face to face.

Thanx, Paul


Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

2019-08-16 Thread Paul E. McKenney
On Fri, Aug 16, 2019 at 03:57:43PM -0700, Linus Torvalds wrote:

[ . . . ]

> We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
> because of some theoretical "compiler is free to do garbage"
> arguments. If such garbage happens, we need to fix the compiler, the
> same way we already do with
> 
>   -fno-strict-aliasing

Yeah, the compete-with-FORTRAN stuff.  :-/

There is some work going on in the C committee on this, where the
theorists would like to restrict strict-alias based optimizations to
enable better analysis tooling.  And no, although the theorists are
pushing in the direction we would like them to, as far as I can see
they are not pushing as far as we would like.  But it might be that
-fno-strict-aliasing needs some upgrades as well.  I expect to learn
more at the next meeting in a few months.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf

>   -fno-delete-null-pointer-checks
>   -fno-strict-overflow
> 
> because all those "optimizations" are just fundamentally unsafe and wrong.

I was hoping that -fno-strict-overflow might go into the C++ (not C)
standard, and even thought that it had at one point, but what went into
the standard was that signed integers are twos complement, not that
overflowing them is well defined.

We are pushing to hopefully end but at least to restrict the current
pointer lifetime-end zap behavior in both C and C++, which is similar
to the pointer provenance/alias analysis that -fno-strict-aliasing at
least partially suppresses.  The zapping invalidates loading, storing,
comparing, or doing arithmetic on a pointer to an object whose lifetime
has ended.  (The WG14 PDF linked to below gives a non-exhaustive list
of problems that this causes.)

The Linux kernel used to avoid this by refusing to tell the compiler about
kmalloc() and friends, but that changed a few years ago.  This zapping
rules out a non-trivial class of concurrent algorithms, but for once
RCU is unaffected.  Some committee members complained that zapping has
been part of the standard since about 1990, but Maged Michael found a
writeup of one of the algorithms dating back to 1973.  ;-)

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2369.pdf
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1726r0.pdf

> I really wish the compiler would never take advantage of "I can prove
> this is undefined behavior" kind of things when it comes to the kernel
> (or any other projects I am involved with, for that matter). If you
> can prove that, then you shouldn't decide to generate random code
> without a big warning. But that's what those optimizations that we
> disable effectively all do.
> 
> I'd love to have a flag that says "all undefined behavior is treated
> as implementation-defined". There's a somewhat subtle - but very
> important - difference there.

It would also be nice to downgrade some of the undefined behavior in
the standard itself.  Compiler writers usually hate this because it
would require them to document what their compiler does in each case.
So they would prefer "unspecified" if the could not have "undefined".

> And that's what some hypothetical speculative write optimizations do
> too. I do not believe they are valid for the kernel. If the code says
> 
>if (a)
>   global_var = 1
>else
>   global_var = 0
> 
> then the compiler had better not turn that into
> 
>  global_var = 0
>  if (a)
>  global_var = 1
> 
> even if there isn't a volatile there. But yes, we've had compiler
> writers that say "if you read the specs, that's ok".
> 
> No, it's not ok. Because reality trumps any weasel-spec-reading.
> 
> And happily, I don't think we've ever really seen a compiler that we
> use that actually does the above kind of speculative write thing (but
> doing it for your own local variables that can't be seen by other
> threads of execution - go wild).

Me, I would still want to use WRITE_ONCE() in this case.

> So in general, we very much expect the compiler to do sane code
> generation, and not (for example) do store tearing on normal
> word-sized things or add writes that weren't there originally etc.
> 
> And yes, reads are different from writes. Reads don't have the same
> kind of "other threads of execution can see them" effects, so a
> compiler turning a single read into multiple reads is much more
> realistic and not the same kind of "we need to expect a certain kind
> of sanity from the compiler" issue.

Though each of those compiler-generated multiple reads might return a
different value, right?

Thanx, Paul



Re: [PATCH -rcu/dev] Please squash: fixup! rcu/tree: Add basic support for kfree_rcu() batching

2019-08-16 Thread Joel Fernandes
On Fri, Aug 16, 2019 at 09:38:54PM -0700, Paul Walmsley wrote:
> On Sat, 17 Aug 2019, Joel Fernandes (Google) wrote:
> 
> > xchg() on a bool is causing issues on riscv and arm32.
> 
> Indeed, it seems best not to use xchg() on any type that's not 32 bits 
> long or that's not the CPU's native word size.  Probably we should update 
> the documentation.

I would endorse any such documentation effort ;-)

> > Please squash this into the -rcu dev branch to resolve the issue.
> > 
> > Please squash this fix.
> > 
> > Fixes: -rcu dev commit 3cbd3aa7d9c7bdf ("rcu/tree: Add basic support for 
> > kfree_rcu() batching")
> > 
> > Signed-off-by: Joel Fernandes (Google) 
> 
> Link: 
> https://lore.kernel.org/lkml/alpine.deb.2.21..1908161931110.32...@viisi.sifive.com/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59
> Reviewed-by: Paul Walmsley 

Thanks Paul! And nice to meet you again after many years ;-) Glad to see you
working on riscv.

thanks,

 - Joel



Re: [PATCH -rcu/dev] Please squash: fixup! rcu/tree: Add basic support for kfree_rcu() batching

2019-08-16 Thread Paul Walmsley
On Sat, 17 Aug 2019, Joel Fernandes (Google) wrote:

> xchg() on a bool is causing issues on riscv and arm32.

Indeed, it seems best not to use xchg() on any type that's not 32 bits 
long or that's not the CPU's native word size.  Probably we should update 
the documentation.

> Please squash this into the -rcu dev branch to resolve the issue.
> 
> Please squash this fix.
> 
> Fixes: -rcu dev commit 3cbd3aa7d9c7bdf ("rcu/tree: Add basic support for 
> kfree_rcu() batching")
> 
> Signed-off-by: Joel Fernandes (Google) 

Link: 
https://lore.kernel.org/lkml/alpine.deb.2.21..1908161931110.32...@viisi.sifive.com/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59
Reviewed-by: Paul Walmsley 


- Paul


[PATCH] Fix an OOB bug in uac_mixer_unit_bmControls

2019-08-16 Thread Hui Peng
`uac_mixer_unit_get_channels` calls `uac_mixer_unit_bmControls`
to get pointer to bmControls field. The current implementation of
`uac_mixer_unit_get_channels` does properly check the size of
uac_mixer_unit_descriptor descriptor and may allow OOB access
in `uac_mixer_unit_bmControls`.

Reported-by: Hui Peng 
Reported-by: Mathias Payer 
Signed-off-by: Hui Peng 
---
 sound/usb/mixer.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
index b5927c3d5bc0..00e6274a63c3 100644
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -738,28 +738,39 @@ static int get_cluster_channels_v3(struct mixer_build 
*state, unsigned int clust
 static int uac_mixer_unit_get_channels(struct mixer_build *state,
   struct uac_mixer_unit_descriptor *desc)
 {
-   int mu_channels;
+   int mu_channels = 0;
void *c;
 
-   if (desc->bLength < sizeof(*desc))
-   return -EINVAL;
if (!desc->bNrInPins)
return -EINVAL;
-   if (desc->bLength < sizeof(*desc) + desc->bNrInPins)
-   return -EINVAL;
 
switch (state->mixer->protocol) {
case UAC_VERSION_1:
+   // limit derived from uac_mixer_unit_bmControls
+   if (desc->bLength < sizeof(*desc) + desc->bNrInPins + 4)
+   return 0;
+
+   mu_channels = uac_mixer_unit_bNrChannels(desc);
+   break;
+
case UAC_VERSION_2:
-   default:
-   if (desc->bLength < sizeof(*desc) + desc->bNrInPins + 1)
+   // limit derived from uac_mixer_unit_bmControls
+   if (desc->bLength < sizeof(*desc) + desc->bNrInPins + 6)
return 0; /* no bmControls -> skip */
+
mu_channels = uac_mixer_unit_bNrChannels(desc);
break;
case UAC_VERSION_3:
+   // limit derived from uac_mixer_unit_bmControls
+   if (desc->bLength < sizeof(*desc) + desc->bNrInPins + 2)
+   return 0; /* no bmControls -> skip */
+
mu_channels = get_cluster_channels_v3(state,
uac3_mixer_unit_wClusterDescrID(desc));
break;
+
+   default:
+   break;
}
 
if (!mu_channels)
-- 
2.22.1



Re: [PATCH v4 1/2] rcu/tree: Add basic support for kfree_rcu() batching

2019-08-16 Thread Joel Fernandes
On Fri, Aug 16, 2019 at 08:56:37PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 16, 2019 at 09:32:23PM -0400, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Fri, Aug 16, 2019 at 3:16 PM Paul E. McKenney  
> > wrote:
> > > > > Hello, Joel,
> > > > >
> > > > > I reworked the commit log as follows, but was then unsuccessful in
> > > > > working out which -rcu commit to apply it to.  Could you please
> > > > > tell me what commit to apply this to?  (Once applied, git cherry-pick
> > > > > is usually pretty good about handling minor conflicts.)
> > > >
> > > > It was originally based on v5.3-rc2
> > > >
> > > > I was able to apply it just now to the rcu -dev branch and I pushed it 
> > > > here:
> > > > https://github.com/joelagnel/linux-kernel.git (branch paul-dev)
> > > >
> > > > Let me know if any other issues, thanks for the change log rework!
> > >
> > > Pulled and cherry-picked, thank you!
> > >
> > > Just for grins, I also  pushed out a from-joel.2019.08.16a showing the
> > > results of the pull.  If you pull that branch, then run something like
> > > "gitk v5.3-rc2..", and then do the same with branch "dev", comparing the
> > > two might illustrate some of the reasons for the current restrictions
> > > on pull requests and trees subject to rebase.
> > 
> > Right, I did the compare and see what you mean. I guess sending any
> > future pull requests against Linux -next would be the best option?
> 
> Hmmm...  You really want to send some pull requests, don't you?  ;-)

I would be lying if I said I don't have the itch to ;-)

> Suppose you had sent that pull request against Linux -next or v5.2
> or wherever.  What would happen next, given the high probability of a
> conflict with someone else's patch?  What would the result look like?

One hopes that the tools are able to automatically resolve the resolution,
however adequate re-inspection of the resulting code and testing it would be
needed in either case, to ensure the conflict resolution (whether manual or
automatic) happened correctly.

IIUC, this usually depends on the maintainer's preference on which branch to
send patches against.

Are you saying -rcu's dev branch is still the best option to send patches
against, even though it is rebased often?

thanks,

 - Joel



[PATCH -rcu/dev] Please squash: fixup! rcu/tree: Add basic support for kfree_rcu() batching

2019-08-16 Thread Joel Fernandes (Google)
xchg() on a bool is causing issues on riscv and arm32. Please squash
this into the -rcu dev branch to resolve the issue.

Please squash this fix.

Fixes: -rcu dev commit 3cbd3aa7d9c7bdf ("rcu/tree: Add basic support for 
kfree_rcu() batching")

Signed-off-by: Joel Fernandes (Google) 
---
 kernel/rcu/tree.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4f7c3096d786..33192a58b39a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2717,7 +2717,7 @@ struct kfree_rcu_cpu {
 * is busy, ->head just continues to grow and we retry flushing later.
 */
struct delayed_work monitor_work;
-   bool monitor_todo;  /* Is a delayed work pending execution? */
+   int monitor_todo;   /* Is a delayed work pending execution? */
 };
 
 static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
@@ -2790,7 +2790,7 @@ static inline void kfree_rcu_drain_unlock(struct 
kfree_rcu_cpu *krcp,
/* Previous batch that was queued to RCU did not get free yet, let us
 * try again soon.
 */
-   if (!xchg(>monitor_todo, true))
+   if (!xchg(>monitor_todo, 1))
schedule_delayed_work(>monitor_work, KFREE_DRAIN_JIFFIES);
spin_unlock_irqrestore(>lock, flags);
 }
@@ -2806,7 +2806,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 monitor_work.work);
 
spin_lock_irqsave(>lock, flags);
-   if (xchg(>monitor_todo, false))
+   if (xchg(>monitor_todo, 0))
kfree_rcu_drain_unlock(krcp, flags);
else
spin_unlock_irqrestore(>lock, flags);
@@ -2858,7 +2858,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t 
func)
krcp->head = head;
 
/* Schedule monitor for timely drain after KFREE_DRAIN_JIFFIES. */
-   if (!xchg(>monitor_todo, true))
+   if (!xchg(>monitor_todo, 1))
schedule_delayed_work(>monitor_work, KFREE_DRAIN_JIFFIES);
 
spin_unlock(>lock);
-- 
2.23.0.rc1.153.gdeed80330f-goog



Re: [PATCH 14/15] drivers: thermal: tsens: Create function to return sign-extended temperature

2019-08-16 Thread Stephen Boyd
Quoting Amit Kucheria (2019-07-25 15:18:49)
> diff --git a/drivers/thermal/qcom/tsens-common.c 
> b/drivers/thermal/qcom/tsens-common.c
> index 7ab2e740a1da..13a875b99094 100644
> --- a/drivers/thermal/qcom/tsens-common.c
> +++ b/drivers/thermal/qcom/tsens-common.c
> @@ -84,13 +84,35 @@ static inline int code_to_degc(u32 adc_code, const struct 
> tsens_sensor *s)
> return degc;
>  }
>  
> +/**
> + * tsens_hw_to_mC - Return properly sign extended temperature in mCelsius,

Can you make this proper kernel-doc? Describe the arguments and have a
"Return:" section.

> + * whether in ADC code or deciCelsius depending on IP version.
> + * This function handles the different widths of the signed integer across 
> IPs.
> + */
> +static int tsens_hw_to_mC(char *str, struct tsens_sensor *s, int field, int 
> temp)
> +{
> +   struct tsens_priv *priv = s->priv;
> +   u32 mask;
> +
> +   if (priv->feat->adc) {
> +   /* Convert temperature from ADC code to milliCelsius */
> +   return code_to_degc(temp, s) * 1000;
> +   } else {

Please deindent and drop the else because there's a return above.

> +   mask = GENMASK(priv->fields[field].msb,
> +  priv->fields[field].lsb) >> 
> priv->fields[field].lsb;

Why is the mask generated, shifted right, sent into fls(), and then
passed to sign_extend32? Shoudln't it be something like 

sign_extend32(temp, priv->fields[field].msg - priv->fiels[field].lsb - 
1)

> +   dev_dbg(priv->dev, "%s: mask: %d\n", str, fls(mask));
> +   /* Convert temperature from deciCelsius to milliCelsius */
> +   return sign_extend32(temp, fls(mask) - 1) * 100;
> +   }
> +}
> +
> @@ -112,15 +134,7 @@ int get_temp_tsens_valid(struct tsens_sensor *s, int 
> *temp)
> if (ret)
> return ret;
>  
> -   if (priv->feat->adc) {
> -   /* Convert temperature from ADC code to milliCelsius */
> -   *temp = code_to_degc(last_temp, s) * 1000;
> -   } else {
> -   mask = GENMASK(priv->fields[LAST_TEMP_0].msb,
> -  priv->fields[LAST_TEMP_0].lsb);
> -   /* Convert temperature from deciCelsius to milliCelsius */
> -   *temp = sign_extend32(last_temp, fls(mask) - 1) * 100;

Oh the code is copied. Seems really complicated still.

> -   }
> +   *temp = tsens_hw_to_mC("get_temp", s, LAST_TEMP_0, last_temp);


Re: [PATCH 07/15] dt: thermal: tsens: Document interrupt support in tsens driver

2019-08-16 Thread Stephen Boyd
Quoting Amit Kucheria (2019-08-16 15:02:08)
> 
> Depending on the version of the tsens IP, there can be 1 (upper/lower
> threshold), 2 (upper/lower + critical threshold) or 3 (upper/lower +
> critical + zero degree) interrupts. This patch series only introduces
> support for a single interrupt (upper/lower).
> 
> I used the names tsens0, tsens1 to encapsulate the controller instance
> since some SoCs have 1 controller, others have two. So we'll end up
> with something like the following in DT:
> 
> tsens0: thermal-sensor@c263000 {
> compatible = "qcom,sdm845-tsens", "qcom,tsens-v2";
> reg = <0 0x0c263000 0 0x1ff>, /* TM */
>   <0 0x0c222000 0 0x1ff>; /* SROT */
> #qcom,sensors = <13>;
> interrupts = ,
>  ;
> interrupt-names = "tsens0", "tsens0-critical";
> #thermal-sensor-cells = <1>;
> };
> 
> tsens1: thermal-sensor@c265000 {
> compatible = "qcom,sdm845-tsens", "qcom,tsens-v2";
> reg = <0 0x0c265000 0 0x1ff>, /* TM */
>   <0 0x0c223000 0 0x1ff>; /* SROT */
> #qcom,sensors = <8>;
> interrupts = ,
>  ;
> interrupt-names = "tsens1", "tsens1-critical";
> #thermal-sensor-cells = <1>;
> }
> 
> Does that work?
> 

Can you convert this binding to YAML? Then it looks like we can enforce
the number of interrupts based on the compatible string.



[PATCH] Drivers: hv: balloon: Remove dependencies on guest page size

2019-08-16 Thread Himadri Pandya
Hyper-V assumes page size to be 4K. This might not be the case for
ARM64 architecture. Hence use hyper-v specific page size and page
shift definitions to avoid conflicts between different host and guest
page sizes on ARM64.

Also, remove some old and incorrect comments and redefine ballooning
granularities to handle larger page sizes correctly.

Signed-off-by: Himadri Pandya 
---
 drivers/hv/hv_balloon.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 34bd73526afd..935904830d42 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -23,6 +23,7 @@
 #include 
 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include "hv_trace_balloon.h"
@@ -341,8 +342,6 @@ struct dm_unballoon_response {
  *
  * mem_range: Memory range to hot add.
  *
- * On Linux we currently don't support this since we cannot hot add
- * arbitrary granularity of memory.
  */
 
 struct dm_hot_add {
@@ -477,7 +476,7 @@ module_param(pressure_report_delay, uint, (S_IRUGO | 
S_IWUSR));
 MODULE_PARM_DESC(pressure_report_delay, "Delay in secs in reporting pressure");
 static atomic_t trans_id = ATOMIC_INIT(0);
 
-static int dm_ring_size = (5 * PAGE_SIZE);
+static int dm_ring_size = 20 * 1024;
 
 /*
  * Driver specific state.
@@ -493,10 +492,10 @@ enum hv_dm_state {
 };
 
 
-static __u8 recv_buffer[PAGE_SIZE];
-static __u8 balloon_up_send_buffer[PAGE_SIZE];
-#define PAGES_IN_2M512
-#define HA_CHUNK (32 * 1024)
+static __u8 recv_buffer[HV_HYP_PAGE_SIZE];
+static __u8 balloon_up_send_buffer[HV_HYP_PAGE_SIZE];
+#define PAGES_IN_2M (2 * 1024 * 1024 / PAGE_SIZE)
+#define HA_CHUNK (128 * 1024 * 1024 / PAGE_SIZE)
 
 struct hv_dynmem_device {
struct hv_device *dev;
@@ -1076,7 +1075,7 @@ static void process_info(struct hv_dynmem_device *dm, 
struct dm_info_msg *msg)
__u64 *max_page_count = (__u64 *)_hdr[1];
 
pr_info("Max. dynamic memory size: %llu MB\n",
-   (*max_page_count) >> (20 - PAGE_SHIFT));
+   (*max_page_count) >> (20 - HV_HYP_PAGE_SHIFT));
}
 
break;
@@ -1218,7 +1217,7 @@ static unsigned int alloc_balloon_pages(struct 
hv_dynmem_device *dm,
 
for (i = 0; (i * alloc_unit) < num_pages; i++) {
if (bl_resp->hdr.size + sizeof(union dm_mem_page_range) >
-   PAGE_SIZE)
+   HV_HYP_PAGE_SIZE)
return i * alloc_unit;
 
/*
@@ -1274,9 +1273,9 @@ static void balloon_up(struct work_struct *dummy)
 
/*
 * We will attempt 2M allocations. However, if we fail to
-* allocate 2M chunks, we will go back to 4k allocations.
+* allocate 2M chunks, we will go back to PAGE_SIZE allocations.
 */
-   alloc_unit = 512;
+   alloc_unit = PAGES_IN_2M;
 
avail_pages = si_mem_available();
floor = compute_balloon_floor();
@@ -1292,7 +1291,7 @@ static void balloon_up(struct work_struct *dummy)
}
 
while (!done) {
-   memset(balloon_up_send_buffer, 0, PAGE_SIZE);
+   memset(balloon_up_send_buffer, 0, HV_HYP_PAGE_SIZE);
bl_resp = (struct dm_balloon_response *)balloon_up_send_buffer;
bl_resp->hdr.type = DM_BALLOON_RESPONSE;
bl_resp->hdr.size = sizeof(struct dm_balloon_response);
@@ -1491,7 +1490,7 @@ static void balloon_onchannelcallback(void *context)
 
memset(recv_buffer, 0, sizeof(recv_buffer));
vmbus_recvpacket(dev->channel, recv_buffer,
-PAGE_SIZE, , );
+HV_HYP_PAGE_SIZE, , );
 
if (recvlen > 0) {
dm_msg = (struct dm_message *)recv_buffer;
-- 
2.17.1



Re: [PATCH 04/15] drivers: thermal: tsens: Add debugfs support

2019-08-16 Thread Stephen Boyd
Quoting Amit Kucheria (2019-07-25 15:18:39)
> Dump some basic version info and sensor details into debugfs
> 

Maybe you can put some sample output in the commit text.

> Signed-off-by: Amit Kucheria 
> ---
>  drivers/thermal/qcom/tsens-common.c | 85 +
>  drivers/thermal/qcom/tsens.c|  2 +
>  drivers/thermal/qcom/tsens.h|  6 ++
>  3 files changed, 93 insertions(+)
> 
> diff --git a/drivers/thermal/qcom/tsens-common.c 
> b/drivers/thermal/qcom/tsens-common.c
> index 7437bfe196e5..7ab2e740a1da 100644
> --- a/drivers/thermal/qcom/tsens-common.c
> +++ b/drivers/thermal/qcom/tsens-common.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2015, The Linux Foundation. All rights reserved.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -139,6 +140,79 @@ int get_temp_common(struct tsens_sensor *s, int *temp)
> return 0;
>  }
>  
> +#ifdef CONFIG_DEBUG_FS
> +static int dbg_sensors_show(struct seq_file *s, void *data)
> +{
> +   struct platform_device *pdev = s->private;
> +   struct tsens_priv *priv = platform_get_drvdata(pdev);
> +   int i;
> +
> +   seq_printf(s, "max: %2d\nnum: %2d\n\n",
> +  priv->feat->max_sensors, priv->num_sensors);
> +
> +   seq_puts(s, "  id   slope  offset\n\n");
> +   for (i = 0;  i < priv->num_sensors; i++) {
> +   seq_printf(s, "%8d%8d%8d\n", priv->sensor[i].hw_id,

Does this not have spaces between the digits on purpose?

> +  priv->sensor[i].slope, priv->sensor[i].offset);
> +   }
> +
> +   return 0;
> +}
> +
> +static int dbg_version_show(struct seq_file *s, void *data)
> +{
> +   struct platform_device *pdev = s->private;
> +   struct tsens_priv *priv = platform_get_drvdata(pdev);
> +   u32 maj_ver, min_ver, step_ver;
> +   int ret;
> +
> +   if (tsens_ver(priv) > VER_0_1) {
> +   ret = regmap_field_read(priv->rf[VER_MAJOR], _ver);
> +   if (ret)
> +   return ret;
> +   ret = regmap_field_read(priv->rf[VER_MINOR], _ver);
> +   if (ret)
> +   return ret;
> +   ret = regmap_field_read(priv->rf[VER_STEP], _ver);
> +   if (ret)
> +   return ret;
> +   seq_printf(s, "%d.%d.%d\n", maj_ver, min_ver, step_ver);
> +   } else {
> +   seq_puts(s, "0.1.0\n");
> +   }
> +
> +   return 0;
> +}
> +
> +DEFINE_SHOW_ATTRIBUTE(dbg_version);
> +DEFINE_SHOW_ATTRIBUTE(dbg_sensors);
> +
> +static void tsens_debug_init(struct platform_device *pdev)
> +{
> +   struct tsens_priv *priv = platform_get_drvdata(pdev);
> +   struct dentry *root, *file;
> +
> +   root = debugfs_lookup("tsens", NULL);

Does this get created many times? Why doesn't tsens have a pointer to
the root saved away somewhere globally?

> +   if (!root)
> +   priv->debug_root = debugfs_create_dir("tsens", NULL);
> +   else
> +   priv->debug_root = root;
> +
> +   file = debugfs_lookup("version", priv->debug_root);
> +   if (!file)
> +   debugfs_create_file("version", 0444, priv->debug_root,
> +   pdev, _version_fops);
> +
> +   /* A directory for each instance of the TSENS IP */
> +   priv->debug = debugfs_create_dir(dev_name(>dev), 
> priv->debug_root);
> +   debugfs_create_file("sensors", 0444, priv->debug, pdev, 
> _sensors_fops);
> +
> +   return;
> +}


Re: [PATCH 03/15] drivers: thermal: tsens: Add __func__ identifier to debug statements

2019-08-16 Thread Stephen Boyd
Quoting Amit Kucheria (2019-07-25 15:18:38)
> Printing the function name when enabling debugging makes logs easier to
> read.
> 
> Signed-off-by: Amit Kucheria 
> ---

Reviewed-by: Stephen Boyd 



Re: [PATCH 02/15] drivers: thermal: tsens: Simplify code flow in tsens_probe

2019-08-16 Thread Stephen Boyd
Quoting Amit Kucheria (2019-07-25 15:18:37)
> Move platform_set_drvdata up to avoid an extra 'if (ret)' check after
> the call to tsens_register.
> 
> Signed-off-by: Amit Kucheria 
> ---

Reviewed-by: Stephen Boyd 



Re: [PATCH 01/15] drivers: thermal: tsens: Get rid of id field in tsens_sensor

2019-08-16 Thread Stephen Boyd
Quoting Amit Kucheria (2019-07-25 15:18:36)
> There are two fields - id and hw_id - to track what sensor an action was
> to performed on. This was because the sensors connected to a TSENS IP
> might not be contiguous i.e. 1, 2, 4, 5 with 3 being skipped.
> 
> This causes confusion in the code which uses hw_id sometimes and id
> other times (tsens_get_temp, tsens_get_trend).
> 
> Switch to only using the hw_id field to track the physical ID of the
> sensor. When we iterate through all the sensors connected to an IP
> block, we use an index i to loop through the list of sensors, and then
> return the actual hw_id that is registered on that index.
> 
> Signed-off-by: Amit Kucheria 
> ---

Nice cleanup!

Reviewed-by: Stephen Boyd 



Re: [PATCH 1/5] dt-bindings: clock: Add Bitmain BM1880 SoC clock controller binding

2019-08-16 Thread Manivannan Sadhasivam
On Fri, Aug 16, 2019 at 08:46:11PM -0700, Stephen Boyd wrote:
> Quoting Manivannan Sadhasivam (2019-08-16 20:34:22)
> > On Wed, Aug 07, 2019 at 10:01:28PM -0700, Stephen Boyd wrote:
> > > Quoting Manivannan Sadhasivam (2019-07-05 08:14:36)
> > > > +It is expected that it is defined using standard clock bindings as 
> > > > "osc".
> > > > +
> > > > +Example: 
> > > > +
> > > > +clk: clock-controller@800 {
> > > > +compatible = "bitmain,bm1880-clk";
> > > > +reg = <0xe8 0x0c>,<0x800 0xb0>;
> > > 
> > > It looks weird still. What hardware module is this actually part of?
> > > Some larger power manager block?
> > > 
> > 
> > These are all part of the sysctrl block (clock + pinctrl + reset) and the
> > register domains got split between system and pll.
> > 
> 
> And that can't be one node that probes the clk, pinctrl, and reset
> drivers from C code?

It is not a MFD for sure. It's just grouping of the register domains together.

Thanks,
Mani
> 


[PATCH] libata: Fix a memory leak bug

2019-08-16 Thread Wenwen Wang
In ata_init(), 'ata_force_tbl' is allocated through kcalloc() in
ata_parse_force_param(). However, it is not deallocated if
ata_attach_transport() fails, leading to a memory leak bug. To fix this
issue, free 'ata_force_tbl' before go to the 'err_out' label.

Signed-off-by: Wenwen Wang 
---
 drivers/ata/libata-core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 28c492b..185dd69 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -7040,6 +7040,7 @@ static int __init ata_init(void)
ata_scsi_transport_template = ata_attach_transport();
if (!ata_scsi_transport_template) {
ata_sff_exit();
+   kfree(ata_force_tbl);
rc = -ENOMEM;
goto err_out;
}
-- 
2.7.4



Re: devm_memremap_pages() triggers a kasan_add_zero_shadow() warning

2019-08-16 Thread Dan Williams
On Fri, Aug 16, 2019 at 8:34 PM Qian Cai  wrote:
>
>
>
> > On Aug 16, 2019, at 5:48 PM, Dan Williams  wrote:
> >
> > On Fri, Aug 16, 2019 at 2:36 PM Qian Cai  wrote:
> >>
> >> Every so often recently, booting Intel CPU server on linux-next triggers 
> >> this
> >> warning. Trying to figure out if  the commit 7cc7867fb061
> >> ("mm/devm_memremap_pages: enable sub-section remap") is the culprit here.
> >>
> >> # ./scripts/faddr2line vmlinux devm_memremap_pages+0x894/0xc70
> >> devm_memremap_pages+0x894/0xc70:
> >> devm_memremap_pages at mm/memremap.c:307
> >
> > Previously the forced section alignment in devm_memremap_pages() would
> > cause the implementation to never violate the KASAN_SHADOW_SCALE_SIZE
> > (12K on x86) constraint.
> >
> > Can you provide a dump of /proc/iomem? I'm curious what resource is
> > triggering such a small alignment granularity.
>
> This is with memmap=4G!4G ,
>
> # cat /proc/iomem
[..]
> 1-155df : Persistent Memory (legacy)
>   1-155df : namespace0.0
> 155e0-15982bfff : System RAM
>   155e0-156a00fa0 : Kernel code
>   156a00fa1-15765d67f : Kernel data
>   157837000-1597f : Kernel bss
> 15982c000-1 : Persistent Memory (legacy)
> 2-87fff : System RAM

Ok, looks like 4G is bad choice to land the pmem emulation on this
system because it collides with where the kernel is deployed and gets
broken into tiny pieces that violate kasan's. This is a known problem
with memmap=. You need to pick an memory range that does not collide
with anything else. See:


https://nvdimm.wiki.kernel.org/how_to_choose_the_correct_memmap_kernel_parameter_for_pmem_on_your_system

...for more info.


Re: [PATCH v4 1/2] rcu/tree: Add basic support for kfree_rcu() batching

2019-08-16 Thread Paul E. McKenney
On Fri, Aug 16, 2019 at 09:32:23PM -0400, Joel Fernandes wrote:
> Hi Paul,
> 
> On Fri, Aug 16, 2019 at 3:16 PM Paul E. McKenney  
> wrote:
> > > > Hello, Joel,
> > > >
> > > > I reworked the commit log as follows, but was then unsuccessful in
> > > > working out which -rcu commit to apply it to.  Could you please
> > > > tell me what commit to apply this to?  (Once applied, git cherry-pick
> > > > is usually pretty good about handling minor conflicts.)
> > >
> > > It was originally based on v5.3-rc2
> > >
> > > I was able to apply it just now to the rcu -dev branch and I pushed it 
> > > here:
> > > https://github.com/joelagnel/linux-kernel.git (branch paul-dev)
> > >
> > > Let me know if any other issues, thanks for the change log rework!
> >
> > Pulled and cherry-picked, thank you!
> >
> > Just for grins, I also  pushed out a from-joel.2019.08.16a showing the
> > results of the pull.  If you pull that branch, then run something like
> > "gitk v5.3-rc2..", and then do the same with branch "dev", comparing the
> > two might illustrate some of the reasons for the current restrictions
> > on pull requests and trees subject to rebase.
> 
> Right, I did the compare and see what you mean. I guess sending any
> future pull requests against Linux -next would be the best option?

Hmmm...  You really want to send some pull requests, don't you?  ;-)

Suppose you had sent that pull request against Linux -next or v5.2
or wherever.  What would happen next, given the high probability of a
conflict with someone else's patch?  What would the result look like?

Thanx, Paul



Re: [PATCH 4/5] clk: Add driver for Bitmain BM1880 SoC clock controller

2019-08-16 Thread Manivannan Sadhasivam
Hi Stephen,

On Wed, Aug 07, 2019 at 10:15:59PM -0700, Stephen Boyd wrote:
> Quoting Manivannan Sadhasivam (2019-07-05 08:14:39)
> > diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
> > index fc1e0cf44995..ffc61ed85ade 100644
> > --- a/drivers/clk/Kconfig
> > +++ b/drivers/clk/Kconfig
> > @@ -304,6 +304,12 @@ config COMMON_CLK_FIXED_MMIO
> > help
> >   Support for Memory Mapped IO Fixed clocks
> >  
> > +config COMMON_CLK_BM1880
> > +   bool "Clock driver for Bitmain BM1880 SoC"
> > +   depends on ARCH_BITMAIN || COMPILE_TEST
> > +   help
> > + This driver supports the clocks on Bitmain BM1880 SoC.
> 
> Can you add this config somewhere else besides the end? Preferably
> close to alphabetically in this file.
> 

Okay. I got confused by the fact that Makefile is sorted but not the
Kconfig.

> > +
> >  source "drivers/clk/actions/Kconfig"
> >  source "drivers/clk/analogbits/Kconfig"
> >  source "drivers/clk/bcm/Kconfig"
> > diff --git a/drivers/clk/clk-bm1880.c b/drivers/clk/clk-bm1880.c
> > new file mode 100644
> > index ..26cdb75bb936
> > --- /dev/null
> > +++ b/drivers/clk/clk-bm1880.c
> > @@ -0,0 +1,947 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Bitmain BM1880 SoC clock driver
> > + *
> > + * Copyright (c) 2019 Linaro Ltd.
> > + * Author: Manivannan Sadhasivam 
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> 
> Should probably add kernel.h for at least container_of()
> 

okay.

> > +
> > +#include 
> > +
> > +#define BM1880_CLK_MPLL_CTL0x00
> > +#define BM1880_CLK_SPLL_CTL0x04
> > +#define BM1880_CLK_FPLL_CTL0x08
> > +#define BM1880_CLK_DDRPLL_CTL  0x0c
> > +
> > +#define BM1880_CLK_ENABLE0 0x00
> > +#define BM1880_CLK_ENABLE1 0x04
> > +#define BM1880_CLK_SELECT  0x20
> > +#define BM1880_CLK_DIV00x40
> > +#define BM1880_CLK_DIV10x44
> > +#define BM1880_CLK_DIV20x48
> > +#define BM1880_CLK_DIV30x4c
> > +#define BM1880_CLK_DIV40x50
> > +#define BM1880_CLK_DIV50x54
> > +#define BM1880_CLK_DIV60x58
> > +#define BM1880_CLK_DIV70x5c
> > +#define BM1880_CLK_DIV80x60
> > +#define BM1880_CLK_DIV90x64
> > +#define BM1880_CLK_DIV10   0x68
> > +#define BM1880_CLK_DIV11   0x6c
> > +#define BM1880_CLK_DIV12   0x70
> > +#define BM1880_CLK_DIV13   0x74
> > +#define BM1880_CLK_DIV14   0x78
> > +#define BM1880_CLK_DIV15   0x7c
> > +#define BM1880_CLK_DIV16   0x80
> > +#define BM1880_CLK_DIV17   0x84
> > +#define BM1880_CLK_DIV18   0x88
> > +#define BM1880_CLK_DIV19   0x8c
> > +#define BM1880_CLK_DIV20   0x90
> > +#define BM1880_CLK_DIV21   0x94
> > +#define BM1880_CLK_DIV22   0x98
> > +#define BM1880_CLK_DIV23   0x9c
> > +#define BM1880_CLK_DIV24   0xa0
> > +#define BM1880_CLK_DIV25   0xa4
> > +#define BM1880_CLK_DIV26   0xa8
> > +#define BM1880_CLK_DIV27   0xac
> > +#define BM1880_CLK_DIV28   0xb0
> > +
> > +#define to_bm1880_pll_clk(_hw) container_of(_hw, struct 
> > bm1880_pll_hw_clock, hw)
> > +#define to_bm1880_div_clk(_hw) container_of(_hw, struct 
> > bm1880_div_hw_clock, hw)
> > +
> > +static DEFINE_SPINLOCK(bm1880_clk_lock);
> > +
> > +struct bm1880_clock_data {
> > +   void __iomem *pll_base;
> > +   void __iomem *sys_base;
> > +   struct clk_onecell_data clk_data;
> > +};
> > +
> > +struct bm1880_gate_clock {
> > +   unsigned intid;
> > +   const char  *name;
> > +   const char  *parent;
> > +   u32 gate_reg;
> > +   s8  gate_shift;
> > +   unsigned long   flags;
> > +};
> > +
> > +struct bm1880_mux_clock {
> > +   unsigned intid;
> > +   const char  *name;
> > +   const char  * const * parents;
> > +   s8  num_parents;
> > +   u32 reg;
> > +   s8  shift;
> > +   unsigned long   flags;
> > +};
> > +
> > +struct bm1880_div_clock {
> > +   unsigned intid;
> > +   const char  *name;
> > +   const char  *parent;
> > +   u32 reg;
> > +   u8  shift;
> > +   u8  width;
> > +   u32 initval;
> > +   struct clk_div_table *table;
> > +   unsigned long   flags;
> > +};
> > +
> > +struct bm1880_div_hw_clock {
> > +   struct bm1880_div_clock div;
> > +   void __iomem *base;
> > +   spinlock_t *lock;
> > +   struct clk_hw hw;
> > +};
> > +
> > +struct bm1880_composite_clock {
> > +   unsigned intid;
> > +   const char  *name;
> > +   const char  *parent;
> > +   const char  * const * parents;
> > +   unsigned intnum_parents;
> > +   unsigned long   flags;
> > +
> > +   u32 gate_reg;
> > +   u32 mux_reg;
> > +   u32 div_reg;
> > +
> > +  

Re: [PATCH 5/6] clk: imx8mn: Add necessary frequency support for ARM PLL table

2019-08-16 Thread Stephen Boyd
Quoting anson.hu...@nxp.com (2019-08-15 03:59:42)
> diff --git a/drivers/clk/imx/clk-imx8mn.c b/drivers/clk/imx/clk-imx8mn.c
> index ecd1062..3f1239a 100644
> --- a/drivers/clk/imx/clk-imx8mn.c
> +++ b/drivers/clk/imx/clk-imx8mn.c
> @@ -82,6 +84,7 @@ static struct imx_pll14xx_clk imx8mn_dram_pll = {
>  static struct imx_pll14xx_clk imx8mn_arm_pll = {
> .type = PLL_1416X,
> .rate_table = imx8mn_pll1416x_tbl,
> +   .rate_count = ARRAY_SIZE(imx8mn_pll1416x_tbl),

Why is rate_count added? That's not described in the commit text.



Re: [PATCH 1/5] dt-bindings: clock: Add Bitmain BM1880 SoC clock controller binding

2019-08-16 Thread Stephen Boyd
Quoting Manivannan Sadhasivam (2019-08-16 20:34:22)
> On Wed, Aug 07, 2019 at 10:01:28PM -0700, Stephen Boyd wrote:
> > Quoting Manivannan Sadhasivam (2019-07-05 08:14:36)
> > > +It is expected that it is defined using standard clock bindings as "osc".
> > > +
> > > +Example: 
> > > +
> > > +clk: clock-controller@800 {
> > > +compatible = "bitmain,bm1880-clk";
> > > +reg = <0xe8 0x0c>,<0x800 0xb0>;
> > 
> > It looks weird still. What hardware module is this actually part of?
> > Some larger power manager block?
> > 
> 
> These are all part of the sysctrl block (clock + pinctrl + reset) and the
> register domains got split between system and pll.
> 

And that can't be one node that probes the clk, pinctrl, and reset
drivers from C code?



Re: [PATCH] arch : arm : add a criteria for pfn_valid

2019-08-16 Thread Matthew Wilcox
On Sat, Aug 17, 2019 at 11:00:13AM +0800, Zhaoyang Huang wrote:
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>  int pfn_valid(unsigned long pfn)
>  {
> - return memblock_is_map_memory(__pfn_to_phys(pfn));
> + return (pfn > max_pfn) ?
> + false : memblock_is_map_memory(__pfn_to_phys(pfn));
>  }

This is a really awkward way to use the ternary operator.  It's easier to
read if you just:

+   if (pfn > max_pfn)
+   return 0;
return memblock_is_map_memory(__pfn_to_phys(pfn));

(if you really wanted to be clever ... er, obscure, you'd've written:

return (pfn <= max_pfn) && memblock_is_map_memory(__pfn_to_phys(pfn));

... but don't do that)

Also, why is this diverged between arm and arm64?


Re: devm_memremap_pages() triggers a kasan_add_zero_shadow() warning

2019-08-16 Thread Qian Cai



> On Aug 16, 2019, at 5:48 PM, Dan Williams  wrote:
> 
> On Fri, Aug 16, 2019 at 2:36 PM Qian Cai  wrote:
>> 
>> Every so often recently, booting Intel CPU server on linux-next triggers this
>> warning. Trying to figure out if  the commit 7cc7867fb061
>> ("mm/devm_memremap_pages: enable sub-section remap") is the culprit here.
>> 
>> # ./scripts/faddr2line vmlinux devm_memremap_pages+0x894/0xc70
>> devm_memremap_pages+0x894/0xc70:
>> devm_memremap_pages at mm/memremap.c:307
> 
> Previously the forced section alignment in devm_memremap_pages() would
> cause the implementation to never violate the KASAN_SHADOW_SCALE_SIZE
> (12K on x86) constraint.
> 
> Can you provide a dump of /proc/iomem? I'm curious what resource is
> triggering such a small alignment granularity.

This is with memmap=4G!4G ,

# cat /proc/iomem 
-0fff : Reserved
1000-00093fff : System RAM
00094000-0009 : Reserved
000a-000b : PCI Bus :00
000c-000c7fff : Video ROM
000c8000-000cbfff : Adapter ROM
000cc000-000ccfff : Adapter ROM
000e-000f : Reserved
  000f-000f : System ROM
0010-5a7a0fff : System RAM
5a7a1000-5b5e0fff : Reserved
5b5e1000-790fefff : System RAM
  6900-78ff : Crash kernel
790ff000-791fefff : Reserved
791ff000-7b5fefff : ACPI Non-volatile Storage
7b5ff000-7b7fefff : ACPI Tables
7b7ff000-7b7f : System RAM
7b80-8fff : Reserved
  8000-8fff : PCI MMCONFIG  [bus 00-ff]
9000-c7ffbfff : PCI Bus :00
  9000-92af : PCI Bus :01
9000-9000 : :01:00.2
9100-91ff : :01:00.1
9200-927f : :01:00.1
9280-928f : :01:00.2
9290-929f : :01:00.2
92a0-92a7 : :01:00.2
92a8-92a87fff : :01:00.2
92a88000-92a8bfff : :01:00.1
92a8c000-92a8c0ff : :01:00.2
92a8d000-92a8d1ff : :01:00.0
  92b0-92df : PCI Bus :02
92b0-92bf : :02:00.1
  92b0-92bf : igb
92c0-92cf : :02:00.0
  92c0-92cf : igb
92d0-92d03fff : :02:00.1
  92d0-92d03fff : igb
92d04000-92d07fff : :02:00.0
  92d04000-92d07fff : igb
92d8-92df : :02:00.0
  92e0-92ff : PCI Bus :03
92e0-92ef : :03:00.0
  92e0-92ef : hpsa
92f0-92f003ff : :03:00.0
  92f0-92f003ff : hpsa
92f8-92ff : :03:00.0
  9300-930003ff : :00:1d.0
  93001000-930013ff : :00:1a.0
  93003000-93003fff : :00:05.4
c7ffc000-c7ffcfff : dmar1
c800-fbffbfff : PCI Bus :80
  c800-c8000fff : :80:05.4
fbffc000-fbffcfff : dmar0
fec0-fecf : PNP0003:00
  fec0-fec003ff : IOAPIC 0
  fec01000-fec013ff : IOAPIC 1
  fec4-fec403ff : IOAPIC 2
fed0-fed003ff : HPET 0
  fed0-fed003ff : PNP0103:00
fed12000-fed1200f : pnp 00:01
fed12010-fed1201f : pnp 00:01
fed1b000-fed1bfff : pnp 00:01
fed1c000-fed3 : pnp 00:01
fed45000-fed8bfff : pnp 00:01
fee0-feef : pnp 00:01
  fee0-fee00fff : Local APIC
ff80- : Reserved
1-155df : Persistent Memory (legacy)
  1-155df : namespace0.0
155e0-15982bfff : System RAM
  155e0-156a00fa0 : Kernel code
  156a00fa1-15765d67f : Kernel data
  157837000-1597f : Kernel bss
15982c000-1 : Persistent Memory (legacy)
2-87fff : System RAM
  85800-877ff : Crash kernel
380-39f : PCI Bus :00
  39fffe0-39fffef : PCI Bus :02
  390-390 : :00:14.0
  391-3913fff : :00:04.7
  3914000-3917fff : :00:04.6
  3918000-391bfff : :00:04.5
  391c000-391 : :00:04.4
  392-3923fff : :00:04.3
  3924000-3927fff : :00:04.2
  3928000-392bfff : :00:04.1
  392c000-392 : :00:04.0
  3931000-39310ff : :00:1f.3
3a0-3bf : PCI Bus :80
  3b0-3b03fff : :80:04.7
  3b04000-3b07fff : :80:04.6
  3b08000-3b0bfff : :80:04.5
  3b0c000-3b0 : :80:04.4
  3b1-3b13fff : :80:04.3
  3b14000-3b17fff : :80:04.2
  3b18000-3b1bfff : :80:04.1
  3b1c000-3b1 : :80:04.0

> 
> Is it truly only linux-next or does latest mainline have this issue as well?

No idea. I have not had a chance to test it on the mainline yet.



Re: [PATCH 1/5] dt-bindings: clock: Add Bitmain BM1880 SoC clock controller binding

2019-08-16 Thread Manivannan Sadhasivam
Hi Stephen,

On Wed, Aug 07, 2019 at 10:01:28PM -0700, Stephen Boyd wrote:
> Quoting Manivannan Sadhasivam (2019-07-05 08:14:36)
> > Add devicetree binding for Bitmain BM1880 SoC clock controller.
> > 
> > Signed-off-by: Manivannan Sadhasivam 
> > ---
> >  .../bindings/clock/bitmain,bm1880-clk.txt | 47 +++
> 
> Can you convert this to YAML? It's all the rage right now.
> 

Sure.

> >  include/dt-bindings/clock/bm1880-clock.h  | 82 +++
> >  2 files changed, 129 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/clock/bitmain,bm1880-clk.txt
> >  create mode 100644 include/dt-bindings/clock/bm1880-clock.h
> > 
> > diff --git a/Documentation/devicetree/bindings/clock/bitmain,bm1880-clk.txt 
> > b/Documentation/devicetree/bindings/clock/bitmain,bm1880-clk.txt
> > new file mode 100644
> > index ..9c967095d430
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/clock/bitmain,bm1880-clk.txt
> > @@ -0,0 +1,47 @@
> > +* Bitmain BM1880 Clock Controller
> > +
> > +The Bitmain BM1880 clock controler generates and supplies clock to
> > +various peripherals within the SoC.
> > +
> > +Required Properties:
> > +
> > +- compatible: Should be "bitmain,bm1880-clk"
> > +- reg :Register address and size of PLL and SYS control domains
> > +- reg-names : Register domain names: "pll" and "sys"
> > +- clocks : Phandle of the input reference clock.
> > +- #clock-cells: Should be 1.
> > +
> > +Each clock is assigned an identifier, and client nodes can use this 
> > identifier
> > +to specify the clock which they consume.
> > +
> > +All available clocks are defined as preprocessor macros in corresponding
> > +dt-bindings/clock/bm1880-clock.h header and can be used in device tree 
> > sources.
> > +
> > +External clocks:
> > +
> > +The osc clock used as the input for the plls is generated outside the SoC.
> > +It is expected that it is defined using standard clock bindings as "osc".
> > +
> > +Example: 
> > +
> > +clk: clock-controller@800 {
> > +compatible = "bitmain,bm1880-clk";
> > +reg = <0xe8 0x0c>,<0x800 0xb0>;
> 
> It looks weird still. What hardware module is this actually part of?
> Some larger power manager block?
> 

These are all part of the sysctrl block (clock + pinctrl + reset) and the
register domains got split between system and pll.

Thanks,
Mani

> > +reg-names = "pll", "sys";
> > +clocks = <>;
> > +#clock-cells = <1>;
> > +};
> > +


Re: [PATCH] Partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"

2019-08-16 Thread Yafang Shao
On Sat, Aug 17, 2019 at 8:47 AM Roman Gushchin  wrote:
>
> Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
> with the hierarchical ones") effectively decreased the precision of
> per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
>
> That's good for displaying in memory.stat, but brings a serious regression
> into the reclaim process.
>
> One issue I've discovered and debugged is the following:
> lruvec_lru_size() can return 0 instead of the actual number of pages
> in the lru list, preventing the kernel to reclaim last remaining
> pages. Result is yet another dying memory cgroups flooding.
> The opposite is also happening: scanning an empty lru list
> is the waste of cpu time.
>
> Also, inactive_list_is_low() can return incorrect values, preventing
> the active lru from being scanned and freed. It can fail both because
> the size of active and inactive lists are inaccurate, and because
> the number of workingset refaults isn't precise. In other words,
> the result is pretty random.
>
> I'm not sure, if using the approximate number of slab pages in
> count_shadow_number() is acceptable, but issues described above
> are enough to partially revert the patch.
>
> Let's keep per-memcg vmstat_local batched (they are only used for
> displaying stats to the userspace), but keep lruvec stats precise.
> This change fixes the dead memcg flooding on my setup.
>

That will make some misunderstanding if the local counters are not in
sync with the hierarchical ones
(someone may doubt whether there're something leaked.).
If we have to do it like this, I think we should better document this behavior.

> Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with 
> the hierarchical ones")
> Signed-off-by: Roman Gushchin 
> Cc: Yafang Shao 
> Cc: Johannes Weiner 
> ---
>  mm/memcontrol.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 249187907339..3429340adb56 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -746,15 +746,13 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum 
> node_stat_item idx,
> /* Update memcg */
> __mod_memcg_state(memcg, idx, val);
>
> +   /* Update lruvec */
> +   __this_cpu_add(pn->lruvec_stat_local->count[idx], val);
> +
> x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
> if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
> struct mem_cgroup_per_node *pi;
>
> -   /*
> -* Batch local counters to keep them in sync with
> -* the hierarchical ones.
> -*/
> -   __this_cpu_add(pn->lruvec_stat_local->count[idx], x);
> for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
> atomic_long_add(x, >lruvec_stat[idx]);
> x = 0;
> --
> 2.21.0
>


Re: [PATCH net-next 1/3] net: mdio: add support for passing a PTP system timestamp to the mii_bus driver

2019-08-16 Thread Richard Cochran
On Fri, Aug 16, 2019 at 06:31:55PM +0200, Hubert Feurstein wrote:
>  
>  int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
>  int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
> +int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
> + struct ptp_system_timestamp *sts);
>  
>  int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
>  int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
>  int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
>  int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
> +int mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
> +   struct ptp_system_timestamp *sts);
> +int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 
> val,
> +  struct ptp_system_timestamp *sts);

Following the pattern, you have made three new global
mdiobus_write_sts() functions.  However, your patch set only uses
mdiobus_write_sts_nested().

Please don't add global functions with no users.  Let the first user
add them, if and when the need arises.

Thanks,
Richard


Re: [rcu:from-joel.2019.08.16a 143/172] kernel/rcu/tree.c:2808:6: note: in expansion of macro 'xchg'

2019-08-16 Thread Paul Walmsley
On Fri, 16 Aug 2019, Joel Fernandes wrote:

> On Sat, Aug 17, 2019 at 05:10:59AM +0800, kbuild test robot wrote:
> > tree:   
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> >  from-joel.2019.08.16a
> > head:   01b0e4d3e0ac279b295bc06a3591f0b810b9908f
> > commit: bda80ba9decc7a32413e88d2f070de180c4b76ab [143/172] rcu/tree: Add 
> > basic support for kfree_rcu() batching
> > config: riscv-defconfig (attached as .config)
> > compiler: riscv64-linux-gcc (GCC) 7.4.0
> > reproduce:
> > wget 
> > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> > ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > git checkout bda80ba9decc7a32413e88d2f070de180c4b76ab
> > # save the attached .config to linux build tree
> > GCC_VERSION=7.4.0 make.cross ARCH=riscv 
> 
> This seems to be a riscv issue:
> 
> A call to '__compiletime_assert_2792' declared with attribute error:
> BUILD_BUG failed
> 
> Could riscv folks take a look at it? Why is using xchg() causing issues? The
> xchg() is being done on a bool.

sizeof(bool) = 1, and the only xchg() implementation that's currently 
present on all RISC-V in Linux is usable only on 32-bit types.  SPARC, 
Microblaze, C-SKY, and Hexagon all look like they have the same 
limitation.  You'll probably see similar build breakage for those 
platforms too.

MIPS, OpenRISC, Xtensa, and SuperH have compatibility functions for the 
8-bit and 16-bit cases.  We could add those for RISC-V.  However, they'll 
be slower than the 4- and 8-byte cases.  This is both because there's more 
code involved, and because there's an increased risk of false-sharing.

Can you use a u32 instead for xchg/cmpxchg operations like struct 
kfree_rcu_cpu.monitor_todo that are intended to be cross-platform?  Based 
on a cursory glance through the tree, it looks like there's only one 
cross-platform driver that uses 'true' or 'false' with the kernel 
xchg/cmpxchg, and it uses an int as the underlying data type.

I suppose one could typedef "xchg_bool" on a per-architecture basis, to be 
smallest type that makes sense to use in an xchg() or cmpxchg().  That 
might save some memory on architectures that have a fast 1-byte 
xchg()/cmpxchg().  But given the lack of users in the tree, I'm not sure 
it's worth the effort.


- Paul


[PATCH] arch : arm : add a criteria for pfn_valid

2019-08-16 Thread Zhaoyang Huang
From: Zhaoyang Huang 

pfn_valid can be wrong while the MSB of physical address be trimed as pfn
larger than the max_pfn.

Signed-off-by: Zhaoyang Huang 
---
 arch/arm/mm/init.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index c2daabb..9c4d938 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -177,7 +177,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max_low,
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 int pfn_valid(unsigned long pfn)
 {
-   return memblock_is_map_memory(__pfn_to_phys(pfn));
+   return (pfn > max_pfn) ?
+   false : memblock_is_map_memory(__pfn_to_phys(pfn));
 }
 EXPORT_SYMBOL(pfn_valid);
 #endif
-- 
1.9.1



Re: [RFC PATCH v2 2/3] mm/gup: introduce FOLL_PIN flag for get_user_pages()

2019-08-16 Thread John Hubbard
On 8/16/19 7:24 PM, jhubb...@nvidia.com wrote:
> From: John Hubbard 
> DKIM-Signature: v a a-sha256; claxed/relaxed; d idia.com; s;
>   t66008674; bhMai0va6k/z2enpQJ4Nfvbj5WByFxGAO1JwdIBbXioh 
> PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer:
>In-Reply-To:References:MIME-Version:X-NVConfidentiality:
>Content-Transfer-Encoding:Content-Type;
>   bÖUDSde9XF/IsNteBaYOBWeKiHhWmeU9ekUJNvCviHssBDCtw0T+M/2TlEPEzomIT
>fGXzIQNlGN6MXFbaBoyBmF/zjCu02TmTNExbVJ3/5N6PTyOuJFCx9ZN1/5gXsB11m1
>xAHIWE+VOZs4qqDeHDBqKZq+FaxQHNvGz0j6lyVBA70TfseNoZqZZrSil8uvaKJwKd
>TQ1ht+AGWbw9p610JmaPb4u6o/eV6Ns8Sl3EVnjWWu94T6ISNIaWCiC6wQQF6L1YCH
>G5Pjn+0rEjhk6XG4TyLudi5lWp3IVBHd8+WlWlnl+bvLCC55RUAjPJLn7LaVyVdh0F
>nLHwm3bN2Jotg

I cannot readily explain the above email glitch, but I did just now switch
back to mailgw.nvidia.com for this patchset, in order to get the nice behavior
of having "From:" really be my native NVIDIA email address. That's very nice,
but if the glitches happen again, I'll switch back to using gmail for 
git-send-email.

Sorry about the weirdness. It does still let you apply the patch, I
just now checked on that.

thanks,
-- 
John Hubbard
NVIDIA



Re: [RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002 ;-)

2019-08-16 Thread Dave Chinner
On Fri, Aug 16, 2019 at 12:05:28PM -0700, Ira Weiny wrote:
> On Thu, Aug 15, 2019 at 03:05:58PM +0200, Jan Kara wrote:
> > On Wed 14-08-19 11:08:49, Ira Weiny wrote:
> > > On Wed, Aug 14, 2019 at 12:17:14PM +0200, Jan Kara wrote:
> 2) Second reason is that I thought I did not have a good way to tell if the
>lease was actually in use.  What I mean is that letting the lease go should
>be ok IFF we don't have any pins...  I was thinking that without John's 
> code
>we don't have a way to know if there are any pins...  But that is wrong...
>All we have to do is check
> 
>   !list_empty(file->file_pins)
> 
> So now with this detail I think you are right, we should be able to hold the
> lease through the struct file even if the process no longer has any
> "references" to it (ie closes and munmaps the file).

I really, really dislike the idea of zombie layout leases. It's a
nasty hack for poor application behaviour. This is a "we allow use
after layout lease release" API, and I think encoding largely
untraceable zombie objects into an API is very poor design.

>From the fcntl man page:

LEASES
Leases are associated with an open file description (see
open(2)).  This means that duplicate file descriptors
(created by, for example, fork(2) or dup(2))  re‐ fer  to
the  same  lease,  and this lease may be modified or
released using any of these descriptors.  Furthermore, the
lease is released by either an explicit F_UNLCK operation on
any of these duplicate file descriptors, or when all such
file descriptors have been closed.

Leases are associated with *open* file descriptors, not the
lifetime of the struct file in the kernel. If the application closes
the open fds that refer to the lease, then the kernel does not
guarantee, and the application has no right to expect, that the
lease remains active in any way once the application closes all
direct references to the lease.

IOWs, applications using layout leases need to hold the lease fd
open for as long as the want access to the physical file layout. It
is a also a requirement of the layout lease that the holder releases
the resources it holds on the layout before it releases the layout
lease, exclusive lease or not. Closing the fd indicates they do not
need access to the file any more, and so the lease should be
reclaimed at that point.

I'm of a mind to make the last close() on a file block if there's an
active layout lease to prevent processes from zombie-ing layout
leases like this. i.e. you can't close the fd until resources that
pin the lease have been released.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH net-next] net: hns: add phy_attached_info() to the hns driver

2019-08-16 Thread Yonglong Liu
This patch adds the call to phy_attached_info() to the hns driver
to identify which exact PHY drivers is in use.

Suggested-by: Heiner Kallweit 
Signed-off-by: Yonglong Liu 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 1545536..a48396d 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1182,6 +1182,8 @@ int hns_nic_init_phy(struct net_device *ndev, struct 
hnae_handle *h)
if (unlikely(ret))
return -ENODEV;
 
+   phy_attached_info(phy_dev);
+
return 0;
 }
 
-- 
2.8.1



[RFC PATCH v2 0/3] mm/gup: introduce vaddr_pin_pages_remote(), FOLL_PIN

2019-08-16 Thread jhubbard
From: John Hubbard 

Hi Ira,

As requested, this is for your tree:
https://github.com/weiny2/linux-kernel.git (mmotm-rdmafsdax-b0-v4), to be
applied at your last authored commit, which is: commit f625f92ecfb4
("mm/gup: Remove FOLL_LONGTERM DAX exclusion"). In other words, please
delete my previous patches from the tree, and apply these replacement
patches.

This now has a user for the new vaddr_pin_user_pages_remote() call. And
it also moves the gup flag setting out to the caller.

I'm pretty pleased to be able to include a bit of documentation (see the
FOLL_PIN patch) that covers those four cases. This should really help
clarify things. Thanks to Jan Kara and Vlastimil Babka for providing
the meaingful core of that documentation.

The naming can of course be tweaked to match whatever the final is. For
now, I've used vaddr_pin_user_pages_remote(). That addresses Jason's
request for a "user" in the name, and it also makes it very clear that
it's a replacement for get_user_pages_remote().

v1 of this RFC is here:
https://lore.kernel.org/r/20190812015044.26176-1-jhubb...@nvidia.com

John Hubbard (3):
  For Ira: tiny formatting tweak to kerneldoc
  mm/gup: introduce FOLL_PIN flag for get_user_pages()
  mm/gup: introduce vaddr_pin_pages_remote(), and invoke it

 include/linux/mm.h | 61 +-
 mm/gup.c   | 37 +++--
 mm/process_vm_access.c | 23 +---
 3 files changed, 104 insertions(+), 17 deletions(-)

-- 
2.22.1



[RFC PATCH v2 2/3] mm/gup: introduce FOLL_PIN flag for get_user_pages()

2019-08-16 Thread jhubbard
From: John Hubbard 
DKIM-Signature: v a*
- * In the CMA case: longterm pins in a CMA region would unnecessarily fragment
- * that region.  And so CMA attempts to migrate the page before pinning when
+ * In the CMA case: long term pins in a CMA region would unnecessarily fragment
+ * that region.  And so, CMA attempts to migrate the page before pinning, when
  * FOLL_LONGTERM is specified.
+ *
+ * FOLL_PIN indicates that a special kind of tracking (not just 
page->_refcount,
+ * but an additional pin counting system) will be invoked. This is intended for
+ * anything that gets a page reference and then touches page data (for example,
+ * Direct IO). This lets the filesystem know that some non-file-system entity 
is
+ * potentially changing the pages' data. FOLL_PIN pages must be released,
+ * ultimately, by a call to put_user_page(). Typically that will be via one of
+ * the vaddr_unpin_pages() variants.
+ *
+ * FIXME: note that this special tracking is not in place yet. However, the
+ * pages should still be released by put_user_page().
+ *
+ * When and where to use each flag:
+ *
+ * CASE 1: Direct IO (DIO). There are GUP references to pages that are serving
+ * as DIO buffers. These buffers are needed for a relatively short time (so 
they
+ * are not "long term"). No special synchronization with page_mkclean() or
+ * munmap() is provided. Therefore, flags to set at the call site are:
+ *
+ * FOLL_PIN
+ *
+ * CASE 2: RDMA. There are GUP references to pages that are serving as DMA
+ * buffers. These buffers are needed for a long time ("long term"). No special
+ * synchronization with page_mkclean() or munmap() is provided. Therefore, 
flags
+ * to set at the call site are:
+ *
+ * FOLL_PIN | FOLL_LONGTERM
+ *
+ * There is also a special case when the pages are DAX pages: in addition to 
the
+ * above flags, the caller needs a file lease. This is provided via the struct
+ * vaddr_pin argument to vaddr_pin_pages().
+ *
+ * CASE 3: ODP (Mellanox/Infiniband On Demand Paging: the hardware supports
+ * replayable page faulting). There are GUP references to pages serving as DMA
+ * buffers. For ODP, MMU notifiers are used to synchronize with page_mkclean()
+ * and munmap(). Therefore, normal GUP calls are sufficient, so neither flag
+ * needs to be set.
+ *
+ * CASE 4: pinning for struct page manipulation only. Here, normal GUP calls 
are
+ * sufficient, so neither flag needs to be set.
  */
 
 static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
-- 
2.22.1



[RFC PATCH v2 3/3] mm/gup: introduce vaddr_pin_pages_remote(), and invoke it

2019-08-16 Thread jhubbard
From: John Hubbard 

vaddr_pin_user_pages_remote() is the "vaddr_pin_pages" corresponding
variant to get_user_pages_remote(): it adds the ability to handle
FOLL_PIN, FOLL_LONGTERM, or both.

Note that the put_user_page*() requirement won't be truly required until
all of the call sites have been converted, and the tracking of pages is
activated.

Also, change process_vm_rw_single_vec() to invoke the new function.

Signed-off-by: John Hubbard 
---
 include/linux/mm.h |  5 +
 mm/gup.c   | 33 +
 mm/process_vm_access.c | 23 ++-
 3 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6e7de424bf5e..849b509e9f89 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1606,6 +1606,11 @@ int __account_locked_vm(struct mm_struct *mm, unsigned 
long pages, bool inc,
 long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
 unsigned int gup_flags, struct page **pages,
 struct vaddr_pin *vaddr_pin);
+long vaddr_pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+unsigned long start, unsigned long nr_pages,
+unsigned int gup_flags, struct page **pages,
+struct vm_area_struct **vmas, int *locked,
+struct vaddr_pin *vaddr_pin);
 void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
   struct vaddr_pin *vaddr_pin, bool make_dirty);
 bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page);
diff --git a/mm/gup.c b/mm/gup.c
index e49096d012ea..d7ce9b38178f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2522,3 +2522,36 @@ void vaddr_unpin_pages(struct page **pages, unsigned 
long nr_pages,
__put_user_pages_dirty_lock(pages, nr_pages, make_dirty, vaddr_pin);
 }
 EXPORT_SYMBOL(vaddr_unpin_pages);
+
+/**
+ * vaddr_pin_user_pages_remote() - pin pages by virtual address and return the
+ * pages to the user.
+ *
+ * @tsk:   the task_struct to use for page fault accounting, or
+ * NULL if faults are not to be recorded.
+ * @mm:mm_struct of target mm
+ * @addr:  start address
+ * @nr_pages:  number of pages to pin
+ * @gup_flags: flags to use for the pin. Please see FOLL_* documentation in
+ * mm.h.
+ * @pages: array of pages returned
+ * @vaddr_pin:  If FOLL_LONGTERM is set, then vaddr_pin should point to an
+ * initialized struct that contains the owning mm and file. Otherwise, 
vaddr_pin
+ * should be set to NULL.
+ *
+ * This is the "vaddr_pin_pages" corresponding variant to
+ * get_user_pages_remote(), but with the ability to handle FOLL_PIN,
+ * FOLL_LONGTERM, or both.
+ */
+long vaddr_pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+unsigned long start, unsigned long nr_pages,
+unsigned int gup_flags, struct page **pages,
+struct vm_area_struct **vmas, int *locked,
+struct vaddr_pin *vaddr_pin)
+{
+   gup_flags |= FOLL_TOUCH | FOLL_REMOTE;
+
+   return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
+  locked, gup_flags, vaddr_pin);
+}
+EXPORT_SYMBOL(vaddr_pin_user_pages_remote);
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 357aa7bef6c0..e08c1f760ad4 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -96,7 +96,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
flags |= FOLL_WRITE;
 
while (!rc && nr_pages && iov_iter_count(iter)) {
-   int pages = min(nr_pages, max_pages_per_loop);
+   int pinned_pages = min(nr_pages, max_pages_per_loop);
int locked = 1;
size_t bytes;
 
@@ -106,14 +106,18 @@ static int process_vm_rw_single_vec(unsigned long addr,
 * current/current->mm
 */
down_read(>mmap_sem);
-   pages = get_user_pages_remote(task, mm, pa, pages, flags,
- process_pages, NULL, );
+
+   flags |= FOLL_PIN;
+   pinned_pages = vaddr_pin_user_pages_remote(task, mm, pa,
+  pinned_pages, flags,
+  process_pages, NULL,
+  , NULL);
if (locked)
up_read(>mmap_sem);
-   if (pages <= 0)
+   if (pinned_pages <= 0)
return -EFAULT;
 
-   bytes = pages * PAGE_SIZE - start_offset;
+   bytes = pinned_pages * PAGE_SIZE - start_offset;
if (bytes > len)
 

[RFC PATCH v2 1/3] For Ira: tiny formatting tweak to kerneldoc

2019-08-16 Thread jhubbard
From: John Hubbard 

For your vaddr_pin_pages() and vaddr_unpin_pages().
Just merge it into wherever it goes please. Didn't want to
cause merge problems so it's a separate patch-let.

Signed-off-by: John Hubbard 
---
 mm/gup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 56421b880325..e49096d012ea 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2465,7 +2465,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 EXPORT_SYMBOL_GPL(get_user_pages_fast);
 
 /**
- * vaddr_pin_pages pin pages by virtual address and return the pages to the
+ * vaddr_pin_pages() - pin pages by virtual address and return the pages to the
  * user.
  *
  * @addr: start address
@@ -2505,7 +2505,7 @@ long vaddr_pin_pages(unsigned long addr, unsigned long 
nr_pages,
 EXPORT_SYMBOL(vaddr_pin_pages);
 
 /**
- * vaddr_unpin_pages - counterpart to vaddr_pin_pages
+ * vaddr_unpin_pages() - counterpart to vaddr_pin_pages
  *
  * @pages: array of pages returned
  * @nr_pages: number of pages in pages
-- 
2.22.1



Re: [PATCH] ASoC: imx-audmux: Add driver suspend and resume to support MEGA Fast

2019-08-16 Thread S.j. Wang
Hi Mark

> 
> On Fri, Aug 16, 2019 at 01:03:14AM -0400, Shengjiu Wang wrote:
> 
> > +   for (i = 0; i < reg_max; i++)
> > +   regcache[i] = readl(audmux_base + i * 4);
> 
> If only there were some framework which provided a register cache!  

Yes, next step I can refine this driver to use the regmap.

Best regards
Wang shengjiu


Re: [PATCH net] net: hns: add phy_attached_info() to the hns driver

2019-08-16 Thread Yonglong Liu
Please ignore this patch, it is not bugfix, should send to net-next.
Sorry for the noise.

On 2019/8/17 9:56, Yonglong Liu wrote:
> This patch add the call to phy_attached_info() to the hns driver
> to identify which exact PHY drivers is in use.
> 
> Signed-off-by: Yonglong Liu 
> ---
>  drivers/net/ethernet/hisilicon/hns/hns_enet.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
> b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> index 2235dd5..ab5118d 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> @@ -1182,6 +1182,8 @@ int hns_nic_init_phy(struct net_device *ndev, struct 
> hnae_handle *h)
>   if (unlikely(ret))
>   return -ENODEV;
>  
> + phy_attached_info(phy_dev);
> +
>   return 0;
>  }
>  
> 



Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

2019-08-16 Thread Steven Rostedt
On Fri, 16 Aug 2019 21:36:49 -0400 (EDT)
Mathieu Desnoyers  wrote:

> - On Aug 16, 2019, at 5:04 PM, Linus Torvalds 
> torva...@linux-foundation.org wrote:
> 
> > On Fri, Aug 16, 2019 at 1:49 PM Thomas Gleixner  wrote: 
> >  
> >>
> >> Can we finally put a foot down and tell compiler and standard committee
> >> people to stop this insanity?  
> > 
> > It's already effectively done.
> > 
> > Yes, values can be read from memory multiple times if they need
> > reloading. So "READ_ONCE()" when the value can change is a damn good
> > idea.
> > 
> > But it should only be used if the value *can* change. Inside a locked
> > region it is actively pointless and misleading.
> > 
> > Similarly, WRITE_ONCE() should only be used if you have a _reason_ for
> > using it (notably if you're not holding a lock).
> > 
> > If people use READ_ONCE/WRITE_ONCE when there are locks that prevent
> > the values from changing, they are only making the code illegible.
> > Don't do it.  
> 
> I agree with your argument in the case where both read-side and write-side
> are protected by the same lock: READ/WRITE_ONCE are useless then. However,
> in the scenario we have here, only the write-side is protected by the lock
> against concurrent updates, but reads don't take any lock.

And because reads are not protected by any lock or memory barrier,
using READ_ONCE() is pointless. The CPU could be doing a lot of hidden
manipulation of that variable too.

Again, this is just to enable caching of the comm. Nothing more. It's a
"best effort" algorithm. We get it, we then can map a pid to a name. If
not, we just have a pid and we map "<...>".

Next you'll be asking for the memory barriers to guarantee a real hit.
And honestly, this information is not worth any overhead.

And most often we enable this before we enable the tracepoint we want
this information from, which requires modification of the text area and
will do a bunch of syncs across CPUs. That alone will most likely keep
any race from happening.

The only real bug is the check to know if we should add the probe or
not was done outside the lock, and when we hit the race, we could add a
probe twice, causing the kernel to spit out a warning. You fixed that,
and that's all that needs to be done. I'm now even more against adding
the READ_ONCE() or WRITE_ONCE().


-- Steve



> 
> If WRITE_ONCE has any use at all (protecting against store tearing and
> invented stores), it should be used even with a lock held in this
> scenario, because the lock does not prevent READ_ONCE() from observing
> transient states caused by lack of WRITE_ONCE() for the update.
> 
> So why does WRITE_ONCE exist in the first place ? Is it for documentation
> purposes only or are there actual situations where omitting it can cause
> bugs with real-life compilers ?
> 
> In terms of code change, should we favor only introducing WRITE_ONCE
> in new code, or should existing code matching those conditions be
> moved to WRITE_ONCE as bug fixes ?
> 
>


[PATCH v2 RFC 3/2] fstests: check that we can't write to swap files

2019-08-16 Thread Darrick J. Wong
From: Darrick J. Wong 

While active, the media backing a swap file is leased to the kernel.
Userspace has no business writing to it.  Make sure we can't do this.

Signed-off-by: Darrick J. Wong 
---
v2: add tests for writable fds after swapon
---
 src/swapon.c  |  135 -
 tests/generic/717 |   70 +
 tests/generic/717.out |   14 +
 tests/generic/718 |   55 
 tests/generic/718.out |   12 
 tests/generic/group   |2 +
 6 files changed, 284 insertions(+), 4 deletions(-)
 create mode 100755 tests/generic/717
 create mode 100644 tests/generic/717.out
 create mode 100755 tests/generic/718
 create mode 100644 tests/generic/718.out

diff --git a/src/swapon.c b/src/swapon.c
index 0cb7108a..afaed405 100644
--- a/src/swapon.c
+++ b/src/swapon.c
@@ -3,22 +3,149 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void usage(const char *prog)
+{
+   fprintf(stderr, "usage: %s [-v verb] PATH\n", prog);
+   exit(EXIT_FAILURE);
+}
+
+enum verbs {
+   TEST_SWAPON = 0,
+   TEST_WRITE,
+   TEST_MWRITE_AFTER,
+   TEST_MWRITE_BEFORE_AND_MWRITE_AFTER,
+   TEST_MWRITE_BEFORE,
+   MAX_TEST_VERBS,
+};
+
+#define BUF_SIZE 262144
+static char buf[BUF_SIZE];
+
+static void handle_signal(int signal)
+{
+   fprintf(stderr, "Caught signal %d, terminating...\n", signal);
+   exit(EXIT_FAILURE);
+}
 
 int main(int argc, char **argv)
 {
-   int ret;
+   struct sigaction act = {
+   .sa_handler = handle_signal,
+   };
+   enum verbs verb = TEST_SWAPON;
+   void *p;
+   ssize_t sz;
+   int fd = -1;
+   int ret, c;
+
+   memset(buf, 0x58, BUF_SIZE);
+
+   while ((c = getopt(argc, argv, "v:")) != -1) {
+   switch (c) {
+   case 'v':
+   verb = atoi(optarg);
+   if (verb < TEST_SWAPON || verb >= MAX_TEST_VERBS) {
+   fprintf(stderr, "Verbs must be 0-%d.\n",
+   MAX_TEST_VERBS - 1);
+   usage(argv[0]);
+   }
+   break;
+   default:
+   usage(argv[0]);
+   break;
+   }
+   }
 
-   if (argc != 2) {
-   fprintf(stderr, "usage: %s PATH\n", argv[0]);
+   ret = sigaction(SIGSEGV, , NULL);
+   if (ret) {
+   perror("sigsegv action");
return EXIT_FAILURE;
}
 
-   ret = swapon(argv[1], 0);
+   ret = sigaction(SIGBUS, , NULL);
+   if (ret) {
+   perror("sigbus action");
+   return EXIT_FAILURE;
+   }
+
+   switch (verb) {
+   case TEST_WRITE:
+   case TEST_MWRITE_AFTER:
+   case TEST_MWRITE_BEFORE_AND_MWRITE_AFTER:
+   case TEST_MWRITE_BEFORE:
+   fd = open(argv[optind], O_RDWR);
+   if (fd < 0) {
+   perror(argv[optind]);
+   return EXIT_FAILURE;
+   }
+   break;
+   default:
+   break;
+   }
+
+   switch (verb) {
+   case TEST_MWRITE_BEFORE_AND_MWRITE_AFTER:
+   case TEST_MWRITE_BEFORE:
+   p = mmap(NULL, BUF_SIZE, PROT_WRITE | PROT_READ, MAP_SHARED,
+   fd, 65536);
+   if (p == MAP_FAILED) {
+   perror("mmap");
+   return EXIT_FAILURE;
+   }
+   memcpy(p, buf, BUF_SIZE);
+   break;
+   default:
+   break;
+   }
+
+   if (optind != argc - 1)
+   usage(argv[0]);
+
+   ret = swapon(argv[optind], 0);
if (ret) {
perror("swapon");
return EXIT_FAILURE;
}
 
+   switch (verb) {
+   case TEST_WRITE:
+   sz = pwrite(fd, buf, BUF_SIZE, 65536);
+   if (sz < 0) {
+   perror("pwrite");
+   return EXIT_FAILURE;
+   }
+   break;
+   case TEST_MWRITE_AFTER:
+   p = mmap(NULL, BUF_SIZE, PROT_WRITE | PROT_READ, MAP_SHARED,
+   fd, 65536);
+   if (p == MAP_FAILED) {
+   perror("mmap");
+   return EXIT_FAILURE;
+   }
+   /* fall through */
+   case TEST_MWRITE_BEFORE_AND_MWRITE_AFTER:
+   memcpy(p, buf, BUF_SIZE);
+   break;
+   default:
+   break;
+   }
+
+   if (fd >= 0) {
+   ret = fsync(fd);
+   if (ret)
+   perror("fsync");
+   ret = close(fd);
+   if (ret)
+   perror("close");
+   }
+
return EXIT_SUCCESS;
 }
diff 

[PATCH net] net: hns: add phy_attached_info() to the hns driver

2019-08-16 Thread Yonglong Liu
This patch add the call to phy_attached_info() to the hns driver
to identify which exact PHY drivers is in use.

Signed-off-by: Yonglong Liu 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 2235dd5..ab5118d 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1182,6 +1182,8 @@ int hns_nic_init_phy(struct net_device *ndev, struct 
hnae_handle *h)
if (unlikely(ret))
return -ENODEV;
 
+   phy_attached_info(phy_dev);
+
return 0;
 }
 
-- 
2.8.1



[rcu:from-joel.2019.08.16a 143/172] tree.c:undefined reference to `__bad_xchg'

2019-08-16 Thread kbuild test robot
tree:   
https://kernel.googlesource.com/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
from-joel.2019.08.16a
head:   01b0e4d3e0ac279b295bc06a3591f0b810b9908f
commit: bda80ba9decc7a32413e88d2f070de180c4b76ab [143/172] rcu/tree: Add basic 
support for kfree_rcu() batching
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 7.4.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout bda80ba9decc7a32413e88d2f070de180c4b76ab
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   kernel/rcu/tree.o: In function `kfree_rcu_monitor':
>> tree.c:(.text+0x27d8): undefined reference to `__bad_xchg'
   kernel/rcu/tree.o: In function `kfree_call_rcu':
   tree.c:(.text+0xac70): undefined reference to `__bad_xchg'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

2019-08-16 Thread Mathieu Desnoyers
- On Aug 16, 2019, at 6:57 PM, Linus Torvalds torva...@linux-foundation.org 
wrote:

> So in general, we very much expect the compiler to do sane code
> generation, and not (for example) do store tearing on normal
> word-sized things or add writes that weren't there originally etc.

My understanding of https://lwn.net/Articles/793253/ section "Store tearing"
which points at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56028 seems
to contradict your expectation at least when writing constants to a
64-bit word without a volatile access.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

2019-08-16 Thread Mathieu Desnoyers
- On Aug 16, 2019, at 5:04 PM, Linus Torvalds torva...@linux-foundation.org 
wrote:

> On Fri, Aug 16, 2019 at 1:49 PM Thomas Gleixner  wrote:
>>
>> Can we finally put a foot down and tell compiler and standard committee
>> people to stop this insanity?
> 
> It's already effectively done.
> 
> Yes, values can be read from memory multiple times if they need
> reloading. So "READ_ONCE()" when the value can change is a damn good
> idea.
> 
> But it should only be used if the value *can* change. Inside a locked
> region it is actively pointless and misleading.
> 
> Similarly, WRITE_ONCE() should only be used if you have a _reason_ for
> using it (notably if you're not holding a lock).
> 
> If people use READ_ONCE/WRITE_ONCE when there are locks that prevent
> the values from changing, they are only making the code illegible.
> Don't do it.

I agree with your argument in the case where both read-side and write-side
are protected by the same lock: READ/WRITE_ONCE are useless then. However,
in the scenario we have here, only the write-side is protected by the lock
against concurrent updates, but reads don't take any lock.

If WRITE_ONCE has any use at all (protecting against store tearing and
invented stores), it should be used even with a lock held in this
scenario, because the lock does not prevent READ_ONCE() from observing
transient states caused by lack of WRITE_ONCE() for the update.

So why does WRITE_ONCE exist in the first place ? Is it for documentation
purposes only or are there actual situations where omitting it can cause
bugs with real-life compilers ?

In terms of code change, should we favor only introducing WRITE_ONCE
in new code, or should existing code matching those conditions be
moved to WRITE_ONCE as bug fixes ?

Thanks,

Mathieu

> 
> But in the *absence* of locking, READ_ONCE/WRITE_ONCE is usually a
> good thing.  The READ_ONCE actually tends to matter, because even if
> the value is used only once at a source level, the compiler *could*
> decide to do something else.
> 
> The WRITE_ONCE() may or may not matter (afaik, thanks to concurrency,
> modern C standard does not allow optimistic writes anyway, and we
> wouldn't really accept such a compiler option if it did).
> 
> But if the write is done without locking, it's good practice just to
> show you are aware of the whole "done without locking" part.
> 
>   Linus

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v4 1/2] rcu/tree: Add basic support for kfree_rcu() batching

2019-08-16 Thread Joel Fernandes
Hi Paul,

On Fri, Aug 16, 2019 at 3:16 PM Paul E. McKenney  wrote:
> > > Hello, Joel,
> > >
> > > I reworked the commit log as follows, but was then unsuccessful in
> > > working out which -rcu commit to apply it to.  Could you please
> > > tell me what commit to apply this to?  (Once applied, git cherry-pick
> > > is usually pretty good about handling minor conflicts.)
> >
> > It was originally based on v5.3-rc2
> >
> > I was able to apply it just now to the rcu -dev branch and I pushed it here:
> > https://github.com/joelagnel/linux-kernel.git (branch paul-dev)
> >
> > Let me know if any other issues, thanks for the change log rework!
>
> Pulled and cherry-picked, thank you!
>
> Just for grins, I also  pushed out a from-joel.2019.08.16a showing the
> results of the pull.  If you pull that branch, then run something like
> "gitk v5.3-rc2..", and then do the same with branch "dev", comparing the
> two might illustrate some of the reasons for the current restrictions
> on pull requests and trees subject to rebase.

Right, I did the compare and see what you mean. I guess sending any
future pull requests against Linux -next would be the best option?

thanks,

 - Joel


Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

2019-08-16 Thread Mathieu Desnoyers
- On Aug 16, 2019, at 4:49 PM, rostedt rost...@goodmis.org wrote:

> On Fri, 16 Aug 2019 16:44:10 -0400
> Joel Fernandes  wrote:
> 
> 
>> I am also more on the side of using *_ONCE. To me, by principal, I
>> would be willing to convert any concurrent plain access using _ONCE,
>> just so we don't have to worry about it now or in the future and also
>> documents the access.
>> 
>> Perhaps the commit message can be reworded to mention that the _ONCE
>> is an additional clean up for safe access.
> 
> The most I'll take is two separate patches. One is going to be marked
> for stable as it fixes a real bug. The other is more for cosmetic or
> theoretical issues, that I will state clearly "NOT FOR STABLE", such
> that the autosel doesn't take them.

Splitting this into two separate patches makes perfect sense.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[GIT PULL] RISC-V updates for v5.3-rc5

2019-08-16 Thread Paul Walmsley


Linus,

The following changes since commit d45331b00ddb179e291766617259261c112db872:

  Linux 5.3-rc4 (2019-08-11 13:26:41 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git 
tags/riscv/for-v5.3-rc5

for you to fetch changes up to 69703eb9a8ae28a46cd5bce7d69ceeef6273a104:

  riscv: Make __fstate_clean() work correctly. (2019-08-14 13:20:46 -0700)


RISC-V updates for v5.3-rc5

These updates include:

- Two patches to fix significant bugs in floating point register
  context handling

- A minor fix in RISC-V flush_tlb_page(), to supply a valid end
  address to flush_tlb_range()

- Two minor defconfig additions: to build the virtio hwrng driver by
  default (for QEMU targets), and to partially synchronize the 32-bit
  defconfig with the 64-bit defconfig


Alistair Francis (2):
  riscv: rv32_defconfig: Update the defconfig
  riscv: defconfig: Update the defconfig

Paul Walmsley (1):
  riscv: fix flush_tlb_range() end address for flush_tlb_page()

Vincent Chen (2):
  riscv: Correct the initialized flow of FP register
  riscv: Make __fstate_clean() work correctly.

 arch/riscv/configs/defconfig   |  2 ++
 arch/riscv/configs/rv32_defconfig  |  3 +++
 arch/riscv/include/asm/switch_to.h |  8 +++-
 arch/riscv/include/asm/tlbflush.h  | 11 +--
 arch/riscv/kernel/process.c| 11 +--
 5 files changed, 30 insertions(+), 5 deletions(-)


Re: Kernel 5.2.8 - au0828 - Tuner Is Busy

2019-08-16 Thread Nathan Royce
On Fri, Aug 16, 2019 at 1:42 PM Greg Kroah-Hartman
 wrote:
> If you revert that one commit, does things start working again?
>
> thanks,
>
> greg k-h
Hey Greg, I just got finished building it after running "$ git revert
812658d88d26" and verifying it reverted by comparing one of the files
from git log -p, but alas, no joy.

On Fri, Aug 16, 2019 at 5:41 PM Brad Love  wrote:
>
> Hi Nathan,
>
> I don't have a "woodbury", but I have a Hauppauge 950Q sitting around
> and tested it on latest mainline kernel. w_scan is ok and streaming is
> fine. There's no unexpected errors. The 950Q uses the same au0828 bridge
> and au8522 demod as woodbury, but a different tuner. Your problem
> wouldn't appear to be a general au0828 issue.
>
> You might have to check out git bisect. That will be the quickest way to
> get to the bottom, if you've got points A and B, and are
> building/running your own kernel.
>
> Cheers,
>
> Brad
Thanks Brad, I'll explore bisecting and hopefully will be able to
narrow down the cause.
I wasn't running my own kernel, but rather using the Arch Linux kernel
and modding the one module and putting it in "extramodules".


Re: [rcu:from-joel.2019.08.16a 143/172] kernel/rcu/tree.c:2808:6: note: in expansion of macro 'xchg'

2019-08-16 Thread Joel Fernandes
Resending with more folks added.

On Sat, Aug 17, 2019 at 05:10:59AM +0800, kbuild test robot wrote:
> tree:   
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>  from-joel.2019.08.16a
> head:   01b0e4d3e0ac279b295bc06a3591f0b810b9908f
> commit: bda80ba9decc7a32413e88d2f070de180c4b76ab [143/172] rcu/tree: Add 
> basic support for kfree_rcu() batching
> config: riscv-defconfig (attached as .config)
> compiler: riscv64-linux-gcc (GCC) 7.4.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout bda80ba9decc7a32413e88d2f070de180c4b76ab
> # save the attached .config to linux build tree
> GCC_VERSION=7.4.0 make.cross ARCH=riscv 

This seems to be a riscv issue:

A call to '__compiletime_assert_2792' declared with attribute error:
BUILD_BUG failed

Could riscv folks take a look at it? Why is using xchg() causing issues? The
xchg() is being done on a bool.

thanks,

 - Joel


> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
> 
> All error/warnings (new ones prefixed by >>):
> 
>In file included from arch/riscv/include/asm/atomic.h:19:0,
> from include/linux/atomic.h:7,
> from include/linux/spinlock.h:445,
> from kernel/rcu/tree.c:23:
>kernel/rcu/tree.c: In function 'kfree_rcu_monitor':
> >> arch/riscv/include/asm/cmpxchg.h:140:2: warning: '__ret' is used 
> >> uninitialized in this function [-Wuninitialized]
>  __ret;\
>  ^
>arch/riscv/include/asm/cmpxchg.h:121:21: note: '__ret' was declared here
>  __typeof__(*(ptr)) __ret; \
> ^
> >> arch/riscv/include/asm/cmpxchg.h:146:23: note: in expansion of macro 
> >> '__xchg'
>  (__typeof__(*(ptr))) __xchg((ptr), _x_, sizeof(*(ptr))); \
>   ^~
> >> kernel/rcu/tree.c:2808:6: note: in expansion of macro 'xchg'
>  if (xchg(>monitor_todo, false))
>  ^~~~
>In file included from include/linux/kernel.h:11:0,
> from kernel/rcu/tree.c:21:
> >> include/linux/compiler.h:350:38: error: call to 
> >> '__compiletime_assert_2808' declared with attribute error: BUILD_BUG failed
>  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>  ^
>include/linux/compiler.h:331:4: note: in definition of macro 
> '__compiletime_assert'
>prefix ## suffix();\
>^~
>include/linux/compiler.h:350:2: note: in expansion of macro 
> '_compiletime_assert'
>  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>  ^~~
>include/linux/build_bug.h:39:37: note: in expansion of macro 
> 'compiletime_assert'
> #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> ^~
>include/linux/build_bug.h:59:21: note: in expansion of macro 
> 'BUILD_BUG_ON_MSG'
> #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> ^~~~
> >> arch/riscv/include/asm/cmpxchg.h:138:3: note: in expansion of macro 
> >> 'BUILD_BUG'
>   BUILD_BUG();  \
>   ^
> >> arch/riscv/include/asm/cmpxchg.h:146:23: note: in expansion of macro 
> >> '__xchg'
>  (__typeof__(*(ptr))) __xchg((ptr), _x_, sizeof(*(ptr))); \
>   ^~
> >> kernel/rcu/tree.c:2808:6: note: in expansion of macro 'xchg'
>  if (xchg(>monitor_todo, false))
>  ^~~~
>In function 'kfree_rcu_drain_unlock',
>inlined from 'kfree_rcu_monitor' at kernel/rcu/tree.c:2809:3:
>include/linux/compiler.h:350:38: error: call to 
> '__compiletime_assert_2792' declared with attribute error: BUILD_BUG failed
>  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>  ^
>include/linux/compiler.h:331:4: note: in definition of macro 
> '__compiletime_assert'
>prefix ## suffix();\
>^~
>include/linux/compiler.h:350:2: note: in expansion of macro 
> '_compiletime_assert'
>  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>  ^~~
>include/linux/build_bug.h:39:37: note: in expansion of macro 
> 'compiletime_assert'
> #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> ^~
>include/linux/build_bug.h:59:21: note: in expansion of macro 
> 'BUILD_BUG_ON_MSG'
> #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> ^~~~
> >> arch/riscv/include/asm/cmpxchg.h:138:3: note: in expansion of macro 
> >> 'BUILD_BUG'
>   BUILD_BUG();  \
>   ^
> >> arch/riscv/include/asm/cmpxchg.h:146:23: note: in expansion of 

Re: [rcu:from-joel.2019.08.16a 143/172] kernel/rcu/tree.c:2808:6: note: in expansion of macro 'xchg'

2019-08-16 Thread Joel Fernandes
On Sat, Aug 17, 2019 at 05:10:59AM +0800, kbuild test robot wrote:
> tree:   
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>  from-joel.2019.08.16a
> head:   01b0e4d3e0ac279b295bc06a3591f0b810b9908f
> commit: bda80ba9decc7a32413e88d2f070de180c4b76ab [143/172] rcu/tree: Add 
> basic support for kfree_rcu() batching
> config: riscv-defconfig (attached as .config)
> compiler: riscv64-linux-gcc (GCC) 7.4.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout bda80ba9decc7a32413e88d2f070de180c4b76ab
> # save the attached .config to linux build tree
> GCC_VERSION=7.4.0 make.cross ARCH=riscv 

This seems to be a riscv issue:

A call to '__compiletime_assert_2792' declared with attribute error:
BUILD_BUG failed

Could riscv folks take a look at it? Why is using xchg() causing issues? The
xchg() is being done on a bool.

thanks,

 - Joel


> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
> 
> All error/warnings (new ones prefixed by >>):
> 
>In file included from arch/riscv/include/asm/atomic.h:19:0,
> from include/linux/atomic.h:7,
> from include/linux/spinlock.h:445,
> from kernel/rcu/tree.c:23:
>kernel/rcu/tree.c: In function 'kfree_rcu_monitor':
> >> arch/riscv/include/asm/cmpxchg.h:140:2: warning: '__ret' is used 
> >> uninitialized in this function [-Wuninitialized]
>  __ret;\
>  ^
>arch/riscv/include/asm/cmpxchg.h:121:21: note: '__ret' was declared here
>  __typeof__(*(ptr)) __ret; \
> ^
> >> arch/riscv/include/asm/cmpxchg.h:146:23: note: in expansion of macro 
> >> '__xchg'
>  (__typeof__(*(ptr))) __xchg((ptr), _x_, sizeof(*(ptr))); \
>   ^~
> >> kernel/rcu/tree.c:2808:6: note: in expansion of macro 'xchg'
>  if (xchg(>monitor_todo, false))
>  ^~~~
>In file included from include/linux/kernel.h:11:0,
> from kernel/rcu/tree.c:21:
> >> include/linux/compiler.h:350:38: error: call to 
> >> '__compiletime_assert_2808' declared with attribute error: BUILD_BUG failed
>  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>  ^
>include/linux/compiler.h:331:4: note: in definition of macro 
> '__compiletime_assert'
>prefix ## suffix();\
>^~
>include/linux/compiler.h:350:2: note: in expansion of macro 
> '_compiletime_assert'
>  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>  ^~~
>include/linux/build_bug.h:39:37: note: in expansion of macro 
> 'compiletime_assert'
> #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> ^~
>include/linux/build_bug.h:59:21: note: in expansion of macro 
> 'BUILD_BUG_ON_MSG'
> #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> ^~~~
> >> arch/riscv/include/asm/cmpxchg.h:138:3: note: in expansion of macro 
> >> 'BUILD_BUG'
>   BUILD_BUG();  \
>   ^
> >> arch/riscv/include/asm/cmpxchg.h:146:23: note: in expansion of macro 
> >> '__xchg'
>  (__typeof__(*(ptr))) __xchg((ptr), _x_, sizeof(*(ptr))); \
>   ^~
> >> kernel/rcu/tree.c:2808:6: note: in expansion of macro 'xchg'
>  if (xchg(>monitor_todo, false))
>  ^~~~
>In function 'kfree_rcu_drain_unlock',
>inlined from 'kfree_rcu_monitor' at kernel/rcu/tree.c:2809:3:
>include/linux/compiler.h:350:38: error: call to 
> '__compiletime_assert_2792' declared with attribute error: BUILD_BUG failed
>  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>  ^
>include/linux/compiler.h:331:4: note: in definition of macro 
> '__compiletime_assert'
>prefix ## suffix();\
>^~
>include/linux/compiler.h:350:2: note: in expansion of macro 
> '_compiletime_assert'
>  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>  ^~~
>include/linux/build_bug.h:39:37: note: in expansion of macro 
> 'compiletime_assert'
> #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> ^~
>include/linux/build_bug.h:59:21: note: in expansion of macro 
> 'BUILD_BUG_ON_MSG'
> #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> ^~~~
> >> arch/riscv/include/asm/cmpxchg.h:138:3: note: in expansion of macro 
> >> 'BUILD_BUG'
>   BUILD_BUG();  \
>   ^
> >> arch/riscv/include/asm/cmpxchg.h:146:23: note: in expansion of macro 
> >> '__xchg'
>  

[PATCH] Partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"

2019-08-16 Thread Roman Gushchin
Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
with the hierarchical ones") effectively decreased the precision of
per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.

That's good for displaying in memory.stat, but brings a serious regression
into the reclaim process.

One issue I've discovered and debugged is the following:
lruvec_lru_size() can return 0 instead of the actual number of pages
in the lru list, preventing the kernel to reclaim last remaining
pages. Result is yet another dying memory cgroups flooding.
The opposite is also happening: scanning an empty lru list
is the waste of cpu time.

Also, inactive_list_is_low() can return incorrect values, preventing
the active lru from being scanned and freed. It can fail both because
the size of active and inactive lists are inaccurate, and because
the number of workingset refaults isn't precise. In other words,
the result is pretty random.

I'm not sure, if using the approximate number of slab pages in
count_shadow_number() is acceptable, but issues described above
are enough to partially revert the patch.

Let's keep per-memcg vmstat_local batched (they are only used for
displaying stats to the userspace), but keep lruvec stats precise.
This change fixes the dead memcg flooding on my setup.

Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the 
hierarchical ones")
Signed-off-by: Roman Gushchin 
Cc: Yafang Shao 
Cc: Johannes Weiner 
---
 mm/memcontrol.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 249187907339..3429340adb56 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -746,15 +746,13 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum 
node_stat_item idx,
/* Update memcg */
__mod_memcg_state(memcg, idx, val);
 
+   /* Update lruvec */
+   __this_cpu_add(pn->lruvec_stat_local->count[idx], val);
+
x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
struct mem_cgroup_per_node *pi;
 
-   /*
-* Batch local counters to keep them in sync with
-* the hierarchical ones.
-*/
-   __this_cpu_add(pn->lruvec_stat_local->count[idx], x);
for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
atomic_long_add(x, >lruvec_stat[idx]);
x = 0;
-- 
2.21.0



Re: [PULL 0/1] Xtensa fix for v5.3

2019-08-16 Thread pr-tracker-bot
The pull request you sent on Fri, 16 Aug 2019 17:23:49 -0700:

> git://github.com/jcmvbkbc/linux-xtensa.git tags/xtensa-20190816

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6e625a1a3f471d63989d3a66cdf6a0c307654848

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[PULL 0/1] Xtensa fix for v5.3

2019-08-16 Thread Max Filippov
Hi Linus,

please pull another small fix for the Xtensa architecture.

The following changes since commit d45331b00ddb179e291766617259261c112db872:

  Linux 5.3-rc4 (2019-08-11 13:26:41 -0700)

are available in the git repository at:

  git://github.com/jcmvbkbc/linux-xtensa.git tags/xtensa-20190816

for you to fetch changes up to cd8869f4cb257f22b89495ca40f5281e58ba359c:

  xtensa: add missing isync to the cpu_reset TLB code (2019-08-12 15:05:48 
-0700)


Xtensa fixes for v5.3:

- add missing isync into cpu_reset to make sure ITLB changes are
  effective.


Max Filippov (1):
  xtensa: add missing isync to the cpu_reset TLB code

 arch/xtensa/kernel/setup.c | 1 +
 1 file changed, 1 insertion(+)

-- 
Thanks.
-- Max


[PATCH v6 5/8] PCI/ATS: Add PRI support for PCIe VF devices

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

When IOMMU tries to enable Page Request Interface (PRI) for VF device
in iommu_enable_dev_iotlb(), it always fails because PRI support for
PCIe VF device is currently broken. Current implementation expects
the given PCIe device (PF & VF) to implement PRI capability before
enabling the PRI support. But this assumption is incorrect. As per PCIe
spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
PRI of the PF and not implement it. Hence we need to create exception
for handling the PRI support for PCIe VF device.

Also, since PRI is a shared resource between PF/VF, following rules
should apply.

1. Use proper locking before accessing/modifying PF resources in VF
   PRI enable/disable call.
2. Use reference count logic to track the usage of PRI resource.
3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.

Cc: Ashok Raj 
Cc: Keith Busch 
Suggested-by: Ashok Raj 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/ats.c   | 121 ++--
 include/linux/pci.h |   1 +
 2 files changed, 95 insertions(+), 27 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 022698a85c98..e71187d83401 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -21,6 +21,15 @@ static void pci_pri_init(struct pci_dev *pdev)
 #ifdef CONFIG_PCI_PRI
int pos;
 
+   /*
+* As per PCIe r4.0, sec 9.3.7.11, only PF is permitted to
+* implement PRI and all associated VFs can only use it.
+* Since PF already initialized the PRI parameters there is
+* no need to proceed further.
+*/
+   if (pdev->is_virtfn)
+   return;
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
if (!pos)
return;
@@ -208,31 +217,55 @@ EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
  */
 int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
 {
-   u16 control, status;
+   u16 status;
u32 max_requests;
+   int ret = 0;
+   struct pci_dev *pf = pci_physfn(pdev);
 
if (WARN_ON(pdev->pri_enabled))
return -EBUSY;
 
-   if (!pdev->pri_cap)
+   if (!pf->pri_cap)
return -EINVAL;
 
-   pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, );
-   if (!(status & PCI_PRI_STATUS_STOPPED))
-   return -EBUSY;
+   pci_physfn_reslock(pdev);
+
+   if (pdev->is_virtfn && pf->pri_enabled)
+   goto update_status;
 
-   pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
- _requests);
+   /*
+* Before updating PRI registers, make sure there is no
+* outstanding PRI requests.
+*/
+   pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, );
+   if (!(status & PCI_PRI_STATUS_STOPPED)) {
+   ret = -EBUSY;
+   goto done;
+   }
+
+   pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, _requests);
reqs = min(max_requests, reqs);
-   pdev->pri_reqs_alloc = reqs;
-   pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
+   pf->pri_reqs_alloc = reqs;
+   pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
 
-   control = PCI_PRI_CTRL_ENABLE;
-   pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
+   pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL,
+ PCI_PRI_CTRL_ENABLE);
 
-   pdev->pri_enabled = 1;
+   /*
+* If PRI is not already enabled in PF, increment the PF
+* pri_ref_cnt to track the usage of PRI interface.
+*/
+   if (pdev->is_virtfn && !pf->pri_enabled) {
+   atomic_inc(>pri_ref_cnt);
+   pf->pri_enabled = 1;
+   }
 
-   return 0;
+update_status:
+   atomic_inc(>pri_ref_cnt);
+   pdev->pri_enabled = 1;
+done:
+   pci_physfn_resunlock(pdev);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(pci_enable_pri);
 
@@ -245,18 +278,32 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
 void pci_disable_pri(struct pci_dev *pdev)
 {
u16 control;
+   struct pci_dev *pf = pci_physfn(pdev);
 
if (WARN_ON(!pdev->pri_enabled))
return;
 
-   if (!pdev->pri_cap)
+   if (!pf->pri_cap)
return;
 
-   pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, );
+   pci_physfn_reslock(pdev);
+
+   atomic_dec(>pri_ref_cnt);
+
+   /*
+* If pri_ref_cnt is not zero, then don't modify hardware
+* registers.
+*/
+   if (atomic_read(>pri_ref_cnt))
+   goto done;
+
+   pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, );
control &= ~PCI_PRI_CTRL_ENABLE;
-   pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
+   pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
 
+done:
pdev->pri_enabled = 0;
+   

[PATCH v6 3/8] PCI/ATS: Cache PASID capability check result

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

Currently, PASID capability checks are repeated across all PASID API's.
Instead, cache the capability check result in pci_pasid_init() and use
it in other PASID API's.

Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/ats.c   | 50 ++---
 include/linux/pci.h |  1 +
 2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index b863562b6702..022698a85c98 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -29,6 +29,19 @@ static void pci_pri_init(struct pci_dev *pdev)
 #endif
 }
 
+static void pci_pasid_init(struct pci_dev *pdev)
+{
+#ifdef CONFIG_PCI_PASID
+   int pos;
+
+   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
+   if (!pos)
+   return;
+
+   pdev->pasid_cap = pos;
+#endif
+}
+
 void pci_ats_init(struct pci_dev *dev)
 {
int pos;
@@ -43,6 +56,8 @@ void pci_ats_init(struct pci_dev *dev)
dev->ats_cap = pos;
 
pci_pri_init(dev);
+
+   pci_pasid_init(dev);
 }
 
 /**
@@ -303,7 +318,6 @@ EXPORT_SYMBOL_GPL(pci_reset_pri);
 int pci_enable_pasid(struct pci_dev *pdev, int features)
 {
u16 control, supported;
-   int pos;
 
if (WARN_ON(pdev->pasid_enabled))
return -EBUSY;
@@ -311,11 +325,11 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
if (!pdev->eetlp_prefix_path)
return -EINVAL;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-   if (!pos)
+   if (!pdev->pasid_cap)
return -EINVAL;
 
-   pci_read_config_word(pdev, pos + PCI_PASID_CAP, );
+   pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
+);
supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
 
/* User wants to enable anything unsupported? */
@@ -325,7 +339,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
control = PCI_PASID_CTRL_ENABLE | features;
pdev->pasid_features = features;
 
-   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
+   pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
 
pdev->pasid_enabled = 1;
 
@@ -340,16 +354,14 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
 void pci_disable_pasid(struct pci_dev *pdev)
 {
u16 control = 0;
-   int pos;
 
if (WARN_ON(!pdev->pasid_enabled))
return;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-   if (!pos)
+   if (!pdev->pasid_cap)
return;
 
-   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
+   pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
 
pdev->pasid_enabled = 0;
 }
@@ -362,17 +374,15 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
 void pci_restore_pasid_state(struct pci_dev *pdev)
 {
u16 control;
-   int pos;
 
if (!pdev->pasid_enabled)
return;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-   if (!pos)
+   if (!pdev->pasid_cap)
return;
 
control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
-   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
+   pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
 }
 EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
 
@@ -389,13 +399,12 @@ EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
 int pci_pasid_features(struct pci_dev *pdev)
 {
u16 supported;
-   int pos;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-   if (!pos)
+   if (!pdev->pasid_cap)
return -EINVAL;
 
-   pci_read_config_word(pdev, pos + PCI_PASID_CAP, );
+   pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
+);
 
supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
 
@@ -445,13 +454,12 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
 int pci_max_pasids(struct pci_dev *pdev)
 {
u16 supported;
-   int pos;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-   if (!pos)
+   if (!pdev->pasid_cap)
return -EINVAL;
 
-   pci_read_config_word(pdev, pos + PCI_PASID_CAP, );
+   pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
+);
 
supported = (supported & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 56b55db099fc..27224c0db849 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -459,6 +459,7 @@ struct pci_dev {
u32 pri_reqs_alloc; /* Number of PRI requests allocated */
 #endif
 #ifdef CONFIG_PCI_PASID
+   u16 pasid_cap;  /* PASID Capability offset */
u16 pasid_features;
 #endif
 #ifdef CONFIG_PCI_P2PDMA
-- 
2.21.0



[PATCH v6 4/8] PCI/IOV: Add pci_physfn_reslock/unlock() interfaces

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

As per PCIe spec r5.0, sec 9.3.7, in SR-IOV devices, capabilities like
PASID, PRI, VC, etc are shared between PF and its associated VFs. So, to
prevent race conditions between PF/VF while updating configuration
registers of these shared capabilities, a new synchronization mechanism
is required.

As a first step, create shared resource lock and expose expose
pci_physfn_reslock/unlock() API's. Users of these shared capabilities can
use these lock/unlock interfaces to synchronize its access.

Since the shared capability is always implemented by PF, reslock mutex
has been added to pci_sriov structure which only exists for PF.

NOTE: Currently this reslock is common for all shared capabilities
between PF/VF. In future, if any performance impact has been noticed, we
should create individual locks for each of the shared capability.

Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/iov.c |  1 +
 drivers/pci/pci.h | 40 
 2 files changed, 41 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 525fd3f272b3..004e7076b065 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -507,6 +507,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
else
iov->dev = dev;
 
+   mutex_init(>reslock);
dev->sriov = iov;
dev->is_physfn = 1;
rc = compute_max_vf_buses(dev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d22d1b807701..c7fa09f3389a 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -304,6 +304,19 @@ struct pci_sriov {
u16 subsystem_device; /* VF subsystem device */
resource_size_t barsz[PCI_SRIOV_NUM_BARS];  /* VF BAR size */
booldrivers_autoprobe; /* Auto probing of VFs by driver */
+   /*
+* reslock mutex is used for synchronizing updates to resources
+* shared between PF and all associated VFs. For example, in
+* SRIOV devices, PRI and PASID interfaces are shared between
+* PF an all VFs, and hence we need proper locking mechanism to
+* prevent both PF and VF update the PRI or PASID configuration
+* registers at the same time.
+* NOTE: Currently, this lock is shared by all capabilities that
+* has shared resource between PF and VFs. If there is any performance
+* impact then perhaps we need to create separate lock for each of
+* the independent capability that has shared resources.
+*/
+   struct mutexreslock;/* PF/VF shared resource lock */
 };
 
 /**
@@ -433,6 +446,27 @@ void pci_iov_update_resource(struct pci_dev *dev, int 
resno);
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
 void pci_restore_iov_state(struct pci_dev *dev);
 int pci_iov_bus_range(struct pci_bus *bus);
+static inline void pci_physfn_reslock(struct pci_dev *dev)
+{
+   struct pci_dev *pf = pci_physfn(dev);
+
+   /* For non SRIOV devices, locking is not needed */
+   if (!pf->is_physfn)
+   return;
+
+   mutex_lock(>sriov->reslock);
+}
+
+static inline void pci_physfn_resunlock(struct pci_dev *dev)
+{
+   struct pci_dev *pf = pci_physfn(dev);
+
+   /* For non SRIOV devices, reslock is never held */
+   if (!pf->is_physfn)
+   return;
+
+   mutex_unlock(>sriov->reslock);
+}
 
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
@@ -453,6 +487,12 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
 {
return 0;
 }
+static inline void pci_physfn_reslock(struct pci_dev *dev)
+{
+}
+static inline void pci_physfn_resunlock(struct pci_dev *dev)
+{
+}
 
 #endif /* CONFIG_PCI_IOV */
 
-- 
2.21.0



[PATCH v6 8/8] PCI: Skip Enhanced Allocation (EA) initialization for VF device

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

As per PCIe r4.0, sec 9.3.6, VF must not implement Enhanced Allocation
Capability. So skip pci_ea_init() for virtual devices.

Cc: Ashok Raj 
Cc: Keith Busch 
Suggested-by: Ashok Raj 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/pci.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1b27b5af3d55..266600a11769 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3025,6 +3025,13 @@ void pci_ea_init(struct pci_dev *dev)
int offset;
int i;
 
+   /*
+* Per PCIe r4.0, sec 9.3.6, VF must not implement Enhanced
+* Allocation Capability.
+*/
+   if (dev->is_virtfn)
+   return;
+
/* find PCI EA capability in list */
ea = pci_find_capability(dev, PCI_CAP_ID_EA);
if (!ea)
-- 
2.21.0



[PATCH v6 7/8] PCI/ATS: Disable PF/VF ATS service independently

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

Currently all VF's needs to be disable their ATS service before
disabling the ATS service in corresponding PF device. But this logic is
incorrect and does not align with the spec. Also it might lead to
some power and performance impact in the system. As per PCIe spec r4.0,
sec 9.3.7.8, ATS Capabilities in VFs and their associated PFs may be
enabled/disabled independently. So remove this dependency logic in
enable/disable code.

Cc: Ashok Raj 
Cc: Keith Busch 
Suggested-by: Ashok Raj 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/ats.c   | 11 ---
 include/linux/pci.h |  1 -
 2 files changed, 12 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index ca633482e565..3a3e81630d7c 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -108,8 +108,6 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
pdev = pci_physfn(dev);
if (pdev->ats_stu != ps)
return -EINVAL;
-
-   atomic_inc(>ats_ref_cnt);  /* count enabled VFs */
} else {
dev->ats_stu = ps;
ctrl |= PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
@@ -127,20 +125,11 @@ EXPORT_SYMBOL_GPL(pci_enable_ats);
  */
 void pci_disable_ats(struct pci_dev *dev)
 {
-   struct pci_dev *pdev;
u16 ctrl;
 
if (WARN_ON(!dev->ats_enabled))
return;
 
-   if (atomic_read(>ats_ref_cnt))
-   return; /* VFs still enabled */
-
-   if (dev->is_virtfn) {
-   pdev = pci_physfn(dev);
-   atomic_dec(>ats_ref_cnt);
-   }
-
pci_read_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, );
ctrl &= ~PCI_ATS_CTRL_ENABLE;
pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 735dc731e0aa..5c72590df0bf 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -452,7 +452,6 @@ struct pci_dev {
};
u16 ats_cap;/* ATS Capability offset */
u8  ats_stu;/* ATS Smallest Translation Unit */
-   atomic_tats_ref_cnt;/* Number of VFs with ATS enabled */
 #endif
 #ifdef CONFIG_PCI_PRI
u16 pri_cap;/* PRI Capability offset */
-- 
2.21.0



[PATCH v6 2/8] PCI/ATS: Cache PRI capability check result

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

Currently, PRI capability checks are repeated across all PRI API's.
Instead, cache the capability check result in pci_pri_init() and use it
in other PRI API's.

Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/ats.c   | 56 +
 include/linux/pci.h |  1 +
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index cdd936d10f68..b863562b6702 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -16,6 +16,19 @@
 
 #include "pci.h"
 
+static void pci_pri_init(struct pci_dev *pdev)
+{
+#ifdef CONFIG_PCI_PRI
+   int pos;
+
+   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+   if (!pos)
+   return;
+
+   pdev->pri_cap = pos;
+#endif
+}
+
 void pci_ats_init(struct pci_dev *dev)
 {
int pos;
@@ -28,6 +41,8 @@ void pci_ats_init(struct pci_dev *dev)
return;
 
dev->ats_cap = pos;
+
+   pci_pri_init(dev);
 }
 
 /**
@@ -180,26 +195,25 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
 {
u16 control, status;
u32 max_requests;
-   int pos;
 
if (WARN_ON(pdev->pri_enabled))
return -EBUSY;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-   if (!pos)
+   if (!pdev->pri_cap)
return -EINVAL;
 
-   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, );
+   pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, );
if (!(status & PCI_PRI_STATUS_STOPPED))
return -EBUSY;
 
-   pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, _requests);
+   pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
+ _requests);
reqs = min(max_requests, reqs);
pdev->pri_reqs_alloc = reqs;
-   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
+   pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
 
control = PCI_PRI_CTRL_ENABLE;
-   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+   pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 
pdev->pri_enabled = 1;
 
@@ -216,18 +230,16 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
 void pci_disable_pri(struct pci_dev *pdev)
 {
u16 control;
-   int pos;
 
if (WARN_ON(!pdev->pri_enabled))
return;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-   if (!pos)
+   if (!pdev->pri_cap)
return;
 
-   pci_read_config_word(pdev, pos + PCI_PRI_CTRL, );
+   pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, );
control &= ~PCI_PRI_CTRL_ENABLE;
-   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+   pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 
pdev->pri_enabled = 0;
 }
@@ -241,17 +253,15 @@ void pci_restore_pri_state(struct pci_dev *pdev)
 {
u16 control = PCI_PRI_CTRL_ENABLE;
u32 reqs = pdev->pri_reqs_alloc;
-   int pos;
 
if (!pdev->pri_enabled)
return;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-   if (!pos)
+   if (!pdev->pri_cap)
return;
 
-   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
-   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+   pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
+   pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 }
 EXPORT_SYMBOL_GPL(pci_restore_pri_state);
 
@@ -265,17 +275,15 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
 int pci_reset_pri(struct pci_dev *pdev)
 {
u16 control;
-   int pos;
 
if (WARN_ON(pdev->pri_enabled))
return -EBUSY;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-   if (!pos)
+   if (!pdev->pri_cap)
return -EINVAL;
 
control = PCI_PRI_CTRL_RESET;
-   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+   pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 
return 0;
 }
@@ -410,13 +418,11 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
 int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 {
u16 status;
-   int pos;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-   if (!pos)
+   if (!pdev->pri_cap)
return 0;
 
-   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, );
+   pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, );
 
if (status & PCI_PRI_STATUS_PASID)
return 1;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e700d9f9f28..56b55db099fc 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -455,6 +455,7 @@ struct pci_dev {
atomic_tats_ref_cnt;/* Number of VFs with ATS enabled */
 #endif
 #ifdef CONFIG_PCI_PRI
+ 

[PATCH v6 6/8] PCI/ATS: Add PASID support for PCIe VF devices

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

When IOMMU tries to enable PASID for VF device in
iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe
VF device is currently broken in PCIE driver. Current implementation
expects the given PCIe device (PF & VF) to implement PASID capability
before enabling the PASID support. But this assumption is incorrect. As
per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with PF can only
use the PASID of the PF and not implement it.

Also, since PASID is a shared resource between PF/VF, following rules
should apply.

1. Use proper locking before accessing/modifying PF resources in VF
   PASID enable/disable call.
2. Use reference count logic to track the usage of PASID resource.
3. Disable PASID only if the PASID reference count (pasid_ref_cnt) is zero.

Cc: Ashok Raj 
Cc: Keith Busch 
Suggested-by: Ashok Raj 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/ats.c   | 88 +++--
 include/linux/pci.h |  1 +
 2 files changed, 70 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index e71187d83401..ca633482e565 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -43,6 +43,15 @@ static void pci_pasid_init(struct pci_dev *pdev)
 #ifdef CONFIG_PCI_PASID
int pos;
 
+   /*
+* As per PCIe r4.0, sec 9.3.7.14, only PF is permitted to
+* implement PASID Capability and all associated VFs can
+* only use it. Since PF already initialized the PASID
+* parameters there is no need to proceed further.
+*/
+   if (pdev->is_virtfn)
+   return;
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
if (!pos)
return;
@@ -384,6 +393,8 @@ EXPORT_SYMBOL_GPL(pci_reset_pri);
 int pci_enable_pasid(struct pci_dev *pdev, int features)
 {
u16 control, supported;
+   int ret = 0;
+   struct pci_dev *pf = pci_physfn(pdev);
 
if (WARN_ON(pdev->pasid_enabled))
return -EBUSY;
@@ -391,25 +402,42 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
if (!pdev->eetlp_prefix_path)
return -EINVAL;
 
-   if (!pdev->pasid_cap)
+   if (!pf->pasid_cap)
return -EINVAL;
 
-   pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
-);
+   pci_physfn_reslock(pdev);
+
+   if (pdev->is_virtfn && pf->pasid_enabled)
+   goto update_status;
+
+   pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, );
supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
 
/* User wants to enable anything unsupported? */
-   if ((supported & features) != features)
-   return -EINVAL;
+   if ((supported & features) != features) {
+   ret = -EINVAL;
+   goto done;
+   }
 
control = PCI_PASID_CTRL_ENABLE | features;
-   pdev->pasid_features = features;
-
+   pf->pasid_features = features;
pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
 
-   pdev->pasid_enabled = 1;
+   /*
+* If PASID is not already enabled in PF, increment pasid_ref_cnt
+* to count PF PASID usage.
+*/
+   if (pdev->is_virtfn && !pf->pasid_enabled) {
+   atomic_inc(>pasid_ref_cnt);
+   pf->pasid_enabled = 1;
+   }
 
-   return 0;
+update_status:
+   atomic_inc(>pasid_ref_cnt);
+   pdev->pasid_enabled = 1;
+done:
+   pci_physfn_resunlock(pdev);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(pci_enable_pasid);
 
@@ -420,16 +448,28 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
 void pci_disable_pasid(struct pci_dev *pdev)
 {
u16 control = 0;
+   struct pci_dev *pf = pci_physfn(pdev);
 
if (WARN_ON(!pdev->pasid_enabled))
return;
 
-   if (!pdev->pasid_cap)
+   if (!pf->pasid_cap)
return;
 
-   pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
+   pci_physfn_reslock(pdev);
+
+   atomic_dec(>pasid_ref_cnt);
 
+   if (atomic_read(>pasid_ref_cnt))
+   goto done;
+
+   /* Disable PASID only if pasid_ref_cnt is zero */
+   pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
+
+done:
pdev->pasid_enabled = 0;
+   pci_physfn_resunlock(pdev);
+
 }
 EXPORT_SYMBOL_GPL(pci_disable_pasid);
 
@@ -440,15 +480,25 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
 void pci_restore_pasid_state(struct pci_dev *pdev)
 {
u16 control;
+   struct pci_dev *pf = pci_physfn(pdev);
 
if (!pdev->pasid_enabled)
return;
 
-   if (!pdev->pasid_cap)
+   if (!pf->pasid_cap)
return;
 
+   pci_physfn_reslock(pdev);
+
+   pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, );
+   if (control & PCI_PASID_CTRL_ENABLE)
+   goto done;
+
control = 

[PATCH v6 1/8] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

Since pci_prg_resp_pasid_required() function has dependency on both
PASID and PRI, define it only if both CONFIG_PCI_PRI and
CONFIG_PCI_PASID config options are enabled.

Fixes: e5567f5f6762 ("PCI/ATS: Add pci_prg_resp_pasid_required()
interface.")
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/ats.c   | 10 ++
 include/linux/pci-ats.h | 12 +---
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index e18499243f84..cdd936d10f68 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -395,6 +395,8 @@ int pci_pasid_features(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_pasid_features);
 
+#ifdef CONFIG_PCI_PRI
+
 /**
  * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
  *  status.
@@ -402,10 +404,8 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
  *
  * Returns 1 if PASID is required in PRG Response Message, 0 otherwise.
  *
- * Even though the PRG response PASID status is read from PRI Status
- * Register, since this API will mainly be used by PASID users, this
- * function is defined within #ifdef CONFIG_PCI_PASID instead of
- * CONFIG_PCI_PRI.
+ * Since this API has dependency on both PRI and PASID, protect it
+ * with both CONFIG_PCI_PRI and CONFIG_PCI_PASID.
  */
 int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 {
@@ -425,6 +425,8 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
 
+#endif
+
 #define PASID_NUMBER_SHIFT 8
 #define PASID_NUMBER_MASK  (0x1f << PASID_NUMBER_SHIFT)
 /**
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 1ebb88e7c184..1a0bdaee2f32 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -40,7 +40,6 @@ void pci_disable_pasid(struct pci_dev *pdev);
 void pci_restore_pasid_state(struct pci_dev *pdev);
 int pci_pasid_features(struct pci_dev *pdev);
 int pci_max_pasids(struct pci_dev *pdev);
-int pci_prg_resp_pasid_required(struct pci_dev *pdev);
 
 #else  /* CONFIG_PCI_PASID */
 
@@ -67,11 +66,18 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
return -EINVAL;
 }
 
+#endif /* CONFIG_PCI_PASID */
+
+#if defined(CONFIG_PCI_PRI) && defined(CONFIG_PCI_PASID)
+
+int pci_prg_resp_pasid_required(struct pci_dev *pdev);
+
+#else /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
+
 static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 {
return 0;
 }
-#endif /* CONFIG_PCI_PASID */
-
+#endif
 
 #endif /* LINUX_PCI_ATS_H*/
-- 
2.21.0



Re: [PATCH][next] bus: moxtet: fix unsigned comparison to less than zero

2019-08-16 Thread Marek Behun
On Fri, 16 Aug 2019 23:41:06 +0100
Colin King  wrote:

> From: Colin Ian King 
> 
> Currently the size_t variable res is being checked for
> an error failure however the unsigned variable is never
> less than zero so this test is always false. Fix this by
> making variable res ssize_t
> 
> Addresses-Coverity: ("Unsigned compared against 0")
> Fixes: 5bc7f990cd98 ("bus: Add support for Moxtet bus")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/bus/moxtet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/moxtet.c b/drivers/bus/moxtet.c
> index 1ee4570e7e17..288a9e4c6c7b 100644
> --- a/drivers/bus/moxtet.c
> +++ b/drivers/bus/moxtet.c
> @@ -514,7 +514,7 @@ static ssize_t output_write(struct file *file, const char 
> __user *buf,
>   struct moxtet *moxtet = file->private_data;
>   u8 bin[TURRIS_MOX_MAX_MODULES];
>   u8 hex[sizeof(bin) * 2 + 1];
> - size_t res;
> + ssize_t res;
>   loff_t dummy = 0;
>   int err, i;
>  

Hi Colin,
thanks. Should I just Ack this, or do I need to send patch to the
developer who commited my patches?
Thanks.

Marek


[PATCH v6 0/8] Fix PF/VF dependency issue

2019-08-16 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

Current implementation of ATS, PASID, PRI does not handle VF dependencies
correctly. Following patches addresses this issue.

Changes since v5:
 * Created new patches for PRI/PASID capability caching.
 * Removed individual locks (pri_lock, pasid_lock) and added common
   resource lock to synchronize PRI/PASID updates between PF/VF.
 * Addressed comments from Bjorn Helgaas.

Changes since v4:
 * Defined empty functions for pci_pri_init() and pci_pasid_init() for cases
   where CONFIG_PCI_PRI and CONFIG_PCI_PASID are not enabled.

Changes since v3:
 * Fixed critical path (lock context) in pci_restore_*_state functions.

Changes since v2:
 * Added locking mechanism to synchronize accessing PF registers in VF.
 * Removed spec compliance checks in patches.
 * Addressed comments from Bjorn Helgaas.

Changes since v1:
 * Added more details about the patches in commit log.
 * Removed bulk spec check patch.
 * Addressed comments from Bjorn Helgaas.

Kuppuswamy Sathyanarayanan (8):
  PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues
  PCI/ATS: Cache PRI capability check result
  PCI/ATS: Cache PASID capability check result
  PCI/IOV: Add pci_physfn_reslock/unlock() interfaces
  PCI/ATS: Add PRI support for PCIe VF devices
  PCI/ATS: Add PASID support for PCIe VF devices
  PCI/ATS: Disable PF/VF ATS service independently
  PCI: Skip Enhanced Allocation (EA) initialization for VF device

 drivers/pci/ats.c   | 276 +---
 drivers/pci/iov.c   |   1 +
 drivers/pci/pci.c   |   7 +
 drivers/pci/pci.h   |  40 ++
 include/linux/pci-ats.h |  12 +-
 include/linux/pci.h |   5 +-
 6 files changed, 260 insertions(+), 81 deletions(-)

-- 
2.21.0



Re: add a not device managed memremap_pages v2

2019-08-16 Thread Ira Weiny
On Fri, Aug 16, 2019 at 08:54:30AM +0200, Christoph Hellwig wrote:
> Hi Dan and Jason,
> 
> Bharata has been working on secure page management for kvmppc guests,
> and one I thing I noticed is that he had to fake up a struct device
> just so that it could be passed to the devm_memremap_pages
> instrastructure for device private memory.
> 
> This series adds non-device managed versions of the
> devm_request_free_mem_region and devm_memremap_pages functions for
> his use case.
> 
> Changes since v1:
>  - don't overload devm_request_free_mem_region
>  - export the memremap_pages and munmap_pages as kvmppc can be a module

Except for the questions from Andrew this does not look to change anything so:

Reviewed-by: Ira Weiny 

> ___
> Linux-nvdimm mailing list
> linux-nvd...@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH] lib/hexdump: Make print_hex_dump_bytes() a nop on !DEBUG builds

2019-08-16 Thread Stephen Boyd
I'm seeing a bunch of debug prints from a user of print_hex_dump_bytes()
in my kernel logs, but I don't have CONFIG_DYNAMIC_DEBUG enabled nor do
I have DEBUG defined in my build. The problem is that
print_hex_dump_bytes() calls a wrapper function in lib/hexdump.c  that
calls print_hex_dump() with KERN_DEBUG level. There are three cases to
consider here

  1. CONFIG_DYNAMIC_DEBUG=y  --> call dynamic_hex_dum()
  2. CONFIG_DYNAMIC_DEBUG=n && DEBUG --> call print_hex_dump()
  3. CONFIG_DYNAMIC_DEBUG=n && !DEBUG --> stub it out

Right now, that last case isn't detected and we still call
print_hex_dump() from the stub wrapper.

Let's make print_hex_dump_bytes() only call print_hex_dump_debug() so
that it works properly in all cases.

Case #1, print_hex_dump_debug() calls dynamic_hex_dump() and we get same
behavior. Case #2, print_hex_dump_debug() calls print_hex_dump() with
KERN_DEBUG and we get the same behavior. Case #3, print_hex_dump_debug()
is a nop, changing behavior to what we want, i.e. print nothing.

Signed-off-by: Stephen Boyd 
---
 include/linux/printk.h | 22 +++---
 lib/hexdump.c  | 21 -
 2 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index cefd374c47b1..c09d67edda3a 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -488,13 +488,6 @@ extern int hex_dump_to_buffer(const void *buf, size_t len, 
int rowsize,
 extern void print_hex_dump(const char *level, const char *prefix_str,
   int prefix_type, int rowsize, int groupsize,
   const void *buf, size_t len, bool ascii);
-#if defined(CONFIG_DYNAMIC_DEBUG)
-#define print_hex_dump_bytes(prefix_str, prefix_type, buf, len)\
-   dynamic_hex_dump(prefix_str, prefix_type, 16, 1, buf, len, true)
-#else
-extern void print_hex_dump_bytes(const char *prefix_str, int prefix_type,
-const void *buf, size_t len);
-#endif /* defined(CONFIG_DYNAMIC_DEBUG) */
 #else
 static inline void print_hex_dump(const char *level, const char *prefix_str,
  int prefix_type, int rowsize, int groupsize,
@@ -526,4 +519,19 @@ static inline void print_hex_dump_debug(const char 
*prefix_str, int prefix_type,
 }
 #endif
 
+/**
+ * print_hex_dump_bytes - shorthand form of print_hex_dump() with default 
params
+ * @prefix_str: string to prefix each line with;
+ *  caller supplies trailing spaces for alignment if desired
+ * @prefix_type: controls whether prefix of an offset, address, or none
+ *  is printed (%DUMP_PREFIX_OFFSET, %DUMP_PREFIX_ADDRESS, %DUMP_PREFIX_NONE)
+ * @buf: data blob to dump
+ * @len: number of bytes in the @buf
+ *
+ * Calls print_hex_dump(), with log level of KERN_DEBUG,
+ * rowsize of 16, groupsize of 1, and ASCII output included.
+ */
+#define print_hex_dump_bytes(prefix_str, prefix_type, buf, len)\
+   print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true)
+
 #endif
diff --git a/lib/hexdump.c b/lib/hexdump.c
index b1d55b669ae2..147133f8eb2f 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -270,25 +270,4 @@ void print_hex_dump(const char *level, const char 
*prefix_str, int prefix_type,
 }
 EXPORT_SYMBOL(print_hex_dump);
 
-#if !defined(CONFIG_DYNAMIC_DEBUG)
-/**
- * print_hex_dump_bytes - shorthand form of print_hex_dump() with default 
params
- * @prefix_str: string to prefix each line with;
- *  caller supplies trailing spaces for alignment if desired
- * @prefix_type: controls whether prefix of an offset, address, or none
- *  is printed (%DUMP_PREFIX_OFFSET, %DUMP_PREFIX_ADDRESS, %DUMP_PREFIX_NONE)
- * @buf: data blob to dump
- * @len: number of bytes in the @buf
- *
- * Calls print_hex_dump(), with log level of KERN_DEBUG,
- * rowsize of 16, groupsize of 1, and ASCII output included.
- */
-void print_hex_dump_bytes(const char *prefix_str, int prefix_type,
- const void *buf, size_t len)
-{
-   print_hex_dump(KERN_DEBUG, prefix_str, prefix_type, 16, 1,
-  buf, len, true);
-}
-EXPORT_SYMBOL(print_hex_dump_bytes);
-#endif /* !defined(CONFIG_DYNAMIC_DEBUG) */
 #endif /* defined(CONFIG_PRINTK) */
-- 
Sent by a computer through tubes



Re: [PATCH v2 1/6] hwrng: core: Freeze khwrng thread during suspend

2019-08-16 Thread Jarkko Sakkinen
On Fri, Aug 16, 2019 at 05:56:12PM +0200, Alexander Steffen wrote:
> > Andrey talked to me a little about this today. Andrey would prefer we
> > don't just let the TPM go into a wonky state if it's used during
> > suspend/resume so that it can stay resilient to errors. Sounds OK to me,
> > but my point still stands that we need to fix the callers.
> > 
> > I'll resurrect the IS_SUSPENDED flag and make it set generically by the
> > tpm_pm_suspend() and tpm_pm_resume() functions and then spit out a big
> > WARN_ON() and return an error value like -EAGAIN if the TPM functions
> > are called when the TPM is suspended. I hope we don't hit the warning
> > message, but if we do then at least we can track it down rather quickly
> > and figure out how to fix the caller instead of just silently returning
> > -EAGAIN and hoping for that to be visible to the user.
> 
> There is another use case I see for this functionality: There are ways for
> user space to upgrade the TPM's firmware via /dev/tpm0 (using e.g.
> TPM2_FieldUpgradeStart/TPM2_FieldUpgradeData). While upgrading, the normal
> TPM functionality might not be available (commands return TPM_RC_UPGRADE or
> other error codes). Even after the upgrade is finished, the TPM might
> continue to refuse command execution (e.g. with TPM_RC_REBOOT).
> 
> I'm not sure whether all in-kernel users are prepared to deal correctly with
> those error codes. But even if they are, it might be better to block them
> from sending commands in the first place, to not interfere with the upgrade
> process.
> 
> What would you think about a way for a user space upgrade tool to also set
> this flag, to make the TPM unavailable for everything but the upgrade
> process?
> 
> Alexander

NOTE: Just commenting the FW use case.

I don't like it because it contains variable amount of reserved time
for a hardware resource.

Right now a user thread gets a lease of one TPM command for /dev/tpm0
and that is how I would like to keep it.

/Jarkko


Re: [RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002 ; -)

2019-08-16 Thread Ira Weiny
On Fri, Aug 16, 2019 at 12:05:28PM -0700, 'Ira Weiny' wrote:
> On Thu, Aug 15, 2019 at 03:05:58PM +0200, Jan Kara wrote:
> > On Wed 14-08-19 11:08:49, Ira Weiny wrote:
> > > On Wed, Aug 14, 2019 at 12:17:14PM +0200, Jan Kara wrote:
> > > > Hello!
> > > > 
> > > > On Fri 09-08-19 15:58:14, ira.we...@intel.com wrote:
> > > > > Pre-requisites
> > > > > ==
> > > > >   Based on mmotm tree.
> > > > > 
> > > > > Based on the feedback from LSFmm, the LWN article, the RFC series 
> > > > > since
> > > > > then, and a ton of scenarios I've worked in my mind and/or 
> > > > > tested...[1]
> > > > > 
> > > > > Solution summary
> > > > > 
> > > > > 
> > > > > The real issue is that there is no use case for a user to have RDMA 
> > > > > pinn'ed
> > > > > memory which is then truncated.  So really any solution we present 
> > > > > which:
> > > > > 
> > > > > A) Prevents file system corruption or data leaks
> > > > > ...and...
> > > > > B) Informs the user that they did something wrong
> > > > > 
> > > > > Should be an acceptable solution.
> > > > > 
> > > > > Because this is slightly new behavior.  And because this is going to 
> > > > > be
> > > > > specific to DAX (because of the lack of a page cache) we have made 
> > > > > the user
> > > > > "opt in" to this behavior.
> > > > > 
> > > > > The following patches implement the following solution.
> > > > > 
> > > > > 0) Registrations to Device DAX char devs are not affected
> > > > > 
> > > > > 1) The user has to opt in to allowing page pins on a file with an 
> > > > > exclusive
> > > > >layout lease.  Both exclusive and layout lease flags are user 
> > > > > visible now.
> > > > > 
> > > > > 2) page pins will fail if the lease is not active when the file back 
> > > > > page is
> > > > >encountered.
> > > > > 
> > > > > 3) Any truncate or hole punch operation on a pinned DAX page will 
> > > > > fail.
> > > > 
> > > > So I didn't fully grok the patch set yet but by "pinned DAX page" do you
> > > > mean a page which has corresponding file_pin covering it? Or do you 
> > > > mean a
> > > > page which has pincount increased? If the first then I'd rephrase this 
> > > > to
> > > > be less ambiguous, if the second then I think it is wrong. 
> > > 
> > > I mean the second.  but by "fail" I mean hang.  Right now the "normal" 
> > > page
> > > pincount processing will hang the truncate.  Given the discussion with 
> > > John H
> > > we can make this a bit better if we use something like FOLL_PIN and the 
> > > page
> > > count bias to indicate this type of pin.  Then I could fail the truncate
> > > outright.  but that is not done yet.
> > > 
> > > so... I used the word "fail" to be a bit more vague as the final 
> > > implementation
> > > may return ETXTBUSY or hang as noted.
> > 
> > Ah, OK. Hanging is fine in principle but with longterm pins, your work
> > makes sure they actually fail with ETXTBUSY, doesn't it? The thing is that
> > e.g. DIO will use page pins as well for its buffers and we must wait there
> > until the pin is released. So please just clarify your 'fail' here a bit
> > :).
> 
> It will fail with ETXTBSY.  I've fixed a bug...  See below.
> 
> > 
> > > > > 4) The user has the option of holding the lease or releasing it.  If 
> > > > > they
> > > > >release it no other pin calls will work on the file.
> > > > 
> > > > Last time we spoke the plan was that the lease is kept while the pages 
> > > > are
> > > > pinned (and an attempt to release the lease would block until the pages 
> > > > are
> > > > unpinned). That also makes it clear that the *lease* is what is making
> > > > truncate and hole punch fail with ETXTBUSY and the file_pin structure is
> > > > just an implementation detail how the existence is efficiently tracked 
> > > > (and
> > > > what keeps the backing file for the pages open so that the lease does 
> > > > not
> > > > get auto-destroyed). Why did you change this?
> > > 
> > > closing the file _and_ unmaping it will cause the lease to be released
> > > regardless of if we allow this or not.
> > > 
> > > As we discussed preventing the close seemed intractable.
> > 
> > Yes, preventing the application from closing the file is difficult. But
> > from a quick look at your patches it seemed to me that you actually hold a
> > backing file reference from the file_pin structure thus even though the
> > application closes its file descriptor, the struct file (and thus the
> > lease) lives further until the file_pin gets released. And that should last
> > as long as the pages are pinned. Am I missing something?
> > 
> > > I thought about failing the munmap but that seemed wrong as well.  But 
> > > more
> > > importantly AFAIK RDMA can pass its memory pins to other processes via FD
> > > passing...  This means that one could pin this memory, pass it to another
> > > process and exit.  The file lease on the pin'ed file is lost.
> > 
> > Not if file_pin grabs struct file reference as I mentioned above...
> 

Re: Lay common foundation to make PVR/SGX work without hacks on OMAP34xx, OMAP36xx, AM335x and potentially OMAP4, OMAP5

2019-08-16 Thread Adam Ford
On Wed, Aug 14, 2019 at 8:16 AM Tony Lindgren  wrote:
>
> * H. Nikolaus Schaller  [190814 10:34]:
> >
> > > Am 14.08.2019 um 11:47 schrieb Tony Lindgren :
> > >
> > > * H. Nikolaus Schaller  [190814 08:57]:
> > >> I also have pushed good news to
> > >>
> > >>https://github.com/openpvrsgx-devgroup/linux_openpvrsgx/tree/letux-pvr
> > >>
> > >> Thanks to the help from the Pyra community, I was able to get a (binary) 
> > >> reference
> > >> implementation using DRM that works on Pyra/OMAP5. At least the 
> > >> gles1test1.
> > >>
> > >> With that reference setup I was able to fix my Makefiles for the 
> > >> staging/pvr implementation.
> > >>
> > >> I have tested that it works with v4.19.66 and v5.3-rc4 (LPAE build of 
> > >> the LetuxOS kernel tree)
> > >> on the Pyra.
> > >>
> > >> In which areas does this tree go beyond the TI SDK/IMG DDK 1.14?
> > >>
> > >> * includes internal API fixes for kernels up to v5.3
> > >> * lives in drivers/staging/pvr/1.14.3699939 - so that we can ask for 
> > >> inclusion in linux-next
> > >> * has Kconfig and Makefiles for in-kernel configuration (no separate 
> > >> build system)
> > >> * builds separate kernel modules for omap3430, omap3630, am335x, omap4, 
> > >> omap5, dra7 etc.
> > >>  pvrsrvkm
> > >>  e.g. pvrsrvkm_omap_omap5_sgx544_116
> > >> * the correct kernel module is automatically probed by matching 
> > >> .compatible in device tree
> > >>  so that the code is multi-platform friendly
> > >> * includes SoC integration for OMAP3/4/5 and has some preliminary 
> > >> bindings documentation
> > >> * code base should also support JZ4780/CI20 and some Intel Atom 
> > >> processors (CedarView, Poulsbo)
> > >> * has got a ToDo to describe what should be done during staging phase
> > >>
> > >>
> > >> https://github.com/openpvrsgx-devgroup/linux_openpvrsgx/blob/letux/latest-pvr/drivers/staging/pvr/TODO
> > >>
> > >> My plans for the next steps are:
> > >>
> > >> * do more testing (e.g. X11, kmscube)
> > >> * check if and/or how it can run on am335x (BeagleBone) or OMAP3 (e.g. 
> > >> GTA04, OpenPandora)
> > >> * try a JZ480/CI20 build (unfortuantely I have no HDMI there with 
> > >> mainline kernels and I am
> > >>  missing the user-space libraries for MIPS).



> > >
> > > That sounds good to me, just one comment. Before getting these into
> > > staging, I'd like to have omap variants use proper interconnect
> > > target module in devicetree like we already have in omap4.dtsi
> > > as target-module@5600. This should simplify things further
> > > as the module child device driver(s) can just enable things with
> > > runtime PM and we can leave out all the legacy hwmod platform data
> > > that sounds like you're still carrying.
> >
> > Yes, there is still a lot of SoC-glue included:
> >
> >   
> > https://github.com/openpvrsgx-devgroup/linux_openpvrsgx/commits/letux/omap-pvr-soc-glue
> >
> > It would indeed be a good move to simplify and reduce the glue code
> > and make it more maintainable / stable / identical on different platforms.
>
> OK yeah all that should just disappear :)
>
> > > I have patches here to add similar interconnect target modules for
> > > at least omap34xx, omap36xx, omap5, and am335x that I'll try to post
> > > later on today to play with. For am335x, things still depend on the
> > > recentely posted prm rstctrl patches. I'm not sure if I already
> > > did a dts patch for dra7 yet, need to check.
> >
> > I assume it is not yet in linux-next... So something for v5.5 or later.
>
> Well I just posted some sgx interconnect target module patches. We might
> still have them in v5.4 assuming people manage to review and test them.

Nikolaus,

I tested Tony's change and I can confirm that I can read the value
when enabled.  Should I apply his patches to your branch before I
test, or is it go too to go as-is? I've got an AM3517, OMAP35 and a
DM3730.  I am not sure if the AM3517 is even on the radar, but it has
an sgx530 as well.

adam

>
> Regards,
>
> Tony


Re: [PATCH] ipvlan: set hw_enc_features like macvlan

2019-08-16 Thread David Miller
From: Bill Sommerfeld 
Date: Wed, 14 Aug 2019 17:10:43 -0700

> Allow encapsulated packets sent to tunnels layered over ipvlan to use
> offloads rather than forcing SW fallbacks.
> 
> Since commit f21e5077010acda73a60 ("macvlan: add offload features for
> encapsulation"), macvlan has set dev->hw_enc_features to include
> everything in dev->features; do likewise in ipvlan.
> 
> Signed-off-by: Bill Sommerfeld 

Applied to net-next.


Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

2019-08-16 Thread Linus Torvalds
On Fri, Aug 16, 2019 at 3:27 PM Valentin Schneider
 wrote:
>
> How would you differentiate optimizations you want from those you don't with
> just a flag? There's a reason we use volatile casts instead of declaring
> everything volatile: we actually *want* those optimizations. It just so
> happens that we don't want them *in some places*, and we have tools to tag
> them as such.

We actually disable lots of "valid" (read: the standard allows them,
but they are completely wrong for the kernel) optimizations because
they are wrong.

The whole type-based alias thing is just wrong. The C standards body
was incompetent to allow that garbage. So we disable it.

If you can *prove* that no aliasing exists, go ahead and re-order
accesses. But no guesses based on random types.

Similarly, if some compiler decides that it's ok to make speculative
writes (knowing it will over-write it with the right data later) to
data that is possibly visible to other threads, then such an
"optimization" needs to just be disabled. It might help some
benchmark, and if you read the standard just the right way it might be
allowed - but that doesn't make it valid.

We already had situations like that, where compiler people thought it
would be ok (for example) to turns a narrow write into a wider
read-modify-write because it had already done the wider read for other
reasons.

Again, the original C standard "allows" that in theory, because the
original C standard doesn't take threading into account. In fact, the
alpha architecture made actively bad design decisions based on that
(incorrect) assumption.

It turns out that in that case, even non-kernel people rebelled, and
it's apparently thankfully not allowed in newer versions of the
standard, exactly because threading has become a thing. You can't
magically write back unrelated variables just because they might be
next-door neighbors and share a word.

So no, we do *not* in general just say that we want any random
optimizations. A compiler that turns a single write into something
else is almost certainly something that shouldn't be allowed near the
kernel.

We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
because of some theoretical "compiler is free to do garbage"
arguments. If such garbage happens, we need to fix the compiler, the
same way we already do with

  -fno-strict-aliasing
  -fno-delete-null-pointer-checks
  -fno-strict-overflow

because all those "optimizations" are just fundamentally unsafe and wrong.

I really wish the compiler would never take advantage of "I can prove
this is undefined behavior" kind of things when it comes to the kernel
(or any other projects I am involved with, for that matter). If you
can prove that, then you shouldn't decide to generate random code
without a big warning. But that's what those optimizations that we
disable effectively all do.

I'd love to have a flag that says "all undefined behavior is treated
as implementation-defined". There's a somewhat subtle - but very
important - difference there.

And that's what some hypothetical speculative write optimizations do
too. I do not believe they are valid for the kernel. If the code says

   if (a)
  global_var = 1
   else
  global_var = 0

then the compiler had better not turn that into

 global_var = 0
 if (a)
 global_var = 1

even if there isn't a volatile there. But yes, we've had compiler
writers that say "if you read the specs, that's ok".

No, it's not ok. Because reality trumps any weasel-spec-reading.

And happily, I don't think we've ever really seen a compiler that we
use that actually does the above kind of speculative write thing (but
doing it for your own local variables that can't be seen by other
threads of execution - go wild).

So in general, we very much expect the compiler to do sane code
generation, and not (for example) do store tearing on normal
word-sized things or add writes that weren't there originally etc.

And yes, reads are different from writes. Reads don't have the same
kind of "other threads of execution can see them" effects, so a
compiler turning a single read into multiple reads is much more
realistic and not the same kind of "we need to expect a certain kind
of sanity from the compiler" issue.

  Linus


Re: [PATCHv6 23/36] x86/vdso: Allocate timens vdso

2019-08-16 Thread Dmitry Safonov
Hi Andy, Thomas,

thank you very much for your time and the reviews, appreciate that.

On 8/16/19 9:10 PM, Thomas Gleixner wrote:
> On Fri, 16 Aug 2019, Andy Lutomirski wrote:
[..]
>> I'm unconvinced that any of this magic is wise.  I think you should make a
>> special timens vvar page that causes the normal fastpath to fail (using a
>> special vclock mode, a special seq value, or a special "last" value) and then
>> make the failure path detect that timens is in use and use the timens path.

I see. That's so clever, it haven't come on my mind.
Hmm, is that better because of the price of 5-byte NOP?
I'm a bit afraid to complicate that seq/vclock logic more..

So, what I'm driving at is would you change your mind if timens still
had boot-time dynamic patching but without introducing NOP?

We've got the point that you want to have no penalty at all for host
tasks [on RFC reply] by introducing `if` as trashing cache and branch
predictor, but I wasn't sure if NOP is also unacceptable.

At that moment we had a "plan B" with something that was half-wittedly
called "retcalls". The basic idea is that all that the timens brings
into vdso are calls clk_to_ns(), which are all placed in tails of
functions. So, if we could just memcpy() function returns in host vdso
over introduced time-ns tail calls - it would be a very same code that
lied before. There is a draft of those [1], that actually works on x86
on both mine and Andrei's machines.

Consulting with Andrei, I've decided that we better stick to
static_branchs as they are well known and have already backends for
other architectures. We probably mistakenly decided that a price of NOP
on scalar machines is negligible and would be acceptable.

Would those self-invented "retcalls" be something that could be reviewed
and potentially accepted in further iterations?

[1]
https://github.com/0x7f454c46/linux/commit/ab0eeb646f43#diff-c22e1e73e7367f371e1f12e3877ea12f

> My initial suggestion still stands. Do that at compile time. It really does
> not matter whether we have another 2 or 3 variants of vdso binaries.
> 
> Use it and be done with it. No special magic, just straight forward
> decisions to use a timens capable VDSO or not.

I believe that was something we did in version 1 of the patches set.
It doesn't sound like a rocket science to do, but it resulted in a
couple of ugly patches.

The post-attempt notes about downsides of doing it compile-time are:

1. There is additional .so for each vdso: 64-bit, ia32, x32. The same
for every architecture to-be supported. It adds rules in Makefiles. [2]
2. If we still intend to keep setns() working without exec(), function
entries on both host/namespace vdso should be aligned to each other [3].
That results in a patch to vdso2c to generate offsets [4, 5] and in
linker magic to align another vdso [6].
3. As unexpected consequence, we also need to align local functions on
vdso [7].

So, it might be all related to my lack of skills, but it seems to bring
some big amount of complexity into build process. And in my point of
view, major issue is that it would not scale easily when the day will
come and there will be a need to introduce another vdso.so. As I didn't
want to be the guy that happens to be remembered as "he wrote this
unmaintainable pile of garbage", I've taken dynamic patching approach
that is done once a boot time.

Regardless, we both with Andrei want to improve the patches set and make
it acceptable and easy to maintain in future. I hope, that our effort to
do that is visible through evolution of patches. And we're very glad
that we've constructive critics and such patient maintainers.
So, if I'm mistaken in those points about compile-time vdso(s), or you
have in mind a plan how-to avoid those, I'd appreciate and rework it to
that direction.

[2] lkml.kernel.org/r/20190206001107.16488-14-d...@arista.com
[3] lkml.kernel.org/r/20190206001107.16488-15-d...@arista.com
[4] lkml.kernel.org/r/20190206001107.16488-16-d...@arista.com
[5] lkml.kernel.org/r/20190206001107.16488-17-d...@arista.com
[6] lkml.kernel.org/r/20190206001107.16488-19-d...@arista.com
[7] lkml.kernel.org/r/20190206001107.16488-20-d...@arista.com

Thanks,
  Dmitry


Re: Kernel 5.2.8 - au0828 - Tuner Is Busy

2019-08-16 Thread Brad Love
Hi Nathan,


On 16/08/2019 13.18, Nathan Royce wrote:
> Right up front, I must say I do NOT have a Hauppauge tuner. I think
> it's like maybe Mygica/Geniatech:
> Bus 002 Device 004: ID 05e1:0400 Syntek Semiconductor Co., Ltd
>
> Whenever I update my kernel, I edit the
> ./drivers/media/usb/au0828/au0828-cards.c file adding an entry for my
> 0x400 device.
> I've been doing it for years and it's been working fine... until now...
>
> *
> Aug 16 12:07:20 computerName kernel: usb 2-2.3: Tuner is busy. Error -19
> <...18 more repeated entries...>
> Aug 16 12:07:20 computerName kernel: usb 2-2.3: Tuner is busy. Error -19
> Aug 16 12:07:10 computerName tvheadend[3276]: main: Log started
> *
> "w_scan" behaves the same way.
>
> *


I don't have a "woodbury", but I have a Hauppauge 950Q sitting around
and tested it on latest mainline kernel. w_scan is ok and streaming is
fine. There's no unexpected errors. The 950Q uses the same au0828 bridge
and au8522 demod as woodbury, but a different tuner. Your problem
wouldn't appear to be a general au0828 issue.

You might have to check out git bisect. That will be the quickest way to
get to the bottom, if you've got points A and B, and are
building/running your own kernel.

Cheers,

Brad






> $ modprobe au0828
> Aug 16 12:52:52 computerName kernel: videodev: Linux video capture
> interface: v2.00
> Aug 16 12:52:52 computerName kernel: au0828: au0828_init() Debugging is 
> enabled
> Aug 16 12:52:52 computerName kernel: au0828: au0828 driver loaded
> Aug 16 12:52:52 computerName kernel: au0828: au0828_usb_probe() vendor
> id 0x5e1 device id 0x400 ifnum:0
> Aug 16 12:52:52 computerName kernel: au0828: au0828_gpio_setup()
> Aug 16 12:52:52 computerName kernel: au0828: au0828_i2c_register()
> Aug 16 12:52:52 computerName kernel: au0828: i2c bus registered
> Aug 16 12:52:52 computerName kernel: au0828: au0828_card_setup()
> Aug 16 12:52:52 computerName kernel: tveeprom: Encountered bad packet
> header [20]. Corrupt or not a Hauppauge eeprom.
> Aug 16 12:52:52 computerName kernel: au0828: hauppauge_eeprom:
> warning: unknown hauppauge model #0
> Aug 16 12:52:52 computerName kernel: au0828: hauppauge_eeprom:
> hauppauge eeprom: model=0
> Aug 16 12:52:52 computerName kernel: au0828: au0828_analog_register
> called for intf#0!
> Aug 16 12:52:52 computerName kernel: au0828: au0828_dvb_register()
> Aug 16 12:52:52 computerName kernel: au8522 7-0047: creating new instance
> Aug 16 12:52:52 computerName kernel: tda18271 7-0060: creating new instance
> Aug 16 12:52:52 computerName kernel: tda18271: TDA18271HD/C2 detected @ 7-0060
> Aug 16 12:52:53 computerName kernel: au0828: dvb_register()
> Aug 16 12:52:53 computerName kernel: dvbdev: DVB: registering new
> adapter (au0828)
> Aug 16 12:52:53 computerName kernel: usb 2-2.3: DVB: registering
> adapter 0 frontend 0 (Auvitek AU8522 QAM/8VSB Frontend)...
> Aug 16 12:52:53 computerName kernel: dvbdev: dvb_create_media_entity:
> media entity 'Auvitek AU8522 QAM/8VSB Frontend' registered.
> Aug 16 12:52:53 computerName kernel: dvbdev: dvb_create_media_entity:
> media entity 'dvb-demux' registered.
> Aug 16 12:52:53 computerName kernel: au0828: Registered device AU0828
> [Hauppauge Woodbury]
> Aug 16 12:52:53 computerName kernel: usbcore: registered new interface
> driver au0828
> *
> The "eeprom" thing has never been an issue with regard to my tuner
> working. It still worked in spite of it.
>
> It's odd because:
> *
> $ lsmod | grep au0828
> au0828 86016  0
> tveeprom   28672  1 au0828
> dvb_core  176128  1 au0828
> v4l2_common20480  1 au0828
> videobuf2_vmalloc  20480  2 dvb_core,au0828
> videobuf2_v4l2 28672  1 au0828
> videobuf2_common   61440  3 videobuf2_v4l2,dvb_core,au0828
> videodev  253952  4
> v4l2_common,videobuf2_v4l2,videobuf2_common,au0828
> rc_core61440  1 au0828
> media  61440  6
> videodev,snd_usb_audio,videobuf2_v4l2,dvb_core,videobuf2_common,au0828
>
> $ ls -la /dev/dvb/adapter0/
> total 0
> drwxr-xr-x  2 root root 120 Aug 16 12:01 .
> drwxr-xr-x  3 root root  60 Aug 16 12:01 ..
> crw-rw+ 1 root video 212, 4 Aug 16 12:01 demux0
> crw-rw+ 1 root video 212, 5 Aug 16 12:01 dvr0
> crw-rw+ 1 root video 212, 3 Aug 16 12:01 frontend0
> crw-rw+ 1 root video 212, 7 Aug 16 12:01 net0
> *
>
> The previous kernel version I was on that worked was 5.1.15.
> I just reverted back to the previous version and it's working again.
> I don't know what broke and where, between the versions.
>
> I saw https://lkml.org/lkml/2019/1/21/1020 but this is back in January
> so I don't know if something was more recently applied to au0828 that
> makes use of the API.
> "lsof" didn't show anything related to "/dev/dvb" being used.
>
> Oh neat! Someone posted a neat git feature which I tried and I get:
> *
> $ git log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset
> %s 

[PATCH][next] bus: moxtet: fix unsigned comparison to less than zero

2019-08-16 Thread Colin King
From: Colin Ian King 

Currently the size_t variable res is being checked for
an error failure however the unsigned variable is never
less than zero so this test is always false. Fix this by
making variable res ssize_t

Addresses-Coverity: ("Unsigned compared against 0")
Fixes: 5bc7f990cd98 ("bus: Add support for Moxtet bus")
Signed-off-by: Colin Ian King 
---
 drivers/bus/moxtet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bus/moxtet.c b/drivers/bus/moxtet.c
index 1ee4570e7e17..288a9e4c6c7b 100644
--- a/drivers/bus/moxtet.c
+++ b/drivers/bus/moxtet.c
@@ -514,7 +514,7 @@ static ssize_t output_write(struct file *file, const char 
__user *buf,
struct moxtet *moxtet = file->private_data;
u8 bin[TURRIS_MOX_MAX_MODULES];
u8 hex[sizeof(bin) * 2 + 1];
-   size_t res;
+   ssize_t res;
loff_t dummy = 0;
int err, i;
 
-- 
2.20.1



Re: [PATCH v6 4/4] net: phy: realtek: Add LED configuration support for RTL8211E

2019-08-16 Thread Matthias Kaehlcke
On Fri, Aug 16, 2019 at 03:12:47PM -0700, Florian Fainelli wrote:
> On 8/16/19 2:27 PM, Matthias Kaehlcke wrote:
> > On Fri, Aug 16, 2019 at 10:13:42PM +0200, Pavel Machek wrote:
> >> On Tue 2019-08-13 12:11:47, Matthias Kaehlcke wrote:
> >>> Add a .config_led hook which is called by the PHY core when
> >>> configuration data for a PHY LED is available. Each LED can be
> >>> configured to be solid 'off, solid 'on' for certain (or all)
> >>> link speeds or to blink on RX/TX activity.
> >>>
> >>> Signed-off-by: Matthias Kaehlcke 
> >>
> >> THis really needs to go through the LED subsystem,
> > 
> > Sorry, I used what get_maintainers.pl threw at me, I should have
> > manually cc-ed the LED list.
> > 
> >> and use the same userland interfaces as the rest of the system.
> > 
> > With the PHY maintainers we discussed to define a binding that is
> > compatible with that of the LED one, to have the option to integrate
> > it with the LED subsystem later. The integration itself is beyond the
> > scope of this patchset.
> > 
> > The PHY LED configuration is a low priority for the project I'm
> > working on. I wanted to make an attempt to upstream it and spent
> > already significantly more time on it than planned, if integration
> > with the LED framework now is a requirement please consider this
> > series abandonded.
> 
> While I have an appreciation for how hard it can be to work in a
> corporate environment while doing upstream first and working with
> virtually unbounded goals (in time or scope) due to maintainers and
> reviewers, that kind of statement can hinder your ability to establish
> trust with peers in the community as it can be read as take it or leave it.

I'm really just stating the reality here. We strongly prefer landing
patches upstream over doing custom hacks, and depending on the
priority of a given feature/sub-system and impact on schedule we can
allocate more time on it or less. In some cases/at some point a
downstream patch is just good enough.

I definitely don't intend to get a patchset landed if it isn't deemed
ready or suitable at all. In this case I just can't justify to spend
significantly more time on it. IMO it is better to be clear on this,
not to pressure maintainers to take a patch, but so people know what
to expect. This information can also help if someone comes across this
patchset in the future and wonders about its status.

btw, a birdie told me there will be a talk next week at ELC in San
Diego on how Chrome OS works with upstream, discussing pros and
cons for both the project and upstream. For those who are intersted
in the topic but can't make it to the conference, the slides are
already online and IMO have good information:
https://static.sched.com/hosted_files/ossna19/9c/ELC19_ChromeOSAndUpstream.pdf

Cheers

Matthias


Re: [PATCH v6 4/4] net: phy: realtek: Add LED configuration support for RTL8211E

2019-08-16 Thread Doug Anderson
Hi,

On Fri, Aug 16, 2019 at 3:12 PM Florian Fainelli  wrote:
>
> On 8/16/19 2:27 PM, Matthias Kaehlcke wrote:
> > On Fri, Aug 16, 2019 at 10:13:42PM +0200, Pavel Machek wrote:
> >> On Tue 2019-08-13 12:11:47, Matthias Kaehlcke wrote:
> >>> Add a .config_led hook which is called by the PHY core when
> >>> configuration data for a PHY LED is available. Each LED can be
> >>> configured to be solid 'off, solid 'on' for certain (or all)
> >>> link speeds or to blink on RX/TX activity.
> >>>
> >>> Signed-off-by: Matthias Kaehlcke 
> >>
> >> THis really needs to go through the LED subsystem,
> >
> > Sorry, I used what get_maintainers.pl threw at me, I should have
> > manually cc-ed the LED list.
> >
> >> and use the same userland interfaces as the rest of the system.
> >
> > With the PHY maintainers we discussed to define a binding that is
> > compatible with that of the LED one, to have the option to integrate
> > it with the LED subsystem later. The integration itself is beyond the
> > scope of this patchset.
> >
> > The PHY LED configuration is a low priority for the project I'm
> > working on. I wanted to make an attempt to upstream it and spent
> > already significantly more time on it than planned, if integration
> > with the LED framework now is a requirement please consider this
> > series abandonded.
>
> While I have an appreciation for how hard it can be to work in a
> corporate environment while doing upstream first and working with
> virtually unbounded goals (in time or scope) due to maintainers and
> reviewers, that kind of statement can hinder your ability to establish
> trust with peers in the community as it can be read as take it or leave it.

You think so?  I feel like Matthias is simply expressing the reality
of the situation here and I'd rather see a statement like this posted
than the series just silently dropped.  Communication is good.

In general on Chrome OS we don't spent lots of time tweaking with
Ethernet and even less time tweaking with Ethernet on ARM boards where
you might need a binding like this, so it's pretty hard to justify up
the management chain spending massive amounts of resources on it.  In
this case we have two existing ARM boards which we're trying to uprev
from 3.14 to 4.19 which were tweaking the Ethernet driver in some
downstream code.  We thought it would be nice to try to come up with a
solution that could land upstream, which is usually what we try to do
in these cases.

Normally if there is some major architecture needed that can't fit in
the scope of a project, we would do a downstream solution for the
project and then fork off the task (maybe by a different Engineer or a
contractor) to get a solution that can land upstream.  ...but in this
case it seems hard to justify because it's unlikely we would need it
again anytime remotely soon.

So I guess the alternatives to what Matthias did would have been:

A) Don't even try to upstream.  Seems worse.  At least this way
there's something a future person can start from and the discussion is
rolling.

B) Keep spending tons of time on something even though management
doesn't want him to.  Seems worse.

C) Spend his nights and weekends working on this.  Seems worse.

D) Silently stop working on it without saying "I'm going to stop".  Seems worse.

...unless you have a brilliant "E)" I think what Matthias did here is
exactly right.

BTW: I'm giving a talk on this topic next week at ELC [1].  If you're
going to be there feel free to attend.  ...or just read the slides if
not.


> The LED subsystem integration can definitively come in later from my 2
> cents perspective and this patch series as it stands is valuable and
> avoids inventing new bindings.

If something like this series can land and someone can later try to
make the situation better then I think that would be awesome.  I don't
think Matthias is saying "I won't spin" or "I won't take feedback".
He's just expressing that he can't keep working on this indefinitely.



[1] 
https://ossna19.sched.com/event/PVSV/how-chrome-os-works-with-upstream-linux-douglas-anderson-google

-Doug


Re: [PATCH] PM / wakeup: Register wakeup class kobj after device is added

2019-08-16 Thread Tri Vo
On Fri, Aug 16, 2019 at 2:46 PM Stephen Boyd  wrote:
>
> Quoting Tri Vo (2019-08-16 14:27:35)
> > On Fri, Aug 16, 2019 at 7:56 AM Stephen Boyd  wrote:
> > > diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
> > > index 1b9c281cbe41..27ee00f50bd7 100644
> > > --- a/drivers/base/power/sysfs.c
> > > +++ b/drivers/base/power/sysfs.c
> > > @@ -5,6 +5,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include "power.h"
> > > @@ -661,14 +662,21 @@ int dpm_sysfs_add(struct device *dev)
> > > if (rc)
> > > goto err_runtime;
> > > }
> > > +   if (dev->power.wakeup) {
> >
> > This conditional checks for the situation when wakeup source
> > registration have been previously attempted, but failed at
> > wakeup_source_sysfs_add(). My concern is that it's not easy to
> > understand what this check does without knowing exactly what
> > device_wakeup_enable() does to dev->power.wakeup before we reach this
> > point.
>
> Oh, actually this is wrong. It should be a check for
> dev->power.wakeup->dev being non-NULL. That's the variable that's set by
> wakeup_source_sysfs_add() upon success. So I should make it:
>
> if (dev->power.wakeup && !dev->power.wakeup->dev)

Oh ok, this makes more sense now :)
>
> And there's the problem that CONFIG_PM_SLEEP could be unset. Let me fix
> it up with a new inline function like device_has_wakeup_dev().
>
> >
> > > +   rc = wakeup_source_sysfs_add(dev, dev->power.wakeup);
> > > +   if (rc)
> > > +   goto err_wakeup;
> > > +   }
> > > if (dev->power.set_latency_tolerance) {
> > > rc = sysfs_merge_group(>kobj,
> > >
> > > _qos_latency_tolerance_attr_group);
> > > if (rc)
> > > -   goto err_wakeup;
> > > +   goto err_wakeup_source;
> > > }
> > > return 0;
> > >
> > > + err_wakeup_source:
> > > +   wakeup_source_sysfs_remove(dev->power.wakeup);
> > >   err_wakeup:
> > > sysfs_unmerge_group(>kobj, _wakeup_attr_group);
> > >   err_runtime:
> > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > > index f7925820b5ca..5817b51d2b15 100644
> > > --- a/drivers/base/power/wakeup.c
> > > +++ b/drivers/base/power/wakeup.c
> > > @@ -220,10 +220,12 @@ struct wakeup_source *wakeup_source_register(struct 
> > > device *dev,
> > >
> > > ws = wakeup_source_create(name);
> > > if (ws) {
> > > -   ret = wakeup_source_sysfs_add(dev, ws);
> > > -   if (ret) {
> > > -   wakeup_source_free(ws);
> > > -   return NULL;
> > > +   if (!dev || device_is_registered(dev)) {
> >
> > Is there a possible race condition here? If dev->power.wakeup check in
> > dpm_sysfs_add() is done at the same time as device_is_registered(dev)
> > check here, then wakeup_source_sysfs_add() won't ever be called?
>
> The same race exists for device_set_wakeup_capable() so I didn't bother
> to try to avoid it. I suppose wakeup_source_sysfs_add() could run
> completely, allocate the device and set the name, etc., but not call
> device_add() and then we can set ws->dev and call device_add() under a
> mutex so that we keep a very small window where the wakeup class is
> published to sysfs. Or just throw a big mutex around the whole wakeup
> class creation path so that there isn't a chance of a race. But really,
> is anyone going to call device_set_wakeup_*() on a device that is also
> being added to the system? Seems unlikely.

True. I don't have a strong opinion.
>
> >
> > > +   ret = wakeup_source_sysfs_add(dev, ws);
> > > +   if (ret) {
> > > +   wakeup_source_free(ws);


Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()

2019-08-16 Thread John Hubbard
On 8/16/19 2:59 PM, Ira Weiny wrote:
> On Fri, Aug 16, 2019 at 11:50:09AM -0700, John Hubbard wrote:
...
>>> John could you send a formal patch using vaddr_pin* and I'll add it to the
>>> tree?
>>>
>>
>> Yes...hints about which struct file to use here are very welcome, btw. This 
>> part
>> of mm is fairly new to me.
> 
> I'm still working out the final semantics of vaddr_pin*.  But right now you
> don't need a vaddr_pin if you don't specify FOLL_LONGTERM.
> 

ah OK.

> Since case 1, this case, does not need FOLL_LONGTERM I think it is safe to
> simply pass NULL here.
> 
> OTOH we could just track this against the mm_struct.  But I don't think we 
> need
> to because this pin should be transient.
> 

Thanks for looking at that, I'm definitely in learning mode here.

> And this is why I keep leaning toward _not_ putting these flags in the
> vaddr_pin*() calls.  I know this is what I did but I think I'm wrong.  It 
> should
> be the caller specifying what they want and the vaddr_pin*() calls check that
> what they are asking for is correct.
> 

Yes. I think we're nearly done finding the right balance of wrapper calls and
FOLL_* flags. I've seen Jan and others asking that the call sites do *not*
set the flags, but we also know that FOLL_PIN and FOLL_LONGTERM need to vary
independently.

That means either:

a) another trivial wrapper calls, on top of vaddr_pin_*(), for each supported 
combination of FOLL_PIN and FOLL_LONGTERM, or

b) just setting FOLL_PIN and FOLL_LONGTERM at each callsite.

I think either way is easy to grep for, so it's hard to get too excited
(fortunately) about which one to pick. Let's start simple with (b) and it's 
easy to convert later if someone wants that.

Meanwhile, we do need to pull the flag setting out of vaddr_pin_pages().

So I will post these small patches for your mmotm-rdmafsdax-b0-v4 branch,
shortly:

1) Add FOLL_PIN 

   --also I guess it's time to add comments documenting FOLL_PIN and
FOLL_LONGTERM use, stealing Jan's and others' wording for the 4 cases,
from earlier. :)

2) Add vaddr_pin_user_pages_remote(), which will not set FOLL_PIN or 
FOLL_LONGTERM
itself. And add the caller, which will.

thanks,
-- 
John Hubbard
NVIDIA


[PATCH v3 bpf-next 3/4] tracing/probe: Sync perf_event.h to tools

2019-08-16 Thread Daniel Xu
Signed-off-by: Daniel Xu 
---
 tools/include/uapi/linux/perf_event.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..8783d29a807a 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -447,6 +447,28 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
+/*
+ * Structure used by below PERF_EVENT_IOC_QUERY_PROBE command
+ * to query information about the probe attached to the perf
+ * event. Currently only supports [uk]probes.
+ */
+struct perf_event_query_probe {
+   /*
+* Size of structure for forward/backward compatibility
+*/
+   __u64   size;
+   /*
+* Set by the kernel to indicate number of times this probe
+* was temporarily disabled
+*/
+   __u64   nmissed;
+   /*
+* Set by the kernel to indicate number of times this probe
+* was hit
+*/
+   __u64   nhit;
+};
+
 /*
  * Ioctls that can be done on a perf event fd:
  */
@@ -462,6 +484,7 @@ struct perf_event_query_bpf {
 #define PERF_EVENT_IOC_PAUSE_OUTPUT_IOW('$', 9, __u32)
 #define PERF_EVENT_IOC_QUERY_BPF   _IOWR('$', 10, struct 
perf_event_query_bpf *)
 #define PERF_EVENT_IOC_MODIFY_ATTRIBUTES   _IOW('$', 11, struct 
perf_event_attr *)
+#define PERF_EVENT_IOC_QUERY_PROBE _IOR('$', 12, struct 
perf_event_query_probe *)
 
 enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
-- 
2.20.1



[PATCH v3 bpf-next 1/4] tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE ioctl

2019-08-16 Thread Daniel Xu
It's useful to know [uk]probe's nmissed and nhit stats. For example with
tracing tools, it's important to know when events may have been lost.
debugfs currently exposes a control file to get this information, but
it is not compatible with probes registered with the perf API.

While bpf programs may be able to manually count nhit, there is no way
to gather nmissed. In other words, it is currently not possible to
retrieve information about FD-based probes.

This patch adds a new ioctl that lets users query nmissed (as well as
nhit for completeness). We currently only add support for [uk]probes
but leave the possibility open for other probes like tracepoint.

Signed-off-by: Daniel Xu 
---
 include/linux/trace_events.h| 12 
 include/uapi/linux/perf_event.h | 23 +++
 kernel/events/core.c| 20 
 kernel/trace/trace_kprobe.c | 24 
 kernel/trace/trace_uprobe.c | 24 
 5 files changed, 103 insertions(+)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 5150436783e8..61558f19696a 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -586,6 +586,12 @@ extern int bpf_get_kprobe_info(const struct perf_event 
*event,
   u32 *fd_type, const char **symbol,
   u64 *probe_offset, u64 *probe_addr,
   bool perf_type_tracepoint);
+extern int perf_kprobe_event_query(struct perf_event *event, void __user 
*info);
+#else
+int perf_kprobe_event_query(struct perf_event *event, void __user *info)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
 extern int  perf_uprobe_init(struct perf_event *event,
@@ -594,6 +600,12 @@ extern void perf_uprobe_destroy(struct perf_event *event);
 extern int bpf_get_uprobe_info(const struct perf_event *event,
   u32 *fd_type, const char **filename,
   u64 *probe_offset, bool perf_type_tracepoint);
+extern int perf_uprobe_event_query(struct perf_event *event, void __user 
*info);
+#else
+int perf_uprobe_event_query(struct perf_event *event, void __user *info)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
 char *filter_str);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..8783d29a807a 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -447,6 +447,28 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
+/*
+ * Structure used by below PERF_EVENT_IOC_QUERY_PROBE command
+ * to query information about the probe attached to the perf
+ * event. Currently only supports [uk]probes.
+ */
+struct perf_event_query_probe {
+   /*
+* Size of structure for forward/backward compatibility
+*/
+   __u64   size;
+   /*
+* Set by the kernel to indicate number of times this probe
+* was temporarily disabled
+*/
+   __u64   nmissed;
+   /*
+* Set by the kernel to indicate number of times this probe
+* was hit
+*/
+   __u64   nhit;
+};
+
 /*
  * Ioctls that can be done on a perf event fd:
  */
@@ -462,6 +484,7 @@ struct perf_event_query_bpf {
 #define PERF_EVENT_IOC_PAUSE_OUTPUT_IOW('$', 9, __u32)
 #define PERF_EVENT_IOC_QUERY_BPF   _IOWR('$', 10, struct 
perf_event_query_bpf *)
 #define PERF_EVENT_IOC_MODIFY_ATTRIBUTES   _IOW('$', 11, struct 
perf_event_attr *)
+#define PERF_EVENT_IOC_QUERY_PROBE _IOR('$', 12, struct 
perf_event_query_probe *)
 
 enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0463c1151bae..ed33d50511a3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5060,6 +5060,8 @@ static int perf_event_set_filter(struct perf_event 
*event, void __user *arg);
 static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
  struct perf_event_attr *attr);
+static int perf_probe_event_query(struct perf_event *event,
+   void __user *info);
 
 static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned 
long arg)
 {
@@ -5143,6 +5145,10 @@ static long _perf_ioctl(struct perf_event *event, 
unsigned int cmd, unsigned lon
 
return perf_event_modify_attr(event,  _attr);
}
+#if defined(CONFIG_KPROBE_EVENTS) || defined(CONFIG_UPROBE_EVENTS)
+   case PERF_EVENT_IOC_QUERY_PROBE:
+   return perf_probe_event_query(event, (void __user *)arg);
+#endif
default:
return -ENOTTY;
}
@@ -8833,6 +8839,20 @@ static inline void perf_tp_register(void)
 

[PATCH v3 bpf-next 2/4] libbpf: Add helpers to extract perf fd from bpf_link

2019-08-16 Thread Daniel Xu
It is sometimes necessary to perform ioctl's on the underlying perf fd.
There is not currently a way to extract the fd given a bpf_link, so add a
a pair of casting and getting helpers.

The casting and getting helpers are nice because they let us define
broad categories of links that makes it clear to users what they can
expect to extract from what type of link.

Acked-by: Song Liu 
Signed-off-by: Daniel Xu 
---
 tools/lib/bpf/libbpf.c   | 21 +
 tools/lib/bpf/libbpf.h   | 13 +
 tools/lib/bpf/libbpf.map |  4 
 3 files changed, 38 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2233f919dd88..41588e13be2b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4876,6 +4876,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr 
*attr,
 
 struct bpf_link {
int (*destroy)(struct bpf_link *link);
+   enum bpf_link_type type;
 };
 
 int bpf_link__destroy(struct bpf_link *link)
@@ -4909,6 +4910,24 @@ static int bpf_link__destroy_perf_event(struct bpf_link 
*link)
return err;
 }
 
+const struct bpf_link_fd *bpf_link__as_fd(const struct bpf_link *link)
+{
+   if (link->type != LIBBPF_LINK_FD)
+   return NULL;
+
+   return (struct bpf_link_fd *)link;
+}
+
+enum bpf_link_type bpf_link__type(const struct bpf_link *link)
+{
+   return link->type;
+}
+
+int bpf_link_fd__fd(const struct bpf_link_fd *link)
+{
+   return link->fd;
+}
+
 struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog,
int pfd)
 {
@@ -4932,6 +4951,7 @@ struct bpf_link *bpf_program__attach_perf_event(struct 
bpf_program *prog,
if (!link)
return ERR_PTR(-ENOMEM);
link->link.destroy = _link__destroy_perf_event;
+   link->link.type = LIBBPF_LINK_FD;
link->fd = pfd;
 
if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
@@ -5225,6 +5245,7 @@ struct bpf_link 
*bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
link = malloc(sizeof(*link));
if (!link)
return ERR_PTR(-ENOMEM);
+   link->link.type = LIBBPF_LINK_FD;
link->link.destroy = _link__destroy_fd;
 
pfd = bpf_raw_tracepoint_open(tp_name, prog_fd);
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e8f70977d137..2ddef5315ff9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -166,7 +166,20 @@ LIBBPF_API int bpf_program__pin(struct bpf_program *prog, 
const char *path);
 LIBBPF_API int bpf_program__unpin(struct bpf_program *prog, const char *path);
 LIBBPF_API void bpf_program__unload(struct bpf_program *prog);
 
+enum bpf_link_type {
+   LIBBPF_LINK_FD,
+};
+
 struct bpf_link;
+struct bpf_link_fd;
+
+/* casting APIs */
+LIBBPF_API const struct bpf_link_fd *
+bpf_link__as_fd(const struct bpf_link *link);
+
+/* getters APIs */
+LIBBPF_API enum bpf_link_type bpf_link__type(const struct bpf_link *link);
+LIBBPF_API int bpf_link_fd__fd(const struct bpf_link_fd *link);
 
 LIBBPF_API int bpf_link__destroy(struct bpf_link *link);
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 4e72df8e98ba..ee9945177100 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -186,4 +186,8 @@ LIBBPF_0.0.4 {
 } LIBBPF_0.0.3;
 
 LIBBPF_0.0.5 {
+   global:
+   bpf_link__type;
+   bpf_link__as_fd;
+   bpf_link_fd__fd;
 } LIBBPF_0.0.4;
-- 
2.20.1



[PATCH v3 bpf-next 0/4] tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE

2019-08-16 Thread Daniel Xu
It's useful to know [uk]probe's nmissed and nhit stats. For example with
tracing tools, it's important to know when events may have been lost.
debugfs currently exposes a control file to get this information, but
it is not compatible with probes registered with the perf API.

While bpf programs may be able to manually count nhit, there is no way
to gather nmissed. In other words, it is currently not possible to
retrieve information about FD-based probes.

This patch adds a new ioctl that lets users query nmissed (as well as
nhit for completeness). We currently only add support for [uk]probes
but leave the possibility open for other probes like tracepoint.

v2 -> v3:
- Introduce bpf_link_type and associated getter to track underlying link
  types
- Add back size field in perf_event_query_probe for forward/backwards
  compat
- Remove NULL checks, fix typos

v1 -> v2:
- More descriptive cover letter
- Make API more generic and support uprobes as well
- Use casters/getters for libbpf instead of single getter
- Fix typos
- Remove size field from ioctl struct
- Split out libbpf.h sync to tools dir to separate commit

Daniel Xu (4):
  tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE ioctl
  libbpf: Add helpers to extract perf fd from bpf_link
  tracing/probe: Sync perf_event.h to tools
  tracing/probe: Add self test for PERF_EVENT_IOC_QUERY_PROBE

 include/linux/trace_events.h  |  12 ++
 include/uapi/linux/perf_event.h   |  23 
 kernel/events/core.c  |  20 
 kernel/trace/trace_kprobe.c   |  24 
 kernel/trace/trace_uprobe.c   |  24 
 tools/include/uapi/linux/perf_event.h |  23 
 tools/lib/bpf/libbpf.c|  21 
 tools/lib/bpf/libbpf.h|  13 +++
 tools/lib/bpf/libbpf.map  |   4 +
 .../selftests/bpf/prog_tests/attach_probe.c   | 106 ++
 10 files changed, 270 insertions(+)

-- 
2.20.1



[PATCH v3 bpf-next 4/4] tracing/probe: Add self test for PERF_EVENT_IOC_QUERY_PROBE

2019-08-16 Thread Daniel Xu
Acked-by: Andrii Nakryiko 
Signed-off-by: Daniel Xu 
---
 .../selftests/bpf/prog_tests/attach_probe.c   | 106 ++
 1 file changed, 106 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c 
b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index 5ecc267d98b0..c14db7557881 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -27,17 +27,27 @@ void test_attach_probe(void)
const char *kretprobe_name = "kretprobe/sys_nanosleep";
const char *uprobe_name = "uprobe/trigger_func";
const char *uretprobe_name = "uretprobe/trigger_func";
+   struct perf_event_query_probe kprobe_query = {};
+   struct perf_event_query_probe kretprobe_query = {};
+   struct perf_event_query_probe uprobe_query = {};
+   struct perf_event_query_probe uretprobe_query = {};
const int kprobe_idx = 0, kretprobe_idx = 1;
const int uprobe_idx = 2, uretprobe_idx = 3;
const char *file = "./test_attach_probe.o";
struct bpf_program *kprobe_prog, *kretprobe_prog;
struct bpf_program *uprobe_prog, *uretprobe_prog;
struct bpf_object *obj;
+   const struct bpf_link_fd *kprobe_fd_link;
+   const struct bpf_link_fd *kretprobe_fd_link;
+   const struct bpf_link_fd *uprobe_fd_link;
+   const struct bpf_link_fd *uretprobe_fd_link;
int err, prog_fd, duration = 0, res;
struct bpf_link *kprobe_link = NULL;
struct bpf_link *kretprobe_link = NULL;
struct bpf_link *uprobe_link = NULL;
struct bpf_link *uretprobe_link = NULL;
+   int kprobe_fd, kretprobe_fd;
+   int uprobe_fd, uretprobe_fd;
int results_map_fd;
size_t uprobe_offset;
ssize_t base_addr;
@@ -116,6 +126,54 @@ void test_attach_probe(void)
/* trigger & validate kprobe && kretprobe */
usleep(1);
 
+   kprobe_fd_link = bpf_link__as_fd(kprobe_link);
+   if (CHECK(!kprobe_fd_link, "kprobe_link_as_fd",
+ "failed to cast link to fd link\n"))
+   goto cleanup;
+
+   kprobe_fd = bpf_link_fd__fd(kprobe_fd_link);
+   if (CHECK(kprobe_fd < 0, "kprobe_get_perf_fd",
+   "failed to get perf fd from kprobe link\n"))
+   goto cleanup;
+
+   kretprobe_fd_link = bpf_link__as_fd(kretprobe_link);
+   if (CHECK(!kretprobe_fd_link, "kretprobe_link_as_fd",
+ "failed to cast link to fd link\n"))
+   goto cleanup;
+
+   kretprobe_fd = bpf_link_fd__fd(kretprobe_fd_link);
+   if (CHECK(kretprobe_fd < 0, "kretprobe_get_perf_fd",
+   "failed to get perf fd from kretprobe link\n"))
+   goto cleanup;
+
+   kprobe_query.size = sizeof(kprobe_query);
+   err = ioctl(kprobe_fd, PERF_EVENT_IOC_QUERY_PROBE, _query);
+   if (CHECK(err, "get_kprobe_ioctl",
+ "failed to issue kprobe query ioctl\n"))
+   goto cleanup;
+   if (CHECK(kprobe_query.nmissed > 0, "get_kprobe_ioctl",
+ "read incorrect nmissed from kprobe_ioctl: %llu\n",
+ kprobe_query.nmissed))
+   goto cleanup;
+   if (CHECK(kprobe_query.nhit == 0, "get_kprobe_ioctl",
+ "read incorrect nhit from kprobe_ioctl: %llu\n",
+ kprobe_query.nhit))
+   goto cleanup;
+
+   kretprobe_query.size = sizeof(kretprobe_query);
+   err = ioctl(kretprobe_fd, PERF_EVENT_IOC_QUERY_PROBE, _query);
+   if (CHECK(err, "get_kretprobe_ioctl",
+ "failed to issue kretprobe query ioctl\n"))
+   goto cleanup;
+   if (CHECK(kretprobe_query.nmissed > 0, "get_kretprobe_ioctl",
+ "read incorrect nmissed from kretprobe_ioctl: %llu\n",
+ kretprobe_query.nmissed))
+   goto cleanup;
+   if (CHECK(kretprobe_query.nhit <= 0, "get_kretprobe_ioctl",
+ "read incorrect nhit from kretprobe_ioctl: %llu\n",
+ kretprobe_query.nhit))
+   goto cleanup;
+
err = bpf_map_lookup_elem(results_map_fd, _idx, );
if (CHECK(err, "get_kprobe_res",
  "failed to get kprobe res: %d\n", err))
@@ -135,6 +193,54 @@ void test_attach_probe(void)
/* trigger & validate uprobe & uretprobe */
get_base_addr();
 
+   uprobe_fd_link = bpf_link__as_fd(uprobe_link);
+   if (CHECK(!uprobe_fd_link, "uprobe_link_as_fd",
+ "failed to cast link to fd link\n"))
+   goto cleanup;
+
+   uprobe_fd = bpf_link_fd__fd(uprobe_fd_link);
+   if (CHECK(uprobe_fd < 0, "uprobe_get_perf_fd",
+   "failed to get perf fd from uprobe link\n"))
+   goto cleanup;
+
+   uretprobe_fd_link = bpf_link__as_fd(uretprobe_link);
+   if (CHECK(!uretprobe_fd_link, "uretprobe_link_as_fd",
+ "failed to cast link to fd link\n"))
+   goto cleanup;
+

re: clk: actions: Don't reference clk_init_data after registration [bug report]

2019-08-16 Thread Colin Ian King
Hi,

Static analysis with Coverity Scan on linux-next has found an issue with
the following commit:

commit 20cac6d02815edcc0b1c87bc3e8858b3d1fda3fa
Author: Stephen Boyd 
Date:   Wed Jul 31 12:35:09 2019 -0700

clk: actions: Don't reference clk_init_data after registration

The analysis is as follows:

7int i, ret;

  1. var_decl: Declaring variable hw without initializer.

68struct clk_hw *hw;
69

  2. Condition i < hw_clks->num, taking true branch.

70for (i = 0; i < hw_clks->num; i++) {

  CID 85252 (#1 of 1): Uninitialized pointer read (UNINIT)
  3. uninit_use:  Using uninitialized value hw.

71const char *name = hw->init->name;
72
73hw = hw_clks->hws[i];

hw is being dereferenced on line 71 however it is not assigned until
line 73.

Did you instead intent this to be:

const char *name;

hw = hw_clks->hws[i];
name = hw->init->name;

Colin


Re: linux-next: manual merge of the gpio-brgl tree with the gpio tree

2019-08-16 Thread Linus Walleij
On Fri, Aug 16, 2019 at 8:56 AM Stephen Rothwell  wrote:

> Today's linux-next merge of the gpio-brgl tree got a conflict in:
>
>   include/linux/gpio/driver.h
>
> between commit:
>
>   fdd61a013a24 ("gpio: Add support for hierarchical IRQ domains")
>
> from the gpio tree and commit:
>
>   9091373ab7ea ("gpio: remove less important #ifdef around declarations")
>
> from the gpio-brgl tree.

OK Bartosz can you send me a pull request for your tree so we can
sync them up?

Yours,
Linus Walleij


Re: [PATCH] gpio: of: fix Freescale SPI CS quirk handling

2019-08-16 Thread Linus Walleij
On Fri, Aug 16, 2019 at 6:50 PM Andreas Kemnade  wrote:

> On the gta04 we see:
> spi_gpio: probe of spi_lcd failed with error -2
>
> The quirk introduced in
> commit e3023bf80639 ("gpio: of: Handle the Freescale SPI CS")
> can also be triggered by a temporary -EPROBE_DEFER and
> so "convert" it to a hard -ENOENT.
>
> Disable that conversion by checking for -EPROBE_DEFER.
>
> Fixes: e3023bf80639 ("gpio: of: Handle the Freescale SPI CS")
> Suggested-by: H. Nikolaus Schaller 
> Signed-off-by: Andreas Kemnade 

Good catch! Patch applied for fixes.

Yours,
Linus Walleij


Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

2019-08-16 Thread Valentin Schneider
On 16/08/2019 21:57, Joel Fernandes wrote:
>> Can we finally put a foot down and tell compiler and standard committee
>> people to stop this insanity?
> 
> Sure, or could the compilers provide flags which prevent such optimization
> similar to -O* flags?
> 

How would you differentiate optimizations you want from those you don't with
just a flag? There's a reason we use volatile casts instead of declaring
everything volatile: we actually *want* those optimizations. It just so
happens that we don't want them *in some places*, and we have tools to tag
them as such.

The alternative is having a compiler that can magically correlate e.g. locked
writes with lock-free reads and properly handle them, but I don't think
there's a foolproof way of doing that.

> thanks,
> 
>  - Joel
> 


Re: linux-next: build failure after merge of the gpio tree

2019-08-16 Thread Linus Walleij
On Fri, Aug 16, 2019 at 1:38 PM Stephen Rothwell  wrote:

> After merging the gpio tree, today's linux-next build (powerpc
> ppc44x_defconfig) failed like this:

Oops!

> I have applied the following patch for today:
>
> From: Stephen Rothwell 
> Date: Fri, 16 Aug 2019 21:29:30 +1000
> Subject: [PATCH] gpio: stubs in headers should be inline
>
> Fixes: fdd61a013a24 ("gpio: Add support for hierarchical IRQ domains")
> Signed-off-by: Stephen Rothwell 

I just nicked that and applied to the GPIO tree to fix this.

Thanks Stephen!

Linus Walleij


NEED YOUR ASSIST PLEASE

2019-08-16 Thread Mrs Ann Johnson
Hello
   My name Mrs. Ann Johnson from UK,I am a dying woman who had decided to 
donate what I have left to you. I am 59 years old and was diagnosed of ovarian 
cancer about 2 years ago immediately after the death of my husband, who had 
left me with everything he worked for. I have decided to donate/will 
£9,500,000.00 pounds to you to help the motherless, less privileged and also 
for the assistance of the widows ,rather than allow my relatives to use my 
husband hard earned funds ungodly as i will be going in for an operation do not 
know if i will survived it. Upon receipt of your positive response along with 
your full contact details will notify my bank to presenting you as my next of 
kin/beneficiary to facilitate the funds disbursement to you.

God Bless you
Mrs. Ann Johnson


Re: regression in ath10k dma allocation

2019-08-16 Thread Nicolin Chen
Hi Tobias

On Fri, Aug 16, 2019 at 10:16:45PM +0200, Tobias Klausmann wrote:
> > do you have CONFIG_DMA_CMA set in your config?  If not please make sure
> > you have this commit in your testing tree, and if the problem still
> > persists it would be a little odd and we'd have to dig deeper:
> > 
> > commit dd3dcede9fa0a0b661ac1f24843f4a1b1317fdb6
> > Author: Nicolin Chen 
> > Date:   Wed May 29 17:54:25 2019 -0700
> > 
> >  dma-contiguous: fix !CONFIG_DMA_CMA version of dma_{alloc, 
> > free}_contiguous()

> yes CONFIG_DMA_CMA is set (=y, see attached config), the commit you mention
> above is included, if you have any hints how to go forward, please let me
> know!

For CONFIG_DMA_CMA=y, by judging the log with error code -12, I
feel this one should work for you. Would you please check if it
is included or try it out otherwise?

dma-contiguous: do not overwrite align in dma_alloc_contiguous()
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=c6622a425acd1d2f3a443cd39b490a8777b622d7

Thanks
Nicolin


[PATCH][next] soc: samsung: exynos-chipid: fix memory leak

2019-08-16 Thread Colin King
From: Colin Ian King 

Currently when the call to product_id_to_soc_id fails there
is a memory leak of soc_dev_attr->revision and soc_dev_attr
on the error return path.  Fix this by adding a common error
return path that frees there obects and use this for two
error return paths.

Addresses-Coverity: ("Resource leak")
Fixes: 3253b7b7cd44 ("soc: samsung: Add exynos chipid driver support")
Signed-off-by: Colin Ian King 
---
 drivers/soc/samsung/exynos-chipid.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/soc/samsung/exynos-chipid.c 
b/drivers/soc/samsung/exynos-chipid.c
index 006a95feb618..4e194a97c0fa 100644
--- a/drivers/soc/samsung/exynos-chipid.c
+++ b/drivers/soc/samsung/exynos-chipid.c
@@ -81,15 +81,15 @@ int __init exynos_chipid_early_init(void)
soc_dev_attr->soc_id = product_id_to_soc_id(product_id);
if (!soc_dev_attr->soc_id) {
pr_err("Unknown SoC\n");
-   return -ENODEV;
+   ret = -ENODEV;
+   goto err;
}
 
/* please note that the actual registration will be deferred */
soc_dev = soc_device_register(soc_dev_attr);
if (IS_ERR(soc_dev)) {
-   kfree(soc_dev_attr->revision);
-   kfree(soc_dev_attr);
-   return PTR_ERR(soc_dev);
+   ret = PTR_ERR(soc_dev);
+   goto err;
}
 
/* it is too early to use dev_info() here (soc_dev is NULL) */
@@ -97,5 +97,11 @@ int __init exynos_chipid_early_init(void)
soc_dev_attr->soc_id, product_id, revision);
 
return 0;
+
+err:
+   kfree(soc_dev_attr->revision);
+   kfree(soc_dev_attr);
+   return ret;
 }
+
 early_initcall(exynos_chipid_early_init);
-- 
2.20.1



[PATCH] signal: Allow cifs and drbd to receive their terminating signals

2019-08-16 Thread Eric W. Biederman


My recent to change to only use force_sig for a synchronous events
wound up breaking signal reception cifs and drbd.  I had overlooked
the fact that by default kthreads start out with all signals set to
SIG_IGN.  So a change I thought was safe turned out to have made it
impossible for those kernel thread to catch their signals.

Reverting the work on force_sig is a bad idea because what the code
was doing was very much a misuse of force_sig.  As the way force_sig
ultimately allowed the signal to happen was to change the signal
handler to SIG_DFL.  Which after the first signal will allow userspace
to send signals to these kernel threads.  At least for
wake_ack_receiver in drbd that does not appear actively wrong.

So correct this problem by adding allow_kernel_signal that will allow
signals whose siginfo reports they were sent by the kernel through,
but will not allow userspace generated signals, and update cifs and
drbd to call allow_kernel_signal in an appropriate place so that their
thread can receive this signal.

Fixing things this way ensures that userspace won't be able to send
signals and cause problems, that it is clear which signals the
threads are expecting to receive, and it guarantees that nothing
else in the system will be affected.

This change was partly inspired by similar cifs and drbd patches that
added allow_signal.

Reported-by: ronnie sahlberg 
Reported-by: Christoph Böhmwalder 
Cc: Steve French 
Cc: Philipp Reisner 
Cc: David Laight 
Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig 
changes")
Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig 
instead of force_sig")
Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig")
Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig")
Signed-off-by: "Eric W. Biederman" 
---
 drivers/block/drbd/drbd_main.c |  2 ++
 fs/cifs/connect.c  |  2 +-
 include/linux/signal.h | 15 ++-
 kernel/signal.c|  5 +
 4 files changed, 22 insertions(+), 2 deletions(-)

Folks my apolgies for this mess and for taking so long to suggest an
improvement.  I needed a good nights sleep to think about this and
with a new baby at home that has been a challenge to get.

Unless someone has an objection or sees a problem with this patch I will
send this to Linus in the next couple of days.

I think adding allow_kernel_signal is better because it makes it clear
that userspace does not mess with these signals.  I would love it if we
could avoid signals all together but that appears tricky in the
presence of kernel threads making blocking network requests.

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 9bd4ddd12b25..5b248763a672 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -322,6 +322,8 @@ static int drbd_thread_setup(void *arg)
 thi->name[0],
 resource->name);
 
+   allow_kernel_signal(DRBD_SIGKILL);
+   allow_kernel_signal(SIGXCPU);
 restart:
retval = thi->function(thi);
 
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index a15a6e738eb5..1795e80cbdf7 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1113,7 +1113,7 @@ cifs_demultiplex_thread(void *p)
mempool_resize(cifs_req_poolp, length + cifs_min_rcv);
 
set_freezable();
-   allow_signal(SIGKILL);
+   allow_kernel_signal(SIGKILL);
while (server->tcpStatus != CifsExiting) {
if (try_to_freeze())
continue;
diff --git a/include/linux/signal.h b/include/linux/signal.h
index b5d99482d3fe..703fa20c06f5 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,6 +282,9 @@ extern void signal_setup_done(int failed, struct ksignal 
*ksig, int stepping);
 extern void exit_signals(struct task_struct *tsk);
 extern void kernel_sigaction(int, __sighandler_t);
 
+#define SIG_KTHREAD ((__force __sighandler_t)2)
+#define SIG_KTHREAD_KERNEL ((__force __sighandler_t)3)
+
 static inline void allow_signal(int sig)
 {
/*
@@ -289,7 +292,17 @@ static inline void allow_signal(int sig)
 * know it'll be handled, so that they don't get converted to
 * SIGKILL or just silently dropped.
 */
-   kernel_sigaction(sig, (__force __sighandler_t)2);
+   kernel_sigaction(sig, SIG_KTHREAD);
+}
+
+static inline void allow_kernel_signal(int sig)
+{
+   /*
+* Kernel threads handle their own signals. Let the signal code
+* kwown signals sent by the kernel will be handled, so that they
+* don't get silently dropped.
+*/
+   kernel_sigaction(sig, SIG_KTHREAD_KERNEL);
 }
 
 static inline void disallow_signal(int sig)
diff --git a/kernel/signal.c b/kernel/signal.c
index e667be6907d7..534fec266a33 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -90,6 +90,11 @@ static bool sig_task_ignored(struct task_struct *t, int sig, 
bool 

Re: [PATCH v6 4/4] net: phy: realtek: Add LED configuration support for RTL8211E

2019-08-16 Thread Florian Fainelli
On 8/16/19 2:27 PM, Matthias Kaehlcke wrote:
> On Fri, Aug 16, 2019 at 10:13:42PM +0200, Pavel Machek wrote:
>> On Tue 2019-08-13 12:11:47, Matthias Kaehlcke wrote:
>>> Add a .config_led hook which is called by the PHY core when
>>> configuration data for a PHY LED is available. Each LED can be
>>> configured to be solid 'off, solid 'on' for certain (or all)
>>> link speeds or to blink on RX/TX activity.
>>>
>>> Signed-off-by: Matthias Kaehlcke 
>>
>> THis really needs to go through the LED subsystem,
> 
> Sorry, I used what get_maintainers.pl threw at me, I should have
> manually cc-ed the LED list.
> 
>> and use the same userland interfaces as the rest of the system.
> 
> With the PHY maintainers we discussed to define a binding that is
> compatible with that of the LED one, to have the option to integrate
> it with the LED subsystem later. The integration itself is beyond the
> scope of this patchset.
> 
> The PHY LED configuration is a low priority for the project I'm
> working on. I wanted to make an attempt to upstream it and spent
> already significantly more time on it than planned, if integration
> with the LED framework now is a requirement please consider this
> series abandonded.

While I have an appreciation for how hard it can be to work in a
corporate environment while doing upstream first and working with
virtually unbounded goals (in time or scope) due to maintainers and
reviewers, that kind of statement can hinder your ability to establish
trust with peers in the community as it can be read as take it or leave it.

The LED subsystem integration can definitively come in later from my 2
cents perspective and this patch series as it stands is valuable and
avoids inventing new bindings.
-- 
Florian


Re: [PATCH v11 03/12] dt-binding: gce: add binding for gce client reg property

2019-08-16 Thread Rob Herring
On Mon, 29 Jul 2019 15:00:57 +0800, Bibby Hsieh wrote:
> cmdq driver provide a function that get the relationship
> of sub system number from device node for client.
> add specification for #subsys-cells, mediatek,gce-client-reg.
> 
> Signed-off-by: Bibby Hsieh 
> ---
>  .../devicetree/bindings/mailbox/mtk-gce.txt  | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 

Reviewed-by: Rob Herring 


Re: [PATCH 1/2] dt-bindings: amazon: add Amazon Annapurna Labs Alpine support

2019-08-16 Thread Rob Herring
On Sun, Jul 28, 2019 at 10:51:34PM +0300, Ronen Krupnik wrote:
> This patch adds DT bindings info for Amazon Annapurna Labs Alpine SOC
> and related reference boards.
> 
> Signed-off-by: Ronen Krupnik 
> ---
>  .../devicetree/bindings/arm/amazon,alpine.txt | 23 +++
>  1 file changed, 23 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/amazon,alpine.txt

Board bindings are in DT schema format now.

Also, we already have al,alpine.yaml. Please combine with that one or 
remove it perhaps. Seems kind of abandoned given the lack of response on 
it.

> 
> diff --git a/Documentation/devicetree/bindings/arm/amazon,alpine.txt 
> b/Documentation/devicetree/bindings/arm/amazon,alpine.txt
> new file mode 100644
> index ..58817208421b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/amazon,alpine.txt
> @@ -0,0 +1,23 @@
> +Amazon Annapurna Labs Alpine v3 Platform Device Tree Bindings
> +---
> +
> +Platforms based on Amazon Annapurna Labs Alpine SoC architecture
> +shall follow the following scheme:
> +
> +SoCs
> +
> +
> +Each device tree root node must specify which exact SoC in alpine
> +architecture it uses, using one of the following compatible values:
> +
> +- alpine v3
> +  compatible = "amazon,alpine-v3";
> +
> +Boards
> +--
> +
> +Each device tree must specify which one or more of the following
> +board-specific compatible values:
> +
> +- alpine-v3 Evaluation Platform (EVP)
> +  compatible = "amazon,alpine-v3-evp";
> -- 
> 2.21.0
> 


Re: [PATCH v6 1/4] dt-bindings: net: phy: Add subnode for LED configuration

2019-08-16 Thread Matthias Kaehlcke
On Fri, Aug 16, 2019 at 10:13:38PM +0200, Pavel Machek wrote:
> Hi!
> 
> Please Cc led mailing lists on led issues.

sorry for missing this

> On Tue 2019-08-13 12:11:44, Matthias Kaehlcke wrote:
> > The LED behavior of some Ethernet PHYs is configurable. Add an
> > optional 'leds' subnode with a child node for each LED to be
> > configured. The binding aims to be compatible with the common
> > LED binding (see devicetree/bindings/leds/common.txt).
> > 
> > A LED can be configured to be:
> > 
> > - 'on' when a link is active, some PHYs allow configuration for
> >   certain link speeds
> >   speeds
> > - 'off'
> > - blink on RX/TX activity, some PHYs allow configuration for
> >   certain link speeds
> > 
> > For the configuration to be effective it needs to be supported by
> > the hardware and the corresponding PHY driver.
> > 
> > Suggested-by: Andrew Lunn 
> > Signed-off-by: Matthias Kaehlcke 
> 
> > @@ -173,5 +217,20 @@ examples:
> >  reset-gpios = < 4 1>;
> >  reset-assert-us = <1000>;
> >  reset-deassert-us = <2000>;
> > +
> > +leds {
> > +#address-cells = <1>;
> > +#size-cells = <0>;
> > +
> > +led@0 {
> > +reg = <0>;
> > +linux,default-trigger = "phy-link-1g";
> > +};
> 
> Because this affects us.
> 
> Is the LED software controllable?

it might be for certain PHYs, integration with the LED framework is
not part of this series.

> Or can it do limited subset of triggers you listed?

it depends on the PHY. The one in this series (RTL8211E) only supports
a limited subset of the listed triggers.


Re: [PATCH 2/2] HID: wacom: do not call hid_set_drvdata(hdev, NULL)

2019-08-16 Thread Jason Gerecke
On Mon, Aug 12, 2019 at 9:29 AM Benjamin Tissoires
 wrote:
>
> This is a common pattern in the HID drivers to reset the drvdata.
> However, this is actually already handled by driver core, so there
> is no need to do it manually.
>
> Signed-off-by: Benjamin Tissoires 

Acked-by: Jason Gerecke 

Jason
---
Now instead of four in the eights place /
you’ve got three, ‘Cause you added one  /
(That is to say, eight) to the two, /
But you can’t take seven from three,/
So you look at the sixty-fours


> ---
>  drivers/hid/wacom_sys.c | 18 +-
>  1 file changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
> index 53bddb50aeba..69ccfdd51a6f 100644
> --- a/drivers/hid/wacom_sys.c
> +++ b/drivers/hid/wacom_sys.c
> @@ -2718,14 +2718,12 @@ static int wacom_probe(struct hid_device *hdev,
> wacom_wac->features = *((struct wacom_features *)id->driver_data);
> features = _wac->features;
>
> -   if (features->check_for_hid_type && features->hid_type != hdev->type) 
> {
> -   error = -ENODEV;
> -   goto fail;
> -   }
> +   if (features->check_for_hid_type && features->hid_type != hdev->type)
> +   return -ENODEV;
>
> error = kfifo_alloc(_wac->pen_fifo, WACOM_PKGLEN_MAX, 
> GFP_KERNEL);
> if (error)
> -   goto fail;
> +   return error;
>
> wacom_wac->hid_data.inputmode = -1;
> wacom_wac->mode_report = -1;
> @@ -2743,12 +2741,12 @@ static int wacom_probe(struct hid_device *hdev,
> error = hid_parse(hdev);
> if (error) {
> hid_err(hdev, "parse failed\n");
> -   goto fail;
> +   return error;
> }
>
> error = wacom_parse_and_register(wacom, false);
> if (error)
> -   goto fail;
> +   return error;
>
> if (hdev->bus == BUS_BLUETOOTH) {
> error = device_create_file(>dev, _attr_speed);
> @@ -2759,10 +2757,6 @@ static int wacom_probe(struct hid_device *hdev,
> }
>
> return 0;
> -
> -fail:
> -   hid_set_drvdata(hdev, NULL);
> -   return error;
>  }
>
>  static void wacom_remove(struct hid_device *hdev)
> @@ -2791,8 +2785,6 @@ static void wacom_remove(struct hid_device *hdev)
> wacom_release_resources(wacom);
>
> kfifo_free(_wac->pen_fifo);
> -
> -   hid_set_drvdata(hdev, NULL);
>  }
>
>  #ifdef CONFIG_PM
> --
> 2.19.2
>


Re: [PATCH 07/15] dt: thermal: tsens: Document interrupt support in tsens driver

2019-08-16 Thread Amit Kucheria
On Sat, Aug 17, 2019 at 3:06 AM Rob Herring  wrote:
>
> On Fri, Jul 26, 2019 at 03:48:42AM +0530, Amit Kucheria wrote:
> > Define two new required properties to define interrupts and
> > interrupt-names for tsens.
> >
> > Signed-off-by: Amit Kucheria 
> > ---
> >  Documentation/devicetree/bindings/thermal/qcom-tsens.txt | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/thermal/qcom-tsens.txt 
> > b/Documentation/devicetree/bindings/thermal/qcom-tsens.txt
> > index 673cc1831ee9..3d3dd5dc6d36 100644
> > --- a/Documentation/devicetree/bindings/thermal/qcom-tsens.txt
> > +++ b/Documentation/devicetree/bindings/thermal/qcom-tsens.txt
> > @@ -22,6 +22,8 @@ Required properties:
> >
> >  - #thermal-sensor-cells : Should be 1. See ./thermal.txt for a description.
> >  - #qcom,sensors: Number of sensors in tsens block
> > +- interrupts: Interrupts generated from Always-On subsystem (AOSS)
> > +- interrupt-names: The name of the interrupt e.g. "tsens0", "tsens1"
>
> How many interrupts? A name with just indices isn't too useful.

Depending on the version of the tsens IP, there can be 1 (upper/lower
threshold), 2 (upper/lower + critical threshold) or 3 (upper/lower +
critical + zero degree) interrupts. This patch series only introduces
support for a single interrupt (upper/lower).

I used the names tsens0, tsens1 to encapsulate the controller instance
since some SoCs have 1 controller, others have two. So we'll end up
with something like the following in DT:

tsens0: thermal-sensor@c263000 {
compatible = "qcom,sdm845-tsens", "qcom,tsens-v2";
reg = <0 0x0c263000 0 0x1ff>, /* TM */
  <0 0x0c222000 0 0x1ff>; /* SROT */
#qcom,sensors = <13>;
interrupts = ,
 ;
interrupt-names = "tsens0", "tsens0-critical";
#thermal-sensor-cells = <1>;
};

tsens1: thermal-sensor@c265000 {
compatible = "qcom,sdm845-tsens", "qcom,tsens-v2";
reg = <0 0x0c265000 0 0x1ff>, /* TM */
  <0 0x0c223000 0 0x1ff>; /* SROT */
#qcom,sensors = <8>;
interrupts = ,
 ;
interrupt-names = "tsens1", "tsens1-critical";
#thermal-sensor-cells = <1>;
}

Does that work?

Regards,
Amit

> >  - Refer to Documentation/devicetree/bindings/nvmem/nvmem.txt to know how 
> > to specify
> >  nvmem cells
> >
> > @@ -40,6 +42,9 @@ tsens0: thermal-sensor@c263000 {
> >   reg = <0xc263000 0x1ff>, /* TM */
> >   <0xc222000 0x1ff>; /* SROT */
> >   #qcom,sensors = <13>;
> > + interrupts = ;
> > + interrupt-names = "tsens0";
> > +
> >   #thermal-sensor-cells = <1>;
> >   };
> >
> > --
> > 2.17.1
> >


Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()

2019-08-16 Thread Ira Weiny
On Fri, Aug 16, 2019 at 11:50:09AM -0700, John Hubbard wrote:
> On 8/16/19 11:33 AM, Ira Weiny wrote:
> > On Fri, Aug 16, 2019 at 05:41:08PM +0200, Jan Kara wrote:
> > > On Thu 15-08-19 19:14:08, John Hubbard wrote:
> > > > On 8/15/19 10:41 AM, John Hubbard wrote:
> > > > > On 8/15/19 10:32 AM, Ira Weiny wrote:
> > > > > > On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:
> > > > > > > On Thu 15-08-19 15:26:22, Jan Kara wrote:
> > > > > > > > On Wed 14-08-19 20:01:07, John Hubbard wrote:
> > > > > > > > > On 8/14/19 5:02 PM, John Hubbard wrote:
> > > > ...
> > > > 
> > > > OK, there was only process_vm_access.c, plus (sort of) Bharath's sgi-gru
> > > > patch, maybe eventually [1].  But looking at process_vm_access.c, I 
> > > > think
> > > > it is one of the patches that is no longer applicable, and I can just
> > > > drop it entirely...I'd welcome a second opinion on that...
> > > 
> > > I don't think you can drop the patch. process_vm_rw_pages() clearly 
> > > touches
> > > page contents and does not synchronize with page_mkclean(). So it is case
> > > 1) and needs FOLL_PIN semantics.
> > 
> > John could you send a formal patch using vaddr_pin* and I'll add it to the
> > tree?
> > 
> 
> Yes...hints about which struct file to use here are very welcome, btw. This 
> part
> of mm is fairly new to me.

I'm still working out the final semantics of vaddr_pin*.  But right now you
don't need a vaddr_pin if you don't specify FOLL_LONGTERM.

Since case 1, this case, does not need FOLL_LONGTERM I think it is safe to
simply pass NULL here.

OTOH we could just track this against the mm_struct.  But I don't think we need
to because this pin should be transient.

And this is why I keep leaning toward _not_ putting these flags in the
vaddr_pin*() calls.  I know this is what I did but I think I'm wrong.  It should
be the caller specifying what they want and the vaddr_pin*() calls check that
what they are asking for is correct.

Ira

> 
> thanks,
> -- 
> John Hubbard
> NVIDIA


Re: [PATCH 2/2] dt-bindings: pci: pci-msi: Correct the unit-address of the pci node name

2019-08-16 Thread Rob Herring
On Sun, 28 Jul 2019 02:30:19 -0700, Bin Meng wrote:
> The unit-address must match the first address specified in the
> reg property of the node.
> 
> Signed-off-by: Bin Meng 
> ---
> 
>  Documentation/devicetree/bindings/pci/pci-msi.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Applied, thanks.

Rob


  1   2   3   4   5   6   7   8   9   >