date:20180513

Re: general protection fault in kernfs_kill_sb (2)

2018-05-13 Thread Al Viro

On Sun, May 13, 2018 at 11:19:46AM +0900, Tetsuo Handa wrote:

> This is what I reported at
> https://groups.google.com/d/msg/syzkaller-bugs/ISOJlV2I2QM/qHslGMi3AwAJ .
> 
> We are currently waiting for comments from Al Viro.

1) the damn thing is unusable without javashit.  Which gets about
the same reaction as sending something.doc in attachment.  Please,
find a less obnoxious way to archive the thing (or to generate
URLs that would work without that garbage).

2) deactivate_locked_super() *WILL* be called when fill_super() fails.
Live with it; it allows simplifying a whole lot of cleanup logic
in various filesystems.  Again, we are not going for a model where
->kill_sb() is not called for something returned by sget().
Rationale: rarely exercised paths tend to rot, so anything that increases
the duplication of bits and pieces of normal teardown into failure exits
of foo_fill_super() is a bloody bad idea.  If anything, we want to take
a lot of stuff out of ->put_super() instances directly into ->kill_sb()
ones, precisely because ->put_super() is only called for fully set up
filesystems.

3) kernfs needs to be fixed.  The rest of the dropped commits were
made redundant by 8e04944f0ea8; this one wasn't.  Mea culpa.
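
Point 2 above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `demo_sb`, `demo_fs_info`, `demo_fill_super()`, `demo_kill_sb()` and `demo_mount()` are hypothetical stand-ins, not the VFS or kernfs code. It shows a single teardown routine that is invoked even when setup fails part-way, and which therefore has to tolerate private data that was never allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for filesystem-private data hung off a superblock. */
struct demo_fs_info {
	int *table;			/* allocated late in setup */
};

/* Hypothetical stand-in for a superblock. */
struct demo_sb {
	struct demo_fs_info *fs_info;	/* stays NULL if setup failed early */
	int kill_calls;			/* how many times teardown ran */
};

/* Setup that can be told to fail at step 1 or 2 (0 means "do not fail"). */
static int demo_fill_super(struct demo_sb *sb, int fail_at_step)
{
	if (fail_at_step == 1)
		return -1;		/* nothing allocated yet */

	sb->fs_info = calloc(1, sizeof(*sb->fs_info));
	if (!sb->fs_info)
		return -1;

	if (fail_at_step == 2)
		return -1;		/* fs_info set, table still NULL */

	sb->fs_info->table = calloc(16, sizeof(int));
	if (!sb->fs_info->table)
		return -1;

	return 0;
}

/* The one teardown path; it must cope with every partially set-up state. */
static void demo_kill_sb(struct demo_sb *sb)
{
	assert(sb);
	sb->kill_calls++;
	if (!sb->fs_info)
		return;			/* setup failed before allocation */
	free(sb->fs_info->table);	/* free(NULL) is a no-op */
	free(sb->fs_info);
	sb->fs_info = NULL;
}

/* Models the mount path: on failure, teardown runs instead of ad-hoc unwinding. */
static int demo_mount(struct demo_sb *sb, int fail_at_step)
{
	int err;

	sb->fs_info = NULL;
	sb->kill_calls = 0;
	err = demo_fill_super(sb, fail_at_step);
	if (err)
		demo_kill_sb(sb);
	return err;
}
```

In this model the failure exits of `demo_fill_super()` contain no cleanup at all, so there is no rarely exercised duplicate of the teardown logic to rot; the cost is that `demo_kill_sb()` must check for a missing `fs_info` before touching it.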

Re: [PATCH 3/3] arm64: dts: renesas: draak: Describe HDMI input

2018-05-13 Thread Laurent Pinchart

Hello,

On Sunday, 13 May 2018 15:57:55 EEST Niklas Söderlund wrote:
> On 2018-05-11 12:00:02 +0200, Jacopo Mondi wrote:
> > Describe HDMI input connected to VIN4 interface for R-Car D3 Draak
> > development board.
> > 
> > Signed-off-by: Jacopo Mondi 
> > ---
> > 
> >  arch/arm64/boot/dts/renesas/r8a77995-draak.dts | 68 +
> >  1 file changed, 68 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/renesas/r8a77995-draak.dts
> > b/arch/arm64/boot/dts/renesas/r8a77995-draak.dts index d03f194..e0ce462
> > 100644
> > --- a/arch/arm64/boot/dts/renesas/r8a77995-draak.dts
> > +++ b/arch/arm64/boot/dts/renesas/r8a77995-draak.dts
> > @@ -59,6 +59,17 @@
> > 
> > };
> > 
> > };
> > 
> > +   hdmi-in {
> > +   compatible = "hdmi-connector";
> > +   type = "a";
> > +
> > +   port {
> > +   hdmi_con_in: endpoint {
> > > +   remote-endpoint = <&adv7612_in>;
> > +   };
> > +   };
> > +   };
> > +
> > 
> > memory@4800 {
> > 
> > device_type = "memory";
> > /* first 128MB is reserved for secure area. */
> > 
> > @@ -142,6 +153,11 @@
> > 
> > groups = "usb0";
> > function = "usb0";
> > 
> > };
> > 
> > +
> > +   vin4_pins: vin4 {
> > +   groups = "vin4_data24", "vin4_sync", "vin4_clk", "vin4_clkenb";
> > +   function = "vin4";
> > +   };
> > 
> >  };
> >  
> >   {
> > 
> > @@ -154,6 +170,35 @@
> > 
> > reg = <0x50>;
> > pagesize = <8>;
> > 
> > };
> > 
> > +
> > +   hdmi-decoder@4c {
> > +   compatible = "adi,adv7612";
> > +   reg = <0x4c>;
> > +   default-input = <0>;
> > +
> > +   ports {
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +
> > +   port@0 {
> > +   reg = <0>;
> > +   adv7612_in: endpoint {
> > > +   remote-endpoint = <&hdmi_con_in>;
> > +   };
> > +   };
> > +
> > +   port@2 {
> > +   reg = <2>;
> > +   adv7612_out: endpoint {
> > +   pclk-sample = <0>;
> > +   hsync-active = <0>;
> > +   vsync-active = <0>;
> 
> This differs from the Gen2 DT bindings which is a very similar hardware
> setup using the same components. Defining these properties will make the
> bus marked as V4L2_MBUS_PARALLEL instead of V4L2_MBUS_BT656.
> 
> This will change how the hardware is configured for capture if the media
> bus is in a UYVY format, see VNMC_INF register in rvin_setup(). Maybe
> this is not an issue here, but I'm still curious as to why this differs
> between Gen2 and Gen3 :-)
> 
> > +
> > > +   remote-endpoint = <&vin4_in>;
> > +   };
> > +   };
> > +   };
> > +   };
> > 
> >  };
> >  
> >   {
> > 
> > @@ -246,3 +291,26 @@
> > 
> > timeout-sec = <60>;
> > status = "okay";
> >  
> >  };
> > 
> > +
> > + {
> > > +   pinctrl-0 = <&vin4_pins>;
> > +   pinctrl-names = "default";
> > +
> > +   status = "okay";
> > +
> > +   ports {
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +
> > +   port@0 {
> > +   reg = <0>;
> > +
> > +   vin4_in: endpoint {
> > +   hsync-active = <0>;
> > +   vsync-active = <0>;
> 
> Comparing this to the Gen2 bindings some properties are missing,
> 
> bus-width = <24>;
> pclk-sample = <1>;
> data-active = <1>;
> 
> This is not a big deal as the VIN driver doesn't use these properties so
> no functional change should come of this but still a difference.

I think the VIN DT bindings should be updated to explicitly list the endpoint 
properties that are mandatory, optional, or not allowed.

> Overall I'm happy with this change but before I add my tag I would like
> to understand why it differs from the Gen2 configuration for the adv7612
> properties.
> 
> Also, on a side note, it is possible with hardware switches on the board
> to switch the VIN4 source to a completely different pipeline: CVBS connector
> -> adv7180 -> VIN4. But I think it's best we keep the HDMI as default as
> this seems to be how the boards are shipped. But maybe mentioning this
> in the commit message would not hurt if you end-up resending the patch.
> 
> > +
> > > +   remote-endpoint = <&adv7612_out>;
> > +   };
> > +   };
> > +   };
> > +};

-- 
Regards,

Laurent Pinchart
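
Niklas's remark about the bus type can be sketched as follows. This is a simplified model, not the V4L2 fwnode parser: it assumes, based only on the review comment above, that an endpoint which declares explicit sync polarities is treated as a parallel bus with separate sync signals and that one without them is treated as BT.656 with embedded syncs. The names `demo_endpoint_props`, `demo_bus_type` and `demo_guess_bus_type()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of which sync properties an endpoint node declares. */
struct demo_endpoint_props {
	bool has_hsync_active;
	bool has_vsync_active;
};

/* Hypothetical counterparts of V4L2_MBUS_BT656 and V4L2_MBUS_PARALLEL. */
enum demo_bus_type {
	DEMO_BUS_BT656,
	DEMO_BUS_PARALLEL,
};

/*
 * Assumption: explicit sync polarities imply discrete sync lines and
 * therefore a parallel bus; with none declared, syncs are taken to be
 * embedded in the data stream (BT.656).
 */
static enum demo_bus_type
demo_guess_bus_type(const struct demo_endpoint_props *props)
{
	assert(props);
	if (props->has_hsync_active || props->has_vsync_active)
		return DEMO_BUS_PARALLEL;
	return DEMO_BUS_BT656;
}
```

Under this assumption the endpoints in the patch, which set both `hsync-active` and `vsync-active`, would be classified as parallel, which is the difference from the Gen2 description that the review asks about.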

linux-next: build warning after merge of the block tree

2018-05-13 Thread Stephen Rothwell

Hi all,

After merging the block tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

drivers/memstick/core/mspro_block.c: In function 'mspro_block_init_disk':
drivers/memstick/core/mspro_block.c:1173:24: warning: unused variable 'host' 
[-Wunused-variable]
  struct memstick_host *host = card->host;
^~~~
drivers/memstick/core/ms_block.c: In function 'msb_init_disk':
drivers/memstick/core/ms_block.c:2097:24: warning: unused variable 'host' 
[-Wunused-variable]
  struct memstick_host *host = card->host;
^~~~

Introduced by commit

  7c2d748e8476 ("memstick: don't call blk_queue_bounce_limit")

-- 
Cheers,
Stephen Rothwell



Re: RTL8723BE performance regression

2018-05-13 Thread Pkshih

On Wed, 2018-05-09 at 13:33 -0700, João Paulo Rechi Vita wrote:
> On Tue, May 8, 2018 at 1:37 AM, Pkshih  wrote:
> > On Mon, 2018-05-07 at 14:49 -0700, João Paulo Rechi Vita wrote:
> >> On Tue, May 1, 2018 at 10:58 PM, Pkshih  wrote:
> >> > On Wed, 2018-05-02 at 05:44 +, Pkshih wrote:
> >> >>
> >> >> > -Original Message-
> >> >> > From: João Paulo Rechi Vita [mailto:jprv...@gmail.com]
> >> >> > Sent: Wednesday, May 02, 2018 6:41 AM
> >> >> > To: Larry Finger
> >> >> > Cc: Steve deRosier; 莊彥宣; Pkshih; Birming Chiu; Shaofu; Steven Ting;
> >> >> > Chaoming_Li; Kalle Valo;
> >> >> > linux-wireless; Network Development; LKML; Daniel Drake; João Paulo
> >> >> > Rechi Vita; linux@endlessm.com
> >> >> > Subject: Re: RTL8723BE performance regression
> >> >> >
> >> >> > On Tue, Apr 3, 2018 at 7:51 PM, Larry Finger 
> >> >> >  wrote:
> >> >> > > On 04/03/2018 09:37 PM, João Paulo Rechi Vita wrote:
> >> >> > >>
> >> >> > >> On Tue, Apr 3, 2018 at 7:28 PM, Larry Finger 
> >> >> > >> 
> >> >> > >> wrote:
> >> >> > >>
> >> >> > >> (...)
> >> >> > >>
> >> >> > >>> As the antenna selection code changes affected your first 
> >> >> > >>> bisection, do
> >> >> > >>> you
> >> >> > >>> have one of those HP laptops with only one antenna and the 
> >> >> > >>> incorrect
> >> >> > >>> coding
> >> >> > >>> in the FUSE?
> >> >> > >>
> >> >> > >>
> >> >> > >> Yes, that is why I've been passing ant_sel=1 during my tests -- 
> >> >> > >> this
> >> >> > >> was needed to achieve a good performance in the past, before this
> >> >> > >> regression. I've also opened the laptop chassis and confirmed the
> >> >> > >> antenna cable is plugged to the connector labeled with "1" on the
> >> >> > >> card.
> >> >> > >>
> >> >> > >>> If so, please make sure that you still have the same signal
> >> >> > >>> strength for good and bad cases. I have tried to keep the driver 
> >> >> > >>> and the
> >> >> > >>> btcoex code in sync, but there may be some combinations of antenna
> >> >> > >>> configuration and FUSE contents that cause the code to fail.
> >> >> > >>>
> >> >> > >>
> >> >> > >> What is the recommended way to monitor the signal strength?
> >> >> > >
> >> >> > >
> >> >> > > The btcoex code is developed for multiple platforms by a different 
> >> >> > > group
> >> >> > > than the Linux driver. I think they made a change that caused 
> >> >> > > ant_sel to
> >> >> > > switch from 1 to 2. At least numerous comments at
> >> >> > > github.com/lwfinger/rtlwifi_new claimed they needed to make that 
> >> >> > > change.
> >> >> > >
> >> >> > > My recommended method is to verify the wifi device name with "iw 
> >> >> > > dev". Then
> >> >> > > using that device
> >> >> > >
> >> >> > > sudo iw dev  scan | egrep "SSID|signal"
> >> >> > >
> >> >> >
> >> >> > I have confirmed that the performance regression is indeed tied to
> >> >> > signal strength: on the good cases signal was between -16 and -8 dBm,
> >> >> > whereas in bad cases signal was always between -50 and -40 dBm. I've
> >> >> > also switched to testing bandwidth in controlled LAN environment using
> >> >> > iperf3, as suggested by Steve deRosier, with the DUT being the only
> >> >> > machine connected to the 2.4 GHz radio and the machine running the
> >> >> > iperf3 server connected via ethernet.
> >> >> >
> >> >>
> >> >> We have new experimental results in commit af8a41cccf8f46 ("rtlwifi: 
> >> >> cleanup
> >> >> 8723be ant_sel definition"). You can use the above commit and do the 
> >> >> same
> >> >> experiments (with ant_sel=0, 1 and 2) in your side, and then share your 
> >> >> results.
> >> >> Since performance is tied to signal strength, you can only share signal 
> >> >> strength.
> >> >>
> >> >
> >> > Please pay attention to cold reboot once ant_sel is changed.
> >> >
> >>
> >> I've tested the commit mentioned above and it fixes the problem on top
> >> of v4.16 (the latest wireless-drivers-next is also fixed, as it
> >> already contains that commit). On v4.15, we also need the
> >> following commits before "af8a41cccf8f rtlwifi: cleanup 8723be ant_sel
> >> definition" to have a good performance again:
> >>
> >>   874e837d67d0 rtlwifi: fill FW version and subversion
> >>   a44709bba70f rtlwifi: btcoex: Add power_on_setting routine
> >>   40d9dd4f1c5d rtlwifi: btcoex: Remove global variables from btcoex
> >
> > v4.15 isn't a longterm version and has reached EOL.
> >
> 
> Right, but this is a performance regression in comparison to v4.11, so
> if "af8a41cccf8f rtlwifi: cleanup 8723be ant_sel definition" is marked
> for stable, shouldn't these other patches be brought as well? All
> releases since v4.11 are probably affected, but honestly I don't have
> a strong understanding of how the stable trees operate in situations
> like this.
> 

see below.

> >>
> >> Surprisingly, it seems forcing ant_sel=1 is not needed anymore on
> >> these machines, as the

CONFIG_KCOV causing crash in svm_vcpu_run()

2018-05-13 Thread Eric Biggers

With CONFIG_KCOV=y and an AMD processor, running the following program crashes
the kernel with no output (I'm testing in a VM, so it's using nested
virtualization):

#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

int main()
{
	int dev, vm, cpu;
	char page[4096] __attribute__((aligned(4096))) = { 0 };
	struct kvm_userspace_memory_region memreg = {
		.memory_size = 4096,
		.userspace_addr = (unsigned long)page,
	};
	dev = open("/dev/kvm", O_RDONLY);
	vm = ioctl(dev, KVM_CREATE_VM, 0);
	cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
	ioctl(cpu, KVM_RUN, 0);
}

It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
MSR_IA32_SPEC_CTRL").  The bug is apparently that due to the new code for
managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
crash somehow.  The following patch fixes it, though I don't know that it's the
right solution; maybe KCOV should be disabled in the function instead, or maybe
there's a more fundamental problem.  What do people think?

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1fc05e428aba8..d35ef241e66d8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5652,6 +5652,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
);
 
+#ifdef CONFIG_X86_64
+   wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+#else
+   loadsegment(fs, svm->host.fs);
+#ifndef CONFIG_X86_32_LAZY_GS
+   loadsegment(gs, svm->host.gs);
+#endif
+#endif
+
/*
 * We do not use IBRS in the kernel. If this vCPU has used the
 * SPEC_CTRL MSR it may have left it on; save the value and
@@ -5676,15 +5685,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
/* Eliminate branch target predictions from guest mode */
vmexit_fill_RSB();
 
-#ifdef CONFIG_X86_64
-   wrmsrl(MSR_GS_BASE, svm->host.gs_base);
-#else
-   loadsegment(fs, svm->host.fs);
-#ifndef CONFIG_X86_32_LAZY_GS
-   loadsegment(gs, svm->host.gs);
-#endif
-#endif
-
reload_tss(vcpu);
 
local_irq_disable();
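
The ordering problem described above can be modelled in userspace. This is a sketch under stated assumptions, not kernel code: it assumes the coverage hook needs per-CPU data that is reached through the GS base on x86-64, which is presumably why running it before MSR_GS_BASE is restored goes wrong. `demo_cpu`, `demo_trace_hook()`, `demo_restore_host_base()` and the two `demo_exit_*()` functions are hypothetical names; a NULL base pointer stands in for "GS base still holds the guest's value".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a CPU whose per-CPU data is reached via a base pointer. */
struct demo_cpu {
	const int *percpu_base;	/* NULL models a stale, guest-owned base */
	int hook_failures;	/* times the hook ran without a valid base */
};

static const int demo_host_percpu = 1;

/* Stands in for an instrumentation hook that reads per-CPU data. */
static void demo_trace_hook(struct demo_cpu *cpu)
{
	assert(cpu);
	if (cpu->percpu_base == NULL)
		cpu->hook_failures++;	/* a bad memory access in the real case */
}

/* Stands in for restoring the host's base after leaving the guest. */
static void demo_restore_host_base(struct demo_cpu *cpu)
{
	assert(cpu);
	cpu->percpu_base = &demo_host_percpu;
}

/* Ordering before the patch: instrumented code runs, then the base is restored. */
static int demo_exit_old_order(void)
{
	struct demo_cpu cpu = { NULL, 0 };

	demo_trace_hook(&cpu);
	demo_restore_host_base(&cpu);
	return cpu.hook_failures;
}

/* Ordering after the patch: the base is restored before any instrumented code. */
static int demo_exit_new_order(void)
{
	struct demo_cpu cpu = { NULL, 0 };

	demo_restore_host_base(&cpu);
	demo_trace_hook(&cpu);
	return cpu.hook_failures;
}
```

The model only captures the ordering argument. The alternative raised in the message, excluding the function from instrumentation, would remove the hook call rather than move the restore.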

Re: CONFIG_KCOV causing crash in svm_vcpu_run()

2018-05-13 Thread Eric Biggers

Sorry, messed up address for KVM mailing list.  See message below.

On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
> the kernel with no output (I'm testing in a VM, so it's using nested
> virtualization):
> 
>   #include <fcntl.h>
>   #include <linux/kvm.h>
>   #include <sys/ioctl.h>
> 
>   int main()
>   {
>   	int dev, vm, cpu;
>   	char page[4096] __attribute__((aligned(4096))) = { 0 };
>   	struct kvm_userspace_memory_region memreg = {
>   		.memory_size = 4096,
>   		.userspace_addr = (unsigned long)page,
>   	};
>   	dev = open("/dev/kvm", O_RDONLY);
>   	vm = ioctl(dev, KVM_CREATE_VM, 0);
>   	cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
>   	ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
>   	ioctl(cpu, KVM_RUN, 0);
>   }
> 
> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
> MSR_IA32_SPEC_CTRL").  The bug is apparently that due to the new code for
> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
> crash somehow.  The following patch fixes it, though I don't know that it's 
> the
> right solution; maybe KCOV should be disabled in the function instead, or 
> maybe
> there's a more fundamental problem.  What do people think?
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1fc05e428aba8..d35ef241e66d8 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5652,6 +5652,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  #endif
>   );
>  
> +#ifdef CONFIG_X86_64
> + wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> +#else
> + loadsegment(fs, svm->host.fs);
> +#ifndef CONFIG_X86_32_LAZY_GS
> + loadsegment(gs, svm->host.gs);
> +#endif
> +#endif
> +
>   /*
>* We do not use IBRS in the kernel. If this vCPU has used the
>* SPEC_CTRL MSR it may have left it on; save the value and
> @@ -5676,15 +5685,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>   /* Eliminate branch target predictions from guest mode */
>   vmexit_fill_RSB();
>  
> -#ifdef CONFIG_X86_64
> - wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> -#else
> - loadsegment(fs, svm->host.fs);
> -#ifndef CONFIG_X86_32_LAZY_GS
> - loadsegment(gs, svm->host.gs);
> -#endif
> -#endif
> -
>   reload_tss(vcpu);
>  
>   local_irq_disable();

Re: [PATCH] rcu: Report a quiescent state when it's exactly in the state

2018-05-13 Thread Byungchul Park




On 2018-05-12 오전 7:41, Joel Fernandes wrote:

On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:

On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:

Hello folks,

I think I wrote the title in a misleading way.

Please change the title to something else such as,
"rcu: Report a quiescent state when it's in the state" or,
"rcu: Add points reporting quiescent states where proper" or so on.

On 2018-05-11 오후 5:30, Byungchul Park wrote:

We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
is called, no matter whether it actually be scheduled or not. However,
it currently doesn't report the quiescent state when the task enters
into __schedule() as it's called with preempt = true. So make it report
the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
called.

And in TINY_RCU, even though the quiescent state of rcu_bh also should
be reported when the tick interrupt comes from user, it doesn't. So make
it reported.

Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
reported when the tick interrupt comes from not only user but also idle,
as an extended quiescent state.

Signed-off-by: Byungchul Park 
---
  include/linux/rcupdate.h | 4 ++--
  kernel/rcu/tiny.c| 6 +++---
  kernel/rcu/tree.c| 4 ++--
  3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index ee8cf5fc..7432261 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
   */
  #define cond_resched_tasks_rcu_qs() \
  do { \
-   if (!cond_resched()) \
-   rcu_note_voluntary_context_switch_lite(current); \
+   rcu_note_voluntary_context_switch_lite(current); \
+   cond_resched(); \


Ah, good point.

Peter, I have to ask...  Why is "cond_resched()" considered a preemption
while "schedule()" is not?


In fact, something interesting I inferred from the __schedule loop related to
your question:

switch_count can either be set to prev->nivcsw or prev->nvcsw. If we can
assume that switch_count reflects whether the context switch is involuntary
or voluntary,
   
task-running-state   preempt   switch_count

0 (running)          1         involuntary
0                    0         involuntary
1                    0         voluntary
1                    1         involuntary

According to the above table, both the task's running state and the preempt
parameter to __schedule should be used together to determine if the switch is
a voluntary one or not.
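
The table above can be reproduced with a few lines of C. This is a minimal sketch of the rule the table implies, not the scheduler code itself: `switch_kind`, `classify_switch()` and `check_table()` are hypothetical names, and a task state of 0 is taken to mean "running", as in the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for which counter a context switch would be charged to. */
enum switch_kind {
	SWITCH_INVOLUNTARY,
	SWITCH_VOLUNTARY,
};

/*
 * task_state: 0 means the task is still runnable, non-zero means it has
 * marked itself as about to sleep.
 * preempt: the flag passed to the scheduler entry point.
 */
static enum switch_kind classify_switch(long task_state, bool preempt)
{
	/* Involuntary is the default, as in the table. */
	enum switch_kind kind = SWITCH_INVOLUNTARY;

	/* Only a non-preempting call from a task going to sleep is voluntary. */
	if (!preempt && task_state != 0)
		kind = SWITCH_VOLUNTARY;

	return kind;
}

/* Checks all four rows of the table. */
static void check_table(void)
{
	assert(classify_switch(0, true) == SWITCH_INVOLUNTARY);
	assert(classify_switch(0, false) == SWITCH_INVOLUNTARY);
	assert(classify_switch(1, false) == SWITCH_VOLUNTARY);
	assert(classify_switch(1, true) == SWITCH_INVOLUNTARY);
}
```

Both inputs are needed: neither `preempt` alone nor the task state alone separates the single voluntary row from the other three.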

So this code in rcu_note_context_switch should really be:
if (!preempt && !(current->state & TASK_RUNNING))
rcu_note_voluntary_context_switch_lite(current);

According to the above table, cond_resched always classifies as an
involuntary switch which makes sense to me. Even though cond_resched is


Hello guys,

Joel, the nivcsw/nvcsw classification used in the scheduler core that you
showed us is different from the one used when we distinguish between
non preemption/voluntary preemption/preemption/full and so on, even
though both use the same word, "voluntary".

The name rcu_note_voluntary_context_switch_lite() used in RCU has
a lot to do with the latter, the preemption sense of the term. Furthermore,
I think the function should also be called when calling schedule() for
sleep. I think it would be better to change the function name to
something else to prevent confusion, but it's up to Paul. :)


explicitly called, it's still sort of involuntary in the sense it's not called
into the scheduler for sleeping, but rather for seeing if something else can
run instead (a preemption point). In fact, none of the task deactivation in the
__schedule loop will run if cond_resched is used.

I agree that if schedule was called directly but with TASK_RUNNING=1, then
that could probably be classified an involuntary switch too...

Also since we're deciding to call rcu_note_voluntary_context_switch_lite
unconditionally, then IMO this comment on that macro:

/*
  * Note a voluntary context switch for RCU-tasks benefit.  This is a
  * macro rather than an inline function to avoid #include hell.
  */
  #ifdef CONFIG_TASKS_RCU
  #define rcu_note_voluntary_context_switch_lite(t)

Should be changed to:

/*
  * Note an attempt to perform a voluntary context switch for RCU-tasks
  * benefit.  This is called even in situations where a context switch
  * didn't really happen even though it was requested. This is a
  * macro rather than an inline function to avoid #include hell.
  */
  #ifdef CONFIG_TASKS_RCU
  #define rcu_note_voluntary_context_switch_lite(t)

Right?

Correct me if I'm wrong about anything, thanks,

- Joel




--
Thanks,
Byungchul

Re: [PATCH -resend 01/27] linkage: new macros for assembler symbols

2018-05-13 Thread Randy Dunlap

On 05/10/2018 01:06 AM, Jiri Slaby wrote:

> ---
>  Documentation/asm-annotations.rst | 218 
>  arch/x86/include/asm/linkage.h|  10 +-
>  include/linux/linkage.h   | 257 
> --
>  3 files changed, 475 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/asm-annotations.rst
> 
> diff --git a/Documentation/asm-annotations.rst 
> b/Documentation/asm-annotations.rst
> new file mode 100644
> index ..3e9b426347f0
> --- /dev/null
> +++ b/Documentation/asm-annotations.rst
> @@ -0,0 +1,218 @@
> +Assembler Annotations
> +=
> +
> +Copyright (c) 2017 Jiri Slaby

[snip]

> +This is not only important for debugging purposes. When we have properly
> +marked objects like this, we can run tools on them and let the tools generate
> +more useful information. In particular, on properly marked objects, we can 
> run
> +``objtool`` and let it check and fix the object if needed. Currently, it can
> +report missing frame pointer setup/destruction in functions. It can also
> +automatically generate annotations for *ORC unwinder* (cf.
> +) for most code. Both of this is

Both of these are

> +especially important to support reliable stack traces which are in turn
> +necessary for *Kernel live patching* (see
> +).
> +
> +Caveat and Discussion
> +-
> +As one might realize, there were only three macros previously. That is indeed
> +insufficient to cover all the combinations of cases:
> +
> +* standard/non-standard function
> +* code/data
> +* global/local symbol
> +
> +We had a discussion_ and instead of extending the current ``ENTRY/END*``
> +macros, it was decided that we shoould introduce brand new macros instead::

  should

> +
> +So how about using macro names that actually show the purpose, instead
> +of importing all the crappy, historic, essentially randomly chosen
> +debug symbol macro names from the binutils and older kernels?
> +
> +.. _discussion: https://marc.info/?i=20170217104757.28588-1-jslaby%40suse.cz
> +
> +Macros Description
> +--
> +
> +The new macros are prefixed with the ``SYM_`` prefix and can be divided into
> +three main groups:
> +
> +1. ``SYM_FUNC_*`` -- to annotate C-like functions. This means functions with
> +   standard C calling conventions, i.e. the stack contains a return address 
> at
> +   the predefined place and a return from the function can happen in a
> +   standard way. When frame pointers are enabled, save/restore of frame
> +   pointer shall happen at the start/end of a function, respectively, too.
> +
> +   Checking tools like ``objtool`` should ensure such marked functions 
> conform
> +   to these rules. The tools can also easily annotate these functions with
> +   debugging information (like *ORC data*) automatically.
> +
> +2. ``SYM_CODE_*`` -- special functions called with special stack. Be it
> +   interrupt handlers with special stack content, trampolines, or startup
> +   functions.
> +
> +   Checking tools mostly ignore checking of these functions. But some debug
> +   information still can be generated automatically. For correct debug data,
> +   this code needs hints like ``UNWIND_HINT_REGS`` provided by developers.
> +
> +3. ``SYM_DATA*`` -- obviosly data belonging to ``.data`` sections and not to

   obviously

> +   ``.text``. Data do not contain instructions, so they have to be treated
> +   specially by the tools: they should not treat the bytes as instructions,
> +   neither assign any debug information to them.

  nor assign

> +
> +Instruction Macros
> +~~
> +This section covers ``SYM_FUNC_*`` and ``SYM_CODE_*`` enumerated above.
> +

[snip]

> +
> +Data Macros
> +~~~
> +Similar to instructions, we have a couple of macros to describe data in the
> +assembly. Again, they help debuggers to understand the layout of the 
> resulting
> +object files.
> +
> +* ``SYM_DATA_START`` and ``SYM_DATA_START_LOCAL`` mark the start of some data
> +  and shall be in couple with either ``SYM_DATA_END``, or

(maybe:) and shall be used in conjunction with either

> +  ``SYM_DATA_END_LABEL``. The latter adds also a label to the end, so that
> +  people can use ``lstack`` and (local) ``lstack_end`` in the following
> +  example::
> +
> +SYM_DATA_START_LOCAL(lstack)
> +.skip 4096
> +SYM_DATA_END_LABEL(lstack, SYM_L_LOCAL, lstack_end)
> +
> +* ``SYM_DATA`` and ``SYM_DATA_LOCAL`` are variants for simple, mostly 
> one-line
> +  data::
> +
> +SYM_DATA(HEAP, .long rm_heap)
> +SYM_DATA(heap_end, .long rm_stack)
> +
> +  In the end, they expand to ``SYM_DATA_START`` with ``SYM_DATA_END``
> +  internally.
> +
> +Support Macros
> +~~
> +All the above reduce themselves to some invocation of ``SYM_START``,
> +``SYM_END``, or ``SYM_ENTRY`` at last. Normally,

Re: [PATCH] rcu: Report a quiescent state when it's exactly in the state

2018-05-13 Thread Byungchul Park




On 2018-05-13 오전 2:26, Steven Rostedt wrote:

On Sat, 12 May 2018 07:41:19 -0700
"Paul E. McKenney"  wrote:


Don't get me wrong, this discussion was quite useful to me.  We probably
need to at least change the comments, and perhaps the code as well.  But
I agree that we need input from Peter and Steven to make much more forward
progress.


It's the weekend so I skimmed more than read this thread, but I will
just add this.

The table Joel posted is interesting, and perhaps we should keep things
consistent with that. But that said, with respect to task-RCU, as
nothing on a trampoline should ever call cond_resched() (and perhaps I
should add code in lockdep that verifies this), we just want a
quiescent state that tells us that the task has left the trampoline. A
cond_resched() should be one of those points that does.

It really has nothing to do with scheduling or preemption. The issue is
that if a task is on a trampoline and gets preempted, there's no
knowing when it is off that trampoline where we can free it. We need to
have places in the kernel that we know is a quiescent state to move
task-RCU forward. cond_resched() seems to be one of them. schedule
itself can not be, because it can be called from an interrupt preempting
a task while it is on the trampoline.


Exactly. I think Steven explained how we should consider them exactly.


-- Steve

--
Thanks,
Byungchul

[PATCH 02/10] autofs4 - use autofs instead of autofs4 everywhere

2018-05-13 Thread Ian Kent

Update naming within autofs source to be consistent by changing occurrences
of autofs4 to autofs.

Signed-off-by: Ian Kent 
---
 fs/autofs4/autofs_i.h  |   88 
 fs/autofs4/dev-ioctl.c |   18 ++-
 fs/autofs4/expire.c|  132 ---
 fs/autofs4/init.c  |   12 +-
 fs/autofs4/inode.c |   48 -
 fs/autofs4/root.c  |  271 
 fs/autofs4/symlink.c   |   16 +--
 fs/autofs4/waitq.c |   53 +
 8 files changed, 319 insertions(+), 319 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 01636f3945d5..9110b66c7ef1 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -122,44 +122,44 @@ struct autofs_sb_info {
struct rcu_head rcu;
 };
 
-static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
+static inline struct autofs_sb_info *autofs_sbi(struct super_block *sb)
 {
return (struct autofs_sb_info *)(sb->s_fs_info);
 }
 
-static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry)
+static inline struct autofs_info *autofs_dentry_ino(struct dentry *dentry)
 {
return (struct autofs_info *)(dentry->d_fsdata);
 }
 
-/* autofs4_oz_mode(): do we see the man behind the curtain?  (The
+/* autofs_oz_mode(): do we see the man behind the curtain?  (The
  * processes which do manipulations for us in user space sees the raw
  * filesystem without "magic".)
  */
-static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
+static inline int autofs_oz_mode(struct autofs_sb_info *sbi)
 {
return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
 }
 
-struct inode *autofs4_get_inode(struct super_block *, umode_t);
-void autofs4_free_ino(struct autofs_info *);
+struct inode *autofs_get_inode(struct super_block *, umode_t);
+void autofs_free_ino(struct autofs_info *);
 
 /* Expiration */
-int is_autofs4_dentry(struct dentry *);
-int autofs4_expire_wait(const struct path *path, int rcu_walk);
-int autofs4_expire_run(struct super_block *, struct vfsmount *,
-  struct autofs_sb_info *,
-  struct autofs_packet_expire __user *);
-int autofs4_do_expire_multi(struct super_block *sb, struct vfsmount *mnt,
-   struct autofs_sb_info *sbi, int when);
-int autofs4_expire_multi(struct super_block *, struct vfsmount *,
-struct autofs_sb_info *, int __user *);
-struct dentry *autofs4_expire_direct(struct super_block *sb,
-struct vfsmount *mnt,
-struct autofs_sb_info *sbi, int how);
-struct dentry *autofs4_expire_indirect(struct super_block *sb,
-  struct vfsmount *mnt,
-  struct autofs_sb_info *sbi, int how);
+int is_autofs_dentry(struct dentry *);
+int autofs_expire_wait(const struct path *path, int rcu_walk);
+int autofs_expire_run(struct super_block *, struct vfsmount *,
+ struct autofs_sb_info *,
+ struct autofs_packet_expire __user *);
+int autofs_do_expire_multi(struct super_block *sb, struct vfsmount *mnt,
+  struct autofs_sb_info *sbi, int when);
+int autofs_expire_multi(struct super_block *, struct vfsmount *,
+   struct autofs_sb_info *, int __user *);
+struct dentry *autofs_expire_direct(struct super_block *sb,
+   struct vfsmount *mnt,
+   struct autofs_sb_info *sbi, int how);
+struct dentry *autofs_expire_indirect(struct super_block *sb,
+ struct vfsmount *mnt,
+ struct autofs_sb_info *sbi, int how);
 
 /* Device node initialization */
 
@@ -168,11 +168,11 @@ void autofs_dev_ioctl_exit(void);
 
 /* Operations structures */
 
-extern const struct inode_operations autofs4_symlink_inode_operations;
-extern const struct inode_operations autofs4_dir_inode_operations;
-extern const struct file_operations autofs4_dir_operations;
-extern const struct file_operations autofs4_root_operations;
-extern const struct dentry_operations autofs4_dentry_operations;
+extern const struct inode_operations autofs_symlink_inode_operations;
+extern const struct inode_operations autofs_dir_inode_operations;
+extern const struct file_operations autofs_dir_operations;
+extern const struct file_operations autofs_root_operations;
+extern const struct dentry_operations autofs_dentry_operations;
 
 /* VFS automount flags management functions */
 static inline void __managed_dentry_set_managed(struct dentry *dentry)
@@ -201,9 +201,9 @@ static inline void managed_dentry_clear_managed(struct 
dentry *dentry)
 
 /* Initializing function */
 
-int autofs4_fill_super(struct super_block *, void *, int);
-struct autofs_info *autofs4_new_ino(struct autofs_sb_info *);
-void autofs4_clean_ino(struct

[PATCH 07/10] autofs - delete fs/autofs4 source files

2018-05-13 Thread Ian Kent

Delete the now unused autofs4 module files.

Signed-off-by: Ian Kent 
---
 fs/autofs4/autofs_i.h  |  273 --
 fs/autofs4/dev-ioctl.c |  761 ---
 fs/autofs4/expire.c|  632 
 fs/autofs4/init.c  |   48 --
 fs/autofs4/inode.c |  375 ---
 fs/autofs4/root.c  |  942 
 fs/autofs4/symlink.c   |   29 -
 fs/autofs4/waitq.c |  559 
 8 files changed, 3619 deletions(-)
 delete mode 100644 fs/autofs4/autofs_i.h
 delete mode 100644 fs/autofs4/dev-ioctl.c
 delete mode 100644 fs/autofs4/expire.c
 delete mode 100644 fs/autofs4/init.c
 delete mode 100644 fs/autofs4/inode.c
 delete mode 100644 fs/autofs4/root.c
 delete mode 100644 fs/autofs4/symlink.c
 delete mode 100644 fs/autofs4/waitq.c

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
deleted file mode 100644
index 9110b66c7ef1..
--- a/fs/autofs4/autofs_i.h
+++ /dev/null
@@ -1,273 +0,0 @@
-/*
- *  Copyright 1997-1998 Transmeta Corporation - All Rights Reserved
- *  Copyright 2005-2006 Ian Kent 
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- */
-
-/* Internal header file for autofs */
-
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-/* This is the range of ioctl() numbers we claim as ours */
-#define AUTOFS_IOC_FIRST AUTOFS_IOC_READY
-#define AUTOFS_IOC_COUNT 32
-
-#define AUTOFS_DEV_IOCTL_IOC_FIRST (AUTOFS_DEV_IOCTL_VERSION)
-#define AUTOFS_DEV_IOCTL_IOC_COUNT \
-   (AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD - AUTOFS_DEV_IOCTL_VERSION_CMD)
-
-#ifdef pr_fmt
-#undef pr_fmt
-#endif
-#define pr_fmt(fmt) KBUILD_MODNAME ":pid:%d:%s: " fmt, current->pid, __func__
-
-/*
- * Unified info structure.  This is pointed to by both the dentry and
- * inode structures.  Each file in the filesystem has an instance of this
- * structure.  It holds a reference to the dentry, so dentries are never
- * flushed while the file exists.  All name lookups are dealt with at the
- * dentry level, although the filesystem can interfere in the validation
- * process.  Readdir is implemented by traversing the dentry lists.
- */
-struct autofs_info {
-   struct dentry   *dentry;
-   struct inode    *inode;
-
-   int flags;
-
-   struct completion expire_complete;
-
-   struct list_head active;
-   int active_count;
-
-   struct list_head expiring;
-
-   struct autofs_sb_info *sbi;
-   unsigned long last_used;
-   atomic_t count;
-
-   kuid_t uid;
-   kgid_t gid;
-};
-
-#define AUTOFS_INF_EXPIRING (1<<0) /* dentry in the process of expiring */
-#define AUTOFS_INF_WANT_EXPIRE (1<<1) /* the dentry is being considered
-   * for expiry, so RCU_walk is
-   * not permitted.  If it progresses to
-   * actual expiry attempt, the flag is
-   * not cleared when EXPIRING is set -
-   * in that case it gets cleared only
-   * when it comes to clearing EXPIRING.
-   */
-#define AUTOFS_INF_PENDING (1<<2) /* dentry pending mount */
-
-struct autofs_wait_queue {
-   wait_queue_head_t queue;
-   struct autofs_wait_queue *next;
-   autofs_wqt_t wait_queue_token;
-   /* We use the following to see what we are waiting for */
-   struct qstr name;
-   u32 dev;
-   u64 ino;
-   kuid_t uid;
-   kgid_t gid;
-   pid_t pid;
-   pid_t tgid;
-   /* This is for status reporting upon return */
-   int status;
-   unsigned int wait_ctr;
-};
-
-#define AUTOFS_SBI_MAGIC 0x6d4a556d
-
-struct autofs_sb_info {
-   u32 magic;
-   int pipefd;
-   struct file *pipe;
-   struct pid *oz_pgrp;
-   int catatonic;
-   int version;
-   int sub_version;
-   int min_proto;
-   int max_proto;
-   unsigned long exp_timeout;
-   unsigned int type;
-   struct super_block *sb;
-   struct mutex wq_mutex;
-   struct mutex pipe_mutex;
-   spinlock_t fs_lock;
-   struct autofs_wait_queue *queues; /* Wait queue pointer */
-   spinlock_t lookup_lock;
-   struct list_head active_list;
-   struct list_head expiring_list;
-   struct rcu_head rcu;
-};
-
-static inline struct autofs_sb_info *autofs_sbi(struct super_block *sb)
-{
-   return (struct autofs_sb_info *)(sb->s_fs_info);
-}
-
-static inline struct autofs_info *autofs_dentry_ino(struct dentry *dentry)
-{
-

[PATCH 10/10] autofs - update MAINTAINERS entry for autofs

2018-05-13 Thread Ian Kent

Update the autofs entry in MAINTAINERS to reflect the rename of
autofs4 to autofs.

Signed-off-by: Ian Kent 
---
 MAINTAINERS |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 58b9861ccf99..6189ff91fda7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7652,11 +7652,11 @@ W:  https://linuxtv.org
 S: Maintained
 F: drivers/media/radio/radio-keene*
 
-KERNEL AUTOMOUNTER v4 (AUTOFS4)
+KERNEL AUTOMOUNTER
 M: Ian Kent 
 L: aut...@vger.kernel.org
 S: Maintained
-F: fs/autofs4/
+F: fs/autofs/
 
 KERNEL BUILD + files below scripts/ (unless maintained elsewhere)
 M: Masahiro Yamada

[PATCH 05/10] autofs - update fs/autofs4/Kconfig

2018-05-13 Thread Ian Kent

Update Kconfig and add a deprecation warning.

Signed-off-by: Ian Kent 
---
 fs/autofs4/Kconfig |   32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/autofs4/Kconfig b/fs/autofs4/Kconfig
index 44727bf18297..53bc592a250d 100644
--- a/fs/autofs4/Kconfig
+++ b/fs/autofs4/Kconfig
@@ -1,5 +1,6 @@
 config AUTOFS4_FS
-   tristate "Kernel automounter version 4 support (also supports v3)"
+   tristate "Kernel automounter version 4 support (also supports v3 and v5)"
+   default n
help
  The automounter is a tool to automatically mount remote file systems
  on demand. This implementation is partially kernel-based to reduce
@@ -7,14 +8,25 @@ config AUTOFS4_FS
  automounter (amd), which is a pure user space daemon.
 
  To use the automounter you need the user-space tools from
- ; you also
- want to answer Y to "NFS file system support", below.
+ ; you also want
+ to answer Y to "NFS file system support", below.
 
- To compile this support as a module, choose M here: the module will be
- called autofs4.  You will need to add "alias autofs autofs4" to your
- modules configuration file.
+ This module is in the process of being renamed from autofs4 to
+ autofs. Since autofs is now the only module that provides the
+ autofs file system the module is not version 4 specific.
 
- If you are not a part of a fairly large, distributed network or
- don't have a laptop which needs to dynamically reconfigure to the
- local network, you probably do not need an automounter, and can say
- N here.
+ The autofs4 module is now built from the source located in
+ fs/autofs. The autofs4 directory and its configuration entry
+ will be removed two kernel versions from the inclusion of this
+ change.
+
+ Changes that will need to be made should be limited to:
+ - source include statements should be changed from autofs_fs4.h to
+   autofs_fs.h since these two header files have been merged.
+ - user space scripts that manually load autofs4.ko should be
+   changed to load autofs.ko. But since the module directory name
+   and the module name are the same as the file system name there
+   is no need to manually load module.
+ - any "alias autofs autofs4" will need to be removed.
+
+ Please configure AUTOFS_FS instead of AUTOFS4_FS from now on.

[PATCH 09/10] autofs - use autofs instead of autofs4 in documentation

2018-05-13 Thread Ian Kent

Finally remove autofs4 references in the filesystems documentation.

Signed-off-by: Ian Kent 
---
 Documentation/filesystems/00-INDEX |4 ++--
 Documentation/filesystems/autofs-mount-control.txt |8 
 Documentation/filesystems/autofs.txt   |   10 +-
 Documentation/filesystems/automount-support.txt|2 +-
 Documentation/filesystems/path-lookup.md   |2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index b7bd6c9009cc..a8bd4af7fbce 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -10,8 +10,8 @@ afs.txt
- info and examples for the distributed AFS (Andrew File System) fs.
 affs.txt
- info and mount options for the Amiga Fast File System.
-autofs4-mount-control.txt
-   - info on device control operations for autofs4 module.
+autofs-mount-control.txt
+   - info on device control operations for autofs module.
 automount-support.txt
- information about filesystem automount support.
 befs.txt
diff --git a/Documentation/filesystems/autofs-mount-control.txt b/Documentation/filesystems/autofs-mount-control.txt
index e5177cb31a04..6eba86e1ac72 100644
--- a/Documentation/filesystems/autofs-mount-control.txt
+++ b/Documentation/filesystems/autofs-mount-control.txt
@@ -1,5 +1,5 @@
 
-Miscellaneous Device control operations for the autofs4 kernel module
+Miscellaneous Device control operations for the autofs kernel module
 
 
 The problem
@@ -164,7 +164,7 @@ possibility for future development due to the requirements of the
 message bus architecture.
 
 
-autofs4 Miscellaneous Device mount control interface
+autofs Miscellaneous Device mount control interface
 
 
 The control interface is opening a device node, typically /dev/autofs.
@@ -244,7 +244,7 @@ The device node ioctl operations implemented by this interface are:
 AUTOFS_DEV_IOCTL_VERSION
 
 
-Get the major and minor version of the autofs4 device ioctl kernel module
+Get the major and minor version of the autofs device ioctl kernel module
 implementation. It requires an initialized struct autofs_dev_ioctl as an
 input parameter and sets the version information in the passed in structure.
 It returns 0 on success or the error -EINVAL if a version mismatch is
@@ -254,7 +254,7 @@ detected.
 AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD
 --
 
-Get the major and minor version of the autofs4 protocol version understood
+Get the major and minor version of the autofs protocol version understood
 by loaded module. This call requires an initialized struct autofs_dev_ioctl
 with the ioctlfd field set to a valid autofs mount point descriptor
 and sets the requested version number in version field of struct args_protover
diff --git a/Documentation/filesystems/autofs.txt b/Documentation/filesystems/autofs.txt
index f10dd590f69f..373ad25852d3 100644
--- a/Documentation/filesystems/autofs.txt
+++ b/Documentation/filesystems/autofs.txt
@@ -30,15 +30,15 @@ key advantages:
 Context
 ---
 
-The "autofs4" filesystem module is only one part of an autofs system.
+The "autofs" filesystem module is only one part of an autofs system.
 There also needs to be a user-space program which looks up names
 and mounts filesystems.  This will often be the "automount" program,
-though other tools including "systemd" can make use of "autofs4".
+though other tools including "systemd" can make use of "autofs".
 This document describes only the kernel module and the interactions
 required with any user-space program.  Subsequent text refers to this
 as the "automount daemon" or simply "the daemon".
 
-"autofs4" is a Linux kernel module with provides the "autofs"
+"autofs" is a Linux kernel module with provides the "autofs"
 filesystem type.  Several "autofs" filesystems can be mounted and they
 can each be managed separately, or all managed by the same daemon.
 
@@ -215,7 +215,7 @@ of expiry.
 The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
 the `umount` system call.  Unmounting with MNT_EXPIRE will fail unless
 a previous attempt had been made, and the filesystem has been inactive
-and untouched since that previous attempt.  autofs4 does not depend on
+and untouched since that previous attempt.  autofs does not depend on
 this but has its own internal tracking of whether filesystems were
 recently used.  This allows individual names in the autofs directory
 to expire separately.
@@ -415,7 +415,7 @@ which can be used to communicate directly with the autofs filesystem.
 It requires CAP_SYS_ADMIN for access.
 
 The `ioctl`s that can be used on this device are described in a separate
-document

[PATCH 04/10] autofs - create autofs Kconfig and Makefile

2018-05-13 Thread Ian Kent

Create Makefile and Kconfig for autofs module.

Signed-off-by: Ian Kent 
---
 fs/Kconfig |1 +
 fs/Makefile|1 +
 fs/autofs/Kconfig  |   20 
 fs/autofs/Makefile |7 +++
 4 files changed, 29 insertions(+)
 create mode 100644 fs/autofs/Kconfig
 create mode 100644 fs/autofs/Makefile

diff --git a/fs/Kconfig b/fs/Kconfig
index bc821a86d965..e712e62afe59 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -108,6 +108,7 @@ source "fs/notify/Kconfig"
 
 source "fs/quota/Kconfig"
 
+source "fs/autofs/Kconfig"
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index c9375fd2c8c4..2e005525cc19 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -102,6 +102,7 @@ obj-$(CONFIG_AFFS_FS)   += affs/
 obj-$(CONFIG_ROMFS_FS) += romfs/
 obj-$(CONFIG_QNX4FS_FS)+= qnx4/
 obj-$(CONFIG_QNX6FS_FS)+= qnx6/
+obj-$(CONFIG_AUTOFS_FS)+= autofs/
 obj-$(CONFIG_AUTOFS4_FS)   += autofs4/
 obj-$(CONFIG_ADFS_FS)  += adfs/
 obj-$(CONFIG_FUSE_FS)  += fuse/
diff --git a/fs/autofs/Kconfig b/fs/autofs/Kconfig
new file mode 100644
index ..6a2064eb3b27
--- /dev/null
+++ b/fs/autofs/Kconfig
@@ -0,0 +1,20 @@
+config AUTOFS_FS
+   tristate "Kernel automounter support (supports v3, v4 and v5)"
+   default n
+   help
+  The automounter is a tool to automatically mount remote file systems
+  on demand. This implementation is partially kernel-based to reduce
+  overhead in the already-mounted case; this is unlike the BSD
+  automounter (amd), which is a pure user space daemon.
+
+  To use the automounter you need the user-space tools from
+  ; you also want
+  to answer Y to "NFS file system support", below.
+
+  To compile this support as a module, choose M here: the module will be
+  called autofs.
+
+  If you are not a part of a fairly large, distributed network or
+  don't have a laptop which needs to dynamically reconfigure to the
+  local network, you probably do not need an automounter, and can say
+  N here.
diff --git a/fs/autofs/Makefile b/fs/autofs/Makefile
new file mode 100644
index ..43fedde15c26
--- /dev/null
+++ b/fs/autofs/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the linux autofs-filesystem routines.
+#
+
+obj-$(CONFIG_AUTOFS_FS) += autofs.o
+
+autofs-objs := init.o inode.o root.o symlink.o waitq.o expire.o dev-ioctl.o

[PATCH 08/10] autofs - rename autofs documentation files

2018-05-13 Thread Ian Kent

There are two files in Documentation/filesystems that should now
use autofs rather than autofs4 in their names.

Signed-off-by: Ian Kent 
---
 Documentation/filesystems/autofs-mount-control.txt |  407 +++
 Documentation/filesystems/autofs.txt   |  529 
 .../filesystems/autofs4-mount-control.txt  |  407 ---
 Documentation/filesystems/autofs4.txt  |  529 
 4 files changed, 936 insertions(+), 936 deletions(-)
 create mode 100644 Documentation/filesystems/autofs-mount-control.txt
 create mode 100644 Documentation/filesystems/autofs.txt
 delete mode 100644 Documentation/filesystems/autofs4-mount-control.txt
 delete mode 100644 Documentation/filesystems/autofs4.txt

diff --git a/Documentation/filesystems/autofs-mount-control.txt b/Documentation/filesystems/autofs-mount-control.txt
new file mode 100644
index ..e5177cb31a04
--- /dev/null
+++ b/Documentation/filesystems/autofs-mount-control.txt
@@ -0,0 +1,407 @@
+
+Miscellaneous Device control operations for the autofs4 kernel module
+
+
+The problem
+===
+
+There is a problem with active restarts in autofs (that is to say
+restarting autofs when there are busy mounts).
+
+During normal operation autofs uses a file descriptor opened on the
+directory that is being managed in order to be able to issue control
+operations. Using a file descriptor gives ioctl operations access to
+autofs specific information stored in the super block. The operations
+are things such as setting an autofs mount catatonic, setting the
+expire timeout and requesting expire checks. As is explained below,
+certain types of autofs triggered mounts can end up covering an autofs
+mount itself which prevents us being able to use open(2) to obtain a
+file descriptor for these operations if we don't already have one open.
+
+Currently autofs uses "umount -l" (lazy umount) to clear active mounts
+at restart. While using lazy umount works for most cases, anything that
+needs to walk back up the mount tree to construct a path, such as
+getcwd(2) and the proc file system /proc//cwd, no longer works
+because the point from which the path is constructed has been detached
+from the mount tree.
+
+The actual problem with autofs is that it can't reconnect to existing
+mounts. Immediately one thinks of just adding the ability to remount
+autofs file systems would solve it, but alas, that can't work. This is
+because autofs direct mounts and the implementation of "on demand mount
+and expire" of nested mount trees have the file system mounted directly
+on top of the mount trigger directory dentry.
+
+For example, there are two types of automount maps, direct (in the kernel
+module source you will see a third type called an offset, which is just
+a direct mount in disguise) and indirect.
+
+Here is a master map with direct and indirect map entries:
+
+/-  /etc/auto.direct
+/test   /etc/auto.indirect
+
+and the corresponding map files:
+
+/etc/auto.direct:
+
+/automount/dparse/g6  budgie:/autofs/export1
+/automount/dparse/g1  shark:/autofs/export1
+and so on.
+
+/etc/auto.indirect:
+
+g1  shark:/autofs/export1
+g6  budgie:/autofs/export1
+and so on.
+
+For the above indirect map an autofs file system is mounted on /test and
+mounts are triggered for each sub-directory key by the inode lookup
+operation. So we see a mount of shark:/autofs/export1 on /test/g1, for
+example.
+
+The way that direct mounts are handled is by making an autofs mount on
+each full path, such as /automount/dparse/g1, and using it as a mount
+trigger. So when we walk on the path we mount shark:/autofs/export1 "on
+top of this mount point". Since these are always directories we can
+use the follow_link inode operation to trigger the mount.
+
+But, each entry in direct and indirect maps can have offsets (making
+them multi-mount map entries).
+
+For example, an indirect mount map entry could also be:
+
+g1  \
+   /    shark:/autofs/export5/testing/test \
+   /s1  shark:/autofs/export/testing/test/s1 \
+   /s2  shark:/autofs/export5/testing/test/s2 \
+   /s1/ss1  shark:/autofs/export1 \
+   /s2/ss2  shark:/autofs/export2
+
+and a similarly a direct mount map entry could also be:
+
+/automount/dparse/g1 \
+/   shark:/autofs/export5/testing/test \
+/s1 shark:/autofs/export/testing/test/s1 \
+/s2 shark:/autofs/export5/testing/test/s2 \
+/s1/ss1 shark:/autofs/export2 \
+/s2/ss2 shark:/autofs/export2
+
+One of the issues with version 4 of autofs was that, when mounting an
+entry with a large number of offsets, possibly with nesting, we needed
+to mount and umount all of the offsets as a single unit. Not really a
+problem, except for people with a large number of offsets in map entries.
+This mechanism is used for the well known "hosts" map and we have seen
+cases (in 2.4) where the available

Re: [PATCH 01/10] autofs4 - merge auto_fs.h and auto_fs4.h

2018-05-13 Thread Al Viro

On Mon, May 14, 2018 at 11:03:50AM +0800, Ian Kent wrote:
> The autofs module has long since been removed so there's no need to have
> two separate include files for autofs.

Umm...  Why does fs/compat_ioctl.c need either include, actually?

> --- a/fs/compat_ioctl.c
> +++ b/fs/compat_ioctl.c
> @@ -39,7 +39,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 

AFAICS, we can just delete both.  Matter of fact, a *lot* of those includes are
pointless nowadays...

[PATCH 03/10] autofs - copy autofs4 to autofs

2018-05-13 Thread Ian Kent

Copy source files from the autofs4 directory to the autofs directory.

Signed-off-by: Ian Kent 
---
 fs/autofs/autofs_i.h  |  273 ++
 fs/autofs/dev-ioctl.c |  761 
 fs/autofs/expire.c    |  632 +
 fs/autofs/init.c      |   48 ++
 fs/autofs/inode.c     |  375
 fs/autofs/root.c      |  942 +
 fs/autofs/symlink.c   |   29 ++
 fs/autofs/waitq.c     |  559 +
 8 files changed, 3619 insertions(+)
 create mode 100644 fs/autofs/autofs_i.h
 create mode 100644 fs/autofs/dev-ioctl.c
 create mode 100644 fs/autofs/expire.c
 create mode 100644 fs/autofs/init.c
 create mode 100644 fs/autofs/inode.c
 create mode 100644 fs/autofs/root.c
 create mode 100644 fs/autofs/symlink.c
 create mode 100644 fs/autofs/waitq.c

diff --git a/fs/autofs/autofs_i.h b/fs/autofs/autofs_i.h
new file mode 100644
index ..9110b66c7ef1
--- /dev/null
+++ b/fs/autofs/autofs_i.h
@@ -0,0 +1,273 @@
+/*
+ *  Copyright 1997-1998 Transmeta Corporation - All Rights Reserved
+ *  Copyright 2005-2006 Ian Kent 
+ *
+ * This file is part of the Linux kernel and is made available under
+ * the terms of the GNU General Public License, version 2, or at your
+ * option, any later version, incorporated herein by reference.
+ */
+
+/* Internal header file for autofs */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* This is the range of ioctl() numbers we claim as ours */
+#define AUTOFS_IOC_FIRST AUTOFS_IOC_READY
+#define AUTOFS_IOC_COUNT 32
+
+#define AUTOFS_DEV_IOCTL_IOC_FIRST (AUTOFS_DEV_IOCTL_VERSION)
+#define AUTOFS_DEV_IOCTL_IOC_COUNT \
+   (AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD - AUTOFS_DEV_IOCTL_VERSION_CMD)
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+#define pr_fmt(fmt) KBUILD_MODNAME ":pid:%d:%s: " fmt, current->pid, __func__
+
+/*
+ * Unified info structure.  This is pointed to by both the dentry and
+ * inode structures.  Each file in the filesystem has an instance of this
+ * structure.  It holds a reference to the dentry, so dentries are never
+ * flushed while the file exists.  All name lookups are dealt with at the
+ * dentry level, although the filesystem can interfere in the validation
+ * process.  Readdir is implemented by traversing the dentry lists.
+ */
+struct autofs_info {
+   struct dentry   *dentry;
+   struct inode    *inode;
+
+   int flags;
+
+   struct completion expire_complete;
+
+   struct list_head active;
+   int active_count;
+
+   struct list_head expiring;
+
+   struct autofs_sb_info *sbi;
+   unsigned long last_used;
+   atomic_t count;
+
+   kuid_t uid;
+   kgid_t gid;
+};
+
+#define AUTOFS_INF_EXPIRING (1<<0) /* dentry in the process of expiring */
+#define AUTOFS_INF_WANT_EXPIRE (1<<1) /* the dentry is being considered
+   * for expiry, so RCU_walk is
+   * not permitted.  If it progresses to
+   * actual expiry attempt, the flag is
+   * not cleared when EXPIRING is set -
+   * in that case it gets cleared only
+   * when it comes to clearing EXPIRING.
+   */
+#define AUTOFS_INF_PENDING (1<<2) /* dentry pending mount */
+
+struct autofs_wait_queue {
+   wait_queue_head_t queue;
+   struct autofs_wait_queue *next;
+   autofs_wqt_t wait_queue_token;
+   /* We use the following to see what we are waiting for */
+   struct qstr name;
+   u32 dev;
+   u64 ino;
+   kuid_t uid;
+   kgid_t gid;
+   pid_t pid;
+   pid_t tgid;
+   /* This is for status reporting upon return */
+   int status;
+   unsigned int wait_ctr;
+};
+
+#define AUTOFS_SBI_MAGIC 0x6d4a556d
+
+struct autofs_sb_info {
+   u32 magic;
+   int pipefd;
+   struct file *pipe;
+   struct pid *oz_pgrp;
+   int catatonic;
+   int version;
+   int sub_version;
+   int min_proto;
+   int max_proto;
+   unsigned long exp_timeout;
+   unsigned int type;
+   struct super_block *sb;
+   struct mutex wq_mutex;
+   struct mutex pipe_mutex;
+   spinlock_t fs_lock;
+   struct autofs_wait_queue *queues; /* Wait queue pointer */
+   spinlock_t lookup_lock;
+   struct list_head active_list;
+   struct list_head expiring_list;
+   struct rcu_head rcu;
+};
+
+static inline struct autofs_sb_info *autofs_sbi(struct super_block *sb)
+{
+   return (struct autofs_sb_info *)(sb->s_fs_info);
+}
+
+static inline struct autofs_info *autofs_dentry_ino(struct dentry *dentry)

[PATCH 01/10] autofs4 - merge auto_fs.h and auto_fs4.h

2018-05-13 Thread Ian Kent

The autofs module has long since been removed so there's no need to have
two separate include files for autofs.

Signed-off-by: Ian Kent 
---
 fs/autofs4/autofs_i.h |2 
 fs/compat_ioctl.c |1 
 include/uapi/linux/auto_fs.h  |  169 ++---
 include/uapi/linux/auto_fs4.h |  153 +
 4 files changed, 161 insertions(+), 164 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 4737615f0eaa..01636f3945d5 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -9,7 +9,7 @@
 
 /* Internal header file for autofs */
 
-#include 
+#include 
 #include 
 
 #include 
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index ef80085ed564..b3e1768b636e 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -39,7 +39,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/include/uapi/linux/auto_fs.h b/include/uapi/linux/auto_fs.h
index 2a4432c7a4b4..e13eec3dfb2f 100644
--- a/include/uapi/linux/auto_fs.h
+++ b/include/uapi/linux/auto_fs.h
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
 /*
- *   Copyright 1997 Transmeta Corporation - All Rights Reserved
+ * Copyright 1997 Transmeta Corporation - All Rights Reserved
+ * Copyright 1999-2000 Jeremy Fitzhardinge 
+ * Copyright 2005-2006,2013,2017-2018 Ian Kent 
  *
  * This file is part of the Linux kernel and is made available under
  * the terms of the GNU General Public License, version 2, or at your
@@ -8,7 +10,6 @@
  *
  * --- */
 
-
 #ifndef _UAPI_LINUX_AUTO_FS_H
 #define _UAPI_LINUX_AUTO_FS_H
 
@@ -18,13 +19,11 @@
 #include 
 #endif /* __KERNEL__ */
 
+#define AUTOFS_PROTO_VERSION   5
+#define AUTOFS_MIN_PROTO_VERSION   3
+#define AUTOFS_MAX_PROTO_VERSION   5
 
-/* This file describes autofs v3 */
-#define AUTOFS_PROTO_VERSION   3
-
-/* Range of protocol versions defined */
-#define AUTOFS_MAX_PROTO_VERSION   AUTOFS_PROTO_VERSION
-#define AUTOFS_MIN_PROTO_VERSION   AUTOFS_PROTO_VERSION
+#define AUTOFS_PROTO_SUBVERSION 2
 
 /*
  * The wait_queue_token (autofs_wqt_t) is part of a structure which is passed
@@ -76,9 +75,155 @@ enum {
 #define AUTOFS_IOC_READY _IO(AUTOFS_IOCTL, AUTOFS_IOC_READY_CMD)
 #define AUTOFS_IOC_FAIL _IO(AUTOFS_IOCTL, AUTOFS_IOC_FAIL_CMD)
 #define AUTOFS_IOC_CATATONIC _IO(AUTOFS_IOCTL, AUTOFS_IOC_CATATONIC_CMD)
-#define AUTOFS_IOC_PROTOVER _IOR(AUTOFS_IOCTL, AUTOFS_IOC_PROTOVER_CMD, int)
-#define AUTOFS_IOC_SETTIMEOUT32 _IOWR(AUTOFS_IOCTL, AUTOFS_IOC_SETTIMEOUT_CMD, compat_ulong_t)
-#define AUTOFS_IOC_SETTIMEOUT   _IOWR(AUTOFS_IOCTL, AUTOFS_IOC_SETTIMEOUT_CMD, unsigned long)
-#define AUTOFS_IOC_EXPIRE   _IOR(AUTOFS_IOCTL, AUTOFS_IOC_EXPIRE_CMD, struct autofs_packet_expire)
+#define AUTOFS_IOC_PROTOVER _IOR(AUTOFS_IOCTL, \
+AUTOFS_IOC_PROTOVER_CMD, int)
+#define AUTOFS_IOC_SETTIMEOUT32 _IOWR(AUTOFS_IOCTL, \
+ AUTOFS_IOC_SETTIMEOUT_CMD, \
+ compat_ulong_t)
+#define AUTOFS_IOC_SETTIMEOUT   _IOWR(AUTOFS_IOCTL, \
+ AUTOFS_IOC_SETTIMEOUT_CMD, \
+ unsigned long)
+#define AUTOFS_IOC_EXPIRE   _IOR(AUTOFS_IOCTL, \
+AUTOFS_IOC_EXPIRE_CMD, \
+struct autofs_packet_expire)
+
+/* autofs version 4 and later definitions */
+
+/* Mask for expire behaviour */
+#define AUTOFS_EXP_IMMEDIATE   1
+#define AUTOFS_EXP_LEAVES  2
+
+#define AUTOFS_TYPE_ANY    0U
+#define AUTOFS_TYPE_INDIRECT   1U
+#define AUTOFS_TYPE_DIRECT 2U
+#define AUTOFS_TYPE_OFFSET 4U
+
+static inline void set_autofs_type_indirect(unsigned int *type)
+{
+   *type = AUTOFS_TYPE_INDIRECT;
+}
+
+static inline unsigned int autofs_type_indirect(unsigned int type)
+{
+   return (type == AUTOFS_TYPE_INDIRECT);
+}
+
+static inline void set_autofs_type_direct(unsigned int *type)
+{
+   *type = AUTOFS_TYPE_DIRECT;
+}
+
+static inline unsigned int autofs_type_direct(unsigned int type)
+{
+   return (type == AUTOFS_TYPE_DIRECT);
+}
+
+static inline void set_autofs_type_offset(unsigned int *type)
+{
+   *type = AUTOFS_TYPE_OFFSET;
+}
+
+static inline unsigned int autofs_type_offset(unsigned int type)
+{
+   return (type == AUTOFS_TYPE_OFFSET);
+}
+
+static inline unsigned int autofs_type_trigger(unsigned int type)
+{
+   return (type == AUTOFS_TYPE_DIRECT || type == AUTOFS_TYPE_OFFSET);
+}
+
+/*
+ * This isn't really a type as we use it to say "no type set" to
+ * indicate we want to search for "any" mount in the
+ * autofs_dev_ioctl_ismountpoint() device ioctl function.
+ */
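
A minimal user-space sketch of how the type helpers in the hunk above behave. The constants and helpers are copied from the patch so that the example builds without kernel headers; describe_type() is a hypothetical helper added only for illustration and is not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Copied from the hunk above so the sketch needs no kernel headers. */
#define AUTOFS_TYPE_ANY      0U
#define AUTOFS_TYPE_INDIRECT 1U
#define AUTOFS_TYPE_DIRECT   2U
#define AUTOFS_TYPE_OFFSET   4U

static inline void set_autofs_type_offset(unsigned int *type)
{
	*type = AUTOFS_TYPE_OFFSET;
}

static inline unsigned int autofs_type_indirect(unsigned int type)
{
	return (type == AUTOFS_TYPE_INDIRECT);
}

static inline unsigned int autofs_type_trigger(unsigned int type)
{
	return (type == AUTOFS_TYPE_DIRECT || type == AUTOFS_TYPE_OFFSET);
}

/* Hypothetical helper (not in the patch): name the class of a mount type. */
static const char *describe_type(unsigned int type)
{
	if (autofs_type_trigger(type))
		return "trigger";
	if (autofs_type_indirect(type))
		return "indirect";
	return "any/unknown";
}
```

Note that the helpers compare with ==, so although the values are powers of two, a combined value such as (AUTOFS_TYPE_DIRECT | AUTOFS_TYPE_OFFSET) matches none of them.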

Re: [RFC][PATCH 06/10] tracing: Add trigger file for trace_markers tracefs/ftrace/print

2018-05-13 Thread Namhyung Kim

On Fri, May 11, 2018 at 03:49:33PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" 
> 
> Allow writes to the trace_markers file to initiate triggers defined in the
> tracefs/ftrace/print/trigger file. This will allow user space to trigger
> the same type of triggers (including histograms) that the trace events use.
> 
> Cc: Tom Zanussi 
> Cc: Clark Williams 
> Cc: Karim Yaghmour 
> Cc: Brendan Gregg 
> Suggested-by: Joel Fernandes 
> Signed-off-by: Steven Rostedt (VMware) 
> ---

[SNIP]
> diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> index e3a658bac10f..de3fce14cd00 100644
> --- a/kernel/trace/trace_entries.h
> +++ b/kernel/trace/trace_entries.h
> @@ -230,7 +230,7 @@ FTRACE_ENTRY(bprint, bprint_entry,
>   FILTER_OTHER
>  );
>  
> -FTRACE_ENTRY(print, print_entry,
> +FTRACE_ENTRY_REG(print, print_entry,
>  
>   TRACE_PRINT,
>  
> @@ -242,7 +242,9 @@ FTRACE_ENTRY(print, print_entry,
>   F_printk("%ps: %s",
>(void *)__entry->ip, __entry->buf),
>  
> - FILTER_OTHER
> + FILTER_OTHER,
> +
> +  ftrace_event_register

I wonder whether this is still needed, since you added __find_event_file(),
which ignores the reg field.  Maybe I'm missing something.

Anyway, the hunk looks whitespace-damaged.

Thanks,
Namhyung


>  );
>  
>  FTRACE_ENTRY(raw_data, raw_data_entry,
> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index d842f1eadfe5..45630a76ed3a 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -14,6 +14,13 @@
>  
>  #include "trace_output.h"
>  
> +/* Stub function for events with triggers */
> +static int ftrace_event_register(struct trace_event_call *call,
> +  enum trace_reg type, void *data)
> +{
> + return 0;
> +}
> +
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM ftrace
>  
> -- 
> 2.17.0
> 
>

[PATCH 06/10] autofs - update fs/autofs4/Makefile

2018-05-13 Thread Ian Kent

Update Makefile to build from source in fs/autofs instead of
fs/autofs4.

Signed-off-by: Ian Kent 
---
 fs/autofs4/Makefile |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/autofs4/Makefile b/fs/autofs4/Makefile
index a811c1f7d9ab..417dd726d9ef 100644
--- a/fs/autofs4/Makefile
+++ b/fs/autofs4/Makefile
@@ -4,4 +4,6 @@
 
 obj-$(CONFIG_AUTOFS4_FS) += autofs4.o
 
-autofs4-objs := init.o inode.o root.o symlink.o waitq.o expire.o dev-ioctl.o
+autofs4-objs := ../autofs/init.o ../autofs/inode.o ../autofs/root.o \
+   ../autofs/symlink.o ../autofs/waitq.o ../autofs/expire.o \
+   ../autofs/dev-ioctl.o

[PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works

2018-05-13 Thread Joel Fernandes (Google)

rcu_seq_snap may be tricky for someone looking at it for the first time.
Let's document how it works with an example to make it easier.

Signed-off-by: Joel Fernandes (Google) 
---
 kernel/rcu/rcu.h | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 003671825d62..fc3170914ac7 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
WRITE_ONCE(*sp, rcu_seq_endval(sp));
 }
 
-/* Take a snapshot of the update side's sequence number. */
+/*
+ * Take a snapshot of the update side's sequence number.
+ *
+ * This function predicts what the grace period number will be the next
+ * time an RCU callback will be executed, given the current grace period's
+ * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
+ * already in progress.
+ *
+ * We do this with a single addition and masking.
+ * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
+ * the seq is used to track if a GP is in progress or not, it's sufficient if we
+ * add (2+1) and mask with ~1. Let's see why with an example:
+ *
+ * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
+ * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
+ * to account for the state bit. However, if the current seq is 7 (gp is 3 and
+ * state bit is 1), then it means the current grace period is already in
+ * progress so the next time the callback will run is at the end of grace
+ * period number gp+2. To account for the extra +1, we just overflow the LSB by
+ * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
+ * is idle), then the addition of the extra 0x1 and masking will have no
+ * effect. This is calculated as below.
+ */
 static inline unsigned long rcu_seq_snap(unsigned long *sp)
 {
unsigned long s;
-- 
2.17.0.441.gb46fe60e1d-goog
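
The arithmetic described in the new comment can be checked with a small userspace sketch. It assumes the comment's hypothetical one-bit state mask; the names DEMO_SEQ_STATE_MASK, demo_seq_snap and demo_seq_gp are made up for illustration and are not kernel symbols. The added constant is written as 2 * mask + 1, which is one reading of the "(2+1)" in the comment rather than a quote of the kernel source.

```c
#include <assert.h>

/* Hypothetical one-bit state mask, as in the comment's example. */
#define DEMO_SEQ_STATE_MASK 1UL

/* Sketch of the snapshot arithmetic: add (2 * mask + 1), then clear the state bit. */
static inline unsigned long demo_seq_snap(unsigned long seq)
{
	return (seq + 2 * DEMO_SEQ_STATE_MASK + 1) & ~DEMO_SEQ_STATE_MASK;
}

/* With a one-bit state field, the grace-period number is seq shifted right by one. */
static inline unsigned long demo_seq_gp(unsigned long seq)
{
	return seq >> 1;
}
```

With seq = 6 (gp 3, idle) the snapshot is 8, which is gp 4. With seq = 7 (gp 3, GP in progress) the snapshot is 10, which is gp 5.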

[PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed

2018-05-13 Thread Joel Fernandes (Google)

Currently the tree RCU clean up code records a CleanupMore trace event
even if the GP was already in progress. This makes CleanupMore show up
twice for no reason. Avoid it.

Signed-off-by: Joel Fernandes (Google) 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8401a253e7de..25c44328d071 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2083,7 +2083,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
rsp->gp_state = RCU_GP_IDLE;
/* Check for GP requests since above loop. */
rdp = this_cpu_ptr(rsp->rda);
-   if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
+   if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed,
  TPS("CleanupMore"));
needgp = true;
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU

2018-05-13 Thread Joel Fernandes (Google)

The 'c' variable was used previously to store the grace period
that is being requested. However, the name is not very meaningful to
a code reader, so this patch replaces it with gp_seq_start, indicating
that this is the grace period that was requested. Tracing is updated
to use the new name as well.

Just a clean up patch, no logical change.

Signed-off-by: Joel Fernandes (Google) 
---
 include/trace/events/rcu.h | 15 ++--
 kernel/rcu/tree.c  | 47 ++
 2 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index ce9d1a1cac78..539900a9f8c7 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -103,15 +103,16 @@ TRACE_EVENT(rcu_grace_period,
  */
 TRACE_EVENT(rcu_future_grace_period,
 
-   TP_PROTO(const char *rcuname, unsigned long gp_seq, unsigned long c,
-u8 level, int grplo, int grphi, const char *gpevent),
+   TP_PROTO(const char *rcuname, unsigned long gp_seq,
+unsigned long gp_seq_start, u8 level, int grplo, int grphi,
+const char *gpevent),
 
-   TP_ARGS(rcuname, gp_seq, c, level, grplo, grphi, gpevent),
+   TP_ARGS(rcuname, gp_seq, gp_seq_start, level, grplo, grphi, gpevent),
 
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gp_seq)
-   __field(unsigned long, c)
+   __field(unsigned long, gp_seq_start)
__field(u8, level)
__field(int, grplo)
__field(int, grphi)
@@ -121,7 +122,7 @@ TRACE_EVENT(rcu_future_grace_period,
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq = gp_seq;
-   __entry->c = c;
+   __entry->gp_seq_start = gp_seq_start;
__entry->level = level;
__entry->grplo = grplo;
__entry->grphi = grphi;
@@ -129,7 +130,7 @@ TRACE_EVENT(rcu_future_grace_period,
),
 
TP_printk("%s %lu %lu %u %d %d %s",
- __entry->rcuname, __entry->gp_seq, __entry->c, __entry->level,
+ __entry->rcuname, __entry->gp_seq, __entry->gp_seq_start, __entry->level,
  __entry->grplo, __entry->grphi, __entry->gpevent)
 );
 
@@ -751,7 +752,7 @@ TRACE_EVENT(rcu_barrier,
 #else /* #ifdef CONFIG_RCU_TRACE */
 
 #define trace_rcu_grace_period(rcuname, gp_seq, gpevent) do { } while (0)
-#define trace_rcu_future_grace_period(rcuname, gp_seq, c, \
+#define trace_rcu_future_grace_period(rcuname, gp_seq, gp_seq_start, \
  level, grplo, grphi, event) \
  do { } while (0)
 #define trace_rcu_grace_period_init(rcuname, gp_seq, level, grplo, grphi, \
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 29ccc60bdbfc..9f5679ba413b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1541,13 +1541,18 @@ void rcu_cpu_stall_reset(void)
 
 /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
 static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
- unsigned long c, const char *s)
+ unsigned long gp_seq_start, const char *s)
 {
-   trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, c,
+   trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, gp_seq_start,
  rnp->level, rnp->grplo, rnp->grphi, s);
 }
 
 /*
+ * rcu_start_this_gp - Request the start of a particular grace period
+ * @rnp: The leaf node of the CPU from which to start.
+ * @rdp: The rcu_data corresponding to the CPU from which to start.
+ * @gp_seq_start: The gp_seq of the grace period to start.
+ *
  * Start the specified grace period, as needed to handle newly arrived
  * callbacks.  The required future grace periods are recorded in each
  * rcu_node structure's ->gp_seq_needed field.  Returns true if there
@@ -1555,9 +1560,11 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
  *
  * The caller must hold the specified rcu_node structure's ->lock, which
  * is why the caller is responsible for waking the grace-period kthread.
+ *
+ * Returns true if the GP thread needs to be awakened else false.
  */
 static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
- unsigned long c)
+ unsigned long gp_seq_start)
 {
bool ret = false;
struct rcu_state *rsp = rdp->rsp;
@@ -1573,18 +1580,19 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 * not be released.
 */
raw_lockdep_assert_held_rcu_node(rnp);
-   trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
+   trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf"));
for (rnp_root = rnp; 1; rnp_root =

[PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number

2018-05-13 Thread Joel Fernandes (Google)

The cpustart tracepoint shows a stale gp_seq. This is because rdp->gp_seq
is updated only at the end of the __note_gp_changes function. I believe
we can't update rdp->gp_seq any earlier, so let's use the gp_seq from rnp
instead.

Signed-off-by: Joel Fernandes (Google) 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 25c44328d071..58d2b68f8b98 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1807,7 +1807,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 * set up to detect a quiescent state, otherwise don't
 * go looking for one.
 */
-   trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpustart"));
+   trace_rcu_grace_period(rsp->name, rnp->gp_seq, TPS("cpustart"));
need_gp = !!(rnp->qsmask & rdp->grpmask);
rdp->cpu_no_qs.b.norm = need_gp;
rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_dynticks.rcu_qs_ctr);
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint

2018-05-13 Thread Joel Fernandes (Google)

Following a recent discussion [1], the check for whether a leaf believes
RCU is not idle is being added back to the funnel locking code, to avoid
more locking. In that case we mark the leaf node for a future grace period
and bail out, since a GP is currently in progress. However, the tracepoint
for this path is missing. Let's add it back.

Also add a small comment about why we do this check (basically, the point
is to avoid locking intermediate nodes unnecessarily) and clarify the
comments in the trace event header now that we traverse one or more
intermediate nodes.

[1] http://lkml.kernel.org/r/20180513190906.gl26...@linux.vnet.ibm.com

Signed-off-by: Joel Fernandes (Google) 
---
 include/trace/events/rcu.h |  4 ++--
 kernel/rcu/tree.c  | 11 ++-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 539900a9f8c7..dc0bd11739c7 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period,
  *
  * "Startleaf": Request a grace period based on leaf-node data.
  * "Prestarted": Someone beat us to the request
- * "Startedleaf": Leaf-node start proved sufficient.
- * "Startedleafroot": Leaf-node start proved sufficient after checking root.
+ * "Startedleaf": Leaf and one or more non-root nodes marked for future start.
+ * "Startedleafroot": all non-root nodes from leaf to root marked for future start.
  * "Startedroot": Requested a nocb grace period based on root-node data.
  * "NoGPkthread": The RCU grace-period kthread has not yet started.
  * "StartWait": Start waiting for the requested grace period.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 40670047d22c..8401a253e7de 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1593,8 +1593,17 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
goto unlock_out;
}
rnp_node->gp_seq_needed = gp_seq_start;
-   if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
+
+   /*
+* Check if leaf believes a GP is in progress, if yes we can
+* bail and avoid more locking. We have already marked the leaf.
+*/
+   if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
+   trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
+ TPS("Startedleaf"));
goto unlock_out;
+   }
+
if (rnp_node != rnp && rnp_node->parent != NULL)
raw_spin_unlock_rcu_node(rnp_node);
if (!rnp_node->parent) {
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev

2018-05-13 Thread Joel Fernandes (Google)

Hi,
Here are some fixes, clean ups and code comment changes, mostly
for the new funnel locking, the gp_seq changes and some tracing. It's
based on the latest rcu/dev branch.

thanks,

- Joel

Joel Fernandes (Google) (8):
  rcu: Add comment documenting how rcu_seq_snap works
  rcu: Clarify usage of cond_resched for tasks-RCU
  rcu: Add back the cpuend tracepoint
  rcu: Get rid of old c variable from places in tree RCU
  rcu: Use rcu_node as temporary variable in funnel locking loop
  rcu: Add back the Startedleaf tracepoint
  rcu: trace CleanupMore condition only if needed
  rcu: Fix cpustart tracepoint gp_seq number

 include/linux/rcupdate.h   | 11 +++--
 include/trace/events/rcu.h | 19 
 kernel/rcu/rcu.h   | 24 +-
 kernel/rcu/tree.c  | 92 +++---
 4 files changed, 97 insertions(+), 49 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU

2018-05-13 Thread Joel Fernandes (Google)

Recently we had a discussion about cond_resched unconditionally
recording a voluntary context switch [1].

Let's add a comment clarifying how this API is to be used.

[1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.p...@lge.com

Signed-off-by: Joel Fernandes (Google) 
---
 include/linux/rcupdate.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 743226176350..a9881007ece6 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { }
} while (0)
 
 /*
- * Note a voluntary context switch for RCU-tasks benefit.  This is a
- * macro rather than an inline function to avoid #include hell.
+ * Note an attempt to perform a voluntary context switch for RCU-tasks benefit.
+ *
+ * This is called even in situations where a context switch didn't really
+ * happen even though it was requested. The caller uses it to indicate
+ * traversal of an RCU-tasks quiescent state. This is a macro rather than an
+ * inline function to avoid #include hell.
  */
 #ifdef CONFIG_TASKS_RCU
 #define rcu_note_voluntary_context_switch_lite(t) \
@@ -187,7 +191,8 @@ static inline void exit_tasks_rcu_finish(void) { }
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 /**
- * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
+ * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU.
+ * The quiescent state report is made even if cond_resched() did nothing.
  *
  * This macro resembles cond_resched(), except that it is defined to
  * report potential quiescent states to RCU-tasks even if the cond_resched()
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop

2018-05-13 Thread Joel Fernandes (Google)

The funnel locking loop in rcu_start_this_gp uses rnp_root as a
temporary variable while walking the combining tree. This forces the
reader to keep reminding themselves that rnp_root may not be the root.
Let's call the loop variable rnp_node instead, and assign rnp_root only
once rnp_node actually is the root.

Just a clean up patch, no logical change.

Signed-off-by: Joel Fernandes (Google) 
---
 kernel/rcu/tree.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9f5679ba413b..40670047d22c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1568,7 +1568,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 {
bool ret = false;
struct rcu_state *rsp = rdp->rsp;
-   struct rcu_node *rnp_root;
+   struct rcu_node *rnp_node, *rnp_root = NULL;
 
/*
 * Use funnel locking to either acquire the root rcu_node
@@ -1581,24 +1581,26 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 */
raw_lockdep_assert_held_rcu_node(rnp);
trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf"));
-   for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
-   if (rnp_root != rnp)
-   raw_spin_lock_rcu_node(rnp_root);
-   if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) ||
-   rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) ||
-   (rnp != rnp_root &&
-rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
-   trace_rcu_this_gp(rnp_root, rdp, gp_seq_start,
+   for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) {
+   if (rnp_node != rnp)
+   raw_spin_lock_rcu_node(rnp_node);
+   if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start) ||
+   rcu_seq_done(&rnp_node->gp_seq, gp_seq_start) ||
+   (rnp != rnp_node &&
+rcu_seq_state(rcu_seq_current(&rnp_node->gp_seq)))) {
+   trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
  TPS("Prestarted"));
goto unlock_out;
}
-   rnp_root->gp_seq_needed = gp_seq_start;
+   rnp_node->gp_seq_needed = gp_seq_start;
if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
goto unlock_out;
-   if (rnp_root != rnp && rnp_root->parent != NULL)
-   raw_spin_unlock_rcu_node(rnp_root);
-   if (!rnp_root->parent)
+   if (rnp_node != rnp && rnp_node->parent != NULL)
+   raw_spin_unlock_rcu_node(rnp_node);
+   if (!rnp_node->parent) {
+   rnp_root = rnp_node;
break;  /* At root, and perhaps also leaf. */
+   }
}
 
/* If GP already in progress, just leave, otherwise start one. */
@@ -1616,10 +1618,10 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq"));
ret = true;  /* Caller must wake GP kthread. */
 unlock_out:
-   if (rnp != rnp_root)
-   raw_spin_unlock_rcu_node(rnp_root);
+   if (rnp != rnp_node)
+   raw_spin_unlock_rcu_node(rnp_node);
/* Push furthest requested GP to leaf node and rcu_data structure. */
-   if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) {
+   if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start)) {
rnp->gp_seq_needed = gp_seq_start;
rdp->gp_seq_needed = gp_seq_start;
}
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH RFC 3/8] rcu: Add back the cpuend tracepoint

2018-05-13 Thread Joel Fernandes (Google)

Commit be4b8beed87d ("rcu: Move RCU's grace-period-change code to ->gp_seq")
removed the cpuend grace period trace point. This patch adds it back.

Signed-off-by: Joel Fernandes (Google) 
---
 kernel/rcu/tree.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9ad931bff409..29ccc60bdbfc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1774,10 +1774,12 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 
/* Handle the ends of any preceding grace periods first. */
if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) ||
-   unlikely(READ_ONCE(rdp->gpwrap)))
+   unlikely(READ_ONCE(rdp->gpwrap))) {
ret = rcu_advance_cbs(rsp, rnp, rdp); /* Advance callbacks. */
-   else
+   trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpuend"));
+   } else {
ret = rcu_accelerate_cbs(rsp, rnp, rdp); /* Recent callbacks. */
+   }
 
/* Now handle the beginnings of any new-to-this-CPU grace periods. */
if (rcu_seq_new_gp(rdp->gp_seq, rnp->gp_seq) ||
-- 
2.17.0.441.gb46fe60e1d-goog

Re: [PATCH 1/2] powerpc: Detect the presence of big-core with interleaved threads

2018-05-13 Thread Michael Neuling

Thanks for posting this... A couple of comments below.

On Fri, 2018-05-11 at 16:47 +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
> 
> A pair of IBM POWER9 SMT4 cores can be fused together to form a
> big-core with 8 SMT threads. This can be discovered via the
> "ibm,thread-groups" CPU property in the device tree which will
> indicate which group of threads that share the L1 cache, translation
> cache and instruction data flow.  If there are multiple such group of
> threads, then the core is a big-core. The thread-ids of the threads of
> the big-core can be obtained by interleaving the thread-ids of the
> thread-groups (component small core).
> 
> Eg: Threads in the pair of component SMT4 cores of an interleaved
> big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.
> 
> This patch introduces a function to check if a given device tree node
> corresponding to a CPU node represents an interleaved big-core.
> 
> This function is invoked during the boot-up to detect the presence of
> interleaved big-cores. The presence of such an interleaved big-core is
> recorded in a global variable for later use.
> 
> Signed-off-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/include/asm/cputhreads.h |  8 +++--
>  arch/powerpc/kernel/setup-common.c| 63 +-
> -
>  2 files changed, 66 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cputhreads.h
> b/arch/powerpc/include/asm/cputhreads.h
> index d71a909..b706f0a 100644
> --- a/arch/powerpc/include/asm/cputhreads.h
> +++ b/arch/powerpc/include/asm/cputhreads.h
> @@ -23,11 +23,13 @@
>  extern int threads_per_core;
>  extern int threads_per_subcore;
>  extern int threads_shift;
> +extern bool has_interleaved_big_core;
>  extern cpumask_t threads_core_mask;
>  #else
> -#define threads_per_core 1
> -#define threads_per_subcore  1
> -#define threads_shift 0
> +#define threads_per_core 1
> +#define threads_per_subcore  1
> +#define threads_shift 0
> +#define has_interleaved_big_core 0
>  #define threads_core_mask (*get_cpu_mask(0))
>  #endif
>  
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-
> common.c
> index 0af5c11..884dff2 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -408,10 +408,12 @@ void __init check_for_initrd(void)
>  #ifdef CONFIG_SMP
>  
>  int threads_per_core, threads_per_subcore, threads_shift;
> +bool has_interleaved_big_core;
>  cpumask_t threads_core_mask;
>  EXPORT_SYMBOL_GPL(threads_per_core);
>  EXPORT_SYMBOL_GPL(threads_per_subcore);
>  EXPORT_SYMBOL_GPL(threads_shift);
> +EXPORT_SYMBOL_GPL(has_interleaved_big_core);
>  EXPORT_SYMBOL_GPL(threads_core_mask);
>  
>  static void __init cpu_init_thread_core_maps(int tpc)
> @@ -436,8 +438,56 @@ static void __init cpu_init_thread_core_maps(int tpc)
>   printk(KERN_DEBUG " (thread shift is %d)\n", threads_shift);
>  }
>  
> -
>  u32 *cpu_to_phys_id = NULL;
> +/*
> + * check_for_interleaved_big_core - Checks if the core represented by
> + *dn is a big-core whose threads are interleavings of the
> + *threads of the component small cores.
> + *
> + * @dn: device node corresponding to the core.
> + *
> + * Returns true if the core is a interleaved big-core.
> + * Returns false otherwise.
> + */
> +static inline bool check_for_interleaved_big_core(struct device_node *dn)
> +{
> + int len, nr_groups, threads_per_group;
> + const __be32 *thread_groups;
> + __be32 *thread_list, *first_cpu_idx;
> + int cur_cpu, next_cpu, i, j;
> +
> + thread_groups = of_get_property(dn, "ibm,thread-groups", &len);
> + if (!thread_groups)
> + return false;

Can you document what this property looks like? It seems to be nr_groups,
threads_per_group, thread_list. Can you explain what each of these means?

If we get configured with an SMT2 big-core (ie. two interleaved SMT1 normal
cores), will this code also work there?
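
To make the question concrete, here is a standalone userspace sketch of an interleaving check. It is not the patch's code: the quoted function is truncated in this message, so the rule used here (consecutive thread ids within a group differ by nr_groups) is an assumption, as are the name demo_is_interleaved and the host-endian thread_list layout of nr_groups runs of threads_per_group ids each.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check on a host-endian thread list. Assumes thread_list
 * holds nr_groups consecutive runs of threads_per_group thread ids.
 */
static inline bool demo_is_interleaved(const uint32_t *thread_list,
				       uint32_t nr_groups,
				       uint32_t threads_per_group)
{
	if (nr_groups <= 1)
		return false;

	for (uint32_t i = 0; i < nr_groups; i++) {
		const uint32_t *group = thread_list + (size_t)i * threads_per_group;

		/* Assumed rule: neighbours within a group are nr_groups apart. */
		for (uint32_t j = 0; j + 1 < threads_per_group; j++) {
			if (group[j + 1] != group[j] + nr_groups)
				return false;
		}
	}
	return true;
}
```

Under this assumed rule, {0,2,4,6} and {1,3,5,7} are accepted and {0,1,2,3} and {4,5,6,7} are rejected. With threads_per_group equal to 1 the inner loop has nothing to compare, so any layout with two SMT1 groups would be reported as interleaved; whether the real patch behaves the same way cannot be told from the truncated quote.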

> +
> + nr_groups = be32_to_cpu(*(thread_groups + 1));
> + if (nr_groups <= 1)
> + return false;
> +
> + threads_per_group = be32_to_cpu(*(thread_groups + 2));
> + thread_list = (__be32 *)thread_groups + 3;
> +
> + /*
> +  * In case of an interleaved big-core, the thread-ids of the
> +  * big-core can be obtained by interleaving the the thread-ids
> +  * of the component small
> +  *
> +  * Eg: On a 8-thread big-core with two SMT4 small cores, the
> +  * threads of the two component small cores will be
> +  * {0, 2, 4, 6} and {1, 3, 5, 7}.
> +  */
> + for (i = 0; i < nr_groups; i++) {
> + first_cpu_idx = thread_list + i * threads_per_group;
> +
> + for (j = 0; j < threads_per_group - 1; j++) {
> + cur_cpu = be32_to_cpu(*(first_cpu_idx + j));
> + next_cpu = be32_to_cpu(*(first_cpu_idx +

[PATCH v2 00/13] add power domain support for Rockchip Socs

2018-05-13 Thread Elaine Zhang

Add power domain support for the RK3036, RK3128, RK3228 and PX30 SoCs.
Fix the wrong value written when powering a domain up.

Changes in v2:
Fix up the commit message descriptions and assign authorship.

Caesar Wang (3):
  dt-bindings: power: add RK3036 SoCs header for power-domain
  dt-bindings: add binding for rk3036 power domains
  Soc: rockchip: power-domain: add power domain support for rk3036

Elaine Zhang (6):
  dt-bindings: power: add RK3128 SoCs header for power-domain
  dt-bindings: add binding for rk3128 power domains
  soc: rockchip: power-domain: add power domain support for rk3128
  dt-bindings: power: add RK3228 SoCs header for power-domain
  dt-bindings: add binding for rk3228 power domains
  soc: rockchip: power-domain: add power domain support for rk3228

Finley Xiao (4):
  soc: rockchip: power-domain: Fix wrong value when power up pd
  dt-bindings: power: add PX30 SoCs header for power-domain
  dt-bindings: add binding for px30 power domains
  soc: rockchip: power-domain: add power domain support for px30

 .../bindings/soc/rockchip/power_domain.txt |  12 +++
 drivers/soc/rockchip/pm_domains.c  | 116 -
 include/dt-bindings/power/px30-power.h |  32 ++
 include/dt-bindings/power/rk3036-power.h   |  27 +
 include/dt-bindings/power/rk3128-power.h   |  28 +
 include/dt-bindings/power/rk3228-power.h   |  26 +
 6 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 include/dt-bindings/power/px30-power.h
 create mode 100644 include/dt-bindings/power/rk3036-power.h
 create mode 100644 include/dt-bindings/power/rk3128-power.h
 create mode 100644 include/dt-bindings/power/rk3228-power.h

-- 
1.9.1

Re: [PATCH 2/2] powerpc: Enable ASYM_SMT on interleaved big-core systems

2018-05-13 Thread Michael Neuling

On Fri, 2018-05-11 at 16:47 +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
> 
> Each of the SMT4 cores forming a fused-core are more or less
> independent units. Thus when multiple tasks are scheduled to run on
> the fused core, we get the best performance when the tasks are spread
> across the pair of SMT4 cores.
> 
> Since the threads in the pair of SMT4 cores of an interleaved big-core
> are numbered {0,2,4,6} and {1,3,5,7} respectively, enable ASYM_SMT on
> such interleaved big-cores that will bias the load-balancing of tasks
> on smaller numbered threads, which will automatically result in
> spreading the tasks uniformly across the associated pair of SMT4
> cores.
> 
> Signed-off-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/kernel/smp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 9ca7148..0153f01 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1082,7 +1082,7 @@ static int powerpc_smt_flags(void)
>  {
>   int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
>  
> - if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
> + if (cpu_has_feature(CPU_FTR_ASYM_SMT) || has_interleaved_big_core) {

Shouldn't we just set CPU_FTR_ASYM_SMT and leave this code unchanged?


>   printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
>   flags |= SD_ASYM_PACKING;
>   }

[PATCH v2 02/13] dt-bindings: add binding for rk3036 power domains

2018-05-13 Thread Elaine Zhang

From: Caesar Wang 

Add binding documentation for the power domains
found on Rockchip RK3036 SoCs.

Signed-off-by: Caesar Wang 
Signed-off-by: Elaine Zhang 
---
 Documentation/devicetree/bindings/soc/rockchip/power_domain.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
index 301d2a9bc1b8..79924ee9ae86 100644
--- a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
+++ b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
@@ -5,6 +5,7 @@ powered up/down by software based on different application scenes to save power.
 
 Required properties for power domain controller:
 - compatible: Should be one of the following.
+   "rockchip,rk3036-power-controller" - for RK3036 SoCs.
"rockchip,rk3288-power-controller" - for RK3288 SoCs.
"rockchip,rk3328-power-controller" - for RK3328 SoCs.
"rockchip,rk3366-power-controller" - for RK3366 SoCs.
@@ -17,6 +18,7 @@ Required properties for power domain controller:
 
 Required properties for power domain sub nodes:
 - reg: index of the power domain, should use macros in:
+   "include/dt-bindings/power/rk3036-power.h" - for RK3036 type power domain.
"include/dt-bindings/power/rk3288-power.h" - for RK3288 type power domain.
"include/dt-bindings/power/rk3328-power.h" - for RK3328 type power domain.
"include/dt-bindings/power/rk3366-power.h" - for RK3366 type power domain.
@@ -93,6 +95,7 @@ Node of a device using power domains must have a power-domains property,
 containing a phandle to the power device node and an index specifying which
 power domain to use.
 The index should use macros in:
+   "include/dt-bindings/power/rk3036-power.h" - for rk3036 type power domain.
"include/dt-bindings/power/rk3288-power.h" - for rk3288 type power domain.
"include/dt-bindings/power/rk3328-power.h" - for rk3328 type power domain.
"include/dt-bindings/power/rk3366-power.h" - for rk3366 type power domain.
-- 
1.9.1

[PATCH v2 03/13] Soc: rockchip: power-domain: add power domain support for rk3036

2018-05-13 Thread Elaine Zhang

From: Caesar Wang 

This driver is modified to support RK3036 SoC.

Signed-off-by: Caesar Wang 
Signed-off-by: Elaine Zhang 
---
 drivers/soc/rockchip/pm_domains.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/drivers/soc/rockchip/pm_domains.c b/drivers/soc/rockchip/pm_domains.c
index 53efc386b1ad..ebd7c41898c0 100644
--- a/drivers/soc/rockchip/pm_domains.c
+++ b/drivers/soc/rockchip/pm_domains.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -102,6 +103,14 @@ struct rockchip_pmu {
.ack_mask = (ack >= 0) ? BIT(ack) : 0,  \
.active_wakeup = wakeup,\
 }
+#define DOMAIN_RK3036(req, ack, idle, wakeup)  \
+{  \
+   .req_mask = (req >= 0) ? BIT(req) : 0,  \
+   .req_w_mask = (req >= 0) ?  BIT(req + 16) : 0,  \
+   .ack_mask = (ack >= 0) ? BIT(ack) : 0,  \
+   .idle_mask = (idle >= 0) ? BIT(idle) : 0,   \
+   .active_wakeup = wakeup,\
+}
 
 #define DOMAIN_RK3288(pwr, status, req, wakeup)\
DOMAIN(pwr, status, req, req, (req) + 16, wakeup)
@@ -701,6 +710,16 @@ static int rockchip_pm_domain_probe(struct platform_device *pdev)
return error;
 }
 
+static const struct rockchip_domain_info rk3036_pm_domains[] = {
+   [RK3036_PD_MSCH]= DOMAIN_RK3036(14, 23, 30, true),
+   [RK3036_PD_CORE]= DOMAIN_RK3036(13, 17, 24, false),
+   [RK3036_PD_PERI]= DOMAIN_RK3036(12, 18, 25, false),
+   [RK3036_PD_VIO] = DOMAIN_RK3036(11, 19, 26, false),
+   [RK3036_PD_VPU] = DOMAIN_RK3036(10, 20, 27, false),
+   [RK3036_PD_GPU] = DOMAIN_RK3036(9, 21, 28, false),
+   [RK3036_PD_SYS] = DOMAIN_RK3036(8, 22, 29, false),
+};
+
 static const struct rockchip_domain_info rk3288_pm_domains[] = {
[RK3288_PD_VIO] = DOMAIN_RK3288(7, 7, 4, false),
[RK3288_PD_HEVC]= DOMAIN_RK3288(14, 10, 9, false),
@@ -768,6 +787,15 @@ static int rockchip_pm_domain_probe(struct platform_device *pdev)
[RK3399_PD_SDIOAUDIO]   = DOMAIN_RK3399(31, 31, 29, true),
 };
 
+static const struct rockchip_pmu_info rk3036_pmu = {
+   .req_offset = 0x148,
+   .idle_offset = 0x14c,
+   .ack_offset = 0x14c,
+
+   .num_domains = ARRAY_SIZE(rk3036_pm_domains),
+   .domain_info = rk3036_pm_domains,
+};
+
 static const struct rockchip_pmu_info rk3288_pmu = {
.pwr_offset = 0x08,
.status_offset = 0x0c,
@@ -843,6 +871,10 @@ static int rockchip_pm_domain_probe(struct platform_device *pdev)
 
 static const struct of_device_id rockchip_pm_domain_dt_match[] = {
{
+   .compatible = "rockchip,rk3036-power-controller",
+   .data = (void *)&rk3036_pmu,
+   },
+   {
.compatible = "rockchip,rk3288-power-controller",
.data = (void *)&rk3288_pmu,
},
-- 
1.9.1

[PATCH v2 04/13] soc: rockchip: power-domain: Fix wrong value when power up pd

2018-05-13 Thread Elaine Zhang

From: Finley Xiao 

Fix power domains that could be turned off but never turned on again
when the pd registers use writemask bits.

This fixes an error introduced by commit:
commit 79bb17ce8edb3141339b5882e372d0ec7346217c
Author: Elaine Zhang 
Date:   Fri Dec 23 11:47:52 2016 +0800

soc: rockchip: power-domain: Support domain control in hiword-registers

New Rockchips SoCs may have their power-domain control in registers
using a writemask-based access scheme (upper 16bit being the write
mask). So add a DOMAIN_M type and handle this case accordingly.
Signed-off-by: Elaine Zhang 
Signed-off-by: Heiko Stuebner 

Signed-off-by: Finley Xiao 
Signed-off-by: Elaine Zhang 
---
 drivers/soc/rockchip/pm_domains.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/rockchip/pm_domains.c b/drivers/soc/rockchip/pm_domains.c
index ebd7c41898c0..01d4ba26a054 100644
--- a/drivers/soc/rockchip/pm_domains.c
+++ b/drivers/soc/rockchip/pm_domains.c
@@ -264,7 +264,7 @@ static void rockchip_do_pmu_set_power_domain(struct rockchip_pm_domain *pd,
return;
else if (pd->info->pwr_w_mask)
regmap_write(pmu->regmap, pmu->info->pwr_offset,
-on ? pd->info->pwr_mask :
+on ? pd->info->pwr_w_mask :
 (pd->info->pwr_mask | pd->info->pwr_w_mask));
else
regmap_update_bits(pmu->regmap, pmu->info->pwr_offset,
-- 
1.9.1
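
The effect of the one-line change can be illustrated with a small model of a writemask register. This is a sketch under stated assumptions, not driver code: demo_hiword_write models a register whose upper 16 bits select which of the lower 16 bits a write updates (as the quoted commit describes), a set power bit is taken to mean "domain off" (inferred from the diff, where the off path writes pwr_mask | pwr_w_mask), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model: only low bits whose write-enable bit (bit + 16) is set get updated. */
static inline uint32_t demo_hiword_write(uint32_t reg, uint32_t val)
{
	uint32_t wmask = val >> 16;

	return (reg & ~wmask & 0xffffu) | (val & wmask);
}

/* Value written before the patch: the power-on case lacks the write-enable bit. */
static inline uint32_t demo_pwr_val_old(uint32_t pwr_mask, uint32_t pwr_w_mask, int on)
{
	return on ? pwr_mask : (pwr_mask | pwr_w_mask);
}

/* Value written after the patch: write-enable set, power bit left at 0. */
static inline uint32_t demo_pwr_val_new(uint32_t pwr_mask, uint32_t pwr_w_mask, int on)
{
	return on ? pwr_w_mask : (pwr_mask | pwr_w_mask);
}
```

In this model, after a domain has been switched off, writing the old power-on value leaves the register unchanged because no write-enable bit is set, while the new value clears the power bit.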

[PATCH v2 05/13] dt-bindings: power: add RK3128 SoCs header for power-domain

2018-05-13 Thread Elaine Zhang

According to a description from TRM, add all the power domains.

Signed-off-by: Elaine Zhang 
---
 include/dt-bindings/power/rk3128-power.h | 28 
 1 file changed, 28 insertions(+)
 create mode 100644 include/dt-bindings/power/rk3128-power.h

diff --git a/include/dt-bindings/power/rk3128-power.h b/include/dt-bindings/power/rk3128-power.h
new file mode 100644
index 000000000000..26aef519cd94
--- /dev/null
+++ b/include/dt-bindings/power/rk3128-power.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (c) 2017 Rockchip Electronics Co. Ltd.
+ * Author: Elaine Zhang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __DT_BINDINGS_POWER_RK3128_POWER_H__
+#define __DT_BINDINGS_POWER_RK3128_POWER_H__
+
+/* VD_CORE */
+#define RK3128_PD_CORE 0
+
+/* VD_LOGIC */
+#define RK3128_PD_VIO  1
+#define RK3128_PD_VIDEO 2
+#define RK3128_PD_GPU  3
+#define RK3128_PD_MSCH 4
+
+#endif
-- 
1.9.1

[PATCH v2 01/13] dt-bindings: power: add RK3036 SoCs header for power-domain

2018-05-13 Thread Elaine Zhang

From: Caesar Wang 

According to a description from TRM, add all the power domains.

Signed-off-by: Caesar Wang 
Signed-off-by: Elaine Zhang 
---
 include/dt-bindings/power/rk3036-power.h | 27 +++
 1 file changed, 27 insertions(+)
 create mode 100644 include/dt-bindings/power/rk3036-power.h

diff --git a/include/dt-bindings/power/rk3036-power.h 
b/include/dt-bindings/power/rk3036-power.h
new file mode 100644
index ..59e09f1c5af7
--- /dev/null
+++ b/include/dt-bindings/power/rk3036-power.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (c) 2017 Rockchip Electronics Co. Ltd.
+ * Author: Caesar Wang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __DT_BINDINGS_POWER_RK3036_POWER_H__
+#define __DT_BINDINGS_POWER_RK3036_POWER_H__
+
+#define RK3036_PD_MSCH 0
+#define RK3036_PD_CORE 1
+#define RK3036_PD_PERI 2
+#define RK3036_PD_VIO  3
+#define RK3036_PD_VPU  4
+#define RK3036_PD_GPU  5
+#define RK3036_PD_SYS  6
+
+#endif
-- 
1.9.1

[PATCH v2 06/13] dt-bindings: add binding for rk3128 power domains

2018-05-13 Thread Elaine Zhang

Add binding documentation for the power domains
found on Rockchip RK3128 SoCs.

Signed-off-by: Elaine Zhang 
---
 Documentation/devicetree/bindings/soc/rockchip/power_domain.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt 
b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
index 79924ee9ae86..9a3f5fd36a80 100644
--- a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
+++ b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
@@ -6,6 +6,7 @@ powered up/down by software based on different application 
scenes to save power.
 Required properties for power domain controller:
 - compatible: Should be one of the following.
"rockchip,rk3036-power-controller" - for RK3036 SoCs.
+   "rockchip,rk3128-power-controller" - for RK3128 SoCs.
"rockchip,rk3288-power-controller" - for RK3288 SoCs.
"rockchip,rk3328-power-controller" - for RK3328 SoCs.
"rockchip,rk3366-power-controller" - for RK3366 SoCs.
@@ -19,6 +20,7 @@ Required properties for power domain controller:
 Required properties for power domain sub nodes:
 - reg: index of the power domain, should use macros in:
"include/dt-bindings/power/rk3036-power.h" - for RK3036 type power 
domain.
+   "include/dt-bindings/power/rk3128-power.h" - for RK3128 type power 
domain.
"include/dt-bindings/power/rk3288-power.h" - for RK3288 type power 
domain.
"include/dt-bindings/power/rk3328-power.h" - for RK3328 type power 
domain.
"include/dt-bindings/power/rk3366-power.h" - for RK3366 type power 
domain.
@@ -96,6 +98,7 @@ containing a phandle to the power device node and an index 
specifying which
 power domain to use.
 The index should use macros in:
"include/dt-bindings/power/rk3036-power.h" - for rk3036 type power 
domain.
+   "include/dt-bindings/power/rk3128-power.h" - for rk3128 type power 
domain.
"include/dt-bindings/power/rk3288-power.h" - for rk3288 type power 
domain.
"include/dt-bindings/power/rk3328-power.h" - for rk3328 type power 
domain.
"include/dt-bindings/power/rk3366-power.h" - for rk3366 type power 
domain.
-- 
1.9.1

[PATCH v2 08/13] dt-bindings: power: add RK3228 SoCs header for power-domain

2018-05-13 Thread Elaine Zhang

According to a description from TRM, add all the power domains.

Signed-off-by: Elaine Zhang 
---
 include/dt-bindings/power/rk3228-power.h | 26 ++
 1 file changed, 26 insertions(+)
 create mode 100644 include/dt-bindings/power/rk3228-power.h

diff --git a/include/dt-bindings/power/rk3228-power.h 
b/include/dt-bindings/power/rk3228-power.h
new file mode 100644
index ..fa1264d5a995
--- /dev/null
+++ b/include/dt-bindings/power/rk3228-power.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (c) 2018 Fuzhou Rockchip Electronics Co., Ltd
+ *
+ * SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+ */
+
+#ifndef __DT_BINDINGS_POWER_RK3228_POWER_H__
+#define __DT_BINDINGS_POWER_RK3228_POWER_H__
+
+/**
+ * RK3228 idle id Summary.
+ */
+
+#define RK3228_PD_CORE 0
+#define RK3228_PD_MSCH 1
+#define RK3228_PD_BUS  2
+#define RK3228_PD_SYS  3
+#define RK3228_PD_VIO  4
+#define RK3228_PD_VOP  5
+#define RK3228_PD_VPU  6
+#define RK3228_PD_RKVDEC   7
+#define RK3228_PD_GPU  8
+#define RK3228_PD_PERI 9
+#define RK3228_PD_GMAC 10
+
+#endif
-- 
1.9.1

[PATCH v2 07/13] soc: rockchip: power-domain: add power domain support for rk3128

2018-05-13 Thread Elaine Zhang

This driver is modified to support RK3128 SoC.

Signed-off-by: Elaine Zhang 
---
 drivers/soc/rockchip/pm_domains.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/soc/rockchip/pm_domains.c 
b/drivers/soc/rockchip/pm_domains.c
index 01d4ba26a054..99a2dd8a7801 100644
--- a/drivers/soc/rockchip/pm_domains.c
+++ b/drivers/soc/rockchip/pm_domains.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -720,6 +721,14 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
[RK3036_PD_SYS] = DOMAIN_RK3036(8, 22, 29, false),
 };
 
+static const struct rockchip_domain_info rk3128_pm_domains[] = {
+   [RK3128_PD_CORE]= DOMAIN_RK3288(0, 0, 4, false),
+   [RK3128_PD_MSCH]= DOMAIN_RK3288(-1, -1, 6, true),
+   [RK3128_PD_VIO] = DOMAIN_RK3288(3, 3, 2, false),
+   [RK3128_PD_VIDEO]   = DOMAIN_RK3288(2, 2, 1, false),
+   [RK3128_PD_GPU] = DOMAIN_RK3288(1, 1, 3, false),
+};
+
 static const struct rockchip_domain_info rk3288_pm_domains[] = {
[RK3288_PD_VIO] = DOMAIN_RK3288(7, 7, 4, false),
[RK3288_PD_HEVC]= DOMAIN_RK3288(14, 10, 9, false),
@@ -796,6 +805,17 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
.domain_info = rk3036_pm_domains,
 };
 
+static const struct rockchip_pmu_info rk3128_pmu = {
+   .pwr_offset = 0x04,
+   .status_offset = 0x08,
+   .req_offset = 0x0c,
+   .idle_offset = 0x10,
+   .ack_offset = 0x10,
+
+   .num_domains = ARRAY_SIZE(rk3128_pm_domains),
+   .domain_info = rk3128_pm_domains,
+};
+
 static const struct rockchip_pmu_info rk3288_pmu = {
.pwr_offset = 0x08,
.status_offset = 0x0c,
@@ -875,6 +895,10 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
.data = (void *)_pmu,
},
{
+   .compatible = "rockchip,rk3128-power-controller",
+   .data = (void *)_pmu,
+   },
+   {
.compatible = "rockchip,rk3288-power-controller",
.data = (void *)_pmu,
},
-- 
1.9.1
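
Editorial sketch, not part of the patch: the rk3128 table reuses
DOMAIN_RK3288(pwr, status, req, wakeup), which the series shows elsewhere as
DOMAIN(pwr, status, req, req, (req) + 16, wakeup). The model below assumes
that DOMAIN() turns each non-negative bit index into a single-bit mask and a
negative index into an empty mask; that assumption, the struct domain_info
type and the function names are illustrative stand-ins, and the real driver
uses macros rather than functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical, reduced stand-in for struct rockchip_domain_info. */
struct domain_info {
	uint32_t pwr_mask;
	uint32_t status_mask;
	uint32_t req_mask;
	uint32_t idle_mask;
	uint32_t ack_mask;
	bool active_wakeup;
};

/* Assumed rule: a negative index means the domain has no such bit. */
static uint32_t mask_of(int bit)
{
	return bit >= 0 ? BIT(bit) : 0u;
}

/* Function form of the assumed DOMAIN() macro. */
static struct domain_info domain(int pwr, int status, int req, int idle,
				 int ack, bool wakeup)
{
	struct domain_info d = {
		.pwr_mask = mask_of(pwr),
		.status_mask = mask_of(status),
		.req_mask = mask_of(req),
		.idle_mask = mask_of(idle),
		.ack_mask = mask_of(ack),
		.active_wakeup = wakeup,
	};

	return d;
}

/* Mirrors DOMAIN_RK3288: idle shares the req bit, ack sits 16 bits up. */
static struct domain_info domain_rk3288(int pwr, int status, int req,
					bool wakeup)
{
	return domain(pwr, status, req, req, req + 16, wakeup);
}
```

Under these assumptions the RK3128_PD_MSCH entry, DOMAIN_RK3288(-1, -1, 6,
true), ends up with empty power and status masks and only the idle request
bits set, so it would be handled as an idle-only domain.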

[PATCH v2 13/13] soc: rockchip: power-domain: add power domain support for px30

2018-05-13 Thread Elaine Zhang

From: Finley Xiao 

This driver is modified to support PX30 SoC.

Signed-off-by: Finley Xiao 
Signed-off-by: Elaine Zhang 
---
 drivers/soc/rockchip/pm_domains.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/soc/rockchip/pm_domains.c 
b/drivers/soc/rockchip/pm_domains.c
index 90dcd5e21ae6..d0c5615132e3 100644
--- a/drivers/soc/rockchip/pm_domains.c
+++ b/drivers/soc/rockchip/pm_domains.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -114,6 +115,9 @@ struct rockchip_pmu {
.active_wakeup = wakeup,\
 }
 
+#define DOMAIN_PX30(pwr, status, req, wakeup)  \
+   DOMAIN_M(pwr, status, req, (req) + 16, req, wakeup)
+
 #define DOMAIN_RK3288(pwr, status, req, wakeup)\
DOMAIN(pwr, status, req, req, (req) + 16, wakeup)
 
@@ -712,6 +716,17 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
return error;
 }
 
+static const struct rockchip_domain_info px30_pm_domains[] = {
+   [PX30_PD_USB]   = DOMAIN_PX30(5, 5, 10, false),
+   [PX30_PD_SDCARD]= DOMAIN_PX30(8, 8, 9, false),
+   [PX30_PD_GMAC]  = DOMAIN_PX30(10, 10, 6, false),
+   [PX30_PD_MMC_NAND]  = DOMAIN_PX30(11, 11, 5, false),
+   [PX30_PD_VPU]   = DOMAIN_PX30(12, 12, 14, false),
+   [PX30_PD_VO]= DOMAIN_PX30(13, 13, 7, false),
+   [PX30_PD_VI]= DOMAIN_PX30(14, 14, 8, false),
+   [PX30_PD_GPU]   = DOMAIN_PX30(15, 15, 2, false),
+};
+
 static const struct rockchip_domain_info rk3036_pm_domains[] = {
[RK3036_PD_MSCH]= DOMAIN_RK3036(14, 23, 30, true),
[RK3036_PD_CORE]= DOMAIN_RK3036(13, 17, 24, false),
@@ -811,6 +826,17 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
[RK3399_PD_SDIOAUDIO]   = DOMAIN_RK3399(31, 31, 29, true),
 };
 
+static const struct rockchip_pmu_info px30_pmu = {
+   .pwr_offset = 0x18,
+   .status_offset = 0x20,
+   .req_offset = 0x64,
+   .idle_offset = 0x6c,
+   .ack_offset = 0x6c,
+
+   .num_domains = ARRAY_SIZE(px30_pm_domains),
+   .domain_info = px30_pm_domains,
+};
+
 static const struct rockchip_pmu_info rk3036_pmu = {
.req_offset = 0x148,
.idle_offset = 0x14c,
@@ -915,6 +941,10 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
 
 static const struct of_device_id rockchip_pm_domain_dt_match[] = {
{
+   .compatible = "rockchip,px30-power-controller",
+   .data = (void *)_pmu,
+   },
+   {
.compatible = "rockchip,rk3036-power-controller",
.data = (void *)_pmu,
},
-- 
1.9.1
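
Editorial sketch, not part of the patch: px30_pm_domains[] is filled with
designated initializers for only some of the indices from px30-power.h, and
num_domains is taken from ARRAY_SIZE(). The model below illustrates the plain
C consequence: the array is sized by the highest index used, and every index
that is not listed is zero-initialized. It reads the header values as
USB = 5 through GPU = 14; struct domain_bits and its "present" field are
hypothetical and exist only to make the gaps visible.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Index values as read from the px30-power.h patch in this series. */
#define PX30_PD_USB		5
#define PX30_PD_DDR		6
#define PX30_PD_SDCARD		7
#define PX30_PD_GMAC		9
#define PX30_PD_MMC_NAND	10
#define PX30_PD_VPU		11
#define PX30_PD_VO		12
#define PX30_PD_VI		13
#define PX30_PD_GPU		14

/* Hypothetical, reduced stand-in for struct rockchip_domain_info. */
struct domain_bits {
	int pwr, status, req;
	bool present;	/* illustration only, no such field in the driver */
};

/* Same indices and bit numbers as the px30_pm_domains[] table above. */
static const struct domain_bits px30_domains[] = {
	[PX30_PD_USB]      = { 5, 5, 10, true },
	[PX30_PD_SDCARD]   = { 8, 8, 9, true },
	[PX30_PD_GMAC]     = { 10, 10, 6, true },
	[PX30_PD_MMC_NAND] = { 11, 11, 5, true },
	[PX30_PD_VPU]      = { 12, 12, 14, true },
	[PX30_PD_VO]       = { 13, 13, 7, true },
	[PX30_PD_VI]       = { 14, 14, 8, true },
	[PX30_PD_GPU]      = { 15, 15, 2, true },
};

/* Number of slots, including the zero-filled gaps. */
static size_t px30_num_domains(void)
{
	return ARRAY_SIZE(px30_domains);
}

/* True only for indices that were explicitly initialized. */
static bool px30_domain_present(size_t id)
{
	return id < ARRAY_SIZE(px30_domains) && px30_domains[id].present;
}
```

In this model the table has 15 slots although only 8 domains are described,
so indices such as PX30_PD_DDR refer to an all-zero entry.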

[PATCH v2 10/13] soc: rockchip: power-domain: add power domain support for rk3228

2018-05-13 Thread Elaine Zhang

This driver is modified to support RK3228 SoC.

Signed-off-by: Elaine Zhang 
---
 drivers/soc/rockchip/pm_domains.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/soc/rockchip/pm_domains.c 
b/drivers/soc/rockchip/pm_domains.c
index 99a2dd8a7801..90dcd5e21ae6 100644
--- a/drivers/soc/rockchip/pm_domains.c
+++ b/drivers/soc/rockchip/pm_domains.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -729,6 +730,20 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
[RK3128_PD_GPU] = DOMAIN_RK3288(1, 1, 3, false),
 };
 
+static const struct rockchip_domain_info rk3228_pm_domains[] = {
+   [RK3228_PD_CORE]= DOMAIN_RK3036(0, 0, 16, true),
+   [RK3228_PD_MSCH]= DOMAIN_RK3036(1, 1, 17, true),
+   [RK3228_PD_BUS] = DOMAIN_RK3036(2, 2, 18, true),
+   [RK3228_PD_SYS] = DOMAIN_RK3036(3, 3, 19, true),
+   [RK3228_PD_VIO] = DOMAIN_RK3036(4, 4, 20, false),
+   [RK3228_PD_VOP] = DOMAIN_RK3036(5, 5, 21, false),
+   [RK3228_PD_VPU] = DOMAIN_RK3036(6, 6, 22, false),
+   [RK3228_PD_RKVDEC]  = DOMAIN_RK3036(7, 7, 23, false),
+   [RK3228_PD_GPU] = DOMAIN_RK3036(8, 8, 24, false),
+   [RK3228_PD_PERI]= DOMAIN_RK3036(9, 9, 25, true),
+   [RK3228_PD_GMAC]= DOMAIN_RK3036(10, 10, 26, false),
+};
+
 static const struct rockchip_domain_info rk3288_pm_domains[] = {
[RK3288_PD_VIO] = DOMAIN_RK3288(7, 7, 4, false),
[RK3288_PD_HEVC]= DOMAIN_RK3288(14, 10, 9, false),
@@ -816,6 +831,15 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
.domain_info = rk3128_pm_domains,
 };
 
+static const struct rockchip_pmu_info rk3228_pmu = {
+   .req_offset = 0x40c,
+   .idle_offset = 0x488,
+   .ack_offset = 0x488,
+
+   .num_domains = ARRAY_SIZE(rk3228_pm_domains),
+   .domain_info = rk3228_pm_domains,
+};
+
 static const struct rockchip_pmu_info rk3288_pmu = {
.pwr_offset = 0x08,
.status_offset = 0x0c,
@@ -899,6 +923,10 @@ static int rockchip_pm_domain_probe(struct platform_device 
*pdev)
.data = (void *)_pmu,
},
{
+   .compatible = "rockchip,rk3228-power-controller",
+   .data = (void *)_pmu,
+   },
+   {
.compatible = "rockchip,rk3288-power-controller",
.data = (void *)_pmu,
},
-- 
1.9.1

[PATCH v2 11/13] dt-bindings: power: add PX30 SoCs header for power-domain

2018-05-13 Thread Elaine Zhang

From: Finley Xiao 

According to a description from TRM, add all the power domains.

Signed-off-by: Finley Xiao 
Signed-off-by: Elaine Zhang 
---
 include/dt-bindings/power/px30-power.h | 32 
 1 file changed, 32 insertions(+)
 create mode 100644 include/dt-bindings/power/px30-power.h

diff --git a/include/dt-bindings/power/px30-power.h 
b/include/dt-bindings/power/px30-power.h
new file mode 100644
index ..4ed482e80950
--- /dev/null
+++ b/include/dt-bindings/power/px30-power.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (c) 2017 Fuzhou Rockchip Electronics Co., Ltd
+ *
+ * SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+ */
+
+#ifndef __DT_BINDINGS_POWER_PX30_POWER_H__
+#define __DT_BINDINGS_POWER_PX30_POWER_H__
+
+/* VD_CORE */
+#define PX30_PD_A35_0  0
+#define PX30_PD_A35_1  1
+#define PX30_PD_A35_2  2
+#define PX30_PD_A35_3  3
+#define PX30_PD_SCU 4
+
+/* VD_LOGIC */
+#define PX30_PD_USB 5
+#define PX30_PD_DDR 6
+#define PX30_PD_SDCARD 7
+#define PX30_PD_CRYPTO 8
+#define PX30_PD_GMAC   9
+#define PX30_PD_MMC_NAND   10
+#define PX30_PD_VPU 11
+#define PX30_PD_VO 12
+#define PX30_PD_VI 13
+#define PX30_PD_GPU 14
+
+/* VD_PMU */
+#define PX30_PD_PMU 15
+
+#endif
-- 
1.9.1

[PATCH v2 12/13] dt-bindings: add binding for px30 power domains

2018-05-13 Thread Elaine Zhang

From: Finley Xiao 

Add binding documentation for the power domains
found on Rockchip PX30 SoCs.

Signed-off-by: Finley Xiao 
Signed-off-by: Elaine Zhang 
---
 Documentation/devicetree/bindings/soc/rockchip/power_domain.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt 
b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
index affe36dcfa17..5d49d0a2ff29 100644
--- a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
+++ b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
@@ -5,6 +5,7 @@ powered up/down by software based on different application 
scenes to save power.
 
 Required properties for power domain controller:
 - compatible: Should be one of the following.
+   "rockchip,px30-power-controller" - for PX30 SoCs.
"rockchip,rk3036-power-controller" - for RK3036 SoCs.
"rockchip,rk3128-power-controller" - for RK3128 SoCs.
"rockchip,rk3228-power-controller" - for RK3228 SoCs.
@@ -20,6 +21,7 @@ Required properties for power domain controller:
 
 Required properties for power domain sub nodes:
 - reg: index of the power domain, should use macros in:
+   "include/dt-bindings/power/px30-power.h" - for PX30 type power domain.
"include/dt-bindings/power/rk3036-power.h" - for RK3036 type power 
domain.
"include/dt-bindings/power/rk3128-power.h" - for RK3128 type power 
domain.
"include/dt-bindings/power/rk3228-power.h" - for RK3228 type power 
domain.
@@ -99,6 +101,7 @@ Node of a device using power domains must have a 
power-domains property,
 containing a phandle to the power device node and an index specifying which
 power domain to use.
 The index should use macros in:
+   "include/dt-bindings/power/px30-power.h" - for px30 type power domain.
"include/dt-bindings/power/rk3036-power.h" - for rk3036 type power 
domain.
"include/dt-bindings/power/rk3128-power.h" - for rk3128 type power 
domain.
"include/dt-bindings/power/rk3228-power.h" - for rk3228 type power 
domain.
-- 
1.9.1

[PATCH v2 09/13] dt-bindings: add binding for rk3228 power domains

2018-05-13 Thread Elaine Zhang

Add binding documentation for the power domains
found on Rockchip RK3228 SoCs.

Signed-off-by: Elaine Zhang 
---
 Documentation/devicetree/bindings/soc/rockchip/power_domain.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt 
b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
index 9a3f5fd36a80..affe36dcfa17 100644
--- a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
+++ b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
@@ -7,6 +7,7 @@ Required properties for power domain controller:
 - compatible: Should be one of the following.
"rockchip,rk3036-power-controller" - for RK3036 SoCs.
"rockchip,rk3128-power-controller" - for RK3128 SoCs.
+   "rockchip,rk3228-power-controller" - for RK3228 SoCs.
"rockchip,rk3288-power-controller" - for RK3288 SoCs.
"rockchip,rk3328-power-controller" - for RK3328 SoCs.
"rockchip,rk3366-power-controller" - for RK3366 SoCs.
@@ -21,6 +22,7 @@ Required properties for power domain sub nodes:
 - reg: index of the power domain, should use macros in:
"include/dt-bindings/power/rk3036-power.h" - for RK3036 type power 
domain.
"include/dt-bindings/power/rk3128-power.h" - for RK3128 type power 
domain.
+   "include/dt-bindings/power/rk3228-power.h" - for RK3228 type power 
domain.
"include/dt-bindings/power/rk3288-power.h" - for RK3288 type power 
domain.
"include/dt-bindings/power/rk3328-power.h" - for RK3328 type power 
domain.
"include/dt-bindings/power/rk3366-power.h" - for RK3366 type power 
domain.
@@ -99,6 +101,7 @@ power domain to use.
 The index should use macros in:
"include/dt-bindings/power/rk3036-power.h" - for rk3036 type power 
domain.
"include/dt-bindings/power/rk3128-power.h" - for rk3128 type power 
domain.
+   "include/dt-bindings/power/rk3228-power.h" - for rk3228 type power 
domain.
"include/dt-bindings/power/rk3288-power.h" - for rk3288 type power 
domain.
"include/dt-bindings/power/rk3328-power.h" - for rk3328 type power 
domain.
"include/dt-bindings/power/rk3366-power.h" - for rk3366 type power 
domain.
-- 
1.9.1

Re: [PATCH 3/3] sched/fair: schedutil: explicit update only when required

2018-05-13 Thread Joel Fernandes

On Thu, May 10, 2018 at 04:05:53PM +0100, Patrick Bellasi wrote:
> Schedutil updates for FAIR tasks are triggered implicitly each time a
> cfs_rq's utilization is updated via cfs_rq_util_change(), currently
> called by update_cfs_rq_load_avg(), when the utilization of a cfs_rq has
> changed, and {attach,detach}_entity_load_avg().
> 
> This design is based on the idea that "we should callback schedutil
> frequently enough" to properly update the CPU frequency at every
> utilization change. However, such an integration strategy has also
> some downsides:

Hi Patrick,

I agree making the call explicit would make schedutil integration easier, so
that's really awesome. However, I also fear that if some path in the fair
class in the future changes the utilization but forgets to update schedutil
explicitly (because it forgot to call the explicit public API), then the
schedutil update wouldn't go through. In this case the previous design of
doing the schedutil update in the wrapper was kind of nice to have.

Just thinking out loud, but is there a way you could make the implicit call
anyway in case the explicit call wasn't requested for some reason? That's
probably hard to do correctly though.
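
Editorial sketch, not from the patch or from this reply: one way to picture
the fallback asked about above is a "pending" flag that every utilization
change sets and every explicit frequency update clears, with a later
checkpoint issuing the update only if the flag is still set. All names below
(struct rq_model, util_change(), explicit_freq_update(),
fallback_freq_update()) are hypothetical and model the idea only; they are
not scheduler APIs.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced model of a runqueue. */
struct rq_model {
	unsigned long util;		/* last recorded utilization */
	bool freq_update_pending;	/* set on change, cleared on update */
	unsigned int freq_updates;	/* number of updates issued */
};

/* Any path that changes utilization marks an update as pending. */
static void util_change(struct rq_model *rq, unsigned long util)
{
	rq->util = util;
	rq->freq_update_pending = true;
}

/* The explicit call site: issue the update and clear the flag. */
static void explicit_freq_update(struct rq_model *rq)
{
	rq->freq_updates++;
	rq->freq_update_pending = false;
}

/*
 * Fallback checkpoint: only acts when a change was recorded but no
 * explicit update followed. Returns true if it had to step in.
 */
static bool fallback_freq_update(struct rq_model *rq)
{
	if (!rq->freq_update_pending)
		return false;

	explicit_freq_update(rq);
	return true;
}
```

The open question raised above remains where such a checkpoint could live in
the real scheduler without bringing back the redundant calls the patch is
trying to remove.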

Some more comments below:

> 
>  - schedutil updates are triggered by RQ's load updates, which makes
>sense in general but it does not allow to know exactly which other RQ
>related information have been updated.
>Recently, for example, we had issues due to schedutil dependencies on
>cfs_rq->h_nr_running and estimated utilization updates.
> 
>  - cfs_rq_util_change() is mainly a wrapper function for an already
>existing "public API", cpufreq_update_util(), which is required
>just to ensure we actually update schedutil only when we are updating
>a root cfs_rq.
>Thus, especially when task groups are in use, most of the calls to
>this wrapper function are not required.
> 
>  - the usage of a wrapper function is not completely consistent across
>fair.c, since we could still need additional explicit calls to
>cpufreq_update_util().
> >For example this already happens to report the IOWAIT boost flag in
>the wakeup path.
> 
>  - it makes it hard to integrate new features since it could require to
>change other function prototypes just to pass in an additional flag,
>as it happened for example in commit:
> 
>   ea14b57e8a18 ("sched/cpufreq: Provide migration hint")
> 
> All the above considered, let's make schedutil updates more explicit in
> fair.c by removing the cfs_rq_util_change() wrapper function in favour
> of the existing cpufreq_update_util() public API.
> This can be done by calling cpufreq_update_util() explicitly in the few
> call sites where it really makes sense and when all the (potentially)
> required cfs_rq's information have been updated.
> 
> This patch mainly removes code and adds explicit schedutil updates
> only when we:
>  - {enqueue,dequeue}_task_fair() a task to/from the root cfs_rq
>  - (un)throttle_cfs_rq() a set of tasks up to the root cfs_rq
>  - task_tick_fair() to update the utilization of the root cfs_rq
> 
> All the other code paths, currently _indirectly_ covered by a call to
> update_load_avg(), are still covered. Indeed, some paths already imply
> enqueue/dequeue calls:
>  - switch_{to,from}_fair()
>  - sched_move_task()
> while others are followed by enqueue/dequeue calls:
>  - cpu_cgroup_fork() and
>post_init_entity_util_avg():
>  are used at wakeup_new_task() time and thus already followed by an
>  enqueue_task_fair()
>  - migrate_task_rq_fair():
>  updates the removed utilization but not the actual cfs_rq
>  utilization, which is updated by a following sched event
> 
> This new proposal allows also to better aggregate schedutil related
> flags, which are required only at enqueue_task_fair() time.
> IOWAIT and MIGRATION flags are now requested only when a task is
> actually visible at the root cfs_rq level.
> 
> Signed-off-by: Patrick Bellasi 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Rafael J. Wysocki 
> Cc: Viresh Kumar 
> Cc: Joel Fernandes 
> Cc: Juri Lelli 
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@vger.kernel.org
> 
> ---
> 
> NOTE: this patch changes the behavior of the IOWAIT flag: in case of a
> task waking up on a throttled RQ we do not assert the flag to schedutil
> anymore. However, this seems to make sense since the task will not be
> running anyway.
> ---
>  kernel/sched/fair.c | 81 
> -
>  1 file changed, 36 insertions(+), 45 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 01dfc47541e6..87f092151a6e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -772,7 +772,7 @@ void post_init_entity_util_avg(struct sched_entity *se)
>* For !fair tasks do:
>*
>   update_cfs_rq_load_avg(now, cfs_rq);
> -

Re: KASAN: use-after-free Write in irq_bypass_register_consumer

2018-05-13 Thread Eric Biggers

On Thu, Apr 05, 2018 at 08:15:24PM -0700, Eric Biggers wrote:
> On Mon, Jan 29, 2018 at 01:29:48PM +0800, Tianyu Lan wrote:
> > 
> > 
> > On 1/27/2018 7:27 AM, Eric Biggers wrote:
> > > On Sat, Dec 16, 2017 at 04:37:02PM +0800, Lan, Tianyu wrote:
> > > > The root cause is that kvm_irqfd_assign() and kvm_irqfd_deassign() can't
> > > > be run in parallel. Some data structures (e.g., irqfd->consumer) will be
> > > > corrupted because irqfd may be freed in the deassign path before they are
> > > > used in the assign path. Other data may be used in the deassign path
> > > > before initialization. The syzbot test hit such a case. Adding a mutex
> > > > between kvm_irqfd_assign() and kvm_irqfd_deassign() can fix this issue.
> > > > Will send a patch to fix it.
> > > > 
> > > > On 12/16/2017 12:53 PM, Tianyu Lan wrote:
> > > > > I reproduced the issue. Will have a look.
> > > > > 
> > > > > -- Best regards Tianyu Lan 2017-12-15 18:14 GMT+08:00 syzbot
> > > > > :
> > > > > > syzkaller has found reproducer for the following crash on
> > > > > > 82bcf1def3b5f1251177ad47c44f7e17af039b4b
> > > > > > git://git.cmpxchg.org/linux-mmots.git/master
> > > > > > compiler: gcc (GCC) 7.1.1 20170620
> > > > > > .config is attached
> > > > > > Raw console output is attached.
> > > > > > C reproducer is attached
> > > > > > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > > > > > for information about syzkaller reproducers
> > > > > > 
> > > > > > 
> > > > > > ==
> > > > > > BUG: KASAN: use-after-free in __list_add include/linux/list.h:64 
> > > > > > [inline]
> > > > > > BUG: KASAN: use-after-free in list_add include/linux/list.h:79 
> > > > > > [inline]
> > > > > > BUG: KASAN: use-after-free in 
> > > > > > irq_bypass_register_consumer+0x4b4/0x500
> > > > > > virt/lib/irqbypass.c:217
> > > > > > Write of size 8 at addr 8801cdf51180 by task 
> > > > > > syzkaller436086/15031
> > > > > > 
> > > > > > CPU: 1 PID: 15031 Comm: syzkaller436086 Not tainted 4.15.0-rc2-mm1+ 
> > > > > > #39
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, 
> > > > > > BIOS
> > > > > > Google 01/01/2011
> > > > > > Call Trace:
> > > > > >__dump_stack lib/dump_stack.c:17 [inline]
> > > > > >dump_stack+0x194/0x257 lib/dump_stack.c:53
> > > > > >print_address_description+0x73/0x250 mm/kasan/report.c:252
> > > > > >kasan_report_error mm/kasan/report.c:351 [inline]
> > > > > >kasan_report+0x25b/0x340 mm/kasan/report.c:409
> > > > > >__asan_report_store8_noabort+0x17/0x20 mm/kasan/report.c:435
> > > > > >__list_add include/linux/list.h:64 [inline]
> > > > > >list_add include/linux/list.h:79 [inline]
> > > > > >irq_bypass_register_consumer+0x4b4/0x500 virt/lib/irqbypass.c:217
> > > > > >kvm_irqfd_assign arch/x86/kvm/../../../virt/kvm/eventfd.c:417 
> > > > > > [inline]
> > > > > >kvm_irqfd+0x137f/0x1d50 
> > > > > > arch/x86/kvm/../../../virt/kvm/eventfd.c:572
> > > > > >kvm_vm_ioctl+0x1079/0x1c40 
> > > > > > arch/x86/kvm/../../../virt/kvm/kvm_main.c:2992
> > > > > >vfs_ioctl fs/ioctl.c:46 [inline]
> > > > > >do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:686
> > > > > >SYSC_ioctl fs/ioctl.c:701 [inline]
> > > > > >SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
> > > > > >entry_SYSCALL_64_fastpath+0x1f/0x96
> > > > > > RIP: 0033:0x44d379
> > > > > > RSP: 002b:7fc5ff9a9d08 EFLAGS: 0246 ORIG_RAX: 
> > > > > > 0010
> > > > > > RAX: ffda RBX: 7fc5ff9aa700 RCX: 0044d379
> > > > > > RDX: 20080fe0 RSI: 4020ae76 RDI: 0005
> > > > > > RBP: 007ff900 R08: 7fc5ff9aa700 R09: 7fc5ff9aa700
> > > > > > R10: 7fc5ff9aa700 R11: 0246 R12: 
> > > > > > R13: 007ff8ff R14: 7fc5ff9aa9c0 R15: 
> > > > > > 
> > > > > > Allocated by task 15031:
> > > > > >save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> > > > > >set_track mm/kasan/kasan.c:459 [inline]
> > > > > >kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
> > > > > >kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3614
> > > > > >kmalloc include/linux/slab.h:516 [inline]
> > > > > >kzalloc include/linux/slab.h:705 [inline]
> > > > > >kvm_irqfd_assign arch/x86/kvm/../../../virt/kvm/eventfd.c:296 
> > > > > > [inline]
> > > > > >kvm_irqfd+0x16c/0x1d50 
> > > > > > arch/x86/kvm/../../../virt/kvm/eventfd.c:572
> > > > > >kvm_vm_ioctl+0x1079/0x1c40 
> > > > > > arch/x86/kvm/../../../virt/kvm/kvm_main.c:2992
> > > > > >vfs_ioctl fs/ioctl.c:46 [inline]
> > > > > >do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:686
> > > > > >SYSC_ioctl fs/ioctl.c:701 [inline]
> > > > > >SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
> > > > > >entry_SYSCALL_64_fastpath+0x1f/0x96
> > > > > > 
> > > > > > Freed by task 1402:
> > > > > >save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> > > > > >set_track mm/kasan/kasan.c:459 [inline]
> > > > > >

Re: [PATCH 3/3] sched/fair: schedutil: explicit update only when required

2018-05-13 Thread Joel Fernandes

On Sat, May 12, 2018 at 11:04:43PM -0700, Joel Fernandes wrote:
> On Thu, May 10, 2018 at 04:05:53PM +0100, Patrick Bellasi wrote:
> > Schedutil updates for FAIR tasks are triggered implicitly each time a
> > cfs_rq's utilization is updated via cfs_rq_util_change(), currently
> > called by update_cfs_rq_load_avg(), when the utilization of a cfs_rq has
> > changed, and {attach,detach}_entity_load_avg().
> > 
> > This design is based on the idea that "we should callback schedutil
> > frequently enough" to properly update the CPU frequency at every
> > utilization change. However, such an integration strategy has also
> > some downsides:
> 
> Hi Patrick,
> 
> I agree making the call explicit would make schedutil integration easier so
> that's really awesome. However I also fear that if some path in the fair
> class in the future changes the utilization but forgets to update schedutil
> explicitly (because they forgot to call the explicit public API) then the
> schedutil update wouldn't go through. In this case the previous design of
> doing the schedutil update in the wrapper kind of was a nice to have
> 
> Just thinking out loud but is there a way you could make the implicit call
> anyway incase the explicit call wasn't requested for some reason? That's
> probably hard to do correctly though..
> 
> Some more comments below:
> 
> > 
> >  - schedutil updates are triggered by RQ's load updates, which makes
> >sense in general but it does not allow to know exactly which other RQ
> >related information have been updated.
> >Recently, for example, we had issues due to schedutil dependencies on
> >cfs_rq->h_nr_running and estimated utilization updates.
> > 
> >  - cfs_rq_util_change() is mainly a wrapper function for an already
> >existing "public API", cpufreq_update_util(), which is required
> >just to ensure we actually update schedutil only when we are updating
> >a root cfs_rq.
> >Thus, especially when task groups are in use, most of the calls to
> >this wrapper function are not required.
> > 
> >  - the usage of a wrapper function is not completely consistent across
> >fair.c, since we could still need additional explicit calls to
> >cpufreq_update_util().
> >For example this already happens to report the IOWAIT boot flag in
> >the wakeup path.
> > 
> >  - it makes it hard to integrate new features since it could require to
> >change other function prototypes just to pass in an additional flag,
> >as it happened for example in commit:
> > 
> >   ea14b57e8a18 ("sched/cpufreq: Provide migration hint")
> > 
> > All the above considered, let's make schedutil updates more explicit in
> > fair.c by removing the cfs_rq_util_change() wrapper function in favour
> > of the existing cpufreq_update_util() public API.
> > This can be done by calling cpufreq_update_util() explicitly in the few
> > call sites where it really makes sense and when all the (potentially)
> > required cfs_rq's information have been updated.
> > 
> > This patch mainly removes code and adds explicit schedutil updates
> > only when we:
> >  - {enqueue,dequeue}_task_fair() a task to/from the root cfs_rq
> >  - (un)throttle_cfs_rq() a set of tasks up to the root cfs_rq
> >  - task_tick_fair() to update the utilization of the root cfs_rq
> > 
> > All the other code paths, currently _indirectly_ covered by a call to
> > update_load_avg(), are still covered. Indeed, some paths already imply
> > enqueue/dequeue calls:
> >  - switch_{to,from}_fair()
> >  - sched_move_task()
> > while others are followed by enqueue/dequeue calls:
> >  - cpu_cgroup_fork() and
> >post_init_entity_util_avg():
> >  are used at wakeup_new_task() time and thus already followed by an
> >  enqueue_task_fair()
> >  - migrate_task_rq_fair():
> >  updates the removed utilization but not the actual cfs_rq
> >  utilization, which is updated by a following sched event
> > 
> > This new proposal allows also to better aggregate schedutil related
> > flags, which are required only at enqueue_task_fair() time.
> > IOWAIT and MIGRATION flags are now requested only when a task is
> > actually visible at the root cfs_rq level.
> > 
> > Signed-off-by: Patrick Bellasi 
> > Cc: Ingo Molnar 
> > Cc: Peter Zijlstra 
> > Cc: Rafael J. Wysocki 
> > Cc: Viresh Kumar 
> > Cc: Joel Fernandes 
> > Cc: Juri Lelli 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@vger.kernel.org
> > 
> > ---
> > 
> > NOTE: this patch changes the behavior of the IOWAIT flag: in case of a
> > task waking up on a throttled RQ we do not assert the flag to schedutil
> > anymore. However, this seems to make sense since the task will not be
> > running anyway.
> > ---
> >  kernel/sched/fair.c | 81 
> > -
> >  1 file changed, 36 insertions(+), 45 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 01dfc47541e6..87f092151a6e 100644
> > ---

Re: BUG: unable to handle kernel paging request in cgroup_mt_destroy_v1

2018-05-13 Thread Eric Biggers

On Wed, Jan 31, 2018 at 05:58:01PM -0800, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on upstream commit
> 3da90b159b146672f830bcd2489dd3a1f4e9e089 (Wed Jan 31 03:07:32 2018 +)
> Merge tag 'f2fs-for-4.16-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
> 
> So far this crash happened 3 times on net-next, upstream.
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+eeed2602160e4cc17...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> audit: type=1400 audit(1517426494.787:7): avc:  denied  { map } for
> pid=4176 comm="syzkaller493328" path="/root/syzkaller493328633" dev="sda1"
> ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> BUG: unable to handle kernel paging request at ff6d
> IP: css_put include/linux/cgroup.h:386 [inline]
> IP: cgroup_put include/linux/cgroup.h:415 [inline]
> IP: cgroup_mt_destroy_v1+0xe5/0x310 net/netfilter/xt_cgroup.c:102
> PGD 6a25067 P4D 6a25067 PUD 6a27067 PMD 0
> Oops:  [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 4176 Comm: syzkaller493328 Not tainted 4.15.0+ #288
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:css_put include/linux/cgroup.h:386 [inline]
> RIP: 0010:cgroup_put include/linux/cgroup.h:415 [inline]
> RIP: 0010:cgroup_mt_destroy_v1+0xe5/0x310 net/netfilter/xt_cgroup.c:102
> RSP: 0018:8801b19e7958 EFLAGS: 00010246
> RAX: 0008 RBX: 11003633cf2b RCX: 847188c6
> RDX:  RSI: 8709b900 RDI: ff6d
> RBP: 8801b19e79e0 R08: 11003633cef9 R09: 
> R10:  R11:  R12: ff01
> R13: 8801b19e79b8 R14: dc00 R15: 84718810
> FS:  00c16880() GS:8801db40() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: ff6d CR3: 0001b1f38004 CR4: 001606f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  cleanup_match+0x14e/0x220 net/ipv6/netfilter/ip6_tables.c:481
>  cleanup_entry+0xcb/0x350 net/ipv4/netfilter/ip_tables.c:646
>  __do_replace+0x7d7/0xa90 net/ipv4/netfilter/ip_tables.c:1091
>  do_replace net/ipv4/netfilter/ip_tables.c:1147 [inline]
>  do_ipt_set_ctl+0x40f/0x5f0 net/ipv4/netfilter/ip_tables.c:1677
>  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
>  nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
>  ip_setsockopt+0xa1/0xb0 net/ipv4/ip_sockglue.c:1256
>  tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2875
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2968
>  SYSC_setsockopt net/socket.c:1831 [inline]
>  SyS_setsockopt+0x189/0x360 net/socket.c:1810
>  entry_SYSCALL_64_fastpath+0x29/0xa0
> RIP: 0033:0x4408a9
> RSP: 002b:7ffddd061cc8 EFLAGS: 0207 ORIG_RAX: 0036
> RAX: ffda RBX:  RCX: 004408a9
> RDX: 0040 RSI:  RDI: 0004
> RBP: faaff2414ccfc19e R08: 12f0 R09: 
> R10: 2000b000 R11: 0207 R12: 886f734548d4d66b
> R13: ff01 R14:  R15: 
> Code: 6c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f b6 14 02 48
> 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 a6 01 00 00 <41> f6 44 24 6c
> 01 74 2e e8 be 06 ff fc 48 b8 00 00 00 00 00 fc
> RIP: css_put include/linux/cgroup.h:386 [inline] RSP: 8801b19e7958
> RIP: cgroup_put include/linux/cgroup.h:415 [inline] RSP: 8801b19e7958
> RIP: cgroup_mt_destroy_v1+0xe5/0x310 net/netfilter/xt_cgroup.c:102 RSP:
> 8801b19e7958
> CR2: ff6d
> ---[ end trace bfd8c145aa41ae03 ]---
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title

This was fixed by commit ba7cd5d95f25cc6:

#syz fix: netfilter: xt_cgroup: initialize info->priv in cgroup_mt_check_v1()

- Eric
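
For readers following along, the fix title suggests the check hook now
resets info->priv before doing anything else. Below is a minimal userspace
sketch of that pattern. It assumes (the thread does not confirm this) that
the destroy hook releases priv whenever it is non-NULL. struct match_info,
match_check() and match_destroy() are hypothetical stand-ins, not the
xt_cgroup code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a match blob whose bytes arrive from userspace. */
struct match_info {
	int has_path;	/* nonzero if the rule names a path to look up */
	void *priv;	/* kernel-private pointer; userspace picks its initial bytes */
};

static int live_refs;	/* references currently held by this sketch */

/* Validate the blob and take a reference only when a path is given. */
static int match_check(struct match_info *info)
{
	assert(info != NULL);

	/* Never trust the incoming value: start from a known state. */
	info->priv = NULL;

	if (info->has_path) {
		info->priv = malloc(1);	/* models looking up and pinning an object */
		if (!info->priv)
			return -1;
		live_refs++;
	}
	return 0;
}

/* Drop the reference taken by match_check(), if there is one. */
static void match_destroy(struct match_info *info)
{
	if (info->priv) {
		free(info->priv);
		info->priv = NULL;
		live_refs--;
	}
}
```

Without the reset in match_check(), a blob with has_path == 0 and a garbage
priv would reach match_destroy() with a non-NULL pointer that was never
allocated, which is consistent with the wild address in the oops above.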

Re: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

2018-05-13 Thread Neftin, Sasha

On 5/10/2018 21:42, Keller, Jacob E wrote:

-Original Message-
From: Benjamin Poirier [mailto:bpoir...@suse.com]
Sent: Thursday, May 10, 2018 12:29 AM
To: Kirsher, Jeffrey T 
Cc: Keller, Jacob E ; Achim Mildenberger
; olouvig...@gmail.com;
jaya...@goubiq.com; ehabk...@redhat.com; postmodern.m...@gmail.com;
bart.vanass...@wdc.com; intel-wired-...@lists.osuosl.org;
net...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

There have been multiple reports of crashes that look like
kernel: RIP: 0010:[] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [] process_one_work+0x155/0x440
kernel:  [] worker_thread+0x116/0x4b0
kernel:  [] kthread+0xd2/0xf0
kernel:  [] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.

Reported-by: Achim Mildenberger 
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++-
  1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
b/drivers/net/ethernet/intel/e1000e/netdev.c
index ec4a9759a6f2..3afb1f3b6f91 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
*adapter, u32 *timinca)
}
break;
case e1000_pch_spt:
-   if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
-   /* Stable 24MHz frequency */
-   incperiod = INCPERIOD_24MHZ;
-   incvalue = INCVALUE_24MHZ;
-   shift = INCVALUE_SHIFT_24MHZ;
-   adapter->cc.shift = shift;
-   break;
-   }
-   return -EINVAL;
+   /* Stable 24MHz frequency */
+   incperiod = INCPERIOD_24MHZ;
+   incvalue = INCVALUE_24MHZ;
+   shift = INCVALUE_SHIFT_24MHZ;
+   adapter->cc.shift = shift;
+   break;
case e1000_pch_cnp:
if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
/* Stable 24MHz frequency */
--
2.16.3

Given testing showing that the clock operates fine regardless of the register 
read, I think this is probably fine. Normally I believe the register was used 
to check which frequency was in use, but it doesn't seem to serve that purpose 
here.

Thanks,
Jake
___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

I've checked our specification, and it looks like only the 24MHz clock is
used for this product. I hope no platform with a different clock has been
distributed. So, let's pick up this change.
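
The failure mode described in the commit message (an -EINVAL from the
parameter lookup makes the reset path skip initialisation, and a later
read dereferences NULL) can be modelled in userspace roughly as follows.
This is only a sketch: every name here is hypothetical, the "flag" stands
in for the SYSCFI bit, and the model returns -1 where the kernel would
oops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a cycle counter with a read callback. */
struct cycle_source {
	unsigned long (*read)(void);
};

/* Hypothetical model of a time counter; src is NULL until initialised. */
struct time_counter {
	const struct cycle_source *src;
};

static unsigned long fake_read(void)
{
	return 42;	/* fixed value so the sketch is deterministic */
}

static const struct cycle_source hw_clock = { .read = fake_read };

/*
 * Models the parameter lookup. With trust_flag set (old behaviour) an
 * unset status flag is an error; with trust_flag clear (patched
 * behaviour) the parameters are always accepted.
 */
static int get_base_params(int flag_set, int trust_flag)
{
	if (trust_flag && !flag_set)
		return -EINVAL;
	return 0;
}

/* Models the reset path: initialisation is skipped on error. */
static int clock_reset(struct time_counter *tc, int flag_set, int trust_flag)
{
	int ret;

	assert(tc != NULL);
	ret = get_base_params(flag_set, trust_flag);
	if (ret)
		return ret;	/* tc->src keeps whatever it had, here NULL */
	tc->src = &hw_clock;
	return 0;
}

/* Models the periodic read that runs regardless of the reset result. */
static long clock_read(const struct time_counter *tc)
{
	if (!tc->src)
		return -1;	/* the kernel would dereference NULL here */
	return (long)tc->src->read();
}
```

With trust_flag set and the flag reading back as 0, clock_reset() fails and
clock_read() hits the uninitialised counter; with trust_flag clear the same
hardware state initialises normally.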

another branch for linux-next

2018-05-13 Thread Christoph Hellwig

Hi Stephen,

can you please add the for-linus branch of

   git://git.infradead.org/users/hch/dma-mapping.git

to linux-next?  You already carry the for-next branch, but it turns
out I'll need another branch for late fixes so that for-next can
remain unmerged and unrebased.

[PATCH v2] {net, IB}/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'

2018-05-13 Thread Christophe JAILLET

When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Signed-off-by: Christophe JAILLET 
---
v1 -> v2: More places to update have been added to the patch
---
 drivers/infiniband/hw/mlx5/cq.c| 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c| 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 77d257ec899b..6d52ea03574e 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -849,7 +849,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
ib_udata *udata,
return 0;
 
 err_cqb:
-   kfree(*cqb);
+   kvfree(*cqb);
 
 err_db:
mlx5_ib_db_unmap_user(to_mucontext(context), &cq->db);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 35e256eb2f6e..b123f8a52ad8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -663,7 +663,7 @@ static int esw_create_vport_rx_group(struct mlx5_eswitch 
*esw)
 
esw->offloads.vport_rx_group = g;
 out:
-   kfree(flow_group_in);
+   kvfree(flow_group_in);
return err;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c 
b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 177e076b8d17..719cecb182c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -511,7 +511,7 @@ int mlx5_query_nic_vport_system_image_guid(struct 
mlx5_core_dev *mdev,
*system_image_guid = MLX5_GET64(query_nic_vport_context_out, out,
nic_vport_context.system_image_guid);
 
-   kfree(out);
+   kvfree(out);
 
return 0;
 }
@@ -531,7 +531,7 @@ int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev 
*mdev, u64 *node_guid)
*node_guid = MLX5_GET64(query_nic_vport_context_out, out,
nic_vport_context.node_guid);
 
-   kfree(out);
+   kvfree(out);
 
return 0;
 }
@@ -587,7 +587,7 @@ int mlx5_query_nic_vport_qkey_viol_cntr(struct 
mlx5_core_dev *mdev,
*qkey_viol_cntr = MLX5_GET(query_nic_vport_context_out, out,
   nic_vport_context.qkey_violation_counter);
 
-   kfree(out);
+   kvfree(out);
 
return 0;
 }
-- 
2.17.0
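
As background for the pairing rule in the commit message, here is a small
userspace model of why the matching free routine matters. It is a sketch
under simplifying assumptions: the model picks a backend purely by size and
records it in a header, whereas the kernel's kvfree() inspects the address
itself. All names (kv_zalloc_model(), kv_free_model(), k_free_model(),
SMALL_LIMIT) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SMALL_LIMIT 4096	/* arbitrary cut-over chosen for the sketch */

enum backend { BACKEND_SMALL, BACKEND_LARGE };

/* Header in front of every buffer; the union keeps the payload aligned. */
union kv_hdr {
	enum backend be;
	max_align_t align;
};

static int small_live, large_live;	/* live buffers per backend */

/* Zeroing allocation that may be served by either backend. */
static void *kv_zalloc_model(size_t size)
{
	union kv_hdr *h = calloc(1, sizeof(*h) + size);

	if (!h)
		return NULL;
	h->be = size <= SMALL_LIMIT ? BACKEND_SMALL : BACKEND_LARGE;
	if (h->be == BACKEND_SMALL)
		small_live++;
	else
		large_live++;
	return h + 1;
}

/* Matching free: works whichever backend served the allocation. */
static void kv_free_model(void *p)
{
	union kv_hdr *h;

	if (!p)
		return;
	h = (union kv_hdr *)p - 1;
	if (h->be == BACKEND_SMALL)
		small_live--;
	else
		large_live--;
	assert(small_live >= 0 && large_live >= 0);
	free(h);
}

/*
 * Small-backend-only free. Returns -1 and leaves the buffer untouched
 * when handed a large-backend buffer; the model reports the mismatch
 * instead of corrupting state.
 */
static int k_free_model(void *p)
{
	union kv_hdr *h;

	if (!p)
		return 0;
	h = (union kv_hdr *)p - 1;
	if (h->be != BACKEND_SMALL)
		return -1;
	small_live--;
	free(h);
	return 0;
}
```

The point of the model is that the small-only free happens to work for
small requests, so the mismatch stays hidden until a request is served by
the other backend.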

Re: [PATCH] net/mlx4_core: Fix error handling in mlx4_init_port_info.

2018-05-13 Thread Tariq Toukan




On 02/05/2018 4:31 PM, Tariq Toukan wrote:



On 27/04/2018 6:20 PM, Tarick Bedeir wrote:

Avoid exiting the function with a lingering sysfs file (if the first
call to device_create_file() fails while the second succeeds), and avoid
calling devlink_port_unregister() twice.

In other words, either mlx4_init_port_info() succeeds and returns zero, or
it fails, returns non-zero, and requires no cleanup.

Signed-off-by: Tarick Bedeir 
---
  drivers/net/ethernet/mellanox/mlx4/main.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c

index 4d84cab77105..e8a3a45d0b53 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3007,6 +3007,7 @@ static int mlx4_init_port_info(struct mlx4_dev 
*dev, int port)

  mlx4_err(dev, "Failed to create file for port %d\n", port);
  devlink_port_unregister(&info->devlink_port);
  info->port = -1;
+    return err;
  }
  sprintf(info->dev_mtu_name, "mlx4_port%d_mtu", port);
@@ -3028,9 +3029,10 @@ static int mlx4_init_port_info(struct mlx4_dev 
*dev, int port)

 &info->port_attr);
  devlink_port_unregister(&info->devlink_port);
  info->port = -1;
+    return err;
  }
-    return err;
+    return 0;
  }
  static void mlx4_cleanup_port_info(struct mlx4_port_info *info)


Acked-by: Tariq Toukan 

Thanks Tarick.


Actually, you need to add a Fixes line:

Fixes: 096335b3f983 ("mlx4_core: Allow dynamic MTU configuration for IB ports")

[PATCH] remoteproc: Add APSS based Qualcomm ADSP PIL driver for SDM845

2018-05-13 Thread Rohit kumar

This adds Qualcomm ADSP PIL driver support for SDM845 with ADSP bootup
and shutdown operation handled from Application Processor SubSystem(APSS).

Signed-off-by: Rohit kumar 
Signed-off-by: RajendraBabu Medisetti 
Signed-off-by: Krishnamurthy Renu 
---
 .../devicetree/bindings/remoteproc/qcom,adsp.txt   |   1 +
 drivers/remoteproc/Makefile|   3 +-
 drivers/remoteproc/qcom_adsp_pil.c | 122 -
 drivers/remoteproc/qcom_adsp_pil.h |  86 ++
 drivers/remoteproc/qcom_adsp_pil_sdm845.c  | 304 +
 5 files changed, 454 insertions(+), 62 deletions(-)
 create mode 100644 drivers/remoteproc/qcom_adsp_pil.h
 create mode 100644 drivers/remoteproc/qcom_adsp_pil_sdm845.c

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,adsp.txt 
b/Documentation/devicetree/bindings/remoteproc/qcom,adsp.txt
index 728e419..a9fe033 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,adsp.txt
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,adsp.txt
@@ -10,6 +10,7 @@ on the Qualcomm ADSP Hexagon core.
"qcom,msm8974-adsp-pil"
"qcom,msm8996-adsp-pil"
"qcom,msm8996-slpi-pil"
+   "qcom,sdm845-apss-adsp-pil"
 
 - interrupts-extended:
Usage: required
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 02627ed..759831b 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -14,7 +14,8 @@ obj-$(CONFIG_OMAP_REMOTEPROC) += omap_remoteproc.o
 obj-$(CONFIG_WKUP_M3_RPROC)+= wkup_m3_rproc.o
 obj-$(CONFIG_DA8XX_REMOTEPROC) += da8xx_remoteproc.o
 obj-$(CONFIG_KEYSTONE_REMOTEPROC)  += keystone_remoteproc.o
-obj-$(CONFIG_QCOM_ADSP_PIL)+= qcom_adsp_pil.o
+obj-$(CONFIG_QCOM_ADSP_PIL)+= qcom_adsp.o
+qcom_adsp-objs += qcom_adsp_pil.o qcom_adsp_pil_sdm845.o
 obj-$(CONFIG_QCOM_RPROC_COMMON)+= qcom_common.o
 obj-$(CONFIG_QCOM_Q6V5_PIL)+= qcom_q6v5_pil.o
 obj-$(CONFIG_QCOM_SYSMON)  += qcom_sysmon.o
diff --git a/drivers/remoteproc/qcom_adsp_pil.c 
b/drivers/remoteproc/qcom_adsp_pil.c
index 89a86ce..9ab3698 100644
--- a/drivers/remoteproc/qcom_adsp_pil.c
+++ b/drivers/remoteproc/qcom_adsp_pil.c
@@ -1,5 +1,5 @@
 /*
- * Qualcomm ADSP/SLPI Peripheral Image Loader for MSM8974 and MSM8996
+ * Qualcomm ADSP/SLPI Peripheral Image Loader for MSM8974, MSM8996 and SDM845.
  *
  * Copyright (C) 2016 Linaro Ltd
  * Copyright (C) 2014 Sony Mobile Communications AB
@@ -22,7 +22,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -30,56 +29,8 @@
 #include 
 #include 
 
-#include "qcom_common.h"
 #include "remoteproc_internal.h"
-
-struct adsp_data {
-   int crash_reason_smem;
-   const char *firmware_name;
-   int pas_id;
-   bool has_aggre2_clk;
-
-   const char *ssr_name;
-   const char *sysmon_name;
-   int ssctl_id;
-};
-
-struct qcom_adsp {
-   struct device *dev;
-   struct rproc *rproc;
-
-   int wdog_irq;
-   int fatal_irq;
-   int ready_irq;
-   int handover_irq;
-   int stop_ack_irq;
-
-   struct qcom_smem_state *state;
-   unsigned stop_bit;
-
-   struct clk *xo;
-   struct clk *aggre2_clk;
-
-   struct regulator *cx_supply;
-   struct regulator *px_supply;
-
-   int pas_id;
-   int crash_reason_smem;
-   bool has_aggre2_clk;
-
-   struct completion start_done;
-   struct completion stop_done;
-
-   phys_addr_t mem_phys;
-   phys_addr_t mem_reloc;
-   void *mem_region;
-   size_t mem_size;
-
-   struct qcom_rproc_glink glink_subdev;
-   struct qcom_rproc_subdev smd_subdev;
-   struct qcom_rproc_ssr ssr_subdev;
-   struct qcom_sysmon *sysmon;
-};
+#include "qcom_adsp_pil.h"
 
 static int adsp_load(struct rproc *rproc, const struct firmware *fw)
 {
@@ -112,18 +63,32 @@ static int adsp_start(struct rproc *rproc)
if (ret)
goto disable_cx_supply;
 
-   ret = qcom_scm_pas_auth_and_reset(adsp->pas_id);
-   if (ret) {
-   dev_err(adsp->dev,
-   "failed to authenticate image and release reset\n");
-   goto disable_px_supply;
+   if (adsp->is_apss_controlled) {
+   ret = adsp->ops->bringup(adsp);
+   if (ret) {
+   dev_err(adsp->dev, "adsp bringup failed\n");
+   adsp->ops->bringdown(adsp);
+   goto disable_px_supply;
+   }
+   } else {
+   ret = qcom_scm_pas_auth_and_reset(adsp->pas_id);
+   if (ret) {
+   dev_err(adsp->dev,
+   "failed to authenticate image and release reset\n");
+   goto disable_px_supply;
+   }
}
 
ret =

[GIT PULL] dma mapping fix for 4.17-rc5

2018-05-13 Thread Christoph Hellwig

Hi Linus,

one trivial dma-mapping regression fix for you.  Note that this has
NOT been in linux-next as I didn't want to disturb the branch in
there which has the 4.18 material.  I've asked Stephen to add the
for-linus branch in addition to for-next so that this doesn't happen
again.  In addition to being entirely trivial I also made sure it
passed the 0-day build bot.

The following changes since commit 75bc37fefc4471e718ba8e651aa74673d4e0a9eb:

  Linux 4.17-rc4 (2018-05-06 16:57:38 -1000)

are available in the Git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-4.17-5

for you to fetch changes up to 05e13bb57e6f181d7605f8608181c7e6fb7f591d:

  swiotlb: silent unwanted warning "buffer is full" (2018-05-12 11:57:37 +0200)


another dma-mapping fix for 4.17-rc:

 - just one little fix from Jean to avoid a harmless but very annoying
   warning, especially for the drm code


Jean Delvare (1):
  swiotlb: silent unwanted warning "buffer is full"

 lib/swiotlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Re: [PATCH v2 7/7] ALSA: usb: add UAC3 BADD profiles support

2018-05-13 Thread Takashi Iwai

On Fri, 11 May 2018 17:36:36 +0200,
Jorge wrote:
> 
> 
> 
> On 04/05/18 02:24, Ruslan Bilovol wrote:
> > Recently released USB Audio Class 3.0 specification
> > contains BADD (Basic Audio Device Definition) document
> > which describes pre-defined UAC3 configurations.
> >
> > BADD support is mandatory for UAC3 devices, it should be
> > implemented as a separate USB device configuration.
> > As per BADD document, class-specific descriptors
> > shall not be included in the Device’s Configuration
> > descriptor ("inferred"), but host can guess them
> > from BADD profile number, number of endpoints and
> > their max packet sizes.
> >
> > This patch adds support of all BADD profiles from the spec
> >
> > Signed-off-by: Ruslan Bilovol 
> 
> Tested-by: Jorge Sanjuan 

OK, I'll queue this one to for-next branch.
Thanks!


Takashi

Re: [PATCH] ALSA: control: fix a redundant-copy issue

2018-05-13 Thread Takashi Iwai

On Sat, 05 May 2018 20:38:03 +0200,
Wenwen Wang wrote:
> 
> In snd_ctl_elem_add_compat(), the fields of the struct 'data' need to be
> copied from the corresponding fields of the struct 'data32' in userspace.
> This is achieved by invoking copy_from_user() and get_user() functions. The
> problem here is that the 'type' field is copied twice. One is by
> copy_from_user() and one is by get_user(). Given that the 'type' field is
> not used between the two copies, the second copy is *completely* redundant
> and should be removed for better performance and cleanup. Also, these two
> copies can cause inconsistent data: as the struct 'data32' resides in
> userspace and a malicious userspace process can race to change the 'type'
> field between the two copies to cause inconsistent data. Depending on how
> the data is used in the future, such an inconsistency may cause potential
> security risks.
> 
> For above reasons, we should take out the second copy.
> 
> Signed-off-by: Wenwen Wang 

Applied now, thanks.


Takashi
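
To illustrate the double-fetch concern raised in the commit message, here
is a minimal userspace sketch of the single-fetch pattern: copy the
user-controlled structure once and make every later decision from the
private copy. struct elem_req, fetch_once() and is_boolean_type() are
hypothetical; memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical request layout shared by the "user" and "kernel" copies. */
struct elem_req {
	int type;
	int count;
};

/* Take one snapshot of the user-controlled structure. */
static void fetch_once(struct elem_req *dst, const struct elem_req *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, sizeof(*dst));	/* models the single copy_from_user() */
}

/* Every decision reads the snapshot, never the user copy again. */
static int is_boolean_type(const struct elem_req *snapshot)
{
	return snapshot->type == 1;
}
```

If the user copy changes after fetch_once() returns, the snapshot is
unaffected, so type and count always come from the same moment in time.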

Re: [PATCH v5 03/13] ALSA: hda/ca0132: Add PCI region2 iomap for SBZ

2018-05-13 Thread Takashi Iwai

On Tue, 08 May 2018 19:20:03 +0200,
Connor McAdams wrote:
> 
> This patch adds iomapping for the region2 section of memory on the SBZ.
> This memory region is used in later patches for setting inputs and
> outputs. If the mapping fails, the quirk is changed back to QUIRK_NONE
> to avoid attempts to write to uninitialized memory.
> 
> It also adds a new exit sequence to unmap the iomem for the SBZ.
> 
> Signed-off-by: Connor McAdams 
> ---
>  sound/pci/hda/patch_ca0132.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/sound/pci/hda/patch_ca0132.c b/sound/pci/hda/patch_ca0132.c
> index 02238fe..78d2c26 100644
> --- a/sound/pci/hda/patch_ca0132.c
> +++ b/sound/pci/hda/patch_ca0132.c
> @@ -29,6 +29,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include "hda_codec.h"
>  #include "hda_local.h"
>  #include "hda_auto_parser.h"

The linux/*.h inclusion should be before sound/*.h.
But never mind, I fixed it locally, so no need for resubmission.


thanks,

Takashi

Re: [PATCH v5 00/13] ALSA: hda/ca0132: Patch Series for Recon3Di and Sound Blaster Z Support

2018-05-13 Thread Takashi Iwai

On Tue, 08 May 2018 19:20:00 +0200,
Connor McAdams wrote:
> 
> This patchset adds support for the Sound Blaster Z and the Recon3Di.
> 
> In order to figure out how to get these cards to work, I made a program called
> QemuHDADump[1], which uses the trace function of qemu to see interactions with
> the memory mapped pci BAR space of the card being used in the virtual machine.
> With this, I obtain the CORB buffer location to get the command verbs, and
> then dump them each time the buffer rolls over. This program may be useful for
> fixing other HDA related driver issues where there is no documentation for the
> device.
> 
> So far, I have been able to get all features supported on the Sound Blaster Z
> and the Recon3Di. All output and input effects work, all inputs and outputs
> work, and just about anything else I can think of. I have also added new
> controls in order to select the new inputs and outputs, as well as controls to
> change the effect levels and presets.
> 
> I have also added the ability to use firmware taken from the Windows drivers
> of both the Sound Blaster Z and Recon3Di. I am trying to get into contact with
> Creative to get permission to redistribute these along with the current
> file included with the Chromebook, but they have not been very responsive.
> Luckily, the cards work with the Chromebook firmware just fine, although I
> believe there has to be a reason they have different firmware in Windows. I
> will not link to the firmwares here, but if you look up my thread on Creative
> Labs forums, you will find the link to download the firmwares there.
> 
> I am willing to help get the other non-working cards such as the ZxR and the
> newer AE-5 working too, but I will need someone willing to run QemuHDADump in
> a virtual machine in order to get the commands.
> 
> So, in summary:
> -This patchset makes the cards work better than they did before (they really
>  didn't work before)
> 
> -This patchset leaves the original chromebook related stuff alone.
> 
> Thanks.
> 
> [1] https://github.com/Conmanx360/QemuHDADump
> 
> Bugs:
> ---
> Recon3Di: (Reported by Mariusz Ceier)
> ***
> -Occasionally switching between rear and front mic breaks the input until
>  computer is shutdown or put to sleep.
> 
> -Surround Sound works, but is inconsistent. Sometimes, just updating the
>  volume fixes it, and sometimes, it requires a restart.
> 
> Sound Blaster Z:
> ***
> -none that I'm aware of.
> 
> 
> Version changes:
> ---
> v1:
> ***
> -Massive patch formatting failure, please ignore v1.
> 
> v2:
> ***
> -Fixed patch formatting failure.
> 
> v3:
> ***
> -Fixed mem_base unmap, instead of checking for QUIRK_SBZ on exit, have it
>  check if the area is mapped, and if it is, unmap it. Also make it unmap after
>  all other commands are finished.
> 
> -Change notification of failure to map mem_base from codec_dbg to codec_warn,
>  and use codec_info to tell the user that their card might have been
>  incorrectly identified as a Sound Blaster Z.
> 
> -Remove commented out commands in sbz_exit_chip function, only reintroduce
>  them when their functions are defined.
> 
> v4:
> ***
> -Split patch into smaller pieces.
> 
> -Added const to alt_out_presets array.
> 
> -Fixed command that was commented out and only put it in when
>  it was actually used.
> 
> v5:
> ***
> -Fixed issue identified by kbuild test robot, where patch 12 didn't compile
>  individually.
> 
> Connor McAdams (13):
>   ALSA: hda/ca0132: R3Di and SBZ quirk entires + alt firmware loading
>   ALSA: hda/ca0132: Add pincfg for SBZ + R3Di, add fp hp auto-detect
>   ALSA: hda/ca0132: Add PCI region2 iomap for SBZ
>   ALSA: hda/ca0132: Add extra exit functions for R3Di and SBZ
>   ALSA: hda/ca0132: add extra init functions for r3di + sbz
>   ALSA: hda/ca0132: update core functions for sbz + r3di
>   ALSA: hda/ca0132: add dsp setup related commands for the sbz
>   ALSA: hda/ca0132: Add dsp setup + gpio functions for r3di
>   ALSA: hda/ca0132: add the ability to set src_id on scp commands
>   ALSA: hda/ca0132: add alt_select_in/out for R3Di + SBZ
>   ALSA: hda/ca0132: Add DSP Volume set and New mixers for SBZ + R3Di
>   ALSA: hda/ca0132: add ca0132_alt_set_vipsource
>   ALSA: hda/ca0132: Add new control changes for SBZ + R3Di

Now I applied all patches to for-next branch.
Thanks for your work, it has been a PITA for a long time!


Takashi

Re: [PATCH v5 03/23] iommu/vt-d: add a flag for pasid table bound status

2018-05-13 Thread Lu Baolu

Hi,

On 05/12/2018 04:53 AM, Jacob Pan wrote:
> Adding a flag in device domain into to track whether a guest or
typo: s/into/info/

Best regards,
Lu Baolu

> user PASID table is bound to a device.
>
> Signed-off-by: Jacob Pan 
> ---
>  include/linux/intel-iommu.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 304afae..ddc7d79 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -473,6 +473,7 @@ struct device_domain_info {
>   u8 pri_enabled:1;
>   u8 ats_supported:1;
>   u8 ats_enabled:1;
> + u8 pasid_table_bound:1;
>   u8 ats_qdep;
>   u64 fault_mask; /* selected IOMMU faults to be reported */
>   struct device *dev; /* it's NULL for PCIe-to-PCI bridge */

Re: KASAN: null-ptr-deref Write in linear_transfer

2018-05-13 Thread Eric Biggers

On Wed, Jan 10, 2018 at 10:58:43AM +0100, Takashi Iwai wrote:
> On Wed, 10 Jan 2018 09:08:00 +0100,
> Eric Biggers wrote:
> > 
> > On Fri, Jan 05, 2018 at 02:58:02AM -0800, syzbot wrote:
> > > Hello,
> > > 
> > > syzkaller hit the following crash on
> > > 30a7acd573899fd8b8ac39236eff6468b195ac7d
> > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> > > compiler: gcc (GCC) 7.1.1 20170620
> > > .config is attached
> > > Raw console output is attached.
> > > C reproducer is attached
> > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > > for information about syzkaller reproducers
> > > 
> > > 
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+a8f5641f452c7e6ab...@syzkaller.appspotmail.com
> > > It will help syzbot understand when the bug is fixed. See footer for
> > > details.
> > > If you forward the report, please keep this part and the footer.
> > > 
> > > ==
> > > BUG: KASAN: null-ptr-deref in memcpy include/linux/string.h:344 [inline]
> > > BUG: KASAN: null-ptr-deref in do_convert sound/core/oss/linear.c:52 [inline]
> > > BUG: KASAN: null-ptr-deref in convert sound/core/oss/linear.c:81 [inline]
> > > BUG: KASAN: null-ptr-deref in linear_transfer+0x634/0x900
> > > sound/core/oss/linear.c:110
> > > Write of size 2 at addr   (null) by task syzkaller360172/7860
> > > 
> > > CPU: 0 PID: 7860 Comm: syzkaller360172 Not tainted 4.15.0-rc6+ #155
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > Call Trace:
> > >  __dump_stack lib/dump_stack.c:17 [inline]
> > >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> > >  kasan_report_error mm/kasan/report.c:349 [inline]
> > >  kasan_report+0x13b/0x340 mm/kasan/report.c:409
> > >  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
> > >  check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
> > >  memcpy+0x37/0x50 mm/kasan/kasan.c:303
> > >  memcpy include/linux/string.h:344 [inline]
> > >  do_convert sound/core/oss/linear.c:52 [inline]
> > >  convert sound/core/oss/linear.c:81 [inline]
> > >  linear_transfer+0x634/0x900 sound/core/oss/linear.c:110
> > >  snd_pcm_plug_write_transfer+0x22d/0x420 sound/core/oss/pcm_plugin.c:611
> > >  snd_pcm_oss_write2+0x260/0x420 sound/core/oss/pcm_oss.c:1311
> > >  snd_pcm_oss_sync1+0x1cc/0x550 sound/core/oss/pcm_oss.c:1530
> > >  snd_pcm_oss_sync+0x5b6/0x830 sound/core/oss/pcm_oss.c:1604
> > >  snd_pcm_oss_release+0x20b/0x280 sound/core/oss/pcm_oss.c:2431
> > >  __fput+0x327/0x7e0 fs/file_table.c:210
> > >  fput+0x15/0x20 fs/file_table.c:244
> > >  task_work_run+0x199/0x270 kernel/task_work.c:113
> > >  exit_task_work include/linux/task_work.h:22 [inline]
> > >  do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> > >  do_group_exit+0x149/0x400 kernel/exit.c:968
> > >  get_signal+0x73f/0x16c0 kernel/signal.c:2335
> > >  do_signal+0x90/0x1eb0 arch/x86/kernel/signal.c:809
> > >  exit_to_usermode_loop+0x214/0x310 arch/x86/entry/common.c:158
> > >  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
> > >  syscall_return_slowpath arch/x86/entry/common.c:264 [inline]
> > >  do_syscall_32_irqs_on arch/x86/entry/common.c:333 [inline]
> > >  do_fast_syscall_32+0xbfd/0xf9d arch/x86/entry/common.c:389
> > >  entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129
> > 
> > Still reproducible even after all the fixes currently in sound/for-linus.
> 
> Interesting, I can't reproduce it on my VM any longer...
> 

No longer occurring, last occurrence was Mar 29 on commit a2601d78b77aa.
Seems to have been fixed by commit 02a5d6925cd3:

#syz fix: ALSA: pcm: Avoid potential races between OSS ioctls and read/write

The reproducer was opening /dev/dsp1, then concurrently writing to it and
calling the SNDCTL_DSP_SPEED ioctl.

- Eric

Re: KASAN: null-ptr-deref Write in linear_transfer

2018-05-13 Thread Takashi Iwai

On Sun, 13 May 2018 09:36:36 +0200,
Eric Biggers wrote:
> 
> On Wed, Jan 10, 2018 at 10:58:43AM +0100, Takashi Iwai wrote:
> > On Wed, 10 Jan 2018 09:08:00 +0100,
> > Eric Biggers wrote:
> > > 
> > > On Fri, Jan 05, 2018 at 02:58:02AM -0800, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzkaller hit the following crash on
> > > > 30a7acd573899fd8b8ac39236eff6468b195ac7d
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> > > > compiler: gcc (GCC) 7.1.1 20170620
> > > > .config is attached
> > > > Raw console output is attached.
> > > > C reproducer is attached
> > > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > > > for information about syzkaller reproducers
> > > > 
> > > > 
> > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > Reported-by: syzbot+a8f5641f452c7e6ab...@syzkaller.appspotmail.com
> > > > It will help syzbot understand when the bug is fixed. See footer for
> > > > details.
> > > > If you forward the report, please keep this part and the footer.
> > > > 
> > > > ==
> > > > BUG: KASAN: null-ptr-deref in memcpy include/linux/string.h:344 [inline]
> > > > BUG: KASAN: null-ptr-deref in do_convert sound/core/oss/linear.c:52 [inline]
> > > > BUG: KASAN: null-ptr-deref in convert sound/core/oss/linear.c:81 [inline]
> > > > BUG: KASAN: null-ptr-deref in linear_transfer+0x634/0x900
> > > > sound/core/oss/linear.c:110
> > > > Write of size 2 at addr   (null) by task syzkaller360172/7860
> > > > 
> > > > CPU: 0 PID: 7860 Comm: syzkaller360172 Not tainted 4.15.0-rc6+ #155
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > Google 01/01/2011
> > > > Call Trace:
> > > >  __dump_stack lib/dump_stack.c:17 [inline]
> > > >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> > > >  kasan_report_error mm/kasan/report.c:349 [inline]
> > > >  kasan_report+0x13b/0x340 mm/kasan/report.c:409
> > > >  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
> > > >  check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
> > > >  memcpy+0x37/0x50 mm/kasan/kasan.c:303
> > > >  memcpy include/linux/string.h:344 [inline]
> > > >  do_convert sound/core/oss/linear.c:52 [inline]
> > > >  convert sound/core/oss/linear.c:81 [inline]
> > > >  linear_transfer+0x634/0x900 sound/core/oss/linear.c:110
> > > >  snd_pcm_plug_write_transfer+0x22d/0x420 sound/core/oss/pcm_plugin.c:611
> > > >  snd_pcm_oss_write2+0x260/0x420 sound/core/oss/pcm_oss.c:1311
> > > >  snd_pcm_oss_sync1+0x1cc/0x550 sound/core/oss/pcm_oss.c:1530
> > > >  snd_pcm_oss_sync+0x5b6/0x830 sound/core/oss/pcm_oss.c:1604
> > > >  snd_pcm_oss_release+0x20b/0x280 sound/core/oss/pcm_oss.c:2431
> > > >  __fput+0x327/0x7e0 fs/file_table.c:210
> > > >  fput+0x15/0x20 fs/file_table.c:244
> > > >  task_work_run+0x199/0x270 kernel/task_work.c:113
> > > >  exit_task_work include/linux/task_work.h:22 [inline]
> > > >  do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> > > >  do_group_exit+0x149/0x400 kernel/exit.c:968
> > > >  get_signal+0x73f/0x16c0 kernel/signal.c:2335
> > > >  do_signal+0x90/0x1eb0 arch/x86/kernel/signal.c:809
> > > >  exit_to_usermode_loop+0x214/0x310 arch/x86/entry/common.c:158
> > > >  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
> > > >  syscall_return_slowpath arch/x86/entry/common.c:264 [inline]
> > > >  do_syscall_32_irqs_on arch/x86/entry/common.c:333 [inline]
> > > >  do_fast_syscall_32+0xbfd/0xf9d arch/x86/entry/common.c:389
> > > >  entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129
> > > 
> > > Still reproducible even after all the fixes currently in sound/for-linus.
> > 
> > Interesting, I can't reproduce it on my VM any longer...
> > 
> 
> No longer occurring, last occurrence was Mar 29 on commit a2601d78b77aa.
> Seems to have been fixed by commit 02a5d6925cd3:
> 
> #syz fix: ALSA: pcm: Avoid potential races between OSS ioctls and read/write
> 
> The reproducer was opening /dev/dsp1, then concurrently writing to it and
> calling the SNDCTL_DSP_SPEED ioctl.

Good to hear.  Yes, this should have been covered by the change.
Thanks for checking!


Takashi

Re: [PATCH 1/2] KVM: X86: Fix CR3 reserve bits

2018-05-13 Thread Liran Alon


- kernel...@gmail.com wrote:

> From: Wanpeng Li 
> 
> MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4. 
> It should be checked when PCIDE bit is not set, however commit 
> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on 
> its physical address width")' removes the bit 63 checking 
> unconditionally. This patch fixes it by checking bit 63 of CR3 
> when PCIDE bit is not set in CR4.
> 
> Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits based on
> its physical address width)
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Junaid Shahid 
> Signed-off-by: Wanpeng Li 
> ---
>  arch/x86/kvm/emulate.c | 4 +++-
>  arch/x86/kvm/x86.c | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index b3705ae..b21f427 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4189,7 +4189,9 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
>   maxphyaddr = eax & 0xff;
>   else
>   maxphyaddr = 36;
> - rsvd = rsvd_bits(maxphyaddr, 62);
> + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
> + new_val &= ~CR3_PCID_INVD;
> + rsvd = rsvd_bits(maxphyaddr, 63);

I would prefer to do this instead:
	if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
		rsvd &= ~CR3_PCID_INVD;
That makes more sense than temporarily removing the CR3_PCID_INVD bit from
new_val.
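
For readers of the archive: the two ways of ignoring bit 63 in the reserved-bit check give the same result. They differ only in which variable is modified. Below is a minimal user-space sketch, not kernel code. It assumes CR3_PCID_INVD is bit 63 and models rsvd_bits() as "bits s..e inclusive"; the two cr3_invalid_*() function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_PCID_INVD (1ULL << 63)

/* Mask with bits s..e (inclusive) set; assumes 0 <= s <= e <= 63 and e - s < 63. */
static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* Variant from the patch: strip bit 63 from the value before testing it. */
static bool cr3_invalid_mask_value(uint64_t new_val, int maxphyaddr, bool pcide)
{
	uint64_t rsvd = rsvd_bits(maxphyaddr, 63);

	if (pcide)
		new_val &= ~CR3_PCID_INVD;
	return (new_val & rsvd) != 0;
}

/* Variant from the review: leave the value alone, drop bit 63 from the mask. */
static bool cr3_invalid_mask_rsvd(uint64_t new_val, int maxphyaddr, bool pcide)
{
	uint64_t rsvd = rsvd_bits(maxphyaddr, 63);

	if (pcide)
		rsvd &= ~CR3_PCID_INVD;
	return (new_val & rsvd) != 0;
}
```

With PCIDE set, both variants accept a value that has only bit 63 set above the address width; with PCIDE clear, both reject it.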

>   }
>  
>   if (new_val & rsvd)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 87e4805..9a90668 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -863,7 +863,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   }
>  
>   if (is_long_mode(vcpu) &&
> - (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62)))
> + (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
>   return 1;
>   else if (is_pae(vcpu) && is_paging(vcpu) &&
>  !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
> -- 
> 2.7.4

Re: [PATCH v5 03/23] iommu/vt-d: add a flag for pasid table bound status

2018-05-13 Thread Lu Baolu

Hi again,

On 05/12/2018 04:53 AM, Jacob Pan wrote:
> Adding a flag in device domain info to track whether a guest or
> user PASID table is bound to a device.
>
> Signed-off-by: Jacob Pan 
> ---
>  include/linux/intel-iommu.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 304afae..ddc7d79 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -473,6 +473,7 @@ struct device_domain_info {
>   u8 pri_enabled:1;
>   u8 ats_supported:1;
>   u8 ats_enabled:1;
> + u8 pasid_table_bound:1;

Could you please add a comment here, so that people can understand
exactly what this bit is for?

Best regards,
Lu Baolu

>   u8 ats_qdep;
>   u64 fault_mask; /* selected IOMMU faults to be reported */
>   struct device *dev; /* it's NULL for PCIe-to-PCI bridge */

Re: [PATCH 2/2] KVM: X86: Fix loss of CR3_PCID_INVD bit when guest writes CR3

2018-05-13 Thread Liran Alon


- kernel...@gmail.com wrote:

> From: Wanpeng Li 
> 
> SDM volume 3, section 4.10.4:
> 
> * MOV to CR3. The behavior of the instruction depends on the value of
>   CR4.PCIDE:
>   - If CR4.PCIDE = 1 and bit 63 of the instruction’s source operand is 1,
>     the instruction is not required to invalidate any TLB entries or
>     entries in paging-structure caches.
> 
> The CR3_PCID_INVD bit should not be removed if CR4.PCIDE = 1 when the
> guest writes CR3; this patch fixes it.
> 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Junaid Shahid 
> Signed-off-by: Wanpeng Li 
> ---
>  arch/x86/kvm/x86.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9a90668..438f140 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -849,11 +849,13 @@ EXPORT_SYMBOL_GPL(kvm_set_cr4);
>  
>  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  {
> + unsigned long cr3_check = cr3;
> +
>  #ifdef CONFIG_X86_64
>   bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>  
>   if (pcid_enabled)
> - cr3 &= ~CR3_PCID_INVD;
> + cr3_check &= ~CR3_PCID_INVD;
>  #endif
>  
>   if (cr3 == kvm_read_cr3(vcpu) && !pdptrs_changed(vcpu)) {
> @@ -863,7 +865,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   }
>  
>   if (is_long_mode(vcpu) &&
> - (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
> + (cr3_check & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
>   return 1;
>   else if (is_pae(vcpu) && is_paging(vcpu) &&
>  !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
> -- 
> 2.7.4

This commit doesn't seem correct to me.

According to Intel SDM "MOV—Move to/from Control Registers":
"If CR4.PCIDE = 1, bit 63 of the source operand to MOV to CR3 determines
whether the instruction invalidates entries in the TLBs and the
paging-structure caches (see Section 4.10.4.1, “Operations that Invalidate
TLBs and Paging-Structure Caches,” in the Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 3A). The instruction does not modify
bit 63 of CR3, which is reserved and always 0."

However, after this commit kvm_set_cr3() will store a value with
CR3_PCID_INVD (bit 63) set in vcpu->arch.cr3, which is wrong because that
bit is reserved and always 0 in CR3.
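
The behaviour the SDM text describes can be modelled as "read bit 63 as a request, then clear it before the value becomes register state". Below is a minimal user-space sketch under that reading. It is not the KVM implementation: struct toy_vcpu, toy_set_cr3() and the flush counter are hypothetical stand-ins, and rsvd_bits() is modelled as "bits s..e inclusive".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_PCID_INVD (1ULL << 63)

/* Hypothetical stand-in for the vCPU state that matters here. */
struct toy_vcpu {
	uint64_t cr3;         /* architectural CR3; bit 63 must stay 0 */
	bool pcide;           /* models CR4.PCIDE */
	int maxphyaddr;       /* guest physical address width, assumed < 63 */
	unsigned int flushes; /* counts simulated TLB flushes */
};

/* Mask with bits s..e (inclusive) set; assumes 0 <= s <= e <= 63 and e - s < 63. */
static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* Returns 0 on success, 1 if a reserved bit is set in the source operand. */
static int toy_set_cr3(struct toy_vcpu *vcpu, uint64_t cr3)
{
	bool skip_flush = false;

	if (vcpu->pcide) {
		/* Bit 63 is a request, not register content: note it, then drop it. */
		skip_flush = (cr3 & CR3_PCID_INVD) != 0;
		cr3 &= ~CR3_PCID_INVD;
	}

	if (cr3 & rsvd_bits(vcpu->maxphyaddr, 63))
		return 1;

	if (!skip_flush)
		vcpu->flushes++;
	vcpu->cr3 = cr3; /* bit 63 is never stored */
	return 0;
}
```

In this model the no-flush hint is honoured and the stored CR3 still has bit 63 clear, which is the property the objection above is about.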

Re: [PATCH 3/3] arm64: dts: renesas: draak: Describe HDMI input

2018-05-13 Thread Simon Horman

On Fri, May 11, 2018 at 12:00:02PM +0200, Jacopo Mondi wrote:
> Describe HDMI input connected to VIN4 interface for R-Car D3 Draak
> development board.
> 
> Signed-off-by: Jacopo Mondi 

Hi Niklas,

As you reviewed the rest of the series I'm wondering if you're planning
to review this patch too.

Re: [PATCH 1/2] KVM: X86: Fix CR3 reserve bits

2018-05-13 Thread Wanpeng Li

2018-05-13 15:53 GMT+08:00 Liran Alon :
>
> - kernel...@gmail.com wrote:
>
>> From: Wanpeng Li 
>>
>> MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4.
>> It should be checked when PCIDE bit is not set, however commit
>> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on
>> its physical address width")' removes the bit 63 checking
>> unconditionally. This patch fixes it by checking bit 63 of CR3
>> when PCIDE bit is not set in CR4.
>>
>> Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits based on
>> its physical address width)
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Junaid Shahid 
>> Signed-off-by: Wanpeng Li 
>> ---
>>  arch/x86/kvm/emulate.c | 4 +++-
>>  arch/x86/kvm/x86.c | 2 +-
>>  2 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index b3705ae..b21f427 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -4189,7 +4189,9 @@ static int check_cr_write(struct
>> x86_emulate_ctxt *ctxt)
>>   maxphyaddr = eax & 0xff;
>>   else
>>   maxphyaddr = 36;
>> - rsvd = rsvd_bits(maxphyaddr, 62);
>> + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
>> + new_val &= ~CR3_PCID_INVD;
>> + rsvd = rsvd_bits(maxphyaddr, 63);
>
> I would prefer instead to do this:
> if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
> rsvd &= ~CR3_PCID_INVD;
> It makes more sense as opposed to temporarily removing the CR3_PCID_INVD bit
> from new_val.

It follows the same approach as this commit:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?id=c19986fea873f3c745122bf79013a872a190f212

Regards,
Wanpeng Li

>
>>   }
>>
>>   if (new_val & rsvd)
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 87e4805..9a90668 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -863,7 +863,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned
>> long cr3)
>>   }
>>
>>   if (is_long_mode(vcpu) &&
>> - (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62)))
>> + (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
>>   return 1;
>>   else if (is_pae(vcpu) && is_paging(vcpu) &&
>>  !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
>> --
>> 2.7.4

Re: general protection fault in __radix_tree_delete

2018-05-13 Thread Dmitry Vyukov

On Sun, Apr 29, 2018 at 7:00 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> cdface5209349930ae1b51338763c8e029971b97 (Sun Apr 29 03:07:21 2018 +)
> Merge tag 'for_linus_stable' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=549decbd1891d501b6d5
>
> So far this crash happened 8 times on upstream.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6647588371562496
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=4781854846615552
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=4580574157078528
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=7043958930931867332
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+549decbd1891d501b...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


This crash was bisected as introduced by:

commit faeb7833eee0d6afe0ecb6bdfa6042556c2c352e
Author: Roman Kagan 
Date:   Thu Feb 1 16:48:32 2018 +0300

kvm: x86: hyperv: guest->host event signaling via eventfd

https://gist.githubusercontent.com/dvyukov/df4971d7dfd1b37bedb5bfa0c95f9ebc/raw/ee8b7804788049f80625563e0322090c798c4544/gistfile1.txt



> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 4525 Comm: syz-executor786 Not tainted 4.17.0-rc2+ #23
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
> RIP: 0010:__radix_tree_delete+0x74/0x230 lib/radix-tree.c:1989
> RSP: 0018:8801d9137108 EFLAGS: 00010206
> RAX: 0003 RBX: dc00 RCX: 11003b226e3e
> RDX:  RSI: 8768eeed RDI: 8801a7dac168
> RBP: 8801d91371a8 R08: 8801d962c1c0 R09: ed0034fb5811
> R10: 8801d91372b8 R11: 8801a7dac08f R12: 
> R13: 8801a7dac168 R14: 0018 R15: 8801d9137230
> FS:  01df3880() GS:8801dae0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2100 CR3: 0001d9104000 CR4: 001426f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  radix_tree_delete_item+0x148/0x2d0 lib/radix-tree.c:2050
>  idr_remove+0x46/0x60 lib/idr.c:157
>  kvm_hv_eventfd_deassign arch/x86/kvm/hyperv.c:1433 [inline]
>  kvm_vm_ioctl_hv_eventfd+0x1df/0x24b arch/x86/kvm/hyperv.c:1451
>  kvm_arch_vm_ioctl+0x155e/0x2690 arch/x86/kvm/x86.c:4563
>  kvm_vm_ioctl+0x246/0x1d90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3100
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  file_ioctl fs/ioctl.c:500 [inline]
>  do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
>  ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
>  __do_sys_ioctl fs/ioctl.c:708 [inline]
>  __se_sys_ioctl fs/ioctl.c:706 [inline]
>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x440069
> RSP: 002b:7ffcf0b02cd8 EFLAGS: 0217 ORIG_RAX: 0010
> RAX: ffda RBX: 6d766b2f7665642f RCX: 00440069
> RDX: 2000 RSI: 4018aebd RDI: 00a9
> RBP: 006cb018 R08: 7ffcf0b02cf0 R09: 7ffcf0b02cf0
> R10: 7ffcf0b02cf0 R11: 0217 R12: 004018a0
> R13: 00401930 R14:  R15: 
> Code: 3f 9a 88 48 c7 45 88 80 ee 68 87 c7 00 f1 f1 f1 f1 c7 40 04 00 f2 f2
> f2 c7 40 08 f3 f3 f3 f3 e8 a3 51 10 fa 4c 89 f0 48 c1 e8 03 <80> 3c 18 00 0f
> 85 97 01 00 00 48 8d 55 d8 4c 8d 7a c0 49 8b 1e
> RIP: __read_once_size include/linux/compiler.h:188 [inline] RSP:
> 8801d9137108
> RIP: __radix_tree_delete+0x74/0x230 lib/radix-tree.c:1989 RSP:
> 8801d9137108
> ---[ end trace 79327005f044daef ]---
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug

Re: [PATCH 1/2] KVM: X86: Fix CR3 reserve bits

2018-05-13 Thread Liran Alon


- kernel...@gmail.com wrote:

> 2018-05-13 15:53 GMT+08:00 Liran Alon :
> >
> > - kernel...@gmail.com wrote:
> >
> >> From: Wanpeng Li 
> >>
> >> MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4.
> >> It should be checked when PCIDE bit is not set, however commit
> >> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on
> >> its physical address width")' removes the bit 63 checking
> >> unconditionally. This patch fixes it by checking bit 63 of CR3
> >> when PCIDE bit is not set in CR4.
> >>
> >> Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits based on
> >> its physical address width)
> >> Cc: Paolo Bonzini 
> >> Cc: Radim Krčmář 
> >> Cc: Junaid Shahid 
> >> Signed-off-by: Wanpeng Li 
> >> ---
> >>  arch/x86/kvm/emulate.c | 4 +++-
> >>  arch/x86/kvm/x86.c | 2 +-
> >>  2 files changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >> index b3705ae..b21f427 100644
> >> --- a/arch/x86/kvm/emulate.c
> >> +++ b/arch/x86/kvm/emulate.c
> >> @@ -4189,7 +4189,9 @@ static int check_cr_write(struct
> >> x86_emulate_ctxt *ctxt)
> >>   maxphyaddr = eax & 0xff;
> >>   else
> >>   maxphyaddr = 36;
> >> - rsvd = rsvd_bits(maxphyaddr, 62);
> >> + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
> >> + new_val &= ~CR3_PCID_INVD;
> >> + rsvd = rsvd_bits(maxphyaddr, 63);
> >
> > I would prefer instead to do this:
> > if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
> > rsvd &= ~CR3_PCID_INVD;
> > It makes more sense as opposed to temporarily removing the CR3_PCID_INVD
> > bit from new_val.
> 
> It tries the same way
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?id=c19986fea873f3c745122bf79013a872a190f212
> pointed out.
> 
> Regards,
> Wanpeng Li

Yes, but there it makes sense, because the new CR3 value must not have bit 63
set in vcpu->arch.cr3.

> 
> >
> >>   }
> >>
> >>   if (new_val & rsvd)
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 87e4805..9a90668 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -863,7 +863,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >>   }
> >>
> >>   if (is_long_mode(vcpu) &&
> >> - (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62)))
> >> + (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
> >>   return 1;
> >>   else if (is_pae(vcpu) && is_paging(vcpu) &&
> >>  !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
> >> --
> >> 2.7.4

Re: [PATCH v3 0/3] mailbox: ACPI: Remove incorrect error message about parsing PCCT

2018-05-13 Thread Rafael J. Wysocki

On Tuesday, May 1, 2018 2:39:04 AM CEST Al Stone wrote:
> This set of patches provide some cleanup in ACPI for minor issues
> found while correcting a bogus error message (the first two patches),
> and the correction for the error message itself (patch 3/3).  Note
> that patches 1/3 and 2/3 are not required for 3/3 to work: 1/3 only
> changes a comment and 2/3 makes an ACPI table parsing loop a wee bit
> more robust.
> 
> For patch 3/3, many systems on boot have been reporting "Error parsing
> PCC subspaces from PCCT" which turns out to not be an error at all.
> The issue is that the probe for ACPI mailboxes defined in the PCCT
> (Platform Communications Channel Table) makes a faulty assumption about
> the content of the PCCT.  What's more, when the error is reported, no
> further PCC mailboxes are set up, even when they have been defined
> in the PCCT.  So, in the reported cases, there was no error and the
> data in the PCCT was being ignored.  This is described in more detail
> in patch 3/3.
> 
> Since these patches primarily involve ACPI usages, it may make
> sense for all of them to go through the linux-acpi tree; clearly,
> this is up to the maintainers, though.
> 
> v3:
>   -- properly format docbook info in patch 1/3
>   -- remove extra parens in patch 2/3
>   -- clean up formatting, remove pr_warn() calls used in debugging but
>  providing no value, clean up docbook info for count_pcc_subspaces()
>  and parse_pcc_subspaces(), all in patch 3/3
> 
> v2:
>   -- removed one extraneous '+' in a comment in patch 3/3
>   -- fixed an if test that had a predicate that kbuild pointed out would
>  always be zero
> 
> Al Stone (3):
>   ACPI: improve function documentation for acpi_parse_entries_array()
>   ACPI: ensure acpi_parse_entries_array() does not access non-existent
> table data
>   mailbox: ACPI: erroneous error message when parsing the ACPI PCCT
> 
>  drivers/acpi/tables.c | 12 ---
>  drivers/mailbox/pcc.c | 96 
> +--
>  2 files changed, 71 insertions(+), 37 deletions(-)

I've applied [1-2/3] and I'm waiting on input from Prashanth on the [3/3].

Thanks,
Rafael

QEMU ATA hard disk not detected

2018-05-13 Thread Paul Menzel


Dear Linux folks,


In QEMU 2.11 a disk is only detected by the AHCI driver and not by 
libata. On QEMU’s Standard PC (i440FX + PIIX, 1996), that causes an 
attached drive not to be detected as that machine doesn’t support AHCI.


Here is the output with the machine Standard PC (Q35 + ICH9, 2009), 
which does have AHCI support.



$ qemu-system-x86_64 -enable-kvm -M q35 -m 1G -serial stdio \
    -hda /dev/shm/qemu-debian.img -kernel bzImage \
    -append "console=ttyS0,115200 console=tty0" -initrd initrd.cpio.xz
WARNING: Image format was not specified for '/dev/shm/qemu-debian.img' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.8001H:ECX.svm [bit 2]
[0.00] Linux version 4.17.0-rc4-heads+ (pmen...@bohemianrhapsody.molgen.mpg.de) (gcc version 7.3.0 (GCC)) #1 SMP Sun May 13 10:11:43 CEST 2018
[0.00] Command line: console=ttyS0,115200 console=tty0
[…]
[0.250239] SCSI subsystem initialized
[0.252012] libata version 3.00 loaded.
[…]
[0.633761] ahci :00:1f.2: version 3.0
[0.635808] PCI Interrupt Link [LNKA] enabled at IRQ 10
[0.637935] PCI: setting IRQ 10 as level-triggered
[0.641276] ahci :00:1f.2: AHCI 0001. 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode
[0.644598] ahci :00:1f.2: flags: 64bit ncq only 
[0.647877] scsi host0: ahci
[0.649254] scsi host1: ahci
[0.650591] scsi host2: ahci
[0.652075] scsi host3: ahci
[0.653497] scsi host4: ahci
[0.654775] scsi host5: ahci
[0.656414] ata1: SATA max UDMA/133 abar m4096@0xfebb1000 port 0xfebb1100 irq 24
[0.659847] ata2: SATA max UDMA/133 abar m4096@0xfebb1000 port 0xfebb1180 irq 24
[0.664691] ata3: SATA max UDMA/133 abar m4096@0xfebb1000 port 0xfebb1200 irq 24
[0.667730] ata4: SATA max UDMA/133 abar m4096@0xfebb1000 port 0xfebb1280 irq 24
[0.670826] ata5: SATA max UDMA/133 abar m4096@0xfebb1000 port 0xfebb1300 irq 24
[0.674341] ata6: SATA max UDMA/133 abar m4096@0xfebb1000 port 0xfebb1380 irq 24
[0.678009] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[0.680694] e1000: Copyright (c) 1999-2006 Intel Corporation.
[0.683495] PCI Interrupt Link [LNKG] enabled at IRQ 11
[0.685429] PCI: setting IRQ 11 as level-triggered
[0.996760] ata6: SATA link down (SStatus 0 SControl 300)
[0.998724] ata5: SATA link down (SStatus 0 SControl 300)
[1.001077] ata4: SATA link down (SStatus 0 SControl 300)
[1.003553] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[1.006047] ata3.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[1.008241] ata3.00: applying bridge limits
[1.010322] ata2: SATA link down (SStatus 0 SControl 300)
[1.012609] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[1.014864] ata1.00: ATA-7: QEMU HARDDISK, 2.5+, max UDMA/100
[1.016994] ata1.00: 6291456 sectors, multi 16: LBA48 NCQ (depth 31/32)
[1.019254] ata1.00: applying bridge limits
[1.028470] ata3.00: configured for UDMA/100
[1.030279] ata1.00: configured for UDMA/100
[1.032075] scsi 0:0:0:0: Direct-Access ATA  QEMU HARDDISK2.5+ 
PQ: 0 ANSI: 5
[1.040151] sd 0:0:0:0: Attached scsi generic sg0 type 0
[1.041985] sd 0:0:0:0: [sda] 6291456 512-byte logical blocks: (3.22 GB/3.00 
GiB)
[1.045002] sd 0:0:0:0: [sda] Write Protect is off
[1.046843] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[1.049064] scsi 2:0:0:0: CD-ROMQEMU QEMU DVD-ROM 2.5+ 
PQ: 0 ANSI: 5
[1.059073] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[1.067126]  sda: sda1
[1.068217] sd 0:0:0:0: [sda] Attached SCSI disk
[1.074848] sr 2:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[1.076830] cdrom: Uniform CD-ROM driver Revision: 3.20
[1.078760] sr 2:0:0:0: Attached scsi CD-ROM sr0
[1.085017] sr 2:0:0:0: Attached scsi generic sg1 type 5


The Debian disk image was built with grml-debootstrap.

sudo grml-debootstrap --vmfile --vmsize 3G \
    --target /dev/shm/qemu-debian.img -r sid


Is that a Linux or QEMU issue?


Kind regards,

Paul
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.17.0-rc4 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y

Re: [PATCH] swiotlb: Silent unwanted warning "buffer is full"

2018-05-13 Thread Jean Delvare

On Sat, 12 May 2018 12:02:40 +0200, Christoph Hellwig wrote:
> Thanks.
> 
> I manually applied this for 4.17-rc, as the mail unfortunately was
> garbled.

Sorry about that. Because of the umlauts in the Cc list, the mail was
encoded as quoted-printable instead of 7bit. I guess this is what
caused the problem on your end.

You would think our tools would be able to deal with accented characters
properly in 2018 but apparently not :-(
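
For context, and assuming quoted-printable really was the culprit: that encoding rewrites every non-ASCII byte (and '=' itself) as "=XX", so a patch body only survives if the receiving tool decodes it again. A simplified sketch of the encoding side follows. toy_qp_encode() is a hypothetical helper, the name in the usage note is an arbitrary example, and the sketch ignores the soft line break and trailing-whitespace rules of the real format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Encode the NUL-terminated string `in` into `out` (capacity `cap`, including
 * the terminating NUL). Printable ASCII except '=' is copied; every other
 * byte becomes "=XX" in upper-case hex. Output is truncated at a whole
 * token if it does not fit. Returns the number of characters written.
 */
static size_t toy_qp_encode(const char *in, char *out, size_t cap)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	if (cap == 0)
		return 0;

	for (; *in != '\0'; in++) {
		unsigned char c = (unsigned char)*in;
		int literal = (c >= 32 && c <= 126 && c != '=');
		size_t need = literal ? 1 : 3;

		if (n + need >= cap) /* keep one byte for the NUL */
			break;
		if (literal) {
			out[n++] = (char)c;
		} else {
			out[n++] = '=';
			out[n++] = hex[c >> 4];
			out[n++] = hex[c & 0x0F];
		}
	}
	out[n] = '\0';
	return n;
}
```

For example, the UTF-8 string "Müller" (bytes 4D C3 BC 6C 6C 65 72) encodes to "M=C3=BCller", and a diff line containing "a=b" becomes "a=3Db", which is why an undecoded patch no longer applies.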

-- 
Jean Delvare
SUSE L3 Support

[PATCH 2/2] arm64: Clear the stack

2018-05-13 Thread Alexander Popov

Hello Mark,

Thanks a lot for your reply!

On 11.05.2018 19:13, Mark Rutland wrote:
> On Fri, May 11, 2018 at 06:50:09PM +0300, Alexander Popov wrote:
>> On 06.05.2018 11:22, Alexander Popov wrote:
>>> On 04.05.2018 14:09, Mark Rutland wrote:
>>>>>>> +   stack_left = sp & (THREAD_SIZE - 1);
>>>>>>> +   BUG_ON(stack_left < 256 || size >= stack_left - 256);
>>>>>>
>>>>>> Is this arbitrary, or is there something special about 256?
>>>>>>
>>>>>> Even if this is arbitrary, can we give it some mnemonic?
>>>>>
>>>>> It's just a reasonable number. We can introduce a macro for it.
>>>>
>>>> I'm just not sure I see the point in the offset, given things like
>>>> VMAP_STACK exist. BUG_ON() handling will likely require *more* than 256
>>>> bytes of stack, so it seems superfluous, as we'd be relying on stack
>>>> overflow detection at that point.
>>>>
>>>> I can see that we should take the CONFIG_SCHED_STACK_END_CHECK offset
>>>> into account, though.
>>>
>>> Mark, thank you for such an important remark!
>>>
>>> In Kconfig STACKLEAK implies but doesn't depend on VMAP_STACK. In fact
>>> x86_32 doesn't have VMAP_STACK at all but can have STACKLEAK.
>>>
>>> [Adding Andy Lutomirski]
>>>
>>> I've made some additional experiments: I exhaust the thread stack to have
>>> only (MIN_STACK_LEFT - 1) bytes left and then force alloca. If VMAP_STACK
>>> is disabled, BUG_ON() handling causes stack depth overflow, which is
>>> detected by SCHED_STACK_END_CHECK. If VMAP_STACK is enabled, the kernel
>>> hangs on BUG_ON() handling! Enabling CONFIG_PROVE_LOCKING gives the needed
>>> report from VMAP_STACK:
> 
> I can't see why CONFIG_VMAP_STACK would only work in conjunction with
> CONFIG_PROVE_LOCKING.
> 
> On arm64 at least, if we overflow the stack while handling a BUG(), we
> *should* trigger the overflow handler as usual, and that should work,
> unless I'm missing something.
> 
> Maybe it gets part-way into panic(), sets up some state,
> stack-overflows, and we get wedged because we're already in a panic?
> Perhaps CONFIG_PROVE_LOCKING causes more stack to be used, so it dies a
> little earlier in panic(), before setting up some state that causes
> wedging.

That seems likely. I later noticed that I had the oops=panic kernel parameter set.

> ... which sounds like something best fixed in those code paths, and not
> here.
> 
>> [...]
>>
>>> I can't say why VMAP_STACK report hangs during BUG_ON() handling on
>>> defconfig. Andy, can you give a clue?
>>>
>>> I see that MIN_STACK_LEFT = 2048 is enough for BUG_ON() handling on both
>>> x86_64 and x86_32. So I'm going to:
>>>  - set MIN_STACK_LEFT to 2048;
>>>  - improve the lkdtm test to cover this case.
>>>
>>> Mark, Kees, Laura, does it sound good?
>>
>>
>> Could you have a look at the following changes in check_alloca() before I
>> send the next version?
>>
>> If VMAP_STACK is enabled and alloca causes stack depth overflow, I write to
>> guard page below the thread stack to cause double fault and VMAP_STACK
>> report.
> 
> On arm64 at least, writing to the guard page will not itself trigger a
> stack overflow, but will trigger a data abort. I suspect similar is true
> on x86, if the stack pointer is sufficiently far above the guard page.

Yes, you are right, my mistake.

The comment about CONFIG_VMAP_STACK in arch/x86/kernel/traps.c says:
"If we overflow the stack into a guard page, the CPU will fail to deliver #PF
and will send #DF instead."
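
For reference, the headroom check debated in this thread can be modelled in user space. The sketch below follows the expression from the quoted arm64 hunk (stack_left = sp & (THREAD_SIZE - 1), then compare against a minimum). The TOY_* constants and toy_alloca_overflows() are hypothetical names, and the 16 KiB stack size is an assumption made only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_THREAD_SIZE    16384UL /* assumed power-of-two, size-aligned stack */
#define TOY_MIN_STACK_LEFT 256UL   /* headroom value from the quoted hunk */

/*
 * True if an alloca of `size` bytes at stack pointer `sp` would eat into the
 * reserved headroom, i.e. the condition the quoted BUG_ON() tests.
 */
static bool toy_alloca_overflows(uintptr_t sp, unsigned long size)
{
	/* The stack grows down and is size-aligned, so the low bits of sp
	 * give the number of bytes still free below it. */
	unsigned long stack_left = sp & (TOY_THREAD_SIZE - 1);

	return stack_left < TOY_MIN_STACK_LEFT ||
	       size >= stack_left - TOY_MIN_STACK_LEFT;
}
```

With 1024 bytes left, a 767-byte request passes and a 768-byte request trips the check. Raising the headroom to 2048, as proposed for x86, would reject both, which illustrates the trade-off between wasted space and room for BUG_ON() handling that the thread is about.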

>> If VMAP_STACK is disabled, I use MIN_STACK_LEFT = 2048, which seems to be
>> enough for BUG_ON() handling both on x86_32 and x86_64. Unfortunately, I
>> can't guarantee that it is always enough.
> 
> I don't think that we can choose something that's guaranteed to be
> sufficient for BUG() handling and also not wasting a tonne of space
> under normal operation.
> 
> Let's figure out what's going wrong on x86 in the case that you mention,
> and try to solve that.
> 
> Here I don't think we should reserve space at all -- it's completely
> arbitrary, and as above we can't guarantee that it's sufficient anyway.
> 
>>  #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> -#define MIN_STACK_LEFT 256
>> +#define MIN_STACK_LEFT 2048
>>
>>  void __used check_alloca(unsigned long size)
>>  {
>> unsigned long sp = (unsigned long)&sp;
>> struct stack_info stack_info = {0};
>> unsigned long visit_mask = 0;
>> unsigned long stack_left;
>>
>> BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));
>>
>> stack_left = sp - (unsigned long)stack_info.begin;
>> +
>> +#ifdef CONFIG_VMAP_STACK
>> +   /*
>> +* If alloca oversteps the thread stack boundary, we touch the guard
>> +* page provided by VMAP_STACK to trigger handle_stack_overflow().
>> +*/
>> +   if (size >= stack_left)
>> +   *(stack_info.begin - 1) = 42;
>> +#else
> 
> On arm64, this won't trigger our stack overflow handler, unless the SP
> is already very close to the boundary.
> 
> Please just use

Re: [PATCH] kernel/sched/cpufreq_schedutil: remove stale comment

2018-05-13 Thread Rafael J. Wysocki

On Wednesday, May 9, 2018 10:41:54 AM CEST Viresh Kumar wrote:
> On 09-05-18, 10:40, Juri Lelli wrote:
> > After commit 794a56ebd9a57 ("sched/cpufreq: Change the worker kthread to
> > SCHED_DEADLINE") schedutil kthreads are "ignored" from a clock frequency
> > selection point of view, so the potential corner case for RT tasks is not
> > possible at all now.
> > 
> > Remove the stale comment mentioning it.
> > 
> > Signed-off-by: Juri Lelli 
> > Cc: Ingo Molnar 
> > Cc: Peter Zijlstra 
> > Cc: "Rafael J. Wysocki" 
> > Cc: Viresh Kumar 
> > Cc: Claudio Scordino 
> > Cc: Luca Abeni 
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 13 -------------
> >  1 file changed, 13 deletions(-)
> > 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index d2c6083304b4..23ef19070137 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -396,19 +396,6 @@ static void sugov_irq_work(struct irq_work *irq_work)
> >  
> > sg_policy = container_of(irq_work, struct sugov_policy, irq_work);
> >  
> > -   /*
> > -* For RT tasks, the schedutil governor shoots the frequency to maximum.
> > -* Special care must be taken to ensure that this kthread doesn't result
> > -* in the same behavior.
> > -*
> > -* This is (mostly) guaranteed by the work_in_progress flag. The flag is
> > -* updated only at the end of the sugov_work() function and before that
> > -* the schedutil governor rejects all other frequency scaling requests.
> > -*
> > -* There is a very rare case though, where the RT thread yields right
> > -* after the work_in_progress flag is cleared. The effects of that are
> > -* neglected for now.
> > -*/
> > kthread_queue_work(&sg_policy->worker, &sg_policy->work);
> >  }
> 
> Acked-by: Viresh Kumar 

Applied and pushed for 4.17-rc5.

Re: [PATCH] Documentation/admin-guide/pm/intel_pstate: fix Active Mode w/o HWP paragraph

2018-05-13 Thread Rafael J. Wysocki

On Tuesday, May 8, 2018 8:36:44 PM CEST Srinivas Pandruvada wrote:
> On Tue, 2018-05-08 at 17:12 +0200, Juri Lelli wrote:
> > P-state selection algorithm (powersave or performance) is selected by
> > echoing the desired choice to scaling_governor sysfs attribute and
> > not
> > to scaling_cur_freq (as currently stated).
> > 
> > Fix it.
> Thanks for the fix.
> 
> > 
> > Signed-off-by: Juri Lelli 
> > Cc: Jonathan Corbet 
> > Cc: "Rafael J. Wysocki" 
> > Cc: Srinivas Pandruvada 
> > Cc: linux-...@vger.kernel.org
> > Cc: linux...@vger.kernel.org
> Reviewed-by: Srinivas Pandruvada 
> 
> 
> > 
> > ---
> >  Documentation/admin-guide/pm/intel_pstate.rst | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/admin-guide/pm/intel_pstate.rst
> > b/Documentation/admin-guide/pm/intel_pstate.rst
> > index d2b6fda3d67b..ab2fe0eda1d7 100644
> > --- a/Documentation/admin-guide/pm/intel_pstate.rst
> > +++ b/Documentation/admin-guide/pm/intel_pstate.rst
> > @@ -145,7 +145,7 @@ feature enabled.]
> >  
> >  In this mode ``intel_pstate`` registers utilization update callbacks
> > with the
> >  CPU scheduler in order to run a P-state selection algorithm, either
> > -``powersave`` or ``performance``, depending on the
> > ``scaling_cur_freq`` policy
> > +``powersave`` or ``performance``, depending on the
> > ``scaling_governor`` policy
> >  setting in ``sysfs``.  The current CPU frequency information to be
> > made
> >  available from the ``scaling_cur_freq`` policy attribute in
> > ``sysfs`` is
> >  periodically updated by those utilization update callbacks too.
> 

Applied and pushed for 4.17-rc5, thanks!
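
To check the two attributes named in the corrected paragraph, a minimal
sketch follows. It assumes the usual cpufreq sysfs layout (a policy
directory such as /sys/devices/system/cpu/cpufreq/policy0). The helper
names are made up for illustration and are not part of any kernel tool.

```python
from pathlib import Path


def read_policy_attr(policy_dir, attr):
    """Return the stripped text of one cpufreq policy attribute file."""
    return (Path(policy_dir) / attr).read_text().strip()


def describe_policy(policy_dir):
    """Return (governor, current frequency in kHz) for one policy directory."""
    # scaling_governor selects the algorithm (powersave or performance here).
    governor = read_policy_attr(policy_dir, "scaling_governor")
    # scaling_cur_freq only reports a frequency; it selects nothing.
    cur_freq_khz = int(read_policy_attr(policy_dir, "scaling_cur_freq"))
    return governor, cur_freq_khz
```

Taking the directory as a parameter keeps the sketch usable against a
copy of the sysfs tree as well as the live one.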

Re: [PATCH] tools/power/x86/intel_pstate_tracer: Add optional setting of trace buffer memory allocation

2018-05-13 Thread Rafael J. Wysocki

On Friday, May 4, 2018 3:46:22 PM CEST Doug Smythies wrote:
> Allow the user to override the default trace buffer memory allocation
> by adding a command line option to override the default.
> 
> The patch also:
> 
> Adds a SIGINT (i.e. CTRL C exit) handler,
> so that things can be cleaned up before exit.
> 
> Moves the position of some other cleanup from after to
> before the potential "No valid data to plot" exit.
> 
> Replaces all quit() calls with sys.exit, because
> quit() is not supposed to be used in scripts.
> 
> Signed-off-by: Doug Smythies 

Srinivas, any comments here?

> ---
>  .../x86/intel_pstate_tracer/intel_pstate_tracer.py | 54 
> ++
>  1 file changed, 35 insertions(+), 19 deletions(-)
> 
> diff --git a/tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py 
> b/tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py
> index 29f50d4..84e2b64 100755
> --- a/tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py
> +++ b/tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py
> @@ -28,6 +28,7 @@ import subprocess
>  import os
>  import time
>  import re
> +import signal
>  import sys
>  import getopt
>  import Gnuplot
> @@ -78,11 +79,12 @@ def print_help():
>  print('Or')
>  print('  ./intel_pstate_tracer.py [--cpu cpus] ---trace_file 
>  --name ')
>  print('To generate trace file, parse and plot, use (sudo required):')
> -print('  sudo ./intel_pstate_tracer.py [-c cpus] -i  -n 
> ')
> +print('  sudo ./intel_pstate_tracer.py [-c cpus] -i  -n 
>  -m ')
>  print('Or')
> -print('  sudo ./intel_pstate_tracer.py [--cpu cpus] --interval 
>  --name ')
> +print('  sudo ./intel_pstate_tracer.py [--cpu cpus] --interval 
>  --name  --memory ')
>  print('Optional argument:')
> -print('  cpus:  comma separated list of CPUs')
> +print('  cpus:   comma separated list of CPUs')
> +print('  kbytes: Kilo bytes of memory per CPU to allocate to the 
> trace buffer. Default: 10240')
>  print('  Output:')
>  print('If not already present, creates a "results/test_name" folder 
> in the current working directory with:')
>  print('  cpu.csv - comma seperated values file with trace contents 
> and some additional calculations.')
> @@ -379,7 +381,7 @@ def clear_trace_file():
>  f_handle.close()
>  except:
>  print('IO error clearing trace file ')
> -quit()
> +sys.exit(2)
>  
>  def enable_trace():
>  """ Enable trace """
> @@ -389,7 +391,7 @@ def enable_trace():
>   , 'w').write("1")
>  except:
>  print('IO error enabling trace ')
> -quit()
> +sys.exit(2)
>  
>  def disable_trace():
>  """ Disable trace """
> @@ -399,17 +401,17 @@ def disable_trace():
>   , 'w').write("0")
>  except:
>  print('IO error disabling trace ')
> -quit()
> +sys.exit(2)
>  
>  def set_trace_buffer_size():
>  """ Set trace buffer size """
>  
>  try:
> -   open('/sys/kernel/debug/tracing/buffer_size_kb'
> - , 'w').write("10240")
> +   with open('/sys/kernel/debug/tracing/buffer_size_kb', 'w') as fp:
> +  fp.write(memory)
>  except:
> -print('IO error setting trace buffer size ')
> -quit()
> +   print('IO error setting trace buffer size ')
> +   sys.exit(2)
>  
>  def free_trace_buffer():
>  """ Free the trace buffer memory """
> @@ -418,8 +420,8 @@ def free_trace_buffer():
> open('/sys/kernel/debug/tracing/buffer_size_kb'
>   , 'w').write("1")
>  except:
> -print('IO error setting trace buffer size ')
> -quit()
> +print('IO error freeing trace buffer ')
> +sys.exit(2)
>  
>  def read_trace_data(filename):
>  """ Read and parse trace data """
> @@ -431,7 +433,7 @@ def read_trace_data(filename):
>  data = open(filename, 'r').read()
>  except:
>  print('Error opening ', filename)
> -quit()
> +sys.exit(2)
>  
>  for line in data.splitlines():
>  search_obj = \
> @@ -489,10 +491,22 @@ def read_trace_data(filename):
>  # Now seperate the main overall csv file into per CPU csv files.
>  split_csv()
>  
> +def signal_handler(signal, frame):
> +print(' SIGINT: Forcing cleanup before exit.')
> +if interval:
> +disable_trace()
> +clear_trace_file()
> +# Free the memory
> +free_trace_buffer()
> +sys.exit(0)
> +
> +signal.signal(signal.SIGINT, signal_handler)
> +
>  interval = ""
>  filename = ""
>  cpu_list = ""
>  testname = ""
> +memory = "10240"
>  graph_data_present = False;
>  
>  valid1 = False
> @@ -501,7 +515,7 @@ valid2 = False
>  cpu_mask = zeros((MAX_CPUS,), dtype=int)
>  
>  try:
> -opts, args = 
> getopt.getopt(sys.argv[1:],"ht:i:c:n:",["help","trace_file=","interval=","cpu=","name="])
> +opts, args = 
>
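
The SIGINT handling and the quit() to sys.exit() change described above
can be sketched in isolation. This is not the tracer script itself:
cleanup() is a hypothetical stand-in for disable_trace(),
clear_trace_file() and free_trace_buffer(), and it only records what it
would do. quit() is injected by the site module for interactive use,
which is why sys.exit() is the safer call in a script.

```python
import signal
import sys

cleanup_steps = []  # records what ran; stands in for the real sysfs writes


def cleanup():
    """Placeholder for disable_trace(), clear_trace_file(), free_trace_buffer()."""
    cleanup_steps.extend(["disable_trace", "clear_trace_file", "free_trace_buffer"])


def sigint_handler(signum, frame):
    """Run cleanup, then leave through sys.exit() rather than quit()."""
    print(" SIGINT: forcing cleanup before exit.")
    cleanup()
    sys.exit(0)


def install_sigint_handler():
    """Register the handler; signal.signal() must be called from the main thread."""
    signal.signal(signal.SIGINT, sigint_handler)
```

The first handler parameter is named signum here so that it does not
shadow the signal module inside the handler.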

Re: [PATCH V5] cpufreq: intel_pstate: allow trace in passive mode

2018-05-13 Thread Rafael J. Wysocki

On Thursday, May 3, 2018 8:22:47 AM CEST Doug Smythies wrote:
> Allow use of the trace_pstate_sample trace function
> when the intel_pstate driver is in passive mode.
> Since the core_busy and scaled_busy fields are not
> used, and it might be desirable to know which path
> through the driver was used, either intel_cpufreq_target
> or intel_cpufreq_fast_switch, re-task the core_busy
> field as a flag indicator.
> 
> The user can then use the intel_pstate_tracer.py utility
> to summarize and plot the trace.
> 
> Note: The core_busy field still goes by that name
> in include/trace/events/power.h and within the
> intel_pstate_tracer.py script and csv file headers,
> but it is graphed as "performance", and called
> core_avg_perf now in the intel_pstate driver.
> 
> Sometimes, in passive mode, the driver is not called for
> many tens or even hundreds of seconds. The user
> needs to understand, and not be confused by, this limitation.
> 
> Signed-off-by: Doug Smythies 

Srinivas, any comments or concerns?

> 
> ---
> 
> V5: Changes as per Rafael J. Wysocki feedback.
> See: https://lkml.org/lkml/2018/1/7/270
> 
> V4: Only execute the trace specific overhead code if trace
> is enabled. Suggested by Srinivas Pandruvada.
> 
> V3: Move largely duplicate code to a subroutine.
> Suggested by Rafael J. Wysocki.
> 
> V2: prepare for resend. Rebase to current kernel, 4.15-rc3.
> 
> ---
>  drivers/cpufreq/intel_pstate.c | 44 
> --
>  1 file changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 17e566af..4a08686 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1939,13 +1939,49 @@ static int intel_cpufreq_verify_policy(struct 
> cpufreq_policy *policy)
>   return 0;
>  }
>  
> +/* Use of trace in passive mode:
> + *
> + * In passive mode the trace core_busy field (also known as the
> + * performance field, and lablelled as such on the graphs; also known as
> + * core_avg_perf) is not needed and so is re-assigned to indicate if the
> + * driver call was via the normal or fast switch path. Various graphs
> + * output from the intel_pstate_tracer.py utility that include core_busy
> + * (or performance or core_avg_perf) have a fixed y-axis from 0 to 100%,
> + * so we use 10 to indicate the the normal path through the driver, and
> + * 90 to indicate the fast switch path through the driver.
> + * The scaled_busy field is not used, and is set to 0.
> + */
> +
> +#define  INTEL_PSTATE_TRACE_TARGET 10
> +#define  INTEL_PSTATE_TRACE_FAST_SWITCH 90
> +
> +static void intel_cpufreq_trace(struct cpudata *cpu, unsigned int 
> trace_type, int old_pstate)
> +{
> + struct sample *sample;
> +
> + if (!trace_pstate_sample_enabled())
> + return;
> + if (!intel_pstate_sample(cpu, ktime_get()))
> + return;
> + sample = &cpu->sample;
> + trace_pstate_sample(trace_type,
> + 0,
> + old_pstate,
> + cpu->pstate.current_pstate,
> + sample->mperf,
> + sample->aperf,
> + sample->tsc,
> + get_avg_frequency(cpu),
> + fp_toint(cpu->iowait_boost * 100));
> +}
> +
>  static int intel_cpufreq_target(struct cpufreq_policy *policy,
>   unsigned int target_freq,
>   unsigned int relation)
>  {
>   struct cpudata *cpu = all_cpu_data[policy->cpu];
>   struct cpufreq_freqs freqs;
> - int target_pstate;
> + int target_pstate, old_pstate;
>  
>   update_turbo_state();
>  
> @@ -1965,12 +2001,14 @@ static int intel_cpufreq_target(struct cpufreq_policy 
> *policy,
>   break;
>   }
>   target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
> + old_pstate = cpu->pstate.current_pstate;
>   if (target_pstate != cpu->pstate.current_pstate) {
>   cpu->pstate.current_pstate = target_pstate;
>   wrmsrl_on_cpu(policy->cpu, MSR_IA32_PERF_CTL,
> pstate_funcs.get_val(cpu, target_pstate));
>   }
>   freqs.new = target_pstate * cpu->pstate.scaling;
> + intel_cpufreq_trace(cpu, INTEL_PSTATE_TRACE_TARGET, old_pstate);
>   cpufreq_freq_transition_end(policy, &freqs, false);
>  
>   return 0;
> @@ -1980,13 +2018,15 @@ static unsigned int intel_cpufreq_fast_switch(struct 
> cpufreq_policy *policy,
> unsigned int target_freq)
>  {
>   struct cpudata *cpu = all_cpu_data[policy->cpu];
> - int target_pstate;
> + int target_pstate, old_pstate;
>  
>   update_turbo_state();
>  
>   target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
>   target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
> + old_pstate = cpu->pstate.current_pstate;
>   intel_pstate_update_pstate(cpu, target_pstate);
> +
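
On the consumer side, the re-tasked core_busy field could be decoded
along these lines. This is a sketch only: the constants 10 and 90 come
from the patch, while the function names and the "unknown" bucket are
assumptions made for illustration and are not taken from
intel_pstate_tracer.py.

```python
from collections import Counter

# Values taken from the patch: INTEL_PSTATE_TRACE_TARGET and
# INTEL_PSTATE_TRACE_FAST_SWITCH.
TRACE_TARGET = 10
TRACE_FAST_SWITCH = 90

PATH_NAMES = {TRACE_TARGET: "target", TRACE_FAST_SWITCH: "fast_switch"}


def passive_mode_path(core_busy):
    """Map a passive-mode core_busy value to the driver path that logged it."""
    return PATH_NAMES.get(core_busy, "unknown")


def count_paths(core_busy_samples):
    """Count how many samples came through each path."""
    return dict(Counter(passive_mode_path(v) for v in core_busy_samples))
```

Anything other than 10 or 90 is reported as "unknown" rather than
guessed, since in active mode the same field carries a real percentage.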

Re: [PATCH 2/2] KVM: X86: Fix loss of CR3_PCID_INVD bit when guest writes CR3

2018-05-13 Thread Wanpeng Li

2018-05-13 16:03 GMT+08:00 Liran Alon :
>
> - kernel...@gmail.com wrote:
>
>> From: Wanpeng Li 
>>
>> SDM volume 3, section 4.10.4:
>>
>> * MOV to CR3. The behavior of the instruction depends on the value of
>> CR4.PCIDE:
>> — If CR4.PCIDE = 1 and bit 63 of the instruction’s source operand is
>> 1, the
>>   instruction is not required to invalidate any TLB entries or entries
>> in
>>   paging-structure caches.
>>
>> The CR3_PCID_INVD bit should not be removed if CR4.PCIDE = 1 when
>> guest writes
>> CR3, this patch fixes it.
>>
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Junaid Shahid 
>> Signed-off-by: Wanpeng Li 
>> ---
>>  arch/x86/kvm/x86.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 9a90668..438f140 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -849,11 +849,13 @@ EXPORT_SYMBOL_GPL(kvm_set_cr4);
>>
>>  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>  {
>> + unsigned long cr3_check = cr3;
>> +
>>  #ifdef CONFIG_X86_64
>>   bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>>
>>   if (pcid_enabled)
>> - cr3 &= ~CR3_PCID_INVD;
>> + cr3_check &= ~CR3_PCID_INVD;
>>  #endif
>>
>>   if (cr3 == kvm_read_cr3(vcpu) && !pdptrs_changed(vcpu)) {
>> @@ -863,7 +865,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned
>> long cr3)
>>   }
>>
>>   if (is_long_mode(vcpu) &&
>> - (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
>> + (cr3_check & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
>>   return 1;
>>   else if (is_pae(vcpu) && is_paging(vcpu) &&
>>  !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
>> --
>> 2.7.4
>
> This commit doesn't seem correct to me.
>
> According to Intel SDM "MOV—Move to/from Control Registers":
> "If CR4.PCIDE = 1, bit 63 of the source operand to MOV to CR3 determines 
> whether the instruction
> invalidates entries in the TLBs and the paging-structure caches
> (see Section 4.10.4.1, “Operations that Invalidate TLBs and Paging-Structure 
> Caches,”
> in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 
> 3A).
> The instruction does not modify bit 63 of CR3, which is reserved and always 
> 0."
>
> However, after this commit kvm_set_cr3() will update vcpu->arch.cr3 to have 
> bit CR3_PCID_INVD set.
> Which is wrong as it should be reserved and always 0.

You are right, thanks Liran.

Regards,
Wanpeng Li
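
The behaviour quoted from the SDM can be modelled in a few lines. This
is a simplified sketch, not KVM code: it assumes long mode, ignores PAE
and PDPTR handling, and model_set_cr3() is a name invented here. It
shows bit 63 acting as a "no flush required" hint that is never stored.

```python
CR3_PCID_INVD = 1 << 63


def rsvd_bits(start, end):
    """Mask with bits start..end (inclusive) set."""
    return ((1 << (end - start + 1)) - 1) << start


def model_set_cr3(new_cr3, pcide, maxphyaddr):
    """Return (stored_cr3, skip_flush), or None where a #GP would be raised."""
    skip_flush = False
    if pcide and (new_cr3 & CR3_PCID_INVD):
        # With CR4.PCIDE = 1, bit 63 is a hint and is not written to CR3.
        skip_flush = True
        new_cr3 &= ~CR3_PCID_INVD
    if new_cr3 & rsvd_bits(maxphyaddr, 63):
        # Any remaining bit at or above maxphyaddr is reserved.
        return None
    return new_cr3, skip_flush
```

In this model the stored value always has bit 63 clear, which is the
property pointed out in the review above.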

Re: [PATCH] cpufreq: s3c2440: fix spelling mistake: "divsiors" -> "divisors"

2018-05-13 Thread Rafael J. Wysocki

On Wednesday, May 2, 2018 7:37:21 AM CEST Viresh Kumar wrote:
> On 30-04-18, 15:48, Colin King wrote:
> > From: Colin Ian King 
> > 
> > Trivial fix to spelling mistake in s3c_freq_dbg debug message text.
> > 
> > Signed-off-by: Colin Ian King 
> > ---
> >  drivers/cpufreq/s3c2440-cpufreq.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cpufreq/s3c2440-cpufreq.c 
> > b/drivers/cpufreq/s3c2440-cpufreq.c
> > index d0d75b65ddd6..d2f67b7a20dd 100644
> > --- a/drivers/cpufreq/s3c2440-cpufreq.c
> > +++ b/drivers/cpufreq/s3c2440-cpufreq.c
> > @@ -143,7 +143,7 @@ static void s3c2440_cpufreq_setdivs(struct 
> > s3c_cpufreq_config *cfg)
> >  {
> > unsigned long clkdiv, camdiv;
> >  
> > -   s3c_freq_dbg("%s: divsiors: h=%d, p=%d\n", __func__,
> > +   s3c_freq_dbg("%s: divisors: h=%d, p=%d\n", __func__,
> >  cfg->divs.h_divisor, cfg->divs.p_divisor);
> >  
> > clkdiv = __raw_readl(S3C2410_CLKDIVN);
> 
> Acked-by: Viresh Kumar 
> 
> 

Applied, thanks!

Re: [PATCH 0/3] PM / core: Clean up suspend/resume diagnostic messages

2018-05-13 Thread Rafael J. Wysocki

On Thursday, April 26, 2018 11:36:20 PM CEST Bjorn Helgaas wrote:
> These are pretty minor cleanups to the suspend/resume diagnostic messages.
> 
> The first two are trivial.  The third may break scripts that parse dmesg
> output.  I looked at scripts/bootgraph.pl, and I don't think it is
> affected, but there may be others I don't know about.  Let me know if there
> are.
> 
> ---
> 
> Bjorn Helgaas (3):
>   PM / core: Remove unused initcall_debug_report() arguments
>   PM / core: Simplify initcall_debug_report() timing
>   PM / core: Use dev_printk() and symbols in suspend/resume diagnostics
> 
> 
>  drivers/base/power/main.c |   37 +
>  1 file changed, 17 insertions(+), 20 deletions(-)
> 

All [1-3/3] applied, thanks!

Re: [PATCH v3] PM / wakeup: use seq_open() to show wakeup stats

2018-05-13 Thread Rafael J. Wysocki

On Wednesday, April 25, 2018 12:59:31 PM CEST Ganesh Mahendran wrote:
> single_open() interface requires that the whole output must
> fit into a single buffer. This will lead to timeout when
> system memory is not in a good situation.
> 
> This patch uses seq_open() to show wakeup stats. This method
> needs only one page, so the timeout will not be observed.
> 
> Signed-off-by: Ganesh Mahendran 
> 
> v3: simplify wakeup_sources_stats_seq_start
> v2: use srcu_read_lock instead of rcu_read_lock
> ---
>  drivers/base/power/wakeup.c | 75 
> +++--
>  1 file changed, 59 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index ea01621..5872705 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -1029,32 +1029,75 @@ static int print_wakeup_source_stats(struct seq_file 
> *m,
>   return 0;
>  }
>  
> -/**
> - * wakeup_sources_stats_show - Print wakeup sources statistics information.
> - * @m: seq_file to print the statistics into.
> - */
> -static int wakeup_sources_stats_show(struct seq_file *m, void *unused)
> +static void *wakeup_sources_stats_seq_start(struct seq_file *m,
> + loff_t *pos)
>  {
>   struct wakeup_source *ws;
> - int srcuidx;
> + loff_t n = *pos;
> + int *srcuidx = m->private;
>  
> - seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
> - "expire_count\tactive_since\ttotal_time\tmax_time\t"
> - "last_change\tprevent_suspend_time\n");
> + if (n == 0) {
> + seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
> + "expire_count\tactive_since\ttotal_time\tmax_time\t"
> + "last_change\tprevent_suspend_time\n");
> + }
>  
> - srcuidx = srcu_read_lock(_srcu);
> - list_for_each_entry_rcu(ws, _sources, entry)
> - print_wakeup_source_stats(m, ws);
> - srcu_read_unlock(_srcu, srcuidx);
> + *srcuidx = srcu_read_lock(_srcu);
> + list_for_each_entry_rcu(ws, _sources, entry) {
> + if (n-- <= 0)
> + return ws;
> + }
> +
> + return NULL;
> +}
> +
> +static void *wakeup_sources_stats_seq_next(struct seq_file *m,
> + void *v, loff_t *pos)
> +{
> + struct wakeup_source *ws = v;
> + struct wakeup_source *next_ws = NULL;
> +
> + ++(*pos);
>  
> - print_wakeup_source_stats(m, _ws);
> + list_for_each_entry_continue_rcu(ws, _sources, entry) {
> + next_ws = ws;
> + break;
> + }
> +
> + return next_ws;
> +}
> +
> +static void wakeup_sources_stats_seq_stop(struct seq_file *m, void *v)
> +{
> + int *srcuidx = m->private;
> +
> + srcu_read_unlock(_srcu, *srcuidx);
> +}
> +
> +/**
> + * wakeup_sources_stats_seq_show - Print wakeup sources statistics 
> information.
> + * @m: seq_file to print the statistics into.
> + * @v: wakeup_source of each iteration
> + */
> +static int wakeup_sources_stats_seq_show(struct seq_file *m, void *v)
> +{
> + struct wakeup_source *ws = v;
> +
> + print_wakeup_source_stats(m, ws);
>  
>   return 0;
>  }
>  
> +static const struct seq_operations wakeup_sources_stats_seq_ops = {
> + .start = wakeup_sources_stats_seq_start,
> + .next  = wakeup_sources_stats_seq_next,
> + .stop  = wakeup_sources_stats_seq_stop,
> + .show  = wakeup_sources_stats_seq_show,
> +};
> +
>  static int wakeup_sources_stats_open(struct inode *inode, struct file *file)
>  {
> - return single_open(file, wakeup_sources_stats_show, NULL);
> + return seq_open_private(file, &wakeup_sources_stats_seq_ops,
> sizeof(int));
>  }
>  
>  static const struct file_operations wakeup_sources_stats_fops = {
> @@ -1062,7 +1105,7 @@ static int wakeup_sources_stats_open(struct inode 
> *inode, struct file *file)
>   .open = wakeup_sources_stats_open,
>   .read = seq_read,
>   .llseek = seq_lseek,
> - .release = single_release,
> + .release = seq_release_private,
>  };
>  
>  static int __init wakeup_sources_debugfs_init(void)
> 

Applied, thanks!
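
The start/next/stop/show split used by the patch can be illustrated with
a small user-space model. It is a sketch under stated assumptions: the
class and function names are invented, the "lock" is a boolean standing
in for the SRCU read lock, and chunks are bounded by a record count
where the real seq_file code is bounded by buffer size.

```python
HEADER = "name\n"


class WakeupStatsIter:
    """Toy iterator mirroring the start/next/stop/show callbacks."""

    def __init__(self, sources):
        self.sources = sources
        self.locked = False

    def start(self, pos, out):
        # The header is emitted only when reading starts at position 0.
        if pos == 0:
            out.append(HEADER)
        self.locked = True  # stands in for srcu_read_lock()
        return pos if pos < len(self.sources) else None

    def next(self, idx):
        idx += 1
        return idx if idx < len(self.sources) else None

    def stop(self):
        self.locked = False  # stands in for srcu_read_unlock()

    def show(self, idx):
        return self.sources[idx] + "\n"


def read_chunk(it, pos, max_records):
    """Produce one bounded chunk of output and the position to resume from."""
    out = []
    idx = it.start(pos, out)
    shown = 0
    while idx is not None and shown < max_records:
        out.append(it.show(idx))
        shown += 1
        pos += 1
        idx = it.next(idx)
    it.stop()
    return "".join(out), pos
```

Each call produces a bounded piece of output and releases the lock
before returning, so no single buffer has to hold the whole listing.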

Re: [PATCH v3 4/6] KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation

2018-05-13 Thread Vitaly Kuznetsov

Radim Krčmář  writes:

> 2018-04-16 13:08+0200, Vitaly Kuznetsov:
...
>
>> +/*
>> + * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we
>> + * can't analyze it here, flush TLB regardless of the specified
>> + * address space.
>> + */
>> +kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>> +
>> +/*
>> + * It is possible that vCPU will migrate and we will kick wrong
>> + * CPU but vCPU's TLB will anyway be flushed upon migration as
>> + * we already made KVM_REQ_TLB_FLUSH request.
>> + */
>> +cpu = vcpu->cpu;
>> +if (cpu != -1 && cpu != me && cpu_online(cpu) &&
>> +kvm_arch_vcpu_should_kick(vcpu))
>> +cpumask_set_cpu(cpu, _current->tlb_lush);
>> +}
>> +
>> +if (!cpumask_empty(_current->tlb_lush))
>> +smp_call_function_many(_current->tlb_lush, ack_flush,
>> +   NULL, true);
>
> Hm, quite a lot of code duplication with EX hypercall and also
> kvm_make_all_cpus_request ... I'm thinking about making something like
>
>   kvm_make_some_cpus_request(struct kvm *kvm, unsigned int req,
>  bool (*predicate)(struct kvm_vcpu *vcpu))
>
> or to implement a vp_index -> vcpu mapping and using
>
>   kvm_vcpu_request_mask(struct kvm *kvm, unsigned int req, long *vcpu_bitmap)
>
> The latter would probably simplify logic of the EX hypercall.

We really want to avoid memory allocation for cpumask on this path and
that's what kvm_make_all_cpus_request() currently does (when
CPUMASK_OFFSTACK). vcpu bitmap is probably OK as KVM_MAX_VCPUS is much
lower.

Making cpumask allocation avoidable leads us to the following API:

bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
 long *vcpu_bitmap, cpumask_var_t tmp);

or, if we want to prettify this a little bit, we may end up with the
following pair:

bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
 long *vcpu_bitmap);

bool __kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
   long *vcpu_bitmap, cpumask_var_t tmp);

and from hyperv code we'll use the latter. With this, no code duplication
is required.

Does this look acceptable?

-- 
  Vitaly

Re: [PATCH] cpufreq: fix speedstep_detect_processor()'s return type

2018-05-13 Thread Rafael J. Wysocki

On Wednesday, April 25, 2018 4:46:47 AM CEST Viresh Kumar wrote:
> On 24-04-18, 15:14, Luc Van Oostenryck wrote:
> > speedstep_detect_processor() is declared as returning an
> > 'enum speedstep_processor' but uses an 'int' in its definition.
> > 
> > Fix this by using 'enum speedstep_processor' in its definition too.
> > 
> > Signed-off-by: Luc Van Oostenryck 
> > ---
> >  drivers/cpufreq/speedstep-lib.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cpufreq/speedstep-lib.c 
> > b/drivers/cpufreq/speedstep-lib.c
> > index e3a9962ee..cabb6f48e 100644
> > --- a/drivers/cpufreq/speedstep-lib.c
> > +++ b/drivers/cpufreq/speedstep-lib.c
> > @@ -252,7 +252,7 @@ EXPORT_SYMBOL_GPL(speedstep_get_frequency);
> >   */
> >  
> >  /* Keep in sync with the x86_cpu_id tables in the different modules */
> > -unsigned int speedstep_detect_processor(void)
> > +enum speedstep_processor speedstep_detect_processor(void)
> >  {
> > struct cpuinfo_x86 *c = _data(0);
> > u32 ebx, msr_lo, msr_hi;
> 
> Acked-by: Viresh Kumar 
> 
> 

Applied, thanks!

Re: [PATCH] PM: docs: sleep-states: Fix a typo ("includig")

2018-05-13 Thread Rafael J. Wysocki

On Wednesday, April 25, 2018 12:07:03 PM CEST Jonathan Neuschäfer wrote:
> Signed-off-by: Jonathan Neuschäfer 
> ---
>  Documentation/admin-guide/pm/sleep-states.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/pm/sleep-states.rst 
> b/Documentation/admin-guide/pm/sleep-states.rst
> index 1e5c0f00cb2f..dbf5acd49f35 100644
> --- a/Documentation/admin-guide/pm/sleep-states.rst
> +++ b/Documentation/admin-guide/pm/sleep-states.rst
> @@ -15,7 +15,7 @@ Sleep States That Can Be Supported
>  ==
>  
>  Depending on its configuration and the capabilities of the platform it runs 
> on,
> -the Linux kernel can support up to four system sleep states, includig
> +the Linux kernel can support up to four system sleep states, including
>  hibernation and up to three variants of system suspend.  The sleep states 
> that
>  can be supported by the kernel are listed below.
>  
> 

Applied and pushed for 4.17-rc5, thanks!

Re: [PATCH 1/2] KVM: X86: Fix CR3 reserve bits

2018-05-13 Thread Wanpeng Li

2018-05-13 16:28 GMT+08:00 Liran Alon :
>
> - kernel...@gmail.com wrote:
>
>> 2018-05-13 15:53 GMT+08:00 Liran Alon :
>> >
>> > - kernel...@gmail.com wrote:
>> >
>> >> From: Wanpeng Li 
>> >>
>> >> MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4.
>> >> It should be checked when PCIDE bit is not set, however commit
>> >> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on
>> >> its physical address width")' removes the bit 63 checking
>> >> unconditionally. This patch fixes it by checking bit 63 of CR3
>> >> when PCIDE bit is not set in CR4.
>> >>
>> >> Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits based
>> on
>> >> its physical address width)
>> >> Cc: Paolo Bonzini 
>> >> Cc: Radim Krčmář 
>> >> Cc: Junaid Shahid 
>> >> Signed-off-by: Wanpeng Li 
>> >> ---
>> >>  arch/x86/kvm/emulate.c | 4 +++-
>> >>  arch/x86/kvm/x86.c | 2 +-
>> >>  2 files changed, 4 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> >> index b3705ae..b21f427 100644
>> >> --- a/arch/x86/kvm/emulate.c
>> >> +++ b/arch/x86/kvm/emulate.c
>> >> @@ -4189,7 +4189,9 @@ static int check_cr_write(struct
>> >> x86_emulate_ctxt *ctxt)
>> >>   maxphyaddr = eax & 0xff;
>> >>   else
>> >>   maxphyaddr = 36;
>> >> - rsvd = rsvd_bits(maxphyaddr, 62);
>> >> + if (ctxt->ops->get_cr(ctxt, 4) &
>> X86_CR4_PCIDE)
>> >> + new_val &= ~CR3_PCID_INVD;
>> >> + rsvd = rsvd_bits(maxphyaddr, 63);
>> >
>> > I would prefer instead to do this:
>> > if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
>> > rsvd &= ~CR3_PCID_INVD;
>> > It makes more sense as opposed to temporary removing the
>> CR3_PCID_INVD bit from new_val.
>>
>> It tries the same way
>> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?id=c19986fea873f3c745122bf79013a872a190f212
>> pointed out.
>>
>> Regards,
>> Wanpeng Li
>
> Yes but there it makes sense as new CR3 value should not have bit 63 set in 
> vcpu->arch.cr3.

When X86_CR4_PCIDE == 0 and bit 63 of CR3 is set, a #GP is missing in
your suggestion.

Regards,
Wanpeng Li
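
The two ways of handling bit 63 discussed in this thread can be compared
with a small model. This is only a sketch: it assumes the reserved mask
is built as rsvd_bits(maxphyaddr, 63) in both variants, as in the quoted
hunk, and the function names are invented here. If the mask were built
from a narrower range, such as bits maxphyaddr..62, the second variant
would behave differently.

```python
CR3_PCID_INVD = 1 << 63


def rsvd_bits(start, end):
    """Mask with bits start..end (inclusive) set."""
    return ((1 << (end - start + 1)) - 1) << start


def faults_masking_value(new_val, pcide, maxphyaddr):
    """Variant from the patch: clear bit 63 in the value when PCIDE is set."""
    if pcide:
        new_val &= ~CR3_PCID_INVD
    return bool(new_val & rsvd_bits(maxphyaddr, 63))


def faults_trimming_mask(new_val, pcide, maxphyaddr):
    """Variant from the review: clear bit 63 in the reserved mask instead."""
    rsvd = rsvd_bits(maxphyaddr, 63)
    if pcide:
        rsvd &= ~CR3_PCID_INVD
    return bool(new_val & rsvd)
```

Under the stated assumption both functions report a fault when PCIDE is
clear and bit 63 is set, and neither does when PCIDE is set.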

KASAN: use-after-free Read in corrupted

2018-05-13 Thread syzbot


Hello,

syzbot found the following crash on:

HEAD commit:427fbe89261d Merge branch 'next' of git://git.kernel.org/p..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=148eb01780
kernel config:  https://syzkaller.appspot.com/x/.config?x=fcce42b221691ff9
dashboard link: https://syzkaller.appspot.com/bug?extid=3417712847e7219a60ee
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1770c47780
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14ecdbc780

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3417712847e7219a6...@syzkaller.appspotmail.com

R13:  R14:  R15: 
CPU: 0 PID: 4564 Comm: syz-executor214 Not tainted 4.17.0-rc4+ #44
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

==
Call Trace:
BUG: KASAN: use-after-free in __lock_acquire+0x3888/0x5140  
kernel/locking/lockdep.c:3310

Read of size 8 at addr 8801d8d69088 by task syz-executor214/4551
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113

 fail_dump lib/fault-inject.c:51 [inline]
 should_fail.cold.4+0xa/0x1a lib/fault-inject.c:149
 __should_failslab+0x124/0x180 mm/failslab.c:32
 should_failslab+0x9/0x14 mm/slab_common.c:1522
 slab_pre_alloc_hook mm/slab.h:423 [inline]
 slab_alloc mm/slab.c:3378 [inline]
 kmem_cache_alloc+0x2af/0x760 mm/slab.c:3552
 __d_alloc+0xc0/0xd30 fs/dcache.c:1638
 d_alloc_anon fs/dcache.c:1742 [inline]
 d_make_root+0x42/0x90 fs/dcache.c:1934
 fuse_fill_super+0x120e/0x1e20 fs/fuse/inode.c:1131
 mount_nodev+0x6b/0x110 fs/super.c:1210
 fuse_mount+0x2c/0x40 fs/fuse/inode.c:1192
 mount_fs+0xae/0x328 fs/super.c:1267
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2518 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2848
 ksys_mount+0x12d/0x140 fs/namespace.c:3064
 __do_sys_mount fs/namespace.c:3078 [inline]
 __se_sys_mount fs/namespace.c:3075 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x447cb9
RSP: 002b:7f7a75bca918 EFLAGS: 0246 ORIG_RAX: 00a5
RAX: ffda RBX: 0005 RCX: 00447cb9
RDX: 004b08d6 RSI: 2340 RDI: 004c7485
RBP: a001 R08: 7f7a75bca930 R09: 
R10:  R11: 0246 R12: 
R13:  R14:  R15: 
CPU: 1 PID: 4551 Comm: syz-executor214 Not tainted 4.17.0-rc4+ #44
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 __lock_acquire+0x3888/0x5140 kernel/locking/lockdep.c:3310
 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
 down_write+0x87/0x120 kernel/locking/rwsem.c:70
 fuse_kill_sb_anon+0x50/0xb0 fs/fuse/inode.c:1200
 deactivate_locked_super+0x97/0x100 fs/super.c:316
 mount_nodev+0xfa/0x110 fs/super.c:1212
 fuse_mount+0x2c/0x40 fs/fuse/inode.c:1192
 mount_fs+0xae/0x328 fs/super.c:1267
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2518 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2848
 ksys_mount+0x12d/0x140 fs/namespace.c:3064
 __do_sys_mount fs/namespace.c:3078 [inline]
 __se_sys_mount fs/namespace.c:3075 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x447cb9
RSP: 002b:7f7a75bca918 EFLAGS: 0246 ORIG_RAX: 00a5
RAX: ffda RBX: 0005 RCX: 00447cb9
RDX: 004b08d6 RSI: 2340 RDI: 004c7485
RBP: a001 R08: 7f7a75bca930 R09: 
R10:  R11: 0246 R12: 
R13:  R14:  R15: 

CPU: 0 PID: 4580 Comm: syz-executor214 Not tainted 4.17.0-rc4+ #44
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Allocated by task 4551:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
Call Trace:
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620

Re: [PATCH v3] mm: Change return type to vm_fault_t

2018-05-13 Thread Joe Perches

On Sat, 2018-05-12 at 19:51 -0700, Dan Williams wrote:
> On Sat, May 12, 2018 at 12:14 PM, Souptick Joarder  
> wrote:
> > > > It'd be nicer to realign the 2nd and 3rd arguments
> > > > on the subsequent lines.
> > > > 
> > > >   vm_fault_t (*fault)(const struct vm_special_mapping *sm,
> > > >   struct vm_area_struct *vma,
> > > >   struct vm_fault *vmf);
> > > > 
> > > It'd be nicer if people didn't try to line up arguments at all and
> > > just indented by an extra two tabs when they had to break a logical
> > > line due to the 80-column limit.
> > 
> > Matthew, there are two different opinions. Which one to take ?
> 
> Unfortunately this is one of those "maintainer's choice" preferences
> that drives new contributors crazy. Just go with the two tabs like
> Matthew said and be done.

The only reason I mentioned it was the old function name
was aligned that way with arguments aligned to the open
parenthesis.

Renaming the function should keep the same alignment style
and not just rename the function.

-   int (*fault)(const struct vm_special_mapping *sm,
+   vm_fault_t (*fault)(const struct vm_special_mapping *sm,
 struct vm_area_struct *vma,
 struct vm_fault *vmf);

Here the previous indent was 2 tabs, 5 spaces

Re: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks

2018-05-13 Thread Tariq Toukan




On 11/05/2018 10:23 PM, Qing Huang wrote:

When a system is under memory pressure (high usage with fragments),
the original 256KB ICM chunk allocations will likely trigger kernel
memory management to enter slow path doing memory compact/migration
ops in order to complete high order memory allocations.

When that happens, user processes calling uverb APIs may get stuck
for more than 120s easily even though there are a lot of free pages
in smaller chunks available in the system.

Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
oracle_205573_e:205573 blocked for more than 120 seconds.
...

With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.

However in order to support smaller ICM chunk size, we need to fix
another issue in large size kcalloc allocations.

E.g.
Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
entry). So we need a 16MB allocation for a table->icm pointer array to
hold 2M pointers which can easily cause kcalloc to fail.

The solution is to use vzalloc to replace kcalloc. There is no need
for contiguous memory pages for a driver meta data structure (no need
of DMA ops).
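
The arithmetic in the example above can be checked with a small userspace
sketch. The helper names are hypothetical and not part of the driver; the
8-byte entry size and 8-byte pointer size are taken from the description and
assume a 64-bit system.

```c
#include <stdint.h>

#define DEMO_MTT_ENTRY_SIZE 8ULL  /* bytes per mtt entry, per the commit message */
#define DEMO_POINTER_SIZE   8ULL  /* assumed size of one table->icm pointer */

/* Number of ICM chunks needed to hold 2^log_num_mtt entries. */
uint64_t demo_icm_chunk_count(unsigned int log_num_mtt, uint64_t chunk_size)
{
	uint64_t nobj = 1ULL << log_num_mtt;
	uint64_t obj_per_chunk = chunk_size / DEMO_MTT_ENTRY_SIZE;

	/* Round-up division, as in the num_icm computation in the patch context. */
	return (nobj + obj_per_chunk - 1) / obj_per_chunk;
}

/* Bytes needed for the pointer array that tracks those chunks. */
uint64_t demo_icm_pointer_array_bytes(unsigned int log_num_mtt, uint64_t chunk_size)
{
	return demo_icm_chunk_count(log_num_mtt, chunk_size) * DEMO_POINTER_SIZE;
}
```

With log_num_mtt=30, a 4KB chunk gives 2M chunks and a 16MB pointer array,
while the old 256KB chunk gives 32K chunks and a 256KB array. That growth is
why a physically contiguous kcalloc() becomes fragile here.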

Signed-off-by: Qing Huang 
Acked-by: Daniel Jurgens 
Reviewed-by: Zhu Yanjun 
---
v1 -> v2: adjusted chunk size to reflect different architectures.

  drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c 
b/drivers/net/ethernet/mellanox/mlx4/icm.c
index a822f7a..ccb62b8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,12 @@
  #include "fw.h"
  
  /*

- * We allocate in as big chunks as we can, up to a maximum of 256 KB
- * per chunk.
+ * We allocate in page size (default 4KB on many archs) chunks to avoid high
+ * order memory allocations in fragmented/high usage memory situation.
   */
  enum {
-   MLX4_ICM_ALLOC_SIZE = 1 << 18,
-   MLX4_TABLE_CHUNK_SIZE   = 1 << 18
+   MLX4_ICM_ALLOC_SIZE = 1 << PAGE_SHIFT,
+   MLX4_TABLE_CHUNK_SIZE   = 1 << PAGE_SHIFT


Which is actually PAGE_SIZE.
Also, please add a comma at the end of the last entry.


  };
  
  static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)

@@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct 
mlx4_icm_table *table,
obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
  
-	table->icm  = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);

+   table->icm  = vzalloc(num_icm * sizeof(*table->icm));


Why not kvzalloc ?


if (!table->icm)
return -ENOMEM;
table->virt = virt;
@@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct 
mlx4_icm_table *table,
mlx4_free_icm(dev, table->icm[i], use_coherent);
}
  
-	kfree(table->icm);

+   vfree(table->icm);
  
  	return -ENOMEM;

  }
@@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct 
mlx4_icm_table *table)
mlx4_free_icm(dev, table->icm[i], table->coherent);
}
  
-	kfree(table->icm);

+   vfree(table->icm);
  }



Thanks for your patch.

I need to verify there is no dramatic performance degradation here.
You can prepare and send a v3 in the meanwhile.

Thanks,
Tariq

Re: KASAN: use-after-free Read in corrupted

2018-05-13 Thread Dmitry Vyukov

On Sun, May 13, 2018 at 10:56 AM, syzbot
 wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:427fbe89261d Merge branch 'next' of git://git.kernel.org/p..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=148eb01780
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fcce42b221691ff9
> dashboard link: https://syzkaller.appspot.com/bug?extid=3417712847e7219a60ee
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1770c47780
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14ecdbc780
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+3417712847e7219a6...@syzkaller.appspotmail.com

Tetsuo,

This looks very similar to "KASAN: use-after-free Read in fuse_kill_sb_blk":
https://groups.google.com/d/msg/syzkaller-bugs/4C4oiBX8vZ0/0NTQRcUYBgAJ

which you fixed with "fuse: don't keep dead fuse_conn at fuse_fill_super().":
https://groups.google.com/d/msg/syzkaller-bugs/4C4oiBX8vZ0/W6pi8NdbBgAJ

However, here we have a use-after-free in fuse_kill_sb_anon instead of
fuse_kill_sb_blk. Do you think your patch will fix this as well?



> R13:  R14:  R15: 
> CPU: 0 PID: 4564 Comm: syz-executor214 Not tainted 4.17.0-rc4+ #44
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> ==
> Call Trace:
> BUG: KASAN: use-after-free in __lock_acquire+0x3888/0x5140
> kernel/locking/lockdep.c:3310
> Read of size 8 at addr 8801d8d69088 by task syz-executor214/4551
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>
>  fail_dump lib/fault-inject.c:51 [inline]
>  should_fail.cold.4+0xa/0x1a lib/fault-inject.c:149
>  __should_failslab+0x124/0x180 mm/failslab.c:32
>  should_failslab+0x9/0x14 mm/slab_common.c:1522
>  slab_pre_alloc_hook mm/slab.h:423 [inline]
>  slab_alloc mm/slab.c:3378 [inline]
>  kmem_cache_alloc+0x2af/0x760 mm/slab.c:3552
>  __d_alloc+0xc0/0xd30 fs/dcache.c:1638
>  d_alloc_anon fs/dcache.c:1742 [inline]
>  d_make_root+0x42/0x90 fs/dcache.c:1934
>  fuse_fill_super+0x120e/0x1e20 fs/fuse/inode.c:1131
>  mount_nodev+0x6b/0x110 fs/super.c:1210
>  fuse_mount+0x2c/0x40 fs/fuse/inode.c:1192
>  mount_fs+0xae/0x328 fs/super.c:1267
>  vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
>  vfs_kern_mount fs/namespace.c:1027 [inline]
>  do_new_mount fs/namespace.c:2518 [inline]
>  do_mount+0x564/0x3070 fs/namespace.c:2848
>  ksys_mount+0x12d/0x140 fs/namespace.c:3064
>  __do_sys_mount fs/namespace.c:3078 [inline]
>  __se_sys_mount fs/namespace.c:3075 [inline]
>  __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x447cb9
> RSP: 002b:7f7a75bca918 EFLAGS: 0246 ORIG_RAX: 00a5
> RAX: ffda RBX: 0005 RCX: 00447cb9
> RDX: 004b08d6 RSI: 2340 RDI: 004c7485
> RBP: a001 R08: 7f7a75bca930 R09: 
> R10:  R11: 0246 R12: 
> R13:  R14:  R15: 
> CPU: 1 PID: 4551 Comm: syz-executor214 Not tainted 4.17.0-rc4+ #44
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
> FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 0
>  print_address_description+0x6c/0x20b mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>  __lock_acquire+0x3888/0x5140 kernel/locking/lockdep.c:3310
>  lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
>  down_write+0x87/0x120 kernel/locking/rwsem.c:70
>  fuse_kill_sb_anon+0x50/0xb0 fs/fuse/inode.c:1200
>  deactivate_locked_super+0x97/0x100 fs/super.c:316
>  mount_nodev+0xfa/0x110 fs/super.c:1212
>  fuse_mount+0x2c/0x40 fs/fuse/inode.c:1192
>  mount_fs+0xae/0x328 fs/super.c:1267
>  vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
>  vfs_kern_mount fs/namespace.c:1027 [inline]
>  do_new_mount fs/namespace.c:2518 [inline]
>  do_mount+0x564/0x3070 fs/namespace.c:2848
>  ksys_mount+0x12d/0x140 fs/namespace.c:3064
>  __do_sys_mount fs/namespace.c:3078 [inline]
>  __se_sys_mount fs/namespace.c:3075 [inline]
>  __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x447cb9
> RSP: 002b:7f7a75bca918 EFLAGS: 0246 ORIG_RAX: 00a5
> RAX: ffda RBX: 0005

Re: [PATCH] scsi: libosd: Remove VLA usage

2018-05-13 Thread Boaz Harrosh

On 03/05/18 01:55, Kees Cook wrote:
> On the quest to remove all VLAs from the kernel[1] this rearranges the
> code to avoid a VLA warning under -Wvla (gcc doesn't recognize "const"
> variables as not triggering VLA creation). Additionally cleans up variable
> naming to avoid 80 character column limit.
> 
> [1] 
> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
> 

ACK-BY: Boaz Harrosh 

> Signed-off-by: Kees Cook 
> ---
>  drivers/scsi/osd/osd_initiator.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index e18877177f1b..917a86a2ae8c 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -1842,14 +1842,14 @@ int osd_req_decode_sense_full(struct osd_request *or,
>   case osd_sense_response_integrity_check:
>   {
>   struct osd_sense_response_integrity_check_descriptor
> - *osricd = cur_descriptor;
> - const unsigned len =
> -   sizeof(osricd->integrity_check_value);
> - char key_dump[len*4 + 2]; /* 2nibbles+space+ASCII */
> -
> - hex_dump_to_buffer(osricd->integrity_check_value, len,
> -32, 1, key_dump, sizeof(key_dump), true);
> - OSD_SENSE_PRINT2("response_integrity [%s]\n", key_dump);
> + *d = cur_descriptor;
> + /* 2nibbles+space+ASCII */
> + char dump[sizeof(d->integrity_check_value) * 4 + 2];
> +
> + hex_dump_to_buffer(d->integrity_check_value,
> + sizeof(d->integrity_check_value),
> + 32, 1, dump, sizeof(dump), true);
> + OSD_SENSE_PRINT2("response_integrity [%s]\n", dump);
>   }
>   case osd_sense_attribute_identification:
>   {
>
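
For readers following the VLA point, here is a minimal userspace sketch of
the pattern the patch switches to. The struct is a hypothetical stand-in for
the OSD descriptor, and its 32-byte field size is an assumption made only for
illustration. Sizing the array directly with sizeof() gives an integer
constant expression, so the buffer is an ordinary fixed-size array; routing
the same value through a "const unsigned" local, as the old code did, makes
gcc treat it as a VLA under -Wvla.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the descriptor; the field size is assumed. */
struct demo_descriptor {
	unsigned char integrity_check_value[32];
};

/* Size of the dump buffer: 2 nibbles + space + ASCII per byte, plus 2. */
size_t demo_dump_buf_size(void)
{
	return sizeof(((struct demo_descriptor *)0)->integrity_check_value) * 4 + 2;
}

/* Fills a descriptor with `fill`, hex-dumps it, returns the text length. */
size_t demo_dump_text_len(unsigned char fill)
{
	struct demo_descriptor d;
	/* Fixed-size array: the bound is a compile-time constant, not a VLA. */
	char dump[sizeof(d.integrity_check_value) * 4 + 2];
	size_t pos = 0;
	size_t i;

	memset(&d, fill, sizeof(d));
	for (i = 0; i < sizeof(d.integrity_check_value); i++)
		pos += (size_t)snprintf(dump + pos, sizeof(dump) - pos,
					"%02x ", d.integrity_check_value[i]);

	return strlen(dump);
}
```

The plain snprintf() loop here only stands in for hex_dump_to_buffer() and
does not reproduce its output format.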

Re: [PATCH 1/2] KVM: X86: Fix CR3 reserve bits

2018-05-13 Thread Liran Alon


- kernel...@gmail.com wrote:

> 2018-05-13 16:28 GMT+08:00 Liran Alon :
> >
> > - kernel...@gmail.com wrote:
> >
> >> 2018-05-13 15:53 GMT+08:00 Liran Alon :
> >> >
> >> > - kernel...@gmail.com wrote:
> >> >
> >> >> From: Wanpeng Li 
> >> >>
> >> >> MSB of CR3 is a reserved bit if the PCIDE bit is not set in
> CR4.
> >> >> It should be checked when PCIDE bit is not set, however commit
> >> >> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based
> on
> >> >> its physical address width")' removes the bit 63 checking
> >> >> unconditionally. This patch fixes it by checking bit 63 of CR3
> >> >> when PCIDE bit is not set in CR4.
> >> >>
> >> >> Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits
> based
> >> on
> >> >> its physical address width)
> >> >> Cc: Paolo Bonzini 
> >> >> Cc: Radim Krčmář 
> >> >> Cc: Junaid Shahid 
> >> >> Signed-off-by: Wanpeng Li 
> >> >> ---
> >> >>  arch/x86/kvm/emulate.c | 4 +++-
> >> >>  arch/x86/kvm/x86.c | 2 +-
> >> >>  2 files changed, 4 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >> >> index b3705ae..b21f427 100644
> >> >> --- a/arch/x86/kvm/emulate.c
> >> >> +++ b/arch/x86/kvm/emulate.c
> >> >> @@ -4189,7 +4189,9 @@ static int check_cr_write(struct
> >> >> x86_emulate_ctxt *ctxt)
> >> >>   maxphyaddr = eax & 0xff;
> >> >>   else
> >> >>   maxphyaddr = 36;
> >> >> - rsvd = rsvd_bits(maxphyaddr, 62);
> >> >> + if (ctxt->ops->get_cr(ctxt, 4) &
> >> X86_CR4_PCIDE)
> >> >> + new_val &= ~CR3_PCID_INVD;
> >> >> + rsvd = rsvd_bits(maxphyaddr, 63);
> >> >
> >> > I would prefer instead to do this:
> >> > if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
> >> > rsvd &= ~CR3_PCID_INVD;
> >> > It makes more sense as opposed to temporarily removing the
> >> CR3_PCID_INVD bit from new_val.
> >>
> >> It tries the same way
> >>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_virt_kvm_kvm.git_commit_-3Fid-3Dc19986fea873f3c745122bf79013a872a190f212=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0=r52WDgKBorUHwe_B_5Nw2Le_F_E0ne8lqqWW6n-3bSg=ufTcXvhhAMkY3XP6gAx-HiKCT8ynPWo2fs2z9DqCzM4=
> >> pointed out.
> >>
> >> Regards,
> >> Wanpeng Li
> >
> > Yes but there it makes sense as new CR3 value should not have bit 63
> set in vcpu->arch.cr3.
> 
> When X86_CR4_PCIDE == 0 and CR3 63 bit is set, a #GP is missing in
> your suggestion.
> 
> Regards,
> Wanpeng Li

Why?

I suggest the following change:
- rsvd = rsvd_bits(maxphyaddr, 62);
+ rsvd = rsvd_bits(maxphyaddr, 63);
+ if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
+ rsvd &= ~CR3_PCID_INVD;

In this case, if PCIDE=0 then bit 63 is set in rsvd and therefore 
check_cr_write() will emulate_gp() as needed.
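
The behaviour of the suggested change can be sketched in userspace as
follows. All demo_* names are hypothetical; the bit positions (CR4.PCIDE as
bit 17, the CR3 no-flush bit as bit 63) follow the x86 architecture, and the
mask helper is a simplified stand-in for rsvd_bits(), not the kernel's
implementation. Only the long-mode CR3 case discussed here is modelled.

```c
#include <stdint.h>

#define DEMO_CR4_PCIDE     (1ULL << 17) /* CR4.PCIDE */
#define DEMO_CR3_PCID_INVD (1ULL << 63) /* CR3 bit 63, meaningful when PCIDE=1 */

/* Mask with bits s..e (inclusive) set; assumes 0 <= s <= e <= 63. */
static uint64_t demo_rsvd_bits(int s, int e)
{
	uint64_t upto_e = (e == 63) ? ~0ULL : ((1ULL << (e + 1)) - 1);
	uint64_t below_s = (1ULL << s) - 1;

	return upto_e & ~below_s;
}

/* Returns 1 if the CR3 write should raise #GP, 0 if it is accepted. */
int demo_cr3_write_faults(uint64_t new_cr3, uint64_t cr4, int maxphyaddr)
{
	/* Start with bit 63 reserved, then exempt it only when PCIDE=1. */
	uint64_t rsvd = demo_rsvd_bits(maxphyaddr, 63);

	if (cr4 & DEMO_CR4_PCIDE)
		rsvd &= ~DEMO_CR3_PCID_INVD;

	return (new_cr3 & rsvd) != 0;
}
```

With maxphyaddr=36, a value with bit 63 set faults when PCIDE=0 and is
accepted when PCIDE=1, while a value with bit 40 set faults in both cases.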

Re: [PATCH 2/5] media: docs: clarify relationship between crop and selection APIs

2018-05-13 Thread Hans Verkuil

On 04/03/2018 11:15 PM, Luca Ceresoli wrote:
> Having two somewhat similar and largely overlapping APIs is confusing,
> especially since the older one appears in the docs before the newer
> and most featureful counterpart.
> 
> Clarify all of this in several ways:
>  - swap the two sections
>  - give a name to the two APIs in the section names
>  - add a note at the beginning of the CROP API section
> 
> Also remove a note that is incorrect (correct wording is in
> vidioc-cropcap.rst).
> 
> Signed-off-by: Luca Ceresoli 
> Based on info from: Hans Verkuil 
> Cc: Hans Verkuil 
> ---
>  Documentation/media/uapi/v4l/common.rst|  2 +-
>  Documentation/media/uapi/v4l/crop.rst  | 21 -
>  Documentation/media/uapi/v4l/selection-api-005.rst |  2 ++
>  Documentation/media/uapi/v4l/selection-api.rst |  4 ++--
>  4 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/media/uapi/v4l/common.rst 
> b/Documentation/media/uapi/v4l/common.rst
> index 13f2ed3fc5a6..5f93e71122ef 100644
> --- a/Documentation/media/uapi/v4l/common.rst
> +++ b/Documentation/media/uapi/v4l/common.rst
> @@ -41,6 +41,6 @@ applicable to all devices.
>  extended-controls
>  format
>  planar-apis
> -crop
>  selection-api
> +crop
>  streaming-par
> diff --git a/Documentation/media/uapi/v4l/crop.rst 
> b/Documentation/media/uapi/v4l/crop.rst
> index 182565b9ace4..83fa16eb347e 100644
> --- a/Documentation/media/uapi/v4l/crop.rst
> +++ b/Documentation/media/uapi/v4l/crop.rst
> @@ -2,9 +2,18 @@
>  
>  .. _crop:
>  
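> -*************************************
> -Image Cropping, Insertion and Scaling
> -*************************************
> +*****************************************************
> +Image Cropping, Insertion and Scaling -- the CROP API
> +*****************************************************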
> +
> +.. note::
> +
> +   The CROP API is mostly superseded by the newer :ref:`SELECTION API
> +   <selection-api>`. The new API should be preferred in most cases,
> +   with the exception of pixel aspect ratio detection, which is
> +   implemented by :ref:`VIDIOC_CROPCAP <VIDIOC_CROPCAP>` and has no
> +   equivalent in the SELECTION API. See :ref:`selection-vs-crop` for a
> +   comparison of the two APIs.
>  
>  Some video capture devices can sample a subsection of the picture and
>  shrink or enlarge it to an image of arbitrary size. We call these
> @@ -40,12 +49,6 @@ support scaling or the :ref:`VIDIOC_G_CROP 
> ` and
>  :ref:`VIDIOC_S_CROP ` ioctls. Their size (and position
>  where applicable) will be fixed in this case.
>  
> -.. note::
> -
> -   All capture and output devices must support the
> -   :ref:`VIDIOC_CROPCAP ` ioctl such that applications
> -   can determine if scaling takes place.

This note should be rewritten, not deleted:

All capture and output devices that support the CROP or SELECTION API
will also support the :ref:`VIDIOC_CROPCAP <VIDIOC_CROPCAP>` ioctl.

Regards,

Hans

> -
>  
>  Cropping Structures
>  ===
> diff --git a/Documentation/media/uapi/v4l/selection-api-005.rst 
> b/Documentation/media/uapi/v4l/selection-api-005.rst
> index 5b47a28ac6d7..2ad30a49184f 100644
> --- a/Documentation/media/uapi/v4l/selection-api-005.rst
> +++ b/Documentation/media/uapi/v4l/selection-api-005.rst
> @@ -1,5 +1,7 @@
>  .. -*- coding: utf-8; mode: rst -*-
>  
> +.. _selection-vs-crop:
> +
>  
>  Comparison with old cropping API
>  
> diff --git a/Documentation/media/uapi/v4l/selection-api.rst 
> b/Documentation/media/uapi/v4l/selection-api.rst
> index 81ea52d785b9..e4e623824b30 100644
> --- a/Documentation/media/uapi/v4l/selection-api.rst
> +++ b/Documentation/media/uapi/v4l/selection-api.rst
> @@ -2,8 +2,8 @@
>  
>  .. _selection-api:
>  
> -API for cropping, composing and scaling
> -===
> +Cropping, composing and scaling -- the SELECTION API
> +
>  
>  
>  .. toctree::
>

Re: [PATCH 1/5] media: docs: selection: fix typos

2018-05-13 Thread Hans Verkuil

On 04/03/2018 11:15 PM, Luca Ceresoli wrote:

Please add a commit message here. Yes, it can be as simple as 'Fixed typos in
the selection documentation.'

Regards,

Hans

> Cc: Hans Verkuil 
> Signed-off-by: Luca Ceresoli 
> ---
>  Documentation/media/uapi/v4l/selection-api-004.rst | 2 +-
>  Documentation/media/uapi/v4l/selection.svg | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/media/uapi/v4l/selection-api-004.rst 
> b/Documentation/media/uapi/v4l/selection-api-004.rst
> index d782cd5b2117..0a4ddc2d71db 100644
> --- a/Documentation/media/uapi/v4l/selection-api-004.rst
> +++ b/Documentation/media/uapi/v4l/selection-api-004.rst
> @@ -41,7 +41,7 @@ The driver may further adjust the requested size and/or 
> position
>  according to hardware limitations.
>  
>  Each capture device has a default source rectangle, given by the
> -``V4L2_SEL_TGT_CROP_DEFAULT`` target. This rectangle shall over what the
> +``V4L2_SEL_TGT_CROP_DEFAULT`` target. This rectangle shall cover what the
>  driver writer considers the complete picture. Drivers shall set the
>  active crop rectangle to the default when the driver is first loaded,
>  but not later.
> diff --git a/Documentation/media/uapi/v4l/selection.svg 
> b/Documentation/media/uapi/v4l/selection.svg
> index a93e3b59786d..911062bd2844 100644
> --- a/Documentation/media/uapi/v4l/selection.svg
> +++ b/Documentation/media/uapi/v4l/selection.svg
> @@ -1128,11 +1128,11 @@
> 
>
> y="1368.429" enable-background="new" font-size="50" style="line-height:125%">
> -   COMPOSE_BONDS
> +   COMPOSE_BOUNDS
>
>
>  enable-background="new" style="line-height:125%">
> -CROP_BONDS
> +CROP_BOUNDS
> 
>  enable-background="new" style="line-height:125%">
>  overscan area
>

Re: [PATCH 1/2] KVM: X86: Fix CR3 reserve bits

2018-05-13 Thread Wanpeng Li

2018-05-13 17:09 GMT+08:00 Liran Alon :
>
> - kernel...@gmail.com wrote:
>
>> 2018-05-13 16:28 GMT+08:00 Liran Alon :
>> >
>> > - kernel...@gmail.com wrote:
>> >
>> >> 2018-05-13 15:53 GMT+08:00 Liran Alon :
>> >> >
>> >> > - kernel...@gmail.com wrote:
>> >> >
>> >> >> From: Wanpeng Li 
>> >> >>
>> >> >> MSB of CR3 is a reserved bit if the PCIDE bit is not set in
>> CR4.
>> >> >> It should be checked when PCIDE bit is not set, however commit
>> >> >> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based
>> on
>> >> >> its physical address width")' removes the bit 63 checking
>> >> >> unconditionally. This patch fixes it by checking bit 63 of CR3
>> >> >> when PCIDE bit is not set in CR4.
>> >> >>
>> >> >> Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits
>> based
>> >> on
>> >> >> its physical address width)
>> >> >> Cc: Paolo Bonzini 
>> >> >> Cc: Radim Krčmář 
>> >> >> Cc: Junaid Shahid 
>> >> >> Signed-off-by: Wanpeng Li 
>> >> >> ---
>> >> >>  arch/x86/kvm/emulate.c | 4 +++-
>> >> >>  arch/x86/kvm/x86.c | 2 +-
>> >> >>  2 files changed, 4 insertions(+), 2 deletions(-)
>> >> >>
>> >> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> >> >> index b3705ae..b21f427 100644
>> >> >> --- a/arch/x86/kvm/emulate.c
>> >> >> +++ b/arch/x86/kvm/emulate.c
>> >> >> @@ -4189,7 +4189,9 @@ static int check_cr_write(struct
>> >> >> x86_emulate_ctxt *ctxt)
>> >> >>   maxphyaddr = eax & 0xff;
>> >> >>   else
>> >> >>   maxphyaddr = 36;
>> >> >> - rsvd = rsvd_bits(maxphyaddr, 62);
>> >> >> + if (ctxt->ops->get_cr(ctxt, 4) &
>> >> X86_CR4_PCIDE)
>> >> >> + new_val &= ~CR3_PCID_INVD;
>> >> >> + rsvd = rsvd_bits(maxphyaddr, 63);
>> >> >
>> >> > I would prefer instead to do this:
>> >> > if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
>> >> > rsvd &= ~CR3_PCID_INVD;
>> >> > It makes more sense as opposed to temporarily removing the
>> >> CR3_PCID_INVD bit from new_val.
>> >>
>> >> It tries the same way
>> >>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_virt_kvm_kvm.git_commit_-3Fid-3Dc19986fea873f3c745122bf79013a872a190f212=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0=r52WDgKBorUHwe_B_5Nw2Le_F_E0ne8lqqWW6n-3bSg=ufTcXvhhAMkY3XP6gAx-HiKCT8ynPWo2fs2z9DqCzM4=
>> >> pointed out.
>> >>
>> >> Regards,
>> >> Wanpeng Li
>> >
>> > Yes but there it makes sense as new CR3 value should not have bit 63
>> set in vcpu->arch.cr3.
>>
>> When X86_CR4_PCIDE == 0 and CR3 63 bit is set, a #GP is missing in
>> your suggestion.
>>
>> Regards,
>> Wanpeng Li
>
> Why?
>
> I suggest the following change:
> - rsvd = rsvd_bits(maxphyaddr, 62);
> + rsvd = rsvd_bits(maxphyaddr, 63);
> + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
> + rsvd &= ~CR3_PCID_INVD;
>
> In this case, if PCIDE=0 then bit 63 is set in rsvd and therefore 
> check_cr_write() will emulate_gp() as needed.

Ok, I misread your first reply, will send out v2.

Regards,
Wanpeng Li

Re: [PATCH 3/5] media: docs: selection: rename files to something meaningful

2018-05-13 Thread Hans Verkuil

On 04/03/2018 11:15 PM, Luca Ceresoli wrote:
> These files have an automatically-generated numbering. Replaname them

Replaname -> Replace

> to something that suggests their meaning.

to -> with

Regards,

Hans

> 
> Reported-by: Hans Verkuil 
> Cc: Hans Verkuil 
> Signed-off-by: Luca Ceresoli 
> ---
>  .../{selection-api-004.rst => selection-api-configuration.rst} |  0
>  .../v4l/{selection-api-006.rst => selection-api-examples.rst}  |  0
>  .../v4l/{selection-api-002.rst => selection-api-intro.rst} |  0
>  .../v4l/{selection-api-003.rst => selection-api-targets.rst}   |  0
>  .../{selection-api-005.rst => selection-api-vs-crop-api.rst}   |  0
>  Documentation/media/uapi/v4l/selection-api.rst | 10 
> +-
>  6 files changed, 5 insertions(+), 5 deletions(-)
>  rename Documentation/media/uapi/v4l/{selection-api-004.rst => 
> selection-api-configuration.rst} (100%)
>  rename Documentation/media/uapi/v4l/{selection-api-006.rst => 
> selection-api-examples.rst} (100%)
>  rename Documentation/media/uapi/v4l/{selection-api-002.rst => 
> selection-api-intro.rst} (100%)
>  rename Documentation/media/uapi/v4l/{selection-api-003.rst => 
> selection-api-targets.rst} (100%)
>  rename Documentation/media/uapi/v4l/{selection-api-005.rst => 
> selection-api-vs-crop-api.rst} (100%)
> 
> diff --git a/Documentation/media/uapi/v4l/selection-api-004.rst 
> b/Documentation/media/uapi/v4l/selection-api-configuration.rst
> similarity index 100%
> rename from Documentation/media/uapi/v4l/selection-api-004.rst
> rename to Documentation/media/uapi/v4l/selection-api-configuration.rst
> diff --git a/Documentation/media/uapi/v4l/selection-api-006.rst 
> b/Documentation/media/uapi/v4l/selection-api-examples.rst
> similarity index 100%
> rename from Documentation/media/uapi/v4l/selection-api-006.rst
> rename to Documentation/media/uapi/v4l/selection-api-examples.rst
> diff --git a/Documentation/media/uapi/v4l/selection-api-002.rst 
> b/Documentation/media/uapi/v4l/selection-api-intro.rst
> similarity index 100%
> rename from Documentation/media/uapi/v4l/selection-api-002.rst
> rename to Documentation/media/uapi/v4l/selection-api-intro.rst
> diff --git a/Documentation/media/uapi/v4l/selection-api-003.rst 
> b/Documentation/media/uapi/v4l/selection-api-targets.rst
> similarity index 100%
> rename from Documentation/media/uapi/v4l/selection-api-003.rst
> rename to Documentation/media/uapi/v4l/selection-api-targets.rst
> diff --git a/Documentation/media/uapi/v4l/selection-api-005.rst 
> b/Documentation/media/uapi/v4l/selection-api-vs-crop-api.rst
> similarity index 100%
> rename from Documentation/media/uapi/v4l/selection-api-005.rst
> rename to Documentation/media/uapi/v4l/selection-api-vs-crop-api.rst
> diff --git a/Documentation/media/uapi/v4l/selection-api.rst 
> b/Documentation/media/uapi/v4l/selection-api.rst
> index e4e623824b30..390233f704a3 100644
> --- a/Documentation/media/uapi/v4l/selection-api.rst
> +++ b/Documentation/media/uapi/v4l/selection-api.rst
> @@ -9,8 +9,8 @@ Cropping, composing and scaling -- the SELECTION API
>  .. toctree::
>  :maxdepth: 1
>  
> -selection-api-002
> -selection-api-003
> -selection-api-004
> -selection-api-005
> -selection-api-006
> +selection-api-intro.rst
> +selection-api-targets.rst
> +selection-api-configuration.rst
> +selection-api-vs-crop-api.rst
> +selection-api-examples.rst
>

Re: [PATCH 1/5] media: docs: selection: fix typos

2018-05-13 Thread Hans Verkuil

Hi Luca,

My apologies for the long delay in reviewing this.

It all looks very good and if you can post a v2 with these small issues
fixed, then I'll merge it for 4.18.

Regards,

Hans

On 05/13/2018 11:13 AM, Hans Verkuil wrote:
> On 04/03/2018 11:15 PM, Luca Ceresoli wrote:
> 
> Please add a commit message here. Yes, it can be as simple as 'Fixed typos in
> the selection documentation.'
> 
> Regards,
> 
>   Hans
> 
>> Cc: Hans Verkuil 
>> Signed-off-by: Luca Ceresoli 
>> ---
>>  Documentation/media/uapi/v4l/selection-api-004.rst | 2 +-
>>  Documentation/media/uapi/v4l/selection.svg | 4 ++--
>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/media/uapi/v4l/selection-api-004.rst 
>> b/Documentation/media/uapi/v4l/selection-api-004.rst
>> index d782cd5b2117..0a4ddc2d71db 100644
>> --- a/Documentation/media/uapi/v4l/selection-api-004.rst
>> +++ b/Documentation/media/uapi/v4l/selection-api-004.rst
>> @@ -41,7 +41,7 @@ The driver may further adjust the requested size and/or 
>> position
>>  according to hardware limitations.
>>  
>>  Each capture device has a default source rectangle, given by the
>> -``V4L2_SEL_TGT_CROP_DEFAULT`` target. This rectangle shall over what the
>> +``V4L2_SEL_TGT_CROP_DEFAULT`` target. This rectangle shall cover what the
>>  driver writer considers the complete picture. Drivers shall set the
>>  active crop rectangle to the default when the driver is first loaded,
>>  but not later.
>> diff --git a/Documentation/media/uapi/v4l/selection.svg 
>> b/Documentation/media/uapi/v4l/selection.svg
>> index a93e3b59786d..911062bd2844 100644
>> --- a/Documentation/media/uapi/v4l/selection.svg
>> +++ b/Documentation/media/uapi/v4l/selection.svg
>> @@ -1128,11 +1128,11 @@
>> 
>>
>>> y="1368.429" enable-background="new" font-size="50" style="line-height:125%">
>> -   COMPOSE_BONDS
>> +   COMPOSE_BOUNDS
>>
>>
>> > enable-background="new" style="line-height:125%">
>> -CROP_BONDS
>> +CROP_BOUNDS
>> 
>> > enable-background="new" style="line-height:125%">
>>  overscan area
>>
>

[PATCH v2] KVM: X86: Fix CR3 reserve bits

2018-05-13 Thread Wanpeng Li

From: Wanpeng Li 

MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4. 
It should be checked when PCIDE bit is not set, however commit 
'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on 
its physical address width")' removes the bit 63 checking 
unconditionally. This patch fixes it by checking bit 63 of CR3 
when PCIDE bit is not set in CR4.

Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits based on its 
physical address width)
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Junaid Shahid 
Cc: Liran Alon 
Signed-off-by: Wanpeng Li 
---
v1 -> v2:
 * remove CR3_PCID_INVD in rsvd when PCIDE is 1 instead of 
   removing CR3_PCID_INVD in new_value

 arch/x86/kvm/emulate.c | 4 +++-
 arch/x86/kvm/x86.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b3705ae..143b7ae 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4189,7 +4189,9 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
maxphyaddr = eax & 0xff;
else
maxphyaddr = 36;
-   rsvd = rsvd_bits(maxphyaddr, 62);
+   rsvd = rsvd_bits(maxphyaddr, 63);
+   if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
+   rsvd &= ~CR3_PCID_INVD;
}
 
if (new_val & rsvd)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 87e4805..9a90668 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -863,7 +863,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
}
 
if (is_long_mode(vcpu) &&
-   (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62)))
+   (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
return 1;
else if (is_pae(vcpu) && is_paging(vcpu) &&
   !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
-- 
2.7.4

Re: [PATCH v5 04/23] iommu/vt-d: add bind_pasid_table function

2018-05-13 Thread Lu Baolu

Hi,

On 05/12/2018 04:53 AM, Jacob Pan wrote:
> Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> functions.
>
> The primary use case is for direct assignment of SVM capable
> device. Originated from emulated IOMMU in the guest, the request goes
> through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller
> passes guest PASID table pointer (GPA) and size.
>
> Device context table entry is modified by Intel IOMMU specific
> bind_pasid_table function. This will turn on nesting mode and matching
> translation type.
>
> The unbind operation restores default context mapping.
>
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/iommu/intel-iommu.c   | 122 
> ++
>  include/linux/dma_remapping.h |   1 +
>  2 files changed, 123 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index a0f81a4..4623294 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2409,6 +2409,7 @@ static struct dmar_domain 
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   info->ats_supported = info->pasid_supported = info->pri_supported = 0;
>   info->ats_enabled = info->pasid_enabled = info->pri_enabled = 0;
>   info->ats_qdep = 0;
> + info->pasid_table_bound = 0;
>   info->dev = dev;
>   info->domain = domain;
>   info->iommu = iommu;
> @@ -5132,6 +5133,7 @@ static void intel_iommu_put_resv_regions(struct device 
> *dev,
>  
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  #define MAX_NR_PASID_BITS (20)
> +#define MIN_NR_PASID_BITS (5)
>  static inline unsigned long intel_iommu_get_pts(struct intel_iommu *iommu)
>  {
>   /*
> @@ -5258,6 +5260,122 @@ struct intel_iommu *intel_svm_device_to_iommu(struct 
> device *dev)
>  
>   return iommu;
>  }
> +
> +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
> + struct device *dev, struct pasid_table_config *pasidt_binfo)
> +{
> + struct intel_iommu *iommu;
> + struct context_entry *context;
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct pci_dev *pdev;
> + u8 bus, devfn, host_table_pasid_bits;
> + u16 did, sid;
> + int ret = 0;
> + unsigned long flags;
> + u64 ctx_lo;

I personally prefer to have these declarations ordered, longest line first:

struct dmar_domain *dmar_domain = to_dmar_domain(domain);
u8 bus, devfn, host_table_pasid_bits;
struct device_domain_info *info;
struct context_entry *context;
struct intel_iommu *iommu;
struct pci_dev *pdev;
unsigned long flags;
u16 did, sid;
int ret = 0;
u64 ctx_lo;

> +
> + if ((pasidt_binfo->version != PASID_TABLE_CFG_VERSION_1) ||

Unnecessary parentheses.

> + pasidt_binfo->bytes != sizeof(*pasidt_binfo))

Alignment should match open parenthesis.

> + return -EINVAL;
> + iommu = device_to_iommu(dev, &bus, &devfn);
> + if (!iommu)
> + return -ENODEV;
> + /* VT-d spec section 9.4 says pasid table size is encoded as 2^(x+5) */
> + host_table_pasid_bits = intel_iommu_get_pts(iommu) + MIN_NR_PASID_BITS;
> + if (!pasidt_binfo || pasidt_binfo->pasid_bits > host_table_pasid_bits ||

"!pasidt_binfo" checking should be moved up to the version checking.
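
Roughly what I have in mind, as a stand-alone userspace sketch rather than the patch itself. The struct layout, the CFG_VERSION_1 value and the validate_config() name are placeholders for illustration only; the point is that the NULL test comes before the first dereference.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the structures used in the patch. */
#define CFG_VERSION_1 1

struct table_config {
	uint32_t version;
	uint32_t bytes;
	uint8_t pasid_bits;
};

/* Check the pointer before anything dereferences it. */
static int validate_config(const struct table_config *cfg,
			   uint8_t min_bits, uint8_t max_bits)
{
	if (!cfg || cfg->version != CFG_VERSION_1 ||
	    cfg->bytes != sizeof(*cfg))
		return -EINVAL;

	/* Range check only after the structure itself is known good. */
	if (cfg->pasid_bits < min_bits || cfg->pasid_bits > max_bits)
		return -ERANGE;

	return 0;
}
```

With the checks in this order, the later range test no longer needs its own NULL check.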

> + pasidt_binfo->pasid_bits < MIN_NR_PASID_BITS) {
> + pr_err("Invalid gPASID bits %d, host range %d - %d\n",

How about dev_err()? 

> + pasidt_binfo->pasid_bits,
> + MIN_NR_PASID_BITS, host_table_pasid_bits);
> + return -ERANGE;
> + }
> + if (!ecap_nest(iommu->ecap)) {
> + dev_err(dev, "Cannot bind PASID table, no nested 
> translation\n");
> + ret = -ENODEV;
> + goto out;

How about
+return -ENODEV;
?

> + }
> + pdev = to_pci_dev(dev);

We can't always assume that it is a PCI device, right?

> + sid = PCI_DEVID(bus, devfn);
> + info = dev->archdata.iommu;
> +
> + if (!info) {
> + dev_err(dev, "Invalid device domain info\n");
> + ret = -EINVAL;
> + goto out;
> + }
> + if (info->pasid_table_bound) {

We should do this check with the lock held.

Otherwise,

Thread A on CPUx                Thread B on CPUy
================                ================
check pasid_table_bound         check pasid_table_bound

mutex_lock()
Setup context
pasid_table_bound = 1
mutex_unlock()

                                mutex_lock()
                                Setup context
                                pasid_table_bound = 1
                                mutex_unlock()
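
To illustrate the fix, here is a minimal userspace sketch with a pthread mutex standing in for the kernel mutex. struct dev_info, table_bound and bind_table() are hypothetical names, not the ones in the patch; the only point is that the test and the set of the flag sit in the same critical section.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device info in the patch. */
struct dev_info {
	pthread_mutex_t lock;
	bool table_bound;
};

/* Test and set the flag inside one critical section. */
static int bind_table(struct dev_info *info)
{
	int ret = 0;

	pthread_mutex_lock(&info->lock);
	if (info->table_bound) {
		ret = -EBUSY;
		goto unlock;
	}
	/* Context setup would go here, still under the lock. */
	info->table_bound = true;
unlock:
	pthread_mutex_unlock(&info->lock);
	return ret;
}
```

With this shape, whichever thread takes the lock second sees the flag already set and gets -EBUSY instead of setting up the context again.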


> + dev_err(dev, "Device PASID table already bound\n");
> + ret = -EBUSY;
> + goto out;
> + }
> + if (!info->pasid_enabled) {
> +

Re: [PATCH][RFC v2] ACPI: acpi_pad: Do not launch acpi_pad threads on idle cpus

2018-05-13 Thread Rafael J. Wysocki

On Saturday, May 5, 2018 1:53:22 PM CEST Chen Yu wrote:
> According to current implementation of acpi_pad driver,
> it does not make sense to spawn any power saving threads
> on the cpus which are already idle - it might bring
> unnecessary overhead on these idle cpus and causes power
> waste. So verify the condition that if the number of 'busy'
> cpus exceeds the amount of the 'forced idle' cpus is met.
> This is applicable due to round-robin attribute of the
> power saving threads, otherwise ignore the setting/ACPI
> notification.

OK, but CPUs are busy, because they are running tasks.  If acpi_pad
kthreads run on them, the tasks they are running will migrate to the
currently idle CPUs (unless they have specific CPU affinity) and the
throttling will not really be effective.

I would think that acpi_pad should ensure that the requested number of
CPUs will not run anything other than throttling kthreads.  Isn't that
the case?

Thanks,
Rafael
