Re: [alsa-devel] [PATCH][next] ALSA: firewire-lib: remove redundant assignment to cip_header

2019-05-24 Thread Takashi Sakamoto
Hi Colin,

On Sat, May 25, 2019, at 06:35, Colin King wrote:
> From: Colin Ian King 
> 
> The assignment to cip_header is redundant as the value is never
> read; it is re-assigned in both the if and else paths of
> the following if statement. Clean up the code by removing it.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 
> ---
>  sound/firewire/amdtp-stream.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/sound/firewire/amdtp-stream.c 
> b/sound/firewire/amdtp-stream.c
> index 2d9c764061d1..4236955bbf57 100644
> --- a/sound/firewire/amdtp-stream.c
> +++ b/sound/firewire/amdtp-stream.c
> @@ -675,7 +675,6 @@ static int handle_in_packet(struct amdtp_stream *s, 
> unsigned int cycle,
>   return -EIO;
>   }
>  
> - cip_header = ctx_header + 2;
>   if (!(s->flags & CIP_NO_HEADER)) {
>   cip_header = &ctx_header[2];
>   err = check_cip_header(s, cip_header, payload_length,

Thanks for the fix. I've already posted a further patch for refactoring,
and this was also fixed by commit 98e3e43b599d
("ALSA: firewire-lib: refactoring to obsolete IR packet handler").

https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=98e3e43b599d742c104864c6772a251025ffb52b

Thanks


Takashi Sakamoto


Re: [PATCH] ACPI / LPSS: Don't skip late system PM ops for hibernate on BYT/CHT

2019-05-24 Thread Robert R. Howell
On 5/16/19 5:11 AM, Rafael J. Wysocki wrote:
> 
> On Thursday, April 25, 2019 6:38:34 PM CEST Robert R. Howell wrote:
>> On 4/24/19 1:20 AM, Rafael J. Wysocki wrote:
>>
>>> On Tue, Apr 23, 2019 at 10:03 PM Robert R. Howell  wrote:

 On 4/23/19 2:07 AM, Rafael J. Wysocki wrote:
>
> On Sat, Apr 20, 2019 at 12:44 AM Robert R. Howell  
> wrote:
>>
>> On 4/18/19 5:42 AM, Hans de Goede wrote:
>>
 On 4/8/19 2:16 AM, Hans de Goede wrote:
>
> Hmm, interesting so you have hibernation working on a T100TA
> (with 5.0 + 02e45646d53b reverted), right ?
>
>>
>>
>> I've managed to find a way around the i2c_designware timeout issues
>> on the T100TA's.  The key is to NOT set DPM_FLAG_SMART_SUSPEND,
>> which was added in the 02e45646d53b commit.
>>
>> To test that I've started with a 5.1-rc5 kernel, applied your recent 
>> patch
>> to acpi_lpss.c, then apply the following patch of mine, removing
>> DPM_FLAG_SMART_SUSPEND.  (For the T100 hardware I need to apply some
>> other patches as well but those are not related to the i2c-designware or
>> acpi issues addressed here.)
>>
>> On a resume from hibernation I still see one error:
>>   "i2c_designware 80860F41:00: Error i2c_dw_xfer called while suspended"
>> but I no longer get the i2c_designware timeouts, and audio does now work
>> after the resume.
>>
>> Removing DPM_FLAG_SMART_SUSPEND may not be what you want for other
>> hardware, but perhaps this will give you a clue as to what is going
>> wrong with hibernate/resume on the T100TA's.
>
> What if you drop DPM_FLAG_LEAVE_SUSPENDED alone instead?
>

 I did try dropping just DPM_FLAG_LEAVE_SUSPENDED, dropping just
 DPM_FLAG_SMART_SUSPEND, and dropping both flags.  When I just drop
 DPM_FLAG_LEAVE_SUSPENDED I still get the i2c_designware timeouts
 after the resume.  If I drop just DPM_FLAG_SMART_SUSPEND or drop both,
 then the timeouts go away.
>>>
>>> OK, thanks!
>>>
>>> Is non-hibernation system suspend affected too?
>>
>> I just ran some tests on a T100TA, using the 5.1-rc5 code with Hans' patch 
>> applied
>> but without any changes to i2c-designware-platdrv.c, so the
>> DPM_FLAG_SMART_PREPARE, DPM_FLAG_SMART_SUSPEND, and DPM_FLAG_LEAVE_SUSPENDED 
>> flags
>> are all set.
>>
>> Suspend does work OK, and after resume I do NOT get any of the crippling
>> i2c_designware timeout errors which cause sound to fail after hibernate.  I 
>> DO see one
>>   "i2c_designware 80860F41:00: Error i2c_dw_xfer call while suspended"
>> error on resume, just as I do on hibernate.  I've attached a portion of 
>> dmesg below.
>> The "asus_wmi:  Unknown key 79 pressed" error is a glitch which occurs
>> intermittently on these machines, but doesn't seem related to the other 
>> issues.
>> I had one test run when it was absent but the rest of the messages were the
>> same -- but then kept getting that unknown key error on all my later tries.
>>
>> I did notice the "s2idle" in the following rather than "shallow" or "deep".
>> A cat of /sys/power/state shows "freeze mem disk", but a cat of
>> /sys/power/mem_sleep shows only "[s2idle]", so it looks like shallow and deep
>> are not enabled for this system.
>> are not enabled for this system.  I did check the input power (or really 
>> current)
>> as it went into suspend and the micro-usb power input drops from about
>> 0.5 amps to 0.05 amps.  But clearly a lot of devices are still active, as 
>> movement
>> of a bluetooth mouse (the MX Anywhere 2) will wake it from suspend.  That 
>> presumably is
>> why suspend doesn't trigger the same i2c_designware problems as hibernate.
>>
>> Let me know if I can do any other tests.
> 
> Can you please check if the appended patch makes the hibernate issue go away 
> for you, without any other changes?
> 
> ---
>  drivers/pci/pci-driver.c |   36 ++--
>  1 file changed, 10 insertions(+), 26 deletions(-)
> 
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -957,15 +957,14 @@ static int pci_pm_freeze(struct device *
> }
> 
> /*
> -* This used to be done in pci_pm_prepare() for all devices and some
> -* drivers may depend on it, so do it here.  Ideally, 
> runtime-suspended
> -* devices should not be touched during freeze/thaw transitions,
> -* however.
> +* Resume all runtime-suspended devices before creating a snapshot
> +* image of system memory, because the restore kernel generally cannot
> +* be expected to always handle them consistently and pci_pm_restore()
> +* always leaves them as "active", so ensure that the state saved in 
> the
> +* image will always be consistent with that.
>  */
> -   if 
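For reference, the earlier change Robert describes (dropping
DPM_FLAG_SMART_SUSPEND from the flags i2c-designware sets) amounts to
something like the sketch below, against a 5.1-era
i2c-designware-platdrv.c probe path; the exact call site may differ:

	/* Sketch only: in the driver's probe(), keep SMART_PREPARE and
	 * LEAVE_SUSPENDED but no longer declare SMART_SUSPEND, so the
	 * device is not left runtime-suspended across hibernation.
	 */
	dev_pm_set_driver_flags(&pdev->dev,
				DPM_FLAG_SMART_PREPARE |
				DPM_FLAG_LEAVE_SUSPENDED);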

Re: [PATCH i2c/slave-mqueue v5] i2c: slave-mqueue: add a slave backend to receive and queue messages

2019-05-24 Thread Wang, Haiyue



On 2019-05-25 01:33, Eduardo Valentin wrote:

Hey,

On Fri, May 24, 2019 at 10:43:16AM +0800, Wang, Haiyue wrote:

Thanks for your interest; the design idea is from:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/i2c-slave-eeprom.c?h=v5.2-rc1

and

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/i2c/slave-interface

Then you will get the answer. ;-)

Well, maybe :-) see further comments inline..
Please see inline. And how about the test results on your real system?
Does it work as expected?

BR,

Haiyue


On 2019-05-24 06:03, Eduardo Valentin wrote:

Hey Wang,

On Tue, Apr 24, 2018 at 01:06:32AM +0800, Haiyue Wang wrote:

Some protocols over I2C are designed for bi-directional message transfer,
using the I2C Master Write protocol in both directions. MCTP (Management
Component Transport Protocol) and IPMB (Intelligent Platform Management
Bus) both require that userspace can receive messages from I2C drivers
operating in slave mode.

This new slave mqueue backend receives and queues messages, and exposes
them to userspace through a sysfs binary file.

Signed-off-by: Haiyue Wang 
---
v4 -> v5:
  - Typo: bellowing -> the below

v3 -> v4:
  - Drop the small message after receiving I2C STOP.

v2 -> v3:
  - Just remove the ';' after the end '}' of i2c_slave_mqueue_probe().

v1 -> v2:
  - Change MQ_MSGBUF_SIZE and MQ_QUEUE_SIZE to be configurable by Kconfig.
---
  Documentation/i2c/slave-mqueue-backend.rst | 125 ++
  drivers/i2c/Kconfig|  25 
  drivers/i2c/Makefile   |   1 +
  drivers/i2c/i2c-slave-mqueue.c | 203 +
  4 files changed, 354 insertions(+)
  create mode 100644 Documentation/i2c/slave-mqueue-backend.rst
  create mode 100644 drivers/i2c/i2c-slave-mqueue.c

diff --git a/Documentation/i2c/slave-mqueue-backend.rst 
b/Documentation/i2c/slave-mqueue-backend.rst
new file mode 100644
index 000..3966cf0
--- /dev/null
+++ b/Documentation/i2c/slave-mqueue-backend.rst
@@ -0,0 +1,125 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+Linux I2C slave message queue backend
+=====================================
+
+:Author: Haiyue Wang 
+
+Some protocols over I2C/SMBus are designed for bi-directional message
+transfer, using the I2C Master Write protocol in both directions. This
+requires that both sides of the communication have slave addresses.
+
+MCTP (Management Component Transport Protocol) and IPMB (Intelligent
+Platform Management Bus) both require that userspace can receive
+messages from I2C drivers operating in slave mode.
+
+This I2C slave mqueue (message queue) backend is used to receive and queue
+messages from the remote I2C intelligent device. It prepends the target
+slave address (with the R/W# bit always 0) as the first byte of the message,
+so that userspace can use this byte to dispatch the messages to different
+handling modules. Also, protocols like IPMB include the address byte in
+their message format and need it to compute the checksum.
+
+Because messages are time sensitive, this backend flushes the oldest
+message to make room for the newest one.
+
+Link
+----
+`Intelligent Platform Management Bus
+Communications Protocol Specification
+`_
+
+`Management Component Transport Protocol (MCTP)
+SMBus/I2C Transport Binding Specification
+`_
+
+How to use
+----------
+For example, if the I2C5 bus has slave address 0x10, the command below will
+create the related message queue interface:
+
+echo slave-mqueue 0x1010 > /sys/bus/i2c/devices/i2c-5/new_device
+
+Then you can dump the messages like this:
+
+hexdump -C /sys/bus/i2c/devices/5-1010/slave-mqueue
+
+Code Example
+------------
+*Note: call 'lseek' before 'read'; this is a requirement of the kernfs design.*
+
+::
+
+  #include <stdio.h>
+  #include <stdlib.h>
+  #include <string.h>
+  #include <fcntl.h>
+  #include <unistd.h>
+  #include <poll.h>
+  #include <time.h>
+
+  int main(int argc, char *argv[])
+  {
+  int i, r;
+  struct pollfd pfd;
+  struct timespec ts;
+  unsigned char data[256];
+
+  pfd.fd = open(argv[1], O_RDONLY | O_NONBLOCK);
+  if (pfd.fd < 0)
+  return -1;
+
+  pfd.events = POLLPRI;
+
+  while (1) {
+  r = poll(&pfd, 1, 5000);
+
+  if (r < 0)
+  break;
+
+  if (r == 0 || !(pfd.revents & POLLPRI))
+  continue;
+
+  lseek(pfd.fd, 0, SEEK_SET);
+  r = read(pfd.fd, data, sizeof(data));
+  if (r <= 0)
+  continue;
+
+  clock_gettime(CLOCK_MONOTONIC, &ts);
+  printf("[%ld.%.9ld] :", ts.tv_sec, ts.tv_nsec);
+  for (i = 0; i < r; i++)
+  printf(" %02x", data[i]);
+  printf("\n");
+  }
+
+  close(pfd.fd);
+
+  return 0;
+  }

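(To try the example against the sysfs node created in the "How to use"
section, compile it and point it at the bin file; the mq-dump.c file name
here is illustrative:

  gcc -o mq-dump mq-dump.c
  ./mq-dump /sys/bus/i2c/devices/5-1010/slave-mqueue)
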
Re: [A General Question] What should I do after getting Reviewed-by from a maintainer?

2019-05-24 Thread Willy Tarreau
On Sat, May 25, 2019 at 10:12:41AM +0800, Gen Zhang wrote:
> On Fri, May 24, 2019 at 04:21:36PM -0700, Randy Dunlap wrote:
> > On 5/22/19 6:17 PM, Gen Zhang wrote:
> > > Hi Andrew,
> > > I am starting submitting patches these days and got some patches 
> > > "Reviewed-by" from maintainers. After checking the 
> > > submitting-patches.html, I figured out what "Reviewed-by" means. But I
> > > didn't get the guidance on what to do after getting "Reviewed-by".
> > > Am I supposed to send this patch to more maintainers? Or something else?
> > > Thanks
> > > Gen
> > > 
> > 
> > [Yes, I am not Andrew. ;]
> > 
> > Patches should be sent to a maintainer who is responsible for merging
> > changes for the driver or $arch or subsystem.
> > And they should also be Cc-ed to the appropriate mailing list(s) and
> > source code author(s), usually [unless they are no longer active].
> > 
> > Some source files have author email addresses in them.
> > Or in a kernel git tree, you can use "git log path/to/source/file.c" to see
> > who has been making & merging patches to that file.c.
> > Probably the easiest thing to do is run ./scripts/get_maintainer.pl and
> > it will try to tell you who to send the patch to.
> > 
> > HTH.
> > -- 
> > ~Randy
> Thanks for your patient instructions, Randy! I already figured it out.

Then if your question is what to do with these "Reviewed-by", you should
edit your patches and place these tags next to your Signed-off-by line
to indicate that these people have reviewed the code (and didn't have
anything in particular to say about it). From this point on you should
not modify the patches carrying this tag.

When you resend your final series to the maintainer, it will include
all these Reviewed-by tags and will generally save the maintainer some
review time by letting him skip some of the patches. For example, the
tag block of a resent patch might then look like the lines below.
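Example tag block after review (addresses elided, as in the thread):

  Signed-off-by: Gen Zhang <...>
  Reviewed-by: Randy Dunlap <...>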

Willy


Re: [PATCH net] staging: Remove set but not used variable ‘status’

2019-05-24 Thread Greg KH
On Sat, May 25, 2019 at 12:26:42PM +0800, Mao Wenan wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/staging/kpc2000/kpc_spi/spi_driver.c: In function
> ‘kp_spi_transfer_one_message’:
> drivers/staging/kpc2000/kpc_spi/spi_driver.c:282:9: warning: variable
> ‘status’ set but not used [-Wunused-but-set-variable]
>  int status = 0;
>  ^~
> The variable 'status' is not used any more; remove it.
> 
> Signed-off-by: Mao Wenan 
> ---
>  drivers/staging/kpc2000/kpc_spi/spi_driver.c | 3 ---
>  1 file changed, 3 deletions(-)

What is [PATCH net] in the subject for?  This is not a networking driver
:(



Re: [PATCH 2/2] staging: kpc2000: add missing dependencies for kpc2000

2019-05-24 Thread Greg KH
On Fri, May 24, 2019 at 10:30:58PM +0200, Simon Sandström wrote:
> Fixes build errors:
> 
> ERROR: "mfd_remove_devices" [kpc2000.ko] undefined!
> ERROR: "uio_unregister_device" [kpc2000.ko] undefined!
> ERROR: "mfd_add_devices" [kpc2000.ko] undefined!
> ERROR: "__uio_register_device" [kpc2000.ko] undefined!
> 
> Signed-off-by: Simon Sandström 
> ---
>  drivers/staging/kpc2000/Kconfig | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/staging/kpc2000/Kconfig b/drivers/staging/kpc2000/Kconfig
> index c463d232f2b4..5188b56123ab 100644
> --- a/drivers/staging/kpc2000/Kconfig
> +++ b/drivers/staging/kpc2000/Kconfig
> @@ -3,6 +3,8 @@
>  config KPC2000
>   bool "Daktronics KPC Device support"
>   depends on PCI
> + select MFD_CORE
> + select UIO
>   help
> Select this if you wish to use the Daktronics KPC PCI devices
>  

This is already in linux-next (in a different form), are you sure you
are working against the latest kernel tree?

thanks,

greg k-h


drivers/mtd/chips/cfi_cmdset_0002.o: warning: objtool: chip_good() falls through to next function do_read_secsi_onechip()

2019-05-24 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   7fbc78e3155a0c464bd832efc07fb3c2355fe9bd
commit: e6f393bc939d566ce3def71232d8013de9aaadde objtool: Fix function 
fallthrough detection
date:   11 days ago
config: x86_64-randconfig-b0-05251117 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout e6f393bc939d566ce3def71232d8013de9aaadde
# save the attached .config to linux build tree
make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All warnings (new ones prefixed by >>):

>> drivers/mtd/chips/cfi_cmdset_0002.o: warning: objtool: chip_good() falls 
>> through to next function do_read_secsi_onechip()
--
>> drivers/mtd/chips/cfi_util.o: warning: objtool: cfi_qry_present() falls 
>> through to next function cfi_merge_status()

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH net] staging: Remove set but not used variable ‘status’

2019-05-24 Thread Mao Wenan
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/staging/kpc2000/kpc_spi/spi_driver.c: In function
‘kp_spi_transfer_one_message’:
drivers/staging/kpc2000/kpc_spi/spi_driver.c:282:9: warning: variable
‘status’ set but not used [-Wunused-but-set-variable]
 int status = 0;
 ^~
The variable 'status' is not used any more; remove it.

Signed-off-by: Mao Wenan 
---
 drivers/staging/kpc2000/kpc_spi/spi_driver.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc_spi/spi_driver.c 
b/drivers/staging/kpc2000/kpc_spi/spi_driver.c
index 86df16547a92..16f9518f8d63 100644
--- a/drivers/staging/kpc2000/kpc_spi/spi_driver.c
+++ b/drivers/staging/kpc2000/kpc_spi/spi_driver.c
@@ -279,7 +279,6 @@ kp_spi_transfer_one_message(struct spi_master *master, 
struct spi_message *m)
 struct kp_spi   *kpspi;
 struct spi_transfer *transfer;
 union kp_spi_config sc;
-int status = 0;
 
 spidev = m->spi;
 kpspi = spi_master_get_devdata(master);
@@ -332,7 +331,6 @@ kp_spi_transfer_one_message(struct spi_master *master, 
struct spi_message *m)
 /* do the transfers for this message */
 list_for_each_entry(transfer, >transfers, transfer_list) {
 if (transfer->tx_buf == NULL && transfer->rx_buf == NULL && 
transfer->len) {
-status = -EINVAL;
 break;
 }
 
@@ -370,7 +368,6 @@ kp_spi_transfer_one_message(struct spi_master *master, 
struct spi_message *m)
 m->actual_length += count;
 
 if (count != transfer->len) {
-status = -EIO;
 break;
 }
 }
-- 
2.20.1



Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-05-24 Thread Saravana Kannan
On Fri, May 24, 2019 at 7:40 PM Frank Rowand  wrote:
>
> Hi Saravana,
>
> I'll try to address the other portions of this email that I did not
> get to in my previous replies.
>
>
> On 5/24/19 2:53 PM, Saravana Kannan wrote:
> > On Fri, May 24, 2019 at 10:49 AM Frank Rowand  
> > wrote:
> >>
> >> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> >>> Add a generic "depends-on" property that allows specifying mandatory
> >>> functional dependencies between devices. Add device-links after the
> >>> devices are created (but before they are probed) by looking at this
> >>> "depends-on" property.
> >>>
> >>> This property is used instead of existing DT properties that specify
> >>> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> >>> is because not all resources referred to by existing DT properties are
> >>> mandatory functional dependencies. Some devices/drivers might be able to
> >>> operate with reduced functionality when some of the resources
> >>> aren't available. For example, a device could operate in polling mode
> >>> if no IRQ is available, a device could skip doing power management if
> >>> clock or voltage control isn't available and they are left on, etc.
> >>>
> >>> So, adding mandatory functional dependency links between devices by
> >>> looking at referred phandles in DT properties won't work as it would
> >>> prevent probing devices that could be probed. By having an explicit
> >>> depends-on property, we can handle these cases correctly.
> >>
> >> Trying to wrap my brain around the concept, this series seems to be
> >> adding the ability to declare that an apparent dependency (eg an IRQ
> >> specified by a phandle) is _not_ actually a dependency.
> >
> > The current implementation completely ignores existing bindings for
> > dependencies and so does the current tip of the kernel. So it's not
> > really overriding anything. However, if I change the implementation so
> > that depends-on becomes the source of truth if it exists and falls
> > back to existing common bindings if "depends-on" isn't present -- then
> > depends-on would truly be overriding existing bindings for
> > dependencies. It depends on how we want to define the DT property.
> >
> >> The phandle already implies the dependency.
> >
> > Sure, it might imply, but it's not always true.
> >
> >> Creating a separate
> >> depends-on property provides a method of ignoring the implied
> >> dependencies.
> >
> > implied != true
> >
> >> This is not just hardware description.  It is instead a combination
> >> of hardware functionality and driver functionality.  An example
> >> provided in the second paragraph of the email I am replying to
> >> suggests a device could operate in polling mode if no IRQ is
> >> available.  Using this example, the devicetree does not know
> >> whether the driver requires the IRQ (currently an implied
> >> dependency since the IRQ phandle exists).  My understanding
> >> of this example is that the device node would _not_ have a
> >> depends-on property for the IRQ phandle so the IRQ would be
> >> optional.  But this is an attribute of the driver, not the
> >> hardware.
> >
>
> > Not really. The interrupt could be for "SD card plugged in". That's
> > never a mandatory dependency for the SD card controller to work. So
> > the IRQ provider won't be a "depends-on" in this case. But if there is
> > no power supply or clock for the SD card controller, it isn't going to
> > work -- so they'd be listed in the "depends-on". So, this is still
> > defining the hardware and not the OS.
>
> Please comment on my observation, which was based on an IRQ for a device
> with polling mode vs interrupt-driven mode. You described a different
> case and did not address my comment.

I thought I did reply -- not sure what part you are looking for so
I'll rephrase. I was just picking the SD card controller as a concrete
example of device that can work with or without an interrupt. But
sure, I can call it "the device".

And yes, the device won't have a "depends-on" on the IRQ provider
because the device can still work without a working (as in bound to
driver) IRQ provider. Whether the driver insists on waiting on an IRQ
provider or not is up to the driver and the depends-on property is NOT
trying to dictate what the driver should do in this case. Does that
answer your implied question?

>
> >> This is also configuration, declaring whether the
> >> system is willing to accept polling mode instead of interrupt
> >> mode.
> >
> > Whether the driver will choose to operate without the IRQ is up to it.
> > The OS could also assume the power supply is never turned off and
> > still try to use the device. Depending on the hardware configuration,
> > that might or might not work.
> >
> >> Devicetree is not the proper place for driver description or
> >> for configuration.
> >
> > But depends-on isn't describing the driver configuration though.
> >
> > Overall, the clock provider example I gave in another reply is a much
> > better example. 

Re: [PATCH 2/7] keys: sparse: Fix incorrect RCU accesses

2019-05-24 Thread James Morris
On Wed, 22 May 2019, David Howells wrote:

> Fix a pair of accesses that should be using RCU protection.
> 
> rcu_dereference_protected() is needed to access task_struct::real_parent.
> 
> current_cred() should be used to access current->cred.
> 
> Signed-off-by: David Howells 


Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH 3/7] keys: sparse: Fix kdoc mismatches

2019-05-24 Thread James Morris
On Wed, 22 May 2019, David Howells wrote:

> Fix some kdoc argument description mismatches reported by sparse and give
> keyring_restrict() a description.
> 
> Signed-off-by: David Howells 
> cc: Mat Martineau 


Reviewed-by: James Morris 


-- 
James Morris




linusw/fixes boot bisection: v5.2-rc1-1-ge9646f0f5bb6 on rk3288-veyron-jaq

2019-05-24 Thread kernelci.org bot
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has  *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.  *
* Hope this helps!  *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

linusw/fixes boot bisection: v5.2-rc1-1-ge9646f0f5bb6 on rk3288-veyron-jaq

Summary:
  Start:  e9646f0f5bb6 gpio: fix gpio-adp5588 build errors
  Details:https://kernelci.org/boot/id/5ce82d9f59b514bb857a3642
  Plain log:  
https://storage.kernelci.org//linusw/fixes/v5.2-rc1-1-ge9646f0f5bb6/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.txt
  HTML log:   
https://storage.kernelci.org//linusw/fixes/v5.2-rc1-1-ge9646f0f5bb6/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.html
  Result: 28694e009e51 thermal: rockchip: fix up the tsadc pinctrl setting 
error

Checks:
  revert: PASS
  verify: PASS

Parameters:
  Tree:   linusw
  URL:
https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git/
  Branch: fixes
  Target: rk3288-veyron-jaq
  CPU arch:   arm
  Lab:lab-collabora
  Compiler:   gcc-8
  Config: multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y
  Test suite: boot

Breaking commit found:

---
commit 28694e009e512451ead5519dd801f9869acb1f60
Author: Elaine Zhang 
Date:   Tue Apr 30 18:09:44 2019 +0800

thermal: rockchip: fix up the tsadc pinctrl setting error

Explicitly use the pinctrl to set/unset the right mode
instead of relying on the pinctrl init mode.
And it requires setting the tshut polarity before select pinctrl.

When the temperature sensor mode is set to 0, it will automatically
reset the board via the Clock-Reset-Unit (CRU) if the over temperature
threshold is reached. However, when the pinctrl initializes, it does a
transition to "otp_out" which may lead the SoC restart all the time.

"otp_out" IO may be connected to the RESET circuit on the hardware.
If the IO is in the wrong state, it will trigger RESET.
(similar to the effect of pressing the RESET button)
which will cause the soc to restart all the time.

Signed-off-by: Elaine Zhang 
Reviewed-by: Daniel Lezcano 
Signed-off-by: Eduardo Valentin 

diff --git a/drivers/thermal/rockchip_thermal.c 
b/drivers/thermal/rockchip_thermal.c
index 9c7643d62ed7..6dc7fc516abf 100644
--- a/drivers/thermal/rockchip_thermal.c
+++ b/drivers/thermal/rockchip_thermal.c
@@ -172,6 +172,9 @@ struct rockchip_thermal_data {
int tshut_temp;
enum tshut_mode tshut_mode;
enum tshut_polarity tshut_polarity;
+   struct pinctrl *pinctrl;
+   struct pinctrl_state *gpio_state;
+   struct pinctrl_state *otp_state;
 };
 
 /**
@@ -1242,6 +1245,8 @@ static int rockchip_thermal_probe(struct platform_device 
*pdev)
return error;
}
 
+   thermal->chip->control(thermal->regs, false);
+
error = clk_prepare_enable(thermal->clk);
if (error) {
 dev_err(&pdev->dev, "failed to enable converter clock: %d\n",
@@ -1267,6 +1272,30 @@ static int rockchip_thermal_probe(struct platform_device 
*pdev)
thermal->chip->initialize(thermal->grf, thermal->regs,
  thermal->tshut_polarity);
 
+   if (thermal->tshut_mode == TSHUT_MODE_GPIO) {
+   thermal->pinctrl = devm_pinctrl_get(&pdev->dev);
+   if (IS_ERR(thermal->pinctrl)) {
+   dev_err(&pdev->dev, "failed to find thermal pinctrl\n");
+   return PTR_ERR(thermal->pinctrl);
+   }
+
+   thermal->gpio_state = pinctrl_lookup_state(thermal->pinctrl,
+  "gpio");
+   if (IS_ERR_OR_NULL(thermal->gpio_state)) {
+   dev_err(&pdev->dev, "failed to find thermal gpio 
state\n");
+   return -EINVAL;
+   }
+
+   thermal->otp_state = pinctrl_lookup_state(thermal->pinctrl,
+ "otpout");
+   if (IS_ERR_OR_NULL(thermal->otp_state)) {
+   dev_err(&pdev->dev, "failed to find thermal otpout 
state\n");
+   return -EINVAL;
+   }
+
+   pinctrl_select_state(thermal->pinctrl, thermal->otp_state);
+   }
+
for (i = 0; i < thermal->chip->chn_num; i++) {
error = rockchip_thermal_register_sensor(pdev, thermal,
&thermal->sensors[i],
@@ -1337,8 +1366,8 @@ static int __maybe_unused 

[PATCH RT 0/6] Linux 4.19.37-rt20-rc1

2019-05-24 Thread Steven Rostedt


Dear RT Folks,

This is the RT stable review cycle of patch 4.19.37-rt20-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 5/28/2019.

Enjoy,

-- Steve


To build 4.19.37-rt20-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.19.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.19.37.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.19/patch-4.19.37-rt20-rc1.patch.xz

You can also build from 4.19.37-rt19 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/4.19/incr/patch-4.19.37-rt19-rt20-rc1.patch.xz


Changes from 4.19.37-rt19:

---


Corey Minyard (1):
  sched/completion: Fix a lockup in wait_for_completion()

Julien Grall (1):
  tty/sysrq: Convert show_lock to raw_spinlock_t

Sebastian Andrzej Siewior (3):
  powerpc/pseries/iommu: Use a locallock instead local_irq_save()
  powerpc: reshuffle TIF bits
  drm/i915: Don't disable interrupts independently of the lock

Steven Rostedt (VMware) (1):
  Linux 4.19.37-rt20-rc1


 arch/powerpc/include/asm/thread_info.h | 11 +++
 arch/powerpc/kernel/entry_32.S | 12 +++-
 arch/powerpc/kernel/entry_64.S | 12 +++-
 arch/powerpc/platforms/pseries/iommu.c | 16 ++--
 drivers/gpu/drm/i915/i915_request.c|  8 ++--
 drivers/tty/sysrq.c|  6 +++---
 kernel/sched/completion.c  |  2 +-
 localversion-rt|  2 +-
 8 files changed, 38 insertions(+), 31 deletions(-)


[PATCH RT 6/6] Linux 4.19.37-rt20-rc1

2019-05-24 Thread Steven Rostedt
4.19.37-rt20-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: "Steven Rostedt (VMware)" 

---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 483ad771f201..53614196cb36 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt19
+-rt20-rc1
-- 
2.20.1




[PATCH RT 5/6] sched/completion: Fix a lockup in wait_for_completion()

2019-05-24 Thread Steven Rostedt
4.19.37-rt20-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: Corey Minyard 

Consider the following race:

  T0                    T1                       T2
  wait_for_completion()
   do_wait_for_common()
    __prepare_to_swait()
     schedule()
                        complete()
                         x->done++ (0 -> 1)
                         raw_spin_lock_irqsave()
                          swake_up_locked()      wait_for_completion()
                           wake_up_process(T0)
                           list_del_init()
                         raw_spin_unlock_irqrestore()
                                                 raw_spin_lock_irq(&x->wait.lock)
  raw_spin_lock_irq(&x->wait.lock)               x->done != UINT_MAX, 1 -> 0
                                                 raw_spin_unlock_irq(&x->wait.lock)
                                                 return 1
   while (!x->done && timeout),
   continue loop, not enqueued
   on &x->wait

Basically, the problem is that the original wait queues used in
completions did not remove the item from the queue in the wakeup
function, but swake_up_locked() does.

Fix it by adding the thread to the wait queue inside the do loop.
The design of swait detects if it is already in the list and doesn't
do the list add again.
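For reference, the idempotence that last sentence relies on looks roughly
like this (a sketch of the 4.19-era kernel/sched/swait.c helper, not a
verbatim quote):

	static void __prepare_to_swait(struct swait_queue_head *q,
				       struct swait_queue *wait)
	{
		wait->task = current;
		/* Skip the add if we are still queued from a previous pass */
		if (list_empty(&wait->task_list))
			list_add(&wait->task_list, &q->task_list);
	}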

Cc: stable...@vger.kernel.org
Fixes: a04ff6b4ec4ee7e ("completion: Use simple wait queues")
Signed-off-by: Corey Minyard 
Acked-by: Steven Rostedt (VMware) 
Signed-off-by: Steven Rostedt (VMware) 
[bigeasy: shorten commit message ]
Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/sched/completion.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 755a58084978..49c14137988e 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -72,12 +72,12 @@ do_wait_for_common(struct completion *x,
if (!x->done) {
DECLARE_SWAITQUEUE(wait);
 
-   __prepare_to_swait(&x->wait, &wait);
do {
if (signal_pending_state(state, current)) {
timeout = -ERESTARTSYS;
break;
}
+   __prepare_to_swait(&x->wait, &wait);
__set_current_state(state);
raw_spin_unlock_irq(&x->wait.lock);
timeout = action(timeout);
-- 
2.20.1




[PATCH RT 2/6] powerpc: reshuffle TIF bits

2019-05-24 Thread Steven Rostedt
4.19.37-rt20-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior 

Powerpc32/64 does not compile because TIF_SYSCALL_TRACE's bit is higher
than 15 and the assembly instructions don't expect that.

Move TIF_RESTOREALL and TIF_NOERROR to the higher bits and keep
TIF_NEED_RESCHED_LAZY in the lower range. As a result only one split
load is needed; otherwise we can use immediates.

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt (VMware) 
---
 arch/powerpc/include/asm/thread_info.h | 11 +++
 arch/powerpc/kernel/entry_32.S | 12 +++-
 arch/powerpc/kernel/entry_64.S | 12 +++-
 3 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index ce316076bc52..64c3d1a720e2 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -83,18 +83,18 @@ extern int arch_dup_task_struct(struct task_struct *dst, 
struct task_struct *src
 #define TIF_SIGPENDING 1   /* signal pending */
 #define TIF_NEED_RESCHED   2   /* rescheduling necessary */
 #define TIF_FSCHECK3   /* Check FS is USER_DS on return */
-#define TIF_NEED_RESCHED_LAZY  4   /* lazy rescheduling necessary */
 #define TIF_RESTORE_TM 5   /* need to restore TM FP/VEC/VSX */
 #define TIF_PATCH_PENDING  6   /* pending live patching update */
 #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
 #define TIF_SINGLESTEP 8   /* singlestepping active */
 #define TIF_NOHZ   9   /* in adaptive nohz mode */
 #define TIF_SECCOMP10  /* secure computing */
-#define TIF_RESTOREALL 11  /* Restore all regs (implies NOERROR) */
-#define TIF_NOERROR12  /* Force successful syscall return */
+
+#define TIF_NEED_RESCHED_LAZY  11  /* lazy rescheduling necessary */
+#define TIF_SYSCALL_TRACEPOINT 12  /* syscall tracepoint instrumentation */
+
 #define TIF_NOTIFY_RESUME  13  /* callback before returning to user */
 #define TIF_UPROBE 14  /* breakpointed or single-stepping */
-#define TIF_SYSCALL_TRACEPOINT 15  /* syscall tracepoint instrumentation */
 #define TIF_EMULATE_STACK_STORE16  /* Is an instruction emulation
for stack store? */
 #define TIF_MEMDIE 17  /* is terminating due to OOM killer */
@@ -103,6 +103,9 @@ extern int arch_dup_task_struct(struct task_struct *dst, 
struct task_struct *src
 #endif
 #define TIF_POLLING_NRFLAG 19  /* true if poll_idle() is polling 
TIF_NEED_RESCHED */
 #define TIF_32BIT  20  /* 32 bit binary */
+#define TIF_RESTOREALL 21  /* Restore all regs (implies NOERROR) */
+#define TIF_NOERROR22  /* Force successful syscall return */
+
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)

[PATCH RT 1/6] powerpc/pseries/iommu: Use a locallock instead local_irq_save()

2019-05-24 Thread Steven Rostedt
4.19.37-rt20-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior 

The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).

Use a locallock instead of local_irq_save()/local_irq_disable(), so the
protection also works on RT.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt (VMware) 
---
 arch/powerpc/platforms/pseries/iommu.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 06f02960b439..d80d919c78d3 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -212,6 +213,7 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, 
long tcenum,
 }
 
 static DEFINE_PER_CPU(__be64 *, tce_page);
+static DEFINE_LOCAL_IRQ_LOCK(tcp_page_lock);
 
 static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 long npages, unsigned long uaddr,
@@ -232,7 +234,8 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
*tbl, long tcenum,
   direction, attrs);
}
 
-   local_irq_save(flags);  /* to protect tcep and the page behind it */
+   /* to protect tcep and the page behind it */
+   local_lock_irqsave(tcp_page_lock, flags);
 
tcep = __this_cpu_read(tce_page);
 
@@ -243,7 +246,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
*tbl, long tcenum,
tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
/* If allocation fails, fall back to the loop implementation */
if (!tcep) {
-   local_irq_restore(flags);
+   local_unlock_irqrestore(tcp_page_lock, flags);
return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
direction, attrs);
}
@@ -277,7 +280,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
*tbl, long tcenum,
tcenum += limit;
} while (npages > 0 && !rc);
 
-   local_irq_restore(flags);
+   local_unlock_irqrestore(tcp_page_lock, flags);
 
if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
ret = (int)rc;
@@ -435,13 +438,14 @@ static int tce_setrange_multi_pSeriesLP(unsigned long 
start_pfn,
u64 rc = 0;
long l, limit;
 
-   local_irq_disable();/* to protect tcep and the page behind it */
+   /* to protect tcep and the page behind it */
+   local_lock_irq(tcp_page_lock);
tcep = __this_cpu_read(tce_page);
 
if (!tcep) {
tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
if (!tcep) {
-   local_irq_enable();
+   local_unlock_irq(tcp_page_lock);
return -ENOMEM;
}
__this_cpu_write(tce_page, tcep);
@@ -487,7 +491,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long 
start_pfn,
 
/* error cleanup: caller will clear whole range */
 
-   local_irq_enable();
+   local_unlock_irq(tcp_page_lock);
return rc;
 }
 
-- 
2.20.1




[PATCH RT 4/6] drm/i915: Dont disable interrupts independently of the lock

2019-05-24 Thread Steven Rostedt
4.19.37-rt20-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior 

The locks (timeline->lock and rq->lock) need to be taken with disabled
interrupts. This is done in __retire_engine_request() by disabling the
interrupts independently of the locks themselves.
While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla,
it does not on RT. Also, it is not obvious whether there is a special
reason why the interrupts are disabled independently of the lock.

Enable/disable interrupts as part of the locking instruction.
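
As a sketch of the equivalence mentioned above (generic spinlock usage,
not i915-specific code):

	static DEFINE_SPINLOCK(lock);

	/* Pattern being removed: on RT, spin_lock() becomes a sleeping
	 * lock and must not be taken with interrupts hard-disabled.
	 */
	local_irq_disable();
	spin_lock(&lock);
	/* ... */
	spin_unlock(&lock);
	local_irq_enable();

	/* Pattern being introduced: one locking instruction that RT can
	 * safely map to its sleeping-lock variant.
	 */
	spin_lock_irq(&lock);
	/* ... */
	spin_unlock_irq(&lock);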

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt (VMware) 
---
 drivers/gpu/drm/i915/i915_request.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 5c2c93cbab12..7124510b9131 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -356,9 +356,7 @@ static void __retire_engine_request(struct intel_engine_cs 
*engine,
 
GEM_BUG_ON(!i915_request_completed(rq));
 
-   local_irq_disable();
-
-   spin_lock(&engine->timeline.lock);
+   spin_lock_irq(&engine->timeline.lock);
 GEM_BUG_ON(!list_is_first(&rq->link, &engine->timeline.requests));
 list_del_init(&rq->link);
 spin_unlock(&engine->timeline.lock);
@@ -372,9 +370,7 @@ static void __retire_engine_request(struct intel_engine_cs 
*engine,
 GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
 atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
}
-   spin_unlock(&rq->lock);
-
-   local_irq_enable();
+   spin_unlock_irq(&rq->lock);
 
/*
 * The backing object for the context is done after switching to the
-- 
2.20.1




[PATCH RT 3/6] tty/sysrq: Convert show_lock to raw_spinlock_t

2019-05-24 Thread Steven Rostedt
4.19.37-rt20-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: Julien Grall 

Systems which don't provide arch_trigger_cpumask_backtrace() will
invoke showacpu() from a smp_call_function() function which is invoked
with disabled interrupts even on -RT systems.

The function acquires the show_lock lock whose only purpose is to
ensure that the CPUs don't print simultaneously. Otherwise the
output would clash and it would be hard to tell the output from CPUx
apart from CPUy.

On -RT the spin_lock() can not be acquired from this context. A
raw_spin_lock() is required. It will add to the system's latency
while the sysrq request is performed, as other CPUs will block on the
lock until the request is done. This is okay because the user asked for
a backtrace of all active CPUs and under "normal circumstances in
production" this path should not be triggered.

Signed-off-by: Julien Grall 
Signed-off-by: Steven Rostedt (VMware) 
[bige...@linuxtronix.de: commit description]
Signed-off-by: Sebastian Andrzej Siewior 
Acked-by: Sebastian Andrzej Siewior 
Signed-off-by: Greg Kroah-Hartman 
Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/tty/sysrq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 06ed20dd01ba..627517ad55bf 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -215,7 +215,7 @@ static struct sysrq_key_op sysrq_showlocks_op = {
 #endif
 
 #ifdef CONFIG_SMP
-static DEFINE_SPINLOCK(show_lock);
+static DEFINE_RAW_SPINLOCK(show_lock);
 
 static void showacpu(void *dummy)
 {
@@ -225,10 +225,10 @@ static void showacpu(void *dummy)
if (idle_cpu(smp_processor_id()))
return;
 
-   spin_lock_irqsave(&show_lock, flags);
+   raw_spin_lock_irqsave(&show_lock, flags);
pr_info("CPU%d:\n", smp_processor_id());
show_stack(NULL, NULL);
-   spin_unlock_irqrestore(&show_lock, flags);
+   raw_spin_unlock_irqrestore(&show_lock, flags);
 }
 
 static void sysrq_showregs_othercpus(struct work_struct *dummy)
-- 
2.20.1




[v5 PATCH 1/2] mm: vmscan: remove double slab pressure by inc'ing sc->nr_scanned

2019-05-24 Thread Yang Shi
The commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
broke the relationship between sc->nr_scanned and slab pressure.
The sc->nr_scanned can't double slab pressure anymore, so it no longer
makes sense to keep inc'ing it here.  Actually, doing so would reduce
the pressure on slab shrinking, since an excessive sc->nr_scanned would
prevent sc->priority from being raised.

The bonnie test doesn't show this would change the behavior of
slab shrinkers.

                            w/                  w/o
                         /sec    %CP          /sec    %CP
Sequential delete:     3960.6    94.6       3997.6    96.2
Random delete:         2518      63.8       2561.6    64.6

The slight increase of "/sec" without the patch would be caused by the
slight increase of CPU usage.

Cc: Josef Bacik 
Cc: Michal Hocko 
Acked-by: Johannes Weiner 
Signed-off-by: Yang Shi 
---
v4: Added Johannes's ack

 mm/vmscan.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7acd0af..b65bc50 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1137,11 +1137,6 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
if (!sc->may_unmap && page_mapped(page))
goto keep_locked;
 
-   /* Double the slab pressure for mapped and swapcache pages */
-   if ((page_mapped(page) || PageSwapCache(page)) &&
-   !(PageAnon(page) && !PageSwapBacked(page)))
-   sc->nr_scanned++;
-
may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
 
-- 
1.8.3.1



[v5 PATCH 2/2] mm: vmscan: correct some vmscan counters for THP swapout

2019-05-24 Thread Yang Shi
Since commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after
swapped out"), THP can be swapped out in a whole.  But, nr_reclaimed
and some other vm counters still get inc'ed by one even though a whole
THP (512 pages) gets swapped out.

This doesn't make too much sense for memory reclaim.  For example, direct
reclaim may just need to reclaim SWAP_CLUSTER_MAX pages; reclaiming one THP
could fulfill that.  But, if nr_reclaimed is not increased correctly,
direct reclaim may just waste time reclaiming more pages,
SWAP_CLUSTER_MAX * 512 pages in the worst case.

And, it may cause pgsteal_{kswapd|direct} is greater than
pgscan_{kswapd|direct}, like the below:

pgsteal_kswapd 122933
pgsteal_direct 26600225
pgscan_kswapd 174153
pgscan_direct 14678312

nr_reclaimed and nr_scanned must be fixed in parallel otherwise it would
break some page reclaim logic, e.g.

vmpressure: this looks at the scanned/reclaimed ratio so it won't
change semantics as long as scanned & reclaimed are fixed in parallel.

compaction/reclaim: compaction wants a certain number of physical pages
freed up before going back to compacting.

kswapd priority raising: kswapd raises priority if we scan fewer pages
than the reclaim target (which itself is obviously expressed in order-0
pages). As a result, kswapd can falsely raise its aggressiveness even
when it's making great progress.

Other than nr_scanned and nr_reclaimed, some other counters, e.g.
pgactivate, nr_skipped, nr_ref_keep and nr_unmap_fail need to be fixed
too since they are user visible via cgroup, /proc/vmstat or trace
points, otherwise they would be underreported.

When isolating pages from LRUs, nr_taken is already accounted in base
pages, but nr_scanned and nr_skipped are still accounted in THPs.  That
doesn't make much sense either, since it may cause the trace points to
underreport the numbers as well.

So account those counters in base pages instead of counting a THP as
one page.

nr_dirty, nr_unqueued_dirty, nr_congested and nr_writeback are used by
file cache, so they are not impacted by THP swap.

This change may result in lower steal/scan ratio in some cases since
THP may get split during page reclaim, then a part of tail pages get
reclaimed instead of the whole 512 pages, but nr_scanned is accounted
by 512, particularly for direct reclaim.  But this should not be a
significant issue.

Cc: "Huang, Ying" 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: "Kirill A . Shutemov" 
Cc: Hugh Dickins 
Cc: Shakeel Butt 
Signed-off-by: Yang Shi 
---
v5: Fixed sc->nr_scanned double accounting per Huang Ying
Added some comments to address the concern about premature OOM per Hillf 
Danton 
v4: Fixed the comments from Johannes and Huang Ying
v3: Removed Shakeel's Reviewed-by since the patch has been changed significantly
Switched back to use compound_order per Matthew
Fixed more counters per Johannes
v2: Added Shakeel's Reviewed-by
Use hpage_nr_pages instead of compound_order per Huang Ying and William 
Kucharski

 mm/vmscan.c | 42 +++---
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b65bc50..f4f4d57 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1118,6 +1118,7 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
int may_enter_fs;
enum page_references references = PAGEREF_RECLAIM_CLEAN;
bool dirty, writeback;
+   unsigned int nr_pages;
 
cond_resched();
 
@@ -1129,6 +1130,13 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
 
VM_BUG_ON_PAGE(PageActive(page), page);
 
+   nr_pages = 1 << compound_order(page);
+
+   /*
+* Account one page for a THP for now.  If the THP gets swapped
+* out as a whole, all tail pages will be accounted later to
+* avoid counting them twice.
+*/
sc->nr_scanned++;
 
if (unlikely(!page_evictable(page)))
@@ -1250,7 +1258,7 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
case PAGEREF_ACTIVATE:
goto activate_locked;
case PAGEREF_KEEP:
-   stat->nr_ref_keep++;
+   stat->nr_ref_keep += nr_pages;
goto keep_locked;
case PAGEREF_RECLAIM:
case PAGEREF_RECLAIM_CLEAN:
@@ -1292,7 +1300,9 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
 #endif
if (!add_to_swap(page))
goto activate_locked;
-   }
+   } else
+   /* Account tail pages for THP */
+   sc->nr_scanned += nr_pages - 1;
 
may_enter_fs = 

Re: `SATA_AHCI` not selected by default with `make olddefconfig`

2019-05-24 Thread Randy Dunlap
On 1/10/19 6:43 AM, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> There were some PCI Kconfig changes, which seem to cause problems
> with components depending on PCI. With the attached minimal config,
> running `make olddefconfig` on Linux 4.20 and older caused
> `SATA_AHCI` to be selected. But, with Linux 5.0-rc1 it is not
> selected.
> 
> 
> Kind regards,
> 
> Paul

[adding linux-pci for posterity]

Hi Paul,

I guess this is called progress.  Anyway, it's good that you noticed and
reported it.


In 4.20 (and earlier), PCI defaults to y.
As you hint, in 5.x, PCI does not default to y, so Kconfig symbols that
depend on PCI will not be set/enabled by "make olddefconfig", including
SATA_AHCI, even though SATA_AHCI is set in your old .config file.
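
Concretely, a minimal 5.x config now needs the PCI line spelled out, for
example:

  CONFIG_PCI=y
  CONFIG_SATA_AHCI=y

whereas on 4.20 and earlier the first line was implied by the default.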


-- 
~Randy


Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-24 Thread Jarod Wilson

On 5/24/19 6:38 PM, Mahesh Bandewar (महेश बंडेवार) wrote:

On Fri, May 24, 2019 at 2:17 PM Jay Vosburgh  wrote:


Jarod Wilson  wrote:


Once in a while, with just the right timing, 802.3ad slaves will fail to
properly initialize, winding up in a weird state, with a partner system
mac address of 00:00:00:00:00:00. This started happening after a fix to
properly track link_failure_count, where an 802.3ad slave that
reported itself as link up in the miimon code, but wasn't able to get a
valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
BOND_LINK_DOWN. That was the proper thing to do for the general "my link
went down" case, but has created a link initialization race that can put
the interface in this odd state.



Are there any notification consequences because of this change?


No, there shouldn't be; it just makes initial link-up cleaner. Everything
at runtime, once the link is initialized, should remain the same.


--
Jarod Wilson
ja...@redhat.com


[PATCH 09/16 v3] ftrace: Allow ftrace startup flags exist without dynamic ftrace

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Some of the flags for ftrace_startup() may be exposed even when
CONFIG_DYNAMIC_FTRACE is not configured in. This is fine as the difference
between dynamic ftrace and static ftrace is done within the internals of
ftrace itself. No need to have use cases fail to compile because dynamic
ftrace is disabled.

This change is needed to move some of the logic of what is passed to
ftrace_startup() out of the parameters of ftrace_startup().

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 766c565ba243..d0307c9b866e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -286,6 +286,15 @@ static inline void stack_tracer_disable(void) { }
 static inline void stack_tracer_enable(void) { }
 #endif
 
+enum {
+   FTRACE_UPDATE_CALLS = (1 << 0),
+   FTRACE_DISABLE_CALLS= (1 << 1),
+   FTRACE_UPDATE_TRACE_FUNC= (1 << 2),
+   FTRACE_START_FUNC_RET   = (1 << 3),
+   FTRACE_STOP_FUNC_RET= (1 << 4),
+   FTRACE_MAY_SLEEP= (1 << 5),
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 int ftrace_arch_code_modify_prepare(void);
@@ -373,15 +382,6 @@ void ftrace_set_global_notrace(unsigned char *buf, int 
len, int reset);
 void ftrace_free_filter(struct ftrace_ops *ops);
 void ftrace_ops_set_global_filter(struct ftrace_ops *ops);
 
-enum {
-   FTRACE_UPDATE_CALLS = (1 << 0),
-   FTRACE_DISABLE_CALLS= (1 << 1),
-   FTRACE_UPDATE_TRACE_FUNC= (1 << 2),
-   FTRACE_START_FUNC_RET   = (1 << 3),
-   FTRACE_STOP_FUNC_RET= (1 << 4),
-   FTRACE_MAY_SLEEP= (1 << 5),
-};
-
 /*
  * The FTRACE_UPDATE_* enum is used to pass information back
  * from the ftrace_update_record() and ftrace_test_record()
-- 
2.20.1




[PATCH 03/16 v3] fgraph: Have the current->ret_stack go down not up

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Change the direction of the current->ret_stack shadow stack so that it
grows down, the same way most normal arch stacks do.

Suggested-by: Peter Zijlstra 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c | 39 ---
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 63e701771c20..b0f8ae269351 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -27,8 +27,9 @@
 #define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
 #define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
-/* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+#define SHADOW_STACK_MAX_INDEX SHADOW_STACK_INDEX
+/* Leave on a little buffer at the bottom */
+#define SHADOW_STACK_MIN_INDEX FGRAPH_RET_INDEX
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack 
*)(&(t)->ret_stack[index]))
 #define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
@@ -89,16 +90,16 @@ ftrace_push_return_trace(unsigned long ret, unsigned long 
func,
smp_rmb();
 
/* The return trace stack is full */
-   if (current->curr_ret_stack >= SHADOW_STACK_MAX_INDEX) {
+   if (current->curr_ret_stack <= SHADOW_STACK_MIN_INDEX) {
atomic_inc(>trace_overrun);
return -EBUSY;
}
 
calltime = trace_clock_local();
 
-   index = current->curr_ret_stack;
-   RET_STACK_INC(current->curr_ret_stack);
-   ret_stack = RET_STACK(current, index);
+   RET_STACK_DEC(current->curr_ret_stack);
+   ret_stack = RET_STACK(current, current->curr_ret_stack);
+   /* Make sure interrupts see the current value of curr_ret_stack */
barrier();
ret_stack->ret = ret;
ret_stack->func = func;
@@ -129,7 +130,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
 
return 0;
  out_ret:
-   RET_STACK_DEC(current->curr_ret_stack);
+   RET_STACK_INC(current->curr_ret_stack);
  out:
current->curr_ret_depth--;
return -EBUSY;
@@ -144,9 +145,8 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
int index;
 
index = current->curr_ret_stack;
-   RET_STACK_DEC(index);
 
-   if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
+   if (unlikely(index < 0 || index >= SHADOW_STACK_MAX_INDEX)) {
ftrace_graph_stop();
WARN_ON(1);
/* Might as well panic, otherwise we have no where to go */
@@ -239,7 +239,7 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
 * curr_ret_stack is after that.
 */
barrier();
-   RET_STACK_DEC(current->curr_ret_stack);
+   RET_STACK_INC(current->curr_ret_stack);
 
if (unlikely(!ret)) {
ftrace_graph_stop();
@@ -302,9 +302,9 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
if (ret != (unsigned long)return_to_handler)
return ret;
 
-   RET_STACK_DEC(index);
+   RET_STACK_INC(index);
 
-   for (i = index; i >= 0; RET_STACK_DEC(i)) {
+   for (i = index; i < SHADOW_STACK_MAX_INDEX; RET_STACK_INC(i)) {
ret_stack = RET_STACK(task, i);
if (ret_stack->retp == retp)
return ret_stack->ret;
@@ -322,13 +322,13 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
return ret;
 
task_idx = task->curr_ret_stack;
-   RET_STACK_DEC(task_idx);
+   RET_STACK_INC(task_idx);
 
-   if (!task->ret_stack || task_idx < *idx)
+   if (!task->ret_stack || task_idx > *idx)
return ret;
 
task_idx -= *idx;
-   RET_STACK_INC(*idx);
+   RET_STACK_DEC(*idx);
 
return RET_STACK(task, task_idx);
 }
@@ -391,7 +391,7 @@ static int alloc_retstack_tasklist(unsigned long 
**ret_stack_list)
if (t->ret_stack == NULL) {
atomic_set(>tracing_graph_pause, 0);
atomic_set(>trace_overrun, 0);
-   t->curr_ret_stack = 0;
+   t->curr_ret_stack = SHADOW_STACK_MAX_INDEX;
t->curr_ret_depth = -1;
/* Make sure the tasks see the 0 first: */
smp_wmb();
@@ -436,10 +436,11 @@ ftrace_graph_probe_sched_switch(void *ignore, bool 
preempt,
 */
timestamp -= next->ftrace_timestamp;
 
-   for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
+   for (index = next->curr_ret_stack + FGRAPH_RET_INDEX;
+index < SHADOW_STACK_MAX_INDEX; ) {
ret_stack = RET_STACK(next, index);
ret_stack->calltime += timestamp;
-   index -= FGRAPH_RET_INDEX;
+   index += FGRAPH_RET_INDEX;
}
 }
 

[PATCH 04/16 v3] function_graph: Add an array structure that will allow multiple callbacks

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Add an array structure that will eventually allow the function graph tracer
to have up to 16 simultaneous callbacks attached. It's an array of 16
fgraph_ops pointers, with a slot assigned when a caller registers. On entry
of a function, the entryfunc of the first item in the array is called; it
returns non-zero if it wants the return callback to be called on exit of
the function.

The array will simplify the process of having more than one callback
attached to the same function, as its index into the array can be stored on
the shadow stack. We only need to save the index, because this will allow
the fgraph_ops to be freed before the traced function returns (which may
happen if the function calls schedule() and sleeps for a long time).

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c | 115 ++
 1 file changed, 82 insertions(+), 33 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index b0f8ae269351..93b0e243a742 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -38,9 +38,28 @@
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
+static int fgraph_array_cnt;
+#define FGRAPH_ARRAY_SIZE  16
+
+static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+   return 0;
+}
+
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+{
+}
+
+static struct fgraph_ops fgraph_stub = {
+   .entryfunc = ftrace_graph_entry_stub,
+   .retfunc = ftrace_graph_ret_stub,
+};
+
 /**
  * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
  *
@@ -125,7 +144,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
goto out;
 
/* Only trace if the calling function expects to */
-   if (!ftrace_graph_entry(&trace))
+   if (!fgraph_array[0]->entryfunc(&trace))
goto out_ret;
 
return 0;
@@ -232,7 +251,7 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
 
ftrace_pop_return_trace(&trace, &ret, frame_pointer);
trace.rettime = trace_clock_local();
-   ftrace_graph_return(&trace);
+   fgraph_array[0]->retfunc(&trace);
/*
 * The ftrace_graph_return() may still access the current
 * ret_stack structure, we need to make sure the update of
@@ -352,11 +371,6 @@ void ftrace_graph_sleep_time_control(bool enable)
fgraph_sleep_time = enable;
 }
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
-{
-   return 0;
-}
-
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return =
(trace_func_graph_ret_t)ftrace_stub;
@@ -590,37 +604,55 @@ static int start_graph_tracing(void)
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
int ret = 0;
+   int i;
 
mutex_lock(&ftrace_lock);
 
-   /* we currently allow only one tracer registered at a time */
-   if (ftrace_graph_active) {
+   if (!fgraph_array[0]) {
+   /* The array must always have real data on it */
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
fgraph_array[i] = &fgraph_stub;
+   }
+   }
+
+   /* Look for an available spot */
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
if (fgraph_array[i] == &fgraph_stub)
+   break;
+   }
+   if (i >= FGRAPH_ARRAY_SIZE) {
ret = -EBUSY;
goto out;
}
 
-   register_pm_notifier(&ftrace_suspend_notifier);
+   fgraph_array[i] = gops;
+   if (i + 1 > fgraph_array_cnt)
+   fgraph_array_cnt = i + 1;
 
ftrace_graph_active++;
-   ret = start_graph_tracing();
-   if (ret) {
-   ftrace_graph_active--;
-   goto out;
-   }
 
-   ftrace_graph_return = gops->retfunc;
+   if (ftrace_graph_active == 1) {
+   register_pm_notifier(&ftrace_suspend_notifier);
+   ret = start_graph_tracing();
+   if (ret) {
+   ftrace_graph_active--;
+   goto out;
+   }
 
-   /*
-* Update the indirect function to the entryfunc, and the
-* function that gets called to the entry_test first. Then
-* call the update fgraph entry function to determine if
-* the entryfunc should be called directly or not.
-*/
-   __ftrace_graph_entry = gops->entryfunc;
-   ftrace_graph_entry = ftrace_graph_entry_test;
-   update_function_graph_func();
+   ftrace_graph_return = gops->retfunc;
+
+   /*
+* Update the indirect function to the entryfunc, and the
+* function that gets called to the entry_test first. Then
+* call the update fgraph 

[PATCH 05/16 v3] function_graph: Allow multiple users to attach to function graph

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Allow for multiple users to attach to the function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When
a function being traced is entered, the entryfunc of each attached fgraph_ops
is called, and if it returns non-zero, its index into the array will be
added to the shadow stack.

On exit of the function being traced, the shadow stack will contain the
indexes of the fgraph_ops in the array that want their retfunc to be called.

Because a function may sleep for a long time (if a task sleeps itself), the
return of the function may be literally days later. If the fgraph_ops is
removed, its place in the array is replaced with a fgraph_ops that contains
the stub functions, and those will be called when the function finally
returns.

If another fgraph_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way things
currently work with the old function graph tracer: if one tracer is removed
and another is added, the new one will get the return calls of the functions
traced by the previous one, so this is not a regression. This can be fixed
by adding a counter that is incremented each time the array item is updated,
and saving that on the shadow stack as well, so that the retfunc won't be
called if the saved counter does not match the counter in the array.

Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.
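
As a usage sketch, two users attached at once would look something like
this (my_entry/my_return and the two ops are made up; callback signatures
are as of this point in the series, before fgraph_ops is passed to the
callbacks in a later patch):

	static int my_entry(struct ftrace_graph_ent *trace)
	{
		return 1;	/* non-zero: call my_return when the function exits */
	}

	static void my_return(struct ftrace_graph_ret *trace)
	{
	}

	static struct fgraph_ops ops_a = { .entryfunc = my_entry, .retfunc = my_return };
	static struct fgraph_ops ops_b = { .entryfunc = my_entry, .retfunc = my_return };

	static int __init my_init(void)
	{
		register_ftrace_graph(&ops_a);
		return register_ftrace_graph(&ops_b);	/* both now get callbacks */
	}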

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h |   2 +
 kernel/trace/fgraph.c  | 334 ++---
 2 files changed, 283 insertions(+), 53 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 8a8cb3c401b2..6fe69e0dc415 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -787,6 +787,8 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int 
idx);
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
 
+int function_graph_enter(unsigned long ret, unsigned long func,
+unsigned long frame_pointer, unsigned long *retp);
 /*
  * Sometimes we don't want to trace a function with the function
  * graph tracer but we want them to keep traced by the usual function
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 93b0e243a742..a01d418791dc 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -25,24 +25,144 @@
 
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
 #define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
+
+/*
+ * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
+ * is allocated on the task's ret_stack, then each fgraph_ops on the
+ * fgraph_array[]'s entryfunc is called and if that returns non-zero, the
+ * index into the fgraph_array[] for that fgraph_ops is added to the ret_stack.
+ * As the associated ftrace_ret_stack saved for those fgraph_ops needs to
+ * be found, the index to it is also added to the ret_stack along with the
+ * index of the fgraph_array[] to each fgraph_ops that needs their retfunc
+ * called.
+ *
+ * The top of the ret_stack (when not empty) will always have a reference
+ * to the last ftrace_ret_stack saved. All references to the
+ * ftrace_ret_stack have the format of:
+ *
+ * bits:  0 - 13   Index in words from the previous ftrace_ret_stack
+ * bits: 14 - 15   Type of storage
+ *   0 - reserved
+ *   1 - fgraph_array index
+ * For fgraph_array_index:
+ *  bits: 16 - 23  The fgraph_ops fgraph_array index
+ *
+ * That is, at the end of function_graph_enter, if the first and fourth
+ * fgraph_ops on the fgraph_array[] (index 0 and 3) need their retfunc called
+ * on the return of the function being traced, this is what will be on the
+ * task's shadow ret_stack: (the stack grows upward)
+ *
+
+ * |  |
+ * | (X) | (N)| ( N words away from previous ret_stack)
+ * +--+
+ * | struct ftrace_ret_stack  |
+ * |   (stores the saved ret pointer) |
+ * +--+
+ * | (0 << FGRAPH_ARRAY_SHIFT)|(1)| ( 0 for index of first fgraph_ops)
+ * +--+
+ * | (3 << FGRAPH_ARRAY_SHIFT)|(2)| ( 3 for index of fourth fgraph_ops)
+ * |  | <- task->curr_ret_stack (points to data)
+ * +--+
+ * |  |
+ *
+ * If a backtrace is required, and the real return pointer needs to be
+ * fetched, then it looks at the task's curr_ret_stack index, if it
+ * is less than the top, it would subtract one, and then mask the value
+ * on the ret_stack by FGRAPH_RET_INDEX_MASK and subtract 

[PATCH 08/16 v3] ftrace: Allow function_graph tracer to be enabled in instances

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note, the filtering of the functions is
still joined by the top level set_ftrace_filter and friends, as well as the
graph and nograph files.

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h   |  1 +
 kernel/trace/ftrace.c|  1 +
 kernel/trace/trace.h | 12 +
 kernel/trace/trace_functions.c   |  7 +++
 kernel/trace/trace_functions_graph.c | 65 +---
 kernel/trace/trace_selftest.c|  2 +-
 6 files changed, 62 insertions(+), 26 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 906f7c25faa6..766c565ba243 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -752,6 +752,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent 
*trace, struct fgraph
 struct fgraph_ops {
trace_func_graph_ent_t  entryfunc;
trace_func_graph_ret_t  retfunc;
+   void*private;
 };
 
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 35c79f3ab2f5..6719a6cae67b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6226,6 +6226,7 @@ __init void ftrace_init_global_array_ops(struct 
trace_array *tr)
tr->ops = &global_ops;
tr->ops->private = tr;
ftrace_init_trace_array(tr);
+   init_array_fgraph_ops(tr);
 }
 
 void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 982f5fa8da09..40b0471194bf 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -313,6 +313,9 @@ struct trace_array {
 #ifdef CONFIG_FUNCTION_TRACER
struct ftrace_ops   *ops;
struct trace_pid_list   __rcu *function_pids;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+   struct fgraph_ops   *gops;
+#endif
 #ifdef CONFIG_DYNAMIC_FTRACE
/* All of these are protected by the ftrace_lock */
struct list_headfunc_probes;
@@ -930,6 +933,9 @@ extern int __trace_graph_entry(struct trace_array *tr,
 extern void __trace_graph_return(struct trace_array *tr,
 struct ftrace_graph_ret *trace,
 unsigned long flags, int pc);
+extern void init_array_fgraph_ops(struct trace_array *tr);
+extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void free_fgraph_ops(struct trace_array *tr);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash *ftrace_graph_hash;
@@ -1023,6 +1029,12 @@ print_graph_function_flags(struct trace_iterator *iter, 
u32 flags)
 {
return TRACE_TYPE_UNHANDLED;
 }
+static inline void init_array_fgraph_ops(struct trace_array *tr) { }
+static inline int allocate_fgraph_ops(struct trace_array *tr)
+{
+   return 0;
+}
+static inline void free_fgraph_ops(struct trace_array *tr) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index b611cd36e22d..9b45ede6ea89 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -68,6 +68,12 @@ int ftrace_create_function_files(struct trace_array *tr,
if (ret)
return ret;
 
+   ret = allocate_fgraph_ops(tr);
+   if (ret) {
+   kfree(tr->ops);
+   return ret;
+   }
+
ftrace_create_filter_files(tr->ops, parent);
 
return 0;
@@ -78,6 +84,7 @@ void ftrace_destroy_function_files(struct trace_array *tr)
ftrace_destroy_filter_files(tr->ops);
kfree(tr->ops);
tr->ops = NULL;
+   free_fgraph_ops(tr);
 }
 
 static int function_trace_init(struct trace_array *tr)
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 2ae21788fcaf..064811ba846c 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -77,8 +77,6 @@ static struct tracer_flags tracer_flags = {
.opts = trace_opts
 };
 
-static struct trace_array *graph_array;
-
 /*
  * DURATION column is being also used to display IRQ signs,
  * following values are used by print_graph_irq and others
@@ -127,7 +125,7 @@ static inline int ftrace_graph_ignore_irqs(void)
 int trace_graph_entry(struct ftrace_graph_ent *trace,
  struct fgraph_ops *gops)
 {
-   struct trace_array *tr = graph_array;
+   struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
long disabled;
@@ -241,7 +239,7 @@ void __trace_graph_return(struct trace_array *tr,
 void trace_graph_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
 {
-   struct trace_array *tr = graph_array;
+   struct trace_array *tr = gops->private;
struct trace_array_cpu *data;

[PATCH 12/16 v3] function_graph: Move set_graph_function tests to shadow stack global var

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The use of the task->trace_recursion for the logic used for the
set_graph_function was a bit of an abuse of that variable. Now that there
exist per-stack global variables for registered graph tracers, use those
instead.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.h | 37 +---
 kernel/trace/trace_functions_graph.c |  6 ++---
 kernel/trace/trace_irqsoff.c |  4 +--
 kernel/trace/trace_sched_wakeup.c|  4 +--
 4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c45932573317..4baa2887f66b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -567,9 +567,6 @@ enum {
  */
TRACE_IRQ_BIT,
 
-   /* Set if the function is in the set_graph_function file */
-   TRACE_GRAPH_BIT,
-
/*
 * In the very unlikely case that an interrupt came in
 * at a start of graph tracing, and we want to trace
@@ -583,7 +580,7 @@ enum {
 * that preempted a softirq start of a function that
 * preempted normal context Luckily, it can't be
 * greater than 3, so the next two bits are a mask
-* of what the depth is when we set TRACE_GRAPH_BIT
+* of what the depth is when we set TRACE_GRAPH_FL
 */
 
TRACE_GRAPH_DEPTH_START_BIT,
@@ -937,11 +934,16 @@ extern void init_array_fgraph_ops(struct trace_array *tr, 
struct ftrace_ops *ops
 extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
 extern void free_fgraph_ops(struct trace_array *tr);
 
+enum {
+   TRACE_GRAPH_FL  = 1,
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash *ftrace_graph_hash;
 extern struct ftrace_hash *ftrace_graph_notrace_hash;
 
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int
+ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 {
unsigned long addr = trace->func;
int ret = 0;
@@ -954,12 +956,11 @@ static inline int ftrace_graph_addr(struct 
ftrace_graph_ent *trace)
}
 
if (ftrace_lookup_ip(ftrace_graph_hash, addr)) {
-
/*
 * This needs to be cleared on the return functions
 * when the depth is zero.
 */
-   trace_recursion_set(TRACE_GRAPH_BIT);
+   *task_var |= TRACE_GRAPH_FL;
trace_recursion_set_depth(trace->depth);
 
/*
@@ -979,11 +980,14 @@ static inline int ftrace_graph_addr(struct 
ftrace_graph_ent *trace)
return ret;
 }
 
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void
+ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret 
*trace)
 {
-   if (trace_recursion_test(TRACE_GRAPH_BIT) &&
+   unsigned long *task_var = fgraph_get_task_var(gops);
+
+   if ((*task_var & TRACE_GRAPH_FL) &&
trace->depth == trace_recursion_depth())
-   trace_recursion_clear(TRACE_GRAPH_BIT);
+   *task_var &= ~TRACE_GRAPH_FL;
 }
 
 static inline int ftrace_graph_notrace_addr(unsigned long addr)
@@ -1000,7 +1004,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long 
addr)
 }
 
 #else
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int ftrace_graph_addr(unsigned long *task_var, struct 
ftrace_graph_ent *trace)
 {
return 1;
 }
@@ -1009,17 +1013,20 @@ static inline int ftrace_graph_notrace_addr(unsigned 
long addr)
 {
return 0;
 }
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void ftrace_graph_addr_finish(struct fgraph_ops *gops, struct 
ftrace_graph_ret *trace)
 { }
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 extern unsigned int fgraph_max_depth;
 
-static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
+static inline bool
+ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent 
*trace)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
+
/* trace it when it is-nested-in or is a function enabled. */
-   return !(trace_recursion_test(TRACE_GRAPH_BIT) ||
-ftrace_graph_addr(trace)) ||
+   return !((*task_var & TRACE_GRAPH_FL) ||
+ftrace_graph_addr(task_var, trace)) ||
(trace->depth < 0) ||
(fgraph_max_depth && trace->depth >= fgraph_max_depth);
 }
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 0434e6052650..054ec91e5086 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -148,7 +148,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
if (!ftrace_trace_task(tr))
return 0;
 
-   if (ftrace_graph_ignore_func(trace))
+   if (ftrace_graph_ignore_func(gops, trace))
return 0;
 
 

[PATCH 01/16 v3] function_graph: Convert ret_stack to a series of longs

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

In order to make it possible to have multiple callbacks registered with the
function_graph tracer, the retstack needs to be converted from an array of
ftrace_ret_stack structures to an array of longs. This will allow storing
the list of callbacks on the stack for the return side of the functions.
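
As a sketch of what the conversion allows (RET_STACK() and RET_STACK_INC()
are the helpers added below; the function itself is made up for
illustration):

	/* Overlay one ftrace_ret_stack frame on the array of longs */
	static struct ftrace_ret_stack *sketch_push_frame(unsigned long ret)
	{
		int index = current->curr_ret_stack;	/* now counted in longs */
		struct ftrace_ret_stack *frame = RET_STACK(current, index);

		frame->ret = ret;			/* fill the frame ... */
		RET_STACK_INC(current->curr_ret_stack);	/* ... then step past it */
		return frame;
	}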

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/sched.h |   2 +-
 kernel/trace/fgraph.c | 124 --
 2 files changed, 71 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..1850d8a3c3f0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1113,7 +1113,7 @@ struct task_struct {
int curr_ret_depth;
 
/* Stack of return addresses for return function tracing: */
-   struct ftrace_ret_stack *ret_stack;
+   unsigned long   *ret_stack;
 
/* Timestamp for last schedule: */
unsigned long long  ftrace_timestamp;
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 8dfd5021b933..df48bbfc0a5a 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -23,6 +23,18 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
+#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
+#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_SIZE (PAGE_SIZE)
+#define SHADOW_STACK_INDEX \
+   (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+/* Leave on a buffer at the end */
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+
+#define RET_STACK(t, index) ((struct ftrace_ret_stack 
*)(&(t)->ret_stack[index]))
+#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
+#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
+
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
@@ -59,6 +71,7 @@ static int
 ftrace_push_return_trace(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp)
 {
+   struct ftrace_ret_stack *ret_stack;
unsigned long long calltime;
int index;
 
@@ -75,23 +88,25 @@ ftrace_push_return_trace(unsigned long ret, unsigned long 
func,
smp_rmb();
 
/* The return trace stack is full */
-   if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+   if (current->curr_ret_stack >= SHADOW_STACK_MAX_INDEX) {
atomic_inc(&current->trace_overrun);
return -EBUSY;
}
 
calltime = trace_clock_local();
 
-   index = ++current->curr_ret_stack;
+   index = current->curr_ret_stack;
+   RET_STACK_INC(current->curr_ret_stack);
+   ret_stack = RET_STACK(current, index);
barrier();
-   current->ret_stack[index].ret = ret;
-   current->ret_stack[index].func = func;
-   current->ret_stack[index].calltime = calltime;
+   ret_stack->ret = ret;
+   ret_stack->func = func;
+   ret_stack->calltime = calltime;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-   current->ret_stack[index].fp = frame_pointer;
+   ret_stack->fp = frame_pointer;
 #endif
 #ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-   current->ret_stack[index].retp = retp;
+   ret_stack->retp = retp;
 #endif
return 0;
 }
@@ -113,7 +128,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
 
return 0;
  out_ret:
-   current->curr_ret_stack--;
+   RET_STACK_DEC(current->curr_ret_stack);
  out:
current->curr_ret_depth--;
return -EBUSY;
@@ -124,11 +139,13 @@ static void
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
unsigned long frame_pointer)
 {
+   struct ftrace_ret_stack *ret_stack;
int index;
 
index = current->curr_ret_stack;
+   RET_STACK_DEC(index);
 
-   if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
+   if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
ftrace_graph_stop();
WARN_ON(1);
/* Might as well panic, otherwise we have no where to go */
@@ -136,6 +153,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
return;
}
 
+   ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
/*
 * The arch may choose to record the frame pointer used
@@ -151,22 +169,22 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
 * Note, -mfentry does not use frame pointers, and this test
 *  is not needed if CC_USING_FENTRY is set.
 */
-   if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
+   if (unlikely(ret_stack->fp != frame_pointer)) {
ftrace_graph_stop();
WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
 "  

[PATCH 00/16 v3] function_graph: Rewrite to allow multiple users

2019-05-24 Thread Steven Rostedt



The background for this is explained in the V1 version found here:

 http://lkml.kernel.org/r/20181122012708.491151...@goodmis.org

The TL;DR; is this:

 The function graph tracer required a rewrite, mainly because it
 can only allow one callback registered at a time. The main motivation
 for this change is to allow kretprobes to use the code of function
 graph tracer, which should allow all archs that have function graph
 tracing to also have kretprobes with no extra work.

Masami told me that one requirement was to allow the function entry
callback to store data on the shadow stack that can be retrieved by
the the function return callback. I added this, as well as a per-task
variable (used by one of the function graph users).

The two functions to allow the storing of data on the stack and
retrieval of it are:

 void *fgraph_reserve_data(int size_in_bytes)

Allows the entry function to reserve up to 4 words of data on
the shadow stack. On success, a pointer to the contents is returned.
This may be only called once per entry function.

 void *fgraph_retrieve_data(void)

Allows the return function to retrieve the reserved data that was
allocated by the entry function.
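
As a minimal sketch of how the pair is meant to be used (my_entry/my_return
and the stored timestamp are made up for illustration; only
fgraph_reserve_data() and fgraph_retrieve_data() are from the series):

	static int my_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
	{
		u64 *ts = fgraph_reserve_data(sizeof(*ts));	/* at most 4 words */

		if (!ts)
			return 0;	/* no room: skip the return callback */
		*ts = trace_clock_local();
		return 1;
	}

	static void my_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops)
	{
		u64 *ts = fgraph_retrieve_data();	/* what my_entry stored */

		if (ts)
			pr_info("delta=%llu\n", trace->rettime - *ts);
	}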

Changes since v2:

  http://lkml.kernel.org/r/20190520142001.270067...@goodmis.org

 As a request from Peter Zijlstra, I changed the direction of
 the stack from growing up, to growing down. It passes some smoke
 tests, but I will need to run a lot more tests on it. But I decided
 to post this series anyway.

 Also changed was using BUILD_BUG_ON() instead of the align tricks,
 and round_up() was used to remove another align trick.

 I found a bug in patch 4 that was fixed in patch 5, but I fixed
 it in patch 4 to keep the series bisectable.

 Added a few more comments, and also added more boot up self tests to
 test more of the passing of data around.

The git repo can be found here:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
ftrace/fgraph-multi-stackdown

Head SHA1: 7e25deae405b757d5d98fcafdd79c34f87cb


Steven Rostedt (VMware) (16):
  function_graph: Convert ret_stack to a series of longs
  fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by 
long
  fgraph: Have the current->ret_stack go down not up
  function_graph: Add an array structure that will allow multiple callbacks
  function_graph: Allow multiple users to attach to function graph
  function_graph: Remove logic around ftrace_graph_entry and return
  ftrace/function_graph: Pass fgraph_ops to function graph callbacks
  ftrace: Allow function_graph tracer to be enabled in instances
  ftrace: Allow ftrace startup flags exist without dynamic ftrace
  function_graph: Have the instances use their own ftrace_ops for filtering
  function_graph: Add "task variables" per task for fgraph_ops
  function_graph: Move set_graph_function tests to shadow stack global var
  function_graph: Move graph depth stored data to shadow stack global var
  function_graph: Move graph notrace bit to shadow stack global var
  function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
  function_graph: Add selftest for passing local variables


 include/linux/ftrace.h   |  37 +-
 include/linux/sched.h|   2 +-
 kernel/trace/fgraph.c| 870 ---
 kernel/trace/ftrace.c|  13 +-
 kernel/trace/ftrace_internal.h   |   2 -
 kernel/trace/trace.h | 132 +++---
 kernel/trace/trace_functions.c   |   7 +
 kernel/trace/trace_functions_graph.c |  96 ++--
 kernel/trace/trace_irqsoff.c |  10 +-
 kernel/trace/trace_sched_wakeup.c|  10 +-
 kernel/trace/trace_selftest.c| 317 -
 11 files changed, 1205 insertions(+), 291 deletions(-)


[PATCH 02/16 v3] fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Instead of using "ALIGN()", use BUILD_BUG_ON() as the structures should
always be divisible by sizeof(long).

Link: 
http://lkml.kernel.org/r/2019052444.gi2...@hirez.programming.kicks-ass.net

Suggested-by: Peter Zijlstra 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index df48bbfc0a5a..63e701771c20 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -24,10 +24,9 @@
 #endif
 
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
-#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
-#define SHADOW_STACK_INDEX \
-   (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
 /* Leave on a buffer at the end */
 #define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
 
@@ -81,6 +80,8 @@ ftrace_push_return_trace(unsigned long ret, unsigned long 
func,
if (!current->ret_stack)
return -EBUSY;
 
+   BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
+
/*
 * We must make sure the ret_stack is tested before we read
 * anything else.
@@ -266,6 +267,8 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int 
idx)
 {
int index = task->curr_ret_stack;
 
+   BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
+
index -= FGRAPH_RET_INDEX * (idx + 1);
if (index < 0)
return NULL;
-- 
2.20.1




[PATCH 06/16 v3] function_graph: Remove logic around ftrace_graph_entry and return

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The function pointers ftrace_graph_entry and ftrace_graph_return are no
longer called via the function_graph tracer. Instead, an array structure is
now used that will allow for multiple users of the function_graph
infrastructure. The variables are still used by the architecture code for
non-dynamic ftrace configs, where a test is made against them to see if they
point to the default stub function or not. This is how the static function
tracing knows to call into the function graph tracer infrastructure or not.

Two new stub functions are made: entry_run() and return_run(). The
ftrace_graph_entry and ftrace_graph_return are set to them respectively when
the function graph tracer is enabled, and this will trigger the architecture
specific function graph code to be executed.

This also requires checking the global_ops hash for all calls into the
function_graph tracer.
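
For non-dynamic configs, the arch-side test amounts to something like this
(an illustrative C rendering of what the arch assembly checks; not actual
kernel code):

	static bool sketch_fgraph_wanted(void)
	{
		/* Anything but the stubs means a user is registered */
		return ftrace_graph_return != (trace_func_graph_ret_t)ftrace_stub ||
		       ftrace_graph_entry != ftrace_graph_entry_stub;
	}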

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c  | 71 +-
 kernel/trace/ftrace.c  |  2 -
 kernel/trace/ftrace_internal.h |  2 -
 3 files changed, 19 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index a01d418791dc..9a562937e255 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -126,6 +126,18 @@ static inline int get_fgraph_array(struct task_struct *t, 
int offset)
FGRAPH_ARRAY_MASK;
 }
 
+/* ftrace_graph_entry set to this to tell some archs to run function graph */
+static int entry_run(struct ftrace_graph_ent *trace)
+{
+   return 0;
+}
+
+/* ftrace_graph_return set to this to tell some archs to run function graph */
+static void return_run(struct ftrace_graph_ret *trace)
+{
+   return;
+}
+
 /*
  * @offset: The index into @t->ret_stack to find the ret_stack entry
  * @index: Where to place the index into @t->ret_stack of that entry
@@ -289,6 +301,9 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
int cnt = 0;
int i;
 
+   if (!ftrace_ops_test(&global_ops, func, NULL))
+   goto out;
+
trace.func = func;
trace.depth = ++current->curr_ret_depth;
 
@@ -602,7 +617,6 @@ void ftrace_graph_sleep_time_control(bool enable)
 trace_func_graph_ret_t ftrace_graph_return =
(trace_func_graph_ret_t)ftrace_stub;
 trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
-static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
 
 /* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
 static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
@@ -684,46 +698,6 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
}
 }
 
-static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
-{
-   if (!ftrace_ops_test(&global_ops, trace->func, NULL))
-   return 0;
-   return __ftrace_graph_entry(trace);
-}
-
-/*
- * The function graph tracer should only trace the functions defined
- * by set_ftrace_filter and set_ftrace_notrace. If another function
- * tracer ops is registered, the graph tracer requires testing the
- * function against the global ops, and not just trace any function
- * that any ftrace_ops registered.
- */
-void update_function_graph_func(void)
-{
-   struct ftrace_ops *op;
-   bool do_test = false;
-
-   /*
-* The graph and global ops share the same set of functions
-* to test. If any other ops is on the list, then
-* the graph tracing needs to test if its the function
-* it should call.
-*/
-   do_for_each_ftrace_op(op, ftrace_ops_list) {
-   if (op != &global_ops && op != &graph_ops &&
-   op != &ftrace_list_end) {
-   do_test = true;
-   /* in double loop, break out with goto */
-   goto out;
-   }
-   } while_for_each_ftrace_op(op);
- out:
-   if (do_test)
-   ftrace_graph_entry = ftrace_graph_entry_test;
-   else
-   ftrace_graph_entry = __ftrace_graph_entry;
-}
-
 static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
 
 static void
@@ -866,18 +840,12 @@ int register_ftrace_graph(struct fgraph_ops *gops)
ftrace_graph_active--;
goto out;
}
-
-   ftrace_graph_return = gops->retfunc;
-
/*
-* Update the indirect function to the entryfunc, and the
-* function that gets called to the entry_test first. Then
-* call the update fgraph entry function to determine if
-* the entryfunc should be called directly or not.
+* Some archs just test to see if these are not
+* the default function
 */
-   __ftrace_graph_entry = gops->entryfunc;
-   ftrace_graph_entry = ftrace_graph_entry_test;
-   

[PATCH 07/16 v3] ftrace/function_graph: Pass fgraph_ops to function graph callbacks

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to add a descriptor to a fgraph_ops private field that will
be added in the future and use it for the callbacks. This will be useful
when more than one callback can be registered to the function graph tracer.
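
A sketch of what a callback looks like with the new signature (struct
my_data is made up; the private field it reads is the one added later in
the series):

	struct my_data {
		bool enabled;
	};

	static int my_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
	{
		struct my_data *d = gops->private;	/* set by the user at register time */

		return d && d->enabled;
	}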

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h   | 10 +++---
 kernel/trace/fgraph.c| 14 --
 kernel/trace/ftrace.c|  6 --
 kernel/trace/trace.h |  4 ++--
 kernel/trace/trace_functions_graph.c | 11 +++
 kernel/trace/trace_irqsoff.c |  6 --
 kernel/trace/trace_sched_wakeup.c|  6 --
 kernel/trace/trace_selftest.c|  5 +++--
 8 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 6fe69e0dc415..906f7c25faa6 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -737,11 +737,15 @@ struct ftrace_graph_ret {
int depth;
 } __packed;
 
+struct fgraph_ops;
+
 /* Type of the callback handlers for tracing function graph*/
-typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *); /* return */
-typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
+typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
+  struct fgraph_ops *); /* return */
+typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
+ struct fgraph_ops *); /* entry */
 
-extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
+extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct 
fgraph_ops *gops);
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 9a562937e255..09e5bf2740a8 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -127,13 +127,13 @@ static inline int get_fgraph_array(struct task_struct *t, 
int offset)
 }
 
 /* ftrace_graph_entry set to this to tell some archs to run function graph */
-static int entry_run(struct ftrace_graph_ent *trace)
+static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops)
 {
return 0;
 }
 
 /* ftrace_graph_return set to this to tell some archs to run function graph */
-static void return_run(struct ftrace_graph_ret *trace)
+static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
 {
return;
 }
@@ -178,12 +178,14 @@ get_ret_stack(struct task_struct *t, int offset, int 
*index)
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
+   struct fgraph_ops *gops)
 {
return 0;
 }
 
-static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
 {
 }
 
@@ -323,7 +325,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
atomic_inc(&current->trace_overrun);
break;
}
-   if (fgraph_array[i]->entryfunc(&trace)) {
+   if (fgraph_array[i]->entryfunc(&trace, fgraph_array[i])) {
offset = current->curr_ret_stack;
/* Check the top level stored word */
type = get_fgraph_type(current, offset);
@@ -491,7 +493,7 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
i = 0;
do {
idx = get_fgraph_array(current, offset + i);
-   fgraph_array[idx]->retfunc(&trace);
+   fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
i++;
} while (i < index);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 1e31b8b37800..35c79f3ab2f5 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -790,7 +790,8 @@ void ftrace_graph_graph_time_control(bool enable)
fgraph_graph_time = enable;
 }
 
-static int profile_graph_entry(struct ftrace_graph_ent *trace)
+static int profile_graph_entry(struct ftrace_graph_ent *trace,
+  struct fgraph_ops *gops)
 {
struct ftrace_ret_stack *ret_stack;
 
@@ -807,7 +808,8 @@ static int profile_graph_entry(struct ftrace_graph_ent 
*trace)
return 1;
 }
 
-static void profile_graph_return(struct ftrace_graph_ret *trace)
+static void profile_graph_return(struct ftrace_graph_ret *trace,
+struct fgraph_ops *gops)
 {
struct ftrace_ret_stack *ret_stack;
struct ftrace_profile_stat *stat;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 65b765abb108..982f5fa8da09 100644
--- a/kernel/trace/trace.h
+++ 

[PATCH 13/16 v3] function_graph: Move graph depth stored data to shadow stack global var

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The use of the task->trace_recursion for the logic used for the function
graph depth was a bit of an abuse of that variable. Now that there
exist per-stack global variables for registered graph tracers, use those
instead.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.h | 63 ++--
 1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4baa2887f66b..bda97d2f6aa9 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -567,25 +567,6 @@ enum {
  */
TRACE_IRQ_BIT,
 
-   /*
-* In the very unlikely case that an interrupt came in
-* at a start of graph tracing, and we want to trace
-* the function in that interrupt, the depth can be greater
-* than zero, because of the preempted start of a previous
-* trace. In an even more unlikely case, depth could be 2
-* if a softirq interrupted the start of graph tracing,
-* followed by an interrupt preempting a start of graph
-* tracing in the softirq, and depth can even be 3
-* if an NMI came in at the start of an interrupt function
-* that preempted a softirq start of a function that
-* preempted normal context Luckily, it can't be
-* greater than 3, so the next two bits are a mask
-* of what the depth is when we set TRACE_GRAPH_FL
-*/
-
-   TRACE_GRAPH_DEPTH_START_BIT,
-   TRACE_GRAPH_DEPTH_END_BIT,
-
/*
 * To implement set_graph_notrace, if this bit is set, we ignore
 * function graph tracing of called functions, until the return
@@ -598,16 +579,6 @@ enum {
 #define trace_recursion_clear(bit) do { (current)->trace_recursion &= 
~(1<<(bit)); } while (0)
 #define trace_recursion_test(bit)  ((current)->trace_recursion & 
(1<<(bit)))
 
-#define trace_recursion_depth() \
-   (((current)->trace_recursion >> TRACE_GRAPH_DEPTH_START_BIT) & 3)
-#define trace_recursion_set_depth(depth) \
-   do {\
-   current->trace_recursion &= \
-   ~(3 << TRACE_GRAPH_DEPTH_START_BIT);\
-   current->trace_recursion |= \
-   ((depth) & 3) << TRACE_GRAPH_DEPTH_START_BIT;   \
-   } while (0)
-
 #define TRACE_CONTEXT_BITS 4
 
 #define TRACE_FTRACE_START TRACE_FTRACE_BIT
@@ -936,8 +907,38 @@ extern void free_fgraph_ops(struct trace_array *tr);
 
 enum {
TRACE_GRAPH_FL  = 1,
+
+   /*
+* In the very unlikely case that an interrupt came in
+* at a start of graph tracing, and we want to trace
+* the function in that interrupt, the depth can be greater
+* than zero, because of the preempted start of a previous
+* trace. In an even more unlikely case, depth could be 2
+* if a softirq interrupted the start of graph tracing,
+* followed by an interrupt preempting a start of graph
+* tracing in the softirq, and depth can even be 3
+* if an NMI came in at the start of an interrupt function
+* that preempted a softirq start of a function that
+* preempted normal context Luckily, it can't be
+* greater than 3, so the next two bits are a mask
+* of what the depth is when we set TRACE_GRAPH_FL
+*/
+
+   TRACE_GRAPH_DEPTH_START_BIT,
+   TRACE_GRAPH_DEPTH_END_BIT,
 };
 
+static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
+{
+   return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
+}
+
+static inline void ftrace_graph_set_depth(unsigned long *task_var, int depth)
+{
+   *task_var &= ~(3 << TRACE_GRAPH_DEPTH_START_BIT);
+   *task_var |= (depth & 3) << TRACE_GRAPH_DEPTH_START_BIT;
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash *ftrace_graph_hash;
 extern struct ftrace_hash *ftrace_graph_notrace_hash;
@@ -961,7 +962,7 @@ ftrace_graph_addr(unsigned long *task_var, struct 
ftrace_graph_ent *trace)
 * when the depth is zero.
 */
*task_var |= TRACE_GRAPH_FL;
-   trace_recursion_set_depth(trace->depth);
+   ftrace_graph_set_depth(task_var, trace->depth);
 
/*
 * If no irqs are to be traced, but a set_graph_function
@@ -986,7 +987,7 @@ ftrace_graph_addr_finish(struct fgraph_ops *gops, struct 
ftrace_graph_ret *trace
unsigned long *task_var = fgraph_get_task_var(gops);
 
if ((*task_var & TRACE_GRAPH_FL) &&
-   trace->depth == trace_recursion_depth())
+   trace->depth == ftrace_graph_depth(task_var))
*task_var &= ~TRACE_GRAPH_FL;
 }
 
-- 
2.20.1




[PATCH 11/16 v3] function_graph: Add "task variables" per task for fgraph_ops

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Add a "task variables" array on the tasks shadow ret_stack that is the
size of longs for each possible registered fgraph_ops. That's a total of 16,
taking up 8 * 16 = 128 bytes (out of a page size 4k).

This will allow fgraph_ops to implement features on a per-task basis, by
having a way to maintain state for each task.
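
A sketch of how a registered fgraph_ops would use its per-task variable
(MY_FLAG is made up; fgraph_get_task_var() is the accessor added below):

	#define MY_FLAG	0x1	/* made-up bit in this user's task variable */

	static int my_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
	{
		unsigned long *task_var = fgraph_get_task_var(gops);

		*task_var |= MY_FLAG;	/* per-task, per-fgraph_ops state */
		return 1;
	}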

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h |  2 ++
 kernel/trace/fgraph.c  | 72 +-
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index e6a596e7cdf4..a0bdd1745e56 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -754,6 +754,7 @@ struct fgraph_ops {
trace_func_graph_ret_t  retfunc;
struct ftrace_ops   ops; /* for the hash lists */
void*private;
+   int idx;
 };
 
 /*
@@ -792,6 +793,7 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int 
idx);
 
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops);
 
 int function_graph_enter(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp);
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 8b52993044bc..3bb1204c6cf9 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -88,12 +88,19 @@ enum {
 
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
 #define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
-#define SHADOW_STACK_MAX_INDEX SHADOW_STACK_INDEX
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_ARRAY_SIZE)
 /* Leave on a little buffer at the bottom */
 #define SHADOW_STACK_MIN_INDEX (FGRAPH_RET_INDEX + 1)
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack 
*)(&(t)->ret_stack[index]))
 
+/*
+ * Each fgraph_ops has a reserved unsigned long at the end (top) of the
+ * ret_stack to store task specific state.
+ */
+#define SHADOW_STACK_TASK_VARS(ret_stack) \
+   ((unsigned long *)(&(ret_stack)[SHADOW_STACK_MAX_INDEX]))
+
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
@@ -130,6 +137,44 @@ static void return_run(struct ftrace_graph_ret *trace, 
struct fgraph_ops *ops)
return;
 }
 
+static void ret_stack_set_task_var(struct task_struct *t, int idx, long val)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+   gvals[idx] = val;
+}
+
+static unsigned long *
+ret_stack_get_task_var(struct task_struct *t, int idx)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
return &gvals[idx];
+}
+
+static void ret_stack_init_task_vars(unsigned long *ret_stack)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(ret_stack);
+
+   memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
+}
+
+/**
+ * fgraph_get_task_var - retrieve a task specific state variable
+ * @gops: The fgraph_ops that owns the task specific variable
+ *
+ * Every registered fgraph_ops has a task state variable
+ * reserved on the task's ret_stack. This function returns the
+ * address to that variable.
+ *
+ * Returns the address to the fgraph_ops @gops task specific
+ * unsigned long variable.
+ */
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops)
+{
+   return ret_stack_get_task_var(current, gops->idx);
+}
+
 /*
  * @offset: The index into @t->ret_stack to find the ret_stack entry
  * @index: Where to place the index into @t->ret_stack of that entry
@@ -647,6 +692,7 @@ static int alloc_retstack_tasklist(unsigned long 
**ret_stack_list)
if (t->ret_stack == NULL) {
atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
+   ret_stack_init_task_vars(ret_stack_list[start]);
t->curr_ret_stack = SHADOW_STACK_MAX_INDEX;
t->curr_ret_depth = -1;
/* Make sure the tasks see the 0 first: */
@@ -706,6 +752,7 @@ graph_init_task(struct task_struct *t, unsigned long 
*ret_stack)
 {
atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
+   ret_stack_init_task_vars(ret_stack);
t->ftrace_timestamp = 0;
t->curr_ret_stack = SHADOW_STACK_MAX_INDEX;
t->curr_ret_depth = -1;
@@ -804,6 +851,24 @@ static int start_graph_tracing(void)
return ret;
 }
 
+static void init_task_vars(int idx)
+{
+   struct task_struct *g, *t;
+   int cpu;
+
+   for_each_online_cpu(cpu) {
+   if (idle_task(cpu)->ret_stack)
+   ret_stack_set_task_var(idle_task(cpu), idx, 0);
+   }
+
read_lock(&tasklist_lock);
+   do_each_thread(g, t) {
+   if (t->ret_stack)
+   ret_stack_set_task_var(t, 

[PATCH 10/16 v3] function_graph: Have the instances use their own ftrace_ops for filtering

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Allow for instances to have their own ftrace_ops part of the fgraph_ops that
makes the function_graph tracer filter on the set_ftrace_filter file of the
instance and not the top instance.

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h   |  1 +
 kernel/trace/fgraph.c| 63 +---
 kernel/trace/ftrace.c|  6 +--
 kernel/trace/trace.h | 16 +++
 kernel/trace/trace_functions.c   |  2 +-
 kernel/trace/trace_functions_graph.c |  8 +++-
 6 files changed, 59 insertions(+), 37 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index d0307c9b866e..e6a596e7cdf4 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -752,6 +752,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent 
*trace, struct fgraph
 struct fgraph_ops {
trace_func_graph_ent_t  entryfunc;
trace_func_graph_ret_t  retfunc;
+   struct ftrace_ops   ops; /* for the hash lists */
void*private;
 };
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 09e5bf2740a8..8b52993044bc 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -15,14 +15,6 @@
 
 #include "ftrace_internal.h"
 
-#ifdef CONFIG_DYNAMIC_FTRACE
-#define ASSIGN_OPS_HASH(opsname, val) \
-   .func_hash  = val, \
-   .local_hash.regex_lock  = 
__MUTEX_INITIALIZER(opsname.local_hash.regex_lock),
-#else
-#define ASSIGN_OPS_HASH(opsname, val)
-#endif
-
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
 #define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
 
@@ -303,9 +295,6 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
int cnt = 0;
int i;
 
-   if (!ftrace_ops_test(&global_ops, func, NULL))
-   goto out;
-
trace.func = func;
trace.depth = ++current->curr_ret_depth;
 
@@ -325,7 +314,8 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
atomic_inc(>trace_overrun);
break;
}
-   if (fgraph_array[i]->entryfunc(&trace, fgraph_array[i])) {
+   if (ftrace_ops_test(&gops->ops, func, NULL) &&
+   gops->entryfunc(&trace, gops)) {
offset = current->curr_ret_stack;
/* Check the top level stored word */
type = get_fgraph_type(current, offset);
@@ -597,18 +587,27 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
-static struct ftrace_ops graph_ops = {
-   .func   = ftrace_stub,
-   .flags  = FTRACE_OPS_FL_RECURSION_SAFE |
-  FTRACE_OPS_FL_INITIALIZED |
-  FTRACE_OPS_FL_PID |
-  FTRACE_OPS_FL_STUB,
+void fgraph_init_ops(struct ftrace_ops *dst_ops,
+struct ftrace_ops *src_ops)
+{
+   dst_ops->func = ftrace_stub;
+   dst_ops->flags = FTRACE_OPS_FL_RECURSION_SAFE |
+   FTRACE_OPS_FL_PID |
+   FTRACE_OPS_FL_STUB;
+
 #ifdef FTRACE_GRAPH_TRAMP_ADDR
-   .trampoline = FTRACE_GRAPH_TRAMP_ADDR,
+   dst_ops->trampoline = FTRACE_GRAPH_TRAMP_ADDR;
/* trampoline_size is only needed for dynamically allocated tramps */
 #endif
-   ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
-};
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+   if (src_ops) {
+   dst_ops->func_hash = &src_ops->local_hash;
+   mutex_init(&src_ops->local_hash.regex_lock);
+   dst_ops->flags |= FTRACE_OPS_FL_INITIALIZED;
+   }
+#endif
+}
 
 void ftrace_graph_sleep_time_control(bool enable)
 {
@@ -807,11 +806,20 @@ static int start_graph_tracing(void)
 
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
+   int command = 0;
int ret = 0;
int i;
 
mutex_lock(&ftrace_lock);
 
+   if (!gops->ops.func) {
+   gops->ops.flags |= FTRACE_OPS_FL_STUB;
+   gops->ops.func = ftrace_stub;
+#ifdef FTRACE_GRAPH_TRAMP_ADDR
+   gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;
+#endif
+   }
+
if (!fgraph_array[0]) {
/* The array must always have real data on it */
for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
@@ -848,9 +856,10 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 */
ftrace_graph_return = return_run;
ftrace_graph_entry = entry_run;
-
-   ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+   command = FTRACE_START_FUNC_RET;
}
+
+   ret = ftrace_startup(&gops->ops, command);
 out:
mutex_unlock(&ftrace_lock);
return ret;
@@ -858,6 +867,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 
 void 

[PATCH 15/16 v3] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Added functions that can be called by a fgraph_ops entryfunc and retfunc to
store state from the entry of the function being traced to the exit of the
same function. The fgraph_ops entryfunc() may call fgraph_reserve_data() to
store up to 4 words onto the task's shadow ret_stack, and this can then be
retrieved by fgraph_retrieve_data() called by the corresponding retfunc().

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h |   3 +
 kernel/trace/fgraph.c  | 253 +++--
 2 files changed, 220 insertions(+), 36 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index a0bdd1745e56..5b252dc9c1e6 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -757,6 +757,9 @@ struct fgraph_ops {
int idx;
 };
 
+void *fgraph_reserve_data(int size_bytes);
+void *fgraph_retrieve_data(void);
+
 /*
  * Stack of return addresses for functions
  * of a thread.
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 3bb1204c6cf9..c368da1a60b8 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -36,27 +36,36 @@
  * bits: 14 - 15   Type of storage
  *   0 - reserved
  *   1 - fgraph_array index
+ *   2 - reserved data
  * For fgraph_array_index:
  *  bits: 16 - 23  The fgraph_ops fgraph_array index
  *
+ * For reserved data:
+ *  bits: 16 - 17  The size in words that is stored
+ *
 * That is, at the end of function_graph_enter, if the first and fourth
 * fgraph_ops on the fgraph_array[] (index 0 and 3) need their retfunc called
- * on the return of the function being traced, this is what will be on the
- * task's shadow ret_stack: (the stack grows upward)
+ * on the return of the function being traced, and the fourth fgraph_ops
+ * stored two words of data, this is what will be on the task's shadow
+ * ret_stack: (the stack grows upward)
+ *
+ * | (X) | (N)   | ( N words away from last ret_stack)
+ * +-+
+ * | struct ftrace_ret_stack |
+ * |   (stores the saved ret pointer)|
+ * +-+
+ * | (0 << FGRAPH_ARRAY_SHIFT)|type:1|(1)| ( 0 for index of first fgraph_ops)
+ * +-+ ( It is 4 words from the ret_stack)
+ * | STORED DATA WORD 2  |
+ * | STORED DATA WORD 1  |
+ * +-+ ( Data with size of 2 words)
+ * | (3 << FGRAPH_DATA_SHIFT)|type:2|(4) |   ( 2 + 1 word for meta data )
+ * +-+
+ * | (3 << FGRAPH_ARRAY_SHIFT)|type:1|(5)| ( 3 for index of fourth fgraph_ops)
+ * | | <- task->curr_ret_stack
+ * +-+  (points to data)
+ * | |
  *
-
- * |  |
- * | (X) | (N)| ( N words away from previous ret_stack)
- * +--+
- * | struct ftrace_ret_stack  |
- * |   (stores the saved ret pointer) |
- * +--+
- * | (0 << FGRAPH_ARRAY_SHIFT)|(1)| ( 0 for index of first fgraph_ops)
- * +--+
- * | (3 << FGRAPH_ARRAY_SHIFT)|(2)| ( 3 for index of fourth fgraph_ops)
- * |  | <- task->curr_ret_stack (points to data)
- * +--+
- * |  |
  *
  * If a backtrace is required, and the real return pointer needs to be
  * fetched, then it looks at the task's curr_ret_stack index, if it
@@ -77,12 +86,17 @@
 enum {
FGRAPH_TYPE_RESERVED= 0,
FGRAPH_TYPE_ARRAY   = 1,
+   FGRAPH_TYPE_DATA= 2,
 };
 
 #define FGRAPH_ARRAY_SIZE  16
 #define FGRAPH_ARRAY_MASK  ((1 << FGRAPH_ARRAY_SIZE) - 1)
 #define FGRAPH_ARRAY_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
 
+#define FGRAPH_DATA_SIZE   2
+#define FGRAPH_DATA_MASK   ((1 << FGRAPH_DATA_SIZE) - 1)
+#define FGRAPH_DATA_SHIFT  (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
 /* Currently the max stack index can't be more than register callers */
 #define FGRAPH_MAX_INDEX   FGRAPH_ARRAY_SIZE
 
@@ -94,6 +108,8 @@ enum {
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack 
*)(&(t)->ret_stack[index]))
 
+#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * 4)
+
 /*
 * Each fgraph_ops has a reserved unsigned long at the end (top) of the
  * ret_stack to store task specific state.
@@ -108,21 +124,50 @@ static int fgraph_array_cnt;
 
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+/* The following extracts info from the value on the current_ret_stack */
+
+/* Extract the index to the next ret_stack */
+static inline int __get_index(unsigned long val)
+{
+   return val & FGRAPH_RET_INDEX_MASK;
+}
+
+/* 

[PATCH 14/16 v3] function_graph: Move graph notrace bit to shadow stack global var

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The use of the task->trace_recursion for the logic used for the function
graph no-trace was a bit of an abuse of that variable. Now that there exist
per-stack global variables for registered graph tracers, use those instead.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.h | 16 +---
 kernel/trace/trace_functions_graph.c | 10 ++
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index bda97d2f6aa9..d23283f9a627 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -566,13 +566,6 @@ enum {
  * can only be modified by current, we can reuse trace_recursion.
  */
TRACE_IRQ_BIT,
-
-   /*
-* To implement set_graph_notrace, if this bit is set, we ignore
-* function graph tracing of called functions, until the return
-* function is called to clear it.
-*/
-   TRACE_GRAPH_NOTRACE_BIT,
 };
 
 #define trace_recursion_set(bit)   do { (current)->trace_recursion |= 
(1<<(bit)); } while (0)
@@ -926,8 +919,17 @@ enum {
 
TRACE_GRAPH_DEPTH_START_BIT,
TRACE_GRAPH_DEPTH_END_BIT,
+
+   /*
+* To implement set_graph_notrace, if this bit is set, we ignore
+* function graph tracing of called functions, until the return
+* function is called to clear it.
+*/
+   TRACE_GRAPH_NOTRACE_BIT,
 };
 
+#define TRACE_GRAPH_NOTRACE(1 << TRACE_GRAPH_NOTRACE_BIT)
+
 static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
 {
return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 054ec91e5086..20ee84350f43 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -125,6 +125,7 @@ static inline int ftrace_graph_ignore_irqs(void)
 int trace_graph_entry(struct ftrace_graph_ent *trace,
  struct fgraph_ops *gops)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
@@ -133,11 +134,11 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
int cpu;
int pc;
 
-   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+   if (*task_var & TRACE_GRAPH_NOTRACE)
return 0;
 
if (ftrace_graph_notrace_addr(trace->func)) {
-   trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+   *task_var |= TRACE_GRAPH_NOTRACE;
/*
 * Need to return 1 to have the return called
 * that will clear the NOTRACE bit.
@@ -239,6 +240,7 @@ void __trace_graph_return(struct trace_array *tr,
 void trace_graph_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
@@ -248,8 +250,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 
ftrace_graph_addr_finish(gops, trace);
 
-   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
-   trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+   if (*task_var & TRACE_GRAPH_NOTRACE) {
+   *task_var &= ~TRACE_GRAPH_NOTRACE;
return;
}
 
-- 
2.20.1




[PATCH 16/16 v3] function_graph: Add selftest for passing local variables

2019-05-24 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Add a boot-up selftest that passes variables from a function entry to a
function exit, and makes sure that they do get passed around. Also test
some failure cases.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_selftest.c | 310 ++
 1 file changed, 310 insertions(+)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index facd5d1c05e7..edee3c8bd307 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -718,6 +718,314 @@ trace_selftest_startup_function(struct tracer *trace, 
struct trace_array *tr)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+#define BYTE_NUMBER 123
+#define SHORT_NUMBER 12345
+#define WORD_NUMBER 1234567890
+#define LONG_NUMBER 1234567890123456789LL
+
+static int fgraph_store_size __initdata;
+static const char *fgraph_store_type_name __initdata;
+static char *fgraph_error_str __initdata;
+static char fgraph_error_str_buf[128] __initdata;
+
+static __init int store_entry(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
+{
+   const char *type = fgraph_store_type_name;
+   int size = fgraph_store_size;
+   void *p;
+   void *fail;
+
+   /* Try to reserve too much */
+   fail = fgraph_reserve_data(sizeof(long) * 4 + 1);
+   if (fail) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Was able to reserve too much!\n");
+   fgraph_error_str = fgraph_error_str_buf;
+   return 0;
+   }
+   /* Reserve the amount we want to */
+   p = fgraph_reserve_data(size);
+   if (!p) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Failed to reserve %s\n", type);
+   fgraph_error_str = fgraph_error_str_buf;
+   return 0;
+   }
+   /* We should only be able to reserve once */
+   fail = fgraph_reserve_data(4);
+   if (fail) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Was able to reserve twice!\n");
+   fgraph_error_str = fgraph_error_str_buf;
+   return 0;
+   }
+
+   switch (fgraph_store_size) {
+   case 1:
+   *(char *)p = BYTE_NUMBER;
+   break;
+   case 2:
+   *(short *)p = SHORT_NUMBER;
+   break;
+   case 4:
+   *(int *)p = WORD_NUMBER;
+   break;
+   case 8:
+   *(long long *)p = LONG_NUMBER;
+   break;
+   case 12:
+   *(long long *)p = LONG_NUMBER;
+   p += 8;
+   *(int *)p = WORD_NUMBER;
+   break;
+   case 16:
+   *(long long *)p = LONG_NUMBER;
+   p += 8;
+   *(long long *)p = WORD_NUMBER;
+   *(long long *)p <<= 32;
+   *(long long *)p |= WORD_NUMBER;
+   break;
+   case 20:
+   *(long long *)p = 1;
+   p += 8;
+   *(long long *)p = 2;
+   p += 8;
+   *(int *)p = WORD_NUMBER;
+   break;
+   case 24:
+   *(long long *)p = 1;
+   p += 8;
+   *(long long *)p = 2;
+   p += 8;
+   *(long long *)p = LONG_NUMBER;
+   break;
+   case 28:
+   *(long long *)p = BYTE_NUMBER;
+   p += 8;
+   *(long long *)p = SHORT_NUMBER;
+   p += 8;
+   *(long long *)p = LONG_NUMBER;
+   p += 8;
+   *(int *)p = WORD_NUMBER;
+   break;
+   case 32:
+   *(long long *)p = BYTE_NUMBER;
+   p += 8;
+   *(long long *)p = SHORT_NUMBER;
+   p += 8;
+   *(long long *)p = 3;
+   p += 8;
+   *(long long *)p = LONG_NUMBER;
+   break;
+   }
+
+   return 1;
+}
+
+static __init void store_return(struct ftrace_graph_ret *trace,
+   struct fgraph_ops *gops)
+{
+   const char *type = fgraph_store_type_name;
+   long long expect[4] = { 0, 0, 0, 0};
+   long long found[4] = { -1, 0, 0, 0};
+   char *p;
+   int i;
+
+   p = fgraph_retrieve_data();
+   if (!p) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Failed to retrieve %s\n", type);
+   fgraph_error_str = fgraph_error_str_buf;
+   return;
+   }
+
+   switch (fgraph_store_size) {
+   case 1:
+   expect[0] = BYTE_NUMBER;
+   found[0] = *(char *)p;
+   break;
+   case 2:
+   expect[0] = SHORT_NUMBER;
+   found[0] = *(short *)p;
+   break;
+   case 4:
+   

[GIT PULL] tracing: Small fixes to histogram code and header cleanup

2019-05-24 Thread Steven Rostedt



Linus,

Tom Zanussi sent me some small fixes and cleanups to the histogram
code and I forgot to incorporate them.

I also added a small clean up patch that was sent to me a while ago
and I just noticed it.


Please pull the latest trace-v5.2-rc1 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-v5.2-rc1

Tag SHA1: bceb0fd66744c3aa0cd8f3bba3e4b45ca38b3aaa
Head SHA1: 4eebe38a37f9397ffecd4bd3afbdf36838a97969


Jagadeesh Pagadala (1):
  kernel/trace/trace.h: Remove duplicate header of trace_seq.h

Tom Zanussi (3):
  tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
  tracing: Check keys for variable references in expressions too
  tracing: Add a check_val() check before updating cond_snapshot() track_val


 kernel/trace/trace.h |  1 -
 kernel/trace/trace_events_hist.c | 13 +++--
 2 files changed, 11 insertions(+), 3 deletions(-)
---
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 1974ce818ddb..82c70b63d375 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include <linux/trace_seq.h>
 #include 
 
 #ifdef CONFIG_FTRACE_SYSCALLS
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 7fca3457c705..ca6b0dff60c5 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -59,7 +59,7 @@
C(NO_CLOSING_PAREN, "No closing paren found"),  \
C(SUBSYS_NOT_FOUND, "Missing subsystem"),   \
C(INVALID_SUBSYS_EVENT, "Invalid subsystem or event name"), \
-   C(INVALID_REF_KEY,  "Using variable references as keys not supported"), \
+   C(INVALID_REF_KEY,  "Using variable references in keys not supported"), \
C(VAR_NOT_FOUND,"Couldn't find variable"),  \
C(FIELD_NOT_FOUND,  "Couldn't find field"),
 
@@ -1854,6 +1854,9 @@ static u64 hist_field_var_ref(struct hist_field 
*hist_field,
struct hist_elt_data *elt_data;
u64 var_val = 0;
 
+   if (WARN_ON_ONCE(!elt))
+   return var_val;
+
elt_data = elt->private_data;
var_val = elt_data->var_ref_vals[hist_field->var_ref_idx];
 
@@ -3582,14 +3585,20 @@ static bool cond_snapshot_update(struct trace_array 
*tr, void *cond_data)
struct track_data *track_data = tr->cond_snapshot->cond_data;
struct hist_elt_data *elt_data, *track_elt_data;
struct snapshot_context *context = cond_data;
+   struct action_data *action;
u64 track_val;
 
if (!track_data)
return false;
 
+   action = track_data->action_data;
+
track_val = get_track_val(track_data->hist_data, context->elt,
  track_data->action_data);
 
+   if (!action->track_data.check_val(track_data->track_val, track_val))
+   return false;
+
track_data->track_val = track_val;
memcpy(track_data->key, context->key, track_data->key_len);
 
@@ -4503,7 +4512,7 @@ static int create_key_field(struct hist_trigger_data 
*hist_data,
goto out;
}
 
-   if (hist_field->flags & HIST_FIELD_FL_VAR_REF) {
+   if (field_has_hist_vars(hist_field, 0)) {
hist_err(tr, HIST_ERR_INVALID_REF_KEY, 
errpos(field_str));
destroy_hist_field(hist_field, 0);
ret = -EINVAL;



Re: [PATCH bpf] samples: bpf: add ibumad sample to .gitignore

2019-05-24 Thread Alexei Starovoitov
On Fri, May 24, 2019 at 12:59 PM Matteo Croce  wrote:
>
> This commit adds ibumad to .gitignore, which is
> currently omitted from the ignore file.
>
> Signed-off-by: Matteo Croce 

Applied. Thanks


Re: [PATCH v4 bpf-next 0/4] cgroup bpf auto-detachment

2019-05-24 Thread Alexei Starovoitov
On Fri, May 24, 2019 at 4:52 PM Roman Gushchin  wrote:
>
> This patchset implements a cgroup bpf auto-detachment functionality:
> bpf programs are detached as soon as possible after removal of the
> cgroup, without waiting for the release of all associated resources.
>
> Patches 2 and 3 are required to implement a corresponding kselftest
> in patch 4.
>
> v4:
>   1) release cgroup bpf data using a workqueue
>   2) add test_cgroup_attach to .gitignore

There is a conflict in tools/testing/selftests/bpf/Makefile
Please rebase


Re: [v4 PATCH 2/2] mm: vmscan: correct some vmscan counters for

2019-05-24 Thread Yang Shi




On 5/24/19 2:00 PM, Yang Shi wrote:



On 5/24/19 1:51 PM, Hillf Danton wrote:

On Fri, 24 May 2019 09:27:02 +0800 Yang Shi wrote:

On 5/23/19 11:51 PM, Hillf Danton wrote:

On Thu, 23 May 2019 10:27:38 +0800 Yang Shi wrote:
@@ -1642,14 +1650,14 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,

   unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
   unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
   unsigned long skipped = 0;
-    unsigned long scan, total_scan, nr_pages;
+    unsigned long scan, total_scan;
+    unsigned long nr_pages;

Change for no earn:)

Aha, yes.


   LIST_HEAD(pages_skipped);
   isolate_mode_t mode = (sc->may_unmap ? 0 : ISOLATE_UNMAPPED);
+    total_scan = 0;
   scan = 0;
-    for (total_scan = 0;
- scan < nr_to_scan && nr_taken < nr_to_scan && !list_empty(src);
- total_scan++) {
+    while (scan < nr_to_scan && !list_empty(src)) {
   struct page *page;
AFAICS scan currently prevents us from looping for ever, while nr_taken
bails us out once we get what's expected, so I doubt it makes much sense
to cut nr_taken off.

It is because "scan < nr_to_scan && nr_taken >= nr_to_scan" is
impossible now with the units fixed.


With the units fixed, nr_taken is no longer checked.


It is because scan would always be >= nr_taken.




   page = lru_to_page(src);
@@ -1657,9 +1665,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,

   VM_BUG_ON_PAGE(!PageLRU(page), page);
+    nr_pages = 1 << compound_order(page);
+    total_scan += nr_pages;
+
   if (page_zonenum(page) > sc->reclaim_idx) {
    list_move(&page->lru, &pages_skipped);
-    nr_skipped[page_zonenum(page)]++;
+    nr_skipped[page_zonenum(page)] += nr_pages;
   continue;
   }
@@ -1669,10 +1680,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
    * ineligible pages.  This causes the VM to not reclaim any
    * pages, triggering a premature OOM.
    */
-    scan++;
+    scan += nr_pages;

The comment looks to defy the change if we fail to add a huge page to
the dst list; otherwise nr_taken knows how to do the right thing. What
I prefer is to let scan do one thing at a time.

I don't get your point. Do you mean the comment "Do not count skipped
pages because that makes the function return with no isolated pages if
the LRU mostly contains ineligible pages."? I suppose the comment is
meant to explain why skipped pages are not counted.


Well, consider the case where there is a huge page in the second place
reversely on the src list along with 20 other regular pages, and we are
not able to add the huge page to the dst list. Currently we can go on
and try to scan other pages, provided nr_to_scan is 32; with the units
fixed, however, scan goes over nr_to_scan, leaving us no chance to scan
any page that may not be busy. I wonder if that triggers a premature
OOM, because I think scan means the number of list nodes we try to
isolate, and nr_taken the number of regular pages successfully isolated.


Yes, good point. I think I just need to roll back to what v3 did here to
get scan accounted for each case separately and avoid the possible
over-accounting.


Rethinking the code, I think "scan" here should still mean the number
of base pages. If the case you mentioned happens, the right behavior
should be to raise the priority to give another round of scan.


And vmscan uses sync isolation (mode = (sc->may_unmap ? 0 :
ISOLATE_UNMAPPED)); __isolate_lru_page() returns -EBUSY only when the
page is being freed somewhere else, so this should not cause a
premature OOM.
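
To put numbers on the scenario discussed above, here is a stand-alone toy
model of the accounting (an illustration only, not kernel code; the list
layout, the 512-page THP and nr_to_scan = 32 are invented):

#include <stdio.h>

int main(void)
{
	unsigned long nr_to_scan = 32, scan = 0, nr_taken = 0, i;
	unsigned long nr_pages[21] = { 1, 512 };	/* THP second on the list */

	for (i = 2; i < 21; i++)
		nr_pages[i] = 1;			/* 19 more regular pages */

	for (i = 0; i < 21 && scan < nr_to_scan; i++) {
		scan += nr_pages[i];			/* charged in base pages */
		if (nr_pages[i] > 1)
			continue;			/* THP fails isolation */
		nr_taken += nr_pages[i];
	}
	/* prints "nodes visited: 2, scan: 513, taken: 1" */
	printf("nodes visited: %lu, scan: %lu, taken: %lu\n", i, scan, nr_taken);
	return 0;
}

Charging the skipped THP to scan in base-page units burns the whole budget
after only two list nodes, which is the premature-bailout concern; counting
scan per list node would visit all 21 nodes instead.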





   switch (__isolate_lru_page(page, mode)) {
   case 0:
-    nr_pages = hpage_nr_pages(page);
   nr_taken += nr_pages;
   nr_zone_taken[page_zonenum(page)] += nr_pages;
    list_move(&page->lru, dst);
--
1.8.3.1

Best Regards
Hillf






Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-24 Thread Jarod Wilson

On 5/24/19 5:16 PM, Jay Vosburgh wrote:

Jarod Wilson  wrote:


Once in a while, with just the right timing, 802.3ad slaves will fail to
properly initialize, winding up in a weird state, with a partner system
mac address of 00:00:00:00:00:00. This started happening after a fix to
properly track link_failure_count tracking, where an 802.3ad slave that
reported itself as link up in the miimon code, but wasn't able to get a
valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
BOND_LINK_DOWN. That was the proper thing to do for the general "my link
went down" case, but has created a link initialization race that can put
the interface in this odd state.


Reading back in the git history, the ultimate cause of this
"weird state" appears to be devices that assert NETDEV_UP prior to
actually being able to supply sane speed/duplex values, correct?

Presuming that this is the case, I don't see that there's much
else to be done here, and so:

Acked-by: Jay Vosburgh 


Correct, we've got a miimon "device is up", but still can't get speed 
and/or duplex in this case.


--
Jarod Wilson
ja...@redhat.com


Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-05-24 Thread Frank Rowand
Hi Saravana,

I'll try to address the other portions of this email that I did not
address in my previous replies.


On 5/24/19 2:53 PM, Saravana Kannan wrote:
> On Fri, May 24, 2019 at 10:49 AM Frank Rowand  wrote:
>>
>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
>>> Add a generic "depends-on" property that allows specifying mandatory
>>> functional dependencies between devices. Add device-links after the
>>> devices are created (but before they are probed) by looking at this
>>> "depends-on" property.
>>>
>>> This property is used instead of existing DT properties that specify
>>> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
>>> is because not all resources referred to by existing DT properties are
>>> mandatory functional dependencies. Some devices/drivers might be able to
>>> operate with reduced functionality when some of the resources
>>> aren't available. For example, a device could operate in polling mode
>>> if no IRQ is available, a device could skip doing power management if
>>> clock or voltage control isn't available and they are left on, etc.
>>>
>>> So, adding mandatory functional dependency links between devices by
>>> looking at referred phandles in DT properties won't work as it would
>>> prevent probing devices that could be probed. By having an explicit
>>> depends-on property, we can handle these cases correctly.
>>
>> Trying to wrap my brain around the concept, this series seems to be
>> adding the ability to declare that an apparent dependency (eg an IRQ
>> specified by a phandle) is _not_ actually a dependency.
> 
> The current implementation completely ignores existing bindings for
> dependencies and so does the current tip of the kernel. So it's not
> really overriding anything. However, if I change the implementation so
> that depends-on becomes the source of truth if it exists and falls
> back to existing common bindings if "depends-on" isn't present -- then
> depends-on would truly be overriding existing bindings for
> dependencies. It depends on how we want to define the DT property.
> 
>> The phandle already implies the dependency.
> 
> Sure, it might imply, but it's not always true.
> 
>> Creating a separate
>> depends-on property provides a method of ignoring the implied
>> dependencies.
> 
> implied != true
> 
>> This is not just hardware description.  It is instead a combination
>> of hardware functionality and driver functionality.  An example
>> provided in the second paragraph of the email I am replying to
>> suggests a device could operate in polling mode if no IRQ is
>> available.  Using this example, the devicetree does not know
>> whether the driver requires the IRQ (currently an implied
>> dependency since the IRQ phandle exists).  My understanding
>> of this example is that the device node would _not_ have a
>> depends-on property for the IRQ phandle so the IRQ would be
>> optional.  But this is an attribute of the driver, not the
>> hardware.
> 

> Not really. The interrupt could be for "SD card plugged in". That's
> never a mandatory dependency for the SD card controller to work. So
> the IRQ provider won't be a "depends-on" in this case. But if there is
> no power supply or clock for the SD card controller, it isn't going to
> work -- so they'd be listed in the "depends-on". So, this is still
> defining the hardware and not the OS.

Please comment on my observation that was based on an IRQ for a device
with polling mode vs interrupt-driven mode.  You described a different
case and did not address my comment.


>> This is also configuration, declaring whether the
>> system is willing to accept polling mode instead of interrupt
>> mode.
> 
> Whether the driver will choose to operate without the IRQ is up to it.
> The OS could also assume the power supply is never turned off and
> still try to use the device. Depending on the hardware configuration,
> that might or might not work.
> 
>> Devicetree is not the proper place for driver description or
>> for configuration.
> 
> But depends-on isn't describing the driver configuration though.
> 
> Overall, the clock provider example I gave in another reply is a much
> better example. If you just assume implied dependencies are mandatory
> dependencies, some devices will never be probe because the kernel is
> using them incorrectly (they aren't meant to list mandatory
> dependencies).
> 
>> Another flaw with this method is that existing device trees
>> will be broken after the kernel is modified, because existing
>> device trees do not have the depends-on property.  This breaks
>> the devicetree compatibility rules.
> 
> This is 100% not true with the current implementation. I actually
> tested this. This is fully backwards compatible. That's another reason
> for adding depends-on and going by just what it says. The existing
> bindings were never meant to describe only mandatory dependencies. So
> using them as such is what would break backwards compatibility.
> 
>>> Having functional dependencies 

Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-05-24 Thread Saravana Kannan
On Thu, May 23, 2019 at 10:52 PM Greg Kroah-Hartman
 wrote:
>
> On Thu, May 23, 2019 at 06:01:11PM -0700, Saravana Kannan wrote:
> > Add a generic "depends-on" property that allows specifying mandatory
> > functional dependencies between devices. Add device-links after the
> > devices are created (but before they are probed) by looking at this
> > "depends-on" property.
> >
> > This property is used instead of existing DT properties that specify
> > phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> > is because not all resources referred to by existing DT properties are
> > mandatory functional dependencies. Some devices/drivers might be able
> > to operate with reduced functionality when some of the resources
> > aren't available. For example, a device could operate in polling mode
> > if no IRQ is available, a device could skip doing power management if
> > clock or voltage control isn't available and they are left on, etc.
> >
> > So, adding mandatory functional dependency links between devices by
> > looking at referred phandles in DT properties won't work as it would
> > prevent probing devices that could be probed. By having an explicit
> > depends-on property, we can handle these cases correctly.
> >
> > Having functional dependencies explicitly called out in DT and
> > automatically added before the devices are probed, provides the
> > following benefits:
> >
> > - Optimizes device probe order and avoids the useless work of
> >   attempting probes of devices that will not probe successfully
> >   (because their suppliers aren't present or haven't probed yet).
> >
> >   For example, in a commonly available mobile SoC, registering just
> >   one consumer device's driver at an initcall level earlier than the
> >   supplier device's driver causes 11 failed probe attempts before the
> >   consumer device probes successfully. This was with a kernel with all
> >   the drivers statically compiled in. This problem gets a lot worse if
> >   all the drivers are loaded as modules without direct symbol
> >   dependencies.
> >
> > - Supplier devices like clock providers, regulators providers, etc
> >   need to keep the resources they provide active and at a particular
> >   state(s) during boot up even if their current set of consumers don't
> >   request the resource to be active. This is because the rest of the
> >   consumers might not have probed yet and turning off the resource
> >   before all the consumers have probed could lead to a hang or
> >   undesired user experience.
> >
> >   Some frameworks (Eg: regulator) handle this today by turning off
> >   "unused" resources at late_initcall_sync and hoping all the devices
> >   have probed by then. This is not a valid assumption for systems with
> >   loadable modules. Other frameworks (Eg: clock) just don't handle
> >   this due to the lack of a clear signal for when they can turn off
> >   resources. This leads to downstream hacks to handle cases like this
> >   that can easily be solved in the upstream kernel.
> >
> >   By linking devices before they are probed, we give suppliers a clear
> >   count of the number of dependent consumers. Once all of the
> >   consumers are active, the suppliers can turn off the unused
> >   resources without making assumptions about the number of consumers.
> >
> > By default we just add device-links to track "driver presence" (probe
> > succeeded) of the supplier device. If any other functionality provided
> > by device-links are needed, it is left to the consumer/supplier
> > devices to change the link when they probe.
>
> Somewhere in this wall of text you need to say:
> MAKES DEVICES BOOT FASTER!
> right?  :)

I'm sure it will, but I can't easily test and measure this number
because I don't have a device with 100s of devices (common in mobile
SoCs) where I can load all the drivers as modules and are supported
upstream. And the current ones I have mostly workaround this in their
downstream tree by manually ordering with initcalls and link order.
But I see the avoidance of useless probes that'll fail as more of a
free side benefit and not the main goal of this patch series. Getting
modules to actually work and not crash the system while booting is the
main goal.

> So in short, this solves the issue of deferred probing with systems with
> loads of modules for platform devices and device tree, in that now you
> have a chance to probe devices in the correct order saving loads of busy
> loops.

Yes, definitely saves loads of busy work.

> A good thing, I like this, very nice work, all of these are:
> Reviewed-by: Greg Kroah-Hartman 

Thanks!

> but odds are I'll take this through my tree, so I'll add my s-o-b then.
> But only after the DT people agree on the new entry.

Yup! Trying to do that. :)

-Saravana


Re: [A General Question] What should I do after getting Reviewed-by from a maintainer?

2019-05-24 Thread Gen Zhang
On Fri, May 24, 2019 at 04:21:36PM -0700, Randy Dunlap wrote:
> On 5/22/19 6:17 PM, Gen Zhang wrote:
> > Hi Andrew,
> > I am starting submitting patches these days and got some patches 
> > "Reviewed-by" from maintainers. After checking the 
> > submitting-patches.html, I figured out what "Reviewed-by" means. But I
> > didn't get the guidance on what to do after getting "Reviewed-by".
> > Am I supposed to send this patch to more maintainers? Or something else?
> > Thanks
> > Gen
> > 
> 
> [Yes, I am not Andrew. ;]
> 
> Patches should be sent to a maintainer who is responsible for merging
> changes for the driver or $arch or subsystem.
> And they should also be Cc-ed to the appropriate mailing list(s) and
> source code author(s), usually [unless they are no longer active].
> 
> Some source files have author email addresses in them.
> Or in a kernel git tree, you can use "git log path/to/source/file.c" to see
> who has been making & merging patches to that file.c.
> Probably the easiest thing to do is run ./scripts/get_maintainer.pl and
> it will try to tell you who to send the patch to.
> 
> HTH.
> -- 
> ~Randy
Thanks for your patient instructions, Randy! I already figured it out.

Thanks
Gen


Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-05-24 Thread Saravana Kannan
Ugh... mobile app is sending HTML emails. Replying again.

On Fri, May 24, 2019 at 5:25 PM Frank Rowand  wrote:
>
> On 5/24/19 5:22 PM, Frank Rowand wrote:
> > On 5/24/19 2:53 PM, Saravana Kannan wrote:
> >> On Fri, May 24, 2019 at 10:49 AM Frank Rowand  
> >> wrote:
> >>>
> >>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> >
> > < snip >
> >
> >>> Another flaw with this method is that existing device trees
> >>> will be broken after the kernel is modified, because existing
> >>> device trees do not have the depends-on property.  This breaks
> >>> the devicetree compatibility rules.
> >>
> >> This is 100% not true with the current implementation. I actually
> >> tested this. This is fully backwards compatible. That's another reason
> >> for adding depends-on and going by just what it says. The existing
> >> bindings were never meant to describe only mandatory dependencies. So
> >> using them as such is what would break backwards compatibility.
> >
> > Are you saying that an existing, already compiled, devicetree (an FDT)
> > can be used to boot a new kernel that has implemented this patch set?
> >
> > The new kernel will boot with the existing FDT that does not have
> > any depends-on properties?

You sent out a lot of emails on this topic. But to answer them all.
The existing implementation is 100% backwards compatible.

> I overlooked something you said in the email I replied to.  You said:
>
>"that depends-on becomes the source of truth if it exists and falls
>back to existing common bindings if "depends-on" isn't present"

This is referring to an alternate implementation where implicit
dependencies are used by the kernel but new "depends-on" property
would allow overriding in cases where the implicit dependencies are
wrong. But this will need a kernel command line flag to enable this
feature so that we can be backwards compatible. Otherwise it won't be.

> Let me go back to look at the patch series to see how it falls back
> to the existing bindings.

Current patch series doesn't really "fallback" but rather it only acts
on this new property. Existing FDT binaries simply don't have this. So
it won't have any impact on the kernel behavior. But yes, looking at
the patches again will help :)

-Saravana


[devm_kfree() usage] When should devm_kfree() be used?

2019-05-24 Thread Gen Zhang
devm_kmalloc() is used to allocate memory for a driver's device. Comments
above the definition and the doc
(https://www.kernel.org/doc/Documentation/driver-model/devres.txt) all
imply that the allocated memory is automatically freed on driver detach,
no matter whether probing fails or not. However, I examined the code, and
there are many sites where devm_kfree() is used to free devm_kmalloc()'ed
memory, e.g. hisi_sas_debugfs_init() in drivers/scsi/hisi_sas/hisi_sas_main.c.
So I am totally confused about this issue. Can anybody give me some
guidance? When should we use devm_kfree()?
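
For what it's worth, the common pattern is that devm_kfree() is only worth
calling when a devm allocation is deliberately released early, e.g. a
probe-time scratch buffer, rather than left to the automatic cleanup on
detach. A minimal sketch (the driver, struct and helper below are invented
for illustration):

#include <linux/device.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
#include <linux/sizes.h>
#include <linux/string.h>

struct example_priv {
	int calibrated;
};

static int example_parse_blob(const u8 *buf)	/* hypothetical helper */
{
	return buf[0] == 0xa5;
}

static int example_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct example_priv *priv;
	u8 *scratch;

	/* Lives until the driver detaches; never needs an explicit free. */
	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	/* Only needed during probe; devm_kfree() releases it early so it
	 * does not pin 64K for as long as the driver stays bound. */
	scratch = devm_kmalloc(dev, SZ_64K, GFP_KERNEL);
	if (!scratch)
		return -ENOMEM;

	memset(scratch, 0xa5, SZ_64K);		/* stand-in for real work */
	priv->calibrated = example_parse_blob(scratch);
	devm_kfree(dev, scratch);

	platform_set_drvdata(pdev, priv);
	return 0;
}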

Thanks
Gen


[PATCH v2 0/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-24 Thread john . hubbard
From: John Hubbard 

Hi Jason and all,

I've added Jerome's and Ira's Reviewed-by tags. Other than that, this patch
is the same as v1.

==
Earlier cover letter:

IIUC, now that we have the put_user_pages() merged in to linux.git, we can
start sending up the callsite conversions via different subsystem
maintainer trees. Here's one for linux-rdma.

I've left the various Reviewed-by: and Tested-by: tags on here, even
though it's been through a few rebases.

If anyone has hardware, it would be good to get a real test of this.

thanks,
--
John Hubbard
NVIDIA

Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: Mike Marciniszyn 
Cc: Dennis Dalessandro 
Cc: Christian Benvenuti 
Cc: Jan Kara 
Cc: Jason Gunthorpe 
Cc: Ira Weiny 
Cc: Jérôme Glisse 

John Hubbard (1):
  infiniband/mm: convert put_page() to put_user_page*()

 drivers/infiniband/core/umem.c  |  7 ---
 drivers/infiniband/core/umem_odp.c  | 10 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 11 ---
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
 drivers/infiniband/hw/usnic/usnic_uiom.c|  7 ---
 7 files changed, 27 insertions(+), 31 deletions(-)

-- 
2.21.0



[PATCH v2] infiniband/mm: convert put_page() to put_user_page*()

2019-05-24 Thread john . hubbard
From: John Hubbard 

For infiniband code that retains pages via get_user_pages*(),
release those pages via the new put_user_page(), or
put_user_pages*(), instead of put_page()

This is a tiny part of the second step of fixing the problem described
in [1]. The steps are:

1) Provide put_user_page*() routines, intended to be used
   for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
   invoke put_user_page*(), instead of put_page(). This involves dozens of
   call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem. Again, [1] provides details as to why that is
   desirable.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: Mike Marciniszyn 
Cc: Dennis Dalessandro 
Cc: Christian Benvenuti 

Reviewed-by: Jan Kara 
Reviewed-by: Dennis Dalessandro 
Reviewed-by: Ira Weiny 
Reviewed-by: Jérôme Glisse 
Acked-by: Jason Gunthorpe 
Tested-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c  |  7 ---
 drivers/infiniband/core/umem_odp.c  | 10 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 11 ---
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
 drivers/infiniband/hw/usnic/usnic_uiom.c|  7 ---
 7 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e7ea819fcb11..673f0d240b3e 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 
	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
		page = sg_page_iter_page(&sg_iter);
-   if (!PageDirty(page) && umem->writable && dirty)
-   set_page_dirty_lock(page);
-   put_page(page);
+   if (umem->writable && dirty)
+   put_user_pages_dirty_lock(&page, 1);
+   else
+   put_user_page(page);
}
 
	sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/core/umem_odp.c 
b/drivers/infiniband/core/umem_odp.c
index f962b5bbfa40..17e46df3990a 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -487,7 +487,7 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp)
  * The function returns -EFAULT if the DMA mapping operation fails. It returns
  * -EAGAIN if a concurrent invalidation prevents us from updating the page.
  *
- * The page is released via put_page even if the operation failed. For
+ * The page is released via put_user_page even if the operation failed. For
  * on-demand pinning, the page is released whenever it isn't stored in the
  * umem.
  */
@@ -536,7 +536,7 @@ static int ib_umem_odp_map_dma_single_page(
}
 
 out:
-   put_page(page);
+   put_user_page(page);
 
if (remove_existing_mapping) {
ib_umem_notifier_start_account(umem_odp);
@@ -659,7 +659,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, 
u64 user_virt,
ret = -EFAULT;
break;
}
-   put_page(local_page_list[j]);
+   put_user_page(local_page_list[j]);
continue;
}
 
@@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, 
u64 user_virt,
 * ib_umem_odp_map_dma_single_page().
 */
if (npages - (j + 1) > 0)
-   release_pages(_page_list[j+1],
- npages - (j + 1));
+   put_user_pages(_page_list[j+1],
+  npages - (j + 1));
break;
}
}
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index 02eee8eff1db..b89a9b9aef7a 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,13 +118,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t npages, bool writable, struct page **pages)
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   size_t i;
-
-   for 

Re: 答复: 答复: 答复: [PATCH] input: alps-fix the issue the special alps trackpoint do not work.

2019-05-24 Thread Hui Wang



On 2019/5/24 下午6:58, Peter Hutterer wrote:

On Fri, May 24, 2019 at 06:43:58PM +0800, Hui Wang wrote:

On 2019/5/24 下午5:37, Peter Hutterer wrote:

On Fri, May 24, 2019 at 03:37:57PM +0800, Hui Wang wrote:



OK, that is sth we need to do.  But anyway it is a bit risky to backport
that much code and the whole folder of quirks to libinput 1.10.4; we need
to do lots of testing to make sure there is no regression on other machines.

Probably we only need to backport quirks/30-vendor-alps.quirks to 1.10.4 and
drop the other quirks; that will be better for our testing effort.

might be worth looking at what is in 1.10.7, e.g.  a3b3e85c0e looks like it
may be of interest. That one suggests the range on some ALPS devices is over
100, so testing with 5-25 may really not have had any effect.


Oh, looks exactly the same as our issue, will have a try with it.

Thanks,

Hui.




Cheers,
Peter



[PATCH v2 2/2] MAINTAINERS: add entry for ad7780 adc driver

2019-05-24 Thread Renato Lui Geh

This patch adds a MAINTAINERS entry for the AD7780 ADC driver.

Signed-off-by: Renato Lui Geh 
---
MAINTAINERS | 9 +
1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 54c8e14fae98..d12685c5b09a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -880,6 +880,15 @@ S: Supported
F:  drivers/iio/adc/ad7768-1.c
F:  Documentation/devicetree/bindings/iio/adc/adi,ad7768-1.txt

+ANALOG DEVICES INC AD7780 DRIVER
+M: Michael Hennerich 
+M: Renato Lui Geh 
+L: linux-...@vger.kernel.org
+W: http://ez.analog.com/community/linux-device-drivers
+S: Supported
+F: drivers/iio/adc/ad7780.c
+F: Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
+
ANALOG DEVICES INC AD9389B DRIVER
M:  Hans Verkuil 
L:  linux-me...@vger.kernel.org
--
2.21.0



[PATCH v2 0/2] dt-bindings: iio: adc: add ad7780 yaml and MAINTAINERS entry

2019-05-24 Thread Renato Lui Geh

This patchset converts the old ad7780 device-tree binding to
the new YAML format, and adds an entry to MAINTAINERS.

Renato Lui Geh (2):
 dt-bindings: iio: adc: add adi,ad7780.yaml binding
 MAINTAINERS: add entry for ad7780 adc driver

.../bindings/iio/adc/adi,ad7780.txt   | 48 --
.../bindings/iio/adc/adi,ad7780.yaml  | 87 +++
MAINTAINERS   |  9 ++
3 files changed, 96 insertions(+), 48 deletions(-)
delete mode 100644 Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt
create mode 100644 Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml

--
2.21.0



[PATCH v2 1/2] dt-bindings: iio: adc: add adi,ad7780.yaml binding

2019-05-24 Thread Renato Lui Geh

This patch adds a YAML binding for the Analog Devices AD7780/1 and
AD7170/1 analog-to-digital converters.

Signed-off-by: Renato Lui Geh 
---
Changes in v2:
- vref-supply to avdd-supply
- remove avdd-supply from required list
- include adc block in an spi block

.../bindings/iio/adc/adi,ad7780.txt   | 48 --
.../bindings/iio/adc/adi,ad7780.yaml  | 87 +++
2 files changed, 87 insertions(+), 48 deletions(-)
delete mode 100644 Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt
create mode 100644 Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml

diff --git a/Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt 
b/Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt
deleted file mode 100644
index 440e52555349..
--- a/Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt
+++ /dev/null
@@ -1,48 +0,0 @@
-* Analog Devices AD7170/AD7171/AD7780/AD7781
-
-Data sheets:
-
-- AD7170:
-   * 
https://www.analog.com/media/en/technical-documentation/data-sheets/AD7170.pdf
-- AD7171:
-   * 
https://www.analog.com/media/en/technical-documentation/data-sheets/AD7171.pdf
-- AD7780:
-   * 
https://www.analog.com/media/en/technical-documentation/data-sheets/ad7780.pdf
-- AD7781:
-   * 
https://www.analog.com/media/en/technical-documentation/data-sheets/AD7781.pdf
-
-Required properties:
-
-- compatible: should be one of
-   * "adi,ad7170"
-   * "adi,ad7171"
-   * "adi,ad7780"
-   * "adi,ad7781"
-- reg: spi chip select number for the device
-- vref-supply: the regulator supply for the ADC reference voltage
-
-Optional properties:
-
-- powerdown-gpios:  must be the device tree identifier of the PDRST pin. If
-   specified, it will be asserted during driver probe. As the
-   line is active high, it should be marked GPIO_ACTIVE_HIGH.
-- adi,gain-gpios:   must be the device tree identifier of the GAIN pin. Only 
for
-   the ad778x chips. If specified, it will be asserted during
-   driver probe. As the line is active low, it should be marked
-   GPIO_ACTIVE_LOW.
-- adi,filter-gpios: must be the device tree identifier of the FILTER pin. Only
-   for the ad778x chips. If specified, it will be asserted
-   during driver probe. As the line is active low, it should be
-   marked GPIO_ACTIVE_LOW.
-
-Example:
-
-adc@0 {
-   compatible =  "adi,ad7780";
-   reg = <0>;
-   vref-supply = <&vdd_supply>
-
-   powerdown-gpios  = <&gpio0 12 GPIO_ACTIVE_HIGH>;
-   adi,gain-gpios   = <&gpio1  5 GPIO_ACTIVE_LOW>;
-   adi,filter-gpios = <&gpio2 15 GPIO_ACTIVE_LOW>;
-};
diff --git a/Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml 
b/Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
new file mode 100644
index ..d1109416963c
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
@@ -0,0 +1,87 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iio/adc/adi,ad7780.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Analog Devices AD7170/AD7171/AD7780/AD7781 analog to digital converters
+
+maintainers:
+  - Michael Hennerich 
+
+description: |
+  The ad7780 is a sigma-delta analog to digital converter. This driver provides
+  reading voltage values and status bits from both the ad778x and ad717x 
series.
+  Its interface also allows writing on the FILTER and GAIN GPIO pins on the
+  ad778x.
+
+  Specifications on the converters can be found at:
+AD7170:
+  
https://www.analog.com/media/en/technical-documentation/data-sheets/AD7170.pdf
+AD7171:
+  
https://www.analog.com/media/en/technical-documentation/data-sheets/AD7171.pdf
+AD7780:
+  
https://www.analog.com/media/en/technical-documentation/data-sheets/ad7780.pdf
+AD7781:
+  
https://www.analog.com/media/en/technical-documentation/data-sheets/AD7781.pdf
+
+properties:
+  compatible:
+enum:
+  - adi,ad7170
+  - adi,ad7171
+  - adi,ad7780
+  - adi,ad7781
+
+  reg:
+maxItems: 1
+
+  avdd-supply:
+description:
+  The regulator supply for the ADC reference voltage.
+maxItems: 1
+
+  powerdown-gpios:
+description:
+  Must be the device tree identifier of the PDRST pin. If
+  specified, it will be asserted during driver probe. As the
+  line is active high, it should be marked GPIO_ACTIVE_HIGH.
+maxItems: 1
+
+  adi,gain-gpios:
+description:
+  Must be the device tree identifier of the GAIN pin. Only for
+  the ad778x chips. If specified, it will be asserted during
+  driver probe. As the line is active low, it should be marked
+  GPIO_ACTIVE_LOW.
+maxItems: 1
+
+  adi,filter-gpios:
+description:
+  Must be the device tree identifier of the FILTER pin. Only
+  for the ad778x chips. If specified, it will be asserted
+  during driver 

Re: [PATCH net-next v2 2/2] net: phy: sfp: enable i2c-bus detection on ACPI based systems

2019-05-24 Thread Andrew Lunn
On Fri, May 24, 2019 at 05:53:02PM -0700, Ruslan Babayev wrote:
> Lookup I2C adapter using the "i2c-bus" device property on ACPI based
> systems similar to how it's done with DT.
> 
> An example DSD describing an SFP on an ACPI based system:
> 
> Device (SFP0)
> {
> Name (_HID, "PRP0001")
> Name (_CRS, ResourceTemplate()
> {
> GpioIo(Exclusive, PullDefault, 0, 0, IoRestrictionNone,
>"\\_SB.PCI0.RP01.GPIO", 0, ResourceConsumer)
> { 0, 1, 2, 3, 4 }
> })
> Name (_DSD, Package ()
> {
> ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> Package () {
> Package () { "compatible", "sff,sfp" },
> Package () { "i2c-bus", \_SB.PCI0.RP01.I2C.MUX.CH0 },
> Package () { "maximum-power-milliwatt", 1000 },
> Package () { "tx-disable-gpios", Package () { ^SFP0, 0, 0, 1} },
> Package () { "reset-gpio",   Package () { ^SFP0, 0, 1, 1} },
> Package () { "mod-def0-gpios",   Package () { ^SFP0, 0, 2, 1} },
> Package () { "tx-fault-gpios",   Package () { ^SFP0, 0, 3, 0} },
> Package () { "los-gpios",Package () { ^SFP0, 0, 4, 1} },
> },
> })
> }
> 
> Device (PHY0)
> {
> Name (_HID, "PRP0001")
> Name (_DSD, Package ()
> {
> ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> Package () {
> Package () { "compatible", "ethernet-phy-ieee802.3-c45" },
> Package () { "sfp", \_SB.PCI0.RP01.SFP0 },
> Package () { "managed", "in-band-status" },
> Package () { "phy-mode", "sgmii" },
> },
> })
> }
> 
> Signed-off-by: Ruslan Babayev 
> Cc: xe-linux-exter...@cisco.com

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH 1/2] open: add close_range()

2019-05-24 Thread Michael Tirado
What I do in ring=non-supervisor is close all fds while
checking against an array of exemptions. If /proc is not
mounted I close everything up to RLIMIT_NOFILE, and if that
fails I use a dumb loop to close everything (slooow). This
new system call could significantly increase the fallback
code, but if you use a range then you may have to call it in
batches, depending on the fd number sequence?

Here's what it looks like in practice:

   int exempt[] = { STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO };
   if (close_descriptors(exempt, 3))
  return -1;
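
A batched fallback along those lines might look like the sketch below
(assumptions: the syscall lands with this signature, the syscall number is
a placeholder, and exempt[] is sorted ascending):

#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_close_range
#define __NR_close_range 436	/* placeholder until headers catch up */
#endif

/* Close every fd except the sorted exemptions, one range per gap. */
static int close_all_but(const int *exempt, unsigned int n)
{
	unsigned int lo = 0, i;

	for (i = 0; i < n; i++) {
		if ((unsigned int)exempt[i] > lo &&
		    syscall(__NR_close_range, lo, exempt[i] - 1, 0) < 0)
			return -1;
		lo = exempt[i] + 1;
	}
	return syscall(__NR_close_range, lo, ~0U, 0) < 0 ? -1 : 0;
}

With the usual { 0, 1, 2 } exemptions this degenerates to the single
close_range(3, ~0U) call from the patch description below, so the batching
only costs extra calls when the exempt fds are scattered.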

On Tue, May 21, 2019 at 11:41 AM Christian Brauner  wrote:
>
> This adds the close_range() syscall. It allows to efficiently close a range
> of file descriptors up to all file descriptors of a calling task.
>
> The syscall came up in a recent discussion around the new mount API and
> making new file descriptor types cloexec by default. During this
> discussion, Al suggested the close_range() syscall (cf. [1]). Note, a
> syscall in this manner has been requested by various people over time.
>
> First, it helps to close all file descriptors of an exec()ing task. This
> can be done safely via (quoting Al's example from [1] verbatim):
>
> /* that exec is sensitive */
> unshare(CLONE_FILES);
> /* we don't want anything past stderr here */
> close_range(3, ~0U);
> execve();
>
> The code snippet above is one way of working around the problem that file
> descriptors are not cloexec by default. This is aggravated by the fact that
> we can't just switch them over without massively regressing userspace. For
> a whole class of programs having an in-kernel method of closing all file
> descriptors is very helpful (e.g. demons, service managers, programming
> language standard libraries, container managers etc.).
> (Please note, unshare(CLONE_FILES) should only be needed if the calling
>  task is multi-threaded and shares the file descriptor table with another
>  thread in which case two threads could race with one thread allocating
>  file descriptors and the other one closing them via close_range(). For the
>  general case close_range() before the execve() is sufficient.)
>
> Second, it allows userspace to avoid implementing closing all file
> descriptors by parsing through /proc//fd/* and calling close() on each
> file descriptor. From looking at various large(ish) userspace code bases
> this or similar patterns are very common in:
> - service managers (cf. [4])
> - libcs (cf. [6])
> - container runtimes (cf. [5])
> - programming language runtimes/standard libraries
>   - Python (cf. [2])
>   - Rust (cf. [7], [8])
> As Dmitry pointed out there's even a long-standing glibc bug about missing
> kernel support for this task (cf. [3]).
> In addition, the syscall will also work for tasks that do not have procfs
> mounted and on kernels that do not have procfs support compiled in. In such
> situations the only way to make sure that all file descriptors are closed
> is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE,
> OPEN_MAX trickery (cf. comment [8] on Rust).
>
> The performance is striking. For good measure, comparing the following
> simple close_all_fds() userspace implementation that is essentially just
> glibc's version in [6]:
>
> static int close_all_fds(void)
> {
> DIR *dir;
> struct dirent *direntp;
>
> dir = opendir("/proc/self/fd");
> if (!dir)
> return -1;
>
> while ((direntp = readdir(dir))) {
> int fd;
> if (strcmp(direntp->d_name, ".") == 0)
> continue;
> if (strcmp(direntp->d_name, "..") == 0)
> continue;
> fd = atoi(direntp->d_name);
> if (fd == 0 || fd == 1 || fd == 2)
> continue;
> close(fd);
> }
>
> closedir(dir); /* cannot fail */
> return 0;
> }
>
> to close_range() yields:
> 1. closing 4 open files:
>- close_all_fds(): ~280 us
>- close_range():~24 us
>
> 2. closing 1000 open files:
>- close_all_fds(): ~5000 us
>- close_range():   ~800 us
>
> close_range() is designed to allow for some flexibility. Specifically, it
> does not simply always close all open file descriptors of a task. Instead,
> callers can specify an upper bound.
> This is e.g. useful for scenarios where specific file descriptors are
> created with well-known numbers that are supposed to be excluded from
> getting closed.
> For extra paranoia close_range() comes with a flags argument. This can e.g.
> be used to implement extensions. One can imagine userspace wanting to stop
> at the first error instead of ignoring errors under certain circumstances.
> There might be other valid ideas in the future. In any case, a flag
> argument doesn't hurt and keeps us on the safe side.
>
> From an implementation side this is kept rather dumb. It saw some input
> from David and Jann but all 

clk/clk-next boot bisection: v5.2-rc1-4-gf191a146bcee on meson-g12a-x96-max

2019-05-24 Thread kernelci.org bot
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has  *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.  *
* Hope this helps!  *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

clk/clk-next boot bisection: v5.2-rc1-4-gf191a146bcee on meson-g12a-x96-max

Summary:
  Start:  f191a146bcee Merge branch 'clk-fixes' into clk-next
  Details:https://kernelci.org/boot/id/5ce8391259b514c80a7a362c
  Plain log:  
https://storage.kernelci.org//clk/clk-next/v5.2-rc1-4-gf191a146bcee/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-baylibre/boot-meson-g12a-x96-max.txt
  HTML log:   
https://storage.kernelci.org//clk/clk-next/v5.2-rc1-4-gf191a146bcee/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-baylibre/boot-meson-g12a-x96-max.html
  Result: 11a7bea17c9e arm64: dts: meson: g12a: add pinctrl support 
controllers

Checks:
  revert: PASS
  verify: PASS

Parameters:
  Tree:   clk
  URL:https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
  Branch: clk-next
  Target: meson-g12a-x96-max
  CPU arch:   arm64
  Lab:lab-baylibre
  Compiler:   gcc-8
  Config: defconfig+CONFIG_RANDOMIZE_BASE=y
  Test suite: boot

Breaking commit found:

---
commit 11a7bea17c9e0a36daab934d83e15a760f402147
Author: Jerome Brunet 
Date:   Mon Mar 18 10:58:45 2019 +0100

arm64: dts: meson: g12a: add pinctrl support controllers

Add the peripheral and always-on pinctrl controllers to the g12a soc.

Signed-off-by: Jerome Brunet 
Signed-off-by: Neil Armstrong 
Signed-off-by: Kevin Hilman 

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
index abfa167751af..5e07e4ca3f4b 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
@@ -104,6 +104,29 @@
#address-cells = <2>;
#size-cells = <2>;
ranges = <0x0 0x0 0x0 0x34400 0x0 0x400>;
+
+   periphs_pinctrl: pinctrl@40 {
+   compatible = 
"amlogic,meson-g12a-periphs-pinctrl";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   gpio: bank@40 {
+   reg = <0x0 0x40  0x0 0x4c>,
+ <0x0 0xe8  0x0 0x18>,
+ <0x0 0x120 0x0 0x18>,
+ <0x0 0x2c0 0x0 0x40>,
+ <0x0 0x340 0x0 0x1c>;
+   reg-names = "gpio",
+   "pull",
+   "pull-enable",
+   "mux",
+   "ds";
+   gpio-controller;
+   #gpio-cells = <2>;
+   gpio-ranges = <&periphs_pinctrl 0 0 86>;
+   };
+   };
};
 
hiu: bus@3c000 {
@@ -150,6 +173,25 @@
	clocks = <&xtal>, <&clkc CLKID_CLK81>;
clock-names = "xtal", "mpeg-clk";
};
+
+   ao_pinctrl: pinctrl@14 {
+   compatible = 
"amlogic,meson-g12a-aobus-pinctrl";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   gpio_ao: bank@14 {
+   reg = <0x0 0x14 0x0 0x8>,
+ <0x0 0x1c 0x0 0x8>,
+ <0x0 0x24 0x0 0x14>;
+   reg-names = "mux",
+   "ds",
+   "gpio";
+   gpio-controller;
+   #gpio-cells = <2>;
+  

Re: [PATCH] printk: Monitor change of console loglevel.

2019-05-24 Thread Joe Perches
On Sat, 2019-05-25 at 09:14 +0900, Tetsuo Handa wrote:
> On 2019/05/25 2:17, Linus Torvalds wrote:
> > A config option or two that help syzbot doesn't sound like a bad idea to me.
> 
> Thanks for suggestion. I think that #ifdef'ing
> 
>   static bool suppress_message_printing(int level)
>   {
>   return (level >= console_loglevel && !ignore_loglevel);
>   }
> 
> is simpler.
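
(A minimal sketch of that #ifdef'ing, with an invented config symbol:)

static bool suppress_message_printing(int level)
{
#ifdef CONFIG_PRINTK_NEVER_SUPPRESS	/* hypothetical option for fuzzing setups */
	return false;			/* print everything, loglevel is moot */
#else
	return (level >= console_loglevel && !ignore_loglevel);
#endif
}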
[]
> On 2019/05/25 2:55, Linus Torvalds wrote:
> > On Fri, May 24, 2019 at 10:41 AM Joe Perches  wrote:
> > > That could also help eliminate unnecessary pr_ output
> > > from object code.
> > 
> > Indeed. The small-config people might like it (if they haven't already
> > given up..)
> 
> Do you mean doing e.g.
> 
>   #define pr_debug(fmt, ...) no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
> 
> depending on the minimal console loglevel kernel config option? Then, OK.

Yes.

Perhaps something like the below (or an equivalent generic wrapper)

#define pr_info(fmt, ...) \
do { \
if (CONFIG_STATIC_CONSOLE_LEVEL >= LOGLEVEL_INFO) \
printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__); \
else \
no_printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__); \
} while (0)

for each pr_, dev_ and netdev_
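
The equivalent generic wrapper could be (same hypothetical
CONFIG_STATIC_CONSOLE_LEVEL as above):

#define printk_static(klevel, loglevel, fmt, ...)			\
do {									\
	if (CONFIG_STATIC_CONSOLE_LEVEL >= (loglevel))			\
		printk(klevel pr_fmt(fmt), ##__VA_ARGS__);		\
	else								\
		no_printk(klevel pr_fmt(fmt), ##__VA_ARGS__);		\
} while (0)

#define pr_err(fmt, ...)  printk_static(KERN_ERR, LOGLEVEL_ERR, fmt, ##__VA_ARGS__)
#define pr_warn(fmt, ...) printk_static(KERN_WARNING, LOGLEVEL_WARNING, fmt, ##__VA_ARGS__)
#define pr_info(fmt, ...) printk_static(KERN_INFO, LOGLEVEL_INFO, fmt, ##__VA_ARGS__)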




Re: [PATCH 2/2] powerpc/perf: Fix mmcra corruption by bhrb_filter

2019-05-24 Thread Michael Ellerman
On Sat, 2019-05-11 at 02:42:17 UTC, Ravi Bangoria wrote:
> Consider a scenario where user creates two events:
> 
>   1st event:
> attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
> fd = perf_event_open(&attr, 0, 1, -1, 0);
> 
>   This sets cpuhw->bhrb_filter to 0 and returns valid fd.
> 
>   2nd event:
> attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
> fd = perf_event_open(&attr, 0, 1, -1, 0);
> 
>   It overrides cpuhw->bhrb_filter to -1 and returns with error.
> 
> Now if power_pmu_enable() gets called by any path other than
> power_pmu_add(), ppmu->config_bhrb(-1) will set mmcra to -1.
> 
> Signed-off-by: Ravi Bangoria 
> Reviewed-by: Madhavan Srinivasan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/3202e35ec1c8fc19cea24253ff83edf7

cheers


Re: [GIT PULL] SCSI fixes for 5.2-rc1

2019-05-24 Thread pr-tracker-bot
The pull request you sent on Fri, 24 May 2019 16:11:43 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/2409207a73cc8e4aff75ceccf6fe5c3ce4d391bc

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[PATCH net-next v2 2/2] net: phy: sfp: enable i2c-bus detection on ACPI based systems

2019-05-24 Thread Ruslan Babayev
Lookup I2C adapter using the "i2c-bus" device property on ACPI based
systems similar to how it's done with DT.

An example DSD describing an SFP on an ACPI based system:

Device (SFP0)
{
Name (_HID, "PRP0001")
Name (_CRS, ResourceTemplate()
{
GpioIo(Exclusive, PullDefault, 0, 0, IoRestrictionNone,
   "\\_SB.PCI0.RP01.GPIO", 0, ResourceConsumer)
{ 0, 1, 2, 3, 4 }
})
Name (_DSD, Package ()
{
ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () {
Package () { "compatible", "sff,sfp" },
Package () { "i2c-bus", \_SB.PCI0.RP01.I2C.MUX.CH0 },
Package () { "maximum-power-milliwatt", 1000 },
Package () { "tx-disable-gpios", Package () { ^SFP0, 0, 0, 1} },
Package () { "reset-gpio",   Package () { ^SFP0, 0, 1, 1} },
Package () { "mod-def0-gpios",   Package () { ^SFP0, 0, 2, 1} },
Package () { "tx-fault-gpios",   Package () { ^SFP0, 0, 3, 0} },
Package () { "los-gpios",Package () { ^SFP0, 0, 4, 1} },
},
})
}

Device (PHY0)
{
Name (_HID, "PRP0001")
Name (_DSD, Package ()
{
ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () {
Package () { "compatible", "ethernet-phy-ieee802.3-c45" },
Package () { "sfp", \_SB.PCI0.RP01.SFP0 },
Package () { "managed", "in-band-status" },
Package () { "phy-mode", "sgmii" },
},
})
}

Signed-off-by: Ruslan Babayev 
Cc: xe-linux-exter...@cisco.com
---
 drivers/net/phy/sfp.c | 33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index d4635c2178d1..7a6c8df8899b 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include <linux/acpi.h>
 #include 
 #include 
 #include 
@@ -1783,6 +1784,7 @@ static int sfp_probe(struct platform_device *pdev)
 {
const struct sff_data *sff;
struct sfp *sfp;
+   struct i2c_adapter *i2c = NULL;
bool poll = false;
int irq, err, i;
 
@@ -1801,7 +1803,6 @@ static int sfp_probe(struct platform_device *pdev)
if (pdev->dev.of_node) {
struct device_node *node = pdev->dev.of_node;
const struct of_device_id *id;
-   struct i2c_adapter *i2c;
struct device_node *np;
 
id = of_match_node(sfp_of_match, node);
@@ -1818,14 +1819,30 @@ static int sfp_probe(struct platform_device *pdev)
 
i2c = of_find_i2c_adapter_by_node(np);
of_node_put(np);
-   if (!i2c)
-   return -EPROBE_DEFER;
-
-   err = sfp_i2c_configure(sfp, i2c);
-   if (err < 0) {
-   i2c_put_adapter(i2c);
-   return err;
+   } else if (ACPI_COMPANION(&pdev->dev)) {
+   struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
+   struct fwnode_handle *fw = acpi_fwnode_handle(adev);
+   struct fwnode_reference_args args;
+   struct acpi_handle *acpi_handle;
+   int ret;
+
+   ret = acpi_node_get_property_reference(fw, "i2c-bus", 0, &args);
+   if (ACPI_FAILURE(ret) || !is_acpi_device_node(args.fwnode)) {
+   dev_err(&pdev->dev, "missing 'i2c-bus' property\n");
+   return -ENODEV;
}
+
+   acpi_handle = ACPI_HANDLE_FWNODE(args.fwnode);
+   i2c = i2c_acpi_find_adapter_by_handle(acpi_handle);
+   }
+
+   if (!i2c)
+   return -EPROBE_DEFER;
+
+   err = sfp_i2c_configure(sfp, i2c);
+   if (err < 0) {
+   i2c_put_adapter(i2c);
+   return err;
}
 
for (i = 0; i < GPIO_MAX; i++)
-- 
2.17.1



[PATCH net-next v2 1/2] i2c: acpi: export i2c_acpi_find_adapter_by_handle

2019-05-24 Thread Ruslan Babayev
This allows drivers to look up i2c adapters on ACPI based systems, similar to
of_get_i2c_adapter_by_node() on DT based systems.

Signed-off-by: Ruslan Babayev 
Cc: xe-linux-exter...@cisco.com
---
 drivers/i2c/i2c-core-acpi.c | 3 ++-
 include/linux/i2c.h | 6 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index 272800692088..964687534754 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -337,7 +337,7 @@ static int i2c_acpi_find_match_device(struct device *dev, 
void *data)
return ACPI_COMPANION(dev) == data;
 }
 
-static struct i2c_adapter *i2c_acpi_find_adapter_by_handle(acpi_handle handle)
+struct i2c_adapter *i2c_acpi_find_adapter_by_handle(acpi_handle handle)
 {
struct device *dev;
 
@@ -345,6 +345,7 @@ static struct i2c_adapter 
*i2c_acpi_find_adapter_by_handle(acpi_handle handle)
  i2c_acpi_find_match_adapter);
return dev ? i2c_verify_adapter(dev) : NULL;
 }
+EXPORT_SYMBOL_GPL(i2c_acpi_find_adapter_by_handle);
 
 static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device 
*adev)
 {
diff --git a/include/linux/i2c.h b/include/linux/i2c.h
index 1308126fc384..78f7d39ea5bc 100644
--- a/include/linux/i2c.h
+++ b/include/linux/i2c.h
@@ -21,6 +21,7 @@
 #include 
 #include <linux/irqdomain.h>	/* for Host Notify IRQ */
 #include <linux/of.h>	/* for struct device_node */
+#include <linux/acpi.h>	/* for acpi_handle */
 #include <linux/swab.h>	/* for swab16 */
 #include 
 
@@ -981,6 +982,7 @@ bool i2c_acpi_get_i2c_resource(struct acpi_resource *ares,
 u32 i2c_acpi_find_bus_speed(struct device *dev);
 struct i2c_client *i2c_acpi_new_device(struct device *dev, int index,
   struct i2c_board_info *info);
+struct i2c_adapter *i2c_acpi_find_adapter_by_handle(acpi_handle handle);
 #else
 static inline bool i2c_acpi_get_i2c_resource(struct acpi_resource *ares,
 struct acpi_resource_i2c_serialbus 
**i2c)
@@ -996,6 +998,10 @@ static inline struct i2c_client 
*i2c_acpi_new_device(struct device *dev,
 {
return NULL;
 }
+struct i2c_adapter *i2c_acpi_find_adapter_by_handle(acpi_handle handle)
+{
+   return NULL;
+}
 #endif /* CONFIG_ACPI */
 
 #endif /* _LINUX_I2C_H */
-- 
2.17.1



Re: [PATCH] dt-bindings: iio: adc: add adi,ad7780.yaml binding

2019-05-24 Thread Renato Lui Geh

Hi Jonathan, Alex,

Thanks for the review. Some comments inline.

Thanks,
Renato

On 05/20, Ardelean, Alexandru wrote:

On Sun, 2019-05-19 at 12:32 +0100, Jonathan Cameron wrote:

[External]


On Sat, 18 May 2019 19:41:12 -0300
Renato Lui Geh  wrote:

> This patch adds a YAML binding for the Analog Devices AD7780/1 and
> AD7170/1 analog-to-digital converters.
>
> Signed-off-by: Renato Lui Geh 

One comment inline.  I'll also be needing an ack from Analog on this,
preferably Michael's.

Thanks,

Jonathan
> ---
>  .../bindings/iio/adc/adi,ad7780.txt   | 48 ---
>  .../bindings/iio/adc/adi,ad7780.yaml  | 85 +++


You should also update the MAINTAINERS file.
Maybe in a following patch.
It looks like there is no entry in there, so maybe you need to add a new
one.

Something like:


ANALOG DEVICES INC AD7780 DRIVER
M:  Michael Hennerich 
M:  Renato Lui Geh 
L:  linux-...@vger.kernel.org
W:  http://ez.analog.com/community/linux-device-drivers
S:  Supported
F:  drivers/iio/adc/ad7780.c
F:  Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml

This should be after this block
ANALOG DEVICES INC AD7768-1 DRIVER

Note that I added you as a co-maintainer.
If you want, you do not need to add that line.


>  2 files changed, 85 insertions(+), 48 deletions(-)
>  delete mode 100644
> Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt
>  create mode 100644
> Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
>
> diff --git a/Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt
> b/Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt
> deleted file mode 100644
> index 440e52555349..
> --- a/Documentation/devicetree/bindings/iio/adc/adi,ad7780.txt
> +++ /dev/null
> @@ -1,48 +0,0 @@
> -* Analog Devices AD7170/AD7171/AD7780/AD7781
> -
> -Data sheets:
> -
> -- AD7170:
> - *
> https://www.analog.com/media/en/technical-documentation/data-sheets/AD7170.pdf
> -- AD7171:
> - *
> https://www.analog.com/media/en/technical-documentation/data-sheets/AD7171.pdf
> -- AD7780:
> - *
> https://www.analog.com/media/en/technical-documentation/data-sheets/ad7780.pdf
> -- AD7781:
> - *
> https://www.analog.com/media/en/technical-documentation/data-sheets/AD7781.pdf
> -
> -Required properties:
> -
> -- compatible: should be one of
> - * "adi,ad7170"
> - * "adi,ad7171"
> - * "adi,ad7780"
> - * "adi,ad7781"
> -- reg: spi chip select number for the device
> -- vref-supply: the regulator supply for the ADC reference voltage
> -
> -Optional properties:
> -
> -- powerdown-gpios:  must be the device tree identifier of the PDRST
> pin. If
> - specified, it will be asserted during driver probe.
> As the
> - line is active high, it should be marked
> GPIO_ACTIVE_HIGH.
> -- adi,gain-gpios:   must be the device tree identifier of the GAIN
> pin. Only for
> - the ad778x chips. If specified, it will be asserted
> during
> - driver probe. As the line is active low, it should be
> marked
> - GPIO_ACTIVE_LOW.
> -- adi,filter-gpios: must be the device tree identifier of the FILTER
> pin. Only
> - for the ad778x chips. If specified, it will be
> asserted
> - during driver probe. As the line is active low, it
> should be
> - marked GPIO_ACTIVE_LOW.
> -
> -Example:
> -
> -adc@0 {
> - compatible =  "adi,ad7780";
> - reg = <0>;
> - vref-supply = <&vdd_supply>
> -
> - powerdown-gpios  = <&gpio0 12 GPIO_ACTIVE_HIGH>;
> - adi,gain-gpios   = <&gpio1  5 GPIO_ACTIVE_LOW>;
> - adi,filter-gpios = <&gpio2 15 GPIO_ACTIVE_LOW>;
> -};
> diff --git a/Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
> b/Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
> new file mode 100644
> index ..931bc4f8ec04
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
> @@ -0,0 +1,85 @@
> +# SPDX-License-Identifier: GPL-2.0
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iio/adc/adi,ad7780.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Analog Devices AD7170/AD7171/AD7780/AD7781 analog to digital
> converters
> +
> +maintainers:
> +  - Michael Hennerich 
> +
> +description: |
> +  The ad7780 is a sigma-delta analog to digital converter. This driver
> provides
> +  reading voltage values and status bits from both the ad778x and
> ad717x series.
> +  Its interface also allows writing on the FILTER and GAIN GPIO pins
> on the
> +  ad778x.
> +
> +  Specifications on the converters can be found at:
> +AD7170:
> +
> https://www.analog.com/media/en/technical-documentation/data-sheets/AD7170.pdf
> +AD7171:
> +
> https://www.analog.com/media/en/technical-documentation/data-sheets/AD7171.pdf
> +AD7780:
> +
> https://www.analog.com/media/en/technical-documentation/data-sheets/ad7780.pdf
> +AD7781:
> +
> 

Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-05-24 Thread Frank Rowand
On 5/24/19 5:22 PM, Frank Rowand wrote:
> On 5/24/19 2:53 PM, Saravana Kannan wrote:
>> On Fri, May 24, 2019 at 10:49 AM Frank Rowand  wrote:
>>>
>>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> 
> < snip >
> 
>>> Another flaw with this method is that existing device trees
>>> will be broken after the kernel is modified, because existing
>>> device trees do not have the depends-on property.  This breaks
>>> the devicetree compatibility rules.
>>
>> This is 100% not true with the current implementation. I actually
>> tested this. This is fully backwards compatible. That's another reason
>> for adding depends-on and going by just what it says. The existing
>> bindings were never meant to describe only mandatory dependencies. So
>> using them as such is what would break backwards compatibility.
> 
> Are you saying that an existing, already compiled, devicetree (an FDT)
> can be used to boot a new kernel that has implemented this patch set?
> 
> The new kernel will boot with the existing FDT that does not have
> any depends-on properties?

I overlooked something you said in the email I replied to.  You said:

   "that depends-on becomes the source of truth if it exists and falls
   back to existing common bindings if "depends-on" isn't present"

Let me go back to look at the patch series to see how it falls back
to the existing bindings.

> 
> -Frank
> 



Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-05-24 Thread Frank Rowand
On 5/24/19 2:53 PM, Saravana Kannan wrote:
> On Fri, May 24, 2019 at 10:49 AM Frank Rowand  wrote:
>>
>> On 5/23/19 6:01 PM, Saravana Kannan wrote:

< snip >

>> Another flaw with this method is that existing device trees
>> will be broken after the kernel is modified, because existing
>> device trees do not have the depends-on property.  This breaks
>> the devicetree compatibility rules.
> 
> This is 100% not true with the current implementation. I actually
> tested this. This is fully backwards compatible. That's another reason
> for adding depends-on and going by just what it says. The existing
> bindings were never meant to describe only mandatory dependencies. So
> using them as such is what would break backwards compatibility.

Are you saying that an existing, already compiled, devicetree (an FDT)
can be used to boot a new kernel that has implemented this patch set?

The new kernel will boot with the existing FDT that does not have
any depends-on properties?

-Frank


Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-05-24 Thread Frank Rowand
Hi Saravana,

On 5/24/19 2:53 PM, Saravana Kannan wrote:
> On Fri, May 24, 2019 at 10:49 AM Frank Rowand  wrote:
>>
>> On 5/23/19 6:01 PM, Saravana Kannan wrote:

< snip >

> 
> -Saravana
> 

There were several different topics in your email.  I am going to do
separate replies for different topics so that each topic is contained
in a single sub-thread instead of possibly having a many-topic
sub-thread where any of the topics might get lost.

If I drop any key context out of any of the replies, please feel
free to add it back in.

-Frank


Re: [PATCH] printk: Monitor change of console loglevel.

2019-05-24 Thread Tetsuo Handa
On 2019/05/25 2:17, Linus Torvalds wrote:
> A config option or two that help syzbot doesn't sound like a bad idea to me.

Thanks for suggestion. I think that #ifdef'ing

  static bool suppress_message_printing(int level)
  {
return (level >= console_loglevel && !ignore_loglevel);
  }

is simpler. If the cause of unexpected change of console loglevel
turns out to be syz_execute_func(), we will want a config option
which controls suppress_message_printing() for syzbot. That option
would also be used for guarding printk("WARNING:" ...) users.
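
As a concrete sketch of that #ifdef (the option name below is made up for
illustration; nothing here exists in the tree yet):

	static bool suppress_message_printing(int level)
	{
	#ifdef CONFIG_PRINTK_ALWAYS_PRINT	/* hypothetical option */
		/* Fuzzers want every message regardless of console loglevel. */
		return false;
	#else
		return (level >= console_loglevel && !ignore_loglevel);
	#endif
	}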

Well, syzbot does not want to use ignore_loglevel kernel command
line option because that option would generate too much output...

  
https://lkml.kernel.org/r/cact4y+ay7nut-7y2jarozv1s0visuldn6vt+w9oseds1peb...@mail.gmail.com



On 2019/05/25 2:55, Linus Torvalds wrote:
> On Fri, May 24, 2019 at 10:41 AM Joe Perches  wrote:
> >
> > That could also help eliminate unnecessary pr_ output
> > from object code.
> 
> Indeed. The small-config people might like it (if they haven't already
> given up..)

Do you mean doing e.g.

  #define pr_debug(fmt, ...) no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)

depending on the minimal console loglevel kernel config option? Then, OK.
But callers using e.g. printk(KERN_DEBUG ...) and printk(KERN_SOH "%u" ...)
will remain unfiltered...



Re: [PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()

2019-05-24 Thread Frank Rowand
On 5/24/19 11:21 AM, Saravana Kannan wrote:
> On Fri, May 24, 2019 at 10:56 AM Frank Rowand  wrote:
>>
>> Hi Saravana,
>>
>> I'm not reviewing patches 1-5 in any detail, given my reply to patch 0.
>>
>> But I had already skimmed through this patch before I received the
>> email for patch 0, so I want to make one generic comment below,
>> to give some feedback as you continue thinking through possible
>> implementations to solve the underlying problems.
> 
> Appreciate the feedback Frank!
> 
>>
>>
>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
>>> Add a pointer from device tree node to the device created from it.
>>> This allows us to find the device corresponding to a device tree node
>>> without having to loop through all the platform devices.
>>>
>>> However, fallback to looping through the platform devices to handle
>>> any devices that might set their own of_node.
>>>
>>> Signed-off-by: Saravana Kannan 
>>> ---
>>>  drivers/of/platform.c | 20 +++-
>>>  include/linux/of.h|  3 +++
>>>  2 files changed, 22 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
>>> index 04ad312fd85b..1115a8d80a33 100644
>>> --- a/drivers/of/platform.c
>>> +++ b/drivers/of/platform.c
>>> @@ -42,6 +42,8 @@ static int of_dev_node_match(struct device *dev, void 
>>> *data)
>>>   return dev->of_node == data;
>>>  }
>>>
>>> +static DEFINE_SPINLOCK(of_dev_lock);
>>> +
>>>  /**
>>>   * of_find_device_by_node - Find the platform_device associated with a node
>>>   * @np: Pointer to device tree node
>>> @@ -55,7 +57,18 @@ struct platform_device *of_find_device_by_node(struct 
>>> device_node *np)
>>>  {
>>>   struct device *dev;
>>>
>>> - dev = bus_find_device(&platform_bus_type, NULL, np, 
>>> of_dev_node_match);
>>> + /*
>>> +  * Spinlock needed to make sure np->dev doesn't get freed between NULL
>>> +  * check inside and kref count increment inside get_device(). This is
>>> +  * achieved by grabbing the spinlock before setting np->dev = NULL in
>>> +  * of_platform_device_destroy().
>>> +  */
>>> + spin_lock(&of_dev_lock);
>>> + dev = get_device(np->dev);
>>> + spin_unlock(&of_dev_lock);
>>> + if (!dev)
>>> + dev = bus_find_device(&platform_bus_type, NULL, np,
>>> +   of_dev_node_match);
>>>   return dev ? to_platform_device(dev) : NULL;
>>>  }
>>>  EXPORT_SYMBOL(of_find_device_by_node);
>>> @@ -196,6 +209,7 @@ static struct platform_device 
>>> *of_platform_device_create_pdata(
>>>   platform_device_put(dev);
>>>   goto err_clear_flag;
>>>   }
>>> + np->dev = &dev->dev;
>>>
>>>   return dev;
>>>
>>> @@ -556,6 +570,10 @@ int of_platform_device_destroy(struct device *dev, 
>>> void *data)
>>>   if (of_node_check_flag(dev->of_node, OF_POPULATED_BUS))
>>>   device_for_each_child(dev, NULL, of_platform_device_destroy);
>>>
>>> + /* Spinlock is needed for of_find_device_by_node() to work */
>>> + spin_lock(&of_dev_lock);
>>> + dev->of_node->dev = NULL;
>>> + spin_unlock(&of_dev_lock);
>>>   of_node_clear_flag(dev->of_node, OF_POPULATED);
>>>   of_node_clear_flag(dev->of_node, OF_POPULATED_BUS);
>>>
>>> diff --git a/include/linux/of.h b/include/linux/of.h
>>> index 0cf857012f11..f2b4912cbca1 100644
>>> --- a/include/linux/of.h
>>> +++ b/include/linux/of.h
>>> @@ -48,6 +48,8 @@ struct property {
>>>  struct of_irq_controller;
>>>  #endif
>>>
>>> +struct device;
>>> +
>>>  struct device_node {
>>>   const char *name;
>>>   phandle phandle;
>>> @@ -68,6 +70,7 @@ struct device_node {
>>>   unsigned int unique_id;
>>>   struct of_irq_controller *irq_trans;
>>>  #endif
>>> + struct device *dev; /* Device created from this node */
>>
>> We have actively been working on shrinking the size of struct device_node,
>> as part of reducing the devicetree memory usage.  As such, we need strong
>> justification for adding anything to this struct.  For example, proof that
>> there is a performance problem that can only be solved by increasing the
>> memory usage.
> 
> I didn't mean for people to focus on the deferred probe optimization.

I was speaking specifically of the of_find_device_by_node() optimization.
I did not chase any further back in the call chain to see how that would
impact anything else.  My comments stand, whether this patch is meant
to optimize deferred probe optimization or to optimize something else.


> In reality that was just a added side benefit of this series. The main
> problem to solve is that of suppliers having to know when all their
> consumers are up and managing the resources actively, especially in a
> system with loadable modules where we can't depend on the driver to
> notify the supplier because the consumer driver module might not be
> available or loaded until much later.
> 
> Having said that, I'm not saying we should go around and waste space
> willy-nilly. But, isn't the memory 

Re: [PATCH v2 1/3] kselftest/cgroup: fix unexpected testing failure on test_memcontrol

2019-05-24 Thread shuah

On 5/24/19 3:44 PM, shuah wrote:

On 5/24/19 3:40 PM, Tejun Heo wrote:

Hello,

All three patches look good to me.  Please feel free to add my
acked-by.  Shuah, should I route these through cgroup tree or would
the kselftest tree be a better fit?

Thanks.




Tejun, I can take them through kselftest tree.



Alex,

patches 1/3 and 2/3 failed checkpatch. Could you please fix the warnings
and send v3? Go ahead and send v3 for all 3 patches.

thanks,
-- Shuah



[PATCH v3 1/5] x86/cpufeatures: Enumerate user wait instructions

2019-05-24 Thread Fenghua Yu
umonitor, umwait, and tpause are a set of user wait instructions.

umonitor arms address monitoring hardware using an address. The
address range is determined by using CPUID.0x5. A store to
an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

umwait instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1 state) or an improved power/performance optimized state
(C0.2 state).

tpause instructs the processor to enter an implementation-dependent
optimized state C0.1 or C0.2 state and wake up when time-stamp counter
reaches specified timeout.

The three instructions may be executed at any privilege level.

The instructions provide a power-saving method of waiting in
user space. Additionally, they can allow a sibling hyperthread to
make faster progress while this thread is waiting. One example of an
application usage of umwait is when waiting for input data from another
application, such as a user level multi-threaded packet processing
engine.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

Detailed information on the instructions and CPUID feature WAITPKG flag
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference and Intel 64 and IA-32
Architectures Software Developer's Manual.
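
For illustration (not part of this patch), user space can probe the same
CPUID bit before using the instructions; this sketch assumes the
__get_cpuid_count() helper from GCC's <cpuid.h>:

	#include <cpuid.h>

	/* Return 1 if CPUID.0x07.0x0:ECX[5] (WAITPKG) is set. */
	static int has_waitpkg(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
			return 0;
		return (ecx >> 5) & 1;
	}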

Signed-off-by: Fenghua Yu 
Reviewed-by: Ashok Raj 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..b8bd428ae5bc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -322,6 +322,7 @@
 #define X86_FEATURE_UMIP   (16*32+ 2) /* User Mode Instruction 
Protection */
 #define X86_FEATURE_PKU(16*32+ 3) /* Protection Keys 
for Userspace */
 #define X86_FEATURE_OSPKE  (16*32+ 4) /* OS Protection Keys Enable 
*/
+#define X86_FEATURE_WAITPKG(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE 
Instructions */
 #define X86_FEATURE_AVX512_VBMI2   (16*32+ 6) /* Additional AVX512 Vector 
Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI   (16*32+ 8) /* Galois Field New 
Instructions */
 #define X86_FEATURE_VAES   (16*32+ 9) /* Vector AES */
-- 
2.19.1



[PATCH v3 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state

2019-05-24 Thread Fenghua Yu
C0.2 state in umwait and tpause instructions can be enabled or disabled
on a processor through IA32_UMWAIT_CONTROL MSR register.

By default, C0.2 is enabled and the user wait instructions result in
lower power consumption with slower wakeup time.

But in real-time systems which require a faster wakeup time, even though
the power savings could be smaller, the administrator needs to disable
C0.2, so that all C0.2 requests from user applications revert to C0.1.

A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c0_2" is
created to allow the administrator to control C0.2 state during run time.
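
For illustration only, a minimal user-space sketch that toggles C0.2
through this interface (path taken from the patch; needs root, and the
error handling is trimmed):

	#include <fcntl.h>
	#include <unistd.h>

	/* Write "0" to disable C0.2, "1" to re-enable it. */
	static int set_c0_2(int enable)
	{
		int fd = open("/sys/devices/system/cpu/umwait_control/enable_c0_2",
			      O_WRONLY);

		if (fd < 0)
			return -1;
		if (write(fd, enable ? "1" : "0", 1) != 1) {
			close(fd);
			return -1;
		}
		return close(fd);
	}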

Signed-off-by: Fenghua Yu 
Reviewed-by: Ashok Raj 
Reviewed-by: Tony Luck 
---
 arch/x86/power/umwait.c | 75 ++---
 1 file changed, 71 insertions(+), 4 deletions(-)

diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index 80cc53a9c2d0..cf5de7e1cc24 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -7,6 +7,7 @@
 static bool umwait_c0_2_enabled = true;
 /* Umwait max time is in TSC-quanta. Bits[1:0] are zero. */
static u32 umwait_max_time = 100000;
+static DEFINE_MUTEX(umwait_lock);
 
 /* Return value that will be used to set IA32_UMWAIT_CONTROL MSR */
 static u32 umwait_compute_msr_value(void)
@@ -22,7 +23,7 @@ static u32 umwait_compute_msr_value(void)
   (umwait_max_time & MSR_IA32_UMWAIT_CONTROL_MAX_TIME);
 }
 
-static void umwait_control_msr_update(void)
+static void umwait_control_msr_update(void *unused)
 {
u32 msr_val;
 
@@ -33,7 +34,9 @@ static void umwait_control_msr_update(void)
 /* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
 static int umwait_cpu_online(unsigned int cpu)
 {
-   umwait_control_msr_update();
+   mutex_lock(&umwait_lock);
+   umwait_control_msr_update(NULL);
+   mutex_unlock(&umwait_lock);
 
return 0;
 }
@@ -49,24 +52,88 @@ static int umwait_cpu_online(unsigned int cpu)
  */
 static void umwait_syscore_resume(void)
 {
-   umwait_control_msr_update();
+   /* No need to lock because only BP is running now. */
+   umwait_control_msr_update(NULL);
 }
 
 static struct syscore_ops umwait_syscore_ops = {
.resume = umwait_syscore_resume,
 };
 
+static ssize_t
+enable_c0_2_show(struct device *dev, struct device_attribute *attr,
+char *buf)
+{
+   return sprintf(buf, "%d\n", umwait_c0_2_enabled);
+}
+
+static void umwait_control_msr_update_all_cpus(void)
+{
+   u32 msr_val;
+
+   msr_val = umwait_compute_msr_value();
+   /* All CPUs have same umwait control setting */
+   on_each_cpu(umwait_control_msr_update, NULL, 1);
+}
+
+static ssize_t enable_c0_2_store(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t count)
+{
+   bool c0_2_enabled;
+   int ret;
+
+   ret = kstrtobool(buf, &c0_2_enabled);
+   if (ret)
+   return ret;
+
+   mutex_lock(&umwait_lock);
+
+   if (umwait_c0_2_enabled == c0_2_enabled)
+   goto out_unlock;
+
+   umwait_c0_2_enabled = c0_2_enabled;
+   /* Enable/disable C0.2 state on all CPUs */
+   umwait_control_msr_update_all_cpus();
+
+out_unlock:
+   mutex_unlock(&umwait_lock);
+
+   return count;
+}
+static DEVICE_ATTR_RW(enable_c0_2);
+
+static struct attribute *umwait_attrs[] = {
+   &dev_attr_enable_c0_2.attr,
+   NULL
+};
+
+static struct attribute_group umwait_attr_group = {
+   .attrs = umwait_attrs,
+   .name = "umwait_control",
+};
+
 static int __init umwait_init(void)
 {
+   struct device *dev;
int ret;
 
if (!boot_cpu_has(X86_FEATURE_WAITPKG))
return -ENODEV;
 
+   /* Add umwait control interface. */
+   dev = cpu_subsys.dev_root;
+   ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
+   if (ret)
+   return ret;
+
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
umwait_cpu_online, NULL);
-   if (ret < 0)
+   if (ret < 0) {
+   sysfs_remove_group(&dev->kobj, &umwait_attr_group);
+
return ret;
+   }
 
	register_syscore_ops(&umwait_syscore_ops);
 
-- 
2.19.1



[PATCH v3 0/5] x86/umwait: Enable user wait instructions

2019-05-24 Thread Fenghua Yu
Today, if an application needs to wait for a very short duration,
it has to spin in a loop. Spinloops consume more power and continue
to use execution resources that could hurt their thread siblings in a
core with hyperthreads. The new instructions umonitor, umwait and tpause
offer a low-power alternative for waiting that, at the same time, can
improve the performance of the HT sibling by giving it whatever power
headroom is available. These instructions can be used in both user space
and kernel space.

A new MSR IA32_UMWAIT_CONTROL allows the kernel to set a time limit in
TSC-quanta that prevents user applications from waiting for a long time.
This forces applications to yield the CPU; an application that needs to
wait longer should consider other alternatives.

The processor supports two levels of optimized states: a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state with deeper power saving
and higher exit latency). The above MSR can be used to restrict
entry to C0.2 and then any request for C0.2 will revert to C0.1.

This patch set covers feature discovery, provides initial values for
the MSR, and adds some sysfs control files for the admin to tweak the values
in the MSR if needed.

The sysfs interface files are in /sys/devices/system/cpu/umwait_control/

GCC 9 enables intrinsics for the instructions. To use the instructions,
user applications should include  and be compiled with
-mwaitpkg.
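
For illustration, a minimal user-space sketch of the intrinsics (assumes
GCC 9, <immintrin.h> and compilation with -mwaitpkg; the return value of
_umwait, which reports whether the OS time limit cut the wait short, is
ignored here):

	#include <immintrin.h>
	#include <x86intrin.h>

	/* Wait until *flag becomes non-zero, sleeping in C0.2 if allowed. */
	static void wait_for_flag(volatile int *flag)
	{
		while (!*flag) {
			_umonitor((void *)flag);	/* arm the monitor first */
			if (*flag)			/* re-check to avoid a lost wakeup */
				break;
			/* state 0 requests C0.2; wake at deadline or on a store */
			_umwait(0, __rdtsc() + 100000);
		}
	}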

Detailed information on the instructions, the MSR, and syntax of the
intrinsics can be found in the latest Intel Architecture Instruction
Set Extensions and Future Features Programming Reference and Intel 64
and IA-32 Architectures Software Developer's Manual.

Changelog:
v3:
Address issues pointed out by Andy Lutomirski:
- Change default umwait max time to 100k TSC cycles
- Setting up MSR on BSP during resume suspend/hibernation
- A few other naming and coding changes as suggested
- Some security concerns of the user wait instructions are not issues
of the patches and cannot be addressed in the patch set. They will be
discussed on lkml.

Plus:
- Add ABI document entry for umwait control sysfs interfaces

v2:
- Address comments from Thomas Gleixner and Andy Lutomirski
- Remove vDSO functions
- Add sysfs control file for umwait max time

v1:
Based on comments from Thomas:
- Change user APIs to vDSO functions
- Changed sysfs per comments from Thomas.
- Change patch descriptions etc

Fenghua Yu (5):
  x86/cpufeatures: Enumerate user wait instructions
  x86/umwait: Initialize umwait control values
  x86/umwait: Add sysfs interface to control umwait C0.2 state
  x86/umwait: Add sysfs interface to control umwait maximum time
  x86/umwait: Document umwait control sysfs interfaces

 .../ABI/testing/sysfs-devices-system-cpu  |  21 ++
 arch/x86/include/asm/cpufeatures.h|   1 +
 arch/x86/include/asm/msr-index.h  |   4 +
 arch/x86/power/Makefile   |   1 +
 arch/x86/power/umwait.c   | 179 ++
 5 files changed, 206 insertions(+)
 create mode 100644 arch/x86/power/umwait.c

-- 
2.19.1



[PATCH v3 4/5] x86/umwait: Add sysfs interface to control umwait maximum time

2019-05-24 Thread Fenghua Yu
IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
that processor can stay in C0.1 or C0.2. A zero value means no maximum
time.

Each instruction sets its own deadline in the instruction's implicit
input EDX:EAX value. The instruction wakes up if the time-stamp counter
reaches or exceeds the specified deadline, or the umwait maximum time
expires, or a store happens in the monitored address range in umwait.

Users can write an unsigned 32-bit number to
/sys/devices/system/cpu/umwait_control/max_time to change the default
value. Note that a value of zero means there is no limit. Low order
two bits are ignored.
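
For example, a sketch of the masking this patch applies (mask value taken
from patch 2/5):

	u32 max_time = 100001;

	max_time &= MSR_IA32_UMWAIT_CONTROL_MAX_TIME;	/* 0xfffffffc */
	/* max_time is now 100000: bits [1:0] were simply dropped */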

Signed-off-by: Fenghua Yu 
Reviewed-by: Ashok Raj 
Reviewed-by: Tony Luck 
---
 arch/x86/power/umwait.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index cf5de7e1cc24..61076aad7138 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -103,8 +103,45 @@ static ssize_t enable_c0_2_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(enable_c0_2);
 
+static ssize_t
+max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%u\n", umwait_max_time);
+}
+
+static ssize_t max_time_store(struct device *kobj,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   u32 max_time;
+   int ret;
+
+   ret = kstrtou32(buf, 0, &max_time);
+   if (ret)
+   return ret;
+
+   mutex_lock(&umwait_lock);
+
+   /* Only get max time value from bits[31:2] */
+   max_time &= MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
+   if (umwait_max_time == max_time)
+   goto out_unlock;
+
+   umwait_max_time = max_time;
+
+   /* Update umwait max time on all CPUs */
+   umwait_control_msr_update_all_cpus();
+
+out_unlock:
+   mutex_unlock(&umwait_lock);
+
+   return count;
+}
+static DEVICE_ATTR_RW(max_time);
+
 static struct attribute *umwait_attrs[] = {
	&dev_attr_enable_c0_2.attr,
+   &dev_attr_max_time.attr,
NULL
 };
 
-- 
2.19.1



[PATCH v3 2/5] x86/umwait: Initialize umwait control values

2019-05-24 Thread Fenghua Yu
umwait or tpause allows the processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period
specified by the instruction or until the system time limit or until
a store to the monitored address range in umwait.

IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
on the processor and set maximum time the processor can reside in
C0.1 or C0.2.

By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.

Default maximum umwait time is 100000 cycles. A later patch provides
a sysfs interface to adjust this value.

Signed-off-by: Fenghua Yu 
Reviewed-by: Ashok Raj 
---
 arch/x86/include/asm/msr-index.h |  4 ++
 arch/x86/power/Makefile  |  1 +
 arch/x86/power/umwait.c  | 75 
 3 files changed, 80 insertions(+)
 create mode 100644 arch/x86/power/umwait.c

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..af502e947298 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -61,6 +61,10 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT  31
 #define MSR_PLATFORM_INFO_CPUID_FAULT  
BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL0xe1
+#define MSR_IA32_UMWAIT_CONTROL_C02BIT(0)
+#define MSR_IA32_UMWAIT_CONTROL_MAX_TIME   0xfffffffc
+
 #define MSR_PKG_CST_CONFIG_CONTROL 0x00e2
 #define NHM_C3_AUTO_DEMOTE (1UL << 25)
 #define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index 37923d715741..62e2c609d1fe 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -8,3 +8,4 @@ CFLAGS_cpu.o:= $(nostackp)
 
 obj-$(CONFIG_PM_SLEEP) += cpu.o
 obj-$(CONFIG_HIBERNATION)  += hibernate_$(BITS).o hibernate_asm_$(BITS).o 
hibernate.o
+obj-y  += umwait.o
diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
new file mode 100644
index ..80cc53a9c2d0
--- /dev/null
+++ b/arch/x86/power/umwait.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+
+static bool umwait_c0_2_enabled = true;
+/* Umwait max time is in TSC-quanta. Bits[1:0] are zero. */
+static u32 umwait_max_time = 100000;
+
+/* Return value that will be used to set IA32_UMWAIT_CONTROL MSR */
+static u32 umwait_compute_msr_value(void)
+{
+   /*
+* When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
+* Otherwise, C0.2 is enabled.
+* So the value in bit 0 is opposite of umwait_c0_2_enabled.
+*/
+   u32 umwait_c0_2_disabled = umwait_c0_2_enabled ? 0 : 1;
+
+   return (umwait_c0_2_disabled & MSR_IA32_UMWAIT_CONTROL_C02) |
+  (umwait_max_time & MSR_IA32_UMWAIT_CONTROL_MAX_TIME);
+}
+
+static void umwait_control_msr_update(void)
+{
+   u32 msr_val;
+
+   msr_val = umwait_compute_msr_value();
+   wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
+}
+
+/* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+   umwait_control_msr_update();
+
+   return 0;
+}
+
+/*
+ * On resume, set up IA32_UMWAIT_CONTROL MSR on BP which is the only active
+ * CPU at this time. Setting up the MSR on APs when they are re-added later
+ * using CPU hotplug.
+ * The MSR on BP is supposed not to be changed during suspend and thus it's
+ * unnecessary to set it again during resume from suspend. But at this point
+ * we don't know whether resume is from suspend or hibernation. To simplify the
+ * situation, just set up the MSR on resume from suspend.
+ */
+static void umwait_syscore_resume(void)
+{
+   umwait_control_msr_update();
+}
+
+static struct syscore_ops umwait_syscore_ops = {
+   .resume = umwait_syscore_resume,
+};
+
+static int __init umwait_init(void)
+{
+   int ret;
+
+   if (!boot_cpu_has(X86_FEATURE_WAITPKG))
+   return -ENODEV;
+
+   ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
+   umwait_cpu_online, NULL);
+   if (ret < 0)
+   return ret;
+
+   register_syscore_ops(&umwait_syscore_ops);
+
+   return 0;
+}
+device_initcall(umwait_init);
-- 
2.19.1



[PATCH v3 5/5] x86/umwait: Document umwait control sysfs interfaces

2019-05-24 Thread Fenghua Yu
Since two new sysfs interface files are created for umwait control, add
an ABI document entry for the files:
/sys/devices/system/cpu/umwait_control/enable_c0_2
/sys/devices/system/cpu/umwait_control/max_time

Signed-off-by: Fenghua Yu 
Reviewed-by: Ashok Raj 
---
 .../ABI/testing/sysfs-devices-system-cpu  | 21 +++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1528239f69b2..bbf65ae447ff 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -538,3 +538,24 @@ Description:   Intel Energy and Performance Bias Hint 
(EPB)
 
This attribute is present for all online CPUs supporting the
Intel EPB feature.
+
+What:  /sys/devices/system/cpu/umwait_control
+   /sys/devices/system/cpu/umwait_control/enable_c0_2
+   /sys/devices/system/cpu/umwait_control/max_time
+Date:  May 2019
+Contact:   Linux kernel mailing list 
+Description:   Umwait control
+
+   enable_c0_2: Read/write interface to control umwait C0.2 state
+   Read returns C0.2 state status:
+   0: C0.2 is disabled
+   1: C0.2 is enabled
+
+   Write 'Y', 'y' or '1', or "on" (any case), to enable C0.2 state.
+   Write 'N', 'n' or '0', or "off" (any case), to disable C0.2 state.
+
+   max_time: Read/write interface to control umwait maximum time
+ in TSC-quanta that the CPU can reside in either C0.1
+ or C0.2 state. The time is represented as an unsigned
+ integer decimal value. Bits[1:0] are ignored.
+ A zero value indicates no maximum time.
-- 
2.19.1



Re: [5.2-rc1 regression]: nvme vs. hibernation

2019-05-24 Thread Dongli Zhang
Hi Jiri,

Looks this has been discussed in the past.

http://lists.infradead.org/pipermail/linux-nvme/2019-April/023234.html

I created a fix for a case but not good enough.

http://lists.infradead.org/pipermail/linux-nvme/2019-April/023277.html

Perhaps people would have better solution.

Dongli Zhang

On 05/25/2019 06:27 AM, Jiri Kosina wrote:
> On Fri, 24 May 2019, Keith Busch wrote:
> 
>>> Something is broken in Linus' tree (4dde821e429) with respect to 
>>> hibernation on my thinkpad x270, and it seems to be nvme related.
>>>
>>> I reliably see the warning below during hibernation, and then sometimes 
>>> resume sort of works but the machine misbehaves here and there (seems like 
>>> lost IRQs), sometimes it never comes back from the hibernated state.
>>>
>>> I will not have too much have time to look into this over weekend, so I am 
>>> sending this out as-is in case anyone has immediate idea. Otherwise I'll 
>>> bisect it on monday (I don't even know at the moment what exactly was the 
>>> last version that worked reliably, I'll have to figure that out as well 
>>> later).
>>
>> I believe the warning call trace was introduced when we converted nvme to
>> lock-less completions. On device shutdown, we'll check queues for any
>> pending completions, and we temporarily disable the interrupts to make
>> sure that the queue's interrupt handler can't run concurrently.
> 
> Yeah, the completion changes were the primary reason why I brought this up 
> with all of you guys in CC.
> 
>> On hibernation, most CPUs are offline, and the interrupt re-enabling
>> is hitting this warning that says the IRQ is not associated with any
>> online CPUs.
>>
>> I'm sure we can find a way to fix this warning, but I'm not sure that
>> explains the rest of the symptoms you're describing though.
> 
> It seems to be more or less reliable enough for bisect. I'll try that on 
> monday and will let you know.
> 
> Thanks,
> 


[PATCH v4 bpf-next 4/4] selftests/bpf: add auto-detach test

2019-05-24 Thread Roman Gushchin
Add a kselftest to cover bpf auto-detachment functionality.
The test creates a cgroup, associates some resources with it,
attaches a couple of bpf programs and deletes the cgroup.

Then it checks that the bpf programs go away within 5 seconds.

Expected output:
  $ ./test_cgroup_attach
  #override:PASS
  #multi:PASS
  #autodetach:PASS
  test_cgroup_attach:PASS

On a kernel without auto-detaching:
  $ ./test_cgroup_attach
  #override:PASS
  #multi:PASS
  #autodetach:FAIL
  test_cgroup_attach:FAIL

Signed-off-by: Roman Gushchin 
Acked-by: Yonghong Song 
---
 .../selftests/bpf/test_cgroup_attach.c| 98 ++-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_cgroup_attach.c 
b/tools/testing/selftests/bpf/test_cgroup_attach.c
index 2d6d57f50e10..7671909ee1cb 100644
--- a/tools/testing/selftests/bpf/test_cgroup_attach.c
+++ b/tools/testing/selftests/bpf/test_cgroup_attach.c
@@ -456,9 +456,105 @@ static int test_multiprog(void)
return rc;
 }
 
+static int test_autodetach(void)
+{
+   __u32 prog_cnt = 4, attach_flags;
+   int allow_prog[2] = {0};
+   __u32 prog_ids[2] = {0};
+   int cg = 0, i, rc = -1;
+   void *ptr = NULL;
+   int attempts;
+
+   for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
+   allow_prog[i] = prog_load_cnt(1, 1 << i);
+   if (!allow_prog[i])
+   goto err;
+   }
+
+   if (setup_cgroup_environment())
+   goto err;
+
+   /* create a cgroup, attach two programs and remember their ids */
+   cg = create_and_get_cgroup("/cg_autodetach");
+   if (cg < 0)
+   goto err;
+
+   if (join_cgroup("/cg_autodetach"))
+   goto err;
+
+   for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
+   if (bpf_prog_attach(allow_prog[i], cg, BPF_CGROUP_INET_EGRESS,
+   BPF_F_ALLOW_MULTI)) {
+   log_err("Attaching prog[%d] to cg:egress", i);
+   goto err;
+   }
+   }
+
+   /* make sure that programs are attached and run some traffic */
+   assert(bpf_prog_query(cg, BPF_CGROUP_INET_EGRESS, 0, &attach_flags,
+ prog_ids, &prog_cnt) == 0);
+   assert(system(PING_CMD) == 0);
+
+   /* allocate some memory (4Mb) to pin the original cgroup */
+   ptr = malloc(4 * (1 << 20));
+   if (!ptr)
+   goto err;
+
+   /* close programs and cgroup fd */
+   for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
+   close(allow_prog[i]);
+   allow_prog[i] = 0;
+   }
+
+   close(cg);
+   cg = 0;
+
+   /* leave the cgroup and remove it. don't detach programs */
+   cleanup_cgroup_environment();
+
+   /* wait for the asynchronous auto-detachment.
+* wait for no more than 5 sec and give up.
+*/
+   for (i = 0; i < ARRAY_SIZE(prog_ids); i++) {
+   for (attempts = 5; attempts >= 0; attempts--) {
+   int fd = bpf_prog_get_fd_by_id(prog_ids[i]);
+
+   if (fd < 0)
+   break;
+
+   /* don't leave the fd open */
+   close(fd);
+
+   if (!attempts)
+   goto err;
+
+   sleep(1);
+   }
+   }
+
+   rc = 0;
+err:
+   for (i = 0; i < ARRAY_SIZE(allow_prog); i++)
+   if (allow_prog[i] > 0)
+   close(allow_prog[i]);
+   if (cg)
+   close(cg);
+   free(ptr);
+   cleanup_cgroup_environment();
+   if (!rc)
+   printf("#autodetach:PASS\n");
+   else
+   printf("#autodetach:FAIL\n");
+   return rc;
+}
+
 int main(void)
 {
-   int (*tests[])(void) = {test_foo_bar, test_multiprog};
+   int (*tests[])(void) = {
+   test_foo_bar,
+   test_multiprog,
+   test_autodetach,
+   };
int errors = 0;
int i;
 
-- 
2.21.0



[PATCH v4 bpf-next 0/4] cgroup bpf auto-detachment

2019-05-24 Thread Roman Gushchin
This patchset implements a cgroup bpf auto-detachment functionality:
bpf programs are detached as soon as possible after removal of the
cgroup, without waiting for the release of all associated resources.

Patches 2 and 3 are required to implement a corresponding kselftest
in patch 4.

v4:
  1) release cgroup bpf data using a workqueue
  2) add test_cgroup_attach to .gitignore

v3:
  1) some minor changes and typo fixes

v2:
  1) removed a bogus check in patch 4
  2) moved buf[len] = 0 in patch 2


Roman Gushchin (4):
  bpf: decouple the lifetime of cgroup_bpf from cgroup itself
  selftests/bpf: convert test_cgrp2_attach2 example into kselftest
  selftests/bpf: enable all available cgroup v2 controllers
  selftests/bpf: add auto-detach test

 include/linux/bpf-cgroup.h|  11 +-
 include/linux/cgroup.h|  18 +++
 kernel/bpf/cgroup.c   |  41 -
 kernel/cgroup/cgroup.c|  11 +-
 samples/bpf/Makefile  |   2 -
 tools/testing/selftests/bpf/.gitignore|   1 +
 tools/testing/selftests/bpf/Makefile  |   4 +-
 tools/testing/selftests/bpf/cgroup_helpers.c  |  57 +++
 .../selftests/bpf/test_cgroup_attach.c| 146 --
 9 files changed, 262 insertions(+), 29 deletions(-)
 rename samples/bpf/test_cgrp2_attach2.c => 
tools/testing/selftests/bpf/test_cgroup_attach.c (79%)

-- 
2.21.0



[PATCH v4 bpf-next 2/4] selftests/bpf: convert test_cgrp2_attach2 example into kselftest

2019-05-24 Thread Roman Gushchin
Convert test_cgrp2_attach2 example into a proper test_cgroup_attach
kselftest. This is better because we run kselftests on a regular
basis, so there is a better chance of spotting a potential regression.

Also make it slightly less verbose to conform to the kselftest output style.

Output example:
  $ ./test_cgroup_attach
  #override:PASS
  #multi:PASS
  test_cgroup_attach:PASS

Signed-off-by: Roman Gushchin 
Acked-by: Yonghong Song 
---
 samples/bpf/Makefile  |  2 -
 tools/testing/selftests/bpf/.gitignore|  1 +
 tools/testing/selftests/bpf/Makefile  |  4 +-
 .../selftests/bpf/test_cgroup_attach.c| 50 ---
 4 files changed, 37 insertions(+), 20 deletions(-)
 rename samples/bpf/test_cgrp2_attach2.c => 
tools/testing/selftests/bpf/test_cgroup_attach.c (91%)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 4f0a1cdbfe7c..253e5a2856be 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -26,7 +26,6 @@ hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
 hostprogs-y += test_cgrp2_attach
-hostprogs-y += test_cgrp2_attach2
 hostprogs-y += test_cgrp2_sock
 hostprogs-y += test_cgrp2_sock2
 hostprogs-y += xdp1
@@ -81,7 +80,6 @@ map_perf_test-objs := bpf_load.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o test_overhead_user.o
 test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o
 test_cgrp2_attach-objs := test_cgrp2_attach.o
-test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(CGROUP_HELPERS)
 test_cgrp2_sock-objs := test_cgrp2_sock.o
 test_cgrp2_sock2-objs := bpf_load.o test_cgrp2_sock2.o
 xdp1-objs := xdp1_user.o
diff --git a/tools/testing/selftests/bpf/.gitignore 
b/tools/testing/selftests/bpf/.gitignore
index dd5d69529382..86a546e5e4db 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -22,6 +22,7 @@ test_lirc_mode2_user
 get_cgroup_id_user
 test_skb_cgroup_id_user
 test_socket_cookie
+test_cgroup_attach
 test_cgroup_storage
 test_select_reuseport
 test_flow_dissector
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 66f2dca1dee1..e09f419f4d7e 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -23,7 +23,8 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps 
test_lru_map test_lpm_map test
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user 
\
test_socket_cookie test_cgroup_storage test_select_reuseport 
test_section_names \
-   test_netcnt test_tcpnotify_user test_sock_fields test_sysctl
+   test_netcnt test_tcpnotify_user test_sock_fields test_sysctl \
+   test_cgroup_attach
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -96,6 +97,7 @@ $(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
 $(OUTPUT)/test_netcnt: cgroup_helpers.c
 $(OUTPUT)/test_sock_fields: cgroup_helpers.c
 $(OUTPUT)/test_sysctl: cgroup_helpers.c
+$(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
 
 .PHONY: force
 
diff --git a/samples/bpf/test_cgrp2_attach2.c 
b/tools/testing/selftests/bpf/test_cgroup_attach.c
similarity index 91%
rename from samples/bpf/test_cgrp2_attach2.c
rename to tools/testing/selftests/bpf/test_cgroup_attach.c
index 0bb6507256b7..2d6d57f50e10 100644
--- a/samples/bpf/test_cgrp2_attach2.c
+++ b/tools/testing/selftests/bpf/test_cgroup_attach.c
@@ -1,3 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+
 /* eBPF example program:
  *
  * - Creates arraymap in kernel with 4 bytes keys and 8 byte values
@@ -25,20 +27,27 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 
-#include "bpf_insn.h"
+#include "bpf_util.h"
 #include "bpf_rlimit.h"
 #include "cgroup_helpers.h"
 
 #define FOO"/foo"
 #define BAR"/foo/bar/"
-#define PING_CMD   "ping -c1 -w1 127.0.0.1 > /dev/null"
+#define PING_CMD   "ping -q -c1 -w1 127.0.0.1 > /dev/null"
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
 
+#ifdef DEBUG
+#define debug(args...) printf(args)
+#else
+#define debug(args...)
+#endif
+
 static int prog_load(int verdict)
 {
int ret;
@@ -89,7 +98,7 @@ static int test_foo_bar(void)
goto err;
}
 
-   printf("Attached DROP prog. This ping in cgroup /foo should fail...\n");
+   debug("Attached DROP prog. This ping in cgroup /foo should fail...\n");
assert(system(PING_CMD) != 0);
 
/* Create cgroup /foo/bar, get fd, and join it */
@@ -100,7 +109,7 @@ static int test_foo_bar(void)
if (join_cgroup(BAR))
goto err;
 
-   printf("Attached DROP prog. This ping in cgroup /foo/bar should 
fail...\n");
+   debug("Attached DROP prog. This ping in cgroup /foo/bar should 
fail...\n");
assert(system(PING_CMD) != 0);
 
if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS,
@@ -109,7 

[PATCH v4 bpf-next 3/4] selftests/bpf: enable all available cgroup v2 controllers

2019-05-24 Thread Roman Gushchin
Enable all available cgroup v2 controllers when setting up
the environment for the bpf kselftests. It's required to properly test
the bpf prog auto-detach feature. Also it will generally increase
the code coverage.

Signed-off-by: Roman Gushchin 
Acked-by: Yonghong Song 
---
 tools/testing/selftests/bpf/cgroup_helpers.c | 57 
 1 file changed, 57 insertions(+)

diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c 
b/tools/testing/selftests/bpf/cgroup_helpers.c
index 6692a40a6979..0d89f0396be4 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -33,6 +33,60 @@
snprintf(buf, sizeof(buf), "%s%s%s", CGROUP_MOUNT_PATH, \
 CGROUP_WORK_DIR, path)
 
+/**
+ * enable_all_controllers() - Enable all available cgroup v2 controllers
+ *
+ * Enable all available cgroup v2 controllers in order to increase
+ * the code coverage.
+ *
+ * If successful, 0 is returned.
+ */
+int enable_all_controllers(char *cgroup_path)
+{
+   char path[PATH_MAX + 1];
+   char buf[PATH_MAX];
+   char *c, *c2;
+   int fd, cfd;
+   size_t len;
+
+   snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
+   fd = open(path, O_RDONLY);
+   if (fd < 0) {
+   log_err("Opening cgroup.controllers: %s", path);
+   return 1;
+   }
+
+   len = read(fd, buf, sizeof(buf) - 1);
+   if (len < 0) {
+   close(fd);
+   log_err("Reading cgroup.controllers: %s", path);
+   return 1;
+   }
+   buf[len] = 0;
+   close(fd);
+
+   /* No controllers available? We're probably on cgroup v1. */
+   if (len == 0)
+   return 0;
+
+   snprintf(path, sizeof(path), "%s/cgroup.subtree_control", cgroup_path);
+   cfd = open(path, O_RDWR);
+   if (cfd < 0) {
+   log_err("Opening cgroup.subtree_control: %s", path);
+   return 1;
+   }
+
+   for (c = strtok_r(buf, " ", &c2); c; c = strtok_r(NULL, " ", &c2)) {
+   if (dprintf(cfd, "+%s\n", c) <= 0) {
+   log_err("Enabling controller %s: %s", c, path);
+   close(cfd);
+   return 1;
+   }
+   }
+   close(cfd);
+   return 0;
+}
+
 /**
  * setup_cgroup_environment() - Setup the cgroup environment
  *
@@ -71,6 +125,9 @@ int setup_cgroup_environment(void)
return 1;
}
 
+   if (enable_all_controllers(cgroup_workdir))
+   return 1;
+
return 0;
 }
 
-- 
2.21.0



Re: [PATCH] proc: report eip and esp for all threads when coredumping

2019-05-24 Thread John Ogness
On 2019-05-22, Jan Luebbe  wrote:
> Commit 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in
> /proc/PID/stat") stopped reporting eip/esp and commit fd7d56270b52
> ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping")
> reintroduced the feature to fix a regression with userspace core dump
> handlers (such as minicoredumper).
>
> Because PF_DUMPCORE is only set for the primary thread, this didn't fix
> the original problem for secondary threads. This commit checks
> mm->core_state instead, as already done for /proc//status in
> task_core_dumping(). As we have a mm_struct available here anyway, this
> seems to be a clean solution.
>
> Signed-off-by: Jan Luebbe 
> ---
>  fs/proc/array.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 2edbb657f859..b76b1e29fc36 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -462,7 +462,7 @@ static int do_task_stat(struct seq_file *m, struct 
> pid_namespace *ns,
>* a program is not able to use ptrace(2) in that case. It is
>* safe because the task has stopped executing permanently.
>*/
> - if (permitted && (task->flags & PF_DUMPCORE)) {
> + if (permitted && (!!mm->core_state)) {

This is not entirely safe. mm->core_state is set _before_ zap_process()
is called. Therefore tasks can be executing on a CPU with mm->core_state
set.

With the following additional change, I was able to close the window.

diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e55bfd..93f55563e2c1 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -340,10 +340,10 @@ static int zap_threads(struct task_struct *tsk, struct 
mm_struct *mm,
 
	spin_lock_irq(&tsk->sighand->siglock);
if (!signal_group_exit(tsk->signal)) {
-   mm->core_state = core_state;
tsk->signal->group_exit_task = tsk;
nr = zap_process(tsk, exit_code, 0);
clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
+   mm->core_state = core_state;
}
	spin_unlock_irq(&tsk->sighand->siglock);
if (unlikely(nr < 0))

AFAICT core_state does not need to be set before the other lines. But
there may be some side effects that I overlooked!

John Ogness


Re: [PATCH v2 0/2] close_range()

2019-05-24 Thread Al Viro
On Sat, May 25, 2019 at 12:27:40AM +0300, Alexey Dobriyan wrote:

> What about orthogonality of interfaces?
> 
>   fdmap()
>   bulk_close()
> 
> Now fdmap() can be reused for lsof/criu and it is only 2 system calls
> for close-everything usecase which is OK because readdir is 4(!) minimum:
> 
>   open
>   getdents
>   getdents() = 0
>   close
> 
> Writing all of this, I realized how fdmap() could be made even faster, in
> a way that neither getdents() nor even read() has the luxury of: it can
> return a flag saying whether more data is available, so that the
> application does the next fdmap() call only if truly necessary.

Tactless question: what has traumatised you so badly about string operations?
Because that seems to be the common denominator to a lot of things...


Re: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-24 Thread Andy Lutomirski



> On May 24, 2019, at 3:41 PM, Sean Christopherson 
>  wrote:
> 
>> On Fri, May 24, 2019 at 02:27:34PM -0700, Andy Lutomirski wrote:
>> On Fri, May 24, 2019 at 1:03 PM Sean Christopherson
>>  wrote:
>>> 
 On Fri, May 24, 2019 at 12:37:44PM -0700, Andy Lutomirski wrote:
> On Fri, May 24, 2019 at 11:34 AM Xing, Cedric  
> wrote:
> 
> If "initial permissions" for enclaves are less restrictive than shared
> objects, then it'd become a backdoor for circumventing LSM when enclave
> whitelisting is *not* in place. For example, an adversary may load a page,
> which would otherwise never be executable, as an executable page in EPC.
> 
> In the case a RWX page is needed, the calling process has to have a RWX
> page serving as the source for EADD so PROCESS__EXECMEM will have been
> checked. For SGX2, changing an EPC page to RWX is subject to FILE__EXECMEM
> on /dev/sgx/enclave, which I see as a security benefit because it only
> affects the enclave but not the whole process hosting it.
 
 So the permission would be like FILE__EXECMOD on the source enclave
 page, because it would be mapped MAP_ANONYMOUS, PROT_WRITE?
 MAP_SHARED, PROT_WRITE isn't going to work because that means you can
 modify the file.
>>> 
>>> Was this in response to Cedric's comment, or to my comment?
>> 
>> Yours.  I think that requiring source pages to be actually mapped W is
>> not such a great idea.
> 
> I wasn't requiring source pages to be mapped W.  At least I didn't intend
> to require W.  What I was trying to say is that SGX could trigger an
> EXECMEM check if userspace attempted to EADD or EAUG an enclave page with
> RWX permissions, e.g.:
> 
>  if ((SECINFO.PERMS & RWX) == RWX) {
>  ret = security_mmap_file(NULL, RWX, ???);
>  if (ret)
>  return ret;
>  }
> 
> But that's a moot point if we add security_enclave_load() or whatever.
> 
>> 
>>> 
 I'm starting to think that looking at the source VMA permission bits
 or source PTE permission bits is putting a bit too much policy into
 the driver as opposed to the LSM.  How about delegating the whole
 thing to an LSM hook?  The EADD operation would invoke a new hook,
 something like:
 
 int security_enclave_load_bytes(void *source_addr, struct
 vm_area_struct *source_vma, loff_t source_offset, unsigned int
 maxperm);
 
 Then you don't have to muck with mapping anything PROT_EXEC.  Instead
 you load from a mapping of a file and the LSM applies whatever policy
 it feels appropriate.  If the first pass gets something wrong, the
 application or library authors can take it up with the SELinux folks
 without breaking the whole ABI :)
 
 (I'm proposing passing in the source_vma because this hook would be
 called with mmap_sem held for read to avoid a TOCTOU race.)
 
 If we go this route, the only substantial change to the existing
 driver that's needed for an initial upstream merge is the maxperm
 mechanism and whatever hopefully minimal API changes are needed to
 allow users to conveniently set up the mappings.  And we don't need to
 worry about how to hack around mprotect() calling into the LSM,
 because the LSM will actually be aware of SGX and can just do the
 right thing.
>>> 
>>> This doesn't address restricting which processes can run which enclaves,
>>> it only allows restricting the build flow.  Or are you suggesting this
>>> be done in addition to whitelisting sigstructs?
>> 
>> In addition.
>> 
>> But I named the function badly and gave it a bad signature, which
>> confused you.  Let's try again:
>> 
>> int security_enclave_load_from_memory(const struct vm_area_struct
>> *source, unsigned int maxperm);
> 
> I prefer security_enclave_load(), "from_memory" seems redundant at best.

Fine with me.

> 
>> Maybe some really fancy future LSM would also want loff_t
>> source_offset, but it's probably not terribly useful.  This same
>> callback would be used for EAUG.
>> 
>> Following up on your discussion with Cedric about sigstruct, the other
>> callback would be something like:
>> 
>> int security_enclave_init(struct file *sigstruct_file);
>> 
>> The main issue I see is that we also want to control the enclave's
>> ability to have RWX pages or to change a W page to X.  We might also
>> want:
>> 
>> int security_enclave_load_zeros(unsigned int maxperm);
> 
> What's the use case for this?  @maxperm will always be at least RW in
> this case, otherwise the page is useless to the enclave, and if the
> enclave can write the page, the fact that it started as zeros is
> irrelevant.

This is how EAUG could ask if RWX is okay. If an enclave is internally doing 
dynamic loading, then it will need a heap page with maxperm = RWX.  (If it’s 
well designed, it will make it RW and then RX, either by changing SECINFO or by 
asking the host to mprotect() it, but it still needs the overall RWX mask.)

Also, do real 

Re: [REVIEW][PATCH 00/26] signal: Remove task argument from force_sig_info

2019-05-24 Thread Eric W. Biederman


Oleg,

Any comments on this patchset?

Eric


[PATCH] ARM: dts: rockchip: Add pin names for rk3288-veyron jaq, mickey, speedy

2019-05-24 Thread Douglas Anderson
This is like commit 0ca87bd5baa6 ("ARM: dts: rockchip: Add pin names
for rk3288-veyron-jerry") and commit ca3516b32cd9 ("ARM: dts:
rockchip: Add pin names for rk3288-veyron-minnie") but for 3 more
veyron boards.

A few notes:
- While there is most certainly duplication between all the veyron
  boards, it still feels like it is sane to just have each board have
  a full list of its pin names.  The format of "gpio-line-names" does
  not lend itself to one-off overriding and besides it seems sane to
  more fully match schematic names.  Also note that the extra
  duplication here is only in source code and is unlikely to ever
  change (since these boards are shipped).  Duplication in the .dtb
  files is unavoidable.
- veyron-jaq and veyron-mighty are very closely related and so I have
  shared a single list for them both with comments on how they are
  different.  This is just a typo fix on one of the boards, a possible
  missing signal on one of the boards (or perhaps I was never given
  the most recent schematics?) and dealing with the fact that one of
  the two boards has full sized SD.

Signed-off-by: Douglas Anderson 
---

 arch/arm/boot/dts/rk3288-veyron-jaq.dts| 207 +
 arch/arm/boot/dts/rk3288-veyron-mickey.dts | 151 +++
 arch/arm/boot/dts/rk3288-veyron-speedy.dts | 207 +
 3 files changed, 565 insertions(+)

diff --git a/arch/arm/boot/dts/rk3288-veyron-jaq.dts 
b/arch/arm/boot/dts/rk3288-veyron-jaq.dts
index e248f55ee8d2..fcd119168cb6 100644
--- a/arch/arm/boot/dts/rk3288-veyron-jaq.dts
+++ b/arch/arm/boot/dts/rk3288-veyron-jaq.dts
@@ -135,6 +135,213 @@
	pinctrl-0 = <&vcc50_hdmi_en>;
 };
 
+&gpio0 {
+   gpio-line-names = "PMIC_SLEEP_AP",
+ "DDRIO_PWROFF",
+ "DDRIO_RETEN",
+ "TS3A227E_INT_L",
+ "PMIC_INT_L",
+ "PWR_KEY_L",
+ "AP_LID_INT_L",
+ "EC_IN_RW",
+
+ "AC_PRESENT_AP",
+ /*
+  * RECOVERY_SW_L is Chrome OS ABI.  Schematics call
+  * it REC_MODE_L.
+  */
+ "RECOVERY_SW_L",
+ "OTP_OUT",
+ "HOST1_PWR_EN",
+ "USBOTG_PWREN_H",
+ "AP_WARM_RESET_H",
+ "nFALUT2",
+ "I2C0_SDA_PMIC",
+
+ "I2C0_SCL_PMIC",
+ "SUSPEND_L",
+ "USB_INT";
+};
+
+&gpio2 {
+   gpio-line-names = "CONFIG0",
+ "CONFIG1",
+ "CONFIG2",
+ "",
+ "",
+ "",
+ "",
+ "CONFIG3",
+
+ "",
+ "EMMC_RST_L",
+ "",
+ "",
+ "BL_PWR_EN",
+ "AVDD_1V8_DISP_EN";
+};
+
+&gpio3 {
+   gpio-line-names = "FLASH0_D0",
+ "FLASH0_D1",
+ "FLASH0_D2",
+ "FLASH0_D3",
+ "FLASH0_D4",
+ "FLASH0_D5",
+ "FLASH0_D6",
+ "FLASH0_D7",
+
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+
+ "FLASH0_CS2/EMMC_CMD",
+ "",
+ "FLASH0_DQS/EMMC_CLKO";
+};
+
+&gpio4 {
+   gpio-line-names = "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+
+ "UART0_RXD",
+ "UART0_TXD",
+ "UART0_CTS",
+ "UART0_RTS",
+ "SDIO0_D0",
+ "SDIO0_D1",
+ "SDIO0_D2",
+ "SDIO0_D3",
+
+ "SDIO0_CMD",
+ "SDIO0_CLK",
+ "BT_DEV_WAKE",/* Maybe missing from mighty? */
+ "",
+ "WIFI_ENABLE_H",
+ "BT_ENABLE_L",
+ "WIFI_HOST_WAKE",
+ 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-24 Thread Josh Poimboeuf
On Fri, May 24, 2019 at 10:20:52AM +0800, Kairui Song wrote:
> On Fri, May 24, 2019 at 1:27 AM Josh Poimboeuf  wrote:
> >
> > On Fri, May 24, 2019 at 12:41:59AM +0800, Kairui Song wrote:
> > >  On Thu, May 23, 2019 at 11:24 PM Josh Poimboeuf  
> > > wrote:
> > > >
> > > > On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> > > > > > > Hi Josh, this still won't fix the problem.
> > > > > > >
> > > > > > > Problem is not (or not only) with ___bpf_prog_run, what actually 
> > > > > > > went
> > > > > > > wrong is with the JITed bpf code.
> > > > > >
> > > > > > There seem to be a bunch of issues.  My patch at least fixes the 
> > > > > > failing
> > > > > > selftest reported by Alexei for ORC.
> > > > > >
> > > > > > How can I recreate your issue?
> > > > >
> > > > > Hmm, I used bcc's example to attach bpf to trace point, and with that
> > > > > fix stack trace is still invalid.
> > > > >
> > > > > CMD I used with bcc:
> > > > > python3 ./tools/stackcount.py t:sched:sched_fork
> > > >
> > > > I've had problems in the past getting bcc to build, so I was hoping it
> > > > was reproducible with a standalone selftest.
> > > >
> > > > > And I just had another try applying your patch, self test is also 
> > > > > failing.
> > > >
> > > > Is it the same selftest reported by Alexei?
> > > >
> > > >   test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap 
> > > > err -1 errno 2
> > > >
> > > > > I'm applying on my local master branch, a few days older than
> > > > > upstream, I can update and try again, am I missing anything?
> > > >
> > > > The above patch had some issues, so with some configs you might see an
> > > > objtool warning for ___bpf_prog_run(), in which case the patch doesn't
> > > > fix the test_stacktrace_map selftest.
> > > >
> > > > Here's the latest version which should fix it in all cases (based on
> > > > tip/master):
> > > >
> > > >   
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix
> > >
> > > Hmm, I still get the failure:
> > > test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap
> > > err -1 errno 2
> > >
> > > And I didn't see how this will fix the issue. As long as ORC need to
> > > unwind through the JITed code it will fail. And that will happen
> > > before reaching ___bpf_prog_run.
> >
> > Ok, I was able to recreate by doing
> >
> >   echo 1 > /proc/sys/net/core/bpf_jit_enable
> >
> > first.  I'm guessing you have CONFIG_BPF_JIT_ALWAYS_ON.
> >
> 
> Yes, with JIT off it will be fixed. I can confirm that.

Here's a tentative BPF fix for the JIT frame pointer issue.  It was a
bit harder than I expected.  Encoding r12 as a base register requires a
SIB byte, so I had to add support for encoding that.  I also simplified
the prologue to resemble a GCC prologue, which decreases the prologue
size quite a bit.

Next week I can work on the corresponding ORC change.  Then I can clean
all the patches up and submit them properly.
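
For reference, the encoding cost mentioned above: using r12 as a base
register forces a SIB byte, which is why reg2hex historically avoided it.
Illustrative byte sequences (checked against the x86-64 ModRM/SIB rules,
not taken from the patch):

	/* mov rax, [rbx] -- 3 bytes, no SIB needed */
	static const unsigned char mov_from_rbx[] = { 0x48, 0x8b, 0x03 };

	/* mov rax, [r12] -- ModRM.rm = 100b forces a SIB byte, so 4 bytes */
	static const unsigned char mov_from_r12[] = { 0x49, 0x8b, 0x04, 0x24 };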

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index afabf597c855..c9b4503558c9 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -104,9 +104,8 @@ static int bpf_size_to_x86_bytes(int bpf_size)
 /*
  * The following table maps BPF registers to x86-64 registers.
  *
- * x86-64 register R12 is unused, since if used as base address
- * register in load/store instructions, it always needs an
- * extra byte of encoding and is callee saved.
+ * RBP isn't used; it needs to be preserved to allow the unwinder to move
+ * through generated code stacks.
  *
  * Also x86-64 register R9 is unused. x86-64 register R10 is
  * used for blinding (if enabled).
@@ -122,7 +121,7 @@ static const int reg2hex[] = {
[BPF_REG_7] = 5,  /* R13 callee saved */
[BPF_REG_8] = 6,  /* R14 callee saved */
[BPF_REG_9] = 7,  /* R15 callee saved */
-   [BPF_REG_FP] = 5, /* RBP readonly */
+   [BPF_REG_FP] = 4, /* R12 readonly */
[BPF_REG_AX] = 2, /* R10 temp register */
[AUX_REG] = 3,/* R11 temp register */
 };
@@ -139,6 +138,7 @@ static bool is_ereg(u32 reg)
 BIT(BPF_REG_7) |
 BIT(BPF_REG_8) |
 BIT(BPF_REG_9) |
+BIT(BPF_REG_FP) |
 BIT(BPF_REG_AX));
 }
 
@@ -147,6 +147,11 @@ static bool is_axreg(u32 reg)
return reg == BPF_REG_0;
 }
 
+static bool is_sib_reg(u32 reg)
+{
+   return reg == BPF_REG_FP;
+}
+
 /* Add modifiers if 'reg' maps to x86-64 registers R8..R15 */
 static u8 add_1mod(u8 byte, u32 reg)
 {
@@ -190,15 +195,13 @@ struct jit_context {
 #define BPF_MAX_INSN_SIZE  128
#define BPF_INSN_SAFETY  64
 
-#define AUX_STACK_SPACE  40 /* Space for RBX, R13, R14, R15, tailcnt */
-
-#define PROLOGUE_SIZE  37
+#define PROLOGUE_SIZE  25
 
 /*
  * Emit x86-64 prologue code for BPF program and check its size.
  * bpf_tail_call 

Re: [PATCH 05/12] perf tools: Read also the end of the kernel

2019-05-24 Thread Jiri Olsa
On Fri, May 24, 2019 at 03:15:06PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, May 08, 2019 at 03:20:03PM +0200, Jiri Olsa escreveu:
> > We mark the end of kernel based on the first module,
> > but that could cover some bpf program maps. Reading
> > _etext symbol if it's present to get precise kernel
> > map end.
> 
> Investigating... Have you run 'perf test' before hitting the send
> button? :-)

yea, I got skip test.. for not having the vmlinux in place

[jolsa@krava perf]$ sudo ./perf test 1
 1: vmlinux symtab matches kallsyms   : Skip

did not realize it would break.. because I have 'Skip' in
this one always :-\ sry

jirka
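
(For reference, _etext -- the symbol the patch reads to find the precise
end of kernel text -- is visible via kallsyms:)

  $ sudo grep -w _etext /proc/kallsyms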

> 
> - Arnaldo
> 
> [root@quaco c]# perf test 1
>  1: vmlinux symtab matches kallsyms   : FAILED!
> [root@quaco c]# perf test -v 1
>  1: vmlinux symtab matches kallsyms   :
> --- start ---
> test child forked, pid 17488
> Looking at the vmlinux_path (8 entries long)
> Using /lib/modules/5.2.0-rc1+/build/vmlinux for symbols
> WARN: 0x8c001000: diff name v: hypercall_page k: 
> xen_hypercall_set_trap_table
> WARN: 0x8c0275c0: diff name v: __ia32_sys_rt_sigreturn k: 
> __x64_sys_rt_sigreturn
> WARN: 0x8c06ac31: diff name v: end_irq_irq_disable k: 
> start_irq_irq_enable
> WARN: 0x8c06ac32: diff name v: end_irq_irq_enable k: 
> start_irq_restore_fl
> WARN: 0x8c06ac34: diff name v: end_irq_restore_fl k: start_irq_save_fl
> WARN: 0x8c06ac36: diff name v: end_irq_save_fl k: start_mmu_read_cr2
> WARN: 0x8c06ac3c: diff name v: end_mmu_read_cr3 k: start_mmu_write_cr3
> WARN: 0x8c06ac3f: diff name v: end_mmu_write_cr3 k: start_cpu_wbinvd
> WARN: 0x8c06ac41: diff name v: end_cpu_wbinvd k: 
> start_cpu_usergs_sysret64
> WARN: 0x8c06ac47: diff name v: end_cpu_usergs_sysret64 k: 
> start_cpu_swapgs
> WARN: 0x8c06ac4a: diff name v: end_cpu_swapgs k: start__mov64
> WARN: 0x8c0814b0: diff end addr for aesni_gcm_dec v: 
> 0x8c083606 k: 0x8c0817c7
> WARN: 0x8c083610: diff end addr for aesni_gcm_enc v: 
> 0x8c0856f2 k: 0x8c083927
> WARN: 0x8c085c00: diff end addr for aesni_gcm_enc_update v: 
> 0x8c087556 k: 0x8c085c31
> WARN: 0x8c087560: diff end addr for aesni_gcm_dec_update v: 
> 0x8c088f2a k: 0x8c087591
> WARN: 0x8c08b7c0: diff end addr for aesni_gcm_enc_update_avx_gen2 v: 
> 0x8c09b13c k: 0x8c08b818
> WARN: 0x8c08fac1: diff name v: _initial_blocks_done2259 k: 
> _initial_blocks_encrypted15
> WARN: 0x8c094943: diff name v: _initial_blocks_done4447 k: 
> _initial_blocks_encrypted2497
> WARN: 0x8c09a023: diff name v: _initial_blocks_done7187 k: 
> _initial_blocks_encrypted4649
> WARN: 0x8c09b140: diff end addr for aesni_gcm_dec_update_avx_gen2 v: 
> 0x8c0ab05f k: 0x8c09b198
> WARN: 0x8c09f5b6: diff name v: _initial_blocks_done9706 k: 
> _initial_blocks_encrypted7462
> WARN: 0x8c0a4619: diff name v: _initial_blocks_done11894 k: 
> _initial_blocks_encrypted9944
> WARN: 0x8c0a9eda: diff name v: _initial_blocks_done14634 k: 
> _initial_blocks_encrypted12096
> WARN: 0x8c0abcd0: diff end addr for aesni_gcm_enc_update_avx_gen4 v: 
> 0x8c0ba4a6 k: 0x8c0abd28
> WARN: 0x8c0afaa5: diff name v: _initial_blocks_done17291 k: 
> _initial_blocks_encrypted15047
> WARN: 0x8c0b4345: diff name v: _initial_blocks_done19479 k: 
> _initial_blocks_encrypted17529
> WARN: 0x8c0b9443: diff name v: _initial_blocks_done22219 k: 
> _initial_blocks_encrypted19681
> WARN: 0x8c0ba4b0: diff end addr for aesni_gcm_dec_update_avx_gen4 v: 
> 0x8c0c9229 k: 0x8c0ba508
> WARN: 0x8c0be3fa: diff name v: _initial_blocks_done24738 k: 
> _initial_blocks_encrypted22494
> WARN: 0x8c0c2e7b: diff name v: _initial_blocks_done26926 k: 
> _initial_blocks_encrypted24976
> WARN: 0x8c0c815a: diff name v: _initial_blocks_done29666 k: 
> _initial_blocks_encrypted27128
> WARN: 0x8c0dc2b0: diff name v: __ia32_sys_fork k: __x64_sys_fork
> WARN: 0x8c0dc2d0: diff name v: __ia32_sys_vfork k: __x64_sys_vfork
> WARN: 0x8c0e9eb0: diff name v: __ia32_sys_restart_syscall k: 
> __x64_sys_restart_syscall
> WARN: 0x8c0e9f30: diff name v: __ia32_sys_sgetmask k: 
> __x64_sys_sgetmask
> WARN: 0x8c0ea4b0: diff name v: __ia32_sys_pause k: __x64_sys_pause
> WARN: 0x8c0f1610: diff name v: __ia32_sys_gettid k: __x64_sys_gettid
> WARN: 0x8c0f1630: diff name v: __ia32_sys_getpid k: __x64_sys_getpid
> WARN: 0x8c0f1650: diff name v: __ia32_sys_getppid k: __x64_sys_getppid
> WARN: 0x8c0f1980: diff name v: __ia32_sys_getuid k: __x64_sys_getuid
> WARN: 0x8c0f19b0: diff name v: __ia32_sys_geteuid k: __x64_sys_geteuid
> WARN: 0x8c0f1b30: diff name v: __ia32_sys_getgid 

Re: [A General Question] What should I do after getting Reviewed-by from a maintainer?

2019-05-24 Thread Randy Dunlap
On 5/22/19 6:17 PM, Gen Zhang wrote:
> Hi Andrew,
> I am starting to submit patches these days and have received
> "Reviewed-by" tags from maintainers on some patches. After checking
> submitting-patches.html, I figured out what "Reviewed-by" means. But I
> didn't find guidance on what to do after getting a "Reviewed-by".
> Am I supposed to send this patch to more maintainers? Or something else?
> Thanks
> Gen
> 

[Yes, I am not Andrew. ;]

Patches should be sent to a maintainer who is responsible for merging
changes for the driver or $arch or subsystem.
They should also be Cc-ed to the appropriate mailing list(s) and,
usually, to the source code author(s) [unless they are no longer active].

Some source files have author email addresses in them.
Or in a kernel git tree, you can use "git log path/to/source/file.c" to see
who has been making & merging patches to that file.
Probably the easiest thing to do is run ./scripts/get_maintainer.pl and
it will try to tell you who to send the patch to.
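For example:

  $ ./scripts/get_maintainer.pl -f path/to/source/file.c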

HTH.
-- 
~Randy


Re: [GIT PULL] Kselftest fixes update for Linux 5.2-rc2

2019-05-24 Thread pr-tracker-bot
The pull request you sent on Fri, 24 May 2019 15:19:29 -0600:

> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
> tags/linux-kselftest-5.2-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/7f8b40e3dbcd7dbeabe6be8f157376ef0b890e06

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] Devicetree fixes for 5.2-rc

2019-05-24 Thread pr-tracker-bot
The pull request you sent on Fri, 24 May 2019 16:01:21 -0500:

> git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git 
> tags/devicetree-fixes-for-5.2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e7bd3e248bc36451fdbf2a2e3a3c5a23cd0b1f6f

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[GIT PULL] SCSI fixes for 5.2-rc1

2019-05-24 Thread James Bottomley
This is the same set of patches sent in the merge window as the final
pull except that Martin's read only rework is replaced with a simple
revert of the original change that caused the regression.  Everything
else is an obvious fix or small cleanup.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Colin Ian King (1):
  scsi: bnx2fc: fix incorrect cast to u64 on shift operation

Erwan Velu (1):
  scsi: smartpqi: Reporting unhandled SCSI errors

James Smart (4):
  scsi: lpfc: Update lpfc version to 12.2.0.2
  scsi: lpfc: add check for loss of ndlp when sending RRQ
  scsi: lpfc: correct rcu unlock issue in lpfc_nvme_info_show
  scsi: lpfc: resolve lockdep warnings

Martin K. Petersen (1):
  Revert "scsi: sd: Keep disk read-only when re-reading partition"

Quinn Tran (1):
  scsi: qla2xxx: Add cleanup for PCI EEH recovery

YueHaibing (3):
  scsi: myrs: Fix uninitialized variable
  scsi: qedi: remove set but not used variables 'cdev' and 'udev'
  scsi: qedi: remove memset/memcpy to nfunc and use func instead

And the diffstat:

 drivers/scsi/bnx2fc/bnx2fc_hwi.c  |   2 +-
 drivers/scsi/lpfc/lpfc_attr.c |  37 +++---
 drivers/scsi/lpfc/lpfc_els.c  |   5 +-
 drivers/scsi/lpfc/lpfc_sli.c  |  84 -
 drivers/scsi/lpfc/lpfc_version.h  |   2 +-
 drivers/scsi/myrs.c   |   2 +-
 drivers/scsi/qedi/qedi_dbg.c  |  32 ++---
 drivers/scsi/qedi/qedi_iscsi.c|   4 -
 drivers/scsi/qla2xxx/qla_os.c | 221 +-
 drivers/scsi/sd.c |   3 +-
 drivers/scsi/smartpqi/smartpqi_init.c |  23 ++--
 11 files changed, 189 insertions(+), 226 deletions(-)

With full diff below.

James

---

diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
index 039328d9ef13..30e6d78e82f0 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
@@ -830,7 +830,7 @@ static void bnx2fc_process_unsol_compl(struct bnx2fc_rport 
*tgt, u16 wqe)
((u64)err_entry->data.err_warn_bitmap_hi << 32) |
(u64)err_entry->data.err_warn_bitmap_lo;
for (i = 0; i < BNX2FC_NUM_ERR_BITS; i++) {
-   if (err_warn_bit_map & (u64) (1 << i)) {
+   if (err_warn_bit_map & ((u64)1 << i)) {
err_warn = i;
break;
}
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index e9adb3f1961d..d4c65e2109e2 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -176,6 +176,7 @@ lpfc_nvme_info_show(struct device *dev, struct 
device_attribute *attr,
int i;
int len = 0;
char tmp[LPFC_MAX_NVME_INFO_TMP_LEN] = {0};
+   unsigned long iflags = 0;
 
if (!(vport->cfg_enable_fc4_type & LPFC_ENABLE_NVME)) {
len = scnprintf(buf, PAGE_SIZE, "NVME Disabled\n");
@@ -354,7 +355,7 @@ lpfc_nvme_info_show(struct device *dev, struct 
device_attribute *attr,
  phba->sli4_hba.io_xri_max,
  lpfc_sli4_get_els_iocb_cnt(phba));
if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-   goto buffer_done;
+   goto rcu_unlock_buf_done;
 
/* Port state is only one of two values for now. */
if (localport->port_id)
@@ -370,15 +371,15 @@ lpfc_nvme_info_show(struct device *dev, struct 
device_attribute *attr,
  wwn_to_u64(vport->fc_nodename.u.wwn),
  localport->port_id, statep);
if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-   goto buffer_done;
+   goto rcu_unlock_buf_done;
 
 	list_for_each_entry(ndlp, &vport->fc_nodes, nlp_listp) {
 		nrport = NULL;
-		spin_lock(&vport->phba->hbalock);
+		spin_lock_irqsave(&vport->phba->hbalock, iflags);
 		rport = lpfc_ndlp_get_nrport(ndlp);
 		if (rport)
 			nrport = rport->remoteport;
-		spin_unlock(&vport->phba->hbalock);
+		spin_unlock_irqrestore(&vport->phba->hbalock, iflags);
if (!nrport)
continue;
 
@@ -397,39 +398,39 @@ lpfc_nvme_info_show(struct device *dev, struct 
device_attribute *attr,
 
/* Tab in to show lport ownership. */
if (strlcat(buf, "NVME RPORT   ", PAGE_SIZE) >= PAGE_SIZE)
-   goto buffer_done;
+   goto rcu_unlock_buf_done;
if (phba->brd_no >= 10) {
if (strlcat(buf, " ", PAGE_SIZE) >= PAGE_SIZE)
-   goto buffer_done;
+   goto rcu_unlock_buf_done;
}
 
scnprintf(tmp, sizeof(tmp), "WWPN x%llx ",
  

Revert "leds: avoid races with workqueue"?

2019-05-24 Thread Hugh Dickins
Hi Pavel,

I'm having to revert 0db37915d912 ("leds: avoid races with workqueue")
from my 5.2-rc testing tree, because lockdep and other debug options
don't like it: net/mac80211/led.c arranges for led_blink_setup() to be
called at softirq time, and flush_work() is not good for calling then.
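
Reduced to a minimal (illustrative, non-LED) form, the problem is that
flush_work() may sleep, so it must not be reachable from a timer or
softirq callback:

	static struct work_struct my_work;

	static void my_timer_fn(struct timer_list *t)
	{
		flush_work(&my_work);	/* BUG: may sleep, but timer
					 * callbacks run in softirq */
	}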

Hugh


WARNING: inconsistent lock state
5.2.0-rc1 #1 Tainted: GW

inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
6e30541b ((work_completion)(&led_cdev->set_brightness_work)){+.?.}, at: 
__flush_work+0x3b/0x38a
{SOFTIRQ-ON-W} state was registered at:
  lock_acquire+0x146/0x1a1
  __flush_work+0x5b/0x38a
  flush_work+0xb/0xd
  led_blink_setup+0x1e/0xd3
  led_blink_set+0x3f/0x44
  tpt_trig_timer+0xdb/0x106
  ieee80211_mod_tpt_led_trig+0xed/0x112
  __ieee80211_recalc_idle+0xd9/0x11f
  ieee80211_idle_off+0xe/0x10
  ieee80211_add_chanctx+0x6c/0x2df
  ieee80211_new_chanctx+0x7d/0xe8
  ieee80211_vif_use_channel+0x163/0x1fe
  ieee80211_prep_connection+0x9db/0xbac
  ieee80211_mgd_auth+0x274/0x328
  ieee80211_auth+0x13/0x15
  cfg80211_mlme_auth+0x1e1/0x341
  nl80211_authenticate+0x25c/0x29e
  genl_family_rcv_msg+0x2b7/0x31a
  genl_rcv_msg+0x4a/0x6c
  netlink_rcv_skb+0x55/0xaa
  genl_rcv+0x23/0x32
  netlink_unicast+0xfc/0x1bb
  netlink_sendmsg+0x2c6/0x335
  sock_sendmsg+0x12/0x1d
  ___sys_sendmsg+0x1c5/0x23d
  __sys_sendmsg+0x4b/0x75
  __x64_sys_sendmsg+0x1a/0x1c
  do_syscall_64+0x51/0x182
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
irq event stamp: 44098
hardirqs last  enabled at (44098): [] 
_raw_spin_unlock_irqrestore+0x3a/0x5b
hardirqs last disabled at (44097): [] 
_raw_spin_lock_irqsave+0x13/0x4c
softirqs last  enabled at (44088): [] 
_local_bh_enable+0x1e/0x20
softirqs last disabled at (44089): [] irq_exit+0x69/0xb9

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   ----
  lock((work_completion)(&led_cdev->set_brightness_work));
  <Interrupt>
    lock((work_completion)(&led_cdev->set_brightness_work));

 *** DEADLOCK ***

2 locks held by swapper/1/0:
 #0: 02d634a0 ((&tpt_trig->timer)){+.-.}, at: call_timer_fn+0x0/0x2ce
 #1: 7ed2567d (&local->leddev_list_lock){.+.?}, at: 
tpt_trig_timer+0xbe/0x106

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Tainted: GW 5.2.0-rc1 #1
Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011
Call Trace:
 <IRQ>
 dump_stack+0x67/0x93
 print_usage_bug+0x292/0x2a5
 ? print_irq_inversion_bug+0x1cb/0x1cb
 mark_lock+0x307/0x51e
 __lock_acquire+0x2c0/0x762
 lock_acquire+0x146/0x1a1
 ? __flush_work+0x3b/0x38a
 ? __ieee80211_create_tpt_led_trigger+0xcb/0xcb
 __flush_work+0x5b/0x38a
 ? __flush_work+0x3b/0x38a
 ? mark_held_locks+0x47/0x63
 ? _raw_spin_unlock_irqrestore+0x3a/0x5b
 ? _raw_spin_unlock_irqrestore+0x3a/0x5b
 ? lockdep_hardirqs_on+0x196/0x1a5
 ? try_to_del_timer_sync+0x44/0x4f
 ? trace_hardirqs_on+0xc7/0xf7
 ? __ieee80211_create_tpt_led_trigger+0xcb/0xcb
 ? _raw_spin_unlock_irqrestore+0x46/0x5b
 ? __ieee80211_create_tpt_led_trigger+0xcb/0xcb
 flush_work+0xb/0xd
 led_blink_setup+0x1e/0xd3
 led_blink_set+0x3f/0x44
 tpt_trig_timer+0xdb/0x106
 ? add_timer_on+0xce/0xce
 call_timer_fn+0x11e/0x2ce
 ? __ieee80211_create_tpt_led_trigger+0xcb/0xcb
 expire_timers+0x141/0x197
 run_timer_softirq+0x65/0x10e
 __do_softirq+0x1bf/0x430
 irq_exit+0x69/0xb9
 smp_apic_timer_interrupt+0x1ee/0x269
 apic_timer_interrupt+0xf/0x20
 </IRQ>
RIP: 0010:cpuidle_enter_state+0x1f4/0x34d
Code: ff e8 36 0c ac ff 45 84 ff 74 16 9c 58 f6 c4 02 74 08 0f 0b fa e8 e5 da 
b4 ff 31 ff e8 23 c9 b1 ff e8 f0 d8 b4 ff fb 45 85 ed <0f> 88 e2 00 00 00 49 63 
f5 b9 e8 03 00 00 48 6b c6 60 49 8d 7c 04
RSP: 0018:888234d8be58 EFLAGS: 0206 ORIG_RAX: ff13
RAX: 888234d84300 RBX: e8c864c0 RCX: 001f
RDX:  RSI: 0006 RDI: 888234d84300
RBP: 888234d8be98 R08: 0002 R09: fffa2dd3f8df
R10: 0ed5 R11: 0086 R12: 8229e320
R13: 0005 R14: 8229e518 R15: 
 ? cpuidle_enter_state+0x1f0/0x34d
 cpuidle_enter+0x28/0x36
 call_cpuidle+0x3b/0x3d
 do_idle+0x189/0x1eb
 cpu_startup_entry+0x1a/0x1e
 start_secondary+0xfe/0x11b
 secondary_startup_64+0xa4/0xb0
BUG: sleeping function called from invalid context at kernel/workqueue.c:2974
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
INFO: lockdep is turned off.
Preemption disabled at:
[] start_secondary+0x48/0x11b
CPU: 1 PID: 0 Comm: swapper/1 Tainted: GW 5.2.0-rc1 #1
Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011
Call Trace:
 <IRQ>
 dump_stack+0x67/0x93
 ? start_secondary+0x48/0x11b
 ___might_sleep+0x229/0x240
 ? __ieee80211_create_tpt_led_trigger+0xcb/0xcb
 __might_sleep+0x63/0x77
 ? __flush_work+0x3b/0x38a
 __flush_work+0x84/0x38a
 ? mark_held_locks+0x47/0x63
 ? _raw_spin_unlock_irqrestore+0x3a/0x5b
 ? 

Re: [PATCH v7 5/5] namei: resolveat(2) syscall

2019-05-24 Thread Linus Torvalds
On Tue, May 7, 2019 at 9:44 AM Aleksa Sarai  wrote:
>
> The most obvious syscall to add support for the new LOOKUP_* scoping
> flags would be openat(2) (along with the required execveat(2) change
> included in this series). However, there are a few reasons to not do
> this:

So honestly, this last patch is what turns me off the whole thing.

It goes from a nice new feature ("you can use O_NOSYMLINKS to disallow
symlink traversal") to a special-case joke that isn't worth it any
more. You get a useless path descriptor back from a special hacky
system call, you don't actually get the useful data that you probably
*want* the open to get you.

Sure, you could eventually then use a *second* system call (openat
with O_EMPTYPATH) to actually get something you can *use*, but at this
point you've just wasted everybodys time and effort with a pointless
second system call.

So I really don't see the point of this whole thing. Why even bother.
Nobody sane will ever use that odd two-syscall model, and even if
they did, it would be slower and inconvenient.
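
Concretely, the two-step flow in question would look something like this
(a sketch; resolveat() and O_EMPTYPATH are the interfaces proposed in
this series, not merged API, and the flag name is illustrative):

	int pfd = resolveat(AT_FDCWD, "some/path", RESOLVE_NO_SYMLINKS);
	int fd  = openat(pfd, "", O_EMPTYPATH | O_RDONLY);  /* second syscall */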

The whole and only point of this seems to be the two lines that say

   if (flags & ~VALID_RESOLVE_FLAGS)
        return -EINVAL;

but that adds absolutely zero value to anything.  The argument is that
"we can't add it to existing flags, because old kernels won't honor
it", but that's a completely BS argument, since the user has to have a
fallback anyway for the old kernel case - so we literally could much
more conveniently just expose it as a prctl() or something to _ask_
the kernel what flags it honors.

So to me, this whole argument means that "Oh, we'll make it really
inconvenient to actually use this".

If we want to introduce a new system call that allows cool new
features, it should have *more* powerful semantics than the existing
ones, not be clearly weaker and less useful.

So how about making the new system call be something that is a
*superset* of "openat()" so that people can use that, and then if it
fails, just fall back to openat(). But if it succeeds, it just
succeeds, and you don't need to then do other system calls to actually
make it useful.

Make the new system call something people *want* to use because it's
useful, not a crippled useless thing that has some special case use
for some limited thing and just wastes system call space.

Example *useful* system call attributes:

 - make it like openat(), but have another argument with the "limit flags"

 - maybe return more status of the resulting file. People very
commonly do "open->fstat" just to get the size for mmap or to check
some other detail of the file before use.

In other words, make the new system call *useful*. Not some castrated
"not useful on its own" thing.

So I still support the whole "let's make it easy to limit path lookup
in sane ways", but this model of then limiting using the result sanely
just makes me a sad panda.

 Linus


Re: [PATCH AUTOSEL 5.1 033/375] leds: avoid races with workqueue

2019-05-24 Thread Pavel Machek
Hi!

Could we hold this patch for now?

> From: Pavel Machek 
> 
> [ Upstream commit 0db37915d912e8dc6588f25da76d3ed36718d92f ]
> 
> There are races between "main" thread and workqueue. They manifest
> themselves on Thinkpad X60:
> 
> This should result in LED blinking, but it turns it off instead:
> 
> root@amd:/data/pavel# cd /sys/class/leds/tpacpi\:\:power
> root@amd:/sys/class/leds/tpacpi::power# echo timer > trigger
> root@amd:/sys/class/leds/tpacpi::power# echo timer > trigger
> 
> It should be possible to transition from blinking to solid on by echo
> 0 > brightness; echo 1 > brightness... but that does not work, either,
> if done too quickly.
> 
> Synchronization of the workqueue fixes both.
> 
> Fixes: 1afcadfcd184 ("leds: core: Use set_brightness_work for the blocking 
> op")
> Signed-off-by: Pavel Machek 
> Signed-off-by: Jacek Anaszewski 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/leds/led-class.c | 1 +
>  drivers/leds/led-core.c  | 5 +
>  2 files changed, 6 insertions(+)

> index e3da7c03da1b5..e9ae7f87ab900 100644
> --- a/drivers/leds/led-core.c
> +++ b/drivers/leds/led-core.c
> @@ -164,6 +164,11 @@ static void led_blink_setup(struct led_classdev 
> *led_cdev,
>unsigned long *delay_on,
>unsigned long *delay_off)
>  {
> + /*
> +  * If "set brightness to 0" is pending in workqueue, we don't
> +  * want that to be reordered after blink_set()
> +  */
> + flush_work(_cdev->set_brightness_work);
>   if (!test_bit(LED_BLINK_ONESHOT, _cdev->work_flags) &&
>   led_cdev->blink_set &&
>   !led_cdev->blink_set(led_cdev, delay_on, delay_off))

This part is likely buggy. It seems triggers are using this from
atomic context... ledtrig-disk for example.


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




Re: [PATCH 00/18] locking/atomic: atomic64 type cleanup

2019-05-24 Thread Andrea Parri
> ---
> Subject: Documentation/atomic_t.txt: Clarify pure non-rmw usage
> 
> Clarify that pure non-RMW usage of atomic_t is pointless, there is
> nothing 'magical' about atomic_set() / atomic_read().
> 
> This is something that seems to confuse people, because I happen upon it
> semi-regularly.
> 
> Acked-by: Will Deacon 
> Reviewed-by: Greg Kroah-Hartman 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  Documentation/atomic_t.txt | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
> index dca3fb0554db..89eae7f6b360 100644
> --- a/Documentation/atomic_t.txt
> +++ b/Documentation/atomic_t.txt
> @@ -81,9 +81,11 @@ SEMANTICS
>  
>  The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
>  implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> -smp_store_release() respectively.
> +smp_store_release() respectively. Therefore, if you find yourself only using
> +the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
> +and are doing it wrong.

The counterargument (not so theoretical, just look around in the kernel!) is:
we all 'forget' to use READ_ONCE() and WRITE_ONCE(), but it is harder to
forget to use atomic_read() and atomic_set()...  IAC, I wouldn't call any
of them 'wrong'.
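
For concreteness, the pure non-RMW pattern under discussion is just this
(minimal sketch):

	atomic_t a = ATOMIC_INIT(0);

	int v = atomic_read(&a);	/* ~ READ_ONCE() on a plain int  */
	atomic_set(&a, 1);		/* ~ WRITE_ONCE() on a plain int */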

  Andrea


>  
> -The one detail to this is that atomic_set{}() should be observable to the RMW
> +A subtle detail of atomic_set{}() is that it should be observable to the RMW
>  ops. That is:
>  
>C atomic-set


re: net: ll_temac: Cleanup multicast filter on change

2019-05-24 Thread Colin Ian King
Hi,

static analysis with Coverity has detected a potential issue with the
following commit:

commit 1b3fa5cf859bce7094ac18d32f54af8a7148ad51
Author: Esben Haabendal 
Date:   Thu May 23 14:02:21 2019 +0200

net: ll_temac: Cleanup multicast filter on change

In function temac_set_multicast_list
(drivers/net/ethernet/xilinx/ll_temac_main.c), loop counter i is *only*
initialized in the code block:

	if (!netdev_mc_empty(ndev)) {
		...
	}

Following this if code block there is a while-loop that iterates using i
as counter which will be problematic if i has not been correctly
initialized:

	while (i < MULTICAST_CAM_TABLE_NUM) {
		temac_indirect_out32_locked(lp, XTE_MAW0_OFFSET, 0);
		temac_indirect_out32_locked(lp, XTE_MAW1_OFFSET, i << 16);
		i++;
	}
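
One obvious shape for a fix is to give the counter a defined starting
value before the conditional block (a sketch only, not necessarily the
patch that was eventually merged):

	int i = 0;

	if (!netdev_mc_empty(ndev)) {
		/* ... program one CAM entry per multicast address,
		 * incrementing i for each one ... */
	}

	/* i is defined even when the list was empty, so every
	 * remaining CAM slot gets cleared */
	while (i < MULTICAST_CAM_TABLE_NUM) {
		temac_indirect_out32_locked(lp, XTE_MAW0_OFFSET, 0);
		temac_indirect_out32_locked(lp, XTE_MAW1_OFFSET, i << 16);
		i++;
	}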

Colin


Re: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-24 Thread Sean Christopherson
On Fri, May 24, 2019 at 02:27:34PM -0700, Andy Lutomirski wrote:
> On Fri, May 24, 2019 at 1:03 PM Sean Christopherson
>  wrote:
> >
> > On Fri, May 24, 2019 at 12:37:44PM -0700, Andy Lutomirski wrote:
> > > On Fri, May 24, 2019 at 11:34 AM Xing, Cedric  
> > > wrote:
> > > >
> > > > If "initial permissions" for enclaves are less restrictive than shared
> > > > objects, then it'd become a backdoor for circumventing LSM when enclave
> > > > whitelisting is *not* in place. For example, an adversary may load a 
> > > > page,
> > > > which would otherwise never be executable, as an executable page in EPC.
> > > >
> > > > In the case a RWX page is needed, the calling process has to have a RWX
> > > > page serving as the source for EADD so PROCESS__EXECMEM will have been
> > > > checked. For SGX2, changing an EPC page to RWX is subject to 
> > > > FILE__EXECMEM
> > > > on /dev/sgx/enclave, which I see as a security benefit because it only
> > > > affects the enclave but not the whole process hosting it.
> > >
> > > So the permission would be like FILE__EXECMOD on the source enclave
> > > page, because it would be mapped MAP_ANONYMOUS, PROT_WRITE?
> > > MAP_SHARED, PROT_WRITE isn't going to work because that means you can
> > > modify the file.
> >
> > Was this in response to Cedric's comment, or to my comment?
> 
> Yours.  I think that requiring source pages to be actually mapped W is
> not such a great idea.

I wasn't requiring source pages to be mapped W.  At least I didn't intend
to require W.  What I was trying to say is that SGX could trigger an
EXECMEM check if userspace attempted to EADD or EAUG an enclave page with
RWX permissions, e.g.:

	if ((SECINFO.PERMS & RWX) == RWX) {
		ret = security_mmap_file(NULL, RWX, ???);
		if (ret)
			return ret;
	}

But that's a moot point if we add security_enclave_load() or whatever.

> 
> >
> > > I'm starting to think that looking at the source VMA permission bits
> > > or source PTE permission bits is putting a bit too much policy into
> > > the driver as opposed to the LSM.  How about delegating the whole
> > > thing to an LSM hook?  The EADD operation would invoke a new hook,
> > > something like:
> > >
> > > int security_enclave_load_bytes(void *source_addr, struct
> > > vm_area_struct *source_vma, loff_t source_offset, unsigned int
> > > maxperm);
> > >
> > > Then you don't have to muck with mapping anything PROT_EXEC.  Instead
> > > you load from a mapping of a file and the LSM applies whatever policy
> > > it feels appropriate.  If the first pass gets something wrong, the
> > > application or library authors can take it up with the SELinux folks
> > > without breaking the whole ABI :)
> > >
> > > (I'm proposing passing in the source_vma because this hook would be
> > > called with mmap_sem held for read to avoid a TOCTOU race.)
> > >
> > > If we go this route, the only substantial change to the existing
> > > driver that's needed for an initial upstream merge is the maxperm
> > > mechanism and whatever hopefully minimal API changes are needed to
> > > allow users to conveniently set up the mappings.  And we don't need to
> > > worry about how to hack around mprotect() calling into the LSM,
> > > because the LSM will actually be aware of SGX and can just do the
> > > right thing.
> >
> > This doesn't address restricting which processes can run which enclaves,
> > it only allows restricting the build flow.  Or are you suggesting this
> > be done in addition to whitelisting sigstructs?
> 
> In addition.
> 
> But I named the function badly and gave it a bad signature, which
> confused you.  Let's try again:
> 
> int security_enclave_load_from_memory(const struct vm_area_struct
> *source, unsigned int maxperm);

I prefer security_enclave_load(), "from_memory" seems redundant at best.

> Maybe some really fancy future LSM would also want loff_t
> source_offset, but it's probably not terribly useful.  This same
> callback would be used for EAUG.
> 
> Following up on your discussion with Cedric about sigstruct, the other
> callback would be something like:
> 
> int security_enclave_init(struct file *sigstruct_file);
> 
> The main issue I see is that we also want to control the enclave's
> ability to have RWX pages or to change a W page to X.  We might also
> want:
> 
> int security_enclave_load_zeros(unsigned int maxperm);

What's the use case for this?  @maxperm will always be at least RW in
this case, otherwise the page is useless to the enclave, and if the
enclave can write the page, the fact that it started as zeros is
irrelevant.

> An enclave that's going to modify its own code will need memory with
> maxperm = RWX or WX.
> 
> But this is a bit awkward if the LSM's decision depends on the
> sigstruct.  We could get fancy and require that the sigstruct be
> supplied before any EADD operations so that the maxperm decisions can
> depend on the sigstruct.
> 
> Am I making more sense now?

Yep.  Requiring .sigstruct at ECREATE would be trivial.  If we 
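
Collected, the hook set floated in this thread would be (proposed
interfaces only; none of these existed in the kernel at the time):

	int security_enclave_load(const struct vm_area_struct *source,
				  unsigned int maxperm);
	int security_enclave_init(struct file *sigstruct_file);
	int security_enclave_load_zeros(unsigned int maxperm);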

Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-24 Thread महेश बंडेवार
On Fri, May 24, 2019 at 2:17 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >Once in a while, with just the right timing, 802.3ad slaves will fail to
> >properly initialize, winding up in a weird state, with a partner system
> >mac address of 00:00:00:00:00:00. This started happening after a fix to
> >properly track link_failure_count tracking, where an 802.3ad slave that
> >reported itself as link up in the miimon code, but wasn't able to get a
> >valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
> >BOND_LINK_DOWN. That was the proper thing to do for the general "my link
> >went down" case, but has created a link initialization race that can put
> >the interface in this odd state.
>
Are there any notification consequences because of this change?

>Reading back in the git history, the ultimate cause of this
> "weird state" appears to be devices that assert NETDEV_UP prior to
> actually being able to supply sane speed/duplex values, correct?
>
> Presuming that this is the case, I don't see that there's much
> else to be done here, and so:
>
> Acked-by: Jay Vosburgh 
>
> >The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
> >if the link has never been up (last_link_up == 0), so the link state
> >doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
> >in this case, it simply hasn't been up yet, and this prevents the
> >unnecessary state change from DOWN to FAIL and getting stuck in an init
> >failure w/o a partner mac.
> >
> >Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking")
> >CC: Jay Vosburgh 
> >CC: Veaceslav Falico 
> >CC: Andy Gospodarek 
> >CC: "David S. Miller" 
> >CC: net...@vger.kernel.org
> >Tested-by: Heesoon Kim 
> >Signed-off-by: Jarod Wilson 
>
>
>
> >---
> > drivers/net/bonding/bond_main.c | 15 ++-
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c 
> >b/drivers/net/bonding/bond_main.c
> >index 062fa7e3af4c..407f4095a37a 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -3122,13 +3122,18 @@ static int bond_slave_netdev_event(unsigned long 
> >event,
> >   case NETDEV_CHANGE:
> >   /* For 802.3ad mode only:
> >* Getting invalid Speed/Duplex values here will put slave
> >-   * in weird state. So mark it as link-fail for the time
> >-   * being and let link-monitoring (miimon) set it right when
> >-   * correct speeds/duplex are available.
> >+   * in weird state. Mark it as link-fail if the link was
> >+   * previously up or link-down if it hasn't yet come up, and
> >+   * let link-monitoring (miimon) set it right when correct
> >+   * speeds/duplex are available.
> >*/
> >   if (bond_update_speed_duplex(slave) &&
> >-  BOND_MODE(bond) == BOND_MODE_8023AD)
> >-  slave->link = BOND_LINK_FAIL;
> >+  BOND_MODE(bond) == BOND_MODE_8023AD) {
> >+  if (slave->last_link_up)
> >+  slave->link = BOND_LINK_FAIL;
> >+  else
> >+  slave->link = BOND_LINK_DOWN;
> >+  }
> >
> >   if (BOND_MODE(bond) == BOND_MODE_8023AD)
> >   bond_3ad_adapter_speed_duplex_changed(slave);
> >--
> >2.20.1
> >

