Re: [PATCH V10 05/19] block: introduce bvec_last_segment()

2018-11-18 Thread Ming Lei
On Thu, Nov 15, 2018 at 03:23:56PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:52PM +0800, Ming Lei wrote:
> > BTRFS and guard_bio_eod() need to get the last singlepage segment
> > from one multipage bvec, so introduce this helper to make them happy.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-de...@redhat.com
> 
> Reviewed-by: Omar Sandoval 
> 
> Minor comments below.
> 
> > Signed-off-by: Ming Lei 
> > ---
> >  include/linux/bvec.h | 25 +
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> > index 3d61352cd8cf..01616a0b6220 100644
> > --- a/include/linux/bvec.h
> > +++ b/include/linux/bvec.h
> > @@ -216,4 +216,29 @@ static inline bool mp_bvec_iter_advance(const struct 
> > bio_vec *bv,
> > .bi_bvec_done   = 0,\
> >  }
> >  
> > +/*
> > + * Get the last singlepage segment from the multipage bvec and store it
> > + * in @seg
> > + */
> > +static inline void bvec_last_segment(const struct bio_vec *bvec,
> > +   struct bio_vec *seg)
> 
> Indentation is all messed up here.

Will fix it.

> 
> > +{
> > +   unsigned total = bvec->bv_offset + bvec->bv_len;
> > +   unsigned last_page = total / PAGE_SIZE;
> > +
> > +   if (last_page * PAGE_SIZE == total)
> > +   last_page--;
> 
> I think this could just be
> 
>   unsigned int last_page = (total - 1) / PAGE_SIZE;

This way is really elegant.

Thanks,
Ming
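
For reference, the two ways of computing the last page index discussed above can be compared in a small userspace sketch. SKETCH_PAGE_SIZE and both function names are made up for illustration and are not part of the patch; the sketch also assumes offset + len is at least 1.

```c
#include <stdbool.h>

/* Placeholder page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Form used in the posted patch: divide, then step back if page aligned. */
static unsigned int sketch_last_page_branchy(unsigned int offset,
					     unsigned int len)
{
	unsigned int total = offset + len;
	unsigned int last_page = total / SKETCH_PAGE_SIZE;

	if (last_page * SKETCH_PAGE_SIZE == total)
		last_page--;
	return last_page;
}

/* Form suggested in review: index of the page holding the last byte. */
static unsigned int sketch_last_page_simple(unsigned int offset,
					    unsigned int len)
{
	return (offset + len - 1) / SKETCH_PAGE_SIZE;
}
```

Both forms agree whenever the bvec covers at least one byte, because (total - 1) is the offset of the last byte rather than one past it.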


Re: [PATCH V10 04/19] block: use bio_for_each_bvec() to map sg

2018-11-18 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:33:14PM +0100, Christoph Hellwig wrote:
> > +   if (!*sg)
> > +   return sglist;
> > +   else {
> 
> No need for an else after an early return.

OK, good catch!

Thanks,
Ming


Re: [PATCH V10 03/19] block: use bio_for_each_bvec() to compute multi-page bvec count

2018-11-18 Thread Ming Lei
On Thu, Nov 15, 2018 at 12:20:28PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:50PM +0800, Ming Lei wrote:
> > First it is more efficient to use bio_for_each_bvec() in both
> > blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how
> > many multi-page bvecs there are in the bio.
> > 
> > Secondly once bio_for_each_bvec() is used, the bvec may need to be
> > split because its length can be much longer than max segment size,
> > so we have to split the big bvec into several segments.
> > 
> > Thirdly when splitting multi-page bvec into segments, the max segment
> > limit may be reached, so the bio split needs to be considered under
> > this situation too.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-de...@redhat.com
> > Signed-off-by: Ming Lei 
> > ---
> >  block/blk-merge.c | 90 
> > ++-
> >  1 file changed, 76 insertions(+), 14 deletions(-)
> > 
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 91b2af332a84..6f7deb94a23f 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -160,6 +160,62 @@ static inline unsigned get_max_io_size(struct 
> > request_queue *q,
> > return sectors;
> >  }
> >  
> > +/*
> > + * Split the bvec @bv into segments, and update all kinds of
> > + * variables.
> > + */
> > +static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
> > +   unsigned *nsegs, unsigned *last_seg_size,
> > +   unsigned *front_seg_size, unsigned *sectors)
> > +{
> > +   bool need_split = false;
> > +   unsigned len = bv->bv_len;
> > +   unsigned total_len = 0;
> > +   unsigned new_nsegs = 0, seg_size = 0;
> 
> "unsigned int" here and everywhere else.
> 
> > +   if ((*nsegs >= queue_max_segments(q)) || !len)
> > +   return need_split;
> > +
> > +   /*
> > +* Multipage bvec may be too big to hold in one segment,
> > +* so the current bvec has to be splitted as multiple
> > +* segments.
> > +*/
> > +   while (new_nsegs + *nsegs < queue_max_segments(q)) {
> > +   seg_size = min(queue_max_segment_size(q), len);
> > +
> > +   new_nsegs++;
> > +   total_len += seg_size;
> > +   len -= seg_size;
> > +
> > +   if ((queue_virt_boundary(q) && ((bv->bv_offset +
> > +   total_len) & queue_virt_boundary(q))) || !len)
> > +   break;
> 
> Checking queue_virt_boundary(q) != 0 is superfluous, and the len check
> could just control the loop, i.e.,
> 
>   while (len && new_nsegs + *nsegs < queue_max_segments(q)) {
>   seg_size = min(queue_max_segment_size(q), len);
> 
>   new_nsegs++;
>   total_len += seg_size;
>   len -= seg_size;
> 
>   if ((bv->bv_offset + total_len) & queue_virt_boundary(q))
>   break;
>   }
> 
> And if you rewrite it this way, I _think_ you can get rid of this
> special case:
> 
>   if ((*nsegs >= queue_max_segments(q)) || !len)
>   return need_split;
> 
> above.

Good point, will do in next version.

> 
> > +   }
> > +
> > +   /* split in the middle of the bvec */
> > +   if (len)
> > +   need_split = true;
> 
> need_split is unnecessary, just return len != 0.

OK.
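
A rough userspace sketch of the loop with the review suggestions folded in (len controls the loop, no separate need_split flag). struct sketch_limits and sketch_split_segs() are hypothetical stand-ins for the request_queue limits and for bvec_split_segs(); the front/last segment size bookkeeping from the patch is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the queue limits consulted by the patch. */
struct sketch_limits {
	unsigned int max_segments;
	unsigned int max_segment_size;
	unsigned int virt_boundary_mask;	/* 0 means no boundary */
};

/*
 * Split a bvec of @len bytes starting at @offset into segments.
 * Adds the number of new segments to @nsegs and stores the number of
 * bytes consumed in @total_len. Returns true when part of the bvec is
 * left over, i.e. the split happened in the middle of the bvec.
 */
static bool sketch_split_segs(const struct sketch_limits *lim,
			      unsigned int offset, unsigned int len,
			      unsigned int *nsegs, unsigned int *total_len)
{
	unsigned int new_nsegs = 0;
	unsigned int total = 0;

	while (len && new_nsegs + *nsegs < lim->max_segments) {
		unsigned int seg_size = len < lim->max_segment_size ?
					len : lim->max_segment_size;

		new_nsegs++;
		total += seg_size;
		len -= seg_size;

		/* A zero mask never matches, so no separate check is needed. */
		if ((offset + total) & lim->virt_boundary_mask)
			break;
	}

	*nsegs += new_nsegs;
	*total_len = total;
	return len != 0;
}
```

One difference worth noting: without the early return, the sketch reports a split when the segment limit is already reached on entry and len is non-zero, whereas the posted code returned false in that case. Whether callers depend on that is not settled in this thread.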

> 
> > +
> > +   /* update front segment size */
> > +   if (!*nsegs) {
> > +   unsigned first_seg_size = seg_size;
> > +
> > +   if (new_nsegs > 1)
> > +   first_seg_size = queue_max_segment_size(q);
> > +   if (*front_seg_size < first_seg_size)
> > +   *front_seg_size = first_seg_size;
> > +   }
> > +
> > +   /* update other varibles */
> > +   *last_seg_size = seg_size;
> > +   *nsegs += new_nsegs;
> > +   if (sectors)
> > +   *sectors += total_len >> 9;
> > +
> > +   return need_split;
> > +}
> > +
> >  static struct bio *blk_bio_segment_split(struct request_queue *q,
> >  struct bio *bio,
> >  struct bio_set *bs,
> > @@ -173,7 +229,7 @@ static struct bio *blk_bio_segment_split(struct 
> > request_queue *q,
> > struct bio *new = NULL;
> > const unsigned max_sectors = get_max_io_size(q, bio);
> >  
> > -   bio_for_each_segment(bv, bio, iter) {
> > +   bio_for_each_bvec(bv, bio, iter) {
> > /*
> >  * If the queue doesn't support SG gaps and adding this
> >  * offset would create 

Re: [PATCH v2 3/3] mmc: sdhci-pci: only install voltage switch method when useful

2018-11-18 Thread Adrian Hunter
On 16/11/18 6:58 PM, Anisse Astier wrote:
> Hi Adrian,
> 
> On Tue, Oct 23, 2018 at 12:07:29PM +0200, Anisse Astier wrote:
>> If there's no ACPI DSM for voltage switch, it will just cause a lot of
>> debug info down the line, we only need one at startup.
>>
>> Signed-off-by: Anisse Astier 
>> ---
>>  drivers/mmc/host/sdhci-pci-core.c | 11 +++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/mmc/host/sdhci-pci-core.c 
>> b/drivers/mmc/host/sdhci-pci-core.c
>> index f2c1fb339d66..95fdbb883c7e 100644
>> --- a/drivers/mmc/host/sdhci-pci-core.c
>> +++ b/drivers/mmc/host/sdhci-pci-core.c
>> @@ -723,6 +723,7 @@ static const struct dmi_system_id board_no_1_8v[] = {
>>  static void byt_probe_slot(struct sdhci_pci_slot *slot)
>>  {
>>  struct mmc_host_ops *ops = &slot->host->mmc_host_ops;
>> +struct intel_host *intel_host = sdhci_pci_priv(slot);
>>  
>>  byt_read_dsm(slot);
>>  
>> @@ -733,6 +734,16 @@ static void byt_probe_slot(struct sdhci_pci_slot *slot)
>>  mmc_hostname(slot->host->mmc));
>>  slot->host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;
>>  }
>> +/* Check if we have the appropriate voltage switch DSM methods */
>> +if (!(intel_host->dsm_fns & (1 << INTEL_DSM_V18_SWITCH)) &&
>> +!(intel_host->dsm_fns & (1 << INTEL_DSM_V33_SWITCH))) {
>> +/* No voltage switching supported at all, there's no
>> + * point in installing the callback: return.
>> + */
>> +pr_debug("%s: No voltage switching ACPI DSM helper\n",
>> +mmc_hostname(slot->host->mmc));
>> +return;
>> +}
>>  ops->start_signal_voltage_switch = intel_start_signal_voltage_switch;
>>  }
>>  
>> -- 
>> 2.17.2
>>
> 
> What do you think of picking this last patch ? Or maybe you had
> different cleanups in mind when you said you wanted to rework this ?

Voltage switches are relatively rare, and dynamic debug allows control over
exactly which debug messages display, so I am not sure this patch is needed.
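
For readers skimming the quoted hunk, the check it adds boils down to a bitmask test. A minimal sketch follows; the SKETCH_DSM_* function indices are placeholder values chosen for the example and are not taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder DSM function indices, assumed for this sketch only. */
enum {
	SKETCH_DSM_V18_SWITCH = 3,
	SKETCH_DSM_V33_SWITCH = 4,
};

/* True if at least one voltage switch function is advertised in @dsm_fns. */
static bool sketch_has_voltage_switch_dsm(uint32_t dsm_fns)
{
	uint32_t mask = (1u << SKETCH_DSM_V18_SWITCH) |
			(1u << SKETCH_DSM_V33_SWITCH);

	return (dsm_fns & mask) != 0;
}
```

The patch installs the voltage switch callback only when a test of this kind succeeds.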


Re: [RCF PATCH,v2,2/2] pwm: imx: Configure output to GPIO in disabled state

2018-11-18 Thread Linus Walleij
On Wed, Nov 14, 2018 at 10:51 PM Uwe Kleine-König
 wrote:
> On Wed, Nov 14, 2018 at 12:34:49PM +0100, Thierry Reding wrote:
> > On Fri, Nov 09, 2018 at 05:55:55PM +0100, Uwe Kleine-König wrote:
> > > On Fri, Nov 09, 2018 at 02:24:42PM +, Vokáč Michal wrote:
> > > > On 8.11.2018 20:18, Uwe Kleine-König wrote:
> > > > > Taking your example with the backlight device you specify an "init" 
> > > > > and
> > > > > a "default" pinctrl and only "default" contains the muxing for the PWM
> > > > > pin everything should be as smooth as necessary: The pwm is only muxed
> > > > > when the backlight device is successfully bound.
> > > >
> > > > Have you tried that Uwe? The bad news is I tested that before and now
> > > > again and it does not work like that. We already discussed that earlier.
> > >
> > > The key is that the pinmux setting for the PWM pin should be part of the
> > > bl pinctrl, not the pwm pinctrl. Then "default" is only setup when the
> > > bl device is successfully bound which is after the bl's .probe callback
> > > called pwm_apply().
> >
> > No, that's not at all correct. Pinmux settings should reside with the
> > consumer of the pin. In this case, the PWM is the consumer of the pin,
> > whereas the backlight is the consumer of the *PWM*.
>
> This is news to me. Adding Linus W. to Cc, maybe he can comment?!
>
> Grepping through the arm device trees it really seems common to put the
> pinctrl for the pwm pin into the pwm device. I didn't search in depth,
> but I didn't find a counter example.
>
> For GPIOs it is common that the pinmuxing is included in the GPIO's
> consumer pinctrl. Ditto for mdio busses whose pinctrl is included in the
> ethernet pinctrl.

There is quite a discussion you folks have going on here. I tried to
grasp it but I can't. I can try to answer the above question specifically.

For pin control it is mainly paramount that the state is associated
with *a* consumer. The problem we were facing when fleshing out
the subsystem can be seen in horrific solutions such as in
Documentation/devicetree/bindings/gpio/gpio-twl4030.txt
where you see that pull-ups and pull-downs are set on the
PRODUCER side, which just makes everything a complete
mess.

So compared to things like that (that we still have to support
forever) whatever you are doing here you're doing great as long
as it is consumer controlled.

Whether that consumer is the previous driver thingie in the
daisy-chain of consumers or the final end user consumer of
the pin doesn't really matter to pin control, as long as it is a
consumer. I would tend to say it is up to the subsystem,
and the old IETF motto "rough consensus and running code".

It seems in the current discussion the "rough consensus"
part is the problem, and I'm afraid I can't fix that.

Yours,
Linus Walleij


Re: [PATCH] pinctrl: zynq: Use define directive for PIN_CONFIG_IO_STANDARD

2018-11-18 Thread Michal Simek
On 17. 11. 18 2:56, Nathan Chancellor wrote:
> On Fri, Nov 16, 2018 at 09:40:45AM +0100, Michal Simek wrote:
>> On 09. 11. 18 16:36, Nathan Chancellor wrote:
>>> On Fri, Nov 09, 2018 at 10:33:00AM +0100, Michal Simek wrote:
>>>> On 08. 11. 18 16:01, Nathan Chancellor wrote:
>>>>> On Thu, Nov 08, 2018 at 07:45:42AM +0100, Michal Simek wrote:
>>>>>> On 07. 11. 18 18:48, Nick Desaulniers wrote:
>>>>>>> On Wed, Nov 7, 2018 at 1:01 AM Michal Simek  
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 07. 11. 18 9:55, Nathan Chancellor wrote:
>>>>>>>>> On Wed, Nov 07, 2018 at 09:46:12AM +0100, Michal Simek wrote:
>>>>>>>>>> On 01. 11. 18 1:57, Nathan Chancellor wrote:
>>>>>>>>>>> Clang warns when one enumerated type is implicitly converted to 
>>>>>>>>>>> another:
>>>>>>>>>>>
>>>>>>>>>>> drivers/pinctrl/pinctrl-zynq.c:985:18: warning: implicit conversion 
>>>>>>>>>>> from
>>>>>>>>>>> enumeration type 'enum zynq_pin_config_param' to different 
>>>>>>>>>>> enumeration
>>>>>>>>>>> type 'enum pin_config_param' [-Wenum-conversion]
>>>>>>>>>>> {"io-standard", PIN_CONFIG_IOSTANDARD, zynq_iostd_lvcmos18},
>>>>>>>>>>> ~   ^
>>>>>>>>>>> drivers/pinctrl/pinctrl-zynq.c:990:16: warning: implicit conversion 
>>>>>>>>>>> from
>>>>>>>>>>> enumeration type 'enum zynq_pin_config_param' to different 
>>>>>>>>>>> enumeration
>>>>>>>>>>> type 'enum pin_config_param' [-Wenum-conversion]
>>>>>>>>>>> = { PCONFDUMP(PIN_CONFIG_IOSTANDARD, "IO-standard", NULL, 
>>>>>>>>>>> true),
>>>>>>>>>>> 
>>>>>>>>>>> ~~^
>>>>>>>>>>> ./include/linux/pinctrl/pinconf-generic.h:163:11: note: expanded 
>>>>>>>>>>> from
>>>>>>>>>>> macro 'PCONFDUMP'
>>>>>>>>>>> .param = a, .display = b, .format = c, .has_arg = d \
>>>>>>>>>>>  ^
>>>>>>>>>>> 2 warnings generated.
>>>>>>>>>>
>>>>>>>>>> This is interesting. I have never tried to use llvm for building the
>>>>>>>>>> kernel. Do you have any description how this can be done?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Depending on what version of Clang you have access to, it is usually 
>>>>>>>>> just as
>>>>>>>>> simple as running 'make ARCH=arm CC=clang 
>>>>>>>>> CROSS_COMPILE=arm-linux-gnueabi-'.
>>>>>>>>>
>>>>>>>>> Clang 7.0+ is recommended but 6.0 might work too.
>>>>>>>>
>>>>>>>> TBH I would expect to download container and run this there to make 
>>>>>>>> sure
>>>>>>>> that I don't break anything else.
>>>>>>>
>>>>>>> This is the first request we've had for a container in order to test a
>>>>>>> patch.  If it comes up again from other folks, I think it makes sense
>>>>>>> to create one.  Until then, its nice to have.  It's definitely
>>>>>>> overkill for this patch.
>>>>>>
>>>>>> I have played with it to see that error and here are some steps I did.
>>>>>>
>>>>>> docker run --name test-clang -it --rm debian:latest bash
>>>>>> apt-get update
>>>>>> apt-get install gcc-arm-linux-gnueabi gcc-aarch64-linux-gnu clang git bc
>>>>>> build-essential bison flex make llvm vim libssl-dev sparse
>>>>>>
>>>>>> git clone
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git --depth 
>>>>>> 1
>>>>>> cd linux
>>>>>>
>>>>>> export ARCH=arm64
>>>>>> export CROSS_COMPILE=aarch64-linux-gnu-
>>>>>>
>>>>>> make defconfig
>>>>>
>>>>> This should also have 'CC=clang' because compiler detection happens in
>>>>> Kconfig now so compiler flags get properly set. Other than that, looks
>>>>> good and I was able to build pinctrl-zynq.o without any issues with
>>>>> those commands.
>>>>
>>>> For arm32 I am still getting compilation issue (arm64 is fine)
>>>> Below are my steps and version I use. Do you know what could be the
>>>> problem?
>>>>
>>>> Thanks,
>>>> Michal
>>>>
>>>> root@1e15921e6d15:~/linux# arm-linux-gnueabi-gcc --version
>>>> arm-linux-gnueabi-gcc (Debian 6.3.0-18) 6.3.0 20170516
>>>> Copyright (C) 2016 Free Software Foundation, Inc.
>>>> This is free software; see the source for copying conditions.  There is NO
>>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>>
>>>> root@1e15921e6d15:~/linux# clang --version
>>>> clang version 3.8.1-24 (tags/RELEASE_381/final)
>>>> Target: x86_64-pc-linux-gnu
>>>> Thread model: posix
>>>> InstalledDir: /usr/bin
>>>>
>>>>
>>>> export ARCH=arm
>>>> export CROSS_COMPILE=arm-linux-gnuaebi-
>>>> make CC=clang defconfig
>>>> make CC=clang drivers/pinctrl/pinctrl-zynq.o
>>>>
>>>> and I get
>>>> clang: error: unsupported argument '-W' to option 'Wa,'
>>>> scripts/Makefile.build:305: recipe for target 'scripts/mod/empty.o' failed
>>>> make[2]: *** [scripts/mod/empty.o] Error 1
>>>> scripts/Makefile.build:546: recipe for target 'scripts/mod' failed
>>>>
>>>
>>> Ah because Debian's regular Clang is ancient.
>>>
>>> For testing, we use the builds from apt.llvm.org. Commands assuming
>>> you're using the normal Debian image:
>>>
>>> curl https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
>>> echo "deb 

[PATCH] arm64:crc:accelerated-crc32-by-64bytes

2018-11-18 Thread Rui Sun
Add a 64-byte loop to accelerate the calculation.

Signed-off-by: Rui Sun 
---
 arch/arm64/lib/crc32.S | 54 ++
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/crc32.S b/arch/arm64/lib/crc32.S
index 5bc1e85..2b37009 100644
--- a/arch/arm64/lib/crc32.S
+++ b/arch/arm64/lib/crc32.S
@@ -15,15 +15,61 @@
.cpu generic+crc
 
.macro  __crc32, c
-0: subs x2, x2, #16
-   b.mi 8f
+
+64: cmp x2, #64
+b.lt 32f
+
+adds x11, x1, #16
+adds x12, x1, #32
+adds x13, x1, #48
+
+0 : subs x2, x2, #64
+b.mi 32f
+
+ldp x3, x4, [x1], #64
+ldp x5, x6, [x11], #64
+ldp x7, x8, [x12], #64
+ldp x9, x10,[x13], #64
+
+ CPU_BE( rev x3, x3  )
+ CPU_BE( rev x4, x4  )
+ CPU_BE( rev x5, x5  )
+ CPU_BE( rev x6, x6  )
+ CPU_BE( rev x7, x7  )
+ CPU_BE( rev x8, x8  )
+ CPU_BE( rev x9, x9  )
+ CPU_BE( rev x10,x10 )
+
+crc32\c\()x w0, w0, x3
+crc32\c\()x w0, w0, x4
+crc32\c\()x w0, w0, x5
+crc32\c\()x w0, w0, x6
+crc32\c\()x w0, w0, x7
+crc32\c\()x w0, w0, x8
+crc32\c\()x w0, w0, x9
+crc32\c\()x w0, w0, x10
+
+b.ne   0b
+ret
+
+32: tbz x2, #5, 16f
+ldp x3, x4, [x1], #16
+ldp x5, x6, [x1], #16
+CPU_BE( rev x3, x3  )
+CPU_BE( rev x4, x4  )
+CPU_BE( rev x5, x5  )
+CPU_BE( rev x6, x6  )
+crc32\c\()x w0, w0, x3
+crc32\c\()x w0, w0, x4
+crc32\c\()x w0, w0, x5
+crc32\c\()x w0, w0, x6
+
+16: tbz x2, #4, 8f
ldp x3, x4, [x1], #16
 CPU_BE(rev x3, x3  )
 CPU_BE(rev x4, x4  )
crc32\c\()x w0, w0, x3
crc32\c\()x w0, w0, x4
-   b.ne 0b
-   ret
 
 8: tbz x2, #3, 4f
ldr x3, [x1], #8
-- 
1.8.3.1
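
The block structure of the patch (whole 64-byte blocks first, then 32, 16, 8, ... selected by the bits of the remaining length) can be modelled in plain C. This is only a sketch of the control flow: it uses a bitwise software CRC-32 instead of the crc32 instructions, covers the CRC-32 polynomial only, and all sketch_* names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) over @n bytes. */
static uint32_t sketch_crc32_bytes(uint32_t crc, const uint8_t *p, size_t n)
{
	while (n--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Same result, but walking the buffer in the block sizes the patch uses. */
static uint32_t sketch_crc32_blocks(uint32_t crc, const uint8_t *p, size_t len)
{
	/* Main loop: whole 64-byte blocks. */
	while (len >= 64) {
		crc = sketch_crc32_bytes(crc, p, 64);
		p += 64;
		len -= 64;
	}

	/* Tail: each set bit of the remaining length selects one block. */
	for (size_t block = 32; block >= 1; block >>= 1) {
		if (len & block) {
			crc = sketch_crc32_bytes(crc, p, block);
			p += block;
		}
	}
	return crc;
}
```

Because the CRC state is carried from one block to the next, splitting the buffer this way does not change the result; any speedup in the assembly version would come from issuing more loads per loop iteration, which this model does not capture.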



Re: [PATCH v2 1/4] dt-bindings: pinctrl: k3: Introduce pinmux definitions

2018-11-18 Thread Tero Kristo

On 17/11/2018 18:05, Nishanth Menon wrote:

On 11:31-20181113, Vignesh R wrote:

The dt-bindings header for TI K3 AM6 SoCs defines a set of macros for
defining pinmux configs in human readable form, instead of raw-coded
hex values.

Signed-off-by: Vignesh R 
---
  MAINTAINERS  |  1 +
  include/dt-bindings/pinctrl/k3.h | 35 
  2 files changed, 36 insertions(+)
  create mode 100644 include/dt-bindings/pinctrl/k3.h

diff --git a/MAINTAINERS b/MAINTAINERS
index fa45ff36fde9..1574ad6d7ead 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2204,6 +2204,7 @@ S:Supported
  F:Documentation/devicetree/bindings/arm/ti/k3.txt
  F:arch/arm64/boot/dts/ti/Makefile
  F:arch/arm64/boot/dts/ti/k3-*
+F: include/dt-bindings/pinctrl/k3.h
  
  ARM/TEXAS INSTRUMENT KEYSTONE ARCHITECTURE

  M:Santosh Shilimkar 
diff --git a/include/dt-bindings/pinctrl/k3.h b/include/dt-bindings/pinctrl/k3.h
new file mode 100644
index ..463d845a9b36
--- /dev/null
+++ b/include/dt-bindings/pinctrl/k3.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This header provides constants for pinctrl bindings for TI's K3 SoC
+ * family.
+ *
+ * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
+ */
+#ifndef _DT_BINDINGS_PINCTRL_TI_K3_H
+#define _DT_BINDINGS_PINCTRL_TI_K3_H
+
+#define PULLUDEN_SHIFT (16)
+#define PULLTYPESEL_SHIFT  (17)
+#define RXACTIVE_SHIFT (18)
+
+#define PULL_DISABLE   (1 << PULLUDEN_SHIFT)
+#define PULL_ENABLE(0 << PULLUDEN_SHIFT)
+
+#define PULL_UP(1 << PULLTYPESEL_SHIFT | PULL_ENABLE)
+#define PULL_DOWN  (0 << PULLTYPESEL_SHIFT | PULL_ENABLE)
+
+#define INPUT_EN   (1 << RXACTIVE_SHIFT)
+#define INPUT_DISABLE  (0 << RXACTIVE_SHIFT)
+
+/* Only these macros are expected be used directly in device tree files */
+#define PIN_OUTPUT (INPUT_DISABLE | PULL_DISABLE)
+#define PIN_OUTPUT_PULLUP  (INPUT_DISABLE | PULL_UP)
+#define PIN_OUTPUT_PULLDOWN(INPUT_DISABLE | PULL_DOWN)
+#define PIN_INPUT  (INPUT_EN | PULL_DISABLE)
+#define PIN_INPUT_PULLUP   (INPUT_EN | PULL_UP)
+#define PIN_INPUT_PULLDOWN (INPUT_EN | PULL_DOWN)



Thanks for reducing the combinations down to the minimum needed. We can
worry about the DS and isolation bits when we have a real user for them.

Acked-by: Nishanth Menon 

Tero: v4.21-rc1 perhaps ?



Yeah, looks fine, queueing up for 4.21.

-Tero
--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. 
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
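
To see what the PIN_* macros in the quoted header expand to, here is a small C sketch. The macro definitions are copied from the patch; the sketch_* decode helpers are hypothetical and exist only to spell out the meaning of each bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Definitions as in the quoted include/dt-bindings/pinctrl/k3.h. */
#define PULLUDEN_SHIFT		(16)
#define PULLTYPESEL_SHIFT	(17)
#define RXACTIVE_SHIFT		(18)

#define PULL_DISABLE		(1 << PULLUDEN_SHIFT)
#define PULL_ENABLE		(0 << PULLUDEN_SHIFT)

#define PULL_UP			(1 << PULLTYPESEL_SHIFT | PULL_ENABLE)
#define PULL_DOWN		(0 << PULLTYPESEL_SHIFT | PULL_ENABLE)

#define INPUT_EN		(1 << RXACTIVE_SHIFT)
#define INPUT_DISABLE		(0 << RXACTIVE_SHIFT)

#define PIN_OUTPUT		(INPUT_DISABLE | PULL_DISABLE)
#define PIN_OUTPUT_PULLUP	(INPUT_DISABLE | PULL_UP)
#define PIN_OUTPUT_PULLDOWN	(INPUT_DISABLE | PULL_DOWN)
#define PIN_INPUT		(INPUT_EN | PULL_DISABLE)
#define PIN_INPUT_PULLUP	(INPUT_EN | PULL_UP)
#define PIN_INPUT_PULLDOWN	(INPUT_EN | PULL_DOWN)

/* Hypothetical helpers: the pull is enabled when the PULLUDEN bit is clear. */
static bool sketch_pull_enabled(uint32_t cfg)
{
	return !(cfg & (1u << PULLUDEN_SHIFT));
}

static bool sketch_pull_is_up(uint32_t cfg)
{
	return sketch_pull_enabled(cfg) && (cfg & (1u << PULLTYPESEL_SHIFT));
}

static bool sketch_input_enabled(uint32_t cfg)
{
	return (cfg & (1u << RXACTIVE_SHIFT)) != 0;
}
```

Note that PIN_OUTPUT_PULLDOWN evaluates to 0, since every field is in its zero state for that combination.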


Re: [PATCH RFC] hist lookups

2018-11-18 Thread Namhyung Kim
On Sun, Nov 18, 2018 at 10:33:55PM -0800, David Miller wrote:
> From: Namhyung Kim 
> Date: Mon, 19 Nov 2018 15:28:37 +0900
> 
> > Hello David,
> > 
> > On Sun, Nov 18, 2018 at 08:52:43PM -0800, David Miller wrote:
> >> From: Jiri Olsa 
> >> Date: Tue, 13 Nov 2018 11:40:54 +0100
> >> 
> >> > I pushed/rebased what I have to perf/fixes branch again
> >> > 
> >> > please note I had to change our compile changes, because
> >> > they wouldn't compile on x86, but I can't verify on sparc,
> >> > so you might see some compile fails again
> >> 
> >> I just checked your current perf/fixes branch.
> >> 
> >> It builds on Sparc ;-)
> >> 
> >> And it behaves better too.  I do get tons of drops and lost events,
> >> but it seems to keep going even during the hardest load.
> >> 
> >> Eventually I end up with a lot of unresolvable histogram entries,
> >> so that is something to look into.
> > 
> > Did you record callchains as well?  I'd like to know whether it's
> > related to the children (cumulative) mode or not.
> 
> I did not have callchains on, just plain "./perf top"

OK, I need to think about it more..

Thanks,
Namhyung


[PATCH 2/5 v7] regulator: fixed/gpio: Pull inversion/OD into gpiolib

2018-11-18 Thread Linus Walleij
This pushes the handling of inversion semantics and open drain
settings to the GPIO descriptor and gpiolib. All affected board
files are also augmented.

This is especially nice since we don't have to have any
confusing flags passed around to the left and right littering
the fixed and GPIO regulator drivers and the regulator core.
It is all just very straight-forward: the core asks the GPIO
line to be asserted or deasserted and gpiolib deals with the
rest depending on how the platform is configured: if the line
is active low, it deals with that, if the line is open drain,
it deals with that too.
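
As a conceptual model of that division of labour (not gpiolib's actual code), the consumer only says "assert" or "deassert" and a lower layer maps that to what happens on the wire. All names below are invented for the sketch.

```c
#include <stdbool.h>

/* What the lower layer does to the physical line in this model. */
enum sketch_drive {
	SKETCH_DRIVE_LOW,
	SKETCH_DRIVE_HIGH,
	SKETCH_HIGH_Z,		/* line released, pulled up externally */
};

/* Per-line properties that come from the platform description. */
struct sketch_line {
	bool active_low;
	bool open_drain;
};

/* Map a logical request (asserted or not) to a physical action. */
static enum sketch_drive sketch_set_logical(const struct sketch_line *line,
					    bool asserted)
{
	/* Active low inverts the logical value. */
	bool level_high = asserted != line->active_low;

	if (!level_high)
		return SKETCH_DRIVE_LOW;

	/* An open drain line is never driven high, only released. */
	return line->open_drain ? SKETCH_HIGH_Z : SKETCH_DRIVE_HIGH;
}
```

With this split, a caller such as the regulator core needs no polarity or open drain flags of its own.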

Cc: Janusz Krzysztofik  # OMAP1
Cc: Alexander Shiyan  # i.MX boards user
Cc: Haojian Zhuang  # MMP2 maintainer
Cc: Aaro Koskinen  # OMAP1 maintainer
Cc: Tony Lindgren  # OMAP1,2,3 maintainer
Cc: Mike Rapoport  # EM-X270 maintainer
Cc: Robert Jarzmik  # EZX maintainer
Cc: Philipp Zabel  # Magician maintainer
Cc: Petr Cvek  # Magician
Cc: Robert Jarzmik  # PXA
Cc: Paul Parsons  # hx4700
Cc: Daniel Mack  # Raumfeld maintainer
Cc: Marc Zyngier  # Zeus maintainer
Cc: Geert Uytterhoeven  # SuperH pinctrl/GPIO 
maintainer
Cc: Russell King  # SA1100
Signed-off-by: Linus Walleij 
---
ChangeLog v6->v7:
- Fix a missed .enable_high on OMAP1.
ChangeLog v4->v6:
- Split out parts related to GPIO regulator descriptor conversion
  to the right patch.
- Renumber to fit the rest of the series.
- Daniel Mack says he will probably delete the Raumfeld board file
  and replace it with a device tree, I suggest we just deal with that
  conflict upstream.
ChangeLog v3->v4:
- Rebase on fixed regulator changes.
ChangeLog v2->v3:
- Resending.
ChangeLog v1->v2:
- Rebase the patch series
- Cover the new user added in sa1100
---
 arch/arm/mach-imx/mach-mx21ads.c  |  1 -
 arch/arm/mach-imx/mach-mx27ads.c  |  2 +-
 arch/arm/mach-mmp/brownstone.c|  1 -
 arch/arm/mach-omap1/board-ams-delta.c |  2 --
 arch/arm/mach-omap2/pdata-quirks.c|  1 -
 arch/arm/mach-pxa/em-x270.c   |  1 -
 arch/arm/mach-pxa/ezx.c   |  3 +-
 arch/arm/mach-pxa/raumfeld.c  |  1 -
 arch/arm/mach-pxa/zeus.c  |  3 +-
 arch/arm/mach-sa1100/assabet.c|  1 -
 arch/sh/boards/mach-ecovec24/setup.c  |  2 --
 .../intel-mid/device_libs/platform_bcm43xx.c  |  1 -
 drivers/regulator/core.c  |  8 ++---
 drivers/regulator/da9055-regulator.c  |  1 -
 drivers/regulator/fixed.c | 35 +--
 include/linux/regulator/fixed.h   | 10 --
 include/linux/regulator/gpio-regulator.h  |  6 
 17 files changed, 13 insertions(+), 66 deletions(-)

diff --git a/arch/arm/mach-imx/mach-mx21ads.c b/arch/arm/mach-imx/mach-mx21ads.c
index 2e1e540f2e5a..d278fb672d40 100644
--- a/arch/arm/mach-imx/mach-mx21ads.c
+++ b/arch/arm/mach-imx/mach-mx21ads.c
@@ -205,7 +205,6 @@ static struct regulator_init_data 
mx21ads_lcd_regulator_init_data = {
 static struct fixed_voltage_config mx21ads_lcd_regulator_pdata = {
.supply_name= "LCD",
.microvolts = 330,
-   .enable_high= 1,
.init_data  = &mx21ads_lcd_regulator_init_data,
 };
 
diff --git a/arch/arm/mach-imx/mach-mx27ads.c b/arch/arm/mach-imx/mach-mx27ads.c
index f5e04047ed13..6dd7f57c332f 100644
--- a/arch/arm/mach-imx/mach-mx27ads.c
+++ b/arch/arm/mach-imx/mach-mx27ads.c
@@ -237,7 +237,7 @@ static struct fixed_voltage_config 
mx27ads_lcd_regulator_pdata = {
 static struct gpiod_lookup_table mx27ads_lcd_regulator_gpiod_table = {
.dev_id = "reg-fixed-voltage.0", /* Let's hope ID 0 is what we get */
.table = {
-   GPIO_LOOKUP("LCD", 0, NULL, GPIO_ACTIVE_HIGH),
+   GPIO_LOOKUP("LCD", 0, NULL, GPIO_ACTIVE_LOW),
{ },
},
 };
diff --git a/arch/arm/mach-mmp/brownstone.c b/arch/arm/mach-mmp/brownstone.c
index a04e249c654b..d2560fb1e835 100644
--- a/arch/arm/mach-mmp/brownstone.c
+++ b/arch/arm/mach-mmp/brownstone.c
@@ -149,7 +149,6 @@ static struct regulator_init_data brownstone_v_5vp_data = {
 static struct fixed_voltage_config brownstone_v_5vp = {
.supply_name= "v_5vp",
.microvolts = 500,
-   .enable_high= 1,
.enabled_at_boot= 1,
.init_data  = &brownstone_v_5vp_data,
 };
diff --git a/arch/arm/mach-omap1/board-ams-delta.c 
b/arch/arm/mach-omap1/board-ams-delta.c
index 3d191fd52910..26e9b5969b0a 100644
--- a/arch/arm/mach-omap1/board-ams-delta.c
+++ b/arch/arm/mach-omap1/board-ams-delta.c
@@ -268,7 +268,6 @@ static struct fixed_voltage_config modem_nreset_config = {
.supply_name= "modem_nreset",
.microvolts = 330,
.startup_delay  = 25000,
-   .enable_high= 1,
.enabled_at_boot= 1,
.init_data  = _nreset_data,
 };
@@ -529,7 +528,6 @@ static struct 

[PATCH 4/5 v7] regulator: gpio: Simplify probe path

2018-11-18 Thread Linus Walleij
Use devm_* managed device resources and create a local
struct device *dev variable to simplify the code inside
probe().
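
The reason managed resources simplify the error paths can be shown with a toy userspace model: every allocation is recorded against the device and released in one place, so probe() can simply return on failure. This is an illustration only and not how the kernel's devres code is implemented; all sketch_* names are made up.

```c
#include <stdlib.h>
#include <string.h>

/* One tracked allocation. */
struct sketch_devres {
	struct sketch_devres *next;
	void *data;
};

/* Toy device that remembers everything allocated on its behalf. */
struct sketch_device {
	struct sketch_devres *resources;
};

/* Allocate zeroed memory that is freed when the device is released. */
static void *sketch_devm_alloc(struct sketch_device *dev, size_t size)
{
	struct sketch_devres *res = malloc(sizeof(*res));

	if (!res)
		return NULL;
	res->data = calloc(1, size);
	if (!res->data) {
		free(res);
		return NULL;
	}
	res->next = dev->resources;
	dev->resources = res;
	return res->data;
}

/* Managed string duplicate built on the allocator above. */
static char *sketch_devm_strdup(struct sketch_device *dev, const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = sketch_devm_alloc(dev, n);

	if (copy)
		memcpy(copy, s, n);
	return copy;
}

/* Free everything tracked for @dev; returns the number of allocations freed. */
static unsigned int sketch_device_release(struct sketch_device *dev)
{
	unsigned int freed = 0;

	while (dev->resources) {
		struct sketch_devres *res = dev->resources;

		dev->resources = res->next;
		free(res->data);
		free(res);
		freed++;
	}
	return freed;
}
```

In this model the goto-based cleanup labels become unnecessary, since nothing allocated through the device has to be freed by hand.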

Signed-off-by: Linus Walleij 
---
ChangeLog v6->v7:
- Resend with the rest.
ChangeLog v3->v6:
- Rebase on top of the other changes.
- Change numbering to fit the rest of the series.
ChangeLog v2->v3:
- Resending.
ChangeLog v1->v2:
- Rebase the patch on the other changes.
---
 drivers/regulator/gpio-regulator.c | 55 +-
 1 file changed, 23 insertions(+), 32 deletions(-)

diff --git a/drivers/regulator/gpio-regulator.c 
b/drivers/regulator/gpio-regulator.c
index 68a6c861bcad..62ddea4a5255 100644
--- a/drivers/regulator/gpio-regulator.c
+++ b/drivers/regulator/gpio-regulator.c
@@ -228,31 +228,33 @@ static struct regulator_ops gpio_regulator_current_ops = {
 
 static int gpio_regulator_probe(struct platform_device *pdev)
 {
-   struct gpio_regulator_config *config = dev_get_platdata(&pdev->dev);
-   struct device_node *np = pdev->dev.of_node;
+   struct device *dev = &pdev->dev;
+   struct gpio_regulator_config *config = dev_get_platdata(dev);
+   struct device_node *np = dev->of_node;
struct gpio_regulator_data *drvdata;
struct regulator_config cfg = { };
enum gpiod_flags gflags;
int ptr, ret, state, i;
 
-   drvdata = devm_kzalloc(&pdev->dev, sizeof(struct gpio_regulator_data),
+   drvdata = devm_kzalloc(dev, sizeof(struct gpio_regulator_data),
   GFP_KERNEL);
if (drvdata == NULL)
return -ENOMEM;
 
if (np) {
-   config = of_get_gpio_regulator_config(&pdev->dev, np,
+   config = of_get_gpio_regulator_config(dev, np,
  &drvdata->desc);
if (IS_ERR(config))
return PTR_ERR(config);
}
 
-   drvdata->desc.name = kstrdup(config->supply_name, GFP_KERNEL);
+   drvdata->desc.name = devm_kstrdup(dev, config->supply_name, GFP_KERNEL);
if (drvdata->desc.name == NULL) {
-   dev_err(&pdev->dev, "Failed to allocate supply name\n");
+   dev_err(dev, "Failed to allocate supply name\n");
return -ENOMEM;
}
 
+
for (i = 0; i < config->ngpios; i++) {
drvdata->gpiods[i] = devm_gpiod_get_index(&pdev->dev,
  NULL,
@@ -265,14 +267,14 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
}
drvdata->nr_gpios = config->ngpios;
 
-   drvdata->states = kmemdup(config->states,
- config->nr_states *
-sizeof(struct gpio_regulator_state),
- GFP_KERNEL);
+   drvdata->states = devm_kmemdup(dev,
+  config->states,
+  config->nr_states *
+  sizeof(struct gpio_regulator_state),
+  GFP_KERNEL);
if (drvdata->states == NULL) {
-   dev_err(&pdev->dev, "Failed to allocate state data\n");
-   ret = -ENOMEM;
-   goto err_name;
+   dev_err(dev, "Failed to allocate state data\n");
+   return -ENOMEM;
}
drvdata->nr_states = config->nr_states;
 
@@ -291,9 +293,8 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
drvdata->desc.ops = &gpio_regulator_current_ops;
break;
default:
-   dev_err(&pdev->dev, "No regulator type set\n");
-   ret = -EINVAL;
-   goto err_memstate;
+   dev_err(dev, "No regulator type set\n");
+   return -EINVAL;
}
 
/* build initial state from gpio init data. */
@@ -304,7 +305,7 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
}
drvdata->state = state;
 
-   cfg.dev = &pdev->dev;
+   cfg.dev = dev;
cfg.init_data = config->init_data;
cfg.driver_data = drvdata;
cfg.of_node = np;
@@ -318,28 +319,20 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
else
gflags = GPIOD_OUT_LOW | GPIOD_FLAGS_BIT_NONEXCLUSIVE;
 
-   cfg.ena_gpiod = devm_gpiod_get_optional(&pdev->dev, "enable", gflags);
-   if (IS_ERR(cfg.ena_gpiod)) {
-   ret = PTR_ERR(cfg.ena_gpiod);
-   goto err_memstate;
-   }
+   cfg.ena_gpiod = devm_gpiod_get_optional(dev, "enable", gflags);
+   if (IS_ERR(cfg.ena_gpiod))
+   return PTR_ERR(cfg.ena_gpiod);
 
drvdata->dev = regulator_register(&drvdata->desc, &cfg);
if (IS_ERR(drvdata->dev)) {
ret = PTR_ERR(drvdata->dev);
-   dev_err(&pdev->dev, "Failed to register regulator: %d\n", ret);
-   goto err_memstate;
+   dev_err(dev, "Failed to register regulator: %d\n", ret);
+   return 

[PATCH 3/5 v7] regulator: fixed/gpio: Update device tree bindings

2018-11-18 Thread Linus Walleij
Deprecate the open drain binding for fixed regulator and indicate
that we prefer this to be passed in the GPIO phandle flags.

Clarify that the line inversion semantics for fixed and GPIO
regulators completely override the active low flags in the
phandle flags.

Unfortunately this can not be changed to prefer that we pass
the flags in the phandle: the bindings have been specified and
deployed such that the presence/absence of this flag and only
that controls the line inversion semantics. The crucial semantic
is that the absence of the flag means the core will assume
the line is active low, which in GPIO terms is an exception,
as GPIO lines are normally assumed to be active high.

This special device tree semantic cannot be changed without
introducing a whole new compatible string for the fixed and
GPIO regulators, so we just contain the situation.
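
The rule described above can be written down as a tiny decision function. This is a sketch of the binding semantics as stated in this commit message, with invented names; it is not code from the regulator core.

```c
#include <stdbool.h>

/*
 * Decide whether a fixed or GPIO regulator enable line is treated as
 * active low. Only the presence of the enable-active-high property
 * counts; the polarity in the GPIO phandle flags is ignored.
 */
static bool sketch_enable_is_active_low(bool has_enable_active_high,
					bool phandle_flags_active_low)
{
	(void)phandle_flags_active_low;	/* deliberately ignored */

	return !has_enable_active_high;
}
```

So a node without enable-active-high is treated as active low even if its phandle says GPIO_ACTIVE_HIGH, which is the exception to normal GPIO conventions that the text calls out.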

Cc: devicet...@vger.kernel.org
Reviewed-by: Rob Herring 
Signed-off-by: Linus Walleij 
---
ChangeLog v6->v7:
- Resend with the rest.
ChangeLog v3->v6:
- Resending.
ChangeLog v2->v3:
- Resending.
ChangeLog v1->v2:
- Collect Rob's ACK.
---
 .../bindings/regulator/fixed-regulator.txt  | 13 +++--
 .../bindings/regulator/gpio-regulator.txt   |  4 
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/regulator/fixed-regulator.txt 
b/Documentation/devicetree/bindings/regulator/fixed-regulator.txt
index 0c2a6c8a1536..a7a1cd0dfa6e 100644
--- a/Documentation/devicetree/bindings/regulator/fixed-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/fixed-regulator.txt
@@ -8,10 +8,19 @@ Optional properties:
 - gpio: gpio to use for enable control
 - startup-delay-us: startup time in microseconds
 - enable-active-high: Polarity of GPIO is Active high
-If this property is missing, the default assumed is Active low.
+  If this property is missing, the default assumed is Active low.
+  If the phandle flags to the GPIO handle also flag the line as active
+  low or high, that will be ignored for fixed regulators and the
+  presence or absence of this flag solely controls the inversion
+  semantics.
+-vin-supply: Input supply name.
+
+Deprecated properties:
 - gpio-open-drain: GPIO is open drain type.
   If this property is missing then default assumption is false.
--vin-supply: Input supply name.
+  Do not use this property in new device trees: instead use the
+  phandle flag to indicate to the GPIO provider that the line
+  should be handled as open drain.
 
 Any property defined as part of the core regulator
 binding, defined in regulator.txt, can also be used.
diff --git a/Documentation/devicetree/bindings/regulator/gpio-regulator.txt b/Documentation/devicetree/bindings/regulator/gpio-regulator.txt
index 1f496159e2bb..8fef3e5358a2 100644
--- a/Documentation/devicetree/bindings/regulator/gpio-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/gpio-regulator.txt
@@ -14,6 +14,10 @@ Optional properties:
  defualt is LOW if nothing is specified.
 - startup-delay-us : Startup time in microseconds.
 - enable-active-high   : Polarity of GPIO is active high (default is low).
+ If the phandle flags to the GPIO handle also flag the line as
+ active low or high, that will be ignored for fixed regulators
+ and the presence or absence of this flag solely controls the
+ inversion semantics.
 - regulator-type   : Specifies what is being regulated, must be either
  "voltage" or "current", defaults to voltage.
 
-- 
2.19.1
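
The polarity rule described in this binding update can be sketched in a few
lines of C. This is only an illustration of the documented semantics, not
code from the regulator core: the function name and both parameters are
made up for the example.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code: decide whether the enable line of
 * a fixed or GPIO regulator is treated as active low.
 *
 * has_enable_active_high: the "enable-active-high" property is present.
 * phandle_active_low:     the GPIO phandle flags mark the line active low.
 */
static bool regulator_enable_is_active_low(bool has_enable_active_high,
					   bool phandle_active_low)
{
	/* The phandle flag is deliberately ignored for these bindings. */
	(void)phandle_active_low;

	/* Absence of the property means the line is assumed active low. */
	return !has_enable_active_high;
}
```

Whatever the phandle says, only the presence or absence of the property
changes the result, which is the situation the commit message says it wants
to contain rather than change.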



[PATCH 1/5 v7] regulator: gpio: Convert to use descriptors

2018-11-18 Thread Linus Walleij
This converts the GPIO regulator driver to use descriptors only.

We have to let go of the array gpio handling: the fetched descriptors
are handled individually anyway, and the array retrieval function
does not make it possible to retrieve each GPIO descriptor with
unique flags. Instead get them one by one.

We request the "enable" GPIO separately as before, and make sure
that this line is requested as nonexclusive since enable lines can
be shared and the regulator core expects this.

Most users of the GPIO regulator are using device tree.

There are two boards in the kernel using the gpio regulator from a
non-devicetree path: PXA hx4700 and magician. Make sure to switch
these over to use descriptors as well.

Cc: Philipp Zabel  # Magician
Cc: Petr Cvek  # Magician
Cc: Robert Jarzmik  # PXA
Cc: Paul Parsons  # hx4700
Signed-off-by: Linus Walleij 
---
ChangeLog v6->v7:
- Resend with the rest.
ChangeLog v3->v6:
- Make sure to request the GPIO line nonexclusive as with other
  regulator GPIOs.
- Request the voltage controlling GPIOs from the name NULL as
  only "enable-gpio" is explicitly named.
- Make sure to delete all unused struct members and assignments
  in board files.
- Change numbering to fit the rest of the patches.
ChangeLog v2->v3:
- Resending.
ChangeLog v1->v2:
- Rebase the patch on the other changes.
---
 arch/arm/mach-pxa/hx4700.c   |  23 ++--
 arch/arm/mach-pxa/magician.c |  23 ++--
 drivers/regulator/gpio-regulator.c   | 146 ---
 include/linux/regulator/gpio-regulator.h |  12 +-
 4 files changed, 91 insertions(+), 113 deletions(-)

diff --git a/arch/arm/mach-pxa/hx4700.c b/arch/arm/mach-pxa/hx4700.c
index b79b757fdd41..51d38d5e776a 100644
--- a/arch/arm/mach-pxa/hx4700.c
+++ b/arch/arm/mach-pxa/hx4700.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -702,9 +703,7 @@ static struct regulator_init_data bq24022_init_data = {
.consumer_supplies  = bq24022_consumers,
 };
 
-static struct gpio bq24022_gpios[] = {
-   { GPIO96_HX4700_BQ24022_ISET2, GPIOF_OUT_INIT_LOW, "bq24022_iset2" },
-};
+static enum gpiod_flags bq24022_gpiod_gflags[] = { GPIOD_OUT_LOW };
 
 static struct gpio_regulator_state bq24022_states[] = {
{ .value = 10, .gpios = (0 << 0) },
@@ -714,12 +713,10 @@ static struct gpio_regulator_state bq24022_states[] = {
 static struct gpio_regulator_config bq24022_info = {
.supply_name = "bq24022",
 
-   .enable_gpio = GPIO72_HX4700_BQ24022_nCHARGE_EN,
-   .enable_high = 0,
.enabled_at_boot = 0,
 
-   .gpios = bq24022_gpios,
-   .nr_gpios = ARRAY_SIZE(bq24022_gpios),
+   .gflags = bq24022_gpiod_gflags,
+   .ngpios = ARRAY_SIZE(bq24022_gpiod_gflags),
 
.states = bq24022_states,
.nr_states = ARRAY_SIZE(bq24022_states),
@@ -736,6 +733,17 @@ static struct platform_device bq24022 = {
},
 };
 
+static struct gpiod_lookup_table bq24022_gpiod_table = {
+   .dev_id = "gpio-regulator",
+   .table = {
+   GPIO_LOOKUP("gpio-pxa", GPIO96_HX4700_BQ24022_ISET2,
+   NULL, GPIO_ACTIVE_HIGH),
+   GPIO_LOOKUP("gpio-pxa", GPIO72_HX4700_BQ24022_nCHARGE_EN,
+   "enable", GPIO_ACTIVE_LOW),
+   { },
+   },
+};
+
 /*
  * StrataFlash
  */
@@ -878,6 +886,7 @@ static void __init hx4700_init(void)
pxa_set_btuart_info(NULL);
pxa_set_stuart_info(NULL);
 
+   gpiod_add_lookup_table(&bq24022_gpiod_table);
platform_add_devices(devices, ARRAY_SIZE(devices));
pwm_add_table(hx4700_pwm_lookup, ARRAY_SIZE(hx4700_pwm_lookup));
 
diff --git a/arch/arm/mach-pxa/magician.c b/arch/arm/mach-pxa/magician.c
index 14c0f80bc9e7..5d21de79135b 100644
--- a/arch/arm/mach-pxa/magician.c
+++ b/arch/arm/mach-pxa/magician.c
@@ -645,9 +645,8 @@ static struct regulator_init_data bq24022_init_data = {
.consumer_supplies  = bq24022_consumers,
 };
 
-static struct gpio bq24022_gpios[] = {
-   { EGPIO_MAGICIAN_BQ24022_ISET2, GPIOF_OUT_INIT_LOW, "bq24022_iset2" },
-};
+
+static enum gpiod_flags bq24022_gpiod_gflags[] = { GPIOD_OUT_LOW };
 
 static struct gpio_regulator_state bq24022_states[] = {
{ .value = 10, .gpios = (0 << 0) },
@@ -657,12 +656,10 @@ static struct gpio_regulator_state bq24022_states[] = {
 static struct gpio_regulator_config bq24022_info = {
.supply_name= "bq24022",
 
-   .enable_gpio= GPIO30_MAGICIAN_BQ24022_nCHARGE_EN,
-   .enable_high= 0,
.enabled_at_boot= 1,
 
-   .gpios  = bq24022_gpios,
-   .nr_gpios   = ARRAY_SIZE(bq24022_gpios),
+   .gflags = bq24022_gpiod_gflags,
+   .ngpios = ARRAY_SIZE(bq24022_gpiod_gflags),
 
.states = bq24022_states,
.nr_states  = ARRAY_SIZE(bq24022_states),
@@ -679,6 +676,17 @@ static struct platform_device bq24022 

[PATCH 5/5 v7] regulator: core: Only support passing enable GPIO descriptors

2018-11-18 Thread Linus Walleij
Now that we changed all providers to pass descriptors into the core
for enable GPIOs instead of a global GPIO number, delete the support
for passing GPIO numbers in, and we get a cleanup and size reduction
in the core, and from a GPIO point of view we use the modern, cleaner
interface.

Signed-off-by: Linus Walleij 
---
ChangeLog v6->v7:
- Resend with the rest.
ChangeLog v4->v6:
- Rebase on top of the other changes.
- Renumber to fit the rest of the series.
ChangeLog v3->v4:
- Also drop ena_gpio_invert from driver data
ChangeLog v2->v3:
- Resending.
ChangeLog v1->v2:
- Rebase the patch on the other changes.
---
 drivers/regulator/core.c | 32 ++--
 include/linux/regulator/driver.h | 12 +---
 2 files changed, 7 insertions(+), 37 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 109bd2bee868..0eb5c48f1162 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -2015,35 +2014,19 @@ static int regulator_ena_gpio_request(struct regulator_dev *rdev,
 {
struct regulator_enable_gpio *pin;
struct gpio_desc *gpiod;
-   int ret;
 
-   if (config->ena_gpiod)
-   gpiod = config->ena_gpiod;
-   else
-   gpiod = gpio_to_desc(config->ena_gpio);
+   gpiod = config->ena_gpiod;
 
list_for_each_entry(pin, &regulator_ena_gpio_list, list) {
if (pin->gpiod == gpiod) {
-   rdev_dbg(rdev, "GPIO %d is already used\n",
-   config->ena_gpio);
+   rdev_dbg(rdev, "GPIO is already used\n");
goto update_ena_gpio_to_rdev;
}
}
 
-   if (!config->ena_gpiod) {
-   ret = gpio_request_one(config->ena_gpio,
-  GPIOF_DIR_OUT | config->ena_gpio_flags,
-  rdev_get_name(rdev));
-   if (ret)
-   return ret;
-   }
-
pin = kzalloc(sizeof(struct regulator_enable_gpio), GFP_KERNEL);
-   if (pin == NULL) {
-   if (!config->ena_gpiod)
-   gpio_free(config->ena_gpio);
+   if (pin == NULL)
return -ENOMEM;
-   }
 
pin->gpiod = gpiod;
list_add(&pin->list, &regulator_ena_gpio_list);
@@ -2066,7 +2049,6 @@ static void regulator_ena_gpio_free(struct regulator_dev *rdev)
if (pin->gpiod == rdev->ena_pin->gpiod) {
if (pin->request_count <= 1) {
pin->request_count = 0;
-   gpiod_put(pin->gpiod);
list_del(&pin->list);
kfree(pin);
rdev->ena_pin = NULL;
@@ -4336,15 +4318,13 @@ regulator_register(const struct regulator_desc *regulator_desc,
goto clean;
}
 
-   if (config->ena_gpiod ||
-   ((config->ena_gpio || config->ena_gpio_initialized) &&
-gpio_is_valid(config->ena_gpio))) {
+   if (config->ena_gpiod) {
mutex_lock(&regulator_list_mutex);
ret = regulator_ena_gpio_request(rdev, config);
mutex_unlock(&regulator_list_mutex);
if (ret != 0) {
-   rdev_err(rdev, "Failed to request enable GPIO%d: %d\n",
-config->ena_gpio, ret);
+   rdev_err(rdev, "Failed to request enable GPIO: %d\n",
+ret);
goto clean;
}
}
diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index a9c030192147..10ee158eac00 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -400,13 +400,7 @@ struct regulator_desc {
  *   NULL).
  * @regmap: regmap to use for core regmap helpers if dev_get_regmap() is
  *  insufficient.
- * @ena_gpio_initialized: GPIO controlling regulator enable was properly
- *initialized, meaning that >= 0 is a valid gpio
- *identifier and < 0 is a non existent gpio.
- * @ena_gpio: GPIO controlling regulator enable.
- * @ena_gpiod: GPIO descriptor controlling regulator enable.
- * @ena_gpio_invert: Sense for GPIO enable control.
- * @ena_gpio_flags: Flags to use when calling gpio_request_one()
+ * @ena_gpiod: GPIO controlling regulator enable.
  */
 struct regulator_config {
struct device *dev;
@@ -415,11 +409,7 @@ struct regulator_config {
struct device_node *of_node;
struct regmap *regmap;
 
-   bool ena_gpio_initialized;
-   int ena_gpio;
struct gpio_desc *ena_gpiod;
-   unsigned int ena_gpio_invert:1;
-   unsigned int ena_gpio_flags;
 };
 
 /*
-- 
2.19.1



Re: [PATCH 1/2] dt-bindings: phy: Add Qualcomm Synopsys High-Speed USB PHY binding

2018-11-18 Thread Vivek Gautam
On Mon, Nov 19, 2018 at 12:29 PM Shawn Guo  wrote:
>
> On Sat, Nov 17, 2018 at 09:13:38AM -0600, Rob Herring wrote:
> 
> > > > > +- qcom,init-seq:
> > > > > +Value type: 
> > > > > +Definition: Should contain a sequence of  
> > > > > tuples to
> > > > > +program 'value' into phy register at 'offset' with 
> > > > > 'delay'
> > > > > +   in us afterwards.
> > > >
> > > > If we wanted this type of thing in DT, we'd have a generic binding (or
> > > > forth).
> > >
> > > Right now, this is a qualcomm usb phy specific bindings - first used in
> > > qcom,usb-hs-phy.txt and I extended it a bit for my phy.  As this is not
> > > a so good hardware description, I'm a little hesitated to make it
> > > generic for other platforms to use in general.  What about we put off it
> > > a little bit until we see more platforms need the same thing?
> >
> > I'm not saying I want it generic. Quite the opposite. I don't think we
> > should have it either generically or vendor specific. The main thing I
> > have a problem with is the timing information because then we're more
> > that just data. Without that we're talking about a bunch of properties
> > for register fields or just raw register values in DT. That becomes
> > more of a judgement call. There's not too much value in making a
> > driver translate a bunch of properties just to stuff them into
> > registers on init. But then just allowing any raw register value to be
> > in DT could be easily abused.
>
> Rob,
>
> I agree with your comments.  Honestly, I'm not comfortable with this
> 'qcom,init-seq' thing in the first impression.  The similar existence in
> mainline qcom,usb-hs-phy.txt makes me think it might be acceptable with
> the timing data added.  Okay, I know your position on this now.
>
> @Sriharsha,
>
> Seeing that 'qcom,init-seq' is being configured with the exactly same
> values for both HS phys in SoC level dts file (qcs404.dtsi), I think
> such settings can be moved into driver code as SoC specific data.
> Unless you have a different view on this, I will do it with v4.

phy-qcom-qmp and phy-qcom-qusb2 have been maintaining such SoC specific
init sequences in the drivers if you would like to have pointers from them.

Thanks
Vivek

>
> Shawn



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


RE: [PATCH v2 2/2] dpaa_eth: add ethtool coalesce control

2018-11-18 Thread Madalin-cristian Bucur
> -Original Message-
> From: David Miller 
> Sent: Saturday, November 17, 2018 5:42 AM
> To: Madalin-cristian Bucur 
> Subject: Re: [PATCH v2 2/2] dpaa_eth: add ethtool coalesce control
> 
> From: Madalin Bucur 
> Date: Tue, 13 Nov 2018 18:29:51 +0200
> 
> > +   for_each_cpu(cpu, cpus) {
> > +   portal = qman_get_affine_portal(cpu);
> > +   res = qman_portal_set_iperiod(portal, period);
> > +   if (res)
> > +   return res;
> > +   res = qman_dqrr_set_ithresh(portal, thresh);
> > +   if (res)
> > +   return res;
> 
> Nope, you can't do it like this.
> 
> If any intermediate change fails, you have to unwind all of the
> changes made up until that point.
> 
> Which means you'll have to store the previous setting somewhere
> and reinstall those saved values in the error path.

Thank you, I'll come back with a v3.

Madalin
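
The unwinding David asks for can be sketched as follows. This is a
user-space illustration under stated assumptions, not the dpaa_eth or QMan
code: struct portal, portal_set_iperiod() and set_iperiod_all() are
hypothetical stand-ins, and only one of the two settings is modelled to keep
the example short.

```c
#include <stddef.h>

#define NUM_PORTALS 4

/* Hypothetical stand-in for per-CPU portal state; not the QMan API. */
struct portal {
	unsigned int iperiod;
	int fail_on_set;	/* test hook: make the setter fail */
};

static int portal_set_iperiod(struct portal *p, unsigned int period)
{
	if (p->fail_on_set)
		return -1;
	p->iperiod = period;
	return 0;
}

/*
 * Apply 'period' to all portals. On failure, reinstall the saved values
 * on every portal that was already changed and return the error.
 */
static int set_iperiod_all(struct portal *portals, size_t n,
			   unsigned int period)
{
	unsigned int saved[NUM_PORTALS];
	size_t i;
	int res;

	if (n > NUM_PORTALS)
		return -1;

	for (i = 0; i < n; i++) {
		/* Remember the previous setting before touching it. */
		saved[i] = portals[i].iperiod;
		res = portal_set_iperiod(&portals[i], period);
		if (res)
			goto unwind;
	}
	return 0;

unwind:
	/* Portals [0, i) were changed; restore them in reverse order. */
	while (i-- > 0)
		portal_set_iperiod(&portals[i], saved[i]);
	return res;
}
```

The same pattern extends to the threshold: save both values per portal
before writing either, and restore both in the error path.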


RE: [PATCH 3/3] PCI: imx: Add support for i.MX8MQ

2018-11-18 Thread Richard Zhu
Hi Andrey:
Thanks for your patch-set.
I have comment about the L1SS implementation below.
It's better to figure out one method to fix it.

BR
Richard

> -Original Message-
> From: Andrey Smirnov [mailto:andrew.smir...@gmail.com]
> Sent: 2018年11月18日 2:12
> To: linux-kernel@vger.kernel.org
> Cc: Andrey Smirnov ; bhelg...@google.com;
> Fabio Estevam ; cphe...@gmail.com;
> l.st...@pengutronix.de; Leonard Crestez ;
> Aisheng DONG ; Richard Zhu
> ; dl-linux-imx ;
> linux-arm-ker...@lists.infradead.org; linux-...@vger.kernel.org
> Subject: [PATCH 3/3] PCI: imx: Add support for i.MX8MQ
> 
> Cc: bhelg...@google.com
> Cc: Fabio Estevam 
> Cc: cphe...@gmail.com
> Cc: l.st...@pengutronix.de
> Cc: Leonard Crestez 
> Cc: "A.s. Dong" 
> Cc: Richard Zhu 
> Cc: linux-...@nxp.com
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Andrey Smirnov 
> ---
>  drivers/pci/controller/dwc/Kconfig|   2 +-
>  drivers/pci/controller/dwc/pci-imx6.c | 117 --
>  2 files changed, 113 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/Kconfig
> b/drivers/pci/controller/dwc/Kconfig
> index 91b0194240a5..2b139acccf32 100644
> --- a/drivers/pci/controller/dwc/Kconfig
> +++ b/drivers/pci/controller/dwc/Kconfig
> @@ -90,7 +90,7 @@ config PCI_EXYNOS
> 
>  config PCI_IMX6
>   bool "Freescale i.MX6 PCIe controller"
> - depends on SOC_IMX6Q || (ARM && COMPILE_TEST)
> + depends on SOC_IMX8MQ || SOC_IMX6Q || (ARM && COMPILE_TEST)
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIE_DW_HOST
> 
> diff --git a/drivers/pci/controller/dwc/pci-imx6.c
> b/drivers/pci/controller/dwc/pci-imx6.c
> index 3c3002861d25..8d1f310e41a6 100644
> --- a/drivers/pci/controller/dwc/pci-imx6.c
> +++ b/drivers/pci/controller/dwc/pci-imx6.c
> @@ -8,6 +8,7 @@
>   * Author: Sean Cross 
>   */
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -30,6 +31,14 @@
> 
>  #include "pcie-designware.h"
> 
> +#define IMX8MQ_PCIE_LINK_CAP_REG_OFFSET  0x7C
> +#define IMX8MQ_PCIE_LINK_CAP_L1EL_64US   (0x6 << 15)
> +
> +#define IMX8MQ_GPR_PCIE_REF_USE_PAD  BIT(9)
> +#define IMX8MQ_GPR_PCIE_CLK_REQ_OVERRIDE_EN  BIT(10)
> +#define IMX8MQ_GPR_PCIE_CLK_REQ_OVERRIDE BIT(11)
> +
> +
>  #define to_imx6_pcie(x)  dev_get_drvdata((x)->dev)
> 
>  enum imx6_pcie_variants {
> @@ -37,6 +46,7 @@ enum imx6_pcie_variants {
>   IMX6SX,
>   IMX6QP,
>   IMX7D,
> + IMX8MQ,
>  };
> 
>  struct imx6_pcie {
> @@ -48,8 +58,10 @@ struct imx6_pcie {
>   struct clk  *pcie_inbound_axi;
>   struct clk  *pcie;
>   struct regmap   *iomuxc_gpr;
> + u32 gpr1x;
>   struct reset_control*pciephy_reset;
>   struct reset_control*apps_reset;
> + struct reset_control*apps_clk_req;
>   struct reset_control*turnoff_reset;
>   enum imx6_pcie_variants variant;
>   u32 tx_deemph_gen1;
> @@ -59,6 +71,7 @@ struct imx6_pcie {
>   u32 tx_swing_low;
>   int link_gen;
>   struct regulator*vpcie;
> + u32 device_type[2];
>  };
> 
>  /* Parameters for the waiting for PCIe PHY PLL to lock on i.MX7 */ @@
> -245,7 +258,8 @@ static void imx6_pcie_reset_phy(struct imx6_pcie
> *imx6_pcie)  {
>   u32 tmp;
> 
> - if (imx6_pcie->variant == IMX7D)
> + if (imx6_pcie->variant == IMX7D ||
> + imx6_pcie->variant == IMX8MQ)
>   return;
> 
>   pcie_phy_read(imx6_pcie, PHY_RX_OVRD_IN_LO, ); @@ -261,6
> +275,7 @@ static void imx6_pcie_reset_phy(struct imx6_pcie *imx6_pcie)
>   pcie_phy_write(imx6_pcie, PHY_RX_OVRD_IN_LO, tmp);  }
> 
> +#ifdef CONFIG_ARM
>  /*  Added for PCI abort handling */
>  static int imx6q_pcie_abort_handler(unsigned long addr,
>   unsigned int fsr, struct pt_regs *regs) @@ -294,6 +309,7 @@ 
> static
> int imx6q_pcie_abort_handler(unsigned long addr,
> 
>   return 1;
>  }
> +#endif
> 
>  static void imx6_pcie_assert_core_reset(struct imx6_pcie *imx6_pcie)
> { @@ -301,6 +317,7 @@ static void imx6_pcie_assert_core_reset(struct
> imx6_pcie *imx6_pcie)
> 
>   switch (imx6_pcie->variant) {
>   case IMX7D:
> + case IMX8MQ: /* FALLTHROUGH */
>   reset_control_assert(imx6_pcie->pciephy_reset);
>   reset_control_assert(imx6_pcie->apps_reset);
>   break;
> @@ -369,6 +386,18 @@ static int imx6_pcie_enable_ref_clk(struct
> imx6_pcie *imx6_pcie)
>   break;
>   case IMX7D:
>   break;
> + case IMX8MQ:
> + /*
> +  * Set the over ride low and enabled
> +  * make sure that REF_CLK is turned on.
> +  */
> + regmap_update_bits(imx6_pcie->iomuxc_gpr, imx6_pcie->gpr1x,
> +IMX8MQ_GPR_PCIE_CLK_REQ_OVERRIDE,
> + 

Re: [PATCH 1/2] dt-bindings: phy: Add Qualcomm Synopsys High-Speed USB PHY binding

2018-11-18 Thread Shawn Guo
On Sat, Nov 17, 2018 at 09:13:38AM -0600, Rob Herring wrote:

> > > > +- qcom,init-seq:
> > > > +Value type: 
> > > > +Definition: Should contain a sequence of  
> > > > tuples to
> > > > +program 'value' into phy register at 'offset' with 
> > > > 'delay'
> > > > +   in us afterwards.
> > >
> > > If we wanted this type of thing in DT, we'd have a generic binding (or
> > > forth).
> >
> > Right now, this is a qualcomm usb phy specific bindings - first used in
> > qcom,usb-hs-phy.txt and I extended it a bit for my phy.  As this is not
> > a so good hardware description, I'm a little hesitated to make it
> > generic for other platforms to use in general.  What about we put off it
> > a little bit until we see more platforms need the same thing?
> 
> I'm not saying I want it generic. Quite the opposite. I don't think we
> should have it either generically or vendor specific. The main thing I
> have a problem with is the timing information because then we're more
> that just data. Without that we're talking about a bunch of properties
> for register fields or just raw register values in DT. That becomes
> more of a judgement call. There's not too much value in making a
> driver translate a bunch of properties just to stuff them into
> registers on init. But then just allowing any raw register value to be
> in DT could be easily abused.

Rob,

I agree with your comments.  Honestly, I'm not comfortable with this
'qcom,init-seq' thing in the first impression.  The similar existence in
mainline qcom,usb-hs-phy.txt makes me think it might be acceptable with
the timing data added.  Okay, I know your position on this now.

@Sriharsha,

Seeing that 'qcom,init-seq' is being configured with the exactly same
values for both HS phys in SoC level dts file (qcs404.dtsi), I think
such settings can be moved into driver code as SoC specific data.
Unless you have a different view on this, I will do it with v4.

Shawn


[PATCH 2/2] iio: adc: ti_am335x_tscadc: Improve accuracy of measurement

2018-11-18 Thread Vignesh R
When performing single-ended measurements with TSCADC, it's recommended
to set the negative input (SEL_INM_SWC_3_0) of the ADC step to the ADC's
VREFN in the corresponding STEP_CONFIGx register.
Also, the positive (SEL_RFP_SWC_2_0) and negative (SEL_RFM_SWC_1_0)
reference voltages for the ADC step need to be set to VREFP and VREFN
respectively in the STEP_CONFIGx register.
Without these changes, there may be variation of as much as ~2% in the
ADC's digital output, which is bad for precise measurement.

Signed-off-by: Vignesh R 
---
 drivers/iio/adc/ti_am335x_adc.c  | 5 -
 include/linux/mfd/ti_am335x_tscadc.h | 4 
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/adc/ti_am335x_adc.c b/drivers/iio/adc/ti_am335x_adc.c
index cafb1dcadc48..9d984f2a8ba7 100644
--- a/drivers/iio/adc/ti_am335x_adc.c
+++ b/drivers/iio/adc/ti_am335x_adc.c
@@ -142,7 +142,10 @@ static void tiadc_step_config(struct iio_dev *indio_dev)
stepconfig |= STEPCONFIG_MODE_SWCNT;
 
tiadc_writel(adc_dev, REG_STEPCONFIG(steps),
-   stepconfig | STEPCONFIG_INP(chan));
+   stepconfig | STEPCONFIG_INP(chan) |
+   STEPCONFIG_INM_ADCREFM |
+   STEPCONFIG_RFP_VREFP |
+   STEPCONFIG_RFM_VREFN);
 
if (adc_dev->open_delay[i] > STEPDELAY_OPEN_MASK) {
dev_warn(dev, "chan %d open delay truncating to 0x3\n",
diff --git a/include/linux/mfd/ti_am335x_tscadc.h b/include/linux/mfd/ti_am335x_tscadc.h
index b9a53e013bff..483168403ae5 100644
--- a/include/linux/mfd/ti_am335x_tscadc.h
+++ b/include/linux/mfd/ti_am335x_tscadc.h
@@ -78,6 +78,8 @@
 #define STEPCONFIG_YNN BIT(8)
 #define STEPCONFIG_XNP BIT(9)
 #define STEPCONFIG_YPN BIT(10)
+#define STEPCONFIG_RFP(val)((val) << 12)
+#define STEPCONFIG_RFP_VREFP   (0x3 << 12)
 #define STEPCONFIG_INM_MASK(0xF << 15)
 #define STEPCONFIG_INM(val)((val) << 15)
 #define STEPCONFIG_INM_ADCREFM STEPCONFIG_INM(8)
@@ -86,6 +88,8 @@
 #define STEPCONFIG_INP_AN4 STEPCONFIG_INP(4)
 #define STEPCONFIG_INP_ADCREFM STEPCONFIG_INP(8)
 #define STEPCONFIG_FIFO1   BIT(26)
+#define STEPCONFIG_RFM(val)((val) << 23)
+#define STEPCONFIG_RFM_VREFN   (0x3 << 23)
 
 /* Delay register */
 #define STEPDELAY_OPEN_MASK(0x3 << 0)
-- 
2.19.1
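
To see which bits the new write sets, the step configuration word can be
computed in isolation. The RFP, INM and RFM definitions below are taken from
the hunks above; the bit position used for STEPCONFIG_INP() (19) is not
shown in this patch and is an assumption here, as is the helper name
tiadc_single_ended_bits().

```c
#include <stdint.h>

/* Taken from the hunks above. */
#define STEPCONFIG_RFP(val)	((uint32_t)(val) << 12)
#define STEPCONFIG_RFP_VREFP	STEPCONFIG_RFP(0x3)
#define STEPCONFIG_INM(val)	((uint32_t)(val) << 15)
#define STEPCONFIG_INM_ADCREFM	STEPCONFIG_INM(8)
#define STEPCONFIG_RFM(val)	((uint32_t)(val) << 23)
#define STEPCONFIG_RFM_VREFN	STEPCONFIG_RFM(0x3)

/* Assumption: the positive input field starts at bit 19 (not shown above). */
#define STEPCONFIG_INP(val)	((uint32_t)(val) << 19)

/* Hypothetical helper mirroring the value passed to tiadc_writel(). */
static uint32_t tiadc_single_ended_bits(uint32_t stepconfig, unsigned int chan)
{
	return stepconfig | STEPCONFIG_INP(chan) |
	       STEPCONFIG_INM_ADCREFM |
	       STEPCONFIG_RFP_VREFP |
	       STEPCONFIG_RFM_VREFN;
}
```

With an empty base configuration and channel 0, the three new fields alone
contribute 0x00040000 (INM), 0x00003000 (RFP) and 0x01800000 (RFM).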



[PATCH 1/2] mfd: ti_am335x_tscadc: Provide unique name for child mfd cells

2018-11-18 Thread Vignesh R
Provide unique names for child mfd cells. This is required in order to
support registering multiple instances of the same ti_am335x_tscadc IP.

Signed-off-by: Vignesh R 
---
 drivers/mfd/ti_am335x_tscadc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mfd/ti_am335x_tscadc.c b/drivers/mfd/ti_am335x_tscadc.c
index c2d47d78705b..ee08af34f81d 100644
--- a/drivers/mfd/ti_am335x_tscadc.c
+++ b/drivers/mfd/ti_am335x_tscadc.c
@@ -248,7 +248,8 @@ static  int ti_tscadc_probe(struct platform_device *pdev)
if (tsc_wires > 0) {
tscadc->tsc_cell = tscadc->used_cells;
cell = &tscadc->cells[tscadc->used_cells++];
-   cell->name = "TI-am335x-tsc";
+   cell->name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s:%s",
+   dev_name(&pdev->dev), "tsc");
cell->of_compatible = "ti,am3359-tsc";
cell->platform_data = &tscadc;
cell->pdata_size = sizeof(tscadc);
@@ -258,7 +259,8 @@ static  int ti_tscadc_probe(struct platform_device *pdev)
if (adc_channels > 0) {
tscadc->adc_cell = tscadc->used_cells;
cell = &tscadc->cells[tscadc->used_cells++];
-   cell->name = "TI-am335x-adc";
+   cell->name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s:%s",
+   dev_name(&pdev->dev), "adc");
cell->of_compatible = "ti,am3359-adc";
cell->platform_data = &tscadc;
cell->pdata_size = sizeof(tscadc);
-- 
2.19.1
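
The effect of the "%s:%s" format is easy to check outside the kernel. The
sketch below is a user-space analogue using snprintf() in place of
devm_kasprintf(); the helper name make_cell_name() is hypothetical, and any
parent device names passed to it are examples, not values from this patch.

```c
#include <stdio.h>

/*
 * Hypothetical user-space analogue of the devm_kasprintf() call: build
 * "<parent>:<suffix>" into buf. Returns 0 on success, -1 if the result
 * did not fit or formatting failed.
 */
static int make_cell_name(char *buf, size_t len, const char *parent,
			  const char *suffix)
{
	int n = snprintf(buf, len, "%s:%s", parent, suffix);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the parent device name is part of the result, two instances of the
IP no longer produce identical cell names the way the fixed strings did.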



[PATCH 0/2] tscadc: Couple of fixes

2018-11-18 Thread Vignesh R
Couple of fixes for tscadc drivers that I found while adding support for
new SoC. Patches are standalone and can go via individual domain trees.

Vignesh R (2):
  mfd: ti_am335x_tscadc: Provide unique name for child mfd cells
  iio: adc: ti_am335x_tscadc: Improve accuracy of measurement

 drivers/iio/adc/ti_am335x_adc.c  | 5 -
 drivers/mfd/ti_am335x_tscadc.c   | 6 --
 include/linux/mfd/ti_am335x_tscadc.h | 4 
 3 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.19.1



Re: [PATCH v4] drivers/vfio: Fix a redundant copy bug

2018-11-18 Thread David Gibson
On Mon, Oct 29, 2018 at 03:32:34PM -0600, Alex Williamson wrote:
> On Mon, 29 Oct 2018 13:56:54 -0500
> Wenwen Wang  wrote:
> 
> > Hello,
> > 
> > Could you please apply this patch? Thanks!
> 
> I'd like to see testing and/or review from David or Alexey since I also
> don't have an environment for spapr/eeh.  We're already late into the
> v4.20 merge window so this is probably v4.21 material.  Thanks,

I didn't spot this for ages, since I wasn't CCed, I'm guessing the
same is true for Alexey.

TBH, I don't think it's worth bothering with this.  It's a tiny amount
of extra work on a rare debug path.  A couple of code lines simplicity
is worth more than a few bytes worth of redundant copy here.

Testing is.. difficult, since the EEH is already so broken it's hard
to know what to compare against.  Sam Bobroff is already looking
medium-long term at a bunch of EEH rework, so it's quite possible this
will be rewritten then anyway.

> 
> Alex
> 
> > On Wed, Oct 17, 2018 at 2:18 PM Wenwen Wang  wrote:
> > >
> > > In vfio_spapr_iommu_eeh_ioctl(), if the ioctl command is VFIO_EEH_PE_OP,
> > > the user-space buffer 'arg' is copied to the kernel object 'op' and the
> > > 'argsz' and 'flags' fields of 'op' are checked. If the check fails, an
> > > error code EINVAL is returned. Otherwise, 'op.op' is further checked
> > > through a switch statement to invoke related handlers. If 'op.op' is
> > > VFIO_EEH_PE_INJECT_ERR, the whole user-space buffer 'arg' is copied again
> > > to 'op' to obtain the err information. However, in the following execution
> > > of this case, the fields of 'op', except the field 'err', are actually not
> > > used. That is, the second copy has a redundant part. Therefore, for
> > > performance consideration, the redundant part of the second copy should be
> > > removed.
> > >
> > > This patch removes such a part in the second copy. It only copies from
> > > 'err.type' to 'err.mask', which is exactly required by the
> > > VFIO_EEH_PE_INJECT_ERR op.
> > >
> > > Signed-off-by: Wenwen Wang 
> > > ---
> > >  drivers/vfio/vfio_spapr_eeh.c | 9 ++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/vfio/vfio_spapr_eeh.c b/drivers/vfio/vfio_spapr_eeh.c
> > > index 38edeb4..66634c6 100644
> > > --- a/drivers/vfio/vfio_spapr_eeh.c
> > > +++ b/drivers/vfio/vfio_spapr_eeh.c
> > > @@ -37,6 +37,7 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group 
> > > *group,
> > > struct eeh_pe *pe;
> > > struct vfio_eeh_pe_op op;
> > > unsigned long minsz;
> > > +   unsigned long start, end;
> > > long ret = -EINVAL;
> > >
> > > switch (cmd) {
> > > @@ -86,10 +87,12 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group 
> > > *group,
> > > ret = eeh_pe_configure(pe);
> > > break;
> > > case VFIO_EEH_PE_INJECT_ERR:
> > > -   minsz = offsetofend(struct vfio_eeh_pe_op, 
> > > err.mask);
> > > -   if (op.argsz < minsz)
> > > +   start = offsetof(struct vfio_eeh_pe_op, err.type);
> > > +   end = offsetofend(struct vfio_eeh_pe_op, 
> > > err.mask);
> > > +   if (op.argsz < end)
> > > return -EINVAL;
> > > -   if (copy_from_user(, (void __user *)arg, 
> > > minsz))
> > > +   if (copy_from_user(, (char __user *)arg +
> > > +   start, end - start))
> > > return -EFAULT;
> > >
> > > ret = eeh_pe_inject_err(pe, op.err.type, 
> > > op.err.func,
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
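
For readers following the quoted patch, the partial copy it proposes can be
sketched in user space. This is a simplified illustration, not the VFIO
code: struct pe_op and struct pe_err are hypothetical layouts, memcpy()
stands in for copy_from_user(), and copy_err_only() is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same idea as the kernel's offsetofend(): offset just past a member. */
#define offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical layout for illustration; not the real vfio_eeh_pe_op. */
struct pe_err {
	uint32_t type;
	uint32_t func;
	uint64_t addr;
	uint64_t mask;
};

struct pe_op {
	uint32_t argsz;
	uint32_t flags;
	uint32_t op;
	struct pe_err err;
};

/*
 * Copy only the bytes from err.type through err.mask out of 'arg', leaving
 * the already validated header fields of 'op' untouched.
 * Returns 0 on success, -1 if 'arg' is too short.
 */
static int copy_err_only(struct pe_op *op, const void *arg, size_t argsz)
{
	size_t start = offsetof(struct pe_op, err.type);
	size_t end = offsetofend(struct pe_op, err.mask);

	if (argsz < end)
		return -1;

	memcpy((char *)op + start, (const char *)arg + start, end - start);
	return 0;
}
```

The saving is the handful of header bytes before err.type, which is the
trade-off against the simpler single full copy discussed above.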



Re: [PATCH RFC] hist lookups

2018-11-18 Thread David Miller
From: Namhyung Kim 
Date: Mon, 19 Nov 2018 15:28:37 +0900

> Hello David,
> 
> On Sun, Nov 18, 2018 at 08:52:43PM -0800, David Miller wrote:
>> From: Jiri Olsa 
>> Date: Tue, 13 Nov 2018 11:40:54 +0100
>> 
>> > I pushed/rebased what I have to perf/fixes branch again
>> > 
>> > please note I had to change our compile changes, because
>> > they wouldn't compile on x86, but I can't verify on sparc,
>> > so you might see some compile fails again
>> 
>> I just checked your current perf/fixes branch.
>> 
>> It builds on Sparc ;-)
>> 
>> And it behaves better too.  I do get tons of drops and lost events,
>> but it seems to keep going even during the hardest load.
>> 
>> Eventually I end up with a lot of unresolvable histogram entries,
>> so that is something to look into.
> 
> Did you record callchains as well?  I'd like to know whether it's
> related to the children (cumulative) mode or not.

I did not have callchains on, just plain "./perf top"


Re: [PATCH RFC] hist lookups

2018-11-18 Thread Namhyung Kim
Hello David,

On Sun, Nov 18, 2018 at 08:52:43PM -0800, David Miller wrote:
> From: Jiri Olsa 
> Date: Tue, 13 Nov 2018 11:40:54 +0100
> 
> > I pushed/rebased what I have to perf/fixes branch again
> > 
> > please note I had to change our compile changes, because
> > they wouldn't compile on x86, but I can't verify on sparc,
> > so you might see some compile fails again
> 
> I just checked your current perf/fixes branch.
> 
> It builds on Sparc ;-)
> 
> And it behaves better too.  I do get tons of drops and lost events,
> but it seems to keep going even during the hardest load.
> 
> Eventually I end up with a lot of unresolvable histogram entries,
> so that is something to look into.

Did you record callchains as well?  I'd like to know whether it's
related to the children (cumulative) mode or not.

Thanks,
Namhyung


> 
> I looked at your drop logic and it seems perfect, we avoid dropping
> all non-SAMPLE events which is what we want.  So that can't be the
> cause of the issues I am seeing.
> 


[Patch v3 ] drm/msm/dpu: Correct dpu destroy and disable order

2018-11-18 Thread Jayant Shekhar
In case of msm drm bind failure, pm runtime put sync
is called from the dsi driver, which issues an asynchronous
put on the mdss device. Subsequently, when dpu_mdss_destroy
is triggered, this change makes sure the mdss device is put
into suspend and any pending work that has not yet been
scheduled is cleared.

Changes in v2:
   - Removed double spacings [Jeykumar]

Changes in v3:
   - Fix clock on issue during bootup [Rajendra]

Signed-off-by: Jayant Shekhar 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c
index fd9c893..df8127b 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c
@@ -156,18 +156,16 @@ static void dpu_mdss_destroy(struct drm_device *dev)
struct dpu_mdss *dpu_mdss = to_dpu_mdss(priv->mdss);
struct dss_module_power *mp = &dpu_mdss->mp;
 
+   pm_runtime_suspend(dev->dev);
+   pm_runtime_disable(dev->dev);
_dpu_mdss_irq_domain_fini(dpu_mdss);
-
free_irq(platform_get_irq(pdev, 0), dpu_mdss);
-
msm_dss_put_clk(mp->clk_config, mp->num_clk);
devm_kfree(&pdev->dev, mp->clk_config);
 
if (dpu_mdss->mmio)
devm_iounmap(&pdev->dev, dpu_mdss->mmio);
dpu_mdss->mmio = NULL;
-
-   pm_runtime_disable(dev->dev);
priv->mdss = NULL;
 }
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [RFC PATCH] zinc chacha20 generic implementation using crypto API code

2018-11-18 Thread Herbert Xu
Hi Jason:

On Mon, Nov 19, 2018 at 07:13:07AM +0100, Jason A. Donenfeld wrote:
>
> Sorry, but there isn't a chance I'll agree to this. The whole idea is
> to have a clean place to focus on synchronous software
> implementations, and not keep it tangled up with the asynchronous API.

There is nothing asynchronous in the relevant crypto code at all.
The underlying CPU-based software implementations are all completely
synchronous, including the architecture-specific ones such as
x86/arm.

So I don't understand why you are rejecting it.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


RE: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller

2018-11-18 Thread Naga Sureshkumar Relli
Hi Boris,

> -Original Message-
> From: Boris Brezillon [mailto:boris.brezil...@bootlin.com]
> Sent: Monday, November 19, 2018 1:13 AM
> To: Naga Sureshkumar Relli 
> Cc: miquel.ray...@bootlin.com; rich...@nod.at; dw...@infradead.org;
> computersforpe...@gmail.com; marek.va...@gmail.com; 
> linux-...@lists.infradead.org; linux-
> ker...@vger.kernel.org; nagasures...@gmail.com; r...@kernel.org; Michal Simek
> 
> Subject: Re: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for 
> Arasan NAND
> Flash Controller
> 
> On Thu, 15 Nov 2018 09:34:16 +
> Naga Sureshkumar Relli  wrote:
> 
> > Hi Boris & Miquel,
> >
> > I am updating the driver by addressing your comments, and I have one
> > concern,  especially in anfc_read_page_hwecc(), there I am checking for 
> > erased pages bit flips.
> > Since Arasan NAND controller doesn't have multibit error detection
> > beyond 24-bit( it can correct up to 24 bit), i.e. there is no indication 
> > from controller to detect
> uncorrectable error beyond 24bit.
> 
> Do you mean that you can't detect uncorrectable errors, or just that it's not 
> 100% sure to detect
> errors above max_strength?
Yes, in Arasan NAND controller there is no way to detect uncorrectable errors 
beyond 24-bit.
> 
> > So I took some error count as default value(MULTI_BIT_ERR_CNT  16, I
> > put this based on the error count that I got while reading erased page on 
> > Micron device).
> > And during a page read, will just read the error count register and
> > compare this value with the default error count(16) and if it is more Than 
> > default then I am
> checking for erased page bit flips.
> 
> Hm, that's wrong, especially if you set ecc_strength to something > 16.
Ok
> 
> > I am doubting that this will not work in all cases.
> 
> It definitely doesn't.
Ok
> 
> > In my case it is just working because the error count that it got on an 
> > erased page is 16.
> > Could you please suggest a way to do detect erased_page bit flips when 
> > reading a page with
> HW-ECC?.
> 
> I'm a bit lost. Is the problem only about bitflips in erase pages, or is it 
> also impacting reads of
> written pages that lead to uncorrectable errors.
Yes, it is for both. For read errors beyond 24 bits that we can't detect,
the answer from the HW design team is that the flash part is bad.
So far we haven't run into that situation (read errors of written pages
beyond 24 bits), but we are hitting this when reading erased pages
(needed in the case of UBIFS).

> 
> Don't you have a bit (or several bits) reporting when the ECC engine was not 
> able to correct
> data? If you do, you should base the "detect bitflips in erase pages" logic on
> this information.
Multi-bit error reporting exists only for Hamming (1-bit correction and
2-bit detection), not for BCH.

Thanks,
Naga Sureshkumar Relli
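
For illustration only, here is a minimal user-space sketch of the kind of check under discussion: count the bits that read as 0 in a chunk that should be erased (all 0xff) and compare that count with the configured ECC strength instead of a fixed value such as 16. The function names are hypothetical and this is not the Arasan driver's code. It also does not address the separate problem raised above, namely that the BCH engine gives no indication of an uncorrectable error.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that read as 0; a cleanly erased chunk is all 0xff. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t v = (uint8_t)~buf[i];	/* set bits now mark flipped bits */

		while (v) {
			v &= v - 1;		/* clear the lowest set bit */
			zeros++;
		}
	}
	return zeros;
}

/*
 * Hypothetical helper: return the number of bitflips if the chunk looks
 * like an erased chunk with at most ecc_strength flipped bits, or -1 if
 * it has too many zero bits to be treated as erased.
 */
static int erased_chunk_bitflips(const uint8_t *buf, size_t len,
				 unsigned int ecc_strength)
{
	unsigned int zeros = count_zero_bits(buf, len);

	return zeros <= ecc_strength ? (int)zeros : -1;
}
```

The threshold here follows the configured strength, so it would not go stale if ecc_strength were set to something other than 16.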


Re: [RFC PATCH] zinc chacha20 generic implementation using crypto API code

2018-11-18 Thread Jason A. Donenfeld
Hi Herbert,

On Mon, Nov 19, 2018 at 6:25 AM Herbert Xu  wrote:
> My proposal is to merge the zinc interface as is, but to invert
> how we place the algorithm implementations.  IOW the implementations
> should stay where they are now, within the crypto API.  However,
> we will provide direct access to them for zinc without going through
> the crypto API.  IOW we'll just export the functions directly.

Sorry, but there isn't a chance I'll agree to this. The whole idea is
to have a clean place to focus on synchronous software
implementations, and not keep it tangled up with the asynchronous API.
If WireGuard and Zinc go in, it's going to be done right, not in a way
that introduces more problems and organizational complexity. Avoiding
the solution proposed here is kind of the point. And given that
cryptographic code is so security sensitive, I want to make sure this
is done right, as opposed to rushing it with some half-baked
concoction that will maybe possibly be replaced "later". From
talking to folks at Plumbers, I believe most of the remaining issues
are taken care of by the upcoming v9, and that you're overstating the
status of discussions regarding that.

I think the remaining issue right now is how to order my series and
Eric's series. If Eric's goes in first, it means that I can either a)
spend some time developing Zinc further _now_ to support chacha12 and
keep the top two conversion patches, or b) just drop the top two
conversion patches and work that out carefully with Eric after. I
think (b) would be better, but I'm happy to do (a) if you disagree.
And as I mentioned in the last email, I'd prefer Eric's work to go in
after Zinc, but I don't think the reasoning for that is particularly
strong, so it seems fine to merge Eric's work first.

I'll post v9 pretty soon and you can see how things are shaping up.
Hopefully before then Eric/Ard/you can provide some feedback on
whether you'd prefer (a) or (b) above.

Regards,
Jason


[PATCH] iio: st_sensors: Fix the sleep time for sampling

2018-11-18 Thread Jian-Hong Pan
According to the description of st_sensor_settings and st_sensor_data
structures' comments:
- bootime: samples to discard when sensor passing from power-down to
power-up.
- odr: Output data rate of the sensor [Hz].

The sleep time should be
sdata->sensor_settings->bootime + 1000 / sdata->odr ms.

Signed-off-by: Jian-Hong Pan 
---
 drivers/iio/common/st_sensors/st_sensors_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iio/common/st_sensors/st_sensors_core.c 
b/drivers/iio/common/st_sensors/st_sensors_core.c
index 26fbd1bd9413..6b87ea657a92 100644
--- a/drivers/iio/common/st_sensors/st_sensors_core.c
+++ b/drivers/iio/common/st_sensors/st_sensors_core.c
@@ -594,7 +594,7 @@ int st_sensors_read_info_raw(struct iio_dev *indio_dev,
if (err < 0)
goto out;
 
-   msleep((sdata->sensor_settings->bootime * 1000) / sdata->odr);
+   msleep(sdata->sensor_settings->bootime + 1000 / sdata->odr);
err = st_sensors_read_axis_data(indio_dev, ch, val);
if (err < 0)
goto out;
-- 
2.11.0
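
To make the arithmetic of the one-line change concrete, here is a small user-space sketch that evaluates both expressions with the same unsigned integer division the driver would use. The helper names are hypothetical; the sketch only compares the two formulas and does not claim which interpretation of bootime is correct.

```c
/* Delay in ms as computed before the patch: bootime sample periods. */
static unsigned int sleep_ms_before(unsigned int bootime, unsigned int odr)
{
	return (bootime * 1000) / odr;
}

/* Delay in ms as computed after the patch: bootime plus one sample period. */
static unsigned int sleep_ms_after(unsigned int bootime, unsigned int odr)
{
	return bootime + 1000 / odr;
}
```

For example, with bootime = 2 and odr = 10 Hz the old expression yields 200 ms and the new one 102 ms. Note that 1000 / odr truncates to 0 for any odr above 1000 Hz.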



Re: [PATCH v5 04/12] drm/mediatek: Add support for mmsys through a pdev

2018-11-18 Thread CK Hu
Hi, Matthias:

On Fri, 2018-11-16 at 13:54 +0100, matthias@kernel.org wrote:
> From: Matthias Brugger 
> 
> The MMSYS subsystem includes clocks and drm components.
> This patch adds an initialization path through a platform device
> for the clock part, so that both drivers get probed from the same
> device tree compatible.

Looks good to me except one coding style preference.

Reviewed-by: CK Hu 

> 
> Signed-off-by: Matthias Brugger 
> ---
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c | 23 +++
>  drivers/gpu/drm/mediatek/mtk_drm_drv.h |  2 ++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c 
> b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> index 99dd612a6683..18fc761ba94f 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> @@ -199,6 +199,7 @@ static const struct mtk_mmsys_driver_data 
> mt2701_mmsys_driver_data = {
>   .ext_path = mt2701_mtk_ddp_ext,
>   .ext_len = ARRAY_SIZE(mt2701_mtk_ddp_ext),
>   .shadow_register = true,
> + .clk_drv_name = "clk-mt2701-mm",
>  };
>  
>  static const struct mtk_mmsys_driver_data mt2712_mmsys_driver_data = {
> @@ -215,6 +216,7 @@ static const struct mtk_mmsys_driver_data 
> mt8173_mmsys_driver_data = {
>   .main_len = ARRAY_SIZE(mt8173_mtk_ddp_main),
>   .ext_path = mt8173_mtk_ddp_ext,
>   .ext_len = ARRAY_SIZE(mt8173_mtk_ddp_ext),
> + .clk_drv_name = "clk-mt8173-mm",
>  };
>  
>  static int mtk_drm_kms_init(struct drm_device *drm)
> @@ -473,6 +475,24 @@ static int mtk_drm_probe(struct platform_device *pdev)
>   if (IS_ERR(private->config_regs))
>   return PTR_ERR(private->config_regs);
>  
> + /*
> +  * For legacy reasons we need to probe the clock driver via
> +  * a platform device. This is outdated and should not be used
> +  * in newer SoCs.
> +  */
> + if (private->data->clk_drv_name) {
> + private->clk_dev = platform_device_register_data(dev,
> + private->data->clk_drv_name, -1,
> + NULL, 0);
> +
> + if (IS_ERR(private->clk_dev)) {
> + pr_err("failed to register %s platform device\n",
> + private->data->clk_drv_name);

I would like to align to the right of '('.

Regards,
CK

> +
> + return PTR_ERR(private->clk_dev);
> + }
> + }
> +
>   /* Iterate over sibling DISP function blocks */
>   for_each_child_of_node(dev->of_node->parent, node) {
>   const struct of_device_id *of_id;
> @@ -577,6 +597,9 @@ static int mtk_drm_remove(struct platform_device *pdev)
>   for (i = 0; i < DDP_COMPONENT_ID_MAX; i++)
>   of_node_put(private->comp_node[i]);
>  
> + if (private->clk_dev)
> + platform_device_unregister(private->clk_dev);
> +
>   return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.h 
> b/drivers/gpu/drm/mediatek/mtk_drm_drv.h
> index ab0adbd7d4ee..515ac4cae922 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.h
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.h
> @@ -37,11 +37,13 @@ struct mtk_mmsys_driver_data {
>   unsigned int third_len;
>  
>   bool shadow_register;
> + const char *clk_drv_name;
>  };
>  
>  struct mtk_drm_private {
>   struct drm_device *drm;
>   struct device *dma_dev;
> + struct platform_device *clk_dev;
>  
>   unsigned int num_pipes;
>  




linux-next: Tree for Nov 19

2018-11-18 Thread Stephen Rothwell
Hi all,

Changes since 20181116:

Removed trees:  (not updated in more than a year)
aio, drm-panel, edac, idle, keys, kvm-mips, lightnvm,
nand-fixes, realtek, sunxi-drm, uuid, vfs-miklos

The tip tree still had its build failure for which I applied a fix patch.

The akpm tree lost a patch that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 3389
 3518 files changed, 137024 insertions(+), 108059 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 286 trees (counting Linus' and 67 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (9ff01193a20d Linux 4.20-rc3)
Merging fixes/master (7c6c54b505b8 Merge branch 'i2c/for-next' of 
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux)
Merging kbuild-current/fixes (ccda4af0f4b9 Linux 4.20-rc2)
Merging arc-current/for-curr (121e38e5acdc ARC: mm: fix uninitialised signal 
code in do_page_fault)
Merging arm-current/fixes (e46daee53bb5 ARM: 8806/1: kprobes: Fix false 
positive with FORTIFY_SOURCE)
Merging arm64-fixes/for-next/fixes (24cc61d8cb5a arm64: memblock: don't permit 
memblock resizing until linear mapping is up)
Merging m68k-current/for-linus (58c116fb7dc6 m68k/sun3: Remove is_medusa and 
m68k_pgtable_cachemode)
Merging powerpc-fixes/fixes (b2fed34a628d selftests/powerpc: Adjust wild_bctr 
to build with old binutils)
Merging sparc/master (25e19c1fe421 Merge tag 'libnvdimm-fixes-4.20-rc3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (1c1274a56999 tipc: don't assume linear buffer when reading 
ancillary data)
Merging bpf/master (569a933b03f3 bpf: allocate local storage buffers using 
GFP_ATOMIC)
Merging ipsec/master (ca92e173ab34 xfrm: Fix bucket count reported to userspace)
Merging netfilter/master (38d37baa8123 ipvs: call ip_vs_dst_notifier earlier 
than ipv6_dev_notf)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (1770f0fa978e mt76: fix uninitialized mutex 
access setting rts threshold)
Merging mac80211/master (113f3aaa81bd cfg80211: Prevent regulatory restore 
during STA disconnect in concurrent interfaces)
Merging rdma-fixes/for-rc (99b77fef3c6c net/mlx5: Fix XRC SRQ umem valid bits)
Merging sound-current/for-linus (d99501b8575d ALSA: hda/ca0132 - Call 
pci_iounmap() instead of iounmap())
Merging sound-asoc-fixes/for-linus (f75d53d33907 Merge branch 'asoc-4.20' into 
asoc-linus)
Merging regmap-fixes/for-linus (ccda4af0f4b9 Linux 4.20-rc2)
Merging regulator-fixes/for-linus (6b46411b9ed3 Merge branch 'regulator-4.20' 
into regulator-linus)
Merging spi-fixes/for-linus (5dbff5ab4135 Merge branch 'spi-4.20' into 
spi-linus)
Merging pci-current/for-linus (1a87119b7bcf Revert "ACPI/PCI: Pay attention to 
device-specific _PXM node values")
Merging driver-core.current/driver-core-linus (a66d972465d1 devres: Align 
data[] to ARCH_KMALLOC_MINALIGN)
Merging tty.current/tty-linus (ccda4af0f4b9 Linux 4.20-rc2)
Merging usb.current/usb-linus (2f31a67f01a8 usb: xhci: Prevent bus suspend if a 
port connect change or polling state is detected)
Merging usb-gadget-fixes/fixes (2fc6d4be35fb usb: dwc3: gadget: fix ISOC TRB 
type on unaligned transfers)
Merging 

Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-18 Thread Anshuman Khandual



On 11/16/2018 10:25 PM, Dave Hansen wrote:
> On 11/15/18 10:27 PM, Anshuman Khandual wrote:
>> Not able to see the patches from this series either on the list or on the
>> archive (https://lkml.org/lkml/2018/11/15/331). IIRC last time we discussed
>> about this and the concern which I raised was in absence of a broader NUMA
>> rework for multi attribute memory it might not be a good idea to settle down
>> and freeze sysfs interface for the user space. 
> 

Hello Dave,

> This *is* the broader NUMA rework.  I think it's just a bit more
> incremental than what you originally had in mind.

IIUC NUMA re-work in principle involves these functional changes

1. Enumerating compute and memory nodes in heterogeneous environment 
(short/medium term)
2. Enumerating memory node attributes as seen from the compute nodes 
(short/medium term)
3. Changing core MM to accommodate multi attribute memory (long term)

The first two sets of changes can get the user space applications
moving by identifying the right nodes and their attributes through
sysfs interface.

> 
> Did you have an alternative for how you wanted this to look?
> 

No. I did not get enough time this year to rework the original
proposal I had. But I will be able to help here to make this interface
more generic and abstract out these properties so that it is extensible
in the future.

- Anshuman


[PATCH v1 2/2]: Documentation/admin-guide: introduce perf-security.rst file

2018-11-18 Thread Alexey Budankov


Add an initial version of the perf-security.rst documentation file,
covering security concerns related to PCL/Perf performance
monitoring in multiuser environments.

Suggested-by: Thomas Gleixner 
Signed-off-by: Alexey Budankov 
---
 Documentation/admin-guide/perf-security.rst | 83 +
 1 file changed, 83 insertions(+)

diff --git a/Documentation/admin-guide/perf-security.rst 
b/Documentation/admin-guide/perf-security.rst
new file mode 100644
index ..b9564066e686
--- /dev/null
+++ b/Documentation/admin-guide/perf-security.rst
@@ -0,0 +1,83 @@
+.. _perf_security:
+
+PCL/Perf security
+=================
+
+Overview
+--------
+
+Usage of Performance Counters for Linux (PCL) [1]_ , [2]_ , [3]_ can impose a
+considerable risk of leaking sensitive data accessed by monitored processes.
+The data leakage is possible both in scenarios of direct usage of PCL system
+call API [2]_ and over data files generated by Perf tool user mode utility
+(Perf) [3]_ , [4]_ . The risk depends on the nature of data that PCL performance
+monitoring units (PMU) [2]_ collect and expose for performance analysis.
+That said, PCL/Perf performance monitoring is subject to security
+access control management [5]_ .
+
+PCL/Perf access control
+-----------------------
+
+For the purpose of performing security checks Linux implementation splits
+processes into two categories [6]_ : a) privileged processes (whose effective
+user ID is 0, referred to as superuser or root), and b) unprivileged processes
+(whose effective UID is nonzero). Privileged processes bypass all kernel
+security permission checks so PCL performance monitoring is fully available to
+privileged processes without *access*, *scope* and *resource* restrictions.
+Unprivileged processes are subject to full security permission check based
+on the process's credentials [5]_ (usually: effective UID, effective GID,
+and supplementary group list).
+
+PCL/Perf unprivileged users
+---------------------------
+
+PCL/Perf *scope* and *access* control for unprivileged processes is governed by
+perf_event_paranoid [2]_ setting:
+
+**-1**:
+ Impose no *scope* and *access* restrictions on using PCL performance
+ monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ locking limit is
+ ignored when allocating memory buffers for storing performance data.
+ This is the least secure mode since allowed monitored *scope* is
+ maximized and no PCL specific limits are imposed on *resources*
+ allocated for performance monitoring.
+
+**>=0**:
+ *scope* includes per-process and system wide performance monitoring
+ but excludes raw tracepoints and ftrace function tracepoints monitoring.
+ CPU and system events happened when executing either in user or
+ in kernel space can be monitored and captured for later analysis.
+ Per-user per-cpu perf_event_mlock_kb locking limit is imposed but
+ ignored for unprivileged processes with CAP_IPC_LOCK [6]_ capability.
+
+**>=1**:
+ *scope* includes per-process performance monitoring only and excludes
+ system wide performance monitoring. CPU and system events happened when
+ executing either in user or in kernel space can be monitored and
+ captured for later analysis. Per-user per-cpu perf_event_mlock_kb
+ locking limit is imposed but ignored for unprivileged processes with
+ CAP_IPC_LOCK capability.
+
+**>=2**:
+ *scope* includes per-process performance monitoring only. CPU and system
+ events happened when executing in user space only can be monitored and
+ captured for later analysis. Per-user per-cpu perf_event_mlock_kb
+ locking limit is imposed but ignored for unprivileged processes with
+ CAP_IPC_LOCK capability.
+
+**>=3**:
+ Restrict *access* to PCL performance monitoring for unprivileged processes.
+ This is the default on Debian and Android [7]_ , [8]_ .
+
+Bibliography
+------------
+
+.. [1] ``_
+.. [2] ``_
+.. [3] ``_
+.. [4] ``_
+.. [5] ``_
+.. [6] ``_
+.. [7] ``_
+.. [8] ``_
+
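
As a reading aid for the perf_event_paranoid levels listed in the patch above, the following user-space sketch encodes them as a lookup for an unprivileged process. The struct and function names are hypothetical, the sketch only mirrors the wording of the document (it is not the kernel's permission check), and it ignores the perf_event_mlock_kb and CAP_IPC_LOCK details.

```c
#include <stdbool.h>

/* What an unprivileged process may do, per the levels described above. */
struct perf_scope {
	bool access;		/* may use PCL monitoring at all (< 3) */
	bool raw_tracepoints;	/* raw and ftrace function tracepoints (< 0) */
	bool system_wide;	/* system wide monitoring (< 1) */
	bool kernel_events;	/* events while executing in kernel space (< 2) */
};

/* Hypothetical helper mapping a perf_event_paranoid value to a scope. */
static struct perf_scope scope_for_unprivileged(int paranoid)
{
	struct perf_scope s = {
		.access = paranoid < 3,
		.raw_tracepoints = paranoid < 0,
		.system_wide = paranoid < 1,
		.kernel_events = paranoid < 2,
	};

	return s;
}
```

Each level is cumulative, which is why every field reduces to a single comparison: a value of 3 or more leaves all four fields false.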



[PATCH v1 1/2]: Documentation/admin-guide: update admin-guide index.rst

2018-11-18 Thread Alexey Budankov


Extend the index.rst file in the admin-guide root directory with
a reference to the perf-security.rst file being introduced.

Signed-off-by: Alexey Budankov 
---
 Documentation/admin-guide/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/admin-guide/index.rst 
b/Documentation/admin-guide/index.rst
index 0873685bab0f..885cc0de9114 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -75,6 +75,7 @@ configure specific aspects of kernel behavior to your liking.
thunderbolt
LSM/index
mm/index
+   perf-security
 
 .. only::  subproject and html


Re: [PATCH v5 05/12] drm: mediatek: Omit warning on probe defers

2018-11-18 Thread CK Hu
Hi, Matthias:

On Fri, 2018-11-16 at 13:54 +0100, matthias@kernel.org wrote:
> From: Matthias Brugger 
> 
> It can happen that the mmsys clock drivers aren't probed before the
> platform driver gets invoked. The platform driver used to print a warning
> that the driver failed to get the clocks. Omit this error on
> the deferred probe path.

This patch looks good to me, but you have not modified the sub driver in
the HDMI path. We could let the HDMI path print the warning and have someone
send another patch later, or you could modify the HDMI path in this patch.
> 
> Signed-off-by: Matthias Brugger 
> ---
>  drivers/gpu/drm/mediatek/mtk_disp_color.c | 4 +++-
>  drivers/gpu/drm/mediatek/mtk_disp_ovl.c   | 4 +++-
>  drivers/gpu/drm/mediatek/mtk_disp_rdma.c  | 4 +++-
>  drivers/gpu/drm/mediatek/mtk_drm_ddp.c| 3 ++-
>  drivers/gpu/drm/mediatek/mtk_dsi.c| 6 --
>  5 files changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_color.c 
> b/drivers/gpu/drm/mediatek/mtk_disp_color.c
> index f609b62b8be6..1ea3178d4c18 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_color.c
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_color.c
> @@ -126,7 +126,9 @@ static int mtk_disp_color_probe(struct platform_device 
> *pdev)
>   ret = mtk_ddp_comp_init(dev, dev->of_node, >ddp_comp, comp_id,
>   _disp_color_funcs);
>   if (ret) {
> - dev_err(dev, "Failed to initialize component: %d\n", ret);
> + if (ret != -EPROBE_DEFER)
> + dev_err(dev, "Failed to initialize component: %d\n",
> + ret);

I would like one more blank line here.

>   return ret;
>   }
>  
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c 
> b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> index 28d191192945..5ebbcaa4e70e 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> @@ -293,7 +293,9 @@ static int mtk_disp_ovl_probe(struct platform_device 
> *pdev)
>   ret = mtk_ddp_comp_init(dev, dev->of_node, >ddp_comp, comp_id,
>   _disp_ovl_funcs);
>   if (ret) {
> - dev_err(dev, "Failed to initialize component: %d\n", ret);
> + if (ret != -EPROBE_DEFER)
> + dev_err(dev, "Failed to initialize component: %d\n",
> + ret);

I would like to align to the right of '('.

Regards,
CK

>   return ret;
>   }
>  
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c 
> b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
> index b0a5cffe345a..59a08ed5fea5 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
> @@ -295,7 +295,9 @@ static int mtk_disp_rdma_probe(struct platform_device 
> *pdev)
>   ret = mtk_ddp_comp_init(dev, dev->of_node, >ddp_comp, comp_id,
>   _disp_rdma_funcs);
>   if (ret) {
> - dev_err(dev, "Failed to initialize component: %d\n", ret);
> + if (ret != -EPROBE_DEFER)
> + dev_err(dev, "Failed to initialize component: %d\n",
> + ret);
>   return ret;
>   }
>  
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp.c 
> b/drivers/gpu/drm/mediatek/mtk_drm_ddp.c
> index b06cd9d4b525..b76a2d071a97 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_ddp.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp.c
> @@ -566,7 +566,8 @@ static int mtk_ddp_probe(struct platform_device *pdev)
>  
>   ddp->clk = devm_clk_get(dev, NULL);
>   if (IS_ERR(ddp->clk)) {
> - dev_err(dev, "Failed to get clock\n");
> + if (PTR_ERR(ddp->clk) != -EPROBE_DEFER)
> + dev_err(dev, "Failed to get clock\n");
>   return PTR_ERR(ddp->clk);
>   }
>  
> diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
> b/drivers/gpu/drm/mediatek/mtk_dsi.c
> index 90109a0d6fff..cc6de75636c3 100644
> --- a/drivers/gpu/drm/mediatek/mtk_dsi.c
> +++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
> @@ -1103,14 +1103,16 @@ static int mtk_dsi_probe(struct platform_device *pdev)
>   dsi->engine_clk = devm_clk_get(dev, "engine");
>   if (IS_ERR(dsi->engine_clk)) {
>   ret = PTR_ERR(dsi->engine_clk);
> - dev_err(dev, "Failed to get engine clock: %d\n", ret);
> + if (ret != -EPROBE_DEFER)
> + dev_err(dev, "Failed to get engine clock: %d\n", ret);
>   return ret;
>   }
>  
>   dsi->digital_clk = devm_clk_get(dev, "digital");
>   if (IS_ERR(dsi->digital_clk)) {
>   ret = PTR_ERR(dsi->digital_clk);
> - dev_err(dev, "Failed to get digital clock: %d\n", ret);
> + if (ret != -EPROBE_DEFER)
> + dev_err(dev, "Failed to get digital clock: %d\n", ret);
>   return ret;
>   }
>  
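
The pattern repeated in each hunk above can be summarised in a small user-space sketch: log a probe failure unless the error is -EPROBE_DEFER, since a deferred probe is expected to be retried. The helper name is hypothetical and fprintf() stands in for dev_err(); EPROBE_DEFER is defined locally with the value the kernel uses (517) so the sketch compiles outside the kernel.

```c
#include <stdio.h>

#define EPROBE_DEFER 517	/* same value as in the kernel's linux/errno.h */

/*
 * Hypothetical helper: report a probe error unless it is a deferral.
 * Returns 1 if a message was printed, 0 if it was suppressed.
 */
static int report_probe_error(const char *what, int err)
{
	if (err == -EPROBE_DEFER)
		return 0;	/* the probe will be retried later, stay quiet */

	fprintf(stderr, "Failed to %s: %d\n", what, err);
	return 1;
}
```

A caller would still return the error code unchanged in both cases, exactly as the probe functions in the patch do.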




[PATCH v1 0/2]: Documentation/admin-guide: introduce perf-security.rst file and extend perf_event_paranoid documentation

2018-11-18 Thread Alexey Budankov


To help system administrators [1] make informed decisions about permitting
and managing access to PCL/Perf [2],[3] performance monitoring for multiple
users, this series introduces the perf-security.rst document suggested by
Thomas Gleixner [4]. The document:

a) states PCL/Perf access security concerns for multi user environment
b) refers to base Linux access control and management principles
c) extends documentation of possible perf_event_paranoid knob settings 

The file serves as a single source of knowledge on PCL/Perf security and
access control matters, following the decisions, discussion and
PoC prototype previously made here [5],[6].

The file can later be extended with information describing:

a) PCL/Perf usage models and its security implications
b) PCL/Perf user interface, its changes and related security implications
c) security related implications of monitoring by a specific PCL PMU [2]

---
 Alexey Budankov (2):
Documentation/admin-guide: update admin-guide index.rst
Documentation/admin-guide: introduce perf-security.rst file

 Documentation/admin-guide/index.rst |  1 +
 Documentation/admin-guide/perf-security.rst | 83 +
 2 files changed, 84 insertions(+)

---
[1] https://marc.info/?l=linux-kernel=153815883923913=2
[2] http://man7.org/linux/man-pages/man2/perf_event_open.2.html
[3] https://perf.wiki.kernel.org/index.php/Main_Page
[4] https://marc.info/?l=linux-kernel=153837512226838=2
[5] https://marc.info/?l=linux-kernel=153736008310781=2
[6] https://lkml.org/lkml/2018/5/21/156


Re: [PATCH v5 14/18] arm64: dts: qcom: qcs404: Add remoteproc nodes

2018-11-18 Thread Sibi Sankar

Hi Bjorn/Vinod,

On 2018-11-09 15:14, Vinod Koul wrote:

From: Bjorn Andersson 

Add the TrustZone based remoteproc nodes and their glink edges for
adsp, cdsp and wcss. Enable them for EVB common DTS.

Signed-off-by: Bjorn Andersson 
Signed-off-by: Vinod Koul 
---
 arch/arm64/boot/dts/qcom/qcs404-evb.dtsi | 12 +
 arch/arm64/boot/dts/qcom/qcs404.dtsi | 93 


 2 files changed, 105 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcs404-evb.dtsi
b/arch/arm64/boot/dts/qcom/qcs404-evb.dtsi
index db035fef67d9..a39924efebe4 100644
--- a/arch/arm64/boot/dts/qcom/qcs404-evb.dtsi
+++ b/arch/arm64/boot/dts/qcom/qcs404-evb.dtsi
@@ -21,6 +21,18 @@
};
 };

+_adsp {
+   status = "ok";
+};
+
+_cdsp {
+   status = "ok";
+};
+
+_wcss {
+   status = "ok";
+};
+
 _requests {
pms405-regulators {
compatible = "qcom,rpm-pms405-regulators";
diff --git a/arch/arm64/boot/dts/qcom/qcs404.dtsi
b/arch/arm64/boot/dts/qcom/qcs404.dtsi
index 46fce264c8fe..06607419c9d6 100644
--- a/arch/arm64/boot/dts/qcom/qcs404.dtsi
+++ b/arch/arm64/boot/dts/qcom/qcs404.dtsi
@@ -80,6 +80,99 @@
method = "smc";
};

+   remoteproc_adsp: remoteproc-adsp {
+   compatible = "qcom,qcs404-adsp-pas";
+
+   interrupts-extended = < GIC_SPI 293 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog", "fatal", "ready",
+ "handover", "stop-ack";
+
+   clocks = <_board>;
+   clock-names = "xo";
+
+   memory-region = <_fw_mem>;
+
+   qcom,smem-states = <_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+
+   status = "disabled";
+
+   glink-edge {
+   interrupts = ;
+
+   qcom,remote-pid = <2>;
+   mboxes = <_glb 8>;
+
+   label = "adsp";
+   };
+   };
+
+   remoteproc_cdsp: remoteproc-cdsp {
+   compatible = "qcom,qcs404-cdsp-pas";
+
+   interrupts-extended = < GIC_SPI 229 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog", "fatal", "ready",
+ "handover", "stop-ack";
+
+   clocks = <_board>;
+   clock-names = "xo";
+
+   memory-region = <_fw_mem>;
+
+   qcom,smem-states = <_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+
+   status = "disabled";
+
+   glink-edge {
+   interrupts = ;
+
+   qcom,remote-pid = <5>;
+   mboxes = <_glb 12>;
+
+   label = "cdsp";
+   };
+   };
+
+   remoteproc_wcss: remoteproc-wcss {
+   compatible = "qcom,qcs404-wcss-pas";
+
+   interrupts-extended = < GIC_SPI 153 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog", "fatal", "ready",
+ "handover", "stop-ack";


I can see that the wcss remoteproc uses an additional smp2p interrupt called
shutdown-ack downstream. You may want to skip the wcss entry for now until
the shutdown-ack support gets posted, reviewed and merged.


+
+   clocks = <_board>;
+   clock-names = "xo";
+
+   memory-region = <_fw_mem>;
+
+   qcom,smem-states = <_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+
+   status = "disabled";
+
+   glink-edge {
+   interrupts = ;
+
+   qcom,remote-pid = <1>;
+   mboxes = <_glb 16>;
+
+   label = "wcss";
+   };
+   };
+
reserved-memory {
#address-cells = <2>;
#size-cells = <2>;


--
-- Sibi Sankar --
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [PATCH RFC] hist lookups

2018-11-18 Thread Namhyung Kim
Hi Jirka

Sorry for the late reply!

On Tue, Nov 06, 2018 at 12:54:36PM +0100, Jiri Olsa wrote:
> On Mon, Nov 05, 2018 at 08:53:42PM -0800, David Miller wrote:
> > 
> > Jiri,
> > 
> > Because you now run queued_events__queue() lockless with that condvar
> > trick, it is possible for top->qe.in to be seen as one past the data[]
> > array, this is because the rotate_queues() code goes:
> > 
> > > if (++top->qe.in > &top->qe.data[1])
> > > top->qe.in = &top->qe.data[0];
> > 
> > So for a brief moment top->qe.in is out of range and thus
> > perf_top__mmap_read_idx() can try to enqueue to top->qe.data[2]
> > 
> > We can just do:
> > 
> > > if (top->qe.in == &top->qe.data[1])
> > > top->qe.in = &top->qe.data[0];
> > > else
> > > top->qe.in = &top->qe.data[1];
> > 
> > Or, make top->qe.in an index, and simply go:
> > 
> > top->qe.in ^= 1;
> > 
> > Either way will fix the bug.
> 
> ah right.. I had originally full mutex around that,
> then I switched it off in the last patch and did
> not realize this implication.. nice ;-)

I like the rotate_queues() using cond-variable.  Have you tried to use
the same for hists->lock in hists__get_rotate_entries_in() too?

Eventually it'd be nice to avoid locks when a single thread processes
all the events.

Thanks,
Namhyung
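
For reference, the index-based variant suggested in the quoted text can be sketched in user space as follows. The struct and function names are hypothetical and the int array is only a stand-in for perf top's two event queues. The point is that the index only ever takes the values 0 and 1, so there is no intermediate out-of-range state. The sketch does not cover the memory-ordering requirements of a truly lockless reader.

```c
#define QUEUE_COUNT 2

/* Stand-in for the pair of queues that rotate_queues() switches between. */
struct queue_pair {
	int data[QUEUE_COUNT];	/* placeholder for the two event queues */
	unsigned int in;	/* index of the queue currently being filled */
};

/*
 * Hypothetical rotate step: hand back the queue that was being filled and
 * switch the input to the other one with a single XOR.
 */
static int *rotate_in(struct queue_pair *qe)
{
	int *filled = &qe->data[qe->in];

	qe->in ^= 1;	/* toggles 0 <-> 1, never 2 */
	return filled;
}
```

Compared with incrementing a pointer and wrapping it afterwards, the toggle performs only one store, and that store always leaves a valid index.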


Re: [PATCH v3 0/3] sh: system call table generation support

2018-11-18 Thread Rob Landley
On 11/13/18 10:32 PM, Firoz Khan wrote:
> The purpose of this patch series is to make it easy
> to add, modify or delete system call table entries by
> changing an entry in the syscall.tbl file instead of
> manually changing many files. The other goal is to
> unify the system call table generation support
> implementation across all the architectures.

I applied the patch in https://github.com/landley/mkroot and the result booted
under qemu-system-sh4, seems to work fine. Network's fine, it can read a block
device, etc.

Acked-and-or-tested-by: Rob Landley 

I assume that this is just git du jour and not your patch:

WARNING: CPU: 0 PID: 1 at mm/slub.c:2448 ___slab_alloc.constprop.34+0x196/0x288

CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc3 #1
PC is at ___slab_alloc.constprop.34+0x196/0x288
PR is at __slab_alloc.constprop.33+0x2a/0x4c
PC  : 8c09d09a SP  : 8f829ea0 SR  : 400080f0
TEA : c0001240
R0  : 8c09cf04 R1  : 8c01cbec R2  :  R3  : 
R4  : 8f8020a0 R5  : 006080c0 R6  : 8c01d74a R7  : 8fff5180
R8  : 8c011a40 R9  : 8fff5180 R10 : 8f8020a0 R11 : 8000
R12 : 8c01d74a R13 : 006080c0 R14 : 8f80211c
MACH: 008e MACL: 0ae4849d GBR :  PR  : 8c09d1b6

Call trace:
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] __slab_alloc.constprop.33+0x2a/0x4c
 [<(ptrval)>] arch_local_save_flags+0x0/0x8
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] mm_init.isra.6+0xca/0x120
 [<(ptrval)>] kmem_cache_alloc+0x9a/0xf4
 [<(ptrval)>] mm_init.isra.6+0xca/0x120
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] kmem_cache_alloc+0x9a/0xf4
 [<(ptrval)>] mm_alloc+0xe/0x48
 [<(ptrval)>] mm_init.isra.6+0xca/0x120
 [<(ptrval)>] memset+0x0/0x8c
 [<(ptrval)>] __do_execve_file+0x1de/0x574
 [<(ptrval)>] getname_kernel+0x1e/0xc8
 [<(ptrval)>] kmem_cache_alloc+0x0/0xf4
 [<(ptrval)>] do_execve+0x16/0x24
 [<(ptrval)>] arch_local_save_flags+0x0/0x8
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] printk+0x0/0x24
 [<(ptrval)>] kernel_init+0x34/0xec
 [<(ptrval)>] ret_from_kernel_thread+0xc/0x14
 [<(ptrval)>] schedule_tail+0x0/0x58
 [<(ptrval)>] kernel_init+0x0/0xec

---[ end trace 6e84d1e05051e55d ]---



[RFC PATCH] zinc chacha20 generic implementation using crypto API code

2018-11-18 Thread Herbert Xu
On Sun, Nov 18, 2018 at 02:46:30PM +0100, Jason A. Donenfeld wrote:
> 
> Personally I'd prefer this be merged after Zinc, since there's work to
> be done on adjusting the 20->12 in chacha20. That's not really much of
> a reason though. But maybe we can just sidestep the ordering concern
> all together:

In response to Martin's patch-set which I merged last week, I think
here is a quick way out for the zinc interface.

Going through the past zinc discussions it would appear that
everybody is quite happy with the zinc interface per se.  The
most contentious areas are in fact the algorithm implementations
under zinc, as well as the conversion of the crypto API algorithms
over to using the zinc interface (e.g., you can no longer access
specific implementations).

My proposal is to merge the zinc interface as is, but to invert
how we place the algorithm implementations.  IOW the implementations
should stay where they are now, within the crypto API.  However,
we will provide direct access to them for zinc without going through
the crypto API.  IOW we'll just export the functions directly.

Here is a proof of concept patch to do it for chacha20-generic.
It basically replaces patch 3 in the October zinc patch series.
Actually this patch also exports the arm/x86 chacha20 functions
too so they are ready for use by zinc.

If everyone is happy with this then we can immediately add the
zinc interface and expose the existing crypto API algorithms
through it.  This would allow wireguard to be merged right away.

In parallel, the discussions over replacing the implementations
underneath can carry on without stalling the whole project.

PS This patch is totally untested :)

Thanks,

diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
index 59a7be08e80c..1f6239ff41ae 100644
--- a/arch/arm/crypto/chacha20-neon-glue.c
+++ b/arch/arm/crypto/chacha20-neon-glue.c
@@ -31,7 +31,7 @@
 asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
 asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
 
-static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
+void crypto_chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
 {
u8 buf[CHACHA20_BLOCK_SIZE];
@@ -56,11 +56,12 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
memcpy(dst, buf, bytes);
}
 }
+EXPORT_SYMBOL_GPL(crypto_chacha20_doneon);
 
 static int chacha20_neon(struct skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-   struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
+   struct crypto_chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
u32 state[16];
int err;
@@ -79,8 +80,8 @@ static int chacha20_neon(struct skcipher_request *req)
if (nbytes < walk.total)
nbytes = round_down(nbytes, walk.stride);
 
-   chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
-   nbytes);
+   crypto_chacha20_doneon(state, walk.dst.virt.addr,
+  walk.src.virt.addr, nbytes);
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
kernel_neon_end();
@@ -93,7 +94,7 @@ static struct skcipher_alg alg = {
.base.cra_driver_name   = "chacha20-neon",
.base.cra_priority  = 300,
.base.cra_blocksize = 1,
-   .base.cra_ctxsize   = sizeof(struct chacha20_ctx),
+   .base.cra_ctxsize   = sizeof(struct crypto_chacha20_ctx),
.base.cra_module= THIS_MODULE,
 
.min_keysize= CHACHA20_KEY_SIZE,
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 9fd84fe6ec09..519bbbf8c477 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -39,7 +39,7 @@ static unsigned int chacha20_advance(unsigned int len, unsigned int maxblocks)
return round_up(len, CHACHA20_BLOCK_SIZE) / CHACHA20_BLOCK_SIZE;
 }
 
-static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
+void crypto_chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
 {
 #ifdef CONFIG_AS_AVX2
@@ -85,11 +85,12 @@ static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
state[12]++;
}
 }
+EXPORT_SYMBOL_GPL(crypto_chacha20_dosimd);
 
 static int chacha20_simd(struct skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-   struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
+   struct crypto_chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
u32 *state, state_buf[16 + 2] __aligned(8);
struct skcipher_walk walk;
int err;
@@ -112,8 +113,8 @@ static int chacha20_simd(struct skcipher_request *req)
if (nbytes < walk.total)
  

Re: [PATCH v3 0/4] sparc: system call table generation support

2018-11-18 Thread Firoz Khan
Hi David,

On Mon, 19 Nov 2018 at 08:29, David Miller  wrote:
>
> From: Firoz Khan 
> Date: Wed, 14 Nov 2018 10:56:27 +0530
>
> > The purpose of this patch series is to make it easy
> > to add, modify or delete system call table entries by
> > changing an entry in the syscall.tbl file instead of
> > manually changing many files. The other goal is to
> > unify the system call table generation support
> > implementation across all the architectures.
>  ...
>
> Series applied to sparc-next.

Sounds good. Thanks

Firoz


Re: [PATCH v7 0/5] parisc: system call table generation support

2018-11-18 Thread Firoz Khan
Hi Helge,

On Sat, 17 Nov 2018 at 22:01, Helge Deller  wrote:
>
> * Arnd Bergmann :
> > On Fri, Nov 16, 2018 at 1:55 PM Helge Deller  wrote:
> > > > On Fri, 16 Nov 2018 at 01:01, Helge Deller  wrote:
> > > > >
> > > > > On 14.11.2018 07:34, Firoz Khan wrote:
> > > > > > The purpose of this patch series is to make it easy
> > > > > > to add, modify or delete system call table entries by
> > > > > > changing an entry in the syscall.tbl file instead of
> > > > > > manually changing many files. The other goal is to
> > > > > > unify the system call table generation support
> > > > > > implementation across all the architectures.
> > > > > >
> > > > > > The system call tables are in a different format on
> > > > > > every architecture, which makes it difficult to add,
> > > > > > modify or delete system calls in the respective files
> > > > > > manually. To make this easy, keep a script which
> > > > > > generates the uapi header file and the syscall table
> > > > > > file.
> > > > > >
> > > > > > syscall.tbl contains the list of available system
> > > > > > calls along with the system call number and the
> > > > > > corresponding entry point. Adding a new system call on
> > > > > > this architecture will be possible by adding a new
> > > > > > entry in the syscall.tbl file.
> > > > > >
> > > > > > Adding a new table entry consisting of:
> > > > > > - System call number.
> > > > > > - ABI.
> > > > > > - System call name.
> > > > > > - Entry point name.
> > > > > >
> > > > > > 
> > > > > > Firoz Khan (5):
> > > > > >   parisc: move __IGNORE* entries to non uapi header
> > > > > >   parisc: add __NR_syscalls along with __NR_Linux_syscalls
> > > > > >   parisc: add system call table generation support
> > > > > >   parisc: generate uapi header and system call table files
> > > > > >   parisc: syscalls: ignore nfsservctl for other architectures
> > > > >
> > > > > Firoz, you may add
> > > > > Acked-by: Helge Deller 
> > > > > to the whole parisc series.
> > > >
> > > > Sure, will do.
> > > > I'm on a vacation right now. will send mid next week.
> > >
> > > That's ok, there is no urgency.
> > >
> > > Actually, I noticed that the generated files unistd_32.h
> > > and unistd_64.h do have the same contents, since on parisc
> > > we keep the syscall numbers the same for 32- and 64-bit.
> > > With that in mind, we can simply generate one unistd.h
> > > file for both variants.
> >
> > It depends on what we want to do in the future. When we add
> > around 20 new system calls for y2038, my plan was to
> > only add them for 32-bit architectures, leaving holes on
> > 64-bit ones.
>
> Ok, I didn't think of that.
>
> > We can also assign the new numbers on parisc64 but they would have the
> > same entry point as existing calls.
>
> > If you prefer doing it like that, your patch seems fine for that,
> > it's just slightly inconsistent with the other 64-bit architectures
> > then.
>
> I really prefer to stay in sync with other major architectures.
> So, please drop my last patch.
>
> Instead please apply only the next one, which drops the NR_Linux
> offset value (which is 0 anyway).

This can easily be done. FYI, our intention is that the generated file must
be the same as the old one. That's why I kept the offset as it is. I can add
one extra patch to remove NR_Linux.

>
> Thanks,
> Helge
>
>
> diff --git a/arch/parisc/include/uapi/asm/unistd.h b/arch/parisc/include/uapi/asm/unistd.h
> index 6e31f58ad6b5..98dc953656af 100644
> --- a/arch/parisc/include/uapi/asm/unistd.h
> +++ b/arch/parisc/include/uapi/asm/unistd.h
> @@ -2,7 +2,6 @@
>  #ifndef _UAPI_ASM_PARISC_UNISTD_H_
>  #define _UAPI_ASM_PARISC_UNISTD_H_
>
> -#define __NR_Linux 0
>  #ifdef __LP64__
>  #include 
>  #else
> diff --git a/arch/parisc/kernel/syscalls/Makefile b/arch/parisc/kernel/syscalls/Makefile
> index defa8878f6d2..4dcc5c9ae7f2 100644
> --- a/arch/parisc/kernel/syscalls/Makefile
> +++ b/arch/parisc/kernel/syscalls/Makefile
> @@ -12,22 +12,18 @@ systbl := $(srctree)/$(src)/syscalltbl.sh
>  quiet_cmd_syshdr = SYSHDR  $@
>cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@'   \
>'$(syshdr_abis_$(basetarget))'   \
> -  '$(syshdr_pfx_$(basetarget))'\
> -  '$(syshdr_offset_$(basetarget))'
> +  '$(syshdr_pfx_$(basetarget))'
>
>  quiet_cmd_systbl = SYSTBL  $@
>cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@'   \
>'$(systbl_abis_$(basetarget))'   \
> -  '$(systbl_abi_$(basetarget))'\
> -  '$(systbl_offset_$(basetarget))'
> +  '$(systbl_abi_$(basetarget))'
>
>  syshdr_abis_unistd_32 := common,32
> -syshdr_offset_unistd_32 := __NR_Linux

I'll remove this line

>  $(uapi)/unistd_32.h: $(syscall) $(syshdr)
> $(call if_changed,syshdr)
>
>  syshdr_abis_unistd_64 := common,64
> -syshdr_offset_unistd_64 := __NR_Linux

And 

Re: RFC: userspace exception fixups

2018-11-18 Thread Jethro Beekman

On 2018-11-18 18:32, Jarkko Sakkinen wrote:

On Sun, Nov 18, 2018 at 09:15:48AM +0200, Jarkko Sakkinen wrote:

On Thu, Nov 01, 2018 at 10:53:40AM -0700, Andy Lutomirski wrote:

Hi all-

The people working on SGX enablement are grappling with a somewhat
annoying issue: the x86 EENTER instruction is used from user code and
can, as part of its normal-ish operation, raise an exception.  It is
also highly likely to be used from a library, and signal handling in
libraries is unpleasant at best.

There's been some discussion of adding a vDSO entry point to wrap
EENTER and do something sensible with the exceptions, but I'm
wondering if a more general mechanism would be helpful.


I haven't really followed all of this discussion because I've been busy
working on the patch set but for me all of these approaches look awfully
complicated.

I'll throw my own suggestion and apologize if this has been already
suggested and discarded: return-to-AEP.

My idea is to do just a small extension to SGX AEX handling. At the
moment hardware will fill RAX, RBX and RCX with ERESUME parameters. We can
extend this by filling the other three spare registers with exception
information.

AEP handler can then do whatever it wants to do with this information
or just do ERESUME.


A correction here. In practice this will add a requirement to have a bit
more complicated AEP code (check the regs for exceptions) than before
and not just bytes for ENCLU.

e.g. AEP handler should be along the lines

1. #PF (or #UD or) happens. Kernel fills the registers when it cannot
handle the exception and returns back to user space i.e. to the
AEP handler.
2. Check the registers containing exception information. If they have
been filled, take whatever actions user space wants to take.
3. Otherwise, just ERESUME.

 From my point of view this is making the AEP parameter useful. Its
standard use is just weird (always point to a place just containing
ENCLU bytes, why the heck it even exists).
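
The three steps above can be sketched as a small decision function. This
is only an illustration under stated assumptions: the structure layout, the
"valid" flag and the names aep_exception_info and aep_decide are
hypothetical, since the proposal does not say which registers would carry
which values. The only fixed facts used are the x86 vector numbers (#UD is
6, #PF is 14).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the three spare registers the kernel would fill. */
struct aep_exception_info {
	uint64_t valid;		/* non-zero if the kernel reported an exception */
	uint64_t vector;	/* e.g. 6 for #UD, 14 for #PF */
	uint64_t error_code;
};

enum aep_action {
	AEP_ERESUME,		/* step 3: nothing reported, resume the enclave */
	AEP_HANDLE_EXCEPTION	/* step 2: let user space act on the exception */
};

/* Decide what the AEP handler does after returning from the kernel. */
static enum aep_action aep_decide(const struct aep_exception_info *info)
{
	assert(info);
	if (info->valid)
		return AEP_HANDLE_EXCEPTION;
	return AEP_ERESUME;
}
```

A separate flag is used here instead of testing the vector for zero,
because vector 0 is itself a valid exception number on x86.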


I like this solution. Keeps things simple. One question: when an 
exception occurs, how does the kernel know whether to set special 
registers or send a signal?


--
Jethro Beekman | Fortanix






Re: [PATCH v3 0/3] sh: system call table generation support

2018-11-18 Thread Firoz Khan
On Wed, 14 Nov 2018 at 10:02, Firoz Khan  wrote:
>
> The purpose of this patch series is to make it easy
> to add, modify or delete system call table entries by
> changing an entry in the syscall.tbl file instead of
> manually changing many files. The other goal is to
> unify the system call table generation support
> implementation across all the architectures.
>
> The system call tables are in a different format on
> every architecture, which makes it difficult to add,
> modify or delete system calls in the respective files
> manually. To make this easy, keep a script which
> generates the uapi header file and the syscall table
> file.
>
> syscall.tbl contains the list of available system
> calls along with the system call number and the
> corresponding entry point. Adding a new system call on
> this architecture will be possible by adding a new
> entry in the syscall.tbl file.
>
> Adding a new table entry consisting of:
> - System call number.
> - ABI.
> - System call name.
> - Entry point name.
>
> Please note, this support is only available for the 32-bit
> kernel, not the 64-bit kernel. As far as I can tell, the 64-bit
> kernel has not been active for a long time.
>
> The ARM, s390 and x86 architectures already have similar
> support. I leveraged their implementation to come
> up with a generic solution.
>
> I have done the same work for alpha, ia64,
> m68k, microblaze, mips, parisc, powerpc, sparc, and
> xtensa. The git repository mentioned below contains more
> details about the workflow.
>
> https://github.com/frzkhn/system_call_table_generator/
>
> Finally, this is the groundwork to solve the Y2038
> issue. We need to add two dozen system calls to solve
> the Y2038 issue, so this patch series will help to add new
> system calls easily by adding a new entry in syscall.tbl.
>
> Changes since v2:
>  - changed from generic-y to generated-y in Kbuild.
>
> Changes since v1:
>  - optimized/updated the syscall table generation
>scripts.
>  - fixed all mixed indentation issues in syscall.tbl.
>  - added "comments" in syscall.tbl.
>
> Firoz Khan (3):
>   sh: add __NR_syscalls along with NR_syscalls
>   sh: add system call table generation support
>   sh: generate uapi header and syscall table header files

Gentle reminder!

Could someone review this patch series? I haven't received any
feedback so far.

FYI, this support is only available for the 32-bit kernel, not the 64-bit
kernel. As far as I can tell, the 64-bit kernel has not been active for a
long time.

Thanks
Firoz

>
>  arch/sh/Makefile  |   3 +
>  arch/sh/include/asm/Kbuild|   1 +
>  arch/sh/include/asm/unistd.h  |   2 +
>  arch/sh/include/uapi/asm/Kbuild   |   1 +
>  arch/sh/include/uapi/asm/unistd_32.h  |   4 +-
>  arch/sh/include/uapi/asm/unistd_64.h  |   4 +-
>  arch/sh/kernel/syscalls/Makefile  |  38 
>  arch/sh/kernel/syscalls/syscall.tbl   | 392 ++
>  arch/sh/kernel/syscalls/syscallhdr.sh |  36 
>  arch/sh/kernel/syscalls/syscalltbl.sh |  32 +++
>  arch/sh/kernel/syscalls_32.S  | 387 +
>  11 files changed, 514 insertions(+), 386 deletions(-)
>  create mode 100644 arch/sh/kernel/syscalls/Makefile
>  create mode 100644 arch/sh/kernel/syscalls/syscall.tbl
>  create mode 100644 arch/sh/kernel/syscalls/syscallhdr.sh
>  create mode 100644 arch/sh/kernel/syscalls/syscalltbl.sh
>
> --
> 1.9.1
>
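
For illustration, a minimal sketch of reading one syscall.tbl entry with
the four fields listed in the cover letter (system call number, ABI, name,
entry point). The struct, the function name parse_syscall_line and the
field widths are assumptions made for this example; the series itself
generates the headers with shell scripts, not C:

```c
#include <assert.h>
#include <stdio.h>

/* One table row: number, ABI, name, entry point. */
struct syscall_entry {
	int nr;
	char abi[16];
	char name[64];
	char entry[64];
};

/*
 * Parse a whitespace-separated table line.
 * Returns 1 on success, 0 for comment, blank or malformed lines.
 */
static int parse_syscall_line(const char *line, struct syscall_entry *e)
{
	assert(line && e);
	if (line[0] == '#')
		return 0;
	return sscanf(line, "%d %15s %63s %63s",
		      &e->nr, e->abi, e->name, e->entry) == 4;
}
```

An illustrative input line would be "3 common read sys_read", with tabs or
spaces between the fields.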


Re: [PATCH v5 7/7] tpm: pass an array of tpm_bank_list structures to tpm_pcr_extend()

2018-11-18 Thread Mimi Zohar
On Sun, 2018-11-18 at 09:27 +0200, Jarkko Sakkinen wrote:
> On Fri, Nov 16, 2018 at 04:55:36PM +0100, Roberto Sassu wrote:
> > On 11/16/2018 4:03 PM, Jarkko Sakkinen wrote:
> > > On Wed, Nov 14, 2018 at 04:31:08PM +0100, Roberto Sassu wrote:
> > > > Currently, tpm_pcr_extend() accepts as an input only a SHA1 digest.
> > > > 
> > > > This patch modifies the definition of tpm_pcr_extend() to allow other
> > > > kernel subsystems to pass a digest for each algorithm supported by the TPM.
> > > > All digests are processed by the TPM in one operation.
> > > > 
> > > > If a tpm_pcr_extend() caller provides a subset of the supported algorithms,
> > > > the TPM driver extends the remaining PCR banks with the first digest
> > > > passed as an argument to the function.
> > > 
> > > What is the legit use case for this?
> > 
> > A subset could be chosen for better performance, or when a TPM algorithm
> > is not supported by the crypto subsystem.
> 
> Isn't extending a subset a security concern?

Right, so instead of extending a subset of the allocated banks, all of
the allocated banks need to be extended, even those banks for which a
digest was not included.  This is no different from what is being done
today.  IMA is currently only calculating the SHA1 hash, padding the
digest with 0's, and extending the padded value(s) into all of the
allocated banks.
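
The zero-padding described above can be sketched as follows. This is a
user-space illustration, not the IMA or TPM driver code; the function name
pad_sha1_for_bank and the constants are assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_DIGEST_SIZE	20
#define SHA256_DIGEST_SIZE	32

/* Copy a SHA1 digest into a bank-sized buffer and zero-fill the tail. */
static void pad_sha1_for_bank(const unsigned char sha1[SHA1_DIGEST_SIZE],
			      unsigned char *out, size_t bank_digest_size)
{
	/* Only padding is shown; banks smaller than SHA1 are not handled. */
	assert(bank_digest_size >= SHA1_DIGEST_SIZE);
	memcpy(out, sha1, SHA1_DIGEST_SIZE);
	memset(out + SHA1_DIGEST_SIZE, 0, bank_digest_size - SHA1_DIGEST_SIZE);
}
```

For a SHA256 bank this yields the 20 SHA1 bytes followed by 12 zero bytes,
which is why the strength of such a bank is bounded by SHA1.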

If there is a vulnerability with the hash algorithm, then any bank
extended with the padded/truncated digest would be susceptible.

IMA will need to become TPM 2.0 aware, calculating and extending
multiple banks and define a new measurement list format containing the
multiple digests.

Mimi



Re: [PATCH RFC] hist lookups

2018-11-18 Thread David Miller
From: Jiri Olsa 
Date: Tue, 13 Nov 2018 11:40:54 +0100

> I pushed/rebased what I have to perf/fixes branch again
> 
> please note I had to change our compile changes, because
> they wouldn't compile on x86, but I can't verify on sparc,
> so you might see some compile fails again

I just checked your current perf/fixes branch.

It builds on Sparc ;-)

And it behaves better too.  I do get tons of drops and lost events,
but it seems to keep going even during the hardest load.

Eventually I end up with a lot of unresolvable histogram entries,
so that is something to look into.

I looked at your drop logic and it seems perfect, we avoid dropping
all non-SAMPLE events which is what we want.  So that can't be the
cause of the issues I am seeing.



Re: [PATCH -next] exportfs: fix 'passing zero to ERR_PTR()' warning

2018-11-18 Thread Al Viro
On Mon, Nov 19, 2018 at 11:32:41AM +0800, YueHaibing wrote:
> Fix a static code checker warning:
>   fs/exportfs/expfs.c:171 reconnect_one() warn: passing zero to 'ERR_PTR'
> 
> The error path for lookup_one_len_unlocked failure
> should set err to PTR_ERR.
> 
> Fixes: bbf7a8a3562f ("exportfs: move most of reconnect_path to helper function")
> Signed-off-by: YueHaibing 

Applied.
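
For illustration, a simplified user-space model of why the checker
complains. The three helpers are stand-ins modelled on the kernel's err.h;
fake_lookup and fake_reconnect are hypothetical names and do not reproduce
the exportfs code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* User-space stand-ins modelled on the kernel's err.h helpers. */
static void *ERR_PTR(long error) { return (void *)error; }
static long PTR_ERR(const void *ptr) { return (long)ptr; }
static int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical lookup: fails with -ENOENT when asked to. */
static void *fake_lookup(int fail)
{
	static int object;

	return fail ? ERR_PTR(-ENOENT) : (void *)&object;
}

/* Error path that copies the code out of the pointer before returning. */
static void *fake_reconnect(int fail)
{
	long err = 0;
	void *tmp = fake_lookup(fail);

	if (IS_ERR(tmp)) {
		err = PTR_ERR(tmp);	/* without this line err would still be 0 */
		assert(err < 0);
		return ERR_PTR(err);
	}
	return tmp;
}
```

In this model ERR_PTR(0) is simply NULL, which IS_ERR() does not treat as
an error, so a caller would see a failed lookup as a non-error result.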


Re: [PATCH 21/34] kernfs, sysfs, cgroup, intel_rdt: Support fs_context [ver #12]

2018-11-18 Thread Andrei Vagin
On Fri, Sep 21, 2018 at 05:33:01PM +0100, David Howells wrote:
> Make kernfs support superblock creation/mount/remount with fs_context.
> 
> This requires that sysfs, cgroup and intel_rdt, which are built on kernfs,
> be made to support fs_context also.
> 
> Notes:
> 
>  (1) A kernfs_fs_context struct is created to wrap fs_context and the
>  kernfs mount parameters are moved in here (or are in fs_context).
> 
>  (2) kernfs_mount{,_ns}() are made into kernfs_get_tree().  The extra
>  namespace tag parameter is passed in the context if desired
> 
>  (3) kernfs_free_fs_context() is provided as a destructor for the
>  kernfs_fs_context struct, but for the moment it does nothing except
>  get called in the right places.
> 
>  (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to
>  pass, but possibly this should be done anyway in case someone wants to
>  add a parameter in future.
> 
>  (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and
>  the cgroup v1 and v2 mount parameters are all moved there.
> 
>  (6) cgroup1 parameter parsing error messages are now handled by invalf(),
>  which allows userspace to collect them directly.
> 
>  (7) cgroup1 parameter cleanup is now done in the context destructor rather
>  than in the mount/get_tree and remount functions.
> 
> Weirdies:
> 
>  (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held,
>  but then uses the resulting pointer after dropping the locks.  I'm
>  told this is okay and needs commenting.
> 
>  (*) The cgroup refcount web.  This really needs documenting.
> 
>  (*) cgroup2 only has one root?
> 
> Add a suggestion from Thomas Gleixner in which the RDT enablement code is
> placed into its own function.
> 
> Signed-off-by: David Howells 
> cc: Greg Kroah-Hartman 
> cc: Tejun Heo 
> cc: Li Zefan 
> cc: Johannes Weiner 
> cc: cgro...@vger.kernel.org
> cc: fenghua...@intel.com
> ---
> 
>  arch/x86/kernel/cpu/intel_rdt.h  |   15 +
>  arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  183 ++--
>  fs/kernfs/mount.c|   88 
>  fs/sysfs/mount.c |   67 --
>  include/linux/cgroup.h   |3 
>  include/linux/kernfs.h   |   39 ++-
>  kernel/cgroup/cgroup-internal.h  |   50 +++-
>  kernel/cgroup/cgroup-v1.c|  345 --
>  kernel/cgroup/cgroup.c   |  264 +++
>  kernel/cgroup/cpuset.c   |4 
>  10 files changed, 640 insertions(+), 418 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
> index 4e588f36228f..1461adc2c5e8 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.h
> +++ b/arch/x86/kernel/cpu/intel_rdt.h
> @@ -33,6 +33,21 @@
>  #define RMID_VAL_ERROR   BIT_ULL(63)
>  #define RMID_VAL_UNAVAIL BIT_ULL(62)
>  
> +
> +struct rdt_fs_context {
> + struct kernfs_fs_contextkfc;
> + boolenable_cdpl2;
> + boolenable_cdpl3;
> + boolenable_mba_mbps;
> +};
> +
> +static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
> +{
> + struct kernfs_fs_context *kfc = fc->fs_private;
> +
> + return container_of(kfc, struct rdt_fs_context, kfc);
> +}
> +
>  DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
>  
>  /**
> diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> index d6cb04c3a28b..34733a221669 100644
> --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1707,43 +1708,6 @@ static void cdp_disable_all(void)
>   cdpl2_disable();
>  }
>  
> -static int parse_rdtgroupfs_options(char *data)
> -{
> - char *token, *o = data;
> - int ret = 0;
> -
> - while ((token = strsep(&o, ",")) != NULL) {
> - if (!*token) {
> - ret = -EINVAL;
> - goto out;
> - }
> -
> - if (!strcmp(token, "cdp")) {
> - ret = cdpl3_enable();
> - if (ret)
> - goto out;
> - } else if (!strcmp(token, "cdpl2")) {
> - ret = cdpl2_enable();
> - if (ret)
> - goto out;
> - } else if (!strcmp(token, "mba_MBps")) {
> - ret = set_mba_sc(true);
> - if (ret)
> - goto out;
> - } else {
> - ret = -EINVAL;
> - goto out;
> - }
> - }
> -
> - return 0;
> -
> -out:
> - pr_err("Invalid mount option \"%s\"\n", token);
> -
> - return ret;

[PATCH] ARM64: dts: meson-gxbb-wetek: fix hwmon temperature sensor output

2018-11-18 Thread Christian Hewitt
Add  and supporting vddio_ao18 nodes to fix hwmon temperature
readings on the WeTek Hub and Play2 devices. Without these nodes the
temp is reported as 4294967295 degrees C.

Thanks to Martin Blumenstingl for assisting debug and the winning
guess at what was missing.

Signed-off-by: Christian Hewitt 
---
 arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi
index 70325b2..dd12e02 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi
@@ -59,6 +59,13 @@
regulator-max-microvolt = <330>;
};
 
+   vddio_ao18: regulator-vddio_ao18 {
+   compatible = "regulator-fixed";
+   regulator-name = "VDDIO_AO18";
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
+   };
+
vcc_3v3: regulator-vcc_3v3 {
compatible = "regulator-fixed";
regulator-name = "VCC_3V3";
@@ -173,6 +180,11 @@
clock-names = "clkin0";
 };
 
+ {
+   status = "okay";
+   vref-supply = <&vddio_ao18>;
+};
+
 /* Wireless SDIO Module */
 _emmc_a {
status = "okay";
-- 
2.7.4



Re: [PATCH] rhashtable: detect when object movement between tables might have invalidated a lookup

2018-11-18 Thread David Miller
From: Herbert Xu 
Date: Mon, 19 Nov 2018 12:06:34 +0800

> On Mon, Nov 19, 2018 at 11:56:35AM +0800, Herbert Xu wrote:
>>
>> I take that back.  Because of your shift which cancels out the
>> shift in NULLS_MARKER, it would appear that this should work just
>> fine with RHT_NULLS_MARKER(0), no? IOW, it would appear that
>> 
>>  RHT_NULLS_MARKER(0) = RHT_NULLS_MARKER(RHT_NULLS_MARKER(0))
> 
> My emails to Neil are bouncing:
> 
>   ne...@suse.com
> host smtp.glb1.softwaregrp.com [15.124.2.87]
> SMTP error from remote mail server after RCPT TO::
> 550 Cannot process address

Yeah this just started happening 2 days ago.


Re: [PATCH v1 08/11] clk: mediatek: Add MT8183 clock support

2018-11-18 Thread Weiyi Lu
On Tue, 2018-11-13 at 22:25 -0800, Nicolas Boichat wrote:
> On Mon, Nov 5, 2018 at 10:42 PM Weiyi Lu  wrote:
> >
> > Add MT8183 clock support, include topckgen, apmixedsys,
> > infracfg, mcucfg and subsystem clocks.
> >
> > Signed-off-by: Weiyi Lu 
> > ---
> >  drivers/clk/mediatek/Kconfig   |   75 ++
> >  drivers/clk/mediatek/Makefile  |   12 +
> >  drivers/clk/mediatek/clk-mt8183-audio.c|  112 ++
> >  drivers/clk/mediatek/clk-mt8183-cam.c  |   75 ++
> >  drivers/clk/mediatek/clk-mt8183-img.c  |   75 ++
> >  drivers/clk/mediatek/clk-mt8183-ipu0.c |   68 +
> >  drivers/clk/mediatek/clk-mt8183-ipu1.c |   68 +
> >  drivers/clk/mediatek/clk-mt8183-ipu_adl.c  |   66 +
> >  drivers/clk/mediatek/clk-mt8183-ipu_conn.c |  155 +++
> >  drivers/clk/mediatek/clk-mt8183-mfgcfg.c   |   66 +
> >  drivers/clk/mediatek/clk-mt8183-mm.c   |  128 ++
> >  drivers/clk/mediatek/clk-mt8183-vdec.c |   84 ++
> >  drivers/clk/mediatek/clk-mt8183-venc.c |   71 ++
> >  drivers/clk/mediatek/clk-mt8183.c  | 1334 
> >  14 files changed, 2389 insertions(+)
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-audio.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-cam.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-img.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-ipu0.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-ipu1.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-ipu_adl.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-ipu_conn.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-mfgcfg.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-mm.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-vdec.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183-venc.c
> >  create mode 100644 drivers/clk/mediatek/clk-mt8183.c
> 
> Can you look at how clk-mt8173.c did this?
> 
> I think you can avoid a lot of this duplicated code and most of the
> extra files by using something like:
> CLK_OF_DECLARE(mtk_audio, "mediatek,mt8183-audiosys", clk_mt8183_audio_init);
> 
I would prefer to keep registering the clocks when the clock driver
probes. As for the duplicated code, there is some related discussion in
Stephen's series:
https://patchwork.kernel.org/project/linux-mediatek/list/?series=39497
I'd keep this part pending until that discussion reaches a conclusion.

> Some more comments below.
> > diff --git a/drivers/clk/mediatek/Kconfig b/drivers/clk/mediatek/Kconfig
> > index 3dd1dab92223..5d4fd67fa259 100644
> > --- a/drivers/clk/mediatek/Kconfig
> > +++ b/drivers/clk/mediatek/Kconfig
> > @@ -193,4 +193,79 @@ config COMMON_CLK_MT8173
> > default ARCH_MEDIATEK
> > ---help---
> >   This driver supports MediaTek MT8173 clocks.
> > +
> > +config COMMON_CLK_MT8183
> > +   bool "Clock driver for MediaTek MT8183"
> > +   depends on (ARCH_MEDIATEK && ARM64) || COMPILE_TEST
> > +   select COMMON_CLK_MEDIATEK
> > +   default ARCH_MEDIATEK && ARM64
> > +   help
> > + This driver supports MediaTek MT8183 basic clocks.
> > +
> > +config COMMON_CLK_MT8183_AUDIOSYS
> > +   bool "Clock driver for MediaTek MT8183 audiosys"
> > +   depends on COMMON_CLK_MT8183
> > +   help
> > + This driver supports MediaTek MT8183 audiosys clocks.
> > +
> > +config COMMON_CLK_MT8183_CAMSYS
> > +   bool "Clock driver for MediaTek MT8183 camsys"
> > +   depends on COMMON_CLK_MT8183
> > +   help
> > + This driver supports MediaTek MT8183 camsys clocks.
> > +
> > +config COMMON_CLK_MT8183_IMGSYS
> > +   bool "Clock driver for MediaTek MT8183 imgsys"
> > +   depends on COMMON_CLK_MT8183
> > +   help
> > + This driver supports MediaTek MT8183 imgsys clocks.
> > +
> > +config COMMON_CLK_MT8183_IPU_CORE0
> > +   bool "Clock driver for MediaTek MT8183 ipu_core0"
> > +   depends on COMMON_CLK_MT8183
> > +   help
> > + This driver supports MediaTek MT8183 ipu_core0 clocks.
> > +
> > +config COMMON_CLK_MT8183_IPU_CORE1
> > +   bool "Clock driver for MediaTek MT8183 ipu_core1"
> > +   depends on COMMON_CLK_MT8183
> > +   help
> > + This driver supports MediaTek MT8183 ipu_core1 clocks.
> > +
> > +config COMMON_CLK_MT8183_IPU_ADL
> > +   bool "Clock driver for MediaTek MT8183 ipu_adl"
> > +   depends on COMMON_CLK_MT8183
> > +   help
> > + This driver supports MediaTek MT8183 ipu_adl clocks.
> > +
> > +config COMMON_CLK_MT8183_IPU_CONN
> > +   bool "Clock driver for MediaTek MT8183 ipu_conn"
> > +   depends on COMMON_CLK_MT8183
> > +   help
> > + This driver supports MediaTek MT8183 ipu_conn clocks.
> > +
> > +config COMMON_CLK_MT8183_MFGCFG
> > +   bool "Clock driver for MediaTek MT8183 mfgcfg"
> > +   depends on COMMON_CLK_MT8183
> > +   help
> > + This driver supports MediaTek MT8183 mfgcfg clocks.
> > +

Re: [PATCH 4/7] node: Add memory caching attributes

2018-11-18 Thread Anshuman Khandual



On 11/15/2018 04:19 AM, Keith Busch wrote:
> System memory may have side caches to help improve access speed. While
> the system provided cache is transparent to the software accessing
> these memory ranges, applications can optimize their own access based
> on cache attributes.

Cache is not a separate memory attribute. It only affects the real attributes,
e.g. bandwidth and latency, which are already captured in the previous patch.
What is the purpose of adding this as a separate attribute? Can you explain
how this is going to help user space beyond the hints it has already
received through the bandwidth, latency etc. properties?

> 
> In preparation for such systems, provide a new API for the kernel to
> register these memory side caches under the memory node that provides it.

Under target memory node interface /sys/devices/system/node/nodeY/target* ?

> 
> The kernel's sysfs representation is modeled from the cpu cacheinfo
> attributes, as seen from /sys/devices/system/cpu/cpuX/cache/. Unlike CPU
> cacheinfo, though, a higher node's memory cache level is nearer to the
> CPU, while lower levels are closer to the backing memory. Also unlike
> CPU cache, the system handles flushing any dirty cached memory to the
> last level the memory on a power failure if the range is persistent.

Let's assume that a CPU has four levels of cache, L1, L2, L3 and L4, before
reaching memory. L4 is the backing cache for the memory and L1-L3 sit between
the CPU and the system bus. Will some of them then be represented as CPU
caches and the others as memory caches?

/sys/devices/system/cpu/cpuX/cache/ --> L1, L2, L3
/sys/devices/system/node/nodeY/target --> L4 

L4 will be listed even if the node is memory only ?


Re: [PATCH] rhashtable: detect when object movement between tables might have invalidated a lookup

2018-11-18 Thread Herbert Xu
On Mon, Nov 19, 2018 at 11:56:35AM +0800, Herbert Xu wrote:
>
> I take that back.  Because of your shift which cancels out the
> shift in NULLS_MARKER, it would appear that this should work just
> fine with RHT_NULLS_MARKER(0), no? IOW, it would appear that
> 
>   RHT_NULLS_MARKER(0) = RHT_NULLS_MARKER(RHT_NULLS_MARKER(0))

My emails to Neil are bouncing:

ne...@suse.com
  host smtp.glb1.softwaregrp.com [15.124.2.87]
  SMTP error from remote mail server after RCPT TO::
  550 Cannot process address

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 1/7] drivers/cpufreq: change CONFIG_6xx to CONFIG_PPC_BOOK3S_32

2018-11-18 Thread Viresh Kumar
On 17-11-18, 10:24, Christophe Leroy wrote:
> Today, powerpc has three CONFIG labels which mean exactly the same:
> - CONFIG_6xx
> - CONFIG_PPC_BOOK3S_32
> - CONFIG_PPC_STD_MMU_32
> 
> By consistency with PPC64, CONFIG_PPC_BOOK3S_32 is the preferred one.
> Using a label which includes _PPC_ also makes it clearer that it is
> linked to powerpc.
> 
> In preparation of the removal of CONFIG_6xx, this patch replaces it
> by CONFIG_PPC_BOOK3S_32
> 
> Signed-off-by: Christophe Leroy 
> ---
>  drivers/cpufreq/pmac32-cpufreq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/pmac32-cpufreq.c 
> b/drivers/cpufreq/pmac32-cpufreq.c
> index 61ae06ca008e..52f0d91d30c1 100644
> --- a/drivers/cpufreq/pmac32-cpufreq.c
> +++ b/drivers/cpufreq/pmac32-cpufreq.c
> @@ -128,7 +128,7 @@ static int cpu_750fx_cpu_speed(int low_speed)
>   mtspr(SPRN_HID2, hid2);
>   }
>   }
> -#ifdef CONFIG_6xx
> +#ifdef CONFIG_PPC_BOOK3S_32
>   low_choose_750fx_pll(low_speed);
>  #endif
>   if (low_speed == 1) {
> @@ -166,7 +166,7 @@ static int dfs_set_cpu_speed(int low_speed)
>   }
>  
>   /* set frequency */
> -#ifdef CONFIG_6xx
> +#ifdef CONFIG_PPC_BOOK3S_32
>   low_choose_7447a_dfs(low_speed);
>  #endif
>   udelay(100);

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH v1 11/11] soc: mediatek: Add MT8183 scpsys support

2018-11-18 Thread Weiyi Lu
On Tue, 2018-11-13 at 11:35 -0800, Nicolas Boichat wrote:
> On Mon, Nov 5, 2018 at 10:43 PM Weiyi Lu  wrote:
> >
> > Add scpsys driver for MT8183
> >
> > Signed-off-by: Weiyi Lu 
> > ---
> >  drivers/soc/mediatek/mtk-scpsys.c | 226 ++
> >  1 file changed, 226 insertions(+)
> >
> > diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
> > b/drivers/soc/mediatek/mtk-scpsys.c
> > index 80be2e05e4e0..57b9f04a69de 100644
> > --- a/drivers/soc/mediatek/mtk-scpsys.c
> > +++ b/drivers/soc/mediatek/mtk-scpsys.c
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #define MTK_POLL_DELAY_US   10
> >  #define MTK_POLL_TIMEOUT(jiffies_to_usecs(HZ))
> > @@ -1179,6 +1180,217 @@ static const struct scp_subdomain 
> > scp_subdomain_mt8173[] = {
> > {MT8173_POWER_DOMAIN_MFG_2D, MT8173_POWER_DOMAIN_MFG},
> >  };
> >
> > +/*
> > + * MT8183 power domain support
> > + */
> > +
> > +static const struct scp_domain_data scp_domain_data_mt8183[] = {
> > +   [MT8183_POWER_DOMAIN_AUDIO] = {
> > +   .name = "audio",
> > +   .sta_mask = PWR_STATUS_AUDIO,
> > +   .ctl_offs = 0x0314,
> > +   .sram_pdn_bits = GENMASK(11, 8),
> > +   .sram_pdn_ack_bits = GENMASK(15, 12),
> > +   .basic_clk_name = {"audio"},
> > +   },
> > +   [MT8183_POWER_DOMAIN_CONN] = {
> > +   .name = "conn",
> > +   .sta_mask = PWR_STATUS_CONN,
> > +   .ctl_offs = 0x032c,
> > +   .sram_pdn_bits = 0,
> > +   .sram_pdn_ack_bits = 0,
> > +   .bp_table = {
> > +   [0] = BUS_PROT(IFR_TYPE, 0x2a0, 0x2a4, 0, 0x228,
> > +   BIT(13) | BIT(14), BIT(13) | BIT(14)),
> > +   },
> > +   },
> > +   [MT8183_POWER_DOMAIN_MFG_ASYNC] = {
> > +   .name = "mfg_async",
> > +   .sta_mask = PWR_STATUS_MFG_ASYNC,
> > +   .ctl_offs = 0x0334,
> > +   .sram_pdn_bits = 0,
> > +   .sram_pdn_ack_bits = 0,
> > +   .basic_clk_name = {"mfg"},
> > +   },
> > +   [MT8183_POWER_DOMAIN_MFG] = {
> > +   .name = "mfg",
> > +   .sta_mask = PWR_STATUS_MFG,
> > +   .ctl_offs = 0x0338,
> > +   .sram_pdn_bits = GENMASK(8, 8),
> > +   .sram_pdn_ack_bits = GENMASK(12, 12),
> > +   },
> > +   [MT8183_POWER_DOMAIN_MFG_CORE0] = {
> > +   .name = "mfg_core0",
> > +   .sta_mask = BIT(7),
> > +   .ctl_offs = 0x034c,
> > +   .sram_pdn_bits = GENMASK(8, 8),
> > +   .sram_pdn_ack_bits = GENMASK(12, 12),
> > +   },
> > +   [MT8183_POWER_DOMAIN_MFG_CORE1] = {
> > +   .name = "mfg_core1",
> > +   .sta_mask = BIT(20),
> > +   .ctl_offs = 0x0310,
> > +   .sram_pdn_bits = GENMASK(8, 8),
> > +   .sram_pdn_ack_bits = GENMASK(12, 12),
> > +   },
> > +   [MT8183_POWER_DOMAIN_MFG_2D] = {
> > +   .name = "mfg_2d",
> > +   .sta_mask = PWR_STATUS_MFG_2D,
> > +   .ctl_offs = 0x0348,
> > +   .sram_pdn_bits = GENMASK(8, 8),
> > +   .sram_pdn_ack_bits = GENMASK(12, 12),
> > +   .bp_table = {
> > +   [0] = BUS_PROT(IFR_TYPE, 0x2a8, 0x2ac, 0, 0x258,
> > +   BIT(19) | BIT(20) | BIT(21),
> > +   BIT(19) | BIT(20) | BIT(21)),
> > +   [1] = BUS_PROT(IFR_TYPE, 0x2a0, 0x2a4, 0, 0x228,
> > +   BIT(21) | BIT(22), BIT(21) | BIT(22)),
> > +   },
> > +   },
> > +   [MT8183_POWER_DOMAIN_DISP] = {
> > +   .name = "disp",
> > +   .sta_mask = PWR_STATUS_DISP,
> > +   .ctl_offs = 0x030c,
> > +   .sram_pdn_bits = GENMASK(8, 8),
> > +   .sram_pdn_ack_bits = GENMASK(12, 12),
> > +   .basic_clk_name = {"mm"},
> > +   .subsys_clk_prefix = "mm",
> > +   .bp_table = {
> > +   [0] = BUS_PROT(IFR_TYPE, 0x2a8, 0x2ac, 0, 0x258,
> > +   BIT(16) | BIT(17), BIT(16) | BIT(17)),
> > +   [1] = BUS_PROT(IFR_TYPE, 0x2a0, 0x2a4, 0, 0x228,
> > +   BIT(10) | BIT(11), BIT(10) | BIT(11)),
> > +   [2] = BUS_PROT(SMI_TYPE, 0x3c4, 0x3c8, 0, 0x3c0,
> > +   GENMASK(7, 0), GENMASK(7, 0)),
> > +   },
> > +   },
> > +   [MT8183_POWER_DOMAIN_CAM] = {
> > +   .name = "cam",
> > +   .sta_mask = BIT(25),
> > +   .ctl_offs = 0x0344,
> > +   .sram_pdn_bits = GENMASK(9, 8),
> > +   .sram_pdn_ack_bits = GENMASK(13, 12),
> > +   .basic_clk_name = {"cam"},
> > +   

Re: [PATCH] rhashtable: detect when object movement between tables might have invalidated a lookup

2018-11-18 Thread Herbert Xu
On Mon, Nov 19, 2018 at 11:54:15AM +0800, Herbert Xu wrote:
>
> > >> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> > >> index 30526afa8343..852ffa5160f1 100644
> > >> --- a/lib/rhashtable.c
> > >> +++ b/lib/rhashtable.c
> > >> @@ -1179,8 +1179,7 @@ struct rhash_head __rcu **rht_bucket_nested(const 
> > >> struct bucket_table *tbl,
> > >>  unsigned int hash)
> > >>  {
> > >>  const unsigned int shift = PAGE_SHIFT - ilog2(sizeof(void *));
> > >> -static struct rhash_head __rcu *rhnull =
> > >> -(struct rhash_head __rcu *)NULLS_MARKER(0);
> > >> +static struct rhash_head __rcu *rhnull;
> > >
> > > I don't understand why you can't continue to do NULLS_MARKER(0) or
> > > RHT_NULLS_MARKER(0).
> > 
> > Because then the test
> > 
> > +   } while (he != RHT_NULLS_MARKER(head));
> > 
> > in __rhashtable_lookup() would always succeed, and it would loop
> > forever.
> 
> This change is only necessary because of your shifting change
> above, which AFAICS adds no real benefit.

I take that back.  Because of your shift which cancels out the
shift in NULLS_MARKER, it would appear that this should work just
fine with RHT_NULLS_MARKER(0), no? IOW, it would appear that

RHT_NULLS_MARKER(0) = RHT_NULLS_MARKER(RHT_NULLS_MARKER(0))
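
The identity above can be checked in user space. What follows is a minimal
sketch, assuming NULLS_MARKER() shifts its argument left by one and sets the
lsb, and that the proposed RHT_NULLS_MARKER() shifts its argument right by one
before calling it. The lower-case helper names are made up for illustration
and are not kernel API:

```c
#include <stdint.h>

/* Assumed to mirror NULLS_MARKER(): shift left by one, set the lsb. */
static uintptr_t nulls_marker(uintptr_t value)
{
	return (uintptr_t)1 | (value << 1);
}

/* Assumed to mirror the proposed RHT_NULLS_MARKER(): undo the shift first. */
static uintptr_t rht_nulls_marker(uintptr_t ptr)
{
	return nulls_marker(ptr >> 1);
}
```

Under these assumptions rht_nulls_marker(0) and
rht_nulls_marker(rht_nulls_marker(0)) both evaluate to 1, because the right
shift discards the lsb that the first application set.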

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] rhashtable: detect when object movement between tables might have invalidated a lookup

2018-11-18 Thread Herbert Xu
On Fri, Nov 16, 2018 at 05:59:19PM +1100, NeilBrown wrote:
>
> NULLS_MARKER assumes a hash value in which the bottom bits are most
> likely to be unique.  To convert this to a pointer which is certainly not
> valid, it shifts left by 1 and sets the lsb.
> We aren't passing a hash value, but are passing an address instead.
> In this case the bottom 2 bits are certain to be 0, and the top bit
> could contain valuable information (on a 32bit system).
> The best way to turn a pointer into a certainly-invalid pointer
> is to just set the lsb.  By shifting right by one, we discard an
> uninteresting bit, preserve all the interesting bits, and effectively
> just set the lsb.
> 
> I could add a comment explaining that if you like.

The top-bit is most likely to be fixed and offer no real value.
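
For reference, here is a user-space sketch of what the two encodings under
discussion do to a 32-bit address. uint32_t stands in for a 32-bit unsigned
long, and the helper names are made up for illustration:

```c
#include <stdint.h>

/* Plain NULLS_MARKER()-style encoding: the top bit is shifted out. */
static uint32_t marker_plain32(uint32_t addr)
{
	return 1u | (addr << 1);
}

/* Shift right first: bit 0 is dropped, every other bit survives. */
static uint32_t marker_shifted32(uint32_t addr)
{
	return 1u | ((addr >> 1) << 1);
}
```

With the plain encoding, 0x80001000 and 0x00001000 both map to 0x00002001.
With the shifted encoding, 0x80001000 maps to 0x80001001. Whether the top bit
ever differs between tables on a real system is the open question in this
thread.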

> >> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> >> index 30526afa8343..852ffa5160f1 100644
> >> --- a/lib/rhashtable.c
> >> +++ b/lib/rhashtable.c
> >> @@ -1179,8 +1179,7 @@ struct rhash_head __rcu **rht_bucket_nested(const 
> >> struct bucket_table *tbl,
> >>unsigned int hash)
> >>  {
> >>const unsigned int shift = PAGE_SHIFT - ilog2(sizeof(void *));
> >> -  static struct rhash_head __rcu *rhnull =
> >> -  (struct rhash_head __rcu *)NULLS_MARKER(0);
> >> +  static struct rhash_head __rcu *rhnull;
> >
> > I don't understand why you can't continue to do NULLS_MARKER(0) or
> > RHT_NULLS_MARKER(0).
> 
> Because then the test
> 
> + } while (he != RHT_NULLS_MARKER(head));
> 
> in __rhashtable_lookup() would always succeed, and it would loop
> forever.

This change is only necessary because of your shifting change
above, which AFAICS adds no real benefit.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: STIBP by default.. Revert?

2018-11-18 Thread Willy Tarreau
On Sun, Nov 18, 2018 at 02:40:28PM -0800, Tim Chen wrote:
> Tasks that want extra security will enable that via prctl interface or
> making themselves non-dumpable.

Well, you need to be careful regarding the last part of your option
above, because a number of network daemons become non-dumpable by
executing setuid() at boot, and certainly don't want to suffer a
performance loss as a side effect of wanting to become "normally"
secure. I'd suggest to use the prctl only so that it doesn't
randomly hit innocent applications that would only have as a last
resort to turn off reasonable security features to avoid this impact.
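
For readers not familiar with the flag in question: the dumpable attribute
can be read and cleared explicitly through prctl(2), and per the man page it
is also reset as a side effect when the effective user or group ID changes,
which is the case described above. A minimal sketch using only
PR_GET_DUMPABLE and PR_SET_DUMPABLE; the wrapper names are made up:

```c
#include <sys/prctl.h>

/* Returns the current dumpable attribute (normally 0 or 1), or -1 on error. */
static int get_dumpable(void)
{
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Clears the dumpable attribute explicitly; returns 0 on success, -1 on error. */
static int clear_dumpable(void)
{
	return prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
}
```

A daemon that never calls clear_dumpable() can still end up with
get_dumpable() returning 0 after dropping privileges, so keying a mitigation
on the flag would affect it without an explicit opt-in.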

Regards,
Willy


Re: [PATCH V10 01/19] block: introduce multi-page page bvec helpers

2018-11-18 Thread Ming Lei
On Sun, Nov 18, 2018 at 08:10:14PM -0700, Jens Axboe wrote:
> On 11/18/18 7:23 PM, Ming Lei wrote:
> > On Fri, Nov 16, 2018 at 02:13:05PM +0100, Christoph Hellwig wrote:
> >>> -#define bvec_iter_page(bvec, iter)   \
> >>> +#define mp_bvec_iter_page(bvec, iter)\
> >>>   (__bvec_iter_bvec((bvec), (iter))->bv_page)
> >>>  
> >>> -#define bvec_iter_len(bvec, iter)\
> >>> +#define mp_bvec_iter_len(bvec, iter) \
> >>
> >> I'd much prefer if we would stick to the segment naming that
> >> we also use in the higher level helper.
> >>
> >> So segment_iter_page, segment_iter_len, etc.
> > 
> > We discussed the naming problem before, one big problem is that the 
> > 'segment'
> > in bio_for_each_segment*() means one single page segment actually.
> > 
> > If we use segment_iter_page() here for multi-page segment, it may
> > confuse people.
> > 
> > Of course, I prefer to the naming of segment/page, 
> > 
> > And Jens didn't agree to rename bio_for_each_segment*() before.
> 
> I didn't like frivolous renaming (and I still don't), but mp_
> is horrible imho. Don't name these after the fact that they
> are done in conjunction with supporting multipage bvecs. That
> very fact will be irrelevant very soon

OK, so what is your suggestion for the naming issue?

Are you fine with using segment_iter_page() here? Then the term 'segment'
would mean a multi-page segment here, but a single-page segment in
bio_for_each_segment*().

thanks
Ming


Re: [PATCH 2/7] node: Add heterogenous memory performance

2018-11-18 Thread Anshuman Khandual



On 11/15/2018 04:19 AM, Keith Busch wrote:
> Heterogeneous memory systems provide memory nodes with latency
> and bandwidth performance attributes that are different from other
> nodes. Create an interface for the kernel to register these attributes

There are other properties like power consumption, reliability which can
be associated with a particular PA range. Also the set of properties has
to be extensible for the future.

> under the node that provides the memory. If the system provides this
> information, applications can query the node attributes when deciding
> which node to request memory.

Right, but each (memory initiator, memory target) pair should have the
above-mentioned properties enumerated, to give a 'property as seen from'
kind of semantics.

> 
> When multiple memory initiators exist, accessing the same memory target
> from each may not perform the same as the other. The highest performing
> initiator to a given target is considered to be a local initiator for
> that target. The kernel provides performance attributes only for the
> local initiators.

As mentioned above the interface must enumerate a future extensible set
of properties for each (memory initiator, memory target) pair available
on the system.

> 
> The memory's compute node should be symlinked in sysfs as one of the
> node's initiators.

Right. IIUC the first patch skips the linking process for two nodes A
and B if (A == B), preventing association with the local memory initiator.


[PATCH -next] exportfs: fix 'passing zero to ERR_PTR()' warning

2018-11-18 Thread YueHaibing
Fix a static code checker warning:
  fs/exportfs/expfs.c:171 reconnect_one() warn: passing zero to 'ERR_PTR'

The error path for lookup_one_len_unlocked failure
should set err to PTR_ERR.

Fixes: bbf7a8a3562f ("exportfs: move most of reconnect_path to helper function")
Signed-off-by: YueHaibing 
---
 fs/exportfs/expfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 645158d..c8a3dfd 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -147,6 +147,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf));
if (IS_ERR(tmp)) {
dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp));
+   err = PTR_ERR(tmp);
goto out_err;
}
if (tmp != dentry) {
-- 
2.7.0
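
Why the checker complains can be shown with a user-space sketch. The helpers
below are stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and
reconnect_sketch() is a hypothetical reduction of the error path, not the
real reconnect_one():

```c
#include <stddef.h>
#include <errno.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *err_ptr(long error) { return (void *)error; }
static long ptr_err(const void *ptr) { return (long)ptr; }
static int is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/*
 * 'lookup' plays the role of the value returned by
 * lookup_one_len_unlocked(). 'propagate' selects whether err is assigned
 * before jumping to the common exit, as the patch does.
 */
static void *reconnect_sketch(void *lookup, int propagate)
{
	long err = 0;

	if (is_err(lookup)) {
		if (propagate)
			err = ptr_err(lookup);
		goto out_err;
	}
	return lookup;

out_err:
	return err_ptr(err);	/* err == 0 yields NULL, not an error pointer */
}
```

Without the assignment the function returns NULL on a failed lookup, which a
caller testing only IS_ERR() would treat as success. With it, the caller gets
the original error code back.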




Re: [PATCH V10 02/19] block: introduce bio_for_each_bvec()

2018-11-18 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:30:28PM +0100, Christoph Hellwig wrote:
> > +static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter 
> > *iter,
> > + unsigned bytes, bool mp)
> 
> I think these magic 'bool mp' arguments and wrappers over wrappers
> don't help anyone to actually understand the code.  I'd vote for
> removing as many wrappers as we really don't need, and passing the
> actual segment limit instead of the magic bool flag.  Something like
> this untested patch:

I think this way is fine, just a few small comments.

> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 277921ad42e7..dcad0b69f57a 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -138,30 +138,21 @@ static inline bool bio_full(struct bio *bio)
>   bvec_for_each_segment(bvl, &((bio)->bi_io_vec[iter_all.idx]), 
> i, iter_all)
>  
>  static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter 
> *iter,
> -   unsigned bytes, bool mp)
> +   unsigned bytes, unsigned max_segment)

The new parameter should be named 'max_segment_len' or
'max_seg_len'.

>  {
>   iter->bi_sector += bytes >> 9;
>  
>   if (bio_no_advance_iter(bio))
>   iter->bi_size -= bytes;
>   else
> - if (!mp)
> - bvec_iter_advance(bio->bi_io_vec, iter, bytes);
> - else
> - mp_bvec_iter_advance(bio->bi_io_vec, iter, bytes);
> + __bvec_iter_advance(bio->bi_io_vec, iter, bytes, max_segment);
>   /* TODO: It is reasonable to complete bio with error here. */
>  }
>  
>  static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
>   unsigned bytes)
>  {
> - __bio_advance_iter(bio, iter, bytes, false);
> -}
> -
> -static inline void bio_advance_mp_iter(struct bio *bio, struct bvec_iter 
> *iter,
> -unsigned bytes)
> -{
> - __bio_advance_iter(bio, iter, bytes, true);
> + __bio_advance_iter(bio, iter, bytes, PAGE_SIZE);
>  }
>  
>  #define __bio_for_each_segment(bvl, bio, iter, start)
> \
> @@ -177,7 +168,7 @@ static inline void bio_advance_mp_iter(struct bio *bio, 
> struct bvec_iter *iter,
>   for (iter = (start);\
>(iter).bi_size &&  \
>   ((bvl = bio_iter_mp_iovec((bio), (iter))), 1);  \
> -  bio_advance_mp_iter((bio), &(iter), (bvl).bv_len))
> +  __bio_advance_iter((bio), &(iter), (bvl).bv_len, 0))

We might even pass '-1' for the multi-page segment case.

>  
>  /* returns one real segment(multipage bvec) each time */
>  #define bio_for_each_bvec(bvl, bio, iter)\
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index 02f26d2b59ad..5e2ed46c1c88 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -138,8 +138,7 @@ struct bvec_iter_all {
>  })
>  
>  static inline bool __bvec_iter_advance(const struct bio_vec *bv,
> -struct bvec_iter *iter,
> -unsigned bytes, bool mp)
> + struct bvec_iter *iter, unsigned bytes, unsigned max_segment)
>  {
>   if (WARN_ONCE(bytes > iter->bi_size,
>"Attempted to advance past end of bvec iter\n")) {
> @@ -148,18 +147,18 @@ static inline bool __bvec_iter_advance(const struct 
> bio_vec *bv,
>   }
>  
>   while (bytes) {
> - unsigned len;
> + unsigned segment_len = mp_bvec_iter_len(bv, *iter);
>  
> - if (mp)
> - len = mp_bvec_iter_len(bv, *iter);
> - else
> - len = bvec_iter_len(bv, *iter);
> + if (max_segment) {
> + max_segment -= bvec_iter_offset(bv, *iter);
> + segment_len = min(segment_len, max_segment);

It looks like 'max_segment' needs to stay constant and shouldn't be updated
inside the loop.

If '-1' is passed for the multipage case, the above change may become:

segment_len = min_t(unsigned, segment_len,
		    max_seg_len - bvec_iter_offset(bv, *iter));

This way is cleaner, but adds the extra cost of the above line for the
multipage case.
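
A user-space sketch of that clamp, keeping the limit constant. The function
and parameter names are hypothetical, and page_offset is assumed to be the
offset within the current page, so it is always smaller than the page size:

```c
#include <limits.h>

/* min() for unsigned values; the kernel would use min() or min_t(). */
static unsigned min_u(unsigned a, unsigned b)
{
	return a < b ? a : b;
}

/*
 * remaining:   bytes left in the current bvec
 * page_offset: offset of the iterator within the current page
 * max_seg_len: page size for single-page iteration,
 *              UINT_MAX (i.e. '-1') for multi-page iteration
 */
static unsigned segment_len(unsigned remaining, unsigned page_offset,
			    unsigned max_seg_len)
{
	return min_u(remaining, max_seg_len - page_offset);
}
```

With a 4096-byte limit, 8192 remaining bytes at offset 512 are clamped to
3584. With UINT_MAX as the limit the full 8192 bytes are returned, at the
price of one subtraction and comparison that the multipage case does not
strictly need.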

Thanks,
Ming


[PATCH v2] f2fs: add bio cache for IPU

2018-11-18 Thread Chao Yu
SQLite in WAL mode may trigger sequential IPU writes to the db-wal file. After
commit d1b3e72d5490 ("f2fs: submit bio of in-place-update pages"), we
lost the chance of merging pages in the internally managed bio cache, which
results in submitting more small-sized IOs.

So let's add a temporary bio in writepages() to cache mergeable write IO as
much as possible.

Test case:
1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"

Before:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096

After:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
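
The effect of the merge rule on the number of submitted bios can be modelled
in user space. This is only a sketch under simplifying assumptions: it counts
bios for a list of block addresses, merges a block only when it directly
follows the previous one, ignores the same-device check, and uses a made-up
max_blocks parameter in place of BIO_MAX_PAGES:

```c
#include <stddef.h>

/*
 * Counts how many bios a sequence of single-block writes turns into when a
 * block is merged into the cached bio only if it directly follows the
 * previous block and the bio still has room.
 */
static size_t count_bios(const unsigned long *blkaddr, size_t n,
			 size_t max_blocks)
{
	size_t bios = 0, in_bio = 0;
	unsigned long last_block = 0;

	for (size_t i = 0; i < n; i++) {
		int contiguous = in_bio && last_block + 1 == blkaddr[i];

		if (!contiguous || in_bio == max_blocks) {
			bios++;		/* submit the cached bio, start a new one */
			in_bio = 0;
		}
		in_bio++;
		last_block = blkaddr[i];
	}
	return bios;
}
```

Sixteen consecutive block addresses with room for all of them give a single
bio, matching the "After" trace. Limiting each bio to one block gives sixteen,
which corresponds to the "Before" trace.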

Signed-off-by: Chao Yu 
---
v2:
- submit cached bio for cp error case.
 fs/f2fs/data.c| 61 +--
 fs/f2fs/f2fs.h|  3 +++
 fs/f2fs/segment.c |  5 +++-
 3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 8780f3d737c4..7dffafb8b2c5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -474,6 +474,49 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
return 0;
 }
 
+int f2fs_merge_page_bio(struct f2fs_io_info *fio)
+{
+   struct bio *bio = *fio->bio;
+   struct page *page = fio->encrypted_page ?
+   fio->encrypted_page : fio->page;
+
+   if (!f2fs_is_valid_blkaddr(fio->sbi, fio->new_blkaddr,
+   __is_meta_io(fio) ? META_GENERIC : DATA_GENERIC))
+   return -EFAULT;
+
+   trace_f2fs_submit_page_bio(page, fio);
+   f2fs_trace_ios(fio, 0);
+
+   if (bio && (*fio->last_block + 1 != fio->new_blkaddr ||
+   !__same_bdev(fio->sbi, fio->new_blkaddr, bio))) {
+   __submit_bio(fio->sbi, bio, fio->type);
+   bio = NULL;
+   }
+alloc_new:
+   if (!bio) {
+   bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc,
+   BIO_MAX_PAGES, false, fio->type, fio->temp);
+   *fio->last_block = fio->new_blkaddr;
+   bio_set_op_attrs(bio, fio->op, fio->op_flags);
+   }
+
+   if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
+   __submit_bio(fio->sbi, bio, fio->type);
+   bio = NULL;
+   goto alloc_new;
+   }
+
+   if (fio->io_wbc)
+   wbc_account_io(fio->io_wbc, page, PAGE_SIZE);
+
+   *fio->last_block = fio->new_blkaddr;
+
+   inc_page_count(fio->sbi, WB_DATA_TYPE(fio->page));
+
+   *fio->bio = bio;
+   return 0;
+}
+
 void f2fs_submit_page_write(struct f2fs_io_info *fio)
 {
struct f2fs_sb_info *sbi = fio->sbi;
@@ -1894,6 +1937,8 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
 }
 
 static int __write_data_page(struct page *page, bool *submitted,
+   struct bio **bio,
+   sector_t *last_block,
struct writeback_control *wbc,
enum iostat_type io_type)
 {
@@ -1919,6 +1964,8 @@ static int __write_data_page(struct page *page, bool 
*submitted,
.need_lock = LOCK_RETRY,
.io_type = io_type,
.io_wbc = wbc,
+   .bio = bio,
+   .last_block = last_block,
};
 

Re: [PATCH 1/7] node: Link memory nodes to their compute nodes

2018-11-18 Thread Anshuman Khandual



On 11/17/2018 12:02 AM, Keith Busch wrote:
> On Thu, Nov 15, 2018 at 12:36:54PM -0800, Matthew Wilcox wrote:
>> On Thu, Nov 15, 2018 at 07:59:20AM -0700, Keith Busch wrote:
>>> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
>>>> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
>>>>> Memory-only nodes will often have affinity to a compute node, and
>>>>> platforms have ways to express that locality relationship.
>>>>>
>>>>> A node containing CPUs or other DMA devices that can initiate memory
>>>>> access are referred to as "memory initiators". A "memory target" is a
>>>>> node that provides at least one physical address range accessible to a
>>>>> memory initiator.
>>>>
>>>> I think I may be confused here.  If there is _no_ link from node X to
>>>> node Y, does that mean that node X's CPUs cannot access the memory on
>>>> node Y?  In my mind, all nodes can access all memory in the system,
>>>> just not with uniform bandwidth/latency.
>>>
>>> The link is just about which nodes are "local". It's like how nodes have
>>> a cpulist. Other CPUs not in the node's list can access that node's memory,
>>> but the ones in the mask are local, and provide useful optimization hints.
>>
>> So ... let's imagine a hypothetical system (I've never seen one built like
>> this, but it doesn't seem too implausible).  Connect four CPU sockets in
>> a square, each of which has some regular DIMMs attached to it.  CPU A is
>> 0 hops to Memory A, one hop to Memory B and Memory C, and two hops from
>> Memory D (each CPU only has two "QPI" links).  Then maybe there's some
>> special memory extender device attached on the PCIe bus.  Now there's
>> Memory B1 and B2 that's attached to CPU B and it's local to CPU B, but
>> not as local as Memory B is ... and we'd probably _prefer_ to allocate
>> memory for CPU A from Memory B1 than from Memory D.  But ... *mumble*,
>> this seems hard.
> 
> Indeed, that particular example is out of scope for this series. The
> first objective is to aid a process running in node B's CPUs to allocate
> memory in B1. Anything that crosses QPI are their own.

This is problematic. Any new kernel API interface should accommodate B2 type
memory as well from the above example which is on a PCIe bus. Because
eventually they would be represented as some sort of a NUMA node and then
applications will have to depend on this sysfs interface for their desired
memory placement requirements. Unless this interface is thought through for
B2 type of memory, it might not be extensible in the future.


[GIT] Networking

2018-11-18 Thread David Miller


1) Fix some potentially uninitialized variables and use-after-free in
   kvaser_usb CAN driver, from Jimmy Assarsson.

2) Fix leaks in qed driver, from Denis Bolotin.

3) Socket leak in l2tp, from Xin Long.

4) RSS context allocation fix in bnxt_en from Michael Chan.

5) Fix cxgb4 build errors, from Ganesh Goudar.

6) Route leaks in ipv6 when removing exceptions, from Xin Long.

7) Memory leak in IDR allocation handling of act_pedit, from Davide
   Caratti.

8) Use-after-free of bridge vlan stats, from Nikolay Aleksandrov.

9) When MTU is locked, do not force DF bit on ipv4 tunnels.  From
   Sabrina Dubroca.

10) When NAPI cached skb is reused, we must set it to the proper
    initial state which includes skb->pkt_type.  From Eric Dumazet.

11) Lockdep and non-linear SKB handling fix in tipc from Jon Maloy.

12) Set RX queue properly in various tuntap receive paths, from
    Matthew Cover.

Please pull, thanks a lot!

The following changes since commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad:

  Linux 4.20-rc2 (2018-11-11 17:12:31 -0600)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to 8ebebcba559a1bfbaec7bbda64feb9870b9c58da:

  tuntap: fix multiqueue rx (2018-11-18 19:05:43 -0800)


Alexander Stein (1):
  can: flexcan: Always use last mailbox for TX

Andrew Morton (1):
  drivers/net/ethernet/qlogic/qed/qed_rdma.h: fix typo

Aya Levin (1):
  net/mlx4: Fix UBSAN warning of signed integer overflow

Brenda J. Butler (1):
  tc-testing: tdc.py: Guard against lack of returncode in executed command

Christophe JAILLET (1):
  net: lantiq: Fix returned value in case of error in 'xrx200_probe()'

David Ahern (1):
  ipv6: Fix PMTU updates for UDP/raw sockets in presence of VRF

David Howells (1):
  rxrpc: Fix life check

David S. Miller (7):
  Merge tag 'linux-can-fixes-for-4.20-20181109' of ssh://gitolite.kernel.org/.../mkl/linux-can
  Merge branch 'qed-Miscellaneous-bug-fixes'
  Merge branch 'bnxt_en-Bug-fixes'
  Merge branch 'mlx4-fixes'
  Merge tag 'batadv-net-for-davem-20181114' of git://git.open-mesh.org/linux-merge
  Revert "net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs"
  Merge branch 'tdc-fixes'

Davide Caratti (1):
  net/sched: act_pedit: fix memory leak when IDR allocation fails

Denis Bolotin (3):
  qed: Fix PTT leak in qed_drain()
  qed: Fix overriding offload_tc by protocols without APP TLV
  qed: Fix reading wrong value in loop condition

Eric Dumazet (2):
  net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flows
  net-gro: reset skb->pkt_type in napi_reuse_skb()

Eugeniu Rosca (1):
  dt-bindings: can: rcar_can: document r8a77965 support

Fabrizio Castro (2):
  can: rcar_can: Fix erroneous registration
  dt-bindings: can: rcar_can: Add r8a774a1 support

Ganesh Goudar (1):
  cxgb4: fix thermal zone build error

Jack Morgenstein (1):
  net/mlx4_core: Zero out lkey field in SW2HW_MPT fw command

Jimmy Assarsson (2):
  can: kvaser_usb: Fix potential uninitialized variable use
  can: kvaser_usb: Fix accessing freed memory in kvaser_usb_start_xmit()

Jon Maloy (2):
  tipc: fix lockdep warning when reinitilaizing sockets
  tipc: don't assume linear buffer when reading ancillary data

Lucas Bates (1):
  tc-testing: tdc.py: ignore errors when decoding stdout/stderr

Lukas Wunner (1):
  can: hi311x: Use level-triggered interrupt

Marc Kleine-Budde (5):
  can: flexcan: remove not needed struct flexcan_priv::tx_mb and struct flexcan_priv::tx_mb_idx
  can: dev: can_get_echo_skb(): factor out non sending code to __can_get_echo_skb()
  can: dev: __can_get_echo_skb(): replace struct can_frame by canfd_frame to access frame length
  can: dev: __can_get_echo_skb(): Don't crash the kernel if can_priv::echo_skb is accessed out of bounds
  can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb

Martin Schiller (2):
  net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs
  net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs

Matthew Cover (1):
  tuntap: fix multiqueue rx

Maxime Chevallier (1):
  net: mvneta: Don't advertise 2.5G modes

Michael Chan (5):
  bnxt_en: Fix RSS context allocation.
  bnxt_en: Fix rx_l4_csum_errors counter on 57500 devices.
  bnxt_en: Disable RDMA support on the 57500 chips.
  bnxt_en: Workaround occasional TX timeout on 57500 A0.
  bnxt_en: Add software "missed_irqs" counter.

Michal Kalderon (1):
  qed: Fix rdma_info structure allocation

Nikolay Aleksandrov (1):
  net: bridge: fix vlan stats use-after-free on destruction

Oleksij Rempel (4):
  can: rx-offload: introduce can_rx_offload_get_echo_skb() and 
can_rx_offload_queue_sorted() functions
  can: flexcan: handle tx-complete 

Re: [PATCH V10 01/19] block: introduce multi-page page bvec helpers

2018-11-18 Thread Jens Axboe
On 11/18/18 7:23 PM, Ming Lei wrote:
> On Fri, Nov 16, 2018 at 02:13:05PM +0100, Christoph Hellwig wrote:
>>> -#define bvec_iter_page(bvec, iter) \
>>> +#define mp_bvec_iter_page(bvec, iter)  \
>>> (__bvec_iter_bvec((bvec), (iter))->bv_page)
>>>  
>>> -#define bvec_iter_len(bvec, iter)  \
>>> +#define mp_bvec_iter_len(bvec, iter)   \
>>
>> I'd much prefer if we would stick to the segment naming that
>> we also use in the higher level helper.
>>
>> So segment_iter_page, segment_iter_len, etc.
> 
> We discussed the naming problem before, one big problem is that the 'segment'
> in bio_for_each_segment*() means one single page segment actually.
> 
> If we use segment_iter_page() here for multi-page segment, it may
> confuse people.
> 
> Of course, I prefer to the naming of segment/page, 
> 
> And Jens didn't agree to rename bio_for_each_segment*() before.

I didn't like frivolous renaming (and I still don't), but mp_
is horrible imho. Don't name these after the fact that they
are done in conjunction with supporting multipage bvecs. That
very fact will be irrelevant very soon

-- 
Jens Axboe



Re: a9f38e1dec ("floppy: convert to blk-mq"): INFO: task mount:661 blocked for more than 120 seconds.

2018-11-18 Thread Jens Axboe
On 11/17/18 7:39 PM, kernel test robot wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> commit a9f38e1dec107af70d81338332494bf0a1e76597
> Author: Omar Sandoval 
> AuthorDate: Mon Oct 15 09:21:34 2018 -0600
> Commit: Jens Axboe 
> CommitDate: Tue Oct 16 09:50:14 2018 -0600
> 
> floppy: convert to blk-mq
> 
> This driver likes to fetch requests from all over the place, so make
> queue_rq put requests on a list so that the logic stays the same. Tested
> with QEMU.
> 
> Signed-off-by: Omar Sandoval 
> 
> Converted to blk_mq_init_sq_queue() and fixed a few spots where the
> tag_set leaked on cleanup.
> 
> Signed-off-by: Jens Axboe 

This should be fixed by:

commit de7b75d82f70c5469675b99ad632983c50b6f7e7
Author: Jens Axboe 
Date:   Fri Nov 9 15:58:40 2018 -0700

floppy: fix race condition in __floppy_read_block_0()

which isn't in the tree that you tested. It's in -rc3, and in Linus's
tree since:

commit 59749c2d49bf28df69ac4bcabf1f69b00d3dca59
Merge: 9b5f361ac4a9 8dc765d438f1
Author: Linus Torvalds 
Date:   Fri Nov 16 09:31:59 2018 -0600

Merge tag 'for-linus-20181115' of git://git.kernel.dk/linux-block

-- 
Jens Axboe



Re: [PATCH 1/7] node: Link memory nodes to their compute nodes

2018-11-18 Thread Anshuman Khandual



On 11/15/2018 11:20 PM, Dan Williams wrote:
> On Thu, Nov 15, 2018 at 7:02 AM Keith Busch  wrote:
>>
>> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
>>> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
>>>> Memory-only nodes will often have affinity to a compute node, and
>>>> platforms have ways to express that locality relationship.
>>>>
>>>> A node containing CPUs or other DMA devices that can initiate memory
>>>> access are referred to as "memory iniators". A "memory target" is a
>>>> node that provides at least one phyiscal address range accessible to a
>>>> memory initiator.
>>>
>>> I think I may be confused here.  If there is _no_ link from node X to
>>> node Y, does that mean that node X's CPUs cannot access the memory on
>>> node Y?  In my mind, all nodes can access all memory in the system,
>>> just not with uniform bandwidth/latency.
>>
>> The link is just about which nodes are "local". It's like how nodes have
>> a cpulist. Other CPUs not in the node's list can acces that node's memory,
>> but the ones in the mask are local, and provide useful optimization hints.
>>
>> Would a node mask would be prefered to symlinks?
> 
> I think that would be more flexible, because the set of initiators
> that may have "best" or "local" access to a target may be more than 1.

Right. The memory target should have two nodemasks (for now at least): one
enumerating which initiator nodes can access the memory coherently, and the
other enumerating which are nearer and can benefit from local allocation.


Re: [PATCH v4 2/2] tpm: add support for partial reads

2018-11-18 Thread Tadeusz Struk
On 11/17/18 11:48 PM, Jarkko Sakkinen wrote:
>> +if (priv->transmit_result || priv->partial_data) {
>> +if (*off == 0)
>> +priv->partial_data = priv->transmit_result;
>> +
>> +ret_size = min_t(ssize_t, size, priv->partial_data);
>> +if (ret_size <= 0) {
> When ret_size < 0? Shouldn't this be just "if (!ret_size)"?

What we want to check here is if ret_size is positive, which is a valid
value, or if it is negative or zero, which is an invalid value, so in
this case (!ret_size) will not work.

> 
>>  /* Holds the resul of the last successful call to tpm_transmit() */
>>  size_t transmit_result;
>> +/* Holds the count how much of the response is still unread */
>> +size_t partial_data;
> I'm otherwise happy how this look like but why call it partial_data.
> You cannot really tell from the name anything about its contents as
> data is very abstract term.
 
So I will rename these two to response_length and response_length_rem.
How does this sound?

> BTW, why you need the new variable anyway and not just decrease the
> variable where the length is original stored?

We need to have two variables, otherwise how do we tell if some part of
response was consumed to allow sending a new command?
The transmit_result is used for that. If it is zero then one can transmit
a new command even if the whole response is not consumed. The new variable
tracks how much of the response is still to be read. 

Thanks,
-- 
Tadeusz


Re: [PATCH] proc: allow killing processes via file descriptors

2018-11-18 Thread Andy Lutomirski
On Sun, Nov 18, 2018 at 6:47 PM Al Viro  wrote:
>
> On Sun, Nov 18, 2018 at 09:42:35AM -0800, Andy Lutomirski wrote:
>
> > Now here's the kicker: if the "running program" calls execve(), it
> > goes away.  The fd gets some sort of notification that this happened
>
> Type error, parser failed.
>
> Define "fd", please.  If it's a "file descriptor", thank you do playing,
> you've lost.  That's not going to work.  If it's "opened file" (aka
> "file description" in horrible POSIXese), who's going to get notifications
> and what kind of exclusion are you going to use?

What I meant was: a program that has one of these fds would be able to
find out that an execve() happened and the program needs to refresh
its access to the target task.  This could be as simple as POLLHUP
and, if needed, some syscall indicating exactly why we got POLLHUP
(e.g. execve vs exit).

There would be some sort of indication that a program that holds an fd
pointing at an "opened file" could get -- probably poll() would return
some status indicating that execve() happened and our capability is
gone, and, if needed


Re: [PATCH] arch/sparc: Use kzalloc_node

2018-11-18 Thread David Miller
From: Sabyasachi Gupta 
Date: Sat, 3 Nov 2018 10:54:52 +0530

> Replaced kmalloc_node + memset with kzalloc_node
> 
> Signed-off-by: Sabyasachi Gupta 

Applied.


Re: [PATCH v3 0/4] sparc: system call table generation support

2018-11-18 Thread David Miller
From: Firoz Khan 
Date: Wed, 14 Nov 2018 10:56:27 +0530

> The purpose of this patch series is, we can easily
> add/modify/delete system call table support by cha-
> nging entry in syscall.tbl file instead of manually
> changing many files. The other goal is to unify the 
> system call table generation support implementation 
> across all the architectures. 
 ...

Series applied to sparc-next.


Re: [PATCH 1/7] node: Link memory nodes to their compute nodes

2018-11-18 Thread Anshuman Khandual



On 11/15/2018 08:29 PM, Keith Busch wrote:
> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
>> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
>>> Memory-only nodes will often have affinity to a compute node, and
>>> platforms have ways to express that locality relationship.
>>>
>>> A node containing CPUs or other DMA devices that can initiate memory
>>> access are referred to as "memory iniators". A "memory target" is a
>>> node that provides at least one phyiscal address range accessible to a
>>> memory initiator.
>>
>> I think I may be confused here.  If there is _no_ link from node X to
>> node Y, does that mean that node X's CPUs cannot access the memory on
>> node Y?  In my mind, all nodes can access all memory in the system,
>> just not with uniform bandwidth/latency.
> 
> The link is just about which nodes are "local". It's like how nodes have
> a cpulist. Other CPUs not in the node's list can acces that node's memory,
> but the ones in the mask are local, and provide useful optimization hints.
> 
> Would a node mask would be prefered to symlinks?

Having a hint for local affinity is definitely a plus, but this must also
provide the coherency matrix to the user, preferably in the form of a
nodemask for each memory target.


Re: [v3, PATCH 2/2] dt-binding: mediatek-dwmac: add binding document for MediaTek MT2712 DWMAC

2018-11-18 Thread biao huang
Hi Rob,
Thanks for your comments.
On Sat, 2018-11-17 at 22:56 +0800, Rob Herring wrote:
> On Fri, Nov 16, 2018 at 05:18:46PM +0800, Biao Huang wrote:
> > The commit adds the device tree binding documentation for the MediaTek DWMAC
> > found on MediaTek MT2712.
> > 
> > Change-Id: I3728666bf65927164bd82fa8dddb90df8270bd44
> > Signed-off-by: Biao Huang 
> > ---
> >  .../devicetree/bindings/net/mediatek-dwmac.txt |   77 
> > 
> >  1 file changed, 77 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/net/mediatek-dwmac.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/net/mediatek-dwmac.txt 
> > b/Documentation/devicetree/bindings/net/mediatek-dwmac.txt
> > new file mode 100644
> > index 000..7fd56e0
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/mediatek-dwmac.txt
> > @@ -0,0 +1,77 @@
> > +MediaTek DWMAC glue layer controller
> > +
> > +This file documents platform glue layer for stmmac.
> > +Please see stmmac.txt for the other unchanged properties.
> > +
> > +The device node has following properties.
> > +
> > +Required properties:
> > +- compatible:  Should be "mediatek,mt2712-gmac" for MT2712 SoC
> > +- reg:  Address and length of the register set for the device
> > +- interrupts:  Should contain the MAC interrupts
> 
> How many?
> 
The common stmmac driver parses the interrupt name "macirq", so even
though only one interrupt is used in the MediaTek DWMAC design, the
interrupt-names property still remains in the device tree.
> > +- interrupt-names: Should contain a list of interrupt names corresponding 
> > to
> > +   the interrupts in the interrupts property, if available.
> > +   Should be "macirq" for the main MAC IRQ
> > +- clocks: Must contain a phandle for each entry in clock-names.
> > +- clock-names: The name of the clock listed in the clocks property. These 
> > are
> > +   "axi", "apb", "mac_ext", "mac_parent", "ptp_ref", "ptp_parent", 
> > "ptp_top"
> > +   for MT2712 SoC
> 
> Clocks should represent the physical clocks connected to a block. Parent 
> clocks are not in that category.
> 
Got it. The assigned-clocks/assigned-clock-parents properties can handle
it.
> > +- mac-address: See ethernet.txt in the same directory
> > +- phy-mode: See ethernet.txt in the same directory
> > +
> > +Optional properties:
> > +- tx-delay: TX clock delay macro value. Range is 0~31. Default is 0.
> > +   It should be defined for rgmii/rgmii-rxid/mii interface.
> > +- rx-delay: RX clock delay macro value. Range is 0~31. Default is 0.
> > +   It should be defined for rgmii/rgmii-txid/mii/rmii interface.
> > +- fine-tune: This property will select coarse-tune delay or fine delay
> > +   for rgmii interface.
> > +   If fine-tune delay is enabled, tx-delay/rx-delay is 170+/-50ps
> > +   per stage.
> > +   Else coarse-tune delay is enabled, tx-delay/rx-delay is 0.55+/-0.2ns
> > +   per stage.
> > +   This property do not apply to non-rgmii PHYs.
> > +   Only coarse-tune delay is supported for mii/rmii PHYs.
> 
> Perhaps the delays should be in ps and the driver can figure out 
> fine-tune or not based on the value.
> 
The delay time in the MediaTek DWMAC design is not very accurate.
The current MT2712 and the following ICs will not use the same delay
design, but they will all use stages to indicate different delay times.
So maybe having "mediatek,tx-delay" represent the delay stage is a good
choice.
> > +- rmii-rxc: Reference clock of rmii is from external PHYs,
> > +   and it can be connected to TXC or RXC pin on MT2712 SoC.
> > +   If ref_clk <--> TXC, disable it.
> > +   Else ref_clk <--> RXC, enable it.
> > +- txc-inverse: Inverse tx clock for mii/rgmii.
> > +   Inverse tx clock inside MAC relative to reference clock for rmii,
> > +   and it rarely happen.
> > +- rxc-inverse: Inverse rx clock for mii/rgmii interfaces.
> > +   Inverse reference clock for rmii.
> 
> These should all have vendor prefixes. 'snps' if these are all standard 
> GMAC controls or 'mediatek' if Mediatek specific.
> 
Got it, will be modified in next version.
> > +
> > +Example:
> > +   eth: ethernet@1101c000 {
> > +   compatible = "mediatek,mt2712-gmac";
> > +   reg = <0 0x1101c000 0 0x1300>;
> > +   interrupts = ;
> > +   interrupt-names = "macirq";
> > +   phy-mode ="rgmii-id";
> > +   mac-address = [00 55 7b b5 7d f7];
> > +   clock-names = "axi",
> > + "apb",
> > + "mac_ext",
> > + "mac_parent",
> > + "ptp_ref",
> > + "ptp_parent",
> > + "ptp_top";
> > +   clocks = < CLK_PERI_GMAC>,
> > +< CLK_PERI_GMAC_PCLK>,
> > +< CLK_TOP_ETHER_125M_SEL>,
> > +< CLK_TOP_ETHERPLL_125M>,
> > +< CLK_TOP_ETHER_50M_SEL>,
> > +< CLK_TOP_APLL1_D3>,
> > +< CLK_TOP_APLL1>;
> > +   

Re: [PATCH] proc: allow killing processes via file descriptors

2018-11-18 Thread Al Viro
On Sun, Nov 18, 2018 at 09:42:35AM -0800, Andy Lutomirski wrote:

> Now here's the kicker: if the "running program" calls execve(), it
> goes away.  The fd gets some sort of notification that this happened

Type error, parser failed.

Define "fd", please.  If it's a "file descriptor", thank you for playing,
you've lost.  That's not going to work.  If it's "opened file" (aka
"file description" in horrible POSIXese), who's going to get notifications
and what kind of exclusion are you going to use?


Re: [PATCH 1/7] node: Link memory nodes to their compute nodes

2018-11-18 Thread Anshuman Khandual



On 11/15/2018 04:19 AM, Keith Busch wrote:
> Memory-only nodes will often have affinity to a compute node, and
> platforms have ways to express that locality relationship.

It may not have a local affinity to any compute node, but it might have a
valid NUMA distance from all available compute nodes. This is particularly
true for coherent device memory, which is accessible from all available
compute nodes without having local affinity to any compute node other than
the device compute, which may or may not be represented as a NUMA node itself.

But in the case of normal system memory too, a memory-only node might be far
from other CPU nodes and may not have CPUs of its own. In that case there
is no local affinity anyway.

> 
> A node containing CPUs or other DMA devices that can initiate memory
> access are referred to as "memory iniators". A "memory target" is a

Memory initiators should also include heterogeneous compute elements like
GPU cores, FPGA elements etc apart from CPU and DMA engines.

> node that provides at least one phyiscal address range accessible to a
> memory initiator.

This definition of "memory target" makes sense: coherent accesses within
the PA range from all possible "memory initiators", which should also include
heterogeneous compute elements as mentioned before.

> 
> In preparation for these systems, provide a new kernel API to link
> the target memory node to its initiator compute node with symlinks to
> each other.

Makes sense, but how would we really define NUMA placement for various
heterogeneous compute elements which are connected to the system bus
differently than the CPU and DMA?

> 
> The following example shows the new sysfs hierarchy setup for memory node
> 'Y' local to commpute node 'X':
> 
>   # ls -l /sys/devices/system/node/nodeX/initiator*
>   /sys/devices/system/node/nodeX/targetY -> ../nodeY
> 
>   # ls -l /sys/devices/system/node/nodeY/target*
>   /sys/devices/system/node/nodeY/initiatorX -> ../nodeX

This interlinking makes sense, but only once we are able to define all
possible memory initiators and memory targets as NUMA nodes (which might not
be trivial), taking into account the heterogeneous compute environment. But
this linking at least establishes the coherency relationship between memory
initiators and memory targets.

> 
> Signed-off-by: Keith Busch 
> ---
>  drivers/base/node.c  | 32 
>  include/linux/node.h |  2 ++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 86d6cd92ce3d..a9b7512a9502 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -372,6 +372,38 @@ int register_cpu_under_node(unsigned int cpu, unsigned 
> int nid)
>kobject_name(_devices[nid]->dev.kobj));
>  }
>  
> +int register_memory_node_under_compute_node(unsigned int m, unsigned int p)
> +{
> + int ret;
> + char initiator[20], target[17];

20, 17 seems arbitrary here.
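
For what it's worth, the sizes may not be arbitrary. My guess (nothing in
the patch confirms it) is that each is the prefix length plus the ten decimal
digits of a 32-bit unsigned int plus the terminating NUL. A standalone
userspace sketch of that arithmetic, where link_name_size() is a made-up
helper and not a kernel function:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: bytes needed to hold "<prefix><node id>" plus the
 * terminating NUL, where the node id is any value of an unsigned int.
 */
static size_t link_name_size(const char *prefix)
{
	char digits[32];
	/* Count the decimal digits of the largest unsigned int. */
	int n = snprintf(digits, sizeof(digits), "%u", UINT_MAX);

	assert(n > 0 && (size_t)n < sizeof(digits));
	return strlen(prefix) + (size_t)n + 1;
}
```

With a 32-bit unsigned int this gives 20 for "initiator" and 17 for "target".
Note that the patch formats with "%d", so a value above INT_MAX would print
with a sign and need one more byte. A named constant or a comment would make
the intent clearer either way.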

> +
> + if (!node_online(p) || !node_online(m))
> + return -ENODEV;

Just wondering what a NUMA node will look like for a group of GPU compute
elements which are not managed by the kernel but are still memory initiators
having access to a number of memory targets.

> + if (m == p)
> + return 0;

Why skip? Shouldn't we link the memory target to its own node, which can be
its memory initiator as well? The caller of this linking function might
decide whether the memory target is accessible from the same NUMA node as a
memory initiator or not.

> +
> + snprintf(initiator, sizeof(initiator), "initiator%d", p);
> + snprintf(target, sizeof(target), "target%d", m);
> +
> + ret = sysfs_create_link(_devices[p]->dev.kobj,
> + _devices[m]->dev.kobj,
> + target);
> + if (ret)
> + return ret;
> +
> + ret = sysfs_create_link(_devices[m]->dev.kobj,
> + _devices[p]->dev.kobj,
> + initiator);
> + if (ret)
> + goto err;
> +
> + return 0;
> + err:
> + sysfs_remove_link(_devices[p]->dev.kobj,
> +   kobject_name(_devices[m]->dev.kobj));
> + return ret;
> +}
> +
>  int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>  {
>   struct device *obj;
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 257bb3d6d014..1fd734a3fb3f 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -75,6 +75,8 @@ extern int register_mem_sect_under_node(struct memory_block 
> *mem_blk,
>  extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>  unsigned long phys_index);
>  
> +extern int register_memory_node_under_compute_node(unsigned int m, unsigned 
> int p);
> +
>  #ifdef CONFIG_HUGETLBFS
>  extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>

Re: [PATCH v5] x86/fsgsbase/64: Fix the base write helper functions

2018-11-18 Thread Andy Lutomirski
On Fri, Nov 16, 2018 at 3:27 PM Chang S. Bae  wrote:
>
> The helper functions that purport to write the base should just write it
> only. It shouldn't have magic optimizations to change the index.
>
> Make the index explicitly changed from the caller, instead of including
> the code in the helpers.
>
> Subsequently, the task write helpers do not handle for the current task
> anymore. The range check for a base value is also factored out, to
> minimize code redundancy from the caller.
>
> v2: Fix further on the task write functions. Revert the changes on the
> task read helpers.
>
> v3: Fix putreg(). Edit the changelog.
>
> v4: Update the task write helper functions and do_arch_prctl_64(). Fix
> the comment in putreg().
>
> v5: Fix preempt_disable() calls in do_arch_prctl_64()

Reviewed-by: Andy Lutomirski 

Ingo, Thomas: can we get this in x86/urgent, please?


> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index ffae9b9740fd..4b8ee05dd6ad 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -397,11 +397,12 @@ static int putreg(struct task_struct *child,
> if (value >= TASK_SIZE_MAX)
> return -EIO;
> /*
> -* When changing the FS base, use the same
> -* mechanism as for do_arch_prctl_64().
> +* When changing the FS base, use do_arch_prctl_64()
> +* to set the index to zero and to set the base
> +* as requested.
>  */
> if (child->thread.fsbase != value)
> -   return x86_fsbase_write_task(child, value);
> +   return do_arch_prctl_64(child, ARCH_SET_FS, value);

FWIW, this logic is and was nonsensical, but it matches historical
behavior, so I guess it's okay.  I suspect that gdb only works by
luck, since fs_base has a *higher* index than fs (and same for gs),
which means that SETREGS with a nonzero fs or gs likely only works
because the target almost always already has fs_base or gs_base == 0,
so we bypass this entire mess.

Sigh.  When you resubmit the full FSGSBASE series, I'll review the new
code extra carefully.


Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-18 Thread Anshuman Khandual



On 11/16/2018 09:21 PM, Keith Busch wrote:
> On Fri, Nov 16, 2018 at 11:57:58AM +0530, Anshuman Khandual wrote:
>> On 11/15/2018 04:19 AM, Keith Busch wrote:
>>> This series provides a new sysfs representation for heterogeneous
>>> system memory.
>>>
>>> The previous series that was specific to HMAT that this series was based
>>> on was last posted here: https://lkml.org/lkml/2017/12/13/968
>>>
>>> Platforms may provide multiple types of cpu attached system memory. The
>>> memory ranges for each type may have different characteristics that
>>> applications may wish to know about when considering what node they want
>>> their memory allocated from. 
>>>
>>> It had previously been difficult to describe these setups as memory
>>> rangers were generally lumped into the NUMA node of the CPUs. New
>>> platform attributes have been created and in use today that describe
>>> the more complex memory hierarchies that can be created.
>>>
>>> This series first creates new generic APIs under the kernel's node
>>> representation. These new APIs can be used to create links among local
>>> memory and compute nodes and export characteristics about the memory
>>> nodes. Documentation desribing the new representation are provided.
>>>
>>> Finally the series adds a kernel user for these new APIs from parsing
>>> the ACPI HMAT.
>>
>> Not able to see the patches from this series either on the list or on the
>> archive (https://lkml.org/lkml/2018/11/15/331). 
> 
> The send-email split the cover-letter from the series, probably
> something I did. Series followed immediately after:
> 
>   https://lkml.org/lkml/2018/11/15/332

Yeah, got it. I can see the series on the list. Thanks for pointing it out.

> 
>> IIRC last time we discussed
>> about this and the concern which I raised was in absence of a broader NUMA
>> rework for multi attribute memory it might not a good idea to settle down
>> and freeze sysfs interface for the user space. 
> 


Re: [PATCH] proc: allow killing processes via file descriptors

2018-11-18 Thread Andy Lutomirski
On Sun, Nov 18, 2018 at 12:32 PM Daniel Colascione  wrote:
>
> On Sun, Nov 18, 2018 at 12:28 PM, Andy Lutomirski  wrote:
> >> That is, I'm proposing an API that looks like this:
> >>
> >> int process_kill(int procfs_dfd, int signo, const union sigval value)
> >>
> >> If, later, process_kill were to *also* accept process-capability FDs,
> >> nothing would break.
> >
> > Except that this makes it ambiguous to the caller as to whether their 
> > current creds are considered.  So it would need to be a different syscall 
> > or at least a flag.  Otherwise a lot of those nice theoretical properties 
> > go away.
>
> Sure. A flag might make for better ergonomics.
>
> >> Yes, that's what I have in mind. A siginfo_t is small enough that we
> >> could just store it as a blob allocated off the procfs inode or
> >> something like that without bothering with a shmfs file. You'd be able
> >> to read(2) the exit status as many times as you wanted.
> >
> > I think that, if the syscall in question is read(2), then it should work 
> > *once* per struct file.  Otherwise running cat on the file would behave 
> > very oddly.
>
> Why? The file pointer would work normally.

Can you explain the exact semantics?  If I have an fd where read(2)
returns the same 4-byte value every time read(2) is called, then cat
will just return an infinite sequence of the same value.  This is not
a complete disaster, but it's not really a good thing.
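
To make the question concrete, here is a userspace sketch of the two
behaviours I can imagine. The function names are invented for illustration.
The first follows what I understand simple_read_from_buffer() to do in the
kernel (honour and advance the file position); the second ignores the
position entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of a read handler that honours the file position: copy at most
 * len bytes starting at *pos, advance *pos, and return 0 once the data
 * has been consumed, so a reader such as cat sees end-of-file.
 */
static size_t read_with_pos(void *dst, size_t len, size_t *pos,
			    const void *src, size_t avail)
{
	size_t n;

	assert(dst && pos && src);
	if (*pos >= avail)
		return 0;
	n = avail - *pos;
	if (n > len)
		n = len;
	memcpy(dst, (const char *)src + *pos, n);
	*pos += n;
	return n;
}

/*
 * Model of a handler that ignores the position: every call returns the
 * same bytes again, so a loop that reads until 0 never terminates.
 */
static size_t read_ignoring_pos(void *dst, size_t len,
				const void *src, size_t avail)
{
	size_t n = avail < len ? avail : len;

	assert(dst && src);
	memcpy(dst, src, n);
	return n;
}
```

With the first variant the exit status can still be re-read as often as
desired by seeking back to offset 0 or using pread(2), while cat terminates
after one pass.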

>
> > Read and poll have the same problem as write: we can’t check caps in read 
> > or poll either.
>
> Why not? Reading /proc/pid/stat does an access check today and
> conditionally replaces the exit status with zero.

And that's probably a bug.  It's at least a giant kludge that we shouldn't copy.

Here is the general rule: the basic operations that are expected to
treat file descriptors as capabilities *must* treat file descriptors
as capabilities, at least for new APIs.  This includes read(2),
write(2), and poll(2).  We should have an exceedingly good reason to
check current's creds, mm, or anything else about current in those
syscalls.

There is a good reason for this: consider what happens if you type:

sudo >/proc/PID/whatever

or

sudo 

[RFC PATCH v2 02/14] m68k: mac: Fix VIA timer counter accesses

2018-11-18 Thread Finn Thain
This resolves some bugs that affect VIA timer counter accesses.
Avoid lost interrupts caused by reading the counter low byte register.
Make allowance for the fact that the counter will be decremented to
0xFFFF before being reloaded.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain 
---
Changed since v1:
 - Test the timer interrupt flag unconditionally.
 - Drop some extraneous clean up.
 - Don't try to recover from lost timer interrupts. Don't lose them
in the first place. That means giving up on the timer counter low byte.
The extra precision is probably not worth the extra complexity and
I couldn't make it work anyway.
---
 arch/m68k/mac/via.c | 105 +++-
 1 file changed, 56 insertions(+), 49 deletions(-)

diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
index 2ab85b6eb4fe..d1dbf9017300 100644
--- a/arch/m68k/mac/via.c
+++ b/arch/m68k/mac/via.c
@@ -54,16 +54,6 @@ static __u8 rbv_clear;
 
 static int gIER,gIFR,gBufA,gBufB;
 
-/*
- * Timer defs.
- */
-
-#define TICK_SIZE  1
-#define MAC_CLOCK_TICK (783300/HZ) /* ticks per HZ */
-#define MAC_CLOCK_LOW  (MAC_CLOCK_TICK&0xFF)
-#define MAC_CLOCK_HIGH (MAC_CLOCK_TICK>>8)
-
-
 /*
  * On Macs with a genuine VIA chip there is no way to mask an individual slot
  * interrupt. This limitation also seems to apply to VIA clone logic cores in
@@ -267,22 +257,6 @@ void __init via_init(void)
}
 }
 
-/*
- * Start the 100 Hz clock
- */
-
-void __init via_init_clock(irq_handler_t func)
-{
-   via1[vACR] |= 0x40;
-   via1[vT1LL] = MAC_CLOCK_LOW;
-   via1[vT1LH] = MAC_CLOCK_HIGH;
-   via1[vT1CL] = MAC_CLOCK_LOW;
-   via1[vT1CH] = MAC_CLOCK_HIGH;
-
-   if (request_irq(IRQ_MAC_TIMER_1, func, 0, "timer", func))
-   pr_err("Couldn't register %s interrupt\n", "timer");
-}
-
 /*
  * Debugging dump, used in various places to see what's going on.
  */
@@ -310,29 +284,6 @@ void via_debug_dump(void)
}
 }
 
-/*
- * This is always executed with interrupts disabled.
- *
- * TBI: get time offset between scheduling timer ticks
- */
-
-u32 mac_gettimeoffset(void)
-{
-   unsigned long ticks, offset = 0;
-
-   /* read VIA1 timer 2 current value */
-   ticks = via1[vT1CL] | (via1[vT1CH] << 8);
-   /* The probability of underflow is less than 2% */
-   if (ticks > MAC_CLOCK_TICK - MAC_CLOCK_TICK / 50)
-   /* Check for pending timer interrupt in VIA1 IFR */
-   if (via1[vIFR] & 0x40) offset = TICK_SIZE;
-
-   ticks = MAC_CLOCK_TICK - ticks;
-   ticks = ticks * 1L / MAC_CLOCK_TICK;
-
-   return (ticks + offset) * 1000;
-}
-
 /*
  * Flush the L2 cache on Macs that have it by flipping
  * the system into 24-bit mode for an instant.
@@ -618,3 +569,59 @@ int via2_scsi_drq_pending(void)
return via2[gIFR] & (1 << IRQ_IDX(IRQ_MAC_SCSIDRQ));
 }
 EXPORT_SYMBOL(via2_scsi_drq_pending);
+
+/* timer and clock source */
+
+#define VIA_CLOCK_FREQ 783360/* VIA "phase 2" clock in Hz 
*/
+#define VIA_TIMER_INTERVAL (1000000 / HZ)/* microseconds per jiffy */
+#define VIA_TIMER_CYCLES   (VIA_CLOCK_FREQ / HZ) /* clock cycles per jiffy */
+
+#define VIA_TC (VIA_TIMER_CYCLES - 2) /* including 0 and -1 */
+#define VIA_TC_LOW (VIA_TC & 0xFF)
+#define VIA_TC_HIGH(VIA_TC >> 8)
+
+void __init via_init_clock(irq_handler_t timer_routine)
+{
+   if (request_irq(IRQ_MAC_TIMER_1, timer_routine, 0, "timer", NULL)) {
+   pr_err("Couldn't register %s interrupt\n", "timer");
+   return;
+   }
+
+   via1[vT1LL] = VIA_TC_LOW;
+   via1[vT1LH] = VIA_TC_HIGH;
+   via1[vT1CL] = VIA_TC_LOW;
+   via1[vT1CH] = VIA_TC_HIGH;
+   via1[vACR] |= 0x40;
+}
+
+u32 mac_gettimeoffset(void)
+{
+   unsigned long flags;
+   u8 count_high;
+   u16 count, offset = 0;
+
+   /*
+* Timer counter wrap-around is detected with the timer interrupt flag
+* but reading the counter low byte (vT1CL) would reset the flag.
+* Also, accessing both counter registers is essentially a data race.
+* These problems are avoided by ignoring the low byte. Clock accuracy
+* is 256 times worse (error can reach 0.327 ms) but CPU overhead is
+* reduced by avoiding slow VIA register accesses.
+*/
+
+   local_irq_save(flags);
+   count_high = via1[vT1CH];
+   if (count_high == 0xFF) {
+   count_high = 0;
+   while (via1[vT1CH] == 0xFF)
+   /* spin */;
+   }
+   if (via1[vIFR] & VIA_TIMER_1_INT)
+   offset = VIA_TIMER_CYCLES;
+   local_irq_restore(flags);
+
+   count = count_high << 8;
+   count = VIA_TIMER_CYCLES - count + offset;
+
+   return ((count * VIA_TIMER_INTERVAL) / VIA_TIMER_CYCLES) * 1000;
+}
-- 
2.18.1
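
The figures in the patch comment can be checked with a little arithmetic.
The following is a standalone sketch, not part of the patch. It assumes HZ
is 100, and the helper function names are invented for illustration.

```c
#include <assert.h>

#define HZ             100     /* assumption: 100 Hz tick */
#define VIA_CLOCK_FREQ 783360  /* VIA "phase 2" clock in Hz, from the patch */

/* Whole VIA clock cycles per jiffy (integer division, as in the patch). */
static unsigned int via_timer_cycles(void)
{
	return VIA_CLOCK_FREQ / HZ;
}

/*
 * Value written to the timer latch. Two is subtracted because the counter
 * also passes through 0 and -1 before it is reloaded.
 */
static unsigned int via_tc(void)
{
	unsigned int cycles = via_timer_cycles();

	assert(cycles > 2);
	return cycles - 2;
}

/*
 * Time in nanoseconds covered by the ignored low counter byte,
 * i.e. 256 clock cycles at VIA_CLOCK_FREQ.
 */
static unsigned long long low_byte_span_ns(void)
{
	return 256ULL * 1000000000ULL / VIA_CLOCK_FREQ;
}
```

Under these assumptions there are 7833 cycles per jiffy, the latch value is
7831 (0x1E97, so high byte 0x1E and low byte 0x97), and 256 cycles last
326797 ns, which matches the 0.327 ms error bound quoted in the comment.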



[RFC PATCH v2 03/14] m68k: mac: Clean up unused timer definitions

2018-11-18 Thread Finn Thain
Signed-off-by: Finn Thain 
---
 arch/m68k/include/asm/macints.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/m68k/include/asm/macints.h b/arch/m68k/include/asm/macints.h
index cddb2d3ea49b..4da172bd048c 100644
--- a/arch/m68k/include/asm/macints.h
+++ b/arch/m68k/include/asm/macints.h
@@ -121,7 +121,4 @@
 #define SLOT2IRQ(x)  (x + 47)
 #define IRQ2SLOT(x)  (x - 47)
 
-#define INT_CLK   24576/* CLK while int_clk =2.456MHz and divide = 
100 */
-#define INT_TICKS 246  /* to make sched_time = 99.902... HZ */
-
 #endif /* asm/macints.h */
-- 
2.18.1



[RFC PATCH v2 06/14] m68k: amiga: Convert to clocksource API

2018-11-18 Thread Finn Thain
Add a platform clocksource by adapting the existing arch_gettimeoffset
implementation.

Signed-off-by: Finn Thain 
Acked-by: Linus Walleij 
---
Changed since v1:
 - Moved clk_total access to within the irq lock.
---
 arch/m68k/amiga/config.c | 43 
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/arch/m68k/amiga/config.c b/arch/m68k/amiga/config.c
index d4976c1aa0cc..c498f8419c87 100644
--- a/arch/m68k/amiga/config.c
+++ b/arch/m68k/amiga/config.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -461,7 +462,28 @@ void __init config_amiga(void)
*(unsigned char *)ZTWO_VADDR(0xde0002) |= 0x80;
 }
 
+static u64 amiga_read_clk(struct clocksource *cs);
+
+static struct clocksource amiga_clk = {
+   .name   = "ciab",
+   .rating = 250,
+   .read   = amiga_read_clk,
+   .mask   = CLOCKSOURCE_MASK(32),
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
 static unsigned short jiffy_ticks;
+static u32 clk_total;
+
+static irqreturn_t ciab_timer_handler(int irq, void *dev_id)
+{
+   irq_handler_t timer_routine = dev_id;
+
+   clk_total += jiffy_ticks;
+   timer_routine(0, NULL);
+
+   return IRQ_HANDLED;
+}
 
 static void __init amiga_sched_init(irq_handler_t timer_routine)
 {
@@ -481,20 +503,23 @@ static void __init amiga_sched_init(irq_handler_t timer_routine)
 * Please don't change this to use ciaa, as it interferes with the
 * SCSI code. We'll have to take a look at this later
 */
-   if (request_irq(IRQ_AMIGA_CIAB_TA, timer_routine, 0, "timer", NULL))
+   if (request_irq(IRQ_AMIGA_CIAB_TA, ciab_timer_handler, IRQF_TIMER,
+   "timer", timer_routine))
pr_err("Couldn't register timer interrupt\n");
/* start timer */
ciab.cra |= 0x11;
-}
 
-#define TICK_SIZE 1
+   clocksource_register_hz(&amiga_clk, amiga_eclock);
+}
 
-/* This is always executed with interrupts disabled.  */
-static u32 amiga_gettimeoffset(void)
+static u64 amiga_read_clk(struct clocksource *cs)
 {
+   unsigned long flags;
unsigned short hi, lo, hi2;
u32 ticks, offset = 0;
 
+   local_irq_save(flags);
+
/* read CIA B timer A current value */
hi  = ciab.tahi;
lo  = ciab.talo;
@@ -510,12 +535,14 @@ static u32 amiga_gettimeoffset(void)
if (ticks > jiffy_ticks / 2)
/* check for pending interrupt */
if (cia_set_irq(&ciab_base, 0) & CIA_ICR_TA)
-   offset = 1;
+   offset = jiffy_ticks;
 
ticks = jiffy_ticks - ticks;
-   ticks = (1 * ticks) / jiffy_ticks;
+   ticks += offset + clk_total;
+
+   local_irq_restore(flags);
 
-   return (ticks + offset) * 1000;
+   return ticks;
 }
 
 static void amiga_reset(void)  __noreturn;
-- 
2.18.1
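
The conversion above turns a per-jiffy offset into a free-running cycle count: cycles already accounted for by the interrupt handler, plus cycles elapsed in the current period, plus one extra period if the timer has wrapped but its interrupt is still pending. A user-space sketch of that sum, assuming a down-counter reloaded with jiffy_ticks; struct timer_snapshot and clk_cycles() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the timer state; not a kernel structure. */
struct timer_snapshot {
	uint32_t clk_total;	/* cycles accumulated by the interrupt handler */
	uint16_t jiffy_ticks;	/* timer reload value: cycles per jiffy */
	uint16_t remaining;	/* current value of the down-counter */
	int irq_pending;	/* timer wrapped, handler has not run yet */
};

static uint32_t clk_cycles(const struct timer_snapshot *s)
{
	uint32_t offset = 0;

	assert(s->remaining <= s->jiffy_ticks);

	/* As in the patch: consult the pending flag only shortly after a wrap. */
	if (s->remaining > s->jiffy_ticks / 2 && s->irq_pending)
		offset = s->jiffy_ticks;

	return s->clk_total + (uint32_t)(s->jiffy_ticks - s->remaining) + offset;
}
```

For instance, with clk_total = 1000, jiffy_ticks = 100 and remaining = 90, the result is 1010 with no interrupt pending and 1110 with one pending.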



[RFC PATCH v2 04/14] m68k: apollo, q40, sun3, sun3x: Remove arch_gettimeoffset implementations

2018-11-18 Thread Finn Thain
These dummy implementations are no better than
default_arch_gettimeoffset() so remove them.

Signed-off-by: Finn Thain 
---
 arch/m68k/apollo/config.c | 7 ---
 arch/m68k/q40/config.c| 9 -
 arch/m68k/sun3/config.c   | 2 --
 arch/m68k/sun3/intersil.c | 7 ---
 arch/m68k/sun3x/config.c  | 1 -
 arch/m68k/sun3x/time.c| 5 -
 arch/m68k/sun3x/time.h| 1 -
 7 files changed, 32 deletions(-)

diff --git a/arch/m68k/apollo/config.c b/arch/m68k/apollo/config.c
index aef8d42e078d..7d168e6dfb01 100644
--- a/arch/m68k/apollo/config.c
+++ b/arch/m68k/apollo/config.c
@@ -29,7 +29,6 @@ u_long apollo_model;
 
 extern void dn_sched_init(irq_handler_t handler);
 extern void dn_init_IRQ(void);
-extern u32 dn_gettimeoffset(void);
 extern int dn_dummy_hwclk(int, struct rtc_time *);
 extern void dn_dummy_reset(void);
 #ifdef CONFIG_HEARTBEAT
@@ -152,7 +151,6 @@ void __init config_apollo(void)
 
mach_sched_init=dn_sched_init; /* */
mach_init_IRQ=dn_init_IRQ;
-   arch_gettimeoffset   = dn_gettimeoffset;
mach_max_dma_address = 0x;
mach_hwclk   = dn_dummy_hwclk; /* */
mach_reset   = dn_dummy_reset;  /* */
@@ -205,11 +203,6 @@ void dn_sched_init(irq_handler_t timer_routine)
pr_err("Couldn't register timer interrupt\n");
 }
 
-u32 dn_gettimeoffset(void)
-{
-   return 0xdeadbeef;
-}
-
 int dn_dummy_hwclk(int op, struct rtc_time *t) {
 
 
diff --git a/arch/m68k/q40/config.c b/arch/m68k/q40/config.c
index 96810d91da2b..e63eb5f06999 100644
--- a/arch/m68k/q40/config.c
+++ b/arch/m68k/q40/config.c
@@ -40,7 +40,6 @@ extern void q40_init_IRQ(void);
 static void q40_get_model(char *model);
 extern void q40_sched_init(irq_handler_t handler);
 
-static u32 q40_gettimeoffset(void);
 static int q40_hwclk(int, struct rtc_time *);
 static unsigned int q40_get_ss(void);
 static int q40_get_rtc_pll(struct rtc_pll_info *pll);
@@ -169,7 +168,6 @@ void __init config_q40(void)
mach_sched_init = q40_sched_init;
 
mach_init_IRQ = q40_init_IRQ;
-   arch_gettimeoffset = q40_gettimeoffset;
mach_hwclk = q40_hwclk;
mach_get_ss = q40_get_ss;
mach_get_rtc_pll = q40_get_rtc_pll;
@@ -201,13 +199,6 @@ int __init q40_parse_bootinfo(const struct bi_record *rec)
return 1;
 }
 
-
-static u32 q40_gettimeoffset(void)
-{
-   return 5000 * (ql_ticks != 0) * 1000;
-}
-
-
 /*
  * Looks like op is non-zero for setting the clock, and zero for
  * reading the clock.
diff --git a/arch/m68k/sun3/config.c b/arch/m68k/sun3/config.c
index 79a2bb857906..867e68d92c71 100644
--- a/arch/m68k/sun3/config.c
+++ b/arch/m68k/sun3/config.c
@@ -37,7 +37,6 @@
 
 char sun3_reserved_pmeg[SUN3_PMEGS_NUM];
 
-extern u32 sun3_gettimeoffset(void);
 static void sun3_sched_init(irq_handler_t handler);
 extern void sun3_get_model (char* model);
 extern int sun3_hwclk(int set, struct rtc_time *t);
@@ -138,7 +137,6 @@ void __init config_sun3(void)
 mach_sched_init  =  sun3_sched_init;
 mach_init_IRQ=  sun3_init_IRQ;
 mach_reset   =  sun3_reboot;
-   arch_gettimeoffset   =  sun3_gettimeoffset;
mach_get_model   =  sun3_get_model;
mach_hwclk   =  sun3_hwclk;
mach_halt=  sun3_halt;
diff --git a/arch/m68k/sun3/intersil.c b/arch/m68k/sun3/intersil.c
index d911070af02a..8fc74864de81 100644
--- a/arch/m68k/sun3/intersil.c
+++ b/arch/m68k/sun3/intersil.c
@@ -22,13 +22,6 @@
 #define STOP_VAL (INTERSIL_STOP | INTERSIL_INT_ENABLE | INTERSIL_24H_MODE)
 #define START_VAL (INTERSIL_RUN | INTERSIL_INT_ENABLE | INTERSIL_24H_MODE)
 
-/* does this need to be implemented? */
-u32 sun3_gettimeoffset(void)
-{
-  return 1000;
-}
-
-
 /* get/set hwclock */
 
 int sun3_hwclk(int set, struct rtc_time *t)
diff --git a/arch/m68k/sun3x/config.c b/arch/m68k/sun3x/config.c
index 33d3a1c6fba0..03ce7f9facfe 100644
--- a/arch/m68k/sun3x/config.c
+++ b/arch/m68k/sun3x/config.c
@@ -49,7 +49,6 @@ void __init config_sun3x(void)
mach_sched_init  = sun3x_sched_init;
mach_init_IRQ= sun3_init_IRQ;
 
-   arch_gettimeoffset   = sun3x_gettimeoffset;
mach_reset   = sun3x_reboot;
 
mach_hwclk   = sun3x_hwclk;
diff --git a/arch/m68k/sun3x/time.c b/arch/m68k/sun3x/time.c
index 3c8a86d08508..9163294b0fb6 100644
--- a/arch/m68k/sun3x/time.c
+++ b/arch/m68k/sun3x/time.c
@@ -73,11 +73,6 @@ int sun3x_hwclk(int set, struct rtc_time *t)
 
return 0;
 }
-/* Not much we can do here */
-u32 sun3x_gettimeoffset(void)
-{
-return 0L;
-}
 
 #if 0
 static irqreturn_t sun3x_timer_tick(int irq, void *dev_id)
diff --git a/arch/m68k/sun3x/time.h b/arch/m68k/sun3x/time.h
index 496f406412ad..86ce78bb3c28 100644
--- a/arch/m68k/sun3x/time.h
+++ b/arch/m68k/sun3x/time.h
@@ -3,7 +3,6 @@
 #define SUN3X_TIME_H
 
 extern int sun3x_hwclk(int set, struct rtc_time *t);
-u32 sun3x_gettimeoffset(void);
 void sun3x_sched_init(irq_handler_t vector);
 

[RFC PATCH v2 13/14] m68k: mvme16x: Convert to clocksource API

2018-11-18 Thread Finn Thain
Add a platform clocksource by adapting the existing arch_gettimeoffset
implementation.

Signed-off-by: Finn Thain 
Acked-by: Linus Walleij 
---
Changed since v1:
 - Moved clk_total access to within the irq lock.
---
 arch/m68k/mvme16x/config.c | 39 +++---
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/mvme16x/config.c b/arch/m68k/mvme16x/config.c
index 8bafa6a37593..2c109ee2a1a5 100644
--- a/arch/m68k/mvme16x/config.c
+++ b/arch/m68k/mvme16x/config.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -343,6 +344,21 @@ static irqreturn_t mvme16x_abort_int (int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static u64 mvme16x_read_clk(struct clocksource *cs);
+
+static struct clocksource mvme16x_clk = {
+   .name   = "pcc",
+   .rating = 250,
+   .read   = mvme16x_read_clk,
+   .mask   = CLOCKSOURCE_MASK(32),
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static u32 clk_total;
+
+#define PCC_TIMER_CLOCK_FREQ 100
+#define PCC_TIMER_CYCLES (PCC_TIMER_CLOCK_FREQ / HZ)
+
 static irqreturn_t mvme16x_timer_int (int irq, void *dev_id)
 {
irq_handler_t timer_routine = dev_id;
@@ -350,6 +366,7 @@ static irqreturn_t mvme16x_timer_int (int irq, void *dev_id)
 
local_irq_save(flags);
*(volatile unsigned char *)0xfff4201b |= 8;
+   clk_total += PCC_TIMER_CYCLES;
timer_routine(0, NULL);
local_irq_restore(flags);
 
@@ -363,13 +380,15 @@ void mvme16x_sched_init (irq_handler_t timer_routine)
 
 /* Using PCCchip2 or MC2 chip tick timer 1 */
 *(volatile unsigned long *)0xfff42008 = 0;
-*(volatile unsigned long *)0xfff42004 = 1; /* 10ms */
+*(volatile unsigned long *)0xfff42004 = PCC_TIMER_CYCLES;
 *(volatile unsigned char *)0xfff42017 |= 3;
 *(volatile unsigned char *)0xfff4201b = 0x16;
-if (request_irq(MVME16x_IRQ_TIMER, mvme16x_timer_int, 0,
-"timer", timer_routine))
+if (request_irq(MVME16x_IRQ_TIMER, mvme16x_timer_int, IRQF_TIMER, "timer",
+timer_routine))
panic ("Couldn't register timer int");
 
+clocksource_register_hz(&mvme16x_clk, PCC_TIMER_CLOCK_FREQ);
+
 if (brdno == 0x0162 || brdno == 0x172)
irq = MVME162_IRQ_ABORT;
 else
@@ -379,11 +398,17 @@ void mvme16x_sched_init (irq_handler_t timer_routine)
panic ("Couldn't register abort int");
 }
 
-
-/* This is always executed with interrupts disabled.  */
-u32 mvme16x_gettimeoffset(void)
+static u64 mvme16x_read_clk(struct clocksource *cs)
 {
-return (*(volatile u32 *)0xfff42008) * 1000;
+   unsigned long flags;
+   u32 ticks;
+
+   local_irq_save(flags);
+   ticks = *(volatile u32 *)0xfff42008;
+   ticks += clk_total;
+   local_irq_restore(flags);
+
+   return ticks;
 }
 
 int bcd2int (unsigned char b)
-- 
2.18.1



[RFC PATCH v2 12/14] m68k: mvme147: Handle timer counter overflow

2018-11-18 Thread Finn Thain
Reading the timer counter races with timer overflow (and the
corresponding interrupt). This is resolved by reading the overflow
register and taking this value into account. The interrupt handler
must clear the overflow register when it eventually executes.

Suggested-by: Thomas Gleixner 
Signed-off-by: Finn Thain 
---
TODO: find a spare counter for the clocksource, rather than hanging
it off the HZ timer.
---
 arch/m68k/include/asm/mvme147hw.h |  1 +
 arch/m68k/mvme147/config.c| 23 +++
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/m68k/include/asm/mvme147hw.h b/arch/m68k/include/asm/mvme147hw.h
index 7c3dd513128e..257b29184af9 100644
--- a/arch/m68k/include/asm/mvme147hw.h
+++ b/arch/m68k/include/asm/mvme147hw.h
@@ -66,6 +66,7 @@ struct pcc_regs {
 #define PCC_INT_ENAB   0x08
 
 #define PCC_TIMER_INT_CLR  0x80
+#define PCC_TIMER_CLR_OVF  0x04
 
 #define PCC_LEVEL_ABORT0x07
 #define PCC_LEVEL_SERIAL   0x04
diff --git a/arch/m68k/mvme147/config.c b/arch/m68k/mvme147/config.c
index 82b53b5ca82b..545a1fe0e119 100644
--- a/arch/m68k/mvme147/config.c
+++ b/arch/m68k/mvme147/config.c
@@ -118,7 +118,7 @@ static irqreturn_t mvme147_timer_int (int irq, void *dev_id)
 
local_irq_save(flags);
m147_pcc->t1_int_cntrl = PCC_TIMER_INT_CLR;
-   m147_pcc->t1_int_cntrl = PCC_INT_ENAB|PCC_LEVEL_TIMER1;
+   m147_pcc->t1_cntrl = PCC_TIMER_CLR_OVF;
clk_total += PCC_TIMER_CYCLES;
timer_routine(0, NULL);
local_irq_restore(flags);
@@ -144,23 +144,22 @@ void mvme147_sched_init (irq_handler_t timer_routine)
clocksource_register_hz(&mvme147_clk, PCC_TIMER_CLOCK_FREQ);
 }
 
-/* XXX There are race hazards in this code XXX */
 static u64 mvme147_read_clk(struct clocksource *cs)
 {
unsigned long flags;
-   volatile unsigned short *cp = (volatile unsigned short *)0xfffe1012;
-   unsigned short n;
+   u8 overflow, tmp;
+   u16 count;
u32 ticks;
 
local_irq_save(flags);
-
-   n = *cp;
-   while (n != *cp)
-   n = *cp;
-
-   n -= PCC_TIMER_PRELOAD;
-   ticks = clk_total + n;
-
+   tmp = m147_pcc->t1_cntrl >> 4;
+   count = m147_pcc->t1_count;
+   overflow = m147_pcc->t1_cntrl >> 4;
+   if (overflow != tmp)
+   count = m147_pcc->t1_count;
+   count -= PCC_TIMER_PRELOAD;
+   ticks = count + overflow * PCC_TIMER_CYCLES;
+   ticks += clk_total;
local_irq_restore(flags);
 
return ticks;
-- 
2.18.1
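
The read sequence in mvme147_read_clk() above (overflow, count, overflow again, then a second count read if the overflow value changed) can be exercised in user space. The following is only a model under stated assumptions: struct fake_timer and its accessors are invented for illustration, every register access is assumed to cost one timer cycle, and the clk_total bookkeeping and the PCC_TIMER_PRELOAD offset are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a free-running timer with an overflow counter. */
struct fake_timer {
	uint32_t now;		/* simulated time in timer cycles */
	uint32_t period;	/* cycles per timer period */
};

/* Each simulated register access advances time by one cycle. */
static uint32_t read_count(struct fake_timer *t)
{
	return t->now++ % t->period;
}

static uint32_t read_overflow(struct fake_timer *t)
{
	return t->now++ / t->period;
}

static uint32_t read_cycles(struct fake_timer *t)
{
	uint32_t before, after, count;

	assert(t->period > 0);

	before = read_overflow(t);
	count = read_count(t);
	after = read_overflow(t);
	if (after != before)
		count = read_count(t);	/* wrapped mid-read: sample again */

	return count + after * t->period;
}
```

Starting at now = 98 with period = 100, the counter wraps between the count read and the second overflow read; the second count read yields 1, so the result is 101 rather than the inconsistent 199 that pairing the stale count with the new overflow value would give.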



[RFC PATCH v2 14/14] m68k: mvme16x: Handle timer counter overflow

2018-11-18 Thread Finn Thain
Reading the timer counter races with timer overflow (and the
corresponding interrupt). This is resolved by reading the overflow
register and taking this value into account. The interrupt handler
must clear the overflow register when it eventually executes.

Suggested-by: Thomas Gleixner 
Signed-off-by: Finn Thain 
---
TODO: find a spare counter for the clocksource, rather than hanging
it off the HZ timer.
---
 arch/m68k/mvme16x/config.c | 45 +++---
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/m68k/mvme16x/config.c b/arch/m68k/mvme16x/config.c
index 2c109ee2a1a5..9bc2da69f80c 100644
--- a/arch/m68k/mvme16x/config.c
+++ b/arch/m68k/mvme16x/config.c
@@ -115,11 +115,11 @@ static void __init mvme16x_init_IRQ (void)
m68k_setup_user_interrupt(VEC_USER, 192);
 }
 
-#define pcc2chip   ((volatile u_char *)0xfff42000)
-#define PccSCCMICR 0x1d
-#define PccSCCTICR 0x1e
-#define PccSCCRICR 0x1f
-#define PccTPIACKR 0x25
+#define PCC2CHIP   (0xfff42000)
+#define PCCSCCMICR (PCC2CHIP + 0x1d)
+#define PCCSCCTICR (PCC2CHIP + 0x1e)
+#define PCCSCCRICR (PCC2CHIP + 0x1f)
+#define PCCTPIACKR (PCC2CHIP + 0x25)
 
 #ifdef CONFIG_EARLY_PRINTK
 
@@ -227,10 +227,10 @@ void mvme16x_cons_write(struct console *co, const char *str, unsigned count)
base_addr[CyIER] = CyTxMpty;
 
while (1) {
-   if (pcc2chip[PccSCCTICR] & 0x20)
+   if (in_8(PCCSCCTICR) & 0x20)
{
/* We have a Tx int. Acknowledge it */
-   sink = pcc2chip[PccTPIACKR];
+   sink = in_8(PCCTPIACKR);
if ((base_addr[CyLICR] >> 2) == port) {
if (i == count) {
/* Last char of string is now output */
@@ -359,13 +359,26 @@ static u32 clk_total;
 #define PCC_TIMER_CLOCK_FREQ 100
 #define PCC_TIMER_CYCLES (PCC_TIMER_CLOCK_FREQ / HZ)
 
+#define PCCTCMP1 (PCC2CHIP + 0x04)
+#define PCCTCNT1 (PCC2CHIP + 0x08)
+#define PCCTOVR1 (PCC2CHIP + 0x17)
+#define PCCTIC1  (PCC2CHIP + 0x1b)
+
+#define PCCTOVR1_TIC_EN  0x01
+#define PCCTOVR1_COC_EN  0x02
+#define PCCTOVR1_OVR_CLR 0x04
+
+#define PCCTIC1_INT_CLR  0x08
+#define PCCTIC1_INT_EN   0x10
+
 static irqreturn_t mvme16x_timer_int (int irq, void *dev_id)
 {
irq_handler_t timer_routine = dev_id;
unsigned long flags;
 
local_irq_save(flags);
-   *(volatile unsigned char *)0xfff4201b |= 8;
+   out_8(PCCTIC1, in_8(PCCTIC1) | PCCTIC1_INT_CLR);
+   out_8(PCCTOVR1, PCCTOVR1_OVR_CLR);
clk_total += PCC_TIMER_CYCLES;
timer_routine(0, NULL);
local_irq_restore(flags);
@@ -379,10 +392,10 @@ void mvme16x_sched_init (irq_handler_t timer_routine)
 int irq;
 
 /* Using PCCchip2 or MC2 chip tick timer 1 */
-*(volatile unsigned long *)0xfff42008 = 0;
-*(volatile unsigned long *)0xfff42004 = PCC_TIMER_CYCLES;
-*(volatile unsigned char *)0xfff42017 |= 3;
-*(volatile unsigned char *)0xfff4201b = 0x16;
+out_be32(PCCTCNT1, 0);
+out_be32(PCCTCMP1, PCC_TIMER_CYCLES);
+out_8(PCCTOVR1, in_8(PCCTOVR1) | PCCTOVR1_TIC_EN | PCCTOVR1_COC_EN);
+out_8(PCCTIC1, PCCTIC1_INT_EN | 6);
 if (request_irq(MVME16x_IRQ_TIMER, mvme16x_timer_int, IRQF_TIMER, "timer",
 timer_routine))
panic ("Couldn't register timer int");
@@ -401,10 +414,16 @@ void mvme16x_sched_init (irq_handler_t timer_routine)
 static u64 mvme16x_read_clk(struct clocksource *cs)
 {
unsigned long flags;
+   u8 overflow, tmp;
u32 ticks;
 
local_irq_save(flags);
-   ticks = *(volatile u32 *)0xfff42008;
+   tmp = in_8(PCCTOVR1) >> 4;
+   ticks = in_be32(PCCTCNT1);
+   overflow = in_8(PCCTOVR1) >> 4;
+   if (overflow != tmp)
+   ticks = in_be32(PCCTCNT1);
+   ticks += overflow * PCC_TIMER_CYCLES;
ticks += clk_total;
local_irq_restore(flags);
 
-- 
2.18.1



[RFC PATCH v2 08/14] m68k: bvme6000: Convert to clocksource API

2018-11-18 Thread Finn Thain
Add a platform clocksource by adapting the existing arch_gettimeoffset
implementation.

Signed-off-by: Finn Thain 
Acked-by: Linus Walleij 
---
Changed since v1:
 - Moved clk_total access to within the irq lock.
---
 arch/m68k/bvme6000/config.c | 52 +++--
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/arch/m68k/bvme6000/config.c b/arch/m68k/bvme6000/config.c
index c27c104ac7e7..9691d741e9dc 100644
--- a/arch/m68k/bvme6000/config.c
+++ b/arch/m68k/bvme6000/config.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -147,6 +148,21 @@ irqreturn_t bvme6000_abort_int (int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static u64 bvme6000_read_clk(struct clocksource *cs);
+
+static struct clocksource bvme6000_clk = {
+   .name   = "rtc",
+   .rating = 250,
+   .read   = bvme6000_read_clk,
+   .mask   = CLOCKSOURCE_MASK(32),
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static u32 clk_total;
+
+#define RTC_TIMER_CLOCK_FREQ 800
+#define RTC_TIMER_CYCLES (RTC_TIMER_CLOCK_FREQ / HZ)
+#define RTC_TIMER_COUNT  ((RTC_TIMER_CYCLES / 2) - 1)
 
 static irqreturn_t bvme6000_timer_int (int irq, void *dev_id)
 {
@@ -158,6 +174,7 @@ static irqreturn_t bvme6000_timer_int (int irq, void *dev_id)
 local_irq_save(flags);
 msr = rtc->msr & 0xc0;
 rtc->msr = msr | 0x20; /* Ack the interrupt */
+clk_total += RTC_TIMER_CYCLES;
 timer_routine(0, NULL);
 local_irq_restore(flags);
 
@@ -180,13 +197,13 @@ void bvme6000_sched_init (irq_handler_t timer_routine)
 
 rtc->msr = 0;  /* Ensure timer registers accessible */
 
-if (request_irq(BVME_IRQ_RTC, bvme6000_timer_int, 0,
-   "timer", timer_routine))
+if (request_irq(BVME_IRQ_RTC, bvme6000_timer_int, IRQF_TIMER, "timer",
+timer_routine))
panic ("Couldn't register timer int");
 
 rtc->t1cr_omr = 0x04;  /* Mode 2, ext clk */
-rtc->t1msb = 3 >> 8;
-rtc->t1lsb = 3 & 0xff;
+rtc->t1msb = RTC_TIMER_COUNT >> 8;
+rtc->t1lsb = RTC_TIMER_COUNT & 0xff;
 rtc->irr_icr1 &= 0xef; /* Route timer 1 to INTR pin */
 rtc->msr = 0x40;   /* Access int.cntrl, etc */
 rtc->pfr_icr0 = 0x80;  /* Just timer 1 ints enabled */
@@ -198,14 +215,14 @@ void bvme6000_sched_init (irq_handler_t timer_routine)
 
 rtc->msr = msr;
 
+clocksource_register_hz(&bvme6000_clk, RTC_TIMER_CLOCK_FREQ);
+
 if (request_irq(BVME_IRQ_ABORT, bvme6000_abort_int, 0,
"abort", bvme6000_abort_int))
panic ("Couldn't register abort int");
 }
 
 
-/* This is always executed with interrupts disabled.  */
-
 /*
  * NOTE:  Don't accept any readings within 5us of rollover, as
  * the T1INT bit may be a little slow getting set.  There is also
@@ -213,14 +230,18 @@ void bvme6000_sched_init (irq_handler_t timer_routine)
  * results...
  */
 
-u32 bvme6000_gettimeoffset(void)
+static u64 bvme6000_read_clk(struct clocksource *cs)
 {
+unsigned long flags;
 volatile RtcPtr_t rtc = (RtcPtr_t)BVME_RTC_BASE;
 volatile PitRegsPtr pit = (PitRegsPtr)BVME_PIT_BASE;
-unsigned char msr = rtc->msr & 0xc0;
+unsigned char msr;
 unsigned char t1int, t1op;
 u32 v = 80, ov;
 
+local_irq_save(flags);
+
+msr = rtc->msr & 0xc0;
 rtc->msr = 0;  /* Ensure timer registers accessible */
 
 do {
@@ -233,17 +254,20 @@ u32 bvme6000_gettimeoffset(void)
 } while (t1int != (rtc->msr & 0x20) ||
t1op != (pit->pcdr & 0x04) ||
abs(ov-v) > 80 ||
-   v > 39960);
+   v > RTC_TIMER_COUNT - (RTC_TIMER_COUNT / 100));
 
-v = 3 - v;
+v = RTC_TIMER_COUNT - v;
 if (!t1op) /* If in second half cycle.. */
-   v += 4;
-v /= 8;/* Convert ticks to microseconds */
+   v += RTC_TIMER_CYCLES / 2;
 if (t1int)
-   v += 1; /* Int pending, + 10ms */
+   v += RTC_TIMER_CYCLES;
 rtc->msr = msr;
 
-return v * 1000;
+v += clk_total;
+
+local_irq_restore(flags);
+
+return v;
 }
 
 /*
-- 
2.18.1



[RFC PATCH v2 07/14] m68k: atari: Convert to clocksource API

2018-11-18 Thread Finn Thain
Add a platform clocksource by adapting the existing arch_gettimeoffset
implementation.

Normally the MFP timer C interrupt flag would be used to check for
timer counter wrap-around. Unfortunately, that flag gets cleared by the
MFP itself (due to automatic EOI mode). This means that
mfp_timer_c_handler() and atari_read_clk() must race when accounting
for counter wrap-around.

That problem is avoided here by effectively stopping the clock when it
might otherwise jump backwards. This harms clock accuracy; the result
is not much better than the jiffies clocksource. Also, clock error is
no longer uniformly distributed.

Signed-off-by: Finn Thain 
Acked-by: Linus Walleij 
---
TODO: find a spare counter for the clocksource, rather than hanging
it off the HZ timer.

It would be simpler to adopt the 'jiffies' clocksource here
(cf. the patch for the hp300 platform in this series).

Changed since v1:
 - Moved clk_total access to within the irq lock.
 - Renamed mfp_timer_handler and mfptimer_handler.
 - Avoid accessing the timer interrupt flag in atari_read_clk(). To
get monotonicity, keep track of the previous timer counter value.
---
 arch/m68k/atari/time.c | 48 +++---
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/arch/m68k/atari/time.c b/arch/m68k/atari/time.c
index fafa20f75ab9..914832e55ec5 100644
--- a/arch/m68k/atari/time.c
+++ b/arch/m68k/atari/time.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -24,12 +25,27 @@
 DEFINE_SPINLOCK(rtc_lock);
 EXPORT_SYMBOL_GPL(rtc_lock);
 
+static u64 atari_read_clk(struct clocksource *cs);
+
+static struct clocksource atari_clk = {
+   .name   = "mfp",
+   .rating = 100,
+   .read   = atari_read_clk,
+   .mask   = CLOCKSOURCE_MASK(32),
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static u32 clk_total;
+static u32 last_timer_count;
+
 static irqreturn_t mfp_timer_c_handler(int irq, void *dev_id)
 {
irq_handler_t timer_routine = dev_id;
unsigned long flags;
 
local_irq_save(flags);
+   last_timer_count = st_mfp.tim_dt_c;
+   clk_total += INT_TICKS;
timer_routine(0, NULL);
local_irq_restore(flags);
 
@@ -44,32 +60,30 @@ atari_sched_init(irq_handler_t timer_routine)
 /* start timer C, div = 1:100 */
 st_mfp.tim_ct_cd = (st_mfp.tim_ct_cd & 15) | 0x60;
 /* install interrupt service routine for MFP Timer C */
-if (request_irq(IRQ_MFP_TIMC, mfp_timer_c_handler, 0, "timer",
+if (request_irq(IRQ_MFP_TIMC, mfp_timer_c_handler, IRQF_TIMER, "timer",
 timer_routine))
pr_err("Couldn't register timer interrupt\n");
+
+clocksource_register_hz(&atari_clk, INT_CLK);
 }
 
 /* ++andreas: gettimeoffset fixed to check for pending interrupt */
 
-#define TICK_SIZE 1
-
-/* This is always executed with interrupts disabled.  */
-u32 atari_gettimeoffset(void)
+static u64 atari_read_clk(struct clocksource *cs)
 {
-  u32 ticks, offset = 0;
-
-  /* read MFP timer C current value */
-  ticks = st_mfp.tim_dt_c;
-  /* The probability of underflow is less than 2% */
-  if (ticks > INT_TICKS - INT_TICKS / 50)
-/* Check for pending timer interrupt */
-if (st_mfp.int_pn_b & (1 << 5))
-  offset = TICK_SIZE;
+   unsigned long flags;
+   u32 ticks;
 
-  ticks = INT_TICKS - ticks;
-  ticks = ticks * 1L / INT_TICKS;
+   local_irq_save(flags);
+   ticks = st_mfp.tim_dt_c;
+   if (ticks > last_timer_count) /* timer wrapped since last interrupt */
+   ticks = last_timer_count;
+   last_timer_count = ticks;
+   ticks = INT_TICKS - ticks;
+   ticks += clk_total;
+   local_irq_restore(flags);
 
-  return (ticks + offset) * 1000;
+   return ticks;
 }
 
 
-- 
2.18.1
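
The "stop the clock" approach described in the commit message above can be modelled in user space. This is a sketch under assumptions, not the kernel code: the counter value is passed in rather than read from the MFP, INT_TICKS is given an assumed value for illustration, and struct clk_state, clk_tick() and clk_read() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define INT_TICKS 246	/* assumed reload value, for illustration only */

struct clk_state {
	uint32_t clk_total;		/* cycles accounted for by the handler */
	uint32_t last_timer_count;	/* lowest counter value seen this period */
};

/* Models the timer interrupt handler. */
static void clk_tick(struct clk_state *s, uint32_t counter)
{
	s->last_timer_count = counter;
	s->clk_total += INT_TICKS;
}

/* Models the clocksource read; `counter` is the down-counter value. */
static uint32_t clk_read(struct clk_state *s, uint32_t counter)
{
	assert(counter <= INT_TICKS);

	/* A rising down-counter means it wrapped before the handler ran. */
	if (counter > s->last_timer_count)
		counter = s->last_timer_count;
	s->last_timer_count = counter;

	return s->clk_total + (INT_TICKS - counter);
}
```

Reading 150 and then 240 in the same period returns the same value twice: the clock stands still until the handler runs, instead of jumping backwards.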



[RFC PATCH v2 01/14] m68k: Call timer_interrupt() with interrupts disabled

2018-11-18 Thread Finn Thain
Some platforms execute their timer handler with the interrupt priority
level below 6. That means the handler could be interrupted by another
driver and this could lead to re-entry of the timer core.

Avoid this by use of local_irq_save/restore for timer interrupt dispatch.
This provides mutual exclusion around the timer interrupt flag access
which is needed later in this series for the clocksource conversion.

Reported-by: Thomas Gleixner 
Link: http://lkml.kernel.org/r/alpine.deb.2.21.1811131407120.2...@nanos.tec.linutronix.de
Signed-off-by: Finn Thain 
---
I've only checked 680x0 MMU platforms for this issue.
---
 arch/m68k/amiga/cia.c   | 10 ++
 arch/m68k/atari/ataints.c   |  4 ++--
 arch/m68k/atari/time.c  | 15 ++-
 arch/m68k/bvme6000/config.c | 18 +-
 arch/m68k/hp300/time.c  | 10 --
 arch/m68k/mac/via.c | 17 +
 arch/m68k/mvme147/config.c  | 18 ++
 arch/m68k/mvme16x/config.c  | 19 ++-
 arch/m68k/q40/q40ints.c |  7 ++-
 arch/m68k/sun3/sun3ints.c   |  3 +++
 arch/m68k/sun3x/time.c  | 16 ++--
 11 files changed, 99 insertions(+), 38 deletions(-)

diff --git a/arch/m68k/amiga/cia.c b/arch/m68k/amiga/cia.c
index 2081b8cd5591..b65665160711 100644
--- a/arch/m68k/amiga/cia.c
+++ b/arch/m68k/amiga/cia.c
@@ -88,10 +88,20 @@ static irqreturn_t cia_handler(int irq, void *dev_id)
struct ciabase *base = dev_id;
int mach_irq;
unsigned char ints;
+   unsigned long flags;
 
+   /* Interrupts get disabled while the timer irq flag is cleared and
+* the timer interrupt serviced.
+*/
+   local_irq_save(flags);
mach_irq = base->cia_irq;
ints = cia_set_irq(base, CIA_ICR_ALL);
amiga_custom.intreq = base->int_mask;
+   if (ints & 1) {
+   generic_handle_irq(mach_irq);
+   mach_irq++, ints >>= 1;
+   }
+   local_irq_restore(flags);
for (; ints; mach_irq++, ints >>= 1) {
if (ints & 1)
generic_handle_irq(mach_irq);
diff --git a/arch/m68k/atari/ataints.c b/arch/m68k/atari/ataints.c
index 3d2b63bedf05..56f02ea2c248 100644
--- a/arch/m68k/atari/ataints.c
+++ b/arch/m68k/atari/ataints.c
@@ -142,7 +142,7 @@ struct mfptimerbase {
.name   = "MFP Timer D"
 };
 
-static irqreturn_t mfptimer_handler(int irq, void *dev_id)
+static irqreturn_t mfp_timer_d_handler(int irq, void *dev_id)
 {
struct mfptimerbase *base = dev_id;
int mach_irq;
@@ -344,7 +344,7 @@ void __init atari_init_IRQ(void)
st_mfp.tim_ct_cd = (st_mfp.tim_ct_cd & 0xf0) | 0x6;
 
/* request timer D dispatch handler */
-   if (request_irq(IRQ_MFP_TIMD, mfptimer_handler, IRQF_SHARED,
+   if (request_irq(IRQ_MFP_TIMD, mfp_timer_d_handler, IRQF_SHARED,
stmfp_base.name, &stmfp_base))
pr_err("Couldn't register %s interrupt\n", stmfp_base.name);
 
diff --git a/arch/m68k/atari/time.c b/arch/m68k/atari/time.c
index 9cca64286464..fafa20f75ab9 100644
--- a/arch/m68k/atari/time.c
+++ b/arch/m68k/atari/time.c
@@ -24,6 +24,18 @@
 DEFINE_SPINLOCK(rtc_lock);
 EXPORT_SYMBOL_GPL(rtc_lock);
 
+static irqreturn_t mfp_timer_c_handler(int irq, void *dev_id)
+{
+   irq_handler_t timer_routine = dev_id;
+   unsigned long flags;
+
+   local_irq_save(flags);
+   timer_routine(0, NULL);
+   local_irq_restore(flags);
+
+   return IRQ_HANDLED;
+}
+
 void __init
 atari_sched_init(irq_handler_t timer_routine)
 {
@@ -32,7 +44,8 @@ atari_sched_init(irq_handler_t timer_routine)
 /* start timer C, div = 1:100 */
 st_mfp.tim_ct_cd = (st_mfp.tim_ct_cd & 15) | 0x60;
 /* install interrupt service routine for MFP Timer C */
-if (request_irq(IRQ_MFP_TIMC, timer_routine, 0, "timer", timer_routine))
+if (request_irq(IRQ_MFP_TIMC, mfp_timer_c_handler, 0, "timer",
+timer_routine))
pr_err("Couldn't register timer interrupt\n");
 }
 
diff --git a/arch/m68k/bvme6000/config.c b/arch/m68k/bvme6000/config.c
index 143ee9fa3893..d1de3cb1f8fe 100644
--- a/arch/m68k/bvme6000/config.c
+++ b/arch/m68k/bvme6000/config.c
@@ -44,11 +44,6 @@ extern int bvme6000_hwclk (int, struct rtc_time *);
 extern void bvme6000_reset (void);
 void bvme6000_set_vectors (void);
 
-/* Save tick handler routine pointer, will point to xtime_update() in
- * kernel/timer/timekeeping.c, called via bvme6000_process_int() */
-
-static irq_handler_t tick_handler;
-
 
 int __init bvme6000_parse_bootinfo(const struct bi_record *bi)
 {
@@ -157,12 +152,18 @@ irqreturn_t bvme6000_abort_int (int irq, void *dev_id)
 
 static irqreturn_t bvme6000_timer_int (int irq, void *dev_id)
 {
+irq_handler_t timer_routine = dev_id;
+unsigned long flags;
 volatile RtcPtr_t rtc = (RtcPtr_t)BVME_RTC_BASE;
-unsigned char msr = rtc->msr & 0xc0;
+unsigned char msr;
 
+local_irq_save(flags);
+msr = rtc->msr & 

[RFC PATCH v2 11/14] m68k: mvme147: Convert to clocksource API

2018-11-18 Thread Finn Thain
Add a platform clocksource by adapting the existing arch_gettimeoffset
implementation.

Signed-off-by: Finn Thain 
Acked-by: Linus Walleij 
---
Changed since v1:
 - Moved clk_total access to within the irq lock.
 - Use type u32 for tick counter.
---
 arch/m68k/include/asm/mvme147hw.h |  1 -
 arch/m68k/mvme147/config.c| 38 ++-
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/include/asm/mvme147hw.h b/arch/m68k/include/asm/mvme147hw.h
index 9c7ff67c5ffd..7c3dd513128e 100644
--- a/arch/m68k/include/asm/mvme147hw.h
+++ b/arch/m68k/include/asm/mvme147hw.h
@@ -66,7 +66,6 @@ struct pcc_regs {
 #define PCC_INT_ENAB   0x08
 
 #define PCC_TIMER_INT_CLR  0x80
-#define PCC_TIMER_PRELOAD  63936l
 
 #define PCC_LEVEL_ABORT0x07
 #define PCC_LEVEL_SERIAL   0x04
diff --git a/arch/m68k/mvme147/config.c b/arch/m68k/mvme147/config.c
index 4ef4faa5ed8b..82b53b5ca82b 100644
--- a/arch/m68k/mvme147/config.c
+++ b/arch/m68k/mvme147/config.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -92,6 +93,21 @@ void __init config_mvme147(void)
vme_brdtype = VME_TYPE_MVME147;
 }
 
+static u64 mvme147_read_clk(struct clocksource *cs);
+
+static struct clocksource mvme147_clk = {
+   .name   = "pcc",
+   .rating = 250,
+   .read   = mvme147_read_clk,
+   .mask   = CLOCKSOURCE_MASK(32),
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static u32 clk_total;
+
+#define PCC_TIMER_CLOCK_FREQ 16
+#define PCC_TIMER_CYCLES (PCC_TIMER_CLOCK_FREQ / HZ)
+#define PCC_TIMER_PRELOAD(0x1 - PCC_TIMER_CYCLES)
 
 /* Using pcc tick timer 1 */
 
@@ -103,6 +119,7 @@ static irqreturn_t mvme147_timer_int (int irq, void *dev_id)
local_irq_save(flags);
m147_pcc->t1_int_cntrl = PCC_TIMER_INT_CLR;
m147_pcc->t1_int_cntrl = PCC_INT_ENAB|PCC_LEVEL_TIMER1;
+   clk_total += PCC_TIMER_CYCLES;
timer_routine(0, NULL);
local_irq_restore(flags);
 
@@ -112,32 +129,41 @@ static irqreturn_t mvme147_timer_int (int irq, void *dev_id)
 
 void mvme147_sched_init (irq_handler_t timer_routine)
 {
-   if (request_irq(PCC_IRQ_TIMER1, mvme147_timer_int, 0, "timer 1",
-   timer_routine))
+   if (request_irq(PCC_IRQ_TIMER1, mvme147_timer_int, IRQF_TIMER,
+   "timer 1", timer_routine))
pr_err("Couldn't register timer interrupt\n");
 
/* Init the clock with a value */
-   /* our clock goes off every 6.25us */
+   /* The clock counter increments until 0x then reloads */
m147_pcc->t1_preload = PCC_TIMER_PRELOAD;
m147_pcc->t1_cntrl = 0x0;   /* clear timer */
m147_pcc->t1_cntrl = 0x3;   /* start timer */
m147_pcc->t1_int_cntrl = PCC_TIMER_INT_CLR;  /* clear pending ints */
m147_pcc->t1_int_cntrl = PCC_INT_ENAB|PCC_LEVEL_TIMER1;
+
+   clocksource_register_hz(&mvme147_clk, PCC_TIMER_CLOCK_FREQ);
 }
 
-/* This is always executed with interrupts disabled.  */
 /* XXX There are race hazards in this code XXX */
-u32 mvme147_gettimeoffset(void)
+static u64 mvme147_read_clk(struct clocksource *cs)
 {
+   unsigned long flags;
volatile unsigned short *cp = (volatile unsigned short *)0xfffe1012;
unsigned short n;
+   u32 ticks;
+
+   local_irq_save(flags);
 
n = *cp;
while (n != *cp)
n = *cp;
 
n -= PCC_TIMER_PRELOAD;
-   return ((unsigned long)n * 25 / 4) * 1000;
+   ticks = clk_total + n;
+
+   local_irq_restore(flags);
+
+   return ticks;
 }
 
 static int bcd2int (unsigned char b)
-- 
2.18.1



[RFC PATCH v2 09/14] m68k: hp300: Remove hp300_gettimeoffset()

2018-11-18 Thread Finn Thain
hp300_gettimeoffset() never checks the timer interrupt flag and will
fail to notice when the timer counter gets reloaded. That means the
clock could jump backwards.

Remove this code and leave this platform on the 'jiffies' clocksource.
Note that this amounts to a regression in clock precision. However,
adopting the 'jiffies' clocksource does resolve the monotonicity issue.

Signed-off-by: Finn Thain 
---
hp300_gettimeoffset() cannot be used in a clocksource conversion
unless it can be made monotonic. I can't fix this without knowing the
details of the timer implementation, such as the relationship between
the timer count and the interrupt flag.
---
 arch/m68k/hp300/time.c | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/arch/m68k/hp300/time.c b/arch/m68k/hp300/time.c
index d30b03ea93a2..37cccdb46def 100644
--- a/arch/m68k/hp300/time.c
+++ b/arch/m68k/hp300/time.c
@@ -31,9 +31,6 @@
 #defineCLKMSB2 0x9
 #defineCLKMSB3 0xD
 
-/* This is for machines which generate the exact clock. */
-#define USECS_PER_JIFFY (100/HZ)
-
 #define INTVAL ((1 / 4) - 1)
 
 static irqreturn_t hp300_tick(int irq, void *dev_id)
@@ -53,22 +50,6 @@ static irqreturn_t hp300_tick(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-u32 hp300_gettimeoffset(void)
-{
-  /* Read current timer 1 value */
-  unsigned char lsb, msb1, msb2;
-  unsigned short ticks;
-
-  msb1 = in_8(CLOCKBASE + 5);
-  lsb = in_8(CLOCKBASE + 7);
-  msb2 = in_8(CLOCKBASE + 5);
-  if (msb1 != msb2)
-/* A carry happened while we were reading.  Read it again */
-lsb = in_8(CLOCKBASE + 7);
-  ticks = INTVAL - ((msb2 << 8) | lsb);
-  return ((USECS_PER_JIFFY * ticks) / INTVAL) * 1000;
-}
-
 void __init hp300_sched_init(irq_handler_t vector)
 {
   out_8(CLOCKBASE + CLKCR2, 0x1);  /* select CR1 */
-- 
2.18.1



[RFC PATCH v2 10/14] m68k: mac: Convert to clocksource API

2018-11-18 Thread Finn Thain
Add a platform clocksource by adapting the existing arch_gettimeoffset
implementation.

Signed-off-by: Finn Thain 
Acked-by: Linus Walleij 
Tested-by: Stan Johnson 
---
Changed since v1:
 - Moved clk_total access to within the irq lock.
 - Use type u32 for tick counter.
---
 arch/m68k/mac/via.c | 46 -
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
index d1dbf9017300..de59a5cb4250 100644
--- a/arch/m68k/mac/via.c
+++ b/arch/m68k/mac/via.c
@@ -23,6 +23,7 @@
  *
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -573,16 +574,40 @@ EXPORT_SYMBOL(via2_scsi_drq_pending);
 /* timer and clock source */
 
 #define VIA_CLOCK_FREQ 783360    /* VIA "phase 2" clock in Hz */
-#define VIA_TIMER_INTERVAL (100 / HZ)/* microseconds per jiffy */
 #define VIA_TIMER_CYCLES   (VIA_CLOCK_FREQ / HZ) /* clock cycles per jiffy */
 
 #define VIA_TC (VIA_TIMER_CYCLES - 2) /* including 0 and -1 */
 #define VIA_TC_LOW (VIA_TC & 0xFF)
 #define VIA_TC_HIGH(VIA_TC >> 8)
 
+static u64 mac_read_clk(struct clocksource *cs);
+
+static struct clocksource mac_clk = {
+   .name   = "via1",
+   .rating = 250,
+   .read   = mac_read_clk,
+   .mask   = CLOCKSOURCE_MASK(32),
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static u32 clk_total;
+static u32 clk_offset;
+
+static irqreturn_t via_timer_handler(int irq, void *dev_id)
+{
+   irq_handler_t timer_routine = dev_id;
+
+   clk_total += VIA_TIMER_CYCLES;
+   clk_offset = 0;
+   timer_routine(0, NULL);
+
+   return IRQ_HANDLED;
+}
+
 void __init via_init_clock(irq_handler_t timer_routine)
 {
-   if (request_irq(IRQ_MAC_TIMER_1, timer_routine, 0, "timer", NULL)) {
+   if (request_irq(IRQ_MAC_TIMER_1, via_timer_handler, IRQF_TIMER, "timer",
+   timer_routine)) {
pr_err("Couldn't register %s interrupt\n", "timer");
return;
}
@@ -592,13 +617,16 @@ void __init via_init_clock(irq_handler_t timer_routine)
via1[vT1CL] = VIA_TC_LOW;
via1[vT1CH] = VIA_TC_HIGH;
via1[vACR] |= 0x40;
+
+   clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
 }
 
-u32 mac_gettimeoffset(void)
+static u64 mac_read_clk(struct clocksource *cs)
 {
unsigned long flags;
u8 count_high;
-   u16 count, offset = 0;
+   u16 count;
+   u32 ticks;
 
/*
 * Timer counter wrap-around is detected with the timer interrupt flag
@@ -617,11 +645,11 @@ u32 mac_gettimeoffset(void)
/* spin */;
}
if (via1[vIFR] & VIA_TIMER_1_INT)
-   offset = VIA_TIMER_CYCLES;
-   local_irq_restore(flags);
-
+   clk_offset = VIA_TIMER_CYCLES;
count = count_high << 8;
-   count = VIA_TIMER_CYCLES - count + offset;
+   ticks = VIA_TIMER_CYCLES - count;
+   ticks += clk_offset + clk_total;
+   local_irq_restore(flags);
 
-   return ((count * VIA_TIMER_INTERVAL) / VIA_TIMER_CYCLES) * 1000;
+   return ticks;
 }
-- 
2.18.1



[RFC PATCH v2 00/14] m68k: Drop arch_gettimeoffset and adopt clocksource API

2018-11-18 Thread Finn Thain
This series removes "select ARCH_USES_GETTIMEOFFSET" from arch/m68k
and converts users of arch_gettimeoffset to the clocksource API.
Various bugs are fixed along the way.

Those platforms which do not actually implement arch_gettimeoffset
(apollo, q40, sun3, sun3x) use the "jiffies" clocksource by default.

The atari and hp300 platforms have an arch_gettimeoffset() implementation
which can't readily be converted to a clocksource. Getting a workable
clocksource on these platforms will require the insight of a platform
expert.

The difficulty with these patches is the use of the timer interrupt to
update the counter for the clock source. The timer interrupt handler races
with the clocksource read method, and both of those functions race with the
timer hardware.

Hence, more testing and code review would be appreciated.
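
As a rough illustration of the approach, the following user-space model
mimics a down-counting timer whose reload raises an interrupt flag. All
names (demo_timer, demo_hw_cycle, demo_timer_irq, demo_read_clk,
DEMO_CYCLES_PER_TICK) are hypothetical and the model is deliberately
simplified: each step is atomic here, so it does not reproduce the races
with real hardware mentioned above. It only shows how checking the pending
flag keeps the reading monotonic while the handler is late.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CYCLES_PER_TICK 100u	/* hypothetical timer period */

struct demo_timer {
	uint32_t count;		/* down-counter: DEMO_CYCLES_PER_TICK - 1 .. 0 */
	bool irq_pending;	/* raised by the "hardware" on reload */
	uint32_t clk_total;	/* cycles accounted for by the handler */
};

/* Advance the simulated hardware by one cycle; a reload raises the flag. */
static void demo_hw_cycle(struct demo_timer *t)
{
	if (t->count == 0) {
		t->count = DEMO_CYCLES_PER_TICK - 1;
		t->irq_pending = true;
	} else {
		t->count--;
	}
}

/* Simulated interrupt handler: account for one full period, clear the flag. */
static void demo_timer_irq(struct demo_timer *t)
{
	t->clk_total += DEMO_CYCLES_PER_TICK;
	t->irq_pending = false;
}

/*
 * Simulated clocksource read. In a real driver this would run with
 * interrupts masked. If the reload has happened but the handler has not
 * run yet, one extra period is added to compensate.
 */
static uint32_t demo_read_clk(const struct demo_timer *t)
{
	uint32_t ticks = DEMO_CYCLES_PER_TICK - 1 - t->count;

	if (t->irq_pending)
		ticks += DEMO_CYCLES_PER_TICK;
	return t->clk_total + ticks;
}
```

In this model, calling demo_read_clk() after every demo_hw_cycle() gives a
value that grows by exactly one per cycle, whether or not demo_timer_irq()
has been delayed by a few cycles.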

Changed since v1:

 - Dropped patches 1/13 and 2/13. These were a failed attempt to fix
5cfc8ee0bb51 and 4ad4c76b7afb. By adopting the clocksource API we can fix
this issue in mainline. By backporting this series we can fix it for -stable
(for m68k at least).

 - Dropped patch "m68k: hp300: Convert to clocksource API" and added
patch "m68k: hp300: Remove hp300_gettimeoffset".

 - Added a new patch to address an old m68k bug pointed out by Thomas
Gleixner. The bug can arise when a timer interrupt handler gets interrupted.

 - Added new patches to address old mvme16x and mvme147 bugs pointed out
by Thomas Gleixner. The bug could cause the clock to jump backwards.

 - Various other changes summarized in the relevant patches.


Finn Thain (14):
  m68k: Call timer_interrupt() with interrupts disabled
  m68k: mac: Fix VIA timer counter accesses
  m68k: mac: Clean up unused timer definitions
  m68k: apollo, q40, sun3, sun3x: Remove arch_gettimeoffset
implementations
  m68k: Drop ARCH_USES_GETTIMEOFFSET
  m68k: amiga: Convert to clocksource API
  m68k: atari: Convert to clocksource API
  m68k: bvme6000: Convert to clocksource API
  m68k: hp300: Remove hp300_gettimeoffset()
  m68k: mac: Convert to clocksource API
  m68k: mvme147: Convert to clocksource API
  m68k: mvme147: Handle timer counter overflow
  m68k: mvme16x: Convert to clocksource API
  m68k: mvme16x: Handle timer counter overflow

 arch/m68k/Kconfig |   1 -
 arch/m68k/amiga/cia.c |  10 ++
 arch/m68k/amiga/config.c  |  46 ++---
 arch/m68k/apollo/config.c |   7 --
 arch/m68k/atari/ataints.c |   4 +-
 arch/m68k/atari/config.c  |   2 -
 arch/m68k/atari/time.c|  65 +
 arch/m68k/bvme6000/config.c   |  70 +-
 arch/m68k/hp300/config.c  |   1 -
 arch/m68k/hp300/time.c|  29 ++
 arch/m68k/hp300/time.h|   1 -
 arch/m68k/include/asm/macints.h   |   3 -
 arch/m68k/include/asm/mvme147hw.h |   2 +-
 arch/m68k/mac/config.c|   3 -
 arch/m68k/mac/via.c   | 150 --
 arch/m68k/mvme147/config.c|  73 ++-
 arch/m68k/mvme16x/config.c|  97 +--
 arch/m68k/q40/config.c|   9 --
 arch/m68k/q40/q40ints.c   |   7 +-
 arch/m68k/sun3/config.c   |   2 -
 arch/m68k/sun3/intersil.c |   7 --
 arch/m68k/sun3/sun3ints.c |   3 +
 arch/m68k/sun3x/config.c  |   1 -
 arch/m68k/sun3x/time.c|  21 ++---
 arch/m68k/sun3x/time.h|   1 -
 25 files changed, 387 insertions(+), 228 deletions(-)

-- 
2.18.1



[RFC PATCH v2 05/14] m68k: Drop ARCH_USES_GETTIMEOFFSET

2018-11-18 Thread Finn Thain
The functions that implement arch_gettimeoffset are re-used by
new clocksource drivers in subsequent patches.

Signed-off-by: Finn Thain 
Acked-by: Linus Walleij 
---
 arch/m68k/Kconfig   | 1 -
 arch/m68k/amiga/config.c| 3 ---
 arch/m68k/atari/config.c| 2 --
 arch/m68k/bvme6000/config.c | 2 --
 arch/m68k/hp300/config.c| 1 -
 arch/m68k/hp300/time.h  | 1 -
 arch/m68k/mac/config.c  | 3 ---
 arch/m68k/mvme147/config.c  | 2 --
 arch/m68k/mvme16x/config.c  | 2 --
 9 files changed, 17 deletions(-)

diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 070553791e97..cc62660a5760 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -19,7 +19,6 @@ config M68K
select GENERIC_STRNCPY_FROM_USER if MMU
select GENERIC_STRNLEN_USER if MMU
select ARCH_WANT_IPC_PARSE_VERSION
-   select ARCH_USES_GETTIMEOFFSET if MMU && !COLDFIRE
select HAVE_FUTEX_CMPXCHG if MMU && FUTEX
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_REL
diff --git a/arch/m68k/amiga/config.c b/arch/m68k/amiga/config.c
index 65f63a457130..d4976c1aa0cc 100644
--- a/arch/m68k/amiga/config.c
+++ b/arch/m68k/amiga/config.c
@@ -95,8 +95,6 @@ static char amiga_model_name[13] = "Amiga ";
 static void amiga_sched_init(irq_handler_t handler);
 static void amiga_get_model(char *model);
 static void amiga_get_hardware_list(struct seq_file *m);
-/* amiga specific timer functions */
-static u32 amiga_gettimeoffset(void);
 extern void amiga_mksound(unsigned int count, unsigned int ticks);
 static void amiga_reset(void);
 extern void amiga_init_sound(void);
@@ -386,7 +384,6 @@ void __init config_amiga(void)
mach_init_IRQ= amiga_init_IRQ;
mach_get_model   = amiga_get_model;
mach_get_hardware_list = amiga_get_hardware_list;
-   arch_gettimeoffset   = amiga_gettimeoffset;
 
/*
 * default MAX_DMA=0x on all machines. If we don't do so, the 
SCSI
diff --git a/arch/m68k/atari/config.c b/arch/m68k/atari/config.c
index bd96702a1ad0..89e65be2655f 100644
--- a/arch/m68k/atari/config.c
+++ b/arch/m68k/atari/config.c
@@ -78,7 +78,6 @@ static void atari_heartbeat(int on);
 
 /* atari specific timer functions (in time.c) */
 extern void atari_sched_init(irq_handler_t);
-extern u32 atari_gettimeoffset(void);
 extern int atari_mste_hwclk (int, struct rtc_time *);
 extern int atari_tt_hwclk (int, struct rtc_time *);
 
@@ -205,7 +204,6 @@ void __init config_atari(void)
mach_init_IRQ= atari_init_IRQ;
mach_get_model   = atari_get_model;
mach_get_hardware_list = atari_get_hardware_list;
-   arch_gettimeoffset   = atari_gettimeoffset;
mach_reset   = atari_reset;
mach_max_dma_address = 0xff;
 #if IS_ENABLED(CONFIG_INPUT_M68K_BEEP)
diff --git a/arch/m68k/bvme6000/config.c b/arch/m68k/bvme6000/config.c
index d1de3cb1f8fe..c27c104ac7e7 100644
--- a/arch/m68k/bvme6000/config.c
+++ b/arch/m68k/bvme6000/config.c
@@ -39,7 +39,6 @@
 
 static void bvme6000_get_model(char *model);
 extern void bvme6000_sched_init(irq_handler_t handler);
-extern u32 bvme6000_gettimeoffset(void);
 extern int bvme6000_hwclk (int, struct rtc_time *);
 extern void bvme6000_reset (void);
 void bvme6000_set_vectors (void);
@@ -105,7 +104,6 @@ void __init config_bvme6000(void)
 mach_max_dma_address = 0x;
 mach_sched_init  = bvme6000_sched_init;
 mach_init_IRQ= bvme6000_init_IRQ;
-arch_gettimeoffset   = bvme6000_gettimeoffset;
 mach_hwclk   = bvme6000_hwclk;
 mach_reset  = bvme6000_reset;
 mach_get_model   = bvme6000_get_model;
diff --git a/arch/m68k/hp300/config.c b/arch/m68k/hp300/config.c
index a19bcd23f80b..a161d44fd20b 100644
--- a/arch/m68k/hp300/config.c
+++ b/arch/m68k/hp300/config.c
@@ -254,7 +254,6 @@ void __init config_hp300(void)
mach_sched_init  = hp300_sched_init;
mach_init_IRQ= hp300_init_IRQ;
mach_get_model   = hp300_get_model;
-   arch_gettimeoffset   = hp300_gettimeoffset;
mach_hwclk   = hp300_hwclk;
mach_get_ss  = hp300_get_ss;
mach_reset   = hp300_reset;
diff --git a/arch/m68k/hp300/time.h b/arch/m68k/hp300/time.h
index f5583ec4033d..1d77b55cc72a 100644
--- a/arch/m68k/hp300/time.h
+++ b/arch/m68k/hp300/time.h
@@ -1,2 +1 @@
 extern void hp300_sched_init(irq_handler_t vector);
-extern u32 hp300_gettimeoffset(void);
diff --git a/arch/m68k/mac/config.c b/arch/m68k/mac/config.c
index cd9317d53276..11be08f4f750 100644
--- a/arch/m68k/mac/config.c
+++ b/arch/m68k/mac/config.c
@@ -54,8 +54,6 @@ struct mac_booter_data mac_bi_data;
 /* The phys. video addr. - might be bogus on some machines */
 static unsigned long mac_orig_videoaddr;
 
-/* Mac specific timer functions */
-extern u32 mac_gettimeoffset(void);
 extern int mac_hwclk(int, struct rtc_time *);
 extern void iop_preinit(void);
 extern void iop_init(void);
@@ -155,7 +153,6 @@ void __init 

Re: [PATCH] proc: allow killing processes via file descriptors

2018-11-18 Thread Daniel Colascione
On Sun, Nov 18, 2018 at 4:53 PM, Daniel Colascione  wrote:
>> Sure, I'd propose that ptrace_may_access() is what we should use for
>> operation permission checks.
>
> The tricky part is that ptrace_may_access takes a struct task. We want
> logic that's *like* ptrace_may_access, but that works posthumously.
> It's especially tricky because there's an LSM hook that lets
> __ptrace_may_access do arbitrary things. And we can't just run that
> hook upon process death, since *after* a process dies, a process
> holding an exithand FD (or whatever we call it) may pass that FD to
> another process, and *that* process can read(2) from it.
>
> Another option is doing the exithand access check at open time. I
> think that's probably fine, and it would make things a lot simpler.
> But if we use this option, we should understand what we're doing, and
> get some security-conscious people to think through the implications.

A ptrace check is also probably too strict. Yama's ptrace_scope
feature will block ptrace between unrelated processes within a single
user context, but applying this restriction to exit code monitoring
seems too severe to me.


Re: [PATCH] proc: allow killing processes via file descriptors

2018-11-18 Thread Daniel Colascione
On Sun, Nov 18, 2018 at 4:08 PM, Aleksa Sarai  wrote:
> On 2018-11-18, Daniel Colascione  wrote:
>> > The gist is to have file descriptors for processes which is obviously
>> > not a new idea. This has been done before in other OSes and it has been
>> > tried before in Linux [2], [3] (Thanks to Kees for pointing out these
>> > patches.). So I want to make it very clear that I'm not laying claim to
>> > this being my or even a novel idea in any way. However, I want to
>> > diverge from previous approaches with my suggestion. (Though I can't be
>> > sure that there's not something similar in other OSes already.)
>>
>> Windows works basically as you describe. You can create a process in a
>> suspended state, configure it however you want, then let it run.
>> CreateProcess (and even moreso, NtCreateProcess) also provide a rich
>> (and *extensible*) interface for pre-creation process configuration.
>>
>> > One of the main motivations for having procfds is to have a race-free
>> > way of configuring, starting, polling, and killing a process. Basically,
>> > a process lifecycle api if you want to think about it that way. The api
>> > should also be easily extendable in the future to avoid running into the
>> > limitations we currently see with the clone*() syscall(s) again.
>> >
>> > One of the crucial points of the api is to *separate the configuration
>> > of a process through a procfd from actually creating the process*.
>> > This is a crucial property expressed in the open*() system calls.
>> > First, get a stable handle on an object then allow for ways to
>> > configure it. As such the procfd api shares the same insight with Al's
>> > and David's new mount api.
>> > (Fwiw, Andy also pointed out similarities with posix_spawn().)
>> > What I envisioned was to have the following syscalls (multiple name
>> > suggestions):
>> > 1. int process_open / proc_open / procopen
>> > 2. int process_config / proc_config / procconfig or ioctl()-based
>> > 3. int process_info / proc_info / procinfo or ioctl()-based
>> > 4. int process_manage / proc_manage / procmanage or ioctl()-based
>>
>> The API you've proposed seems fine to me, although I'd either 1)
>> consolidate operations further into one system call, or 2) separate
>> the different management operations into more and different system
>> calls that can be audited independently. The grouping you've proposed
>> seems to have the worst aspects of API splitting and API multiplexing.
>> But I have no objection to it in spirit.
>
> I think combining it all into one API is going to be a soft re-invention
> of ioctls, but specifically for procfds. This would be an improvement
> over just ioctls (since the current ioctl namespacing is based on
> well-behaved drivers and hoping we never have more than 256 ioctl
> drivers), but I'm not sure it would help make the API nicer than having
> separate syscalls (we'd have to do something similar to bpf(2) which I'm
> not a huge fan of).

Right. Multiplexers are nothing new, and I'm not a huge fan of them.
From an API design perspective, having a bunch of different system
calls is probably best.

 (I do wonder what happens to system call cache behavior once the
top-level system call table becomes huge though.)

>> That said, while I do want to fix process configuration and startup
>> generally, I want to fix specific holes in the existing API surface
>> first. The two patches I've already sent do that, and this work
>> shouldn't wait on an ideal larger process-API overhaul that may or may
>> not arrive. Based on previous history, I suspect that an API of the
>> scope you're proposing would take years to overcome all LKML
>> objections and land. I don't want to wait that long when we can make
>> smaller fixes that would not conflict with the general architecture.
>
> I believe this is precisely what Christian is trying to do with this
> patch (and you say as much later in your mail).
>
> I think that adding all of {sighand,sighand_exitcode,kill,...} would not
> help the path of landing a much larger API change. We should instead
> think about the API we want at the end of the day, and then land smaller
> changes which are long-term compatible (and won't just become deprecated
> APIs -- there's far too many of them, let's not add more needlessly).

I don't think we need to reach consensus on some long-term design to
address specific problems that we know today. The changes we're
talking about here *are* long-term compatible with a bigger process
API overhaul. They may or may not be *part* of that solution, but I
don't see them making that solution harder either. And the proposals
so far all seem to go in the right direction.

>> Next, I want to merge my exithand proposal, or something like it. It's
>> likewise a simple change that, in a minimal way, addresses a
>> longstanding API deficiency. I'm very strongly against the
>> POLLERR-on-directory variant of the idea.
>
> I agree 

[PATCH v2] mm: fix swap offset when replacing shmem page

2018-11-18 Thread Yu Zhao
We used to have a single swap address space with swp_entry_t.val
as its radix tree index. This is not the case anymore. Now Each
swp_type() has its own address space and should use swp_offset()
as radix tree index.
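
For readers less familiar with swap entries, here is a small stand-alone
sketch of the distinction. It is an illustration under assumptions, not the
kernel's encoding: the demo_* names and the 5-bit type field are
hypothetical (the real swp_entry_t helpers live in <linux/swapops.h>). The
point is only that the raw value carries both the type and the offset, so it
is not a valid index into a per-type address space.

```c
#include <stdint.h>

/* Hypothetical layout: top 5 bits hold the swap type, the rest the offset. */
#define DEMO_TYPE_BITS		5
#define DEMO_TYPE_SHIFT		(64 - DEMO_TYPE_BITS)
#define DEMO_OFFSET_MASK	((UINT64_C(1) << DEMO_TYPE_SHIFT) - 1)

typedef struct { uint64_t val; } demo_swp_entry_t;

/* Pack a type and an offset into one value. */
static demo_swp_entry_t demo_swp_entry(unsigned int type, uint64_t offset)
{
	demo_swp_entry_t e;

	e.val = ((uint64_t)type << DEMO_TYPE_SHIFT) | (offset & DEMO_OFFSET_MASK);
	return e;
}

/* Extract the swap type (selects the address space in this sketch). */
static unsigned int demo_swp_type(demo_swp_entry_t e)
{
	return (unsigned int)(e.val >> DEMO_TYPE_SHIFT);
}

/* Extract the offset (the index within that address space). */
static uint64_t demo_swp_offset(demo_swp_entry_t e)
{
	return e.val & DEMO_OFFSET_MASK;
}
```

With any non-zero type, e.val differs from demo_swp_offset(e), which is why
indexing a per-type tree by the raw value would look up the wrong slot.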

Signed-off-by: Yu Zhao 
---
 mm/shmem.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index d44991ea5ed4..685faa3e0191 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1509,11 +1509,13 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 {
struct page *oldpage, *newpage;
struct address_space *swap_mapping;
-   pgoff_t swap_index;
+   swp_entry_t entry;
int error;
 
+   VM_BUG_ON(!PageSwapCache(*pagep));
+
oldpage = *pagep;
-   swap_index = page_private(oldpage);
+   entry.val = page_private(oldpage);
swap_mapping = page_mapping(oldpage);
 
/*
@@ -1532,7 +1534,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
__SetPageLocked(newpage);
__SetPageSwapBacked(newpage);
SetPageUptodate(newpage);
-   set_page_private(newpage, swap_index);
+   set_page_private(newpage, entry.val);
SetPageSwapCache(newpage);
 
/*
@@ -1540,7 +1542,8 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 * a nice clean interface for us to replace oldpage by newpage there.
 */
xa_lock_irq(&swap_mapping->i_pages);
-   error = shmem_replace_entry(swap_mapping, swap_index, oldpage, newpage);
+   error = shmem_replace_entry(swap_mapping, swp_offset(entry),
+   oldpage, newpage);
if (!error) {
__inc_node_page_state(newpage, NR_FILE_PAGES);
__dec_node_page_state(oldpage, NR_FILE_PAGES);
-- 
2.19.1.1215.g8438c0b245-goog



Re: [PATCH v2 2/2] PCI: pciehp: Add HXT quirk for Command Completed errata

2018-11-18 Thread Yang, Shunyong
Hi, Bjorn,
  Would you please help to review and pull these two quirk patches to
your branch if there is no problem?

Thanks.
Shunyong.

On 2018/11/7 15:24, Yang, Shunyong wrote:
> The HXT SD4800 PCI controller does not set the Command Completed
> bit unless writes to the Slot Command register change "Control"
> bits.
> 
> This patch adds SD4800 to the quirk.
> 
> Cc: Joey Zheng 
> Signed-off-by: Shunyong Yang 
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 7dd443aea5a5..91db67963aea 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -920,3 +920,5 @@ static void quirk_cmd_compl(struct pci_dev *pdev)
> PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0401,
> PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_HXT, 0x0401,
> +   PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
> 



Re: [PATCH] mm/filemap.c: minor optimization in write_iter file operation

2018-11-18 Thread Matthew Wilcox
On Sun, Nov 18, 2018 at 11:02:19PM +0800, Yafang Shao wrote:
> On Sun, Nov 18, 2018 at 8:13 PM Matthew Wilcox  wrote:
> > Did you check the before/after code generation with this patch applied?
> 
> Yes, I did.
> My compiler is gcc-4.8.5, a little old, and with
> CONFIG_CC_OPTIMIZE_FOR_SIZE on.
> > with gcc 8.2.0, I see no difference, indicating that the compiler already
> > makes this optimisation.
> 
> Could you please try setting CONFIG_CC_OPTIMIZE_FOR_SIZE on and then
> compare them again?

Actually it was already on:

# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y

I happened to build it in my build-tiny output tree.

