Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume

2017-10-19 Thread Ulf Hansson
On 20 October 2017 at 03:19, Rafael J. Wysocki  wrote:
> On Thursday, October 19, 2017 2:21:07 PM CEST Ulf Hansson wrote:
>> On 19 October 2017 at 00:12, Rafael J. Wysocki  wrote:
>> > On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
>> >> [...]
>> >>
>> >> >>
>> >> >> The reason why pm_runtime_force_* needs to respect the hierarchy of
>> >> >> the RPM callbacks is that otherwise it can't safely update the
>> >> >> runtime PM status of the device.
>> >> >
>> >> > I'm not sure I follow this requirement.  Why is that so?
>> >>
>> >> If the PM domain controls some resources for the device in its RPM
>> >> callbacks and the driver controls some other resources in its RPM
>> >> callbacks - then these resources need to be managed together.
>> >
>> > Right, but that doesn't automatically make it necessary to use runtime PM
>> > callbacks in the middle layer.  Its system-wide PM callbacks may be
>> > suitable for that just fine.
>> >
>> > That is, at least in some cases, you can combine ->runtime_suspend from a
>> > driver and ->suspend_late from a middle layer with no problems, for 
>> > example.
>> >
>> > That's why some middle layers allow drivers to point ->suspend_late and
>> > ->runtime_suspend to the same routine if they want to reuse that code.
>> >
>> >> This follows the behavior of a regular call to
>> >> pm_runtime_get|put(), which triggers the RPM callbacks to be invoked.
>> >
>> > But (a) it doesn't have to follow it and (b) in some cases it should not
>> > follow it.
>>
>> Of course you don't explicitly *have to* respect the hierarchy of the
>> RPM callbacks in pm_runtime_force_*.
>>
>> However, changing that would require some additional information
>> exchange between the driver and the middle-layer/PM domain, as to
>> instruct the middle-layer/PM domain of what to do during system-wide
>> PM. Especially in cases when the driver deals with wakeup, as in those
>> cases the instructions may change dynamically.
>
> Well, if wakeup matters, drivers can't simply point their PM callbacks
> to pm_runtime_force_* anyway.
>
> If the driver itself deals with wakeups, it clearly needs different callback
> routines for system-wide PM and for runtime PM, so it can't reuse its runtime
> PM callbacks at all then.

It can still re-use its runtime PM callbacks, simply by calling
pm_runtime_force_* from its system sleep callbacks.

Drivers already do that today, not only to deal with wakeups, but
generally when they need to deal with some additional operations.
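The reuse pattern discussed here can be sketched as a standalone toy model (all names below are invented for illustration; the real kernel helpers are pm_runtime_force_suspend()/pm_runtime_force_resume(), and the real status update goes through pm_runtime_set_suspended()/pm_runtime_set_active()):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device: the "runtime PM status" that the wrappers must keep in sync
 * with the hardware state actually touched by the runtime PM callbacks. */
struct toy_dev {
	bool rpm_suspended;	/* runtime PM status */
	int  hw_powered;	/* resource controlled by the callbacks */
};

/* Driver's runtime PM callbacks: the single place HW state is changed. */
static int toy_runtime_suspend(struct toy_dev *d) { d->hw_powered = 0; return 0; }
static int toy_runtime_resume(struct toy_dev *d)  { d->hw_powered = 1; return 0; }

/*
 * Sketch of a pm_runtime_force_suspend()-style wrapper: invoke the runtime
 * callback only if the device is still active, then update the runtime PM
 * status so it matches reality across system suspend/resume.
 */
static int toy_force_suspend(struct toy_dev *d)
{
	if (!d->rpm_suspended) {
		int ret = toy_runtime_suspend(d);
		if (ret)
			return ret;
		d->rpm_suspended = true;
	}
	return 0;
}

static int toy_force_resume(struct toy_dev *d)
{
	if (d->rpm_suspended) {
		int ret = toy_runtime_resume(d);
		if (ret)
			return ret;
		d->rpm_suspended = false;
	}
	return 0;
}
```

In the kernel, a driver would point its system sleep callbacks at wrappers calling these helpers; the toy only shows why the callback invocation and the status update have to happen together.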

>
> If the middle layer deals with wakeups, different callbacks are needed at
> that level and so pm_runtime_force_* are unsuitable too.
>
> Really, invoking runtime PM callbacks from the middle layer in
> pm_runtime_force_* is not a good idea at all and there's no general
> requirement for it whatsoever.
>
>> [...]
>>
>> >> > In general, not if the wakeup settings are adjusted by the middle layer.
>> >>
>> >> Correct!
>> >>
>> >> To use pm_runtime_force* for these cases, one would need some
>> >> additional information exchange between the driver and the
>> >> middle-layer.
>> >
>> > Which pretty much defeats the purpose of the wrappers, doesn't it?
>>
>> Well, no matter if the wrappers are used or not, we need some kind of
>> information exchange between the driver and the middle-layers/PM
>> domains.
>
> Right.
>
> But if that information is exchanged, then why use wrappers *in* *addition*
> to that?

If we can find a different method that both avoids open coding
and offers the optimized system-wide PM path at resume, I am open to
that.

>
>> Anyway, I personally think it's too early to conclude that the
>> wrappers may not be useful going forward. At this point, they clearly
>> help trivial cases to remain trivial.
>
> I'm not sure about that really.  So far I've seen more complexity resulting
> from using them than being avoided by using them, but I guess the beauty is
> in the eye of the beholder. :-)

Hehe, yeah you may be right. :-)

>
>> >
>> >> >
>> >> >> Regarding hibernation, honestly that's not really my area of
>> >> >> expertise. Although, I assume the middle-layer and driver can treat
>> >> >> that as a separate case, so if it's not suitable to use
>> >> >> pm_runtime_force* for that case, then they shouldn't do it.
>> >> >
>> >> > Well, agreed.
>> >> >
>> >> > In some simple cases, though, driver callbacks can be reused for hibernation
>> >> > too, so it would be good to have a common way to do that too, IMO.
>> >>
>> >> Okay, that makes sense!
>> >>
>> >> >
>> >> >> >
>> >> >> > Also, quite so often other middle layers interact with PCI directly or
>> >> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
>> >> >> > device) and some optimizations need to take that into account (eg. parents
>> >> >> > generally need to be accessible when their children are resumed and so on).
>> >> >>
>> 


x86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-10-19 Thread Dave Young
Now crashkernel=X will fail if there's not enough memory in the low region
(below 896M) when trying to reserve a large memory size. One can use
crashkernel=xM,high to reserve it in the high region (>4G), but it is more
convenient to improve crashkernel=X to:

 - First, try to reserve X below 896M (to stay compatible with old
   kexec-tools).
 - If that fails, try to reserve X below 4G (swiotlb needs to stay below 4G).
 - If that fails, try to reserve X from MAXMEM top down.

It's more transparent and user-friendly.
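The three-step fallback can be modeled with a standalone C sketch (find_in_range() here is a stand-in for memblock_find_in_range(), operating on one made-up free region; the limits mirror the constants used in setup.c):

```c
#include <assert.h>
#include <stdint.h>

#define CRASH_ALIGN		(16ULL << 20)	/* 16M alignment, as in setup.c */
#define CRASH_ADDR_LOW_MAX	(896ULL << 20)	/* below 896M */
#define ADDR_4G			(1ULL << 32)
#define CRASH_ADDR_HIGH_MAX	(64ULL << 40)	/* stand-in for MAXMEM */

/* Fake allocator state: a single free region [free_base, free_base+free_size). */
static uint64_t free_base, free_size;

/* Succeeds only if the request fits inside the fake region below 'limit'. */
static uint64_t find_in_range(uint64_t start, uint64_t limit, uint64_t size)
{
	if (free_base >= start && size <= free_size && free_base + size <= limit)
		return free_base;
	return 0;
}

/* Tiered reservation: 896M first, then 4G, then MAXMEM top-down. */
static uint64_t reserve_crashkernel(uint64_t crash_size)
{
	uint64_t base = find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, crash_size);

	if (!base)	/* below 896M failed? try below 4G */
		base = find_in_range(CRASH_ALIGN, ADDR_4G, crash_size);
	if (!base)	/* below 4G failed? try up to MAXMEM */
		base = find_in_range(CRASH_ALIGN, CRASH_ADDR_HIGH_MAX, crash_size);
	return base;	/* 0 means the reservation failed */
}
```

The model only illustrates the fallback ordering; the real memblock_find_in_range() scans all memblock free regions, not one.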

If crashkernel is large and the reserved region is beyond 896M, old
kexec-tools is not compatible with the new kernel because old kexec-tools
cannot load the kernel at a high memory region; there was an old discussion
about this (previously posted by Chao Wang):
https://lkml.org/lkml/2013/10/15/601

But actually the behavior is consistent in my testing. Suppose the old
kernel fails to reserve memory in the low areas: kdump does not work
because no memory is reserved. With this patch, suppose the new kernel
successfully reserves memory in the high areas: old kexec-tools still fails
to load the kdump kernel (tested with 2.0.2), so this is acceptable and
there is no need to worry about compatibility.

Here is the test result (kexec-tools 2.0.2, no high memory load
support):
Crashkernel over 4G:
# cat /proc/iomem|grep Crash
  be00-cdff : Crash kernel
  21300-21eff : Crash kernel
# ./kexec  -p /boot/vmlinuz-`uname -r`
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Then try loading kdump kernel

crashkernel: 896M-4G:
# cat /proc/iomem|grep Crash
  9600-cdef : Crash kernel
# ./kexec -p /boot/vmlinuz-4.14.0-rc4+
ELF core (kcore) parse failed
Cannot load /boot/vmlinuz-4.14.0-rc4+

Signed-off-by: Dave Young 
---
 arch/x86/kernel/setup.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -568,6 +568,22 @@ static void __init reserve_crashkernel(void)
					       high ? CRASH_ADDR_HIGH_MAX
						    : CRASH_ADDR_LOW_MAX,
					       crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+	/*
+	 * crashkernel=X reserve below 896M fails? Try below 4G.
+	 */
+	if (!high && !crash_base)
+		crash_base = memblock_find_in_range(CRASH_ALIGN,
+						    (1ULL << 32),
+						    crash_size, CRASH_ALIGN);
+	/*
+	 * crashkernel=X reserve below 4G fails? Try MAXMEM.
+	 */
+	if (!high && !crash_base)
+		crash_base = memblock_find_in_range(CRASH_ALIGN,
+						    CRASH_ADDR_HIGH_MAX,
+						    crash_size, CRASH_ALIGN);
+#endif
	if (!crash_base) {
		pr_info("crashkernel reservation failed - No suitable area found.\n");
		return;


[PATCH] iio: adc: sun4i-gpadc: use of_device_get_match_data

2017-10-19 Thread Corentin Labbe
The usage of of_device_get_match_data() reduces the code size a bit.
Furthermore, it prevents an improbable dereference when
of_match_device() returns NULL.
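The difference between the two helpers can be illustrated with a standalone toy model (the struct, function names, and compatible string below are illustrative stand-ins, not the real driver-core API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy match-table entry, shaped like the OF device-ID tables. */
struct of_device_id { const char *compatible; const void *data; };

static const int a33_data = 33;
static const struct of_device_id toy_table[] = {
	{ "allwinner,sun8i-a33-ths", &a33_data },
	{ NULL, NULL },
};

/* of_match_device() analogue: may return NULL when nothing matches. */
static const struct of_device_id *toy_match(const struct of_device_id *tbl,
					    const char *compat)
{
	for (; tbl->compatible; tbl++)
		if (!strcmp(tbl->compatible, compat))
			return tbl;
	return NULL;
}

/* of_device_get_match_data() analogue: folds the NULL check into one call,
 * so callers only have to test the returned data pointer. */
static const void *toy_get_match_data(const struct of_device_id *tbl,
				      const char *compat)
{
	const struct of_device_id *m = toy_match(tbl, compat);

	return m ? m->data : NULL;
}
```

With the one-call form, a missing match and a missing data pointer collapse into a single `!info->data` check, which is exactly the shape the patch below gives the probe path.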

Signed-off-by: Corentin Labbe 
---
 drivers/iio/adc/sun4i-gpadc-iio.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iio/adc/sun4i-gpadc-iio.c b/drivers/iio/adc/sun4i-gpadc-iio.c
index c4e70f1cad79..04d7147e0110 100644
--- a/drivers/iio/adc/sun4i-gpadc-iio.c
+++ b/drivers/iio/adc/sun4i-gpadc-iio.c
@@ -501,17 +501,15 @@ static int sun4i_gpadc_probe_dt(struct platform_device *pdev,
				struct iio_dev *indio_dev)
 {
	struct sun4i_gpadc_iio *info = iio_priv(indio_dev);
-	const struct of_device_id *of_dev;
	struct resource *mem;
	void __iomem *base;
	int ret;

-	of_dev = of_match_device(sun4i_gpadc_of_id, &pdev->dev);
-	if (!of_dev)
+	info->data = of_device_get_match_data(&pdev->dev);
+	if (!info->data)
		return -ENODEV;

	info->no_irq = true;
-	info->data = (struct gpadc_data *)of_dev->data;
	indio_dev->num_channels = ARRAY_SIZE(sun8i_a33_gpadc_channels);
	indio_dev->channels = sun8i_a33_gpadc_channels;
 
-- 
2.13.6




Re: [Xen-devel] [PATCH 1/1] xen/time: do not decrease steal time after live migration on xen

2017-10-19 Thread Dongli Zhang
Hi Boris,

- boris.ostrov...@oracle.com wrote:

> On 10/19/2017 04:02 AM, Dongli Zhang wrote:
> > After guest live migration on xen, steal time in /proc/stat
> > (cpustat[CPUTIME_STEAL]) might decrease because the steal value returned
> > by xen_steal_clock() might be less than this_rq()->prev_steal_time, which
> > is derived from the previous return value of xen_steal_clock().
> >
> > For instance, steal time of each vcpu is 335 before live migration.
> >
> > cpu  198 0 368 200064 1962 0 0 1340 0 0
> > cpu0 38 0 81 50063 492 0 0 335 0 0
> > cpu1 65 0 97 49763 634 0 0 335 0 0
> > cpu2 38 0 81 50098 462 0 0 335 0 0
> > cpu3 56 0 107 50138 374 0 0 335 0 0
> >
> > After live migration, steal time is reduced to 312.
> >
> > cpu  200 0 370 200330 1971 0 0 1248 0 0
> > cpu0 38 0 82 50123 500 0 0 312 0 0
> > cpu1 65 0 97 49832 634 0 0 312 0 0
> > cpu2 39 0 82 50167 462 0 0 312 0 0
> > cpu3 56 0 107 50207 374 0 0 312 0 0
> >
> > The code in this patch is borrowed from do_stolen_accounting(), which has
> > already been removed from the linux source code since commit ecb23dc6f2ef
> > ("xen: add steal_clock support on x86"). The core idea of both
> > do_stolen_accounting() and this patch is to avoid accounting a new steal
> > clock value if it is smaller than the previous old steal clock value.
> >
> > A similar and more severe issue would impact linux 4.8-4.10, as
> > discussed by Michael Las at
> > https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest,
> > which would overflow steal time and lead to 100% st usage in the top
> > command for linux 4.8-4.10. A backport of this patch would fix that issue.
> >
> > References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest
> > Signed-off-by: Dongli Zhang 
> > ---
> >  drivers/xen/time.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/xen/time.c b/drivers/xen/time.c
> > index ac5f23f..2b3a996 100644
> > --- a/drivers/xen/time.c
> > +++ b/drivers/xen/time.c
> > @@ -19,6 +19,8 @@
> >  /* runstate info updated by Xen */
> >  static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
> >
> > +static DEFINE_PER_CPU(u64, xen_old_steal);
> > +
> >  /* return an consistent snapshot of 64-bit time/counter value */
> >  static u64 get64(const u64 *p)
> >  {
> > @@ -83,9 +85,20 @@ bool xen_vcpu_stolen(int vcpu)
> >  u64 xen_steal_clock(int cpu)
> >  {
> > struct vcpu_runstate_info state;
> > +   u64 xen_new_steal;
> > +   s64 steal_delta;
> >
> > 	xen_get_runstate_snapshot_cpu(&state, cpu);
> > -	return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline];
> > +   xen_new_steal = state.time[RUNSTATE_runnable]
> > +   + state.time[RUNSTATE_offline];
> > +   steal_delta = xen_new_steal - per_cpu(xen_old_steal, cpu);
> > +
> > +   if (steal_delta < 0)
> > +   xen_new_steal = per_cpu(xen_old_steal, cpu);
> > +   else
> > +   per_cpu(xen_old_steal, cpu) = xen_new_steal;
> > +
> > +   return xen_new_steal;
> >  }
> >
> >  void xen_setup_runstate_info(int cpu)
> 
> Can we stash state.time[] during suspend and then add stashed values
> inside xen_get_runstate_snapshot_cpu()?


Would you like to stash state.time[] during do_suspend() (or xen_suspend()),
or is the code below what you expect:

-

--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -19,6 +19,8 @@
 /* runstate info updated by Xen */
 static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
 
+static DEFINE_PER_CPU(u64[4], old_runstate_time);
+
 /* return an consistent snapshot of 64-bit time/counter value */
 static u64 get64(const u64 *p)
 {
@@ -52,6 +54,8 @@ static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
 {
	u64 state_time;
	struct vcpu_runstate_info *state;
+	int i;
+	s64 time_delta;

	BUG_ON(preemptible());

@@ -64,6 +68,17 @@ static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
		rmb();	/* Hypervisor might update data. */
	} while (get64(&state->state_entry_time) != state_time ||
		 (state_time & XEN_RUNSTATE_UPDATE));
+
+	for (i = 0; i < 4; i++) {
+		if (i == RUNSTATE_runnable || i == RUNSTATE_offline) {
+			time_delta = res->time[i] - per_cpu(old_runstate_time, cpu)[i];
+
+			if (unlikely(time_delta < 0))
+				res->time[i] = per_cpu(old_runstate_time, cpu)[i];
+			else
+				per_cpu(old_runstate_time, cpu)[i] = res->time[i];
+		}
+	}
 }

-
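For reference, the monotonic clamping that both variants implement can be modeled as a standalone sketch (toy per-vCPU array and function names, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS 4

/* Toy stand-in for the per-CPU "old" steal value. */
static uint64_t old_steal[NR_VCPUS];

/*
 * Clamp as in the patch: never report a steal value lower than the last
 * one seen for this vCPU, so the /proc/stat counter cannot go backwards
 * after live migration resets the hypervisor's runstate accounting.
 */
static uint64_t steal_clock(int cpu, uint64_t new_steal)
{
	int64_t delta = (int64_t)(new_steal - old_steal[cpu]);

	if (delta < 0)
		return old_steal[cpu];	/* keep the old, larger value */
	old_steal[cpu] = new_steal;
	return new_steal;
}
```

The same clamp applied per runstate entry, rather than to the summed steal value, is what the snippet above does inside xen_get_runstate_snapshot_cpu().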

Thank you very much!

Dongli Zhang

> 
> This will make xen_steal_clock() simpler.
> 
> -boris
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> 


Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()

2017-10-19 Thread Eric Dumazet
On Thu, Oct 19, 2017 at 8:13 PM, Wei Wei  wrote:
> Sry. Here it is.
>
> Unable to handle kernel paging request at virtual address 80005bfb81ed
> Mem abort info:
> Exception class = DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x0033
> CM = 0, WnR = 0
> swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000
> [80005bfb81ed] *pgd=beff7003, *pud=00e88711
> Internal error: Oops: 9621 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3
> Hardware name: linux,dummy-virt (DT)
> task: 800074409e00 task.stack: 800033db
> PC is at __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 
> (discriminator 4) /net/core/skbuff.c:873 (discriminator 4))
> LR is at __skb_clone (/net/core/skbuff.c:861 (discriminator 4))
> pc : lr : pstate: 1145
>
> sp : 800033db33d0
> x29: 800033db33d0 x28: 298ac378
> x27: 16a860e1 x26: 167b66b6
> x25: 8000743340a0 x24: 800035430708
> x23: 80005bfb80c9 x22: 800035430710
> x21: 0380 x20: 800035430640
> x19: 8000354312c0 x18: 
> x17: 004af000 x16: 2845e8c8
> x15: 1e518060 x14: d8316070
> x13: d8316090 x12: 
> x11: 16a8626f x10: 16a8626f
> x9 : dfff2000 x8 : 0082009000900608
> x7 :  x6 : 800035431380
> x5 : 16a86270 x4 : 
> x3 : 16a86273 x2 : 
> x1 : 0100 x0 : 80005bfb81ed
> Process syz-executor0 (pid: 4725, stack limit = 0x800033db)
> Call trace:
> Exception stack(0x800033db3290 to 0x800033db33d0)
> 3280:   80005bfb81ed 0100
> 32a0:  16a86273  16a86270
> 32c0: 800035431380  0082009000900608 dfff2000
> 32e0: 16a8626f 16a8626f  d8316090
> 3300: d8316070 1e518060 2845e8c8 004af000
> 3320:  8000354312c0 800035430640 0380
> 3340: 800035430710 80005bfb80c9 800035430708 8000743340a0
> 3360: 167b66b6 16a860e1 298ac378 800033db33d0
> 3380: 29705cfc 800033db33d0 29705f50 1145
> 33a0: 8000354312c0 800035430640 0001 800074334000
> 33c0: 800033db33d0 29705f50
> __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 (discriminator 4) 
> /net/core/skbuff.c:873 (discriminator 4))
> skb_clone (/net/core/skbuff.c:1286)
> arp_rcv (/./include/linux/skbuff.h:1518 /net/ipv4/arp.c:946)
> __netif_receive_skb_core (/net/core/dev.c:1859 /net/core/dev.c:1874 
> /net/core/dev.c:4416)
> __netif_receive_skb (/net/core/dev.c:4466)
> netif_receive_skb_internal (/net/core/dev.c:4539)
> netif_receive_skb (/net/core/dev.c:4564)
> tun_get_user (/./include/linux/bottom_half.h:31 /drivers/net/tun.c:1219 
> /drivers/net/tun.c:1553)
> tun_chr_write_iter (/drivers/net/tun.c:1579)
> do_iter_readv_writev (/./include/linux/fs.h:1770 /fs/read_write.c:673)
> do_iter_write (/fs/read_write.c:952)
> vfs_writev (/fs/read_write.c:997)
> do_writev (/fs/read_write.c:1032)
> SyS_writev (/fs/read_write.c:1102)
> Exception stack(0x800033db3ec0 to 0x800033db4000)
> 3ec0: 0015 829985e0 0001 8299851c
> 3ee0: 82999068 82998f60 82999650 
> 3f00: 0042 0036 00406608 82998400
> 3f20: 82998f60 d8316090 d8316070 1e518060
> 3f40:  004af000  0036
> 3f60: 20004fca 2000 0046ccf0 0530
> 3f80: 0046cce8 004ade98  395fa6f0
> 3fa0: 82998f60 82998560 00431448 82998520
> 3fc0: 0043145c 8000 0015 0042
> 3fe0:    
> el0_svc_naked (/arch/arm64/kernel/entry.S:853)
> Code: f9406680 8b01 91009000 f9800011 (885f7c01)
> All code
> 
>0:   80 66 40 f9 andb   $0xf9,0x40(%rsi)
>4:   00 00   add%al,(%rax)
>6:   01 8b 00 90 00 91   add%ecx,-0x6eff7000(%rbx)
>c:   11 00   adc%eax,(%rax)
>e:   80 f9 01cmp$0x1,%cl
>   11:   7c 5f   jl 0x72
>   13:*  88 00   mov%al,(%rax)   <-- trapping 
> instruction
>   15:   00 00   add%al,(%rax)
> ...
>
> Code starting with the faulting instruction
> ===
>0:   01 7c 5f 88 add

> 3fa0: 82998f60 82998560 00431448 82998520
> 3fc0: 0043145c 8000 0015 0042
> 3fe0:    
> el0_svc_naked (/arch/arm64/kernel/entry.S:853)
> Code: f9406680 8b01 91009000 f9800011 (885f7c01)
> All code
> 
>    0:   80 66 40 f9          andb   $0xf9,0x40(%rsi)
>    4:   00 00                add    %al,(%rax)
>    6:   01 8b 00 90 00 91    add    %ecx,-0x6eff7000(%rbx)
>    c:   11 00                adc    %eax,(%rax)
>    e:   80 f9 01             cmp    $0x1,%cl
>   11:   7c 5f                jl     0x72
>   13:*  88 00                mov    %al,(%rax)   <-- trapping instruction
>   15:   00 00                add    %al,(%rax)
> ...
>
> Code starting with the faulting instruction
> ===
>    0:   01 7c 5f 88          add    %edi,-0x78(%rdi,%rbx,2)
>    4:   00

Re: [PATCH 1/4] kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Dou Liyang

[...]


+#ifdef CONFIG_MEMORY_HOTPLUG
+static void mem_mark_immovable(char *str)
+{
+   int i = 0;
+


You already have num_immovable_region; 'i' is useless, just remove it.


Using num_immovable_region makes the code too long. Using 'i' keeps it
clear and makes sure lines stay shorter than 80 characters.


Oh, God, that's horrific. Did you realize that your code is wrong?

num_immovable_region will be reset each time.


Did you test?



You can try with more than one movable_node, e.g.:

"...movable_node=128G@128G movable_node=128G@256G..."

Then you will find the problem in your code.

Thanks,
dou




[fstests PATCH] generic: add test for DAX MAP_SYNC support

2017-10-19 Thread Ross Zwisler
Add a test that exercises DAX's new MAP_SYNC flag.

This test creates a file and writes to it via an mmap(), but never syncs
via fsync/msync.  This process is tracked via dm-log-writes, then replayed.

If MAP_SYNC is working the dm-log-writes replay will show the test file
with the same size that we wrote via the mmap() because each allocating
page fault included an implicit metadata sync.  If MAP_SYNC isn't working
(which you can test by fiddling with the parameters to mmap()) the file
will be smaller or missing entirely.

Note that dm-log-writes doesn't track the data that we write via the
mmap(), so we can't do any data integrity checking.  We can only verify
that the metadata writes for the page faults happened.

Signed-off-by: Ross Zwisler 
---

For this test to run successfully you'll need both Jan's MAP_SYNC series:

https://www.spinics.net/lists/linux-xfs/msg11852.html

and my series adding DAX support to dm-log-writes:

https://lists.01.org/pipermail/linux-nvdimm/2017-October/012972.html

---
 .gitignore|  1 +
 common/dmlogwrites|  1 -
 src/Makefile  |  3 +-
 src/t_map_sync.c  | 74 +
 tests/generic/466 | 77 +++
 tests/generic/466.out |  3 ++
 tests/generic/group   |  1 +
 7 files changed, 158 insertions(+), 2 deletions(-)
 create mode 100644 src/t_map_sync.c
 create mode 100755 tests/generic/466
 create mode 100644 tests/generic/466.out

diff --git a/.gitignore b/.gitignore
index 2014c08..9fc0695 100644
--- a/.gitignore
+++ b/.gitignore
@@ -119,6 +119,7 @@
 /src/t_getcwd
 /src/t_holes
 /src/t_immutable
+/src/t_map_sync
 /src/t_mmap_cow_race
 /src/t_mmap_dio
 /src/t_mmap_fallocate
diff --git a/common/dmlogwrites b/common/dmlogwrites
index 247c744..5b57df9 100644
--- a/common/dmlogwrites
+++ b/common/dmlogwrites
@@ -23,7 +23,6 @@ _require_log_writes()
[ -z "$LOGWRITES_DEV" -o ! -b "$LOGWRITES_DEV" ] && \
_notrun "This test requires a valid \$LOGWRITES_DEV"
 
-   _exclude_scratch_mount_option dax
_require_dm_target log-writes
_require_test_program "log-writes/replay-log"
 }
diff --git a/src/Makefile b/src/Makefile
index 3eb25b1..af7e7e9 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -13,7 +13,8 @@ TARGETS = dirstress fill fill2 getpagesize holes lstat64 \
multi_open_unlink dmiperf unwritten_sync genhashnames t_holes \
t_mmap_writev t_truncate_cmtime dirhash_collide t_rename_overwrite \
holetest t_truncate_self t_mmap_dio af_unix t_mmap_stale_pmd \
-   t_mmap_cow_race t_mmap_fallocate fsync-err t_mmap_write_ro
+   t_mmap_cow_race t_mmap_fallocate fsync-err t_mmap_write_ro \
+   t_map_sync
 
 LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \
diff --git a/src/t_map_sync.c b/src/t_map_sync.c
new file mode 100644
index 000..8190f3c
--- /dev/null
+++ b/src/t_map_sync.c
@@ -0,0 +1,74 @@
+#include <errno.h>
+#include <fcntl.h>
+#include <libgen.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#define MiB(a) ((a)*1024*1024)
+
+/*
+ * These two defines were added to the kernel via commits entitled
+ * "mm: Define MAP_SYNC and VM_SYNC flags" and
+ * "mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap
+ * flags", respectively.
+ */
+#define MAP_SYNC 0x8
+#define MAP_SHARED_VALIDATE 0x3
+
+void err_exit(char *op)
+{
+   fprintf(stderr, "%s: %s\n", op, strerror(errno));
+   exit(1);
+}
+
+int main(int argc, char *argv[])
+{
+   int page_size = getpagesize();
+   int len = MiB(1);
+   int i, fd, err;
+   char *data;
+
+   if (argc < 2) {
+   printf("Usage: %s <file>\n", basename(argv[0]));
+   exit(0);
+   }
+
+   fd = open(argv[1], O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
+   if (fd < 0)
+   err_exit("fd");
+
+   ftruncate(fd, 0);
+   ftruncate(fd, len);
+
+   data = mmap(NULL, len, PROT_READ|PROT_WRITE,
+   MAP_SHARED_VALIDATE|MAP_SYNC, fd, 0);
+   if (data == MAP_FAILED)
+   err_exit("mmap");
+
+   /*
+* We intentionally don't sync 'fd' manually.  If MAP_SYNC is working
+* these allocating page faults will cause the filesystem to sync its
+* metadata so that when we replay the dm-log-writes log the test file
+* will be 1 MiB in size.
+*
+* dm-log-writes doesn't track the data that we write via the mmap(),
+* so we can't check that, we can only verify that the metadata writes
+* happened.
+*/
+   for (i = 0; i < len; i += page_size)
+   data[i] = 0xff;
+
+   err = munmap(data, len);
+   if (err < 0)
+   err_exit("munmap");
+
+   err = close(fd);
+   if (err < 0)
+   err_exit("close");
+
+   return 0;
+}
diff --git 

Re: [alsa-devel] [PATCH 06/14] soundwire: Add IO transfer

2017-10-19 Thread Vinod Koul
On Thu, Oct 19, 2017 at 11:13:48AM +0200, Takashi Iwai wrote:
> On Thu, 19 Oct 2017 05:03:22 +0200,
> Vinod Koul wrote:
> > 
> > +static inline int find_error_code(unsigned int sdw_ret)
> > +{
> > +   switch (sdw_ret) {
> > +   case SDW_CMD_OK:
> > +   return 0;
> > +
> > +   case SDW_CMD_IGNORED:
> > +   return -ENODATA;
> > +
> > +   case SDW_CMD_TIMEOUT:
> > +   return -ETIMEDOUT;
> > +   }
> > +
> > +   return -EIO;
> > +}
> > +
> > +static inline int do_transfer(struct sdw_bus *bus,
> > +   struct sdw_msg *msg, bool page)
> > +{
> > +   int retry = bus->prop.err_threshold;
> > +   int ret, i;
> > +
> > +   for (ret = 0, i = 0; i <= retry; i++) {
> 
> Initializing ret here is a bit messy.  Better to do it outside.

sounds good

> > +   ret = bus->ops->xfer_msg(bus, msg, page);
> > +   ret = find_error_code(ret);
> > +   /* if cmd is ok or ignored return */
> > +   if (ret == 0 || ret == -ENODATA)
> > +   return ret;
> 
> Hmm, it's not good to use the same variable for representing two
> different things.  Either drop the substitution to ret for
> bus->ops->xfer_msg() call, or use another variable to make clear which
> one is for SDW_CMD_* and which one is for -EXXX.  The former should be
> basically an enum.

yes will do, sometimes we should not reuse :)

> > +/**
> > + * sdw_transfer: Synchronous transfer message to a SDW Slave device
> > + *
> > + * @bus: SDW bus
> > + * @slave: SDW Slave
> > + * @msg: SDW message to be xfered
> > + */
> > +int sdw_transfer(struct sdw_bus *bus, struct sdw_slave *slave,
> > +   struct sdw_msg *msg)
> > +{
> > +   bool page;
> > +   int ret;
> > +
> > +   mutex_lock(&bus->msg_lock);
> > +
> > +   page = sdw_get_page(slave, msg);
> > +
> > +   ret = do_transfer(bus, msg, page);
> > +   if (ret != 0 && ret != -ENODATA) {
> > +   dev_err(bus->dev, "trf on Slave %d failed:%d\n",
> > +   msg->dev_num, ret);
> > +   goto error;
> > +   }
> > +
> > +   if (page)
> > +   ret = sdw_reset_page(bus, msg->dev_num);
> > +
> > +error:
> > +   mutex_unlock(&bus->msg_lock);
> > +
> > +   return ret;
> 
> So the logic here is that when -ENODATA is returned and page is false,
> this function should return -ENODATA to the caller,  but when page
> is set, it returns 0?

Sorry, no. do_transfer() can succeed (0), return -ENODATA in cases where the
Slave ignored the command, or return other errors.
Now, -ENODATA may or may not be an error depending on the message sent, so we
don't treat it as a failure here and let the caller decide.

In case of other errors we don't need to reset the page, so we bail out.

> 
> > +static inline int sdw_fill_msg(struct sdw_msg *msg, u16 addr,
> > +   size_t count, u16 dev_num, u8 flags, u8 *buf)
> > +{
> > +   msg->addr = (addr >> SDW_REG_SHIFT(SDW_REGADDR));
> > +   msg->len = count;
> > +   msg->dev_num = dev_num;
> > +   msg->addr_page1 = (addr >> SDW_REG_SHIFT(SDW_SCP_ADDRPAGE1_MASK));
> > +   msg->addr_page2 = (addr >> SDW_REG_SHIFT(SDW_SCP_ADDRPAGE2_MASK));
> > +   msg->flags = flags;
> > +   msg->buf = buf;
> > +   msg->ssp_sync = false;
> > +
> > +   return 0;
> 
> This function can be void.

yup

-- 
~Vinod



[PATCH 1/2] dm log writes: Add support for inline data buffers

2017-10-19 Thread Ross Zwisler
Currently dm-log-writes supports writing filesystem data via BIOs, and
writing internal metadata from a flat buffer via write_metadata().

For DAX writes, though, we won't have a BIO, but will instead have an
iterator that we'll want to use to fill a flat data buffer.

So, create write_inline_data() which allows us to write filesystem data
using a flat buffer as a source, and wire it up in log_one_block().

Signed-off-by: Ross Zwisler 
---
 drivers/md/dm-log-writes.c | 90 +++---
 1 file changed, 86 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 8b80a9c..c65f9d1 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -246,27 +246,109 @@ static int write_metadata(struct log_writes_c *lc, void *entry,
return -1;
 }
 
+static int write_inline_data(struct log_writes_c *lc, void *entry,
+ size_t entrylen, void *data, size_t datalen,
+ sector_t sector)
+{
+   int num_pages, bio_pages, pg_datalen, pg_sectorlen, i;
+   struct page *page;
+   struct bio *bio;
+   size_t ret;
+   void *ptr;
+
+   while (datalen) {
+   num_pages = ALIGN(datalen, PAGE_SIZE) >> PAGE_SHIFT;
+   bio_pages = min(num_pages, BIO_MAX_PAGES);
+
+   atomic_inc(&lc->io_blocks);
+
+   bio = bio_alloc(GFP_KERNEL, bio_pages);
+   if (!bio) {
+   DMERR("Couldn't alloc inline data bio");
+   goto error;
+   }
+
+   bio->bi_iter.bi_size = 0;
+   bio->bi_iter.bi_sector = sector;
+   bio_set_dev(bio, lc->logdev->bdev);
+   bio->bi_end_io = log_end_io;
+   bio->bi_private = lc;
+   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+
+   for (i = 0; i < bio_pages; i++) {
+   pg_datalen = min(datalen, PAGE_SIZE);
+   pg_sectorlen = ALIGN(pg_datalen, lc->sectorsize);
+
+   page = alloc_page(GFP_KERNEL);
+   if (!page) {
+   DMERR("Couldn't alloc inline data page");
+   goto error_bio;
+   }
+
+   ptr = kmap_atomic(page);
+   memcpy(ptr, data, pg_datalen);
+   if (pg_sectorlen > pg_datalen)
+   memset(ptr + pg_datalen, 0,
+   pg_sectorlen - pg_datalen);
+   kunmap_atomic(ptr);
+
+   ret = bio_add_page(bio, page, pg_sectorlen, 0);
+   if (ret != pg_sectorlen) {
+   DMERR("Couldn't add page of inline data");
+   __free_page(page);
+   goto error_bio;
+   }
+
+   datalen -= pg_datalen;
+   data += pg_datalen;
+   }
+   submit_bio(bio);
+
+   sector += bio_pages * PAGE_SECTORS;
+   }
+   return 0;
+error_bio:
+   bio_free_pages(bio);
+   bio_put(bio);
+error:
+   put_io_block(lc);
+   return -1;
+}
+
 static int log_one_block(struct log_writes_c *lc,
 struct pending_block *block, sector_t sector)
 {
struct bio *bio;
struct log_write_entry entry;
-   size_t ret;
+   size_t metadlen, ret;
int i;
 
entry.sector = cpu_to_le64(block->sector);
entry.nr_sectors = cpu_to_le64(block->nr_sectors);
entry.flags = cpu_to_le64(block->flags);
entry.data_len = cpu_to_le64(block->datalen);
-   if (write_metadata(lc, &entry, sizeof(entry), block->data,
-  block->datalen, sector)) {
+
+   metadlen = (block->flags & LOG_MARK_FLAG) ? block->datalen : 0;
+   if (write_metadata(lc, &entry, sizeof(entry), block->data, metadlen,
+   sector)) {
free_pending_block(lc, block);
return -1;
}
 
+   sector += dev_to_bio_sectors(lc, 1);
+
+   if (block->datalen && metadlen == 0) {
+   if (write_inline_data(lc, &entry, sizeof(entry), block->data,
+   block->datalen, sector)) {
+   free_pending_block(lc, block);
+   return -1;
+   }
+   /* we don't support both inline data & bio data */
+   goto out;
+   }
+
if (!block->vec_cnt)
goto out;
-   sector += dev_to_bio_sectors(lc, 1);
 
atomic_inc(&lc->io_blocks);
bio = bio_alloc(GFP_KERNEL, min(block->vec_cnt, BIO_MAX_PAGES));
-- 
2.9.5




[PATCH 2/2] dm log writes: add support for DAX

2017-10-19 Thread Ross Zwisler
Now that we have the ability log filesystem writes using a flat buffer, add
support for DAX.  Unfortunately we can't easily track data that has been
written via mmap() now that the dax_flush() abstraction was removed by this
commit:

commit c3ca015fab6d ("dax: remove the pmem_dax_ops->flush abstraction")

Otherwise we could just treat each flush as a big write, and store the data
that is being synced to media.  It may be worthwhile to add the dax_flush()
entry point back, just as a notifier so we can do this logging.

The motivation for this support is the need for an xfstest that can test
the new MAP_SYNC DAX flag.  By logging the filesystem activity with
dm-log-writes we can show that the MAP_SYNC page faults are writing out
their metadata as they happen, instead of requiring an explicit
msync/fsync.

Signed-off-by: Ross Zwisler 
---

Here's a link to Jan's latest MAP_SYNC set, which can be used for the
fstest:

https://www.spinics.net/lists/linux-xfs/msg11852.html

MAP_SYNC is not needed for basic DAX+dm-log-writes functionality.

---
 drivers/md/dm-log-writes.c | 90 +-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index c65f9d1..6a8d352 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -10,9 +10,11 @@
 #include <linux/init.h>
 #include <linux/blkdev.h>
 #include <linux/bio.h>
+#include <linux/dax.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/uio.h>
 
 #define DM_MSG_PREFIX "log-writes"
 
@@ -609,6 +611,50 @@ static int log_mark(struct log_writes_c *lc, char *data)
return 0;
 }
 
+static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
+   struct iov_iter *i)
+{
+   struct pending_block *block;
+
+   if (!bytes)
+   return 0;
+
+   block = kzalloc(sizeof(struct pending_block), GFP_KERNEL);
+   if (!block) {
+   DMERR("Error allocating dax pending block");
+   return -ENOMEM;
+   }
+
+   block->data = kzalloc(bytes, GFP_KERNEL);
+   if (!block->data) {
+   DMERR("Error allocating dax data space");
+   kfree(block);
+   return -ENOMEM;
+   }
+
+   /* write data provided via the iterator */
+   if (!copy_from_iter(block->data, bytes, i)) {
+   DMERR("Error copying dax data");
+   kfree(block->data);
+   kfree(block);
+   return -EIO;
+   }
+
+   /* rewind the iterator so that the block driver can use it */
+   iov_iter_revert(i, bytes);
+
+   block->datalen = bytes;
+   block->sector = bio_to_dev_sectors(lc, sector);
+   block->nr_sectors = ALIGN(bytes, lc->sectorsize) >> lc->sectorshift;
+
+   atomic_inc(&lc->pending_blocks);
+   spin_lock_irq(&lc->blocks_lock);
+   list_add_tail(&block->list, &lc->unflushed_blocks);
+   spin_unlock_irq(&lc->blocks_lock);
+   wake_up_process(lc->log_kthread);
+   return 0;
+}
+
 static void log_writes_dtr(struct dm_target *ti)
 {
struct log_writes_c *lc = ti->private;
@@ -874,9 +920,49 @@ static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limits
limits->io_min = limits->physical_block_size;
 }
 
+static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+   long nr_pages, void **kaddr, pfn_t *pfn)
+{
+   struct log_writes_c *lc = ti->private;
+   struct block_device *bdev = lc->dev->bdev;
+   struct dax_device *dax_dev = lc->dev->dax_dev;
+   sector_t sector = pgoff * PAGE_SECTORS;
+   int ret;
+
+   ret = bdev_dax_pgoff(bdev, sector, nr_pages * PAGE_SIZE, &pgoff);
+   if (ret)
+   return ret;
+   return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
+}
+
+static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
+   pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
+{
+   struct log_writes_c *lc = ti->private;
+   struct block_device *bdev = lc->dev->bdev;
+   struct dax_device *dax_dev = lc->dev->dax_dev;
+   sector_t sector = pgoff * PAGE_SECTORS;
+   int err;
+
+   if (bdev_dax_pgoff(bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
+   return 0;
+
+   /* Don't bother doing anything if logging has been disabled */
+   if (!lc->logging_enabled)
+   goto dax_copy;
+
+   err = log_dax(lc, sector, bytes, i);
+   if (err) {
+   DMWARN("Error %d logging DAX write", err);
+   return 0;
+   }
+dax_copy:
+   return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+}
+
 static struct target_type log_writes_target = {
.name   = "log-writes",
-   .version = {1, 0, 0},
+   .version = {1, 0, 1},
.module = THIS_MODULE,
.ctr= log_writes_ctr,
.dtr= log_writes_dtr,
@@ -887,6 +973,8 @@ static struct target_type log_writes_target = {
.message = log_writes_message,
  

[PATCH 2/2] dm log writes: add support for DAX

2017-10-19 Thread Ross Zwisler
Now that we have the ability log filesystem writes using a flat buffer, add
support for DAX.  Unfortunately we can't easily track data that has been
written via mmap() now that the dax_flush() abstraction was removed by this
commit:

commit c3ca015fab6d ("dax: remove the pmem_dax_ops->flush abstraction")

Otherwise we could just treat each flush as a big write, and store the data
that is being synced to media.  It may be worthwhile to add the dax_flush()
entry point back, just as a notifier so we can do this logging.

The motivation for this support is the need for an xfstest that can test
the new MAP_SYNC DAX flag.  By logging the filesystem activity with
dm-log-writes we can show that the MAP_SYNC page faults are writing out
their metadata as they happen, instead of requiring an explicit
msync/fsync.

Signed-off-by: Ross Zwisler 
---

Here's a link to Jan's latest MAP_SYNC set, which can be used for the
fstest:

https://www.spinics.net/lists/linux-xfs/msg11852.html

MAP_SYNC is not needed for basic DAX+dm-log-writes functionality.
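The key move in log_dax() in the diff below is a peek-then-rewind: the payload is copied out of the iov_iter into the log block, then iov_iter_revert() rewinds the iterator so the real DAX copy still consumes the full data. A standalone C analogue of that pattern, using a hypothetical cursor type in place of struct iov_iter (all names here are illustrative stand-ins, not kernel APIs):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct iov_iter: a read cursor over a buffer. */
struct iter {
	const char *buf;
	size_t pos, len;
};

/* Analogue of copy_from_iter(): copy up to `bytes`, advancing the cursor. */
static size_t copy_from_iter_sim(char *dst, size_t bytes, struct iter *i)
{
	size_t n = i->len - i->pos;

	if (bytes < n)
		n = bytes;
	memcpy(dst, i->buf + i->pos, n);
	i->pos += n;
	return n;
}

/* Analogue of iov_iter_revert(): rewind the cursor by `bytes`. */
static void iter_revert_sim(struct iter *i, size_t bytes)
{
	i->pos -= bytes;
}

/* Log a copy of the payload, rewind, then perform the "real" copy. */
static size_t logged_copy(char *log, char *dst, size_t bytes, struct iter *i)
{
	size_t n = copy_from_iter_sim(log, bytes, i);

	iter_revert_sim(i, n);	/* rewind so the real consumer sees all data */
	return copy_from_iter_sim(dst, bytes, i);
}
```

Both the log and the destination end up with the same bytes, which is exactly why the revert is needed between the two copies.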

---
 drivers/md/dm-log-writes.c | 90 +-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index c65f9d1..6a8d352 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -10,9 +10,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #define DM_MSG_PREFIX "log-writes"
 
@@ -609,6 +611,50 @@ static int log_mark(struct log_writes_c *lc, char *data)
return 0;
 }
 
+static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
+   struct iov_iter *i)
+{
+   struct pending_block *block;
+
+   if (!bytes)
+   return 0;
+
+   block = kzalloc(sizeof(struct pending_block), GFP_KERNEL);
+   if (!block) {
+   DMERR("Error allocating dax pending block");
+   return -ENOMEM;
+   }
+
+   block->data = kzalloc(bytes, GFP_KERNEL);
+   if (!block->data) {
+   DMERR("Error allocating dax data space");
+   kfree(block);
+   return -ENOMEM;
+   }
+
+   /* write data provided via the iterator */
+   if (!copy_from_iter(block->data, bytes, i)) {
+   DMERR("Error copying dax data");
+   kfree(block->data);
+   kfree(block);
+   return -EIO;
+   }
+
+   /* rewind the iterator so that the block driver can use it */
+   iov_iter_revert(i, bytes);
+
+   block->datalen = bytes;
+   block->sector = bio_to_dev_sectors(lc, sector);
+   block->nr_sectors = ALIGN(bytes, lc->sectorsize) >> lc->sectorshift;
+
+   atomic_inc(&lc->pending_blocks);
+   spin_lock_irq(&lc->blocks_lock);
+   list_add_tail(&block->list, &lc->unflushed_blocks);
+   spin_unlock_irq(&lc->blocks_lock);
+   wake_up_process(lc->log_kthread);
+   return 0;
+}
+
 static void log_writes_dtr(struct dm_target *ti)
 {
struct log_writes_c *lc = ti->private;
@@ -874,9 +920,49 @@ static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limit
limits->io_min = limits->physical_block_size;
 }
 
+static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+   long nr_pages, void **kaddr, pfn_t *pfn)
+{
+   struct log_writes_c *lc = ti->private;
+   struct block_device *bdev = lc->dev->bdev;
+   struct dax_device *dax_dev = lc->dev->dax_dev;
+   sector_t sector = pgoff * PAGE_SECTORS;
+   int ret;
+
+   ret = bdev_dax_pgoff(bdev, sector, nr_pages * PAGE_SIZE, &pgoff);
+   if (ret)
+   return ret;
+   return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
+}
+
+static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
+   pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
+{
+   struct log_writes_c *lc = ti->private;
+   struct block_device *bdev = lc->dev->bdev;
+   struct dax_device *dax_dev = lc->dev->dax_dev;
+   sector_t sector = pgoff * PAGE_SECTORS;
+   int err;
+
+   if (bdev_dax_pgoff(bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
+   return 0;
+
+   /* Don't bother doing anything if logging has been disabled */
+   if (!lc->logging_enabled)
+   goto dax_copy;
+
+   err = log_dax(lc, sector, bytes, i);
+   if (err) {
+   DMWARN("Error %d logging DAX write", err);
+   return 0;
+   }
+dax_copy:
+   return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+}
+
 static struct target_type log_writes_target = {
.name   = "log-writes",
-   .version = {1, 0, 0},
+   .version = {1, 0, 1},
.module = THIS_MODULE,
.ctr= log_writes_ctr,
.dtr= log_writes_dtr,
@@ -887,6 +973,8 @@ static struct target_type log_writes_target = {
.message = log_writes_message,
.iterate_devices = log_writes_iterate_devices,
+   .direct_access = log_writes_dax_direct_access,
+   .dax_copy_from_iter = log_writes_dax_copy_from_iter,

Re: [alsa-devel] [PATCH 04/14] soundwire: Add MIPI DisCo property helpers

2017-10-19 Thread Vinod Koul
On Thu, Oct 19, 2017 at 11:02:02AM +0200, Takashi Iwai wrote:
> On Thu, 19 Oct 2017 05:03:20 +0200,
> Vinod Koul wrote:
> >  
> > +   slave->ops = drv->ops;
> > +
> > ret = drv->probe(slave, id);
> > if (ret) {
> > dev_err(dev, "Probe of %s failed: %d\n", drv->name, ret);
> > return ret;
> > }
> >  
> > +   /* device is probed so let's read the properties now */
> > +   if (slave->ops && slave->ops->read_prop)
> > +   slave->ops->read_prop(slave);
> > +
> > +   /*
> > +* Check for valid clk_stop_timeout, use DisCo worst case value of
> > +* 300ms
> > +*/
> > +   if (slave->prop.clk_stop_timeout == 0)
> > +   slave->prop.clk_stop_timeout = 300;
> > +
> > +   slave->bus->clk_stop_timeout = max_t(u32, slave->bus->clk_stop_timeout,
> > +   slave->prop.clk_stop_timeout);
> 
> Isn't it racy?
> Also what happens after removing a driver?  The clk_stop_timeout is
> kept high?

Well, the spec mandates 300 ms as the worst case; in practice this _should_ be
less. I need to double check the behaviour of clock stop on driver removal.
We need to keep in mind that multiple Slaves can be on a bus and the removal
may be of just one of them, so we may not be able to do clock stop.
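The policy under discussion — default an unspecified per-Slave timeout to the DisCo worst case of 300 ms, and keep the bus-wide value at the maximum over all Slaves — can be sketched standalone (the function and constant names here are illustrative, not the kernel's):

```c
#include <assert.h>

#define DISCO_WORST_CASE_MS 300	/* DisCo worst-case clock-stop timeout */

/*
 * Compute the bus-wide clock-stop timeout: each Slave contributes its
 * own timeout (0 meaning "unspecified", which falls back to the DisCo
 * worst case), and the bus keeps the maximum across all Slaves.
 */
static unsigned int bus_clk_stop_timeout(const unsigned int *slave_ms, int n)
{
	unsigned int bus_ms = 0;

	for (int i = 0; i < n; i++) {
		unsigned int t = slave_ms[i] ? slave_ms[i] : DISCO_WORST_CASE_MS;

		if (t > bus_ms)
			bus_ms = t;
	}
	return bus_ms;
}
```

This also shows the issue Takashi raises: once one Slave has pushed the maximum up, nothing lowers it again when that Slave goes away — the value would need to be recomputed over the remaining Slaves on removal.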

> > +
> > +int sdw_slave_read_dpn(struct sdw_slave *slave,
> > +   struct sdw_dpn_prop *dpn, int count, int ports, char *type)
> 
> Missing comment for a public API function.

Ah, missed this one. It was made public for debug; let's make it static now :)

-- 
~Vinod


[PATCH 1/2] ARM: dts: uniphier: add STDMAC clock to EHCI nodes

2017-10-19 Thread Masahiro Yamada
Without the STDMAC clock enabled, the USB 2.0 hosts do not work.
This clock must be explicitly listed in the "clocks" property because
it is independent of the other clocks.

Signed-off-by: Masahiro Yamada 
---

 arch/arm/boot/dts/uniphier-ld4.dtsi  | 9 ++---
 arch/arm/boot/dts/uniphier-pro4.dtsi | 6 --
 arch/arm/boot/dts/uniphier-sld8.dtsi | 9 ++---
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/arm/boot/dts/uniphier-ld4.dtsi 
b/arch/arm/boot/dts/uniphier-ld4.dtsi
index 18f7e73..ecab5f3 100644
--- a/arch/arm/boot/dts/uniphier-ld4.dtsi
+++ b/arch/arm/boot/dts/uniphier-ld4.dtsi
@@ -258,7 +258,8 @@
interrupts = <0 80 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb0>;
-   clocks = <_clk 7>, <_clk 8>, <_clk 12>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 8>,
+<_clk 12>;
resets = <_rst 8>, <_rst 7>, <_rst 8>,
 <_rst 12>;
};
@@ -270,7 +271,8 @@
interrupts = <0 81 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb1>;
-   clocks = <_clk 7>, <_clk 9>, <_clk 13>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 9>,
+<_clk 13>;
resets = <_rst 8>, <_rst 7>, <_rst 9>,
 <_rst 13>;
};
@@ -282,7 +284,8 @@
interrupts = <0 82 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb2>;
-   clocks = <_clk 7>, <_clk 10>, <_clk 14>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 10>,
+<_clk 14>;
resets = <_rst 8>, <_rst 7>, <_rst 10>,
 <_rst 14>;
};
diff --git a/arch/arm/boot/dts/uniphier-pro4.dtsi 
b/arch/arm/boot/dts/uniphier-pro4.dtsi
index 6bbca27..688e356 100644
--- a/arch/arm/boot/dts/uniphier-pro4.dtsi
+++ b/arch/arm/boot/dts/uniphier-pro4.dtsi
@@ -308,7 +308,8 @@
interrupts = <0 80 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb2>;
-   clocks = <_clk 7>, <_clk 8>, <_clk 12>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 8>,
+<_clk 12>;
resets = <_rst 8>, <_rst 7>, <_rst 8>,
 <_rst 12>;
};
@@ -320,7 +321,8 @@
interrupts = <0 81 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb3>;
-   clocks = <_clk 7>, <_clk 9>, <_clk 13>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 9>,
+<_clk 13>;
resets = <_rst 8>, <_rst 7>, <_rst 9>,
 <_rst 13>;
};
diff --git a/arch/arm/boot/dts/uniphier-sld8.dtsi 
b/arch/arm/boot/dts/uniphier-sld8.dtsi
index 39cc0e0..782ca99 100644
--- a/arch/arm/boot/dts/uniphier-sld8.dtsi
+++ b/arch/arm/boot/dts/uniphier-sld8.dtsi
@@ -262,7 +262,8 @@
interrupts = <0 80 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb0>;
-   clocks = <_clk 7>, <_clk 8>, <_clk 12>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 8>,
+<_clk 12>;
resets = <_rst 8>, <_rst 7>, <_rst 8>,
 <_rst 12>;
};
@@ -274,7 +275,8 @@
interrupts = <0 81 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb1>;
-   clocks = <_clk 7>, <_clk 9>, <_clk 13>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 9>,
+<_clk 13>;
resets = <_rst 8>, <_rst 7>, <_rst 9>,
 <_rst 13>;
};
@@ -286,7 +288,8 @@
interrupts = <0 82 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb2>;
-   clocks = <_clk 7>, <_clk 10>, <_clk 14>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 10>,
+<_clk 14>;
resets = <_rst 8>, <_rst 7>, <_rst 10>,
 <_rst 14>;
};
-- 
2.7.4



[PATCH 2/2] arm64: dts: uniphier: add STDMAC clock to EHCI nodes

2017-10-19 Thread Masahiro Yamada
Without the STDMAC clock enabled, the USB 2.0 hosts do not work.
This clock must be explicitly listed in the "clocks" property because
it is independent of the other clocks.

Signed-off-by: Masahiro Yamada 
---

 arch/arm64/boot/dts/socionext/uniphier-ld11.dtsi | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/socionext/uniphier-ld11.dtsi 
b/arch/arm64/boot/dts/socionext/uniphier-ld11.dtsi
index 99f14cc..2c03d96 100644
--- a/arch/arm64/boot/dts/socionext/uniphier-ld11.dtsi
+++ b/arch/arm64/boot/dts/socionext/uniphier-ld11.dtsi
@@ -324,7 +324,8 @@
interrupts = <0 243 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb0>;
-   clocks = <_clk 7>, <_clk 8>, <_clk 12>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 8>,
+<_clk 12>;
resets = <_rst 8>, <_rst 7>, <_rst 8>,
 <_rst 12>;
};
@@ -336,7 +337,8 @@
interrupts = <0 244 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb1>;
-   clocks = <_clk 7>, <_clk 9>, <_clk 13>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 9>,
+<_clk 13>;
resets = <_rst 8>, <_rst 7>, <_rst 9>,
 <_rst 13>;
};
@@ -348,7 +350,8 @@
interrupts = <0 245 4>;
pinctrl-names = "default";
pinctrl-0 = <_usb2>;
-   clocks = <_clk 7>, <_clk 10>, <_clk 14>;
+   clocks = <_clk 8>, <_clk 7>, <_clk 10>,
+<_clk 14>;
resets = <_rst 8>, <_rst 7>, <_rst 10>,
 <_rst 14>;
};
-- 
2.7.4



Re: [alsa-devel] [PATCH 03/14] soundwire: Add Master registration

2017-10-19 Thread Vinod Koul
On Thu, Oct 19, 2017 at 10:54:50AM +0200, Takashi Iwai wrote:
> On Thu, 19 Oct 2017 05:03:19 +0200,
> Vinod Koul wrote:

> > +int sdw_add_bus_master(struct sdw_bus *bus)
> > +{
> > +   int ret;
> > +
> > +   if (!bus->dev) {
> > +   pr_err("SoundWire bus has no device");
> > +   return -ENODEV;
> > +   }
> > +
> > +   mutex_init(&bus->bus_lock);
> > +   INIT_LIST_HEAD(&bus->slaves);
> > +
> > +   /*
> > +* SDW is an enumerable bus, but devices can be powered off. So,
> > +* they won't be able to report as present.
> > +*
> > +* Create Slave devices based on Slaves described in
> > +* the respective firmware (ACPI/DT)
> > +*/
> > +
> > +   if (IS_ENABLED(CONFIG_ACPI) && bus->dev && ACPI_HANDLE(bus->dev))
> > +   ret = sdw_acpi_find_slaves(bus);
> > +   else if (IS_ENABLED(CONFIG_OF) && bus->dev && bus->dev->of_node)
> 
> The bus->dev NULL check is already done at the beginning of the
> function, so the checks here are superfluous.

right

> > +static int sdw_delete_slave(struct device *dev, void *data)
> > +{
> > +   struct sdw_slave *slave = dev_to_sdw_dev(dev);
> > +   struct sdw_bus *bus = slave->bus;
> > +
> > +   mutex_lock(&bus->bus_lock);
> > +   if (!list_empty(&bus->slaves))
> > +   list_del(&slave->node);
> 
> You can perform list_del_init() without empty check.

Better :)

> 
> > +void sdw_extract_slave_id(struct sdw_bus *bus,
> > +   unsigned long long addr, struct sdw_slave_id *id)
> 
> Use u64 instead.

okay

> > +{
> > +   dev_dbg(bus->dev, "SDW Slave Addr: %llx", addr);
> > +
> > +   /*
> > +* Spec definition
> > +*   Register   Bit Contents
> > +*   DevId_0 [7:4]  47:44   sdw_version
> > +*   DevId_0 [3:0]  43:40   unique_id
> > +*   DevId_139:32   mfg_id [15:8]
> > +*   DevId_231:24   mfg_id [7:0]
> > +*   DevId_323:16   part_id [15:8]
> > +*   DevId_415:08   part_id [7:0]
> > +*   DevId_507:00   class_id
> > +*/
> > +   id->sdw_version = (addr >> 44) & GENMASK(3, 0);
> > +   id->unique_id = (addr >> 40) & GENMASK(3, 0);
> > +   id->mfg_id = (addr >> 24) & GENMASK(15, 0);
> > +   id->part_id = (addr >> 8) & GENMASK(15, 0);
> > +   id->class_id = addr & GENMASK(7, 0);
> > +
> > +   dev_info(bus->dev,
> > +   "SDW Slave class_id %x, part_id %x, mfg_id %x, unique_id %x, 
> > version %x",
> > +   id->class_id, id->part_id, id->mfg_id,
> > +   id->unique_id, id->sdw_version);
> > +
> 
> Do we want to print a message always at each invocation?

Not really, lets make it debug
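The unpacking in the quoted sdw_extract_slave_id() is plain shift-and-mask over the 48-bit DevId value; a self-contained sketch of the same field layout (the struct and function names here are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

struct id_fields {
	unsigned sdw_version, unique_id, mfg_id, part_id, class_id;
};

/* Extract the DevId fields per the spec table quoted above. */
static struct id_fields extract_id(uint64_t addr)
{
	struct id_fields id;

	id.sdw_version = (addr >> 44) & 0xf;	/* bits 47:44 */
	id.unique_id   = (addr >> 40) & 0xf;	/* bits 43:40 */
	id.mfg_id      = (addr >> 24) & 0xffff;	/* bits 39:24 */
	id.part_id     = (addr >> 8)  & 0xffff;	/* bits 23:8  */
	id.class_id    = addr & 0xff;		/* bits  7:0  */
	return id;
}
```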

> > +static int sdw_slave_add(struct sdw_bus *bus,
> > +   struct sdw_slave_id *id, struct fwnode_handle *fwnode)
> > +{
> > +   struct sdw_slave *slave;
> > +   char name[32];
> > +   int ret;
> > +
> > +   slave = kzalloc(sizeof(*slave), GFP_KERNEL);
> > +   if (!slave)
> > +   return -ENOMEM;
> > +
> > +   /* Initialize data structure */
> > +   memcpy(&slave->id, id, sizeof(*id));
> > +
> > +   /* name shall be sdw:link:mfg:part:class:unique */
> > +   snprintf(name, sizeof(name), "sdw:%x:%x:%x:%x:%x",
> > +   bus->link_id, id->mfg_id, id->part_id,
> > +   id->class_id, id->unique_id);
> 
> You can set the name directly via dev_set_name().  It's printf format,
> after all.

right, I am using it but with this string :D

> > +   slave->dev.parent = bus->dev;
> > +   slave->dev.fwnode = fwnode;
> > +   dev_set_name(&slave->dev, "%s", name);
> > +   slave->dev.release = sdw_slave_release;
> > +   slave->dev.bus = &sdw_bus_type;
> > +   slave->bus = bus;
> > +   slave->status = SDW_SLAVE_UNATTACHED;
> > +   slave->dev_num = 0;
> > +
> > +   mutex_lock(&bus->bus_lock);
> > +   list_add_tail(&slave->node, &bus->slaves);
> > +   mutex_unlock(&bus->bus_lock);
> > +
> > +   ret = device_register(&slave->dev);
> > +   if (ret) {
> > +   dev_err(bus->dev, "Failed to add slave: ret %d\n", ret);
> > +
> > +   /*
> > +* On err, don't free but drop ref as this will be freed
> > +* when release method is invoked.
> > +*/
> > +   put_device(&slave->dev);
> 
> Wouldn't it leave a stale link to bus?

yes that needs to be removed too, thanks for pointing

-- 
~Vinod


Re: [PATCH GHAK16 V5 00/10] capabilities: do not audit log BPRM_FCAPS on set*id

2017-10-19 Thread James Morris
On Thu, 19 Oct 2017, Richard Guy Briggs wrote:

> On 2017-10-11 20:57, Richard Guy Briggs wrote:
> > The audit subsystem is adding a BPRM_FCAPS record when auditing setuid
> > application execution (SYSCALL execve). This is not expected as it was
> > supposed to be limited to when the file system actually had capabilities
> > in an extended attribute.  It lists all capabilities making the event
> > really ugly to parse what is happening.  The PATH record correctly
> > records the setuid bit and owner.  Suppress the BPRM_FCAPS record on
> > set*id.
> 
> 
> 
> Serge?  James?  Can one of you two take this via your trees since Paul
> has backed down citing (reasonably) that it is mostly capabilities
> patches rather than audit?

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
next-general

-- 
James Morris




Re: [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry

2017-10-19 Thread Namhyung Kim
Hi Milian,

On Thu, Oct 19, 2017 at 12:54:18PM +0200, Milian Wolff wrote:
> On Mittwoch, 18. Oktober 2017 20:53:50 CEST Milian Wolff wrote:
> > When inline frame resolution is disabled, a bogus srcline is obtained
> > for hist entries:
> > 
> > ~
> > $ perf report -s sym,srcline --no-inline --stdio -g none
> > 95.21% 0.00%  [.] __libc_start_main 
> > 
> > __libc_start_main+18446603358170398953 95.21% 0.00%  [.] _start
> >
> >  _start+18446650082411225129 46.67% 0.00%  [.]
> > main   
> > main+18446650082411225208 38.75%   
> >  0.00%  [.] hypot  
> >
> > hypot+18446603358164312084 23.75% 0.00%  [.] main  
> >
> >  main+18446650082411225151 20.83%20.83%  [.]
> > std::generate_canonical > std::linear_congruential_engine
> > >  random.h:143 18.12% 0.00%  [.] main 
> >
> >   main+18446650082411225165 13.12%13.12%  [.]
> > std::generate_canonical > std::linear_congruential_engine
> > >  random.tcc:3330 4.17% 4.17%  [.] __hypot_finite 
> >
> > __hypot_finite+163 4.17% 4.17%  [.] std::generate_canonical > 53ul, std::linear_congruential_engine > 2147483647ul> >  random.tcc: 4.17% 0.00%  [.] __hypot_finite   
> >
> >   __hypot_finite+18446603358164312227 4.17% 0.00%  [.]
> > std::generate_canonical > std::linear_congruential_engine
> > >  std::generate_canonical > std::generate_canonical > std::linear_congruential_engine
> > >  std::generate_canonical > __hypot_finite 
> > __hypot_finite+11 2.50% 2.50% 
> > [.] __hypot_finite 
> > __hypot_finite+24 2.50%
> > 0.00%  [.] __hypot_finite  
> >   
> > __hypot_finite+18446603358164312075 2.50% 0.00%  [.] __hypot_finite
> >
> >  __hypot_finite+18446603358164312088 ~
> > 
> > Note how we get very large offsets to main and cannot see any srcline
> > from one of the complex or random headers, even though the instruction
> > pointers actually lie in code inlined from there.
> > 
> > This patch fixes the mapping to use map__objdump_2mem instead of
> > map__map_ip in hist_entry__get_srcline. This fixes the srcline
> > values for me when inline resolution is disabled:
> > 
> > ~
> > $ perf report -s sym,srcline --no-inline --stdio -g none
> > 95.21% 0.00%  [.] __libc_start_main 
> > 
> > __libc_start_main+233 95.21% 0.00%  [.] _start 
> >
> > _start+41 46.88% 0.00%  [.] main   
> >
> > complex:589 43.96% 0.00%  [.] main 
> >
> >   random.h:185 38.75% 0.00%  [.] hypot 
> >
> >  hypot+20 20.83% 0.00%  [.] std::generate_canonical > std::linear_congruential_engine
> > >  random.h:143 13.12% 0.00%  [.] std::generate_canonical > std::linear_congruential_engine
> > >  random.tcc:3330 4.17% 4.17%  [.] __hypot_finite 
> >
> > __hypot_finite+140715545239715 4.17% 4.17%  [.]
> > std::generate_canonical > std::linear_congruential_engine
> > >  std::generate_canonical > __hypot_finite 
> > __hypot_finite+163 4.17% 0.00% 
> > [.] std::generate_canonical > std::linear_congruential_engine
> > >  random.tcc: 2.92% 2.92%  [.] 

Re: [PATCH 02/14] soundwire: Add SoundWire bus type

2017-10-19 Thread Vinod Koul
On Thu, Oct 19, 2017 at 09:40:06AM +0200, Takashi Iwai wrote:
> On Thu, 19 Oct 2017 05:03:18 +0200,
> Vinod Koul wrote:
> > +
> > +config SOUNDWIRE_BUS
> > +   tristate
> > +   default SOUNDWIRE
> > +
> 
> Does it make sense to be tristate?
> Since CONFIG_SOUNDWIRE is a bool, the above would be also only either
> Y or N.  If it's Y and others select M, it'll be still Y.

Hmm, good point. I think it would make sense to make SOUNDWIRE tristate too,
just like SOUND :)

> > + * sdw_get_device_id: find the matching SoundWire device id
> > + *
> > + * @slave: SoundWire Slave device
> > + * @drv: SoundWire Slave Driver
> 
> Inconsistent upper/lower letters in these two lines.

thanks for spotting, will fix

> > + * The match is done by comparing the mfg_id and part_id from the
> > + * struct sdw_device_id. class_id is unused, as it is a placeholder
> > + * in MIPI Spec.
> > + */
> > +static const struct sdw_device_id *
> > +sdw_get_device_id(struct sdw_slave *slave, struct sdw_driver *drv)
> > +{
> > +   const struct sdw_device_id *id = drv->id_table;
> > +
> > +   while (id && id->mfg_id) {
> > +   if (slave->id.mfg_id == id->mfg_id &&
> > +   slave->id.part_id == id->part_id) {
> 
> Please indentation properly.

what do you advise?

if (slave->id.mfg_id == id->mfg_id &&
slave->id.part_id == id->part_id) {

would mean the line below is at the same indent. Some people use:

if (slave->id.mfg_id == id->mfg_id &&
   slave->id.part_id == id->part_id) {

Is it documented anywhere?


> 
> > +   return id;
> > +   }
> 
> Superfluous braces for a single-line.

That bit was intentional. Yes, the braces are not required, but since the if
condition spans two lines, I wanted to help readability by adding them. I can
remove them..

> 
> > +   id++;
> > +   }
> > +
> > +   return NULL;
> > +}
> > +
> > +static int sdw_bus_match(struct device *dev, struct device_driver *ddrv)
> > +{
> > +   struct sdw_slave *slave = dev_to_sdw_dev(dev);
> > +   struct sdw_driver *drv = drv_to_sdw_driver(ddrv);
> > +
> > +   return !!sdw_get_device_id(slave, drv);
> > +}
> > +
> > +int sdw_slave_modalias(struct sdw_slave *slave, char *buf, size_t size)
> 
> I'd put const to slave argument, as it won't be modified.

right...

> 
> > --- a/include/linux/mod_devicetable.h
> > +++ b/include/linux/mod_devicetable.h
> > @@ -228,6 +228,13 @@ struct hda_device_id {
> > unsigned long driver_data;
> >  };
> >  
> > +struct sdw_device_id {
> > +   __u16 mfg_id;
> > +   __u16 part_id;
> > +   __u8 class_id;
> > +   kernel_ulong_t driver_data;
> 
> Better to think of alignment.

Sorry, that's not quite clear to me; do you mind elaborating which ones to align?

> 
> 
> > --- /dev/null
> > +++ b/include/linux/soundwire/sdw.h
> 
> > +/**
> > + * struct sdw_bus: SoundWire bus
> > + *
> > + * @dev: Master linux device
> > + * @link_id: Link id number, can be 0 to N, unique for each Master
> > + * @slaves: list of Slaves on this bus
> > + * @assigned: logical addresses assigned, Index 0 (broadcast) would be 
> > unused
> > + * @bus_lock: bus lock
> > + */
> > +struct sdw_bus {
> > +   struct device *dev;
> > +   unsigned int link_id;
> > +   struct list_head slaves;
> > +   bool assigned[SDW_MAX_DEVICES + 1];
> 
> Why not a bitmap?

That's a very good suggestion; it will help with other things too :)

-- 
~Vinod


Re: [Patch v6 6/7] regmap: add SLIMBUS support

2017-10-19 Thread Bjorn Andersson
On Fri 06 Oct 08:51 PDT 2017, srinivas.kandaga...@linaro.org wrote:

> diff --git a/drivers/base/regmap/regmap-slimbus.c 
> b/drivers/base/regmap/regmap-slimbus.c
[..]
> +static int regmap_slimbus_byte_reg_read(void *context, unsigned int reg,
> + unsigned int *val)
> +{
> + struct slim_device *slim = context;
> + struct slim_val_inf msg = {0,};
> +
> + msg.start_offset = reg;
> + msg.num_bytes = 1;
> + msg.rbuf = (void *)val;

Turn rbuf into a void * and you don't need this cast (I think I commented
on this on a previous patch as well).

> +
> + return slim_request_val_element(slim, &msg);
> +}
> +
> +static int regmap_slimbus_byte_reg_write(void *context, unsigned int reg,
> + unsigned int val)
> +{
> + struct slim_device *slim = context;
> + struct slim_val_inf msg = {0,};
> +
> + msg.start_offset = reg;
> + msg.num_bytes = 1;
> + msg.wbuf = (void *)&val;

Ditto

> +
> + return slim_change_val_element(slim, &msg);
> +}

Regards,
Bjorn


Re: [PATCH 1/3] dt-bindings: soc: qcom: Support GLINK intents

2017-10-19 Thread Bjorn Andersson
On Wed 18 Oct 18:10 PDT 2017, Chris Lew wrote:

> Virtual GLINK channels may know what throughput to expect from a
> remoteproc. An intent advertises to the remoteproc this channel is
> ready to receive data. Allow a channel to define the size and amount of
> intents to be prequeued.
>
> Signed-off-by: Chris Lew 
> ---
>  Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt 
> b/Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt
> index b277eca861f7..6c21f76822ca 100644
> --- a/Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt
> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt
> @@ -39,6 +39,14 @@ of these nodes are defined by the individual bindings for 
> the specific function
>   Definition: a list of channels tied to this function, used for matching
>  the function to a set of virtual channels
>
> +- intents:

qcom,intents or qcom,default-intents

> + Usage: optional
> + Value type: <prop-encoded-array>
> + Definition: a list of size,amount pairs describing what intents should
> +be preallocated for this virtual channel. If a GLINK node
> +supports intents, an intent advertises this channel is ready
> +to receive data.

Rather than describing what a GLINK intent is I would suggest that you
replace this second sentence with something like "This can be used to
tweak the default intents available for the channel to meet expectations
of the remote."

> +
>  = EXAMPLE
>  The following example represents the GLINK RPM node on a MSM8996 device, with
>  the function for the "rpm_request" channel defined, which is used for
> @@ -69,6 +77,8 @@ regualtors and root clocks.
>   compatible = "qcom,rpm-msm8996";
>   qcom,glink-channels = "rpm_requests";
>
> + intents = <0x400 5
> +   0x800 1>;
>   ...
>   };
>   };

Regards,
Bjorn


Re: [RFC PATCH] kbuild: Allow specifying some base host CFLAGS

2017-10-19 Thread Doug Anderson
Hi,

On Wed, Oct 18, 2017 at 9:45 AM, Masahiro Yamada
 wrote:
> 2017-10-14 3:02 GMT+09:00 Douglas Anderson :
>> Right now there is a way to add some CFLAGS that affect target builds,
>> but no way to add CFLAGS that affect host builds.  Let's add a way.
>> We'll document two environment variables: CFLAGS_HOST and
>> CXXFLAGS_HOST.
>>
>> We'll document that these variables get appended to by the kernel to
>> make the final CFLAGS.  That means that, though the environment can
>> specify some flags, if there is a conflict the kernel can override and
>> win.  This works differently than KCFLAGS which is appended (and thus
>> can override) the kernel specified CFLAGS.
>>
>> Why would I make KCFLAGS and CFLAGS_HOST work differently in this way?
>> My argument is that it's about expected usage.  Typically the build
>> system invoking the kernel has some idea about some basic CFLAGS that
>> it wants to use to build things for the host and things for the
>> target.  In general the build system would expect that its flags can
>> be overridden if necessary (perhaps we need to turn off a warning when
>> compiling a certain file, for instance).  So, all other things being
>> equal, the way I'm making CFLAGS_HOST is the way I'd expect things to
>> work.
>>
>> So, if it's expected that the build system can pass in a base set of
>> flags, why didn't we make KCFLAGS work that way?  The short answer is:
>> when building for the target the kernel is just "special".  The build
>> system's "target" CFLAGS are likely intended for userspace programs
>> and likely make very little sense to use as a basis.  This was talked
>> about in the seminal commit 69ee0b352242 ("kbuild: do not pick up
>> CFLAGS from the environment").  Basically: if the build system REALLY
>> knows what it's doing then it can pass in flags that the kernel will
>> use, but otherwise it should butt out.  Presumably this build system
>> that really knows what it's doing knows better than the kernel so
>> KCFLAGS comes after the kernel's normal flags.
>>
>> One last note: I chose to add new variables rather than just having
>> the build system try to pass HOSTCFLAGS in somehow (either through the
>> environment or the command line) to avoid weird interactions with
>> recursive invocations of make.
>>
>> Signed-off-by: Douglas Anderson 
>> ---
>
> I'd like to know for-instance cases where this is useful.

I'm not sure I have any exact use cases.  I know vapier@ (CCed) was
pushing for making sure that these flags get passed from the portage
ebuild into the kernel build, so maybe he has some cases?  Right now
we have the "-pipe" flag that ought to be passed in to the host
compiler but we're dropping it on the floor, but that doesn't seem
terribly critical.

...but in general the Linux kernel doesn't have all the details about
the host system.  That means it can't necessarily build the tools
quite as optimally (it can't pass "-mtune", right?).  I could also
imagine that there could be ABI flags that need to be specified?  Like
if we had floating point math in a host tool it would be important
that the build system could tell the kernel what to use for
"-mfloat-abi".

...so basically: it's all theoretical at this point in time from my
point of view, but I can definitely understand how it could be
necessary in the right environment.


-Doug


Re: [Patch v6 7/7] MAINTAINERS: Add SLIMbus maintainer

2017-10-19 Thread Bjorn Andersson
On Fri 06 Oct 08:51 PDT 2017, srinivas.kandaga...@linaro.org wrote:

> From: Srinivas Kandagatla 
>
> Add myself as maintainer for slimbus.
>

Acked-by: Bjorn Andersson 

Regards,
Bjorn

> Signed-off-by: Srinivas Kandagatla 
> ---
>  MAINTAINERS | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2281af4..014f74b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12320,6 +12320,14 @@ T: git 
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>  F: include/linux/srcu.h
>  F: kernel/rcu/srcu.c
>
> +SERIAL LOW-POWER INTER-CHIP MEDIA BUS (SLIMbus)
> +M: Srinivas Kandagatla 
> +L: alsa-de...@alsa-project.org (moderated for non-subscribers)
> +S: Maintained
> +F: drivers/slimbus/
> +F: Documentation/devicetree/bindings/slimbus/
> +F: include/linux/slimbus.h
> +
>  SMACK SECURITY MODULE
>  M: Casey Schaufler 
>  L: linux-security-mod...@vger.kernel.org
> --
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch v6 2/7] slimbus: Add messaging APIs to slimbus framework

2017-10-19 Thread Bjorn Andersson
On Wed 18 Oct 09:39 PDT 2017, Srinivas Kandagatla wrote:

> Thanks for Review Comments,
>
>
> On 18/10/17 07:15, Bjorn Andersson wrote:
> > On Fri 06 Oct 08:51 PDT 2017, srinivas.kandaga...@linaro.org wrote:
[..]
> >
> > > + if (!async) {
> > > + txn->msg->comp_cb = NULL;
> > > + txn->msg->ctx = NULL;
> >
> > I believe txn->msg is always required, so you don't need to do this
> > contidionally.
>
> I don't get this, why do you want to set comp_cb to NULL unconditionally?
>

I'm just not happy about the complexity of this function, but perhaps
it's confusing to always set them, regardless of them being used. Feel
free to keep it.

[..]
> > > +void slim_return_tx(struct slim_controller *ctrl, int err)
> > > +{
> > > + unsigned long flags;
> > > + int idx;
> > > + struct slim_pending cur;
> > > +
> > > + spin_lock_irqsave(&ctrl->tx.lock, flags);
> > > + idx = ctrl->tx.head;
> > > + ctrl->tx.head = (ctrl->tx.head + 1) % ctrl->tx.n;
> > > + cur = ctrl->pending_wr[idx];
> >
> > Why is this doing struct copy?
> >
> Not sure, do you see any issue with this?
>

It's a rarely used feature and I don't see a reason for using it here.

It's probably better to make a copy of cur.cb and cur.ctx to make their
use after the spin-unlock more obvious (but it should be fine as the
spinlock is for the pending_wr array).

> > > + spin_unlock_irqrestore(&ctrl->tx.lock, flags);
> > > +
> > > + if (!cur.cb)
> > > + dev_err(&ctrl->dev, "NULL Transaction or completion");
> > > + else
> > > + cur.cb(cur.ctx, err);
> > > +
> > > + up(&ctrl->tx_sem);
> > > +}
> > > +EXPORT_SYMBOL_GPL(slim_return_tx);
[..]
> > > +/**
> > > + * struct slim_ctrl_buf: circular buffer used by contoller for TX, RX
> > > + * @base: virtual base address for this buffer
> > > + * @phy: physical address for this buffer (this is useful if controller 
> > > can
> > > + *  DMA the buffers for TX and RX to/from controller hardware
> > > + * @lock: lock protecting head and tail
> > > + * @head: index where buffer is returned back
> > > + * @tail: index from where buffer is consumed
> > > + * @sl_sz: byte-size of each slot in this buffer
> > > + * @n:  number of elements in this circular ring, note that this needs 
> > > to be
> > > + * 1 more than actual buffers to allow for one open slot
> > > + */
> >
> > Is this ringbuffer mechanism defined in the slimbus specification? Looks
> > like something specific to the Qualcomm controller, rather than
> > something that should be enforced in the framework.
> >
>
> Yes, this is not part of the slimbus specs, but Qcom SOCs have concept of
> Message Queues.
>
> Are you suggesting that this buffer handling has to be moved out of core
> into controller driver?
>

The fact that this seems to describe a physical ring buffer, with some
set of properties that are related to how a ring buffer works in the
Qualcomm hardware and it carries a notion of physical mapping, all
indicates to me that this describes some Qualcomm hardware interface.

I believe this is a hardware implementation detail that should reside in
the hardware part of the implementation (i.e. the Qualcomm driver).

>
> > > +struct slim_ctrl_buf {
> > > + void *base;
> > > + phys_addr_t phy;
> > > + spinlock_t lock;
> > > + int head;
> > > + int tail;
> > > + int sl_sz;
> > > + int n;
> > > +};

Regards,
Bjorn


Re: [PATCH -next v2] mtd: nand: Add support for Toshiba BENAND (Built-in ECC NAND)

2017-10-19 Thread KOBAYASHI Yoshitake
On 2017/10/12 22:26, Boris Brezillon wrote:
> On Thu, 12 Oct 2017 22:03:23 +0900
> KOBAYASHI Yoshitake  wrote:
> 
>> On 2017/10/05 16:31, Boris Brezillon wrote:
>>> On Thu, 5 Oct 2017 16:24:08 +0900
>>> KOBAYASHI Yoshitake  wrote:  
>>>> @@ -39,9 +105,43 @@ static void toshiba_nand_decode_id(struct nand_chip *chip)
>>>>  
>>>>  static int toshiba_nand_init(struct nand_chip *chip)
>>>>  {
>>>> +  struct mtd_info *mtd = nand_to_mtd(chip);
>>>> +
>>>>    if (nand_is_slc(chip))
>>>>            chip->bbt_options |= NAND_BBT_SCAN2NDPAGE;
>>>>  
>>>> +  if (nand_is_slc(chip) && (chip->id.data[4] & 0x80)) {
>>>> +          /* BENAND */
>>>> +
>>>> +          /*
>>>> +           * We can't disable the internal ECC engine, the user
>>>> +           * has to use on-die ECC, there is no alternative.
>>>> +           */
>>>> +          if (chip->ecc.mode != NAND_ECC_ON_DIE) {
>>>> +                  pr_err("On-die ECC should be selected.\n");
>>>> +                  return -EINVAL;
>>>> +          }
>>>
>>> According to your previous explanation that's not exactly true. Since
>>> ECC bytes are stored in a separate area, the user can decide to use
>>> another mode without trouble. Just skip the BENAND initialization when
>>> mode != NAND_ECC_ON_DIE and we should be good, or am I missing 
>>> something?  
>>
>> I am asking to product department to confirm it.
>
> I'm almost sure this is the case ;-).

 According to the command sequence written in BENAND's datasheet, the status
 of the internal ECC must be checked after reading. To do that, ecc.mode 
 has been
 set to NAND_ECC_ON_DIE and the status of the internal ECC is checked 
 through 
 the 0x70 or 0x7A command. That's the reason we are returning EINVAL here.  
>>>
>>> But the status will anyway be retrieved, and what's the point of
>>> checking the ECC flags if the user wants to use its own ECC engine? I
>>> mean, since you have the whole OOB area exposed why would you prevent
>>> existing setup from working (by existing setup I mean those that already
>>> have a BENAND but haven't modified their driver to accept ON_DIE_ECC).
>>>
>>> Maybe I'm missing something, but AFAICT it's safe to allow users to
>>> completely ignore the on-die ECC engine and use their own, even if
>>> that means duplicating the work since on-die ECC cannot be disabled on
>>> BENAND devices.  
>>
>> If user host controller ECC engine can support 8bit ECC or more ,
>> Toshiba offers 24nm SLC NAND products (not BENAND).  If user host
>> controller ECC engine is less that 8bit ECC (for example: 1bit or
>> 4bit ECC) Toshiba offers BENAND.  When using BENAND, checking
>> BENAND own ECC status (ECC flag) is required as per BENAND
>> product datasheet. Ignoring BENAND on-die ECC operation status,
>> and rely only on host 1 bit ECC or 4 bit ECC status, is not
>> recommended because the host ECC capability is inferior to BENAND
>> 8bit ECC and data refresh or other operations may not work
>> properly.
> 
> Well, that's not really your problem. The framework already complains
> if someone tries to use an ECC that is weaker than the chip
> requirement. On the other hand, it's perfectly valid to use on
> host-side ECC engine that meets NAND requirements (8bit/xxxbytes).

I have assumed the ECC strength and size would be specified via devicetree.
Before the BENAND patch is updated, I would like to submit a patch which reads
the ECC strength and size from the NAND using the extended ID, like the
Samsung NAND patch[1].
 [1] https://patchwork.ozlabs.org/patch/712549/

> The use case I'm trying to gracefully handle here is: your NAND
> controller refuses to use anything but the host-side ECC engine and you
> have a BENAND connected to this controller.
> Before your patch this use case worked just fine, and the user didn't
> even notice it was using a NAND chip that was capable of correcting
> bitflips. After your patch it fails to probe the NAND chip and users
> will have to patch their controller driver to make it work again. Sorry
> but this is not really an option: we have to keep existing setup in a
> working state, and that means allowing people to use their BENAND in a
> degraded state where they'll just ignore the on-die ECC and use their
> own ECC engine instead.
> 
> I really don't see the problem here. It's not worse than it was before
> your patch, and those wanting to activate on-die ECC support will have
> to patch their controller driver anyway.

If the above approach is acceptable, I will update BENAND patch according to
your idea.

-- Yoshi

>> Also when using BENAND, turning off Host ECC is
>> recommended because this can eliminate the latency due to double
>> ECC operation(by both host & BENAND).
> 
> I thought you were 

Re: [PATCH -next v2] mtd: nand: Add support for Toshiba BENAND (Built-in ECC NAND)

2017-10-19 Thread KOBAYASHI Yoshitake
On 2017/10/12 22:26, Boris Brezillon wrote:
> On Thu, 12 Oct 2017 22:03:23 +0900
> KOBAYASHI Yoshitake  wrote:
> 
>> On 2017/10/05 16:31, Boris Brezillon wrote:
>>> On Thu, 5 Oct 2017 16:24:08 +0900
>>> KOBAYASHI Yoshitake  wrote:  
 @@ -39,9 +105,43 @@ static void toshiba_nand_decode_id(struct nand_chip *chip)
  
  static int toshiba_nand_init(struct nand_chip *chip)
  {
 +  struct mtd_info *mtd = nand_to_mtd(chip);
 +
if (nand_is_slc(chip))
chip->bbt_options |= NAND_BBT_SCAN2NDPAGE;
  
 +  if (nand_is_slc(chip) && (chip->id.data[4] & 0x80)) {
 +  /* BENAND */
 +
 +  /*
 +   * We can't disable the internal ECC engine, the user
 +   * has to use on-die ECC, there is no alternative.
 +   */
 +  if (chip->ecc.mode != NAND_ECC_ON_DIE) {
 +  pr_err("On-die ECC should be selected.\n");
 +  return -EINVAL;
 +  }  
>>>
>>> According to your previous explanation that's not exactly true. Since
>>> ECC bytes are stored in a separate area, the user can decide to use
>>> another mode without trouble. Just skip the BENAND initialization when
>>> mode != NAND_ECC_ON_DIE and we should be good, or am I missing 
>>> something?  
>>
>> I am asking to product department to confirm it.
>
> I'm almost sure this is the case ;-).

 According to the command sequence written in BENAND's datasheet, the status
 of the internal ECC must be checked after reading. To do that, ecc.mode has
 been set to NAND_ECC_ON_DIE and the status of the internal ECC is checked
 through the 0x70 or 0x7A command. That's the reason we are returning EINVAL here.  
>>>
>>> But the status will anyway be retrieved, and what's the point of
>>> checking the ECC flags if the user wants to use its own ECC engine? I
>>> mean, since you have the whole OOB area exposed why would you prevent
>>> existing setup from working (by existing setup I mean those that already
>>> have a BENAND but haven't modified their driver to accept ON_DIE_ECC).
>>>
>>> Maybe I'm missing something, but AFAICT it's safe to allow users to
>>> completely ignore the on-die ECC engine and use their own, even if
>>> that means duplicating the work since on-die ECC cannot be disabled on
>>> BENAND devices.  
>>
>> If the host controller's ECC engine can support 8-bit ECC or more,
>> Toshiba offers 24nm SLC NAND products (not BENAND).  If the host
>> controller's ECC engine supports less than 8-bit ECC (for example,
>> 1-bit or 4-bit ECC), Toshiba offers BENAND.  When using BENAND,
>> checking BENAND's own ECC status (ECC flag) is required as per the
>> BENAND product datasheet. Ignoring the BENAND on-die ECC operation
>> status and relying only on the host's 1-bit or 4-bit ECC status is
>> not recommended, because the host ECC capability is inferior to
>> BENAND's 8-bit ECC, and data refresh or other operations may not
>> work properly.
> 
> Well, that's not really your problem. The framework already complains
> if someone tries to use an ECC that is weaker than the chip
> requirement. On the other hand, it's perfectly valid to use a
> host-side ECC engine that meets the NAND requirements (8bit/xxxbytes).

I have assumed the ECC strength and size would be specified via devicetree.
Before updating the BENAND patch, I would like to submit a patch which reads
the ECC strength and size from the NAND using the extended ID, like the
Samsung NAND patch [1].
 [1] https://patchwork.ozlabs.org/patch/712549/

> The use case I'm trying to gracefully handle here is: your NAND
> controller refuses to use anything but the host-side ECC engine and you
> have a BENAND connected to this controller.
> Before your patch this use case worked just fine, and the user didn't
> even notice it was using a NAND chip that was capable of correcting
> bitflips. After your patch it fails to probe the NAND chip and users
> will have to patch their controller driver to make it work again. Sorry
> but this is not really an option: we have to keep existing setup in a
> working state, and that means allowing people to use their BENAND in a
> degraded state where they'll just ignore the on-die ECC and use their
> own ECC engine instead.
> 
> I really don't see the problem here. It's not worse than it was before
> your patch, and those wanting to activate on-die ECC support will have
> to patch their controller driver anyway.

If the above approach is acceptable, I will update the BENAND patch according
to your idea.

-- Yoshi

>> Also, when using BENAND, turning off host ECC is
>> recommended because this can eliminate the latency due to the double
>> ECC operation (by both the host & BENAND).
> 
> I thought you were not able to turn it off.
> 
>>
>> -- YOSHI
>>
> 
> 
> 
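The graceful fallback Boris describes — skip the BENAND-specific setup instead of failing when the user sticks with host-side ECC — can be sketched in plain C. The struct and enum below are simplified stand-ins for the kernel's nand_chip and NAND_ECC_* definitions, not the real API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's nand_chip and ECC-mode enum. */
enum ecc_mode { NAND_ECC_HW, NAND_ECC_ON_DIE };

struct chip {
	enum ecc_mode mode;
	bool benand_enabled;
};

static int toshiba_benand_init(struct chip *chip)
{
	if (chip->mode != NAND_ECC_ON_DIE) {
		/* Degraded but working: the always-on internal ECC keeps
		 * running underneath, and the host ECC corrects on top of
		 * it; just skip the BENAND-specific setup instead of
		 * returning -EINVAL. */
		fprintf(stderr,
			"BENAND: on-die ECC not selected, using host ECC\n");
		return 0;
	}
	chip->benand_enabled = true;
	return 0;
}
```

This keeps pre-existing setups (host-side ECC with a BENAND attached) probing successfully, which is the backward-compatibility point of the thread.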



[PULL REQUEST] i2c for 4.14

2017-10-19 Thread Wolfram Sang
Linus,

here are a couple of bugfixes for I2C drivers. Because the changes for
the piix4 driver are larger than usual, the patches have been in
linux-next for more than a week with no reports coming in. The rest is
usual stuff.

Please pull.

Thanks,

   Wolfram


The following changes since commit 8a5776a5f49812d29fe4b2d0a2d71675c3facf3f:

  Linux 4.14-rc4 (2017-10-08 20:53:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-current

for you to fetch changes up to 883b3b6567bfc8b5da7b3f0cec80513af111d2f5:

  i2c: omap: Fix error handling for clk_get() (2017-10-18 00:19:26 +0200)


Clemens Gruber (1):
  i2c: imx: fix misleading bus recovery debug message

Guenter Roeck (1):
  i2c: piix4: Fix SMBus port selection for AMD Family 17h chips

Pontus Andersson (1):
  i2c: ismt: Separate I2C block read from SMBus block read

Ricardo Ribalda Delgado (1):
  i2c: piix4: Disable completely the IMC during SMBUS_BLOCK_DATA

Tony Lindgren (1):
  i2c: omap: Fix error handling for clk_get()

Wei Jinhua (1):
  i2c: imx: use IRQF_SHARED mode to request IRQ


with much appreciated quality assurance from

Jean Delvare (2):
  (Rev.) i2c: piix4: Disable completely the IMC during SMBUS_BLOCK_DATA
  (Rev.) i2c: piix4: Fix SMBus port selection for AMD Family 17h chips

Jiang Biao (1):
  (Rev.) i2c: imx: use IRQF_SHARED mode to request IRQ

Stephen Douthit (1):
  (Test) i2c: ismt: Separate I2C block read from SMBus block read

 drivers/i2c/busses/i2c-imx.c   |   4 +-
 drivers/i2c/busses/i2c-ismt.c  |   5 +-
 drivers/i2c/busses/i2c-omap.c  |  14 
 drivers/i2c/busses/i2c-piix4.c | 162 ++---
 4 files changed, 172 insertions(+), 13 deletions(-)


signature.asc
Description: PGP signature


Re: [PATCH V8 5/5] libata: Align DMA buffer todma_get_cache_alignment()

2017-10-19 Thread 陈华才
Hi, Matt,

I found that 4ee34ea3a12396f35b26d90a094c75db ("libata: Align ata_device's id 
on a cacheline") can resolve everything, because the size of id[ATA_ID_WORDS] 
is already aligned and devslp_timing doesn't need to be aligned. So, in V9 of 
this series I will drop this patch. Why did I have problems before? Because I 
was using linux-4.4.

Huacai
 
 
-- Original --
From:  "Matt Redfearn";
Date:  Thu, Oct 19, 2017 03:52 PM
To:  "Tejun Heo"; "Huacai Chen"; 
Cc:  "Christoph Hellwig"; "Marek 
Szyprowski"; "Robin Murphy"; 
"AndrewMorton"; "Fuxin Zhang"; 
"linux-kernel"; "Ralf 
Baechle"; "JamesHogan"; 
"linux-mips"; "James E . J 
.Bottomley"; "Martin K . 
Petersen"; 
"linux-scsi"; 
"linux-ide"; "stable"; 
Subject:  Re: [PATCH V8 5/5] libata: Align DMA buffer 
todma_get_cache_alignment()

 


On 18/10/17 14:03, Tejun Heo wrote:
> On Tue, Oct 17, 2017 at 04:05:42PM +0800, Huacai Chen wrote:
>> In non-coherent DMA mode, kernel uses cache flushing operations to
>> maintain I/O coherency, so in ata_do_dev_read_id() the DMA buffer
>> should be aligned to ARCH_DMA_MINALIGN. Otherwise, If a DMA buffer
>> and a kernel structure share a same cache line, and if the kernel
>> structure has dirty data, cache_invalidate (no writeback) will cause
>> data corruption.
>>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Huacai Chen 
>> ---
>>   drivers/ata/libata-core.c | 15 +--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index ee4c1ec..e134955 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -1833,8 +1833,19 @@ static u32 ata_pio_mask_no_iordy(const struct 
>> ata_device *adev)
>>   unsigned int ata_do_dev_read_id(struct ata_device *dev,
>>  struct ata_taskfile *tf, u16 *id)
>>   {
>> -return ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE,
>> - id, sizeof(id[0]) * ATA_ID_WORDS, 0);
>> +u16 *devid;
>> +int res, size = sizeof(u16) * ATA_ID_WORDS;
>> +
>> +if (IS_ALIGNED((unsigned long)id, dma_get_cache_alignment(&dev->tdev)))
>> +res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, id, 
>> size, 0);
>> +else {
>> +devid = kmalloc(size, GFP_KERNEL);
>> +res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, devid, 
>> size, 0);
>> +memcpy(id, devid, size);
>> +kfree(devid);
>> +}
>> +
>> +return res;
> Hmm... I think it'd be a lot better to ensure that the buffers are
> aligned properly to begin with.  There are only two buffers which are
> used for id reading - ata_port->sector_buf and ata_device->id.  Both
> are embedded arrays but making them separately allocated aligned
> buffers shouldn't be difficult.
>
> Thanks.

FWIW, I agree that the buffers used for DMA should be split out from the 
structure. We ran into this problem on MIPS last year, 
4ee34ea3a12396f35b26d90a094c75db95080baa ("libata: Align ata_device's id 
on a cacheline") partially fixed it, but likely should have also 
cacheline aligned the following devslp_timing in the struct such that we 
guarantee that members of the struct not used for DMA do not share the 
same cacheline as the DMA buffer. Not having this means that 
architectures such as MIPS, which in some cases have to perform manual 
invalidation of DMA buffers, can clobber valid adjacent data if it is in 
the same cacheline.

Thanks,
Matt
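The bounce-buffer fallback in Huacai's patch boils down to the pattern below, shown as self-contained userspace C. IS_ALIGNED, the 64-byte alignment value, and the dma_read callback are stand-ins for the kernel's macro, dma_get_cache_alignment(), and ata_exec_internal(); in the kernel the bounce buffer is safe because kmalloc() guarantees ARCH_KMALLOC_MINALIGN alignment, which plain malloc() here does not:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)
#define CACHE_ALIGN 64	/* stand-in for dma_get_cache_alignment() */

/* Fill `id` with `size` bytes via `dma_read`. When `id` is not
 * cacheline-aligned, DMA through a freshly allocated bounce buffer
 * and copy back, so a cache invalidate on the DMA buffer cannot
 * clobber data adjacent to `id` in the same cacheline. */
static int read_id(uint8_t *id, size_t size,
		   int (*dma_read)(uint8_t *buf, size_t size))
{
	uint8_t *bounce;
	int res;

	if (IS_ALIGNED((uintptr_t)id, CACHE_ALIGN))
		return dma_read(id, size);

	bounce = malloc(size);
	if (!bounce)
		return -1;
	res = dma_read(bounce, size);
	memcpy(id, bounce, size);
	free(bounce);
	return res;
}
```

Tejun's counter-proposal — making the two ID buffers cacheline-aligned to begin with — removes the need for this copy entirely, which is why the patch was dropped in V9.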

Re: [PATCH 4/4] kaslr: clean up a useless variable and some usless space

2017-10-19 Thread Chao Fan
On Fri, Oct 20, 2017 at 11:19:48AM +0800, Dou Liyang wrote:
>Hi Chao,
>
>At 10/19/2017 06:02 PM, Chao Fan wrote:
>> There are two variables named "rc" in this function. One is inside the
>> loop, the other is outside the loop. The one outside will never
>> be used, so drop it.
>> 
>> Signed-off-by: Chao Fan 
>> ---
>>  arch/x86/boot/compressed/kaslr.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/x86/boot/compressed/kaslr.c 
>> b/arch/x86/boot/compressed/kaslr.c
>> index 22330cbe8515..8a33ed82fd0b 100644
>> --- a/arch/x86/boot/compressed/kaslr.c
>> +++ b/arch/x86/boot/compressed/kaslr.c
>> @@ -198,7 +198,6 @@ static int parse_immovable_mem(char *p,
>>  static void mem_avoid_memmap(char *str)
>>  {
>>  static int i;
>> -int rc;
>> 
>>  if (i >= MAX_MEMMAP_REGIONS)
>>  return;

Hi Dou-san,

>
>Seems it is redundant too,

Thanks for your suggestion.

Thanks,
Chao Fan

>
>Thanks,
>   dou.
>
>> @@ -277,7 +276,7 @@ static int handle_mem_memmap(void)
>>  return 0;
>> 
>>  tmp_cmdline = malloc(len + 1);
>> -if (!tmp_cmdline )
>> +if (!tmp_cmdline)
>>  error("Failed to allocate space for tmp_cmdline");
>> 
>>  memcpy(tmp_cmdline, args, len);
>> @@ -423,7 +422,7 @@ static void mem_avoid_init(unsigned long input, unsigned 
>> long input_size,
>>  cmd_line |= boot_params->hdr.cmd_line_ptr;
>>  /* Calculate size of cmd_line. */
>>  ptr = (char *)(unsigned long)cmd_line;
>> -for (cmd_line_size = 0; ptr[cmd_line_size++]; )
>> +for (cmd_line_size = 0; ptr[cmd_line_size++];)
>>  ;
>>  mem_avoid[MEM_AVOID_CMDLINE].start = cmd_line;
>>  mem_avoid[MEM_AVOID_CMDLINE].size = cmd_line_size;
>> 
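For illustration, the kind of shadowing the patch removes looks like this in isolation (a hypothetical function, not the actual kaslr.c code): the inner declaration shadows the outer one, so the outer variable is dead weight that gcc only flags with -Wshadow.

```c
#include <assert.h>

/* Hypothetical example of a shadowed variable: the outer `rc` is
 * never written, so it can simply be dropped. */
static int first_positive(const int *vals, int n)
{
	int rc = 0;	/* dead: the outer rc the cleanup would remove */

	for (int i = 0; i < n; i++) {
		int rc = vals[i];	/* inner rc shadows the outer one */

		if (rc > 0)
			return i;
	}
	return rc;	/* always 0: only the inner rc was ever assigned */
}
```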




Re: [PATCH -mm -V2] mm, pagemap: Fix soft dirty marking for PMD migration entry

2017-10-19 Thread Naoya Horiguchi


On 10/20/2017 12:10 AM, Huang, Ying wrote:
> From: Huang Ying 
> 
> Now, when the page table is walked in the implementation of
> /proc//pagemap, pmd_soft_dirty() is used for both the PMD huge
> page map and the PMD migration entries.  That is wrong,
> pmd_swp_soft_dirty() should be used for the PMD migration entries
> instead because the different page table entry flag is used.
> Otherwise, the soft dirty information in /proc//pagemap may be
> wrong.
> 
> Cc: Michal Hocko 
> Cc: "Kirill A. Shutemov" 
> Cc: David Rientjes 
> Cc: Arnd Bergmann 
> Cc: Hugh Dickins 
> Cc: "Jérôme Glisse" 
> Cc: Daniel Colascione 
> Cc: Zi Yan 
> Cc: Naoya Horiguchi 
> Signed-off-by: "Huang, Ying" 
> Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
> ---
>  fs/proc/task_mmu.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 2593a0c609d7..01aad772f8db 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1311,13 +1311,15 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned 
> long addr, unsigned long end,
>   pmd_t pmd = *pmdp;
>   struct page *page = NULL;
>  
> - if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(pmd))
> + if (vma->vm_flags & VM_SOFTDIRTY)
>   flags |= PM_SOFT_DIRTY;

Right, checking bits in the pmd must be done after pmd_present() is confirmed.

Acked-by: Naoya Horiguchi 

Thanks,
Naoya Horiguchi

>  
>   if (pmd_present(pmd)) {
>   page = pmd_page(pmd);
>  
>   flags |= PM_PRESENT;
> + if (pmd_soft_dirty(pmd))
> + flags |= PM_SOFT_DIRTY;
>   if (pm->show_pfn)
>   frame = pmd_pfn(pmd) +
>   ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> @@ -1329,6 +1331,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long 
> addr, unsigned long end,
>   frame = swp_type(entry) |
>   (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>   flags |= PM_SWAP;
> + if (pmd_swp_soft_dirty(pmd))
> + flags |= PM_SOFT_DIRTY;
>   VM_BUG_ON(!is_pmd_migration_entry(pmd));
>   page = migration_entry_to_page(entry);
>   }
> 
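The reason the accessor must match pmd_present() can be shown with a toy bit layout: a present PMD and a swap/migration PMD keep their soft-dirty information in different bits, so testing the wrong bit reports the wrong answer. The bit positions below are invented for the demo and do not match the real x86 layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented bit layout for illustration only. */
#define PRESENT_BIT	(1u << 0)
#define SOFT_DIRTY	(1u << 2)	/* meaningful only when present */
#define SWP_SOFT_DIRTY	(1u << 5)	/* meaningful only for swap entries */

typedef uint32_t pmd_t;

static bool pmd_present(pmd_t p)	{ return p & PRESENT_BIT; }
static bool pmd_soft_dirty(pmd_t p)	{ return p & SOFT_DIRTY; }
static bool pmd_swp_soft_dirty(pmd_t p)	{ return p & SWP_SOFT_DIRTY; }

static bool is_soft_dirty(pmd_t p)
{
	/* Pick the accessor only after checking pmd_present(),
	 * mirroring the structure of the fix. */
	return pmd_present(p) ? pmd_soft_dirty(p) : pmd_swp_soft_dirty(p);
}
```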

Re: [PATCH 0/4] kaslr: extend movable_node to movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Chao Fan
On Fri, Oct 20, 2017 at 11:37:11AM +0800, Dou Liyang wrote:
>Hi Chao,
>
>At 10/20/2017 10:53 AM, Chao Fan wrote:
>> On Fri, Oct 20, 2017 at 10:37:52AM +0800, Dou Liyang wrote:
>> > Hi Chao,
>> > 
>> Hi Dou-san,
>> 
>> > Cheer! I have some concerns below.
>> 
>> Thanks for your reply.
>> 
>> > 
>> > At 10/19/2017 06:02 PM, Chao Fan wrote:
>> > > Here is the problem:
>> > > Consider a machine with several NUMA nodes, some of which are
>> > > hot-pluggable. It's not good for the kernel to be extracted into the
>> > > memory region of a movable node. But with the current code, I printed
>> > > the address chosen by KASLR and found that it may sometimes be placed
>> > > in a movable node. To solve this problem, it's better to limit the
>> > > memory regions chosen by KASLR to immovable nodes in kaslr.c. But the
>> > > information about whether memory is hot-pluggable is stored in the
>> > > ACPI SRAT table, which is parsed after the kernel is extracted. So we
>> > > can't get the detailed memory information before extracting the kernel.
>> > > 
>> > > So extend movable_node to movable_node=nn@ss, in which nn means the
>> > > size of memory in an *immovable* node, and ss means the start position
>> > > of this memory region. Then limit KASLR to choosing memory in these
>> > > regions.
>> > 
>> > Yes, great. Here we should remember that the situation of
>> > 'movable_node=nn@ss' is rare, normal situation is 'movable_node=nn'.
>> > 
>> > So, we should consider our code tendencies for normal situation. ;-)
>> 
>> Yes, that's the normal case. But you cannot be sure the special situation
>> will never happen. If it happens, we should make sure the code works
>> well, right?
>> 
>> We cannot be sure that the movable nodes are contiguous, and even if
>> the movable nodes are contiguous, we cannot be sure the memory
>> addresses are contiguous.
>> 
>> It is easy to avoid the memory regions in movable nodes.
>> But if we can handle more special situations, and at the same time
>> make the kernel safer, why not?
>
>You misunderstand my opinion. I mean that
>when we code, we need to understand the problem clearly and know which
>part of the code will be executed most often.
>
>Make our code more suitable for the normal situation without affecting
>the handling of the problem.
>Just like:
>
>likely() and unlikely()
Hi Dou-san,

Thanks for that. I think likely() is suitable for another place.

Thanks,
Chao Fan

>
>Here I guess you didn't consider that, so I said that.
>
>> 
>> > 
>> > > 
>> > > There are two policies:
>> > > 1. Specify the memory regions in *movable* nodes to avoid:
>> > >Then we can use the existing mem_avoid to handle them. But if the
>> > >memory of one movable node is separated by a memory hole, or
>> > >different movable nodes are discontiguous, we don't know how many
>> > >regions need to be avoided.
>> > 
>> > It is not a problem.
>> > 
>> > As you said, we should provide an interface for users later, like that:
>> > 
>> > # cat /sys/device/system/memory/movable_node
>> > nn@ss
>> > 
>> 
>> Both are OK. I think outputting the memory regions of either movable or
>> immovable nodes is reasonable, so an interface for both methods
>> would be useful. And after we decide which policy is used in KASLR, we
>> can add the /sys interface.
>> 
>
>Actually, I prefer the first one, are you ready to post the patches
>for the first policy?
>
>Thanks,
>   dou.
>> Thanks,
>> Chao Fan
>> 
>> > 
>> > Thanks,
>> >dou.
>> > >OTOH, we must avoid all of the movable memory; otherwise, KASLR may
>> > >choose the wrong place.
>> > > 2. Specify the memory regions in *immovable* nodes to select from:
>> > >Only 4 regions are supported in this parameter. Then the user can
>> > >use at least two nodes for KASLR to choose from, which is enough
>> > >for the kernel to extract. At the same time, because we need only
>> > >4 new mem_vectors, the memory usage here is not too big.
>> > > 
>> > > PATCH 1/4 parse the extended movable_node=nn[KMG]@ss[KMG], then
>> > >store the memory regions.
>> > > PATCH 2/4 selects the memory region in immovable node when process
>> > >memmap.
>> > > PATCH 3/4 is the change of document.
>> > > PATCH 4/4 cleans up some little problems.
>> > > 
>> > > Chao Fan (4):
>> > >   kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]
>> > >   kaslr: select the memory region in immovable node to process
>> > >   document: change the document for the extended movable_node
>> > >   kaslr: clean up a useless variable and some usless space
>> > > 
>> > >  Documentation/admin-guide/kernel-parameters.txt |   9 ++
>> > >  arch/x86/boot/compressed/kaslr.c| 140 
>> > > +---
>> > >  2 files changed, 131 insertions(+), 18 deletions(-)
>> > > 
>> 




Re: [PATCH 0/4] kaslr: extend movable_node to movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Chao Fan
On Fri, Oct 20, 2017 at 11:37:11AM +0800, Dou Liyang wrote:
>Hi Chao,
>
>At 10/20/2017 10:53 AM, Chao Fan wrote:
>> On Fri, Oct 20, 2017 at 10:37:52AM +0800, Dou Liyang wrote:
>> > Hi Chao,
>> > 
>> Hi Dou-san,
>> 
>> > Cheer! I have some concerns below.
>> 
>> Thanks for your reply.
>> 
>> > 
>> > At 10/19/2017 06:02 PM, Chao Fan wrote:
>> > > Here is a problem:
>> > > Here is a machine with several NUMA nodes and some of them are 
>> > > hot-pluggable.
>> > > It's not good for kernel to be extracted in the memory region of movable 
>> > > node.
>> > > But in current code, I print the address choosen by kaslr and found it 
>> > > may be
>> > > placed in movable node sometimes. To solve this problem, it's better to 
>> > > limit
>> > > the memory region choosen by kaslr to immovable node in kaslr.c. But the 
>> > > memory
>> > > infomation about if it's hot-pluggable is stored in ACPI SRAT table, 
>> > > which is
>> > > parsed after kernel is extracted. So we can't get the detail memory 
>> > > infomation
>> > > before extracting kernel.
>> > > 
>> > > So extend the movable_node to movable_node=nn@ss, in which nn means
>> > > the size of memory in *immovable* node, and ss means the start position 
>> > > of
>> > > this memory region. Then limit kaslr choose memory in these regions.
>> > 
>> > Yes, great. Here we should remember that the situation of
>> > 'movable_node=nn@ss' is rare, normal situation is 'movable_node=nn'.
>> > 
>> > So, we should consider our code tendencies for normal situation. ;-)
>> 
>> Yes, it's normal. But you can not make sure the special situation will
>> never happen,. If it happens, we can make sure codes work well, right?
>> 
>> We can not make sure that the movable nodes are continuous, or even if
>> the movable nodes are continuous, we can not make sure the memory
>> address are continuous.
>> 
>> It is easy to avoid the memory region in movable node.
>> But if we can handle more special situations, and at the same time,
>> make kernel more safe, why not?
>
>You misunderstand my opinions, I means that
>when we code, we need to know the problem clearly and which part of
>problem will often be executed.
>
>Make our code more suitable for the normal situation without affecting the
>function of the problem.
>Just like:
>
>likely() and unlikely()
Hi Dou-san,

Thanks for that. I think likely() is suitable for another place.

Thanks,
Chao Fan

>
>Here I guess you don't consider that. so I said that.
>
>> 
>> > 
>> > > 
>> > > There are two policies:
>> > > 1. Specify the memory region in *movable* node to avoid:
>> > >Then we can use the existing mem_avoid to handle. But if the memory
>> > >one movable node was separated by memory hole or different movable 
>> > > nodes
>> > >are discontinuous, we don't know how many regions need to avoid.
>> > 
>> > It is not a problem.
>> > 
>> > As you said, we should provide an interface for users later, like that:
>> > 
>> > # cat /sys/device/system/memory/movable_node
>> > nn@ss
>> > 
>> 
>> Both are OK. I think outputting the memory regions of movable nodes or
>> of immovable nodes is equally reasonable, so interfaces for both methods
>> would be useful. Once we decide which policy kaslr uses, we can add the
>> /sys interface.
>> 
>
>Actually, I prefer the first one. Are you ready to post the patches
>for the first policy?
>
>Thanks,
>   dou.
>> Thanks,
>> Chao Fan
>> 
>> > 
>> > Thanks,
>> >dou.
>> > >OTOH, we must avoid all of the movable memory; otherwise, kaslr may
>> > >choose the wrong place.
>> > > 2. Specify the memory region in *immovable* node to select:
>> > >    Only support 4 regions in this parameter. Then users have at least
>> > >    two nodes for kaslr to choose from, which is enough for extracting
>> > >    the kernel. At the same time, since we need only 4 new mem_vectors,
>> > >    the memory usage here is small.
>> > > 
>> > > PATCH 1/4 parse the extended movable_node=nn[KMG]@ss[KMG], then
>> > >store the memory regions.
>> > > PATCH 2/4 selects the memory region in immovable node when process
>> > >memmap.
>> > > PATCH 3/4 is the change of document.
>> > > PATCH 4/4 cleans up some little problems.
>> > > 
>> > > Chao Fan (4):
>> > >   kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]
>> > >   kaslr: select the memory region in immovable node to process
>> > >   document: change the document for the extended movable_node
>> > >   kaslr: clean up a useless variable and some useless space
>> > > 
>> > >  Documentation/admin-guide/kernel-parameters.txt |   9 ++
>> > >  arch/x86/boot/compressed/kaslr.c| 140 
>> > > +---
>> > >  2 files changed, 131 insertions(+), 18 deletions(-)
>> > > 
>> 




Re: [PATCH 1/4] kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Chao Fan
On Fri, Oct 20, 2017 at 11:41:19AM +0800, Dou Liyang wrote:
>
>
>At 10/20/2017 11:22 AM, Chao Fan wrote:
>> On Fri, Oct 20, 2017 at 11:04:42AM +0800, Dou Liyang wrote:
>> > Hi Chao,
>> > 
>> Hi Dou-san,
>> 
>> > At 10/19/2017 06:02 PM, Chao Fan wrote:
>> > > Extend the movable_node to movable_node=nn[KMG]@ss[KMG].
>> > > Since in the current code, kaslr may choose a memory region in
>> > > hot-pluggable nodes, let users specify the regions in immovable
>> > > nodes, and store those regions in immovable_mem.
>> > > 
>> > 
>> > I guess you may mean that:
>> > 
>> > In current Linux with KASLR, the kernel may choose a memory region in
>> > a movable node for extracting the kernel code, which prevents that
>> > node from being hot-removed.
>> > 
>> > Solve it by choosing only regions in immovable nodes. So create
>> > immovable_mem to store those regions, and choose memory only from the
>> > immovable_mem array.
>> > 
>> 
>> Thanks for the explanation.
>> 
>> > > Multiple regions can be specified, comma delimited. To limit memory
>> > > usage, only 4 regions are supported. 4 regions cover at least 2
>> > > nodes, which is enough for extracting the kernel.
>> > > 
>> > > Signed-off-by: Chao Fan 
>> > > ---
>> > >  arch/x86/boot/compressed/kaslr.c | 63 
>> > > +++-
>> > >  1 file changed, 62 insertions(+), 1 deletion(-)
>> > > 
>> > > diff --git a/arch/x86/boot/compressed/kaslr.c 
>> > > b/arch/x86/boot/compressed/kaslr.c
>> > > index 17818ba6906f..3c1f5204693b 100644
>> > > --- a/arch/x86/boot/compressed/kaslr.c
>> > > +++ b/arch/x86/boot/compressed/kaslr.c
>> > > @@ -107,6 +107,12 @@ enum mem_avoid_index {
>> > > 
>> > >  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>> > > 
>> > > +/* Only supporting at most 4 immovable memory regions with kaslr */
>> > > +#define MAX_IMMOVABLE_MEM   4
>> > > +
>> > > +static struct mem_vector immovable_mem[MAX_IMMOVABLE_MEM];
>> > > +static int num_immovable_region;
>> > > +
>> > >  static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>> > >  {
>> > >  /* Item one is entirely before item two. */
>> > > @@ -167,6 +173,28 @@ parse_memmap(char *p, unsigned long long *start, 
>> > > unsigned long long *size)
>> > >  return -EINVAL;
>> > >  }
>> > > 
>> > > +static int parse_immovable_mem(char *p,
>> > > +   unsigned long long *start,
>> > > +   unsigned long long *size)
>> > > +{
>> > > +char *oldp;
>> > > +
>> > > +if (!p)
>> > > +return -EINVAL;
>> > > +
>> > > +oldp = p;
>> > > +*size = memparse(p, );
>> > > +if (p == oldp)
>> > > +return -EINVAL;
>> > > +
>> > > +if (*p == '@') {
>> > > +*start = memparse(p + 1, );
>> > > +return 0;
>> > > +}
>> > > +
>> > 
>> > Here you don't handle the case where @ss[KMG] is omitted.
>> 
>> Yes, will add. Many thanks.
>> 
>> > 
>> > > +return -EINVAL;
>> > > +}
>> > > +
>> > >  static void mem_avoid_memmap(char *str)
>> > >  {
>> > >  static int i;
>> > > @@ -206,6 +234,36 @@ static void mem_avoid_memmap(char *str)
>> > >  memmap_too_large = true;
>> > >  }
>> > > 
>> > > +#ifdef CONFIG_MEMORY_HOTPLUG
>> > > +static void mem_mark_immovable(char *str)
>> > > +{
>> > > +int i = 0;
>> > > +
>> > 
>> > You already have num_immovable_region; 'i' is useless, just remove it.
>> 
>> Using num_immovable_region makes the code too long. Using i is
>> clearer and keeps the lines shorter than 80 characters.
>
>Oh, God, that's horrific. Did you find that your code is wrong?
>
>num_immovable_region will be reset each time.

Did you test?

Thanks,
Chao Fan

>
>Thanks,
>   dou.
>
>> 
>> > 
>> > > +while (str && (i < MAX_IMMOVABLE_MEM)) {
>> > > +int rc;
>> > > +unsigned long long start, size;
>> > > +char *k = strchr(str, ',');
>> > > +
>> > 
>> > Why do you put this definition here? IMO, moving it out is better.
>> > 
>> > > +if (k)
>> > > +*k++ = 0;
>> > > +
>> > > +rc = parse_immovable_mem(str, , );
>> > > +if (rc < 0)
>> > > +break;
>> > > +str = k;
>> > > +
>> > > +immovable_mem[i].start = start;
>> > > +immovable_mem[i].size = size;
>> > > +i++;
>> > 
>> > Replace it with num_immovable_region
>> 
>> ditto...
>> Why do you care this little variable so much.
>> 
>> > 
>> > > +}
>> > > +num_immovable_region = i;
>> > 
>> > Just remove it.
>> 
>> ditto.
>> 
>> > 
>> > > +}
>> > > +#else
>> > > +static inline void mem_mark_immovable(char *str)
>> > > +{
>> > > +}
>> > > +#endif
>> > > +
>> > >  static int handle_mem_memmap(void)
>> > >  {
>> > >  char *args = (char *)get_cmd_line_ptr();
>> > > @@ 

Re: [PATCH 1/4] kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Dou Liyang



At 10/20/2017 11:22 AM, Chao Fan wrote:

On Fri, Oct 20, 2017 at 11:04:42AM +0800, Dou Liyang wrote:

Hi Chao,


Hi Dou-san,


At 10/19/2017 06:02 PM, Chao Fan wrote:

Extend the movable_node to movable_node=nn[KMG]@ss[KMG].
Since in the current code, kaslr may choose a memory region in
hot-pluggable nodes, let users specify the regions in immovable nodes,
and store those regions in immovable_mem.



I guess you may mean that:

In current Linux with KASLR, the kernel may choose a memory region in
a movable node for extracting the kernel code, which prevents that
node from being hot-removed.

Solve it by choosing only regions in immovable nodes. So create
immovable_mem to store those regions, and choose memory only from the
immovable_mem array.



Thanks for the explanation.


Multiple regions can be specified, comma delimited. To limit memory
usage, only 4 regions are supported. 4 regions cover at least 2 nodes,
which is enough for extracting the kernel.

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/kaslr.c | 63 +++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 17818ba6906f..3c1f5204693b 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -107,6 +107,12 @@ enum mem_avoid_index {

 static struct mem_vector mem_avoid[MEM_AVOID_MAX];

+/* Only supporting at most 4 immovable memory regions with kaslr */
+#define MAX_IMMOVABLE_MEM  4
+
+static struct mem_vector immovable_mem[MAX_IMMOVABLE_MEM];
+static int num_immovable_region;
+
 static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 {
/* Item one is entirely before item two. */
@@ -167,6 +173,28 @@ parse_memmap(char *p, unsigned long long *start, unsigned 
long long *size)
return -EINVAL;
 }

+static int parse_immovable_mem(char *p,
+  unsigned long long *start,
+  unsigned long long *size)
+{
+   char *oldp;
+
+   if (!p)
+   return -EINVAL;
+
+   oldp = p;
+   *size = memparse(p, );
+   if (p == oldp)
+   return -EINVAL;
+
+   if (*p == '@') {
+   *start = memparse(p + 1, );
+   return 0;
+   }
+


Here you don't handle the case where @ss[KMG] is omitted.


Yes, will add. Many thanks.




+   return -EINVAL;
+}
+
 static void mem_avoid_memmap(char *str)
 {
static int i;
@@ -206,6 +234,36 @@ static void mem_avoid_memmap(char *str)
memmap_too_large = true;
 }

+#ifdef CONFIG_MEMORY_HOTPLUG
+static void mem_mark_immovable(char *str)
+{
+   int i = 0;
+


You already have num_immovable_region; 'i' is useless, just remove it.


Using num_immovable_region makes the code too long. Using i is
clearer and keeps the lines shorter than 80 characters.


Oh, God, that's horrific. Did you find that your code is wrong?

num_immovable_region will be reset each time.

Thanks,
dou.






+   while (str && (i < MAX_IMMOVABLE_MEM)) {
+   int rc;
+   unsigned long long start, size;
+   char *k = strchr(str, ',');
+


Why do you put this definition here? IMO, moving it out is better.


+   if (k)
+   *k++ = 0;
+
+   rc = parse_immovable_mem(str, , );
+   if (rc < 0)
+   break;
+   str = k;
+
+   immovable_mem[i].start = start;
+   immovable_mem[i].size = size;
+   i++;


Replace it with num_immovable_region


ditto...
Why do you care this little variable so much.




+   }
+   num_immovable_region = i;


Just remove it.


ditto.




+}
+#else
+static inline void mem_mark_immovable(char *str)
+{
+}
+#endif
+
 static int handle_mem_memmap(void)
 {
char *args = (char *)get_cmd_line_ptr();
@@ -214,7 +272,8 @@ static int handle_mem_memmap(void)
char *param, *val;
u64 mem_size;

-   if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+   if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+   !strstr(args, "movable_node="))
return 0;

tmp_cmdline = malloc(len + 1);
@@ -239,6 +298,8 @@ static int handle_mem_memmap(void)

if (!strcmp(param, "memmap")) {
mem_avoid_memmap(val);
+   } else if (!strcmp(param, "movable_node")) {
+   mem_mark_immovable(val);


AFAIK, handle_mem_memmap() is invoked from mem_avoid_init(), which is
used to mark memory to avoid. But here the value for immovable nodes is
memory you want to mark and use, so it would be better to split it out.


This is existing and useful code, so it's better to reuse it than to
rewrite it.



BTW, using movable_node to specify the memory of immovable nodes is
strange and confusing. How about adding a new command-line option?



I have added the 

Re: [PATCH 2/4] kaslr: select the memory region in immovable node to process

2017-10-19 Thread Chao Fan
On Fri, Oct 20, 2017 at 11:17:26AM +0800, Dou Liyang wrote:
>Hi Chao
>
Hi Dou-san,

>At 10/19/2017 06:02 PM, Chao Fan wrote:
>> The relationship between e820/EFI entries and the memory regions in
>> immovable_mem varies: one memory region in a node may contain several
>> e820/EFI entries, and one e820/EFI entry may span memory in different
>> nodes. So pass the intersection of the two as the region given to
>> process_mem_region(). This may split one node or one entry into
>> several regions.
>> 
>> Signed-off-by: Chao Fan 
>> ---
>>  arch/x86/boot/compressed/kaslr.c | 72 
>> 
>>  1 file changed, 58 insertions(+), 14 deletions(-)
>> 
>> diff --git a/arch/x86/boot/compressed/kaslr.c 
>> b/arch/x86/boot/compressed/kaslr.c
>> index 3c1f5204693b..22330cbe8515 100644
>> --- a/arch/x86/boot/compressed/kaslr.c
>> +++ b/arch/x86/boot/compressed/kaslr.c
>> @@ -563,6 +563,7 @@ static void process_mem_region(struct mem_vector *entry,
>>  end = min(entry->size + entry->start, mem_limit);
>>  if (entry->start >= end)
>>  return;
>> +
>>  cur_entry.start = entry->start;
>>  cur_entry.size = end - entry->start;
>
>Above code has nothing to do with this patch. remove it.
>
>> 
>> @@ -621,6 +622,52 @@ static void process_mem_region(struct mem_vector *entry,
>>  }
>>  }
>> 
>> +static bool select_immovable_node(unsigned long long start,
>> +  unsigned long long size,
>> +  unsigned long long minimum,
>> +  unsigned long long image_size)
>> +{
>> +struct mem_vector region;
>> +int i;
>> +
>> +if (num_immovable_region == 0) {
>
>Seems it more better:
>
>#ifdef CONFIG_MEMORY_HOTPLUG
>for (i = 0; i < num_immovable_region; i++) {
>  ...
>}
>#else
>...
>process_mem_region(, minimum, image_size);
>...
>#endif

No. If CONFIG_MEMORY_HOTPLUG=y (the default in Fedora) but movable_node
is not used, we should go straight to process_mem_region(), right?
With your change we would still enter the for (...) loop even when
num_immovable_region == 0. So checking num_immovable_region == 0 also
covers the CONFIG_MEMORY_HOTPLUG=n case.

>
>> +region.start = start;
>> +region.size = size;
>> +process_mem_region(, minimum, image_size);
>> +
>> +if (slot_area_index == MAX_SLOT_AREA) {
>> +debug_putstr("Aborted memmap scan (slot_areas 
>> full)!\n");
>> +return 1;
>> +}
>> +} else {
>> +for (i = 0; i < num_immovable_region; i++) {
>> +unsigned long long end, select_end;
>> +unsigned long long region_start, region_end;
>> +
>> +end = start + size - 1;
>> +region_start = immovable_mem[i].start;
>> +region_end = region_start + immovable_mem[i].size - 1;
>> +
>> +if (end < region_start || start > region_end)
>> +continue;
>> +
>> +region.start = start > region_start ?
>> +   start : region_start;
>> +select_end = end > region_end ? region_end : end;
>> +
>> +region.size = select_end - region.start + 1;
>> +
>> +process_mem_region(, minimum, image_size);
>> +
>> +if (slot_area_index == MAX_SLOT_AREA) {
>> +debug_putstr("Aborted memmap scan (slot_areas 
>> full)!\n");
>> +return 1;
>> +}
>> +}
>> +}
>> +return 0;
>> +}
>> +
>>  #ifdef CONFIG_EFI
>>  /*
>>   * Returns true if mirror region found (and must have been processed
>> @@ -631,7 +678,6 @@ process_efi_entries(unsigned long minimum, unsigned long 
>> image_size)
>>  {
>>  struct efi_info *e = _params->efi_info;
>>  bool efi_mirror_found = false;
>> -struct mem_vector region;
>>  efi_memory_desc_t *md;
>>  unsigned long pmap;
>>  char *signature;
>> @@ -664,6 +710,8 @@ process_efi_entries(unsigned long minimum, unsigned long 
>> image_size)
>>  }
>> 
>>  for (i = 0; i < nr_desc; i++) {
>> +unsigned long long start, size;
>> +
>>  md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);
>> 
>>  /*
>> @@ -684,13 +732,11 @@ process_efi_entries(unsigned long minimum, unsigned 
>> long image_size)
>>  !(md->attribute & EFI_MEMORY_MORE_RELIABLE))
>>  continue;
>> 
>> -region.start = md->phys_addr;
>> -region.size = md->num_pages << EFI_PAGE_SHIFT;
>> -process_mem_region(, minimum, image_size);
>> -if (slot_area_index == MAX_SLOT_AREA) {
>> -debug_putstr("Aborted EFI scan (slot_areas full)!\n");
>> +start = 

[RFC PATCH 4/5] gpio: gpiolib: Add sysfs support for maintaining GPIO values on reset

2017-10-19 Thread Andrew Jeffery
Expose a new 'maintain' sysfs attribute to control both suspend and
reset tolerance.

Signed-off-by: Andrew Jeffery 
---
 Documentation/gpio/sysfs.txt |  9 +
 drivers/gpio/gpiolib-sysfs.c | 88 ++--
 2 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/Documentation/gpio/sysfs.txt b/Documentation/gpio/sysfs.txt
index aeab01aa4d00..f447f0746884 100644
--- a/Documentation/gpio/sysfs.txt
+++ b/Documentation/gpio/sysfs.txt
@@ -96,6 +96,15 @@ and have the following read/write attributes:
for "rising" and "falling" edges will follow this
setting.
 
+   "maintain" ... displays and controls whether the state of the GPIO is
+   maintained or lost on suspend or reset. The valid values take
+   the following meanings:
+
+   0: Do not maintain state on either suspend or reset
+   1: Maintain state for suspend only
+   2: Maintain state for reset only
+   3: Maintain state for both suspend and reset
+
 GPIO controllers have paths like /sys/class/gpio/gpiochip42/ (for the
 controller implementing GPIOs starting at #42) and have the following
 read-only attributes:
diff --git a/drivers/gpio/gpiolib-sysfs.c b/drivers/gpio/gpiolib-sysfs.c
index 3f454eaf2101..bfa186e73e26 100644
--- a/drivers/gpio/gpiolib-sysfs.c
+++ b/drivers/gpio/gpiolib-sysfs.c
@@ -289,6 +289,74 @@ static ssize_t edge_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(edge);
 
+#define GPIOLIB_SYSFS_MAINTAIN_SUSPEND BIT(0)
+#define GPIOLIB_SYSFS_MAINTAIN_RESET   BIT(1)
+#define GPIOLIB_SYSFS_MAINTAIN_ALL GENMASK(1, 0)
+static ssize_t maintain_show(struct device *dev, struct device_attribute *attr,
+char *buf)
+{
+   struct gpiod_data *data = dev_get_drvdata(dev);
+   ssize_t status = 0;
+   int val = 0;
+
+   mutex_lock(>mutex);
+
+   if (!test_bit(FLAG_SLEEP_MAY_LOSE_VALUE, >desc->flags))
+   val |= GPIOLIB_SYSFS_MAINTAIN_SUSPEND;
+
+   if (test_bit(FLAG_RESET_TOLERANT, >desc->flags))
+   val |= GPIOLIB_SYSFS_MAINTAIN_RESET;
+
+   status = sprintf(buf, "%d\n", val);
+
+   mutex_unlock(>mutex);
+
+   return status;
+}
+
+static ssize_t maintain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t size)
+{
+   struct gpiod_data *data = dev_get_drvdata(dev);
+   struct gpio_chip *chip;
+   ssize_t status;
+   long provided;
+
+   mutex_lock(>mutex);
+
+   chip = data->desc->gdev->chip;
+
+   if (!chip->set_config) {
+   status = -ENOTSUPP;
+   goto out;
+   }
+
+   status = kstrtol(buf, 0, );
+   if (status < 0)
+   goto out;
+
+   if (provided & ~GPIOLIB_SYSFS_MAINTAIN_ALL) {
+   status = -EINVAL;
+   goto out;
+   }
+
+   if (!(provided & GPIOLIB_SYSFS_MAINTAIN_SUSPEND))
+   set_bit(FLAG_SLEEP_MAY_LOSE_VALUE, >desc->flags);
+   else
+   clear_bit(FLAG_SLEEP_MAY_LOSE_VALUE,
+ >desc->flags);
+
+   /* Configure reset tolerance */
+   status = gpiod_set_reset_tolerant(data->desc,
+   !!(provided & GPIOLIB_SYSFS_MAINTAIN_RESET));
+out:
+   mutex_unlock(>mutex);
+
+   return status ? : size;
+
+}
+static DEVICE_ATTR_RW(maintain);
+
 /* Caller holds gpiod-data mutex. */
 static int gpio_sysfs_set_active_low(struct device *dev, int value)
 {
@@ -378,6 +446,7 @@ static struct attribute *gpio_attrs[] = {
_attr_edge.attr,
_attr_value.attr,
_attr_active_low.attr,
+   _attr_maintain.attr,
NULL,
 };
 
@@ -474,11 +543,22 @@ static ssize_t export_store(struct class *class,
status = -ENODEV;
goto done;
}
-   status = gpiod_export(desc, true);
-   if (status < 0)
+
+   /*
+* If userspace is requesting the GPIO via sysfs, make them explicitly
+* configure reset tolerance each time by unconditionally disabling it
+* here, as the export and configuration steps are not atomic.
+*/
+   status = gpiod_set_reset_tolerant(desc, false);
+   if (status < 0) {
gpiod_free(desc);
-   else
-   set_bit(FLAG_SYSFS, >flags);
+   } else {
+   status = gpiod_export(desc, true);
+   if (status < 0)
+   gpiod_free(desc);
+   else
+   set_bit(FLAG_SYSFS, >flags);
+   }
 
 done:
if (status)
-- 
2.11.0



[RFC PATCH 4/5] gpio: gpiolib: Add sysfs support for maintaining GPIO values on reset

2017-10-19 Thread Andrew Jeffery
Expose a new 'maintain' sysfs attribute to control both suspend and
reset tolerance.

Signed-off-by: Andrew Jeffery 
---
 Documentation/gpio/sysfs.txt |  9 +
 drivers/gpio/gpiolib-sysfs.c | 88 ++--
 2 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/Documentation/gpio/sysfs.txt b/Documentation/gpio/sysfs.txt
index aeab01aa4d00..f447f0746884 100644
--- a/Documentation/gpio/sysfs.txt
+++ b/Documentation/gpio/sysfs.txt
@@ -96,6 +96,15 @@ and have the following read/write attributes:
for "rising" and "falling" edges will follow this
setting.
 
+   "maintain" ... displays and controls whether the state of the GPIO is
+   maintained or lost over a suspend or reset. The valid values have
+   the following meanings:
+
+   0: Do not maintain state on either suspend or reset
+   1: Maintain state for suspend only
+   2: Maintain state for reset only
+   3: Maintain state for both suspend and reset
+
 GPIO controllers have paths like /sys/class/gpio/gpiochip42/ (for the
 controller implementing GPIOs starting at #42) and have the following
 read-only attributes:
diff --git a/drivers/gpio/gpiolib-sysfs.c b/drivers/gpio/gpiolib-sysfs.c
index 3f454eaf2101..bfa186e73e26 100644
--- a/drivers/gpio/gpiolib-sysfs.c
+++ b/drivers/gpio/gpiolib-sysfs.c
@@ -289,6 +289,74 @@ static ssize_t edge_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(edge);
 
+#define GPIOLIB_SYSFS_MAINTAIN_SUSPEND BIT(0)
+#define GPIOLIB_SYSFS_MAINTAIN_RESET   BIT(1)
+#define GPIOLIB_SYSFS_MAINTAIN_ALL GENMASK(1, 0)
+static ssize_t maintain_show(struct device *dev, struct device_attribute *attr,
+char *buf)
+{
+   struct gpiod_data *data = dev_get_drvdata(dev);
+   ssize_t status = 0;
+   int val = 0;
+
+   mutex_lock(&data->mutex);
+
+   if (!test_bit(FLAG_SLEEP_MAY_LOSE_VALUE, &data->desc->flags))
+   val |= GPIOLIB_SYSFS_MAINTAIN_SUSPEND;
+
+   if (test_bit(FLAG_RESET_TOLERANT, &data->desc->flags))
+   val |= GPIOLIB_SYSFS_MAINTAIN_RESET;
+
+   status = sprintf(buf, "%d\n", val);
+
+   mutex_unlock(&data->mutex);
+
+   return status;
+}
+
+static ssize_t maintain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t size)
+{
+   struct gpiod_data *data = dev_get_drvdata(dev);
+   struct gpio_chip *chip;
+   ssize_t status;
+   long provided;
+
+   mutex_lock(&data->mutex);
+
+   chip = data->desc->gdev->chip;
+
+   if (!chip->set_config) {
+   status = -ENOTSUPP;
+   goto out;
+   }
+
+   status = kstrtol(buf, 0, &provided);
+   if (status < 0)
+   goto out;
+
+   if (provided & ~GPIOLIB_SYSFS_MAINTAIN_ALL) {
+   status = -EINVAL;
+   goto out;
+   }
+
+   if (!(provided & GPIOLIB_SYSFS_MAINTAIN_SUSPEND))
+   set_bit(FLAG_SLEEP_MAY_LOSE_VALUE, &data->desc->flags);
+   else
+   clear_bit(FLAG_SLEEP_MAY_LOSE_VALUE,
+ &data->desc->flags);
+
+   /* Configure reset tolerance */
+   status = gpiod_set_reset_tolerant(data->desc,
+   !!(provided & GPIOLIB_SYSFS_MAINTAIN_RESET));
+out:
+   mutex_unlock(&data->mutex);
+
+   return status ? : size;
+}
+static DEVICE_ATTR_RW(maintain);
+
 /* Caller holds gpiod-data mutex. */
 static int gpio_sysfs_set_active_low(struct device *dev, int value)
 {
@@ -378,6 +446,7 @@ static struct attribute *gpio_attrs[] = {
   &dev_attr_edge.attr,
   &dev_attr_value.attr,
   &dev_attr_active_low.attr,
+   &dev_attr_maintain.attr,
NULL,
 };
 
@@ -474,11 +543,22 @@ static ssize_t export_store(struct class *class,
status = -ENODEV;
goto done;
}
-   status = gpiod_export(desc, true);
-   if (status < 0)
+
+   /*
+* If userspace is requesting the GPIO via sysfs, make them explicitly
+* configure reset tolerance each time by unconditionally disabling it
+* here, as the export and configuration steps are not atomic.
+*/
+   status = gpiod_set_reset_tolerant(desc, false);
+   if (status < 0) {
gpiod_free(desc);
-   else
-   set_bit(FLAG_SYSFS, &desc->flags);
+   } else {
+   status = gpiod_export(desc, true);
+   if (status < 0)
+   gpiod_free(desc);
+   else
+   set_bit(FLAG_SYSFS, &desc->flags);
+   }
 
 done:
if (status)
-- 
2.11.0



[RFC PATCH 5/5] gpio: aspeed: Add support for reset tolerance

2017-10-19 Thread Andrew Jeffery
Use the new pinconf parameter for reset tolerance to expose the
associated capability of the Aspeed GPIO controller.

Signed-off-by: Andrew Jeffery 
---
 drivers/gpio/gpio-aspeed.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-aspeed.c b/drivers/gpio/gpio-aspeed.c
index bfc53995064a..0492cd917178 100644
--- a/drivers/gpio/gpio-aspeed.c
+++ b/drivers/gpio/gpio-aspeed.c
@@ -60,6 +60,7 @@ struct aspeed_gpio_bank {
uint16_tval_regs;
uint16_tirq_regs;
uint16_tdebounce_regs;
+   uint16_ttolerance_regs;
const char  names[4][3];
 };
 
@@ -70,48 +71,56 @@ static const struct aspeed_gpio_bank aspeed_gpio_banks[] = {
.val_regs = 0x0000,
.irq_regs = 0x0008,
.debounce_regs = 0x0040,
+   .tolerance_regs = 0x001c,
.names = { "A", "B", "C", "D" },
},
{
.val_regs = 0x0020,
.irq_regs = 0x0028,
.debounce_regs = 0x0048,
+   .tolerance_regs = 0x003c,
.names = { "E", "F", "G", "H" },
},
{
.val_regs = 0x0070,
.irq_regs = 0x0098,
.debounce_regs = 0x00b0,
+   .tolerance_regs = 0x00ac,
.names = { "I", "J", "K", "L" },
},
{
.val_regs = 0x0078,
.irq_regs = 0x00e8,
.debounce_regs = 0x0100,
+   .tolerance_regs = 0x00fc,
.names = { "M", "N", "O", "P" },
},
{
.val_regs = 0x0080,
.irq_regs = 0x0118,
.debounce_regs = 0x0130,
+   .tolerance_regs = 0x012c,
.names = { "Q", "R", "S", "T" },
},
{
.val_regs = 0x0088,
.irq_regs = 0x0148,
.debounce_regs = 0x0160,
+   .tolerance_regs = 0x015c,
.names = { "U", "V", "W", "X" },
},
{
.val_regs = 0x01E0,
.irq_regs = 0x0178,
.debounce_regs = 0x0190,
+   .tolerance_regs = 0x018c,
.names = { "Y", "Z", "AA", "AB" },
},
{
-   .val_regs = 0x01E8,
-   .irq_regs = 0x01A8,
+   .val_regs = 0x01e8,
+   .irq_regs = 0x01a8,
.debounce_regs = 0x01c0,
+   .tolerance_regs = 0x01bc,
.names = { "AC", "", "", "" },
},
 };
@@ -531,6 +540,30 @@ static int aspeed_gpio_setup_irqs(struct aspeed_gpio *gpio,
return 0;
 }
 
+static int aspeed_gpio_reset_tolerance(struct gpio_chip *chip,
+   unsigned int offset, bool enable)
+{
+   struct aspeed_gpio *gpio = gpiochip_get_data(chip);
+   const struct aspeed_gpio_bank *bank;
+   unsigned long flags;
+   u32 val;
+
+   bank = to_bank(offset);
+
+   spin_lock_irqsave(&gpio->lock, flags);
+   val = readl(gpio->base + bank->tolerance_regs);
+
+   if (enable)
+   val |= GPIO_BIT(offset);
+   else
+   val &= ~GPIO_BIT(offset);
+
+   writel(val, gpio->base + bank->tolerance_regs);
+   spin_unlock_irqrestore(&gpio->lock, flags);
+
+   return 0;
+}
+
 static int aspeed_gpio_request(struct gpio_chip *chip, unsigned int offset)
 {
if (!have_gpio(gpiochip_get_data(chip), offset))
@@ -768,6 +801,8 @@ static int aspeed_gpio_set_config(struct gpio_chip *chip, unsigned int offset,
param == PIN_CONFIG_DRIVE_OPEN_SOURCE)
/* Return -ENOTSUPP to trigger emulation, as per datasheet */
return -ENOTSUPP;
+   else if (param == PIN_CONFIG_RESET_TOLERANT)
+   return aspeed_gpio_reset_tolerance(chip, offset, arg);
 
return -ENOTSUPP;
 }
-- 
2.11.0



[RFC PATCH 3/5] gpio: gpiolib: Add chardev support for maintaining GPIO values on reset

2017-10-19 Thread Andrew Jeffery
Similar to devicetree support, add flags and mappings to expose reset
tolerance configuration through the chardev interface.

Signed-off-by: Andrew Jeffery 
---
 drivers/gpio/gpiolib.c| 14 +-
 include/uapi/linux/gpio.h | 11 ++-
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 6b4c5df10e84..442ee5ceee08 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -357,7 +357,8 @@ struct linehandle_state {
GPIOHANDLE_REQUEST_OUTPUT | \
GPIOHANDLE_REQUEST_ACTIVE_LOW | \
GPIOHANDLE_REQUEST_OPEN_DRAIN | \
-   GPIOHANDLE_REQUEST_OPEN_SOURCE)
+   GPIOHANDLE_REQUEST_OPEN_SOURCE | \
+   GPIOHANDLE_REQUEST_RESET_TOLERANT)
 
 static long linehandle_ioctl(struct file *filep, unsigned int cmd,
 unsigned long arg)
@@ -498,6 +499,17 @@ static int linehandle_create(struct gpio_device *gdev, void __user *ip)
set_bit(FLAG_OPEN_SOURCE, &desc->flags);
 
/*
+* Unconditionally configure reset tolerance, as it's possible
+* that the tolerance flag itself becomes tolerant to resets.
+* Thus it could remain set from a previous environment, but
+* the current environment may not expect it to be set.
+*/
+   ret = gpiod_set_reset_tolerant(desc,
+   !!(lflags & GPIOHANDLE_REQUEST_RESET_TOLERANT));
+   if (ret < 0)
+   goto out_free_descs;
+
+   /*
 * Lines have to be requested explicitly for input
 * or output, else the line will be treated "as is".
 */
diff --git a/include/uapi/linux/gpio.h b/include/uapi/linux/gpio.h
index 333d3544c964..1b1ce1af8653 100644
--- a/include/uapi/linux/gpio.h
+++ b/include/uapi/linux/gpio.h
@@ -56,11 +56,12 @@ struct gpioline_info {
 #define GPIOHANDLES_MAX 64
 
 /* Linerequest flags */
-#define GPIOHANDLE_REQUEST_INPUT   (1UL << 0)
-#define GPIOHANDLE_REQUEST_OUTPUT  (1UL << 1)
-#define GPIOHANDLE_REQUEST_ACTIVE_LOW  (1UL << 2)
-#define GPIOHANDLE_REQUEST_OPEN_DRAIN  (1UL << 3)
-#define GPIOHANDLE_REQUEST_OPEN_SOURCE (1UL << 4)
+#define GPIOHANDLE_REQUEST_INPUT   (1UL << 0)
+#define GPIOHANDLE_REQUEST_OUTPUT  (1UL << 1)
+#define GPIOHANDLE_REQUEST_ACTIVE_LOW  (1UL << 2)
+#define GPIOHANDLE_REQUEST_OPEN_DRAIN  (1UL << 3)
+#define GPIOHANDLE_REQUEST_OPEN_SOURCE (1UL << 4)
+#define GPIOHANDLE_REQUEST_RESET_TOLERANT  (1UL << 5)
 
 /**
  * struct gpiohandle_request - Information about a GPIO handle request
-- 
2.11.0



[RFC PATCH 2/5] gpio: gpiolib: Add OF support for maintaining GPIO values on reset

2017-10-19 Thread Andrew Jeffery
Add flags and the associated flag mappings between interfaces to enable
GPIO reset tolerance to be specified via devicetree.

Signed-off-by: Andrew Jeffery 
---
 drivers/gpio/gpiolib-of.c   | 2 ++
 drivers/gpio/gpiolib.c  | 5 +
 include/dt-bindings/gpio/gpio.h | 4 
 include/linux/of_gpio.h | 1 +
 4 files changed, 12 insertions(+)

diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c
index e0d59e61b52f..4a268ba52998 100644
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -155,6 +155,8 @@ struct gpio_desc *of_find_gpio(struct device *dev, const char *con_id,
 
if (of_flags & OF_GPIO_SLEEP_MAY_LOSE_VALUE)
*flags |= GPIO_SLEEP_MAY_LOSE_VALUE;
+   if (of_flags & OF_GPIO_RESET_TOLERANT)
+   *flags |= GPIO_RESET_TOLERANT;
 
return desc;
 }
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index d9dc7e588699..6b4c5df10e84 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -3434,6 +3434,7 @@ struct gpio_desc *fwnode_get_named_gpiod(struct fwnode_handle *fwnode,
bool active_low = false;
bool single_ended = false;
bool open_drain = false;
+   bool reset_tolerant = false;
int ret;
 
if (!fwnode)
@@ -3448,6 +3449,7 @@ struct gpio_desc *fwnode_get_named_gpiod(struct fwnode_handle *fwnode,
active_low = flags & OF_GPIO_ACTIVE_LOW;
single_ended = flags & OF_GPIO_SINGLE_ENDED;
open_drain = flags & OF_GPIO_OPEN_DRAIN;
+   reset_tolerant = flags & OF_GPIO_RESET_TOLERANT;
}
} else if (is_acpi_node(fwnode)) {
struct acpi_gpio_info info;
@@ -3478,6 +3480,9 @@ struct gpio_desc *fwnode_get_named_gpiod(struct fwnode_handle *fwnode,
lflags |= GPIO_OPEN_SOURCE;
}
 
+   if (reset_tolerant)
+   lflags |= GPIO_RESET_TOLERANT;
+
ret = gpiod_configure_flags(desc, propname, lflags, dflags);
if (ret < 0) {
gpiod_put(desc);
diff --git a/include/dt-bindings/gpio/gpio.h b/include/dt-bindings/gpio/gpio.h
index 70de5b7a6c9b..01c75d9e308e 100644
--- a/include/dt-bindings/gpio/gpio.h
+++ b/include/dt-bindings/gpio/gpio.h
@@ -32,4 +32,8 @@
 #define GPIO_SLEEP_MAINTAIN_VALUE 0
 #define GPIO_SLEEP_MAY_LOSE_VALUE 8
 
+/* Bit 4 expresses GPIO persistence on reset */
+#define GPIO_RESET_INTOLERANT 0
+#define GPIO_RESET_TOLERANT 16
+
 #endif
diff --git a/include/linux/of_gpio.h b/include/linux/of_gpio.h
index 1fe205582111..9b34737706a7 100644
--- a/include/linux/of_gpio.h
+++ b/include/linux/of_gpio.h
@@ -32,6 +32,7 @@ enum of_gpio_flags {
OF_GPIO_SINGLE_ENDED = 0x2,
OF_GPIO_OPEN_DRAIN = 0x4,
OF_GPIO_SLEEP_MAY_LOSE_VALUE = 0x8,
+   OF_GPIO_RESET_TOLERANT = 0x10,
 };
 
 #ifdef CONFIG_OF_GPIO
-- 
2.11.0



[RFC PATCH 1/5] gpio: gpiolib: Add core support for maintaining GPIO values on reset

2017-10-19 Thread Andrew Jeffery
GPIO state reset tolerance is implemented in gpiolib through the
addition of a new pinconf parameter. With that, some renaming of helpers
is done to clarify the scope of the already existing
gpiochip_line_is_persistent(), as it's now ambiguous as to whether that
means on suspend, reset or both. This in turn impacts gpio-arizona, but
not in any complicated way.

This change lays the groundwork for implementing reset tolerance support
in all of the external interfaces that can influence GPIOs.

Signed-off-by: Andrew Jeffery 
---
 drivers/gpio/gpio-arizona.c |  4 +--
 drivers/gpio/gpiolib.c  | 55 +++--
 drivers/gpio/gpiolib.h  |  1 +
 include/linux/gpio/consumer.h   |  9 ++
 include/linux/gpio/driver.h |  5 ++-
 include/linux/gpio/machine.h|  2 ++
 include/linux/pinctrl/pinconf-generic.h |  2 ++
 7 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/drivers/gpio/gpio-arizona.c b/drivers/gpio/gpio-arizona.c
index d4e6ba0301bc..d3fe23569811 100644
--- a/drivers/gpio/gpio-arizona.c
+++ b/drivers/gpio/gpio-arizona.c
@@ -33,7 +33,7 @@ static int arizona_gpio_direction_in(struct gpio_chip *chip, unsigned offset)
 {
struct arizona_gpio *arizona_gpio = gpiochip_get_data(chip);
struct arizona *arizona = arizona_gpio->arizona;
-   bool persistent = gpiochip_line_is_persistent(chip, offset);
+   bool persistent = gpiochip_line_is_persistent_suspend(chip, offset);
bool change;
int ret;
 
@@ -99,7 +99,7 @@ static int arizona_gpio_direction_out(struct gpio_chip *chip,
 {
struct arizona_gpio *arizona_gpio = gpiochip_get_data(chip);
struct arizona *arizona = arizona_gpio->arizona;
-   bool persistent = gpiochip_line_is_persistent(chip, offset);
+   bool persistent = gpiochip_line_is_persistent_suspend(chip, offset);
unsigned int val;
int ret;
 
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index a56b29fd8bb1..d9dc7e588699 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2414,6 +2414,40 @@ int gpiod_set_debounce(struct gpio_desc *desc, unsigned debounce)
 EXPORT_SYMBOL_GPL(gpiod_set_debounce);
 
 /**
+ * gpiod_set_reset_tolerant - Hold state across controller reset
+ * @desc: descriptor of the GPIO for which to set reset tolerance
+ * @tolerant: True to hold state across a controller reset, false otherwise
+ *
+ * Returns:
+ * 0 on success, %-ENOTSUPP if the controller doesn't support setting the
+ * reset tolerance or less than zero on other failures.
+ */
+int gpiod_set_reset_tolerant(struct gpio_desc *desc, bool tolerant)
+{
+   struct gpio_chip *chip;
+   unsigned long packed;
+   int rc;
+
+   chip = desc->gdev->chip;
+   if (!chip->set_config)
+   return -ENOTSUPP;
+
+   packed = pinconf_to_config_packed(PIN_CONFIG_RESET_TOLERANT, tolerant);
+
+   rc = chip->set_config(chip, gpio_chip_hwgpio(desc), packed);
+   if (rc < 0)
+   return rc;
+
+   if (tolerant)
+   set_bit(FLAG_RESET_TOLERANT, &desc->flags);
+   else
+   clear_bit(FLAG_RESET_TOLERANT, &desc->flags);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(gpiod_set_reset_tolerant);
+
+/**
  * gpiod_is_active_low - test whether a GPIO is active-low or not
  * @desc: the gpio descriptor to test
  *
@@ -2885,7 +2919,8 @@ bool gpiochip_line_is_open_source(struct gpio_chip *chip, unsigned int offset)
 }
 EXPORT_SYMBOL_GPL(gpiochip_line_is_open_source);
 
-bool gpiochip_line_is_persistent(struct gpio_chip *chip, unsigned int offset)
+bool gpiochip_line_is_persistent_suspend(struct gpio_chip *chip,
+unsigned int offset)
 {
if (offset >= chip->ngpio)
return false;
@@ -2893,7 +2928,18 @@ bool gpiochip_line_is_persistent(struct gpio_chip *chip, unsigned int offset)
return !test_bit(FLAG_SLEEP_MAY_LOSE_VALUE,
 &chip->gpiodev->descs[offset].flags);
 }
-EXPORT_SYMBOL_GPL(gpiochip_line_is_persistent);
+EXPORT_SYMBOL_GPL(gpiochip_line_is_persistent_suspend);
+
+bool gpiochip_line_is_persistent_reset(struct gpio_chip *chip,
+  unsigned int offset)
+{
+   if (offset >= chip->ngpio)
+   return false;
+
+   return test_bit(FLAG_RESET_TOLERANT,
+   &chip->gpiodev->descs[offset].flags);
+}
+EXPORT_SYMBOL_GPL(gpiochip_line_is_persistent_reset);
 
 /**
  * gpiod_get_raw_value_cansleep() - return a gpio's raw value
@@ -3271,6 +3317,11 @@ int gpiod_configure_flags(struct gpio_desc *desc, const char *con_id,
if (lflags & GPIO_SLEEP_MAY_LOSE_VALUE)
set_bit(FLAG_SLEEP_MAY_LOSE_VALUE, &desc->flags);
 
+   status = gpiod_set_reset_tolerant(desc,
+ !!(lflags & GPIO_RESET_TOLERANT));
+   if (status < 0)
+   return status;
+
/* No particular flag request, 

[RFC PATCH 0/5] gpio: Expose reset tolerance capability

2017-10-19 Thread Andrew Jeffery
Hello,

This series exposes a "reset tolerant" property for GPIOs. For example, the
controller implemented in Aspeed BMCs provides such a feature to allow the BMC
to be reset whilst maintaining necessary state to keep host systems alive or
status LEDs intact.

I'm sending it as an RFC because I'm not sure using pinconf is the right way
to go about it, or that expanding the sysfs interface is a good idea, or that
the approach taken is right in the context of the existing suspend support.
pinconf just ended up being a convenient abstraction whilst supporting the
sysfs change, and didn't feel unreasonable to use for devicetree or the chardev
interface either. My concern with using pinconf is that the reset-tolerant
property is (currently) GPIO-centric, but maybe that's not a worry.

So the patches in the series support configuring the property via devicetree,
the chardev interface and the sysfs interface. The sysfs interface also exposes
the ability to configure the suspend tolerance, though there are some ordering
requirements with respect to setting the direction (the suspend tolerance will
only take effect if configured before setting the pin direction on the Arizona
controller).

Please review!

Cheers,

Andrew

Andrew Jeffery (5):
  gpio: gpiolib: Add core support for maintaining GPIO values on reset
  gpio: gpiolib: Add OF support for maintaining GPIO values on reset
  gpio: gpiolib: Add chardev support for maintaining GPIO values on
reset
  gpio: gpiolib: Add sysfs support for maintaining GPIO values on reset
  gpio: aspeed: Add support for reset tolerance

 Documentation/gpio/sysfs.txt|  9 
 drivers/gpio/gpio-arizona.c |  4 +-
 drivers/gpio/gpio-aspeed.c  | 39 ++-
 drivers/gpio/gpiolib-of.c   |  2 +
 drivers/gpio/gpiolib-sysfs.c| 88 +++--
 drivers/gpio/gpiolib.c  | 74 +--
 drivers/gpio/gpiolib.h  |  1 +
 include/dt-bindings/gpio/gpio.h |  4 ++
 include/linux/gpio/consumer.h   |  9 
 include/linux/gpio/driver.h |  5 +-
 include/linux/gpio/machine.h|  2 +
 include/linux/of_gpio.h |  1 +
 include/linux/pinctrl/pinconf-generic.h |  2 +
 include/uapi/linux/gpio.h   | 11 +++--
 14 files changed, 234 insertions(+), 17 deletions(-)

-- 
2.11.0



[RFC PATCH 0/5] gpio: Expose reset tolerance capability

2017-10-19 Thread Andrew Jeffery
Hello,

This series exposes a "reset tolerant" property for GPIOs. For example, the
controller implemented in Aspeed BMCs provides such a feature to allow the BMC
to be reset whilst maintaining necessary state to keep host systems alive or
status LEDs intact.

I'm sending it as an RFC because I'm not sure using pinconf is the right way
to go about it, or that expanding the sysfs interface is a good idea, or that
the approach taken is right in the context of the existing suspend support.
pinconf just ended up being a convenient abstraction whilst supporting the
sysfs change, and didn't feel unreasonable to use for devicetree or the chardev
interface either. My concern with using pinconf is that the reset-tolerant
property is (currently) GPIO-centric, but maybe that's not a worry.

So the patches in the series support configuring the property via devicetree,
the chardev interface and the sysfs interface. The sysfs interface also exposes
the ability to configure the suspend tolerance, though there are some ordering
requirements with respect to setting the direction (the suspend tolerance will
only take effect if configured before setting the pin direction on the Arizona
controller).
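For a rough picture of the devicetree side, a consumer could request the behaviour with a GPIO flag. The flag name below is hypothetical, chosen only for illustration; the real macro is whatever the series adds to include/dt-bindings/gpio/gpio.h:

```dts
/*
 * Hypothetical fragment: GPIO_RESET_TOLERANT is an assumed name for
 * illustration only, not the binding the series actually defines.
 */
leds {
	compatible = "gpio-leds";

	power {
		gpios = <&gpio 5 (GPIO_ACTIVE_HIGH | GPIO_RESET_TOLERANT)>;
	};
};
```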

Please review!

Cheers,

Andrew

Andrew Jeffery (5):
  gpio: gpiolib: Add core support for maintaining GPIO values on reset
  gpio: gpiolib: Add OF support for maintaining GPIO values on reset
  gpio: gpiolib: Add chardev support for maintaining GPIO values on
reset
  gpio: gpiolib: Add sysfs support for maintaining GPIO values on reset
  gpio: aspeed: Add support for reset tolerance

 Documentation/gpio/sysfs.txt|  9 
 drivers/gpio/gpio-arizona.c |  4 +-
 drivers/gpio/gpio-aspeed.c  | 39 ++-
 drivers/gpio/gpiolib-of.c   |  2 +
 drivers/gpio/gpiolib-sysfs.c| 88 +++--
 drivers/gpio/gpiolib.c  | 74 +--
 drivers/gpio/gpiolib.h  |  1 +
 include/dt-bindings/gpio/gpio.h |  4 ++
 include/linux/gpio/consumer.h   |  9 
 include/linux/gpio/driver.h |  5 +-
 include/linux/gpio/machine.h|  2 +
 include/linux/of_gpio.h |  1 +
 include/linux/pinctrl/pinconf-generic.h |  2 +
 include/uapi/linux/gpio.h   | 11 +++--
 14 files changed, 234 insertions(+), 17 deletions(-)

-- 
2.11.0



Re: [PATCH 0/4] kaslr: extend movable_node to movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Dou Liyang

Hi Chao,

At 10/20/2017 10:53 AM, Chao Fan wrote:

On Fri, Oct 20, 2017 at 10:37:52AM +0800, Dou Liyang wrote:

Hi Chao,


Hi Dou-san,


Cheer! I have some concerns below.


Thanks for your reply.



At 10/19/2017 06:02 PM, Chao Fan wrote:

Here is the problem:
Consider a machine with several NUMA nodes, some of which are hot-pluggable.
It's not good for the kernel to be extracted into the memory region of a
movable node. But with the current code, printing the address chosen by kaslr
shows that it may sometimes be placed in a movable node. To solve this, it's
better to limit the memory regions chosen by kaslr to immovable nodes in
kaslr.c. But the information about whether memory is hot-pluggable is stored
in the ACPI SRAT table, which is parsed only after the kernel is extracted,
so we can't get detailed memory information before extracting the kernel.

So extend movable_node to movable_node=nn@ss, in which nn means the size of
a memory region in an *immovable* node, and ss means the start position of
that region. Then limit kaslr to choosing memory within these regions.
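As a rough illustration of the proposed syntax, the nn[KMG]@ss[KMG] form could be parsed along these lines. This is a userspace sketch with a simplified stand-in for the kernel's memparse(), not the actual boot-time code:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified stand-in for the kernel's memparse(): parse a number
 * with an optional K/M/G suffix, advancing *retptr past what was
 * consumed. Illustration only.
 */
static unsigned long long mini_memparse(const char *s, char **retptr)
{
	unsigned long long v = strtoull(s, retptr, 0);

	switch (**retptr) {
	case 'G': case 'g': v <<= 30; (*retptr)++; break;
	case 'M': case 'm': v <<= 20; (*retptr)++; break;
	case 'K': case 'k': v <<= 10; (*retptr)++; break;
	}
	return v;
}

/*
 * Parse one "nn[KMG]@ss[KMG]" region, mirroring the shape of the
 * patch's parse_immovable_mem(). Returns 0 on success, -1 on
 * malformed input (the @ss part is required here, as in the patch).
 */
static int parse_region(const char *p, unsigned long long *start,
			unsigned long long *size)
{
	char *end;

	if (!p)
		return -1;

	*size = mini_memparse(p, &end);
	if (end == p)
		return -1;

	if (*end == '@') {
		*start = mini_memparse(end + 1, &end);
		return 0;
	}
	return -1;
}
```

As Dou notes below, a real implementation would also want to handle the case where @ss[KMG] is omitted.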


Yes, great. Here we should remember that the 'movable_node=nn@ss' situation
is rare; the normal situation is 'movable_node=nn'.

So we should bias our code toward the normal situation. ;-)


Yes, that's the normal case. But you cannot guarantee the special situation
will never happen. If it does happen, we should make sure the code works
well, right?

We cannot assume that the movable nodes are contiguous, and even if the
movable nodes are contiguous, we cannot assume their memory addresses are
contiguous.

It is easy to avoid the memory regions in movable nodes. But if we can
handle more special situations, and at the same time make the kernel safer,
why not?


You misunderstand my point. I mean that when we code, we need to understand
the problem clearly and know which parts of the code will be executed most
often.

We should make the code better suited to the normal situation without
affecting correctness in the special ones.

Just like:

likely() and unlikely()

I guessed you hadn't considered that, which is why I mentioned it.







There are two policies:
1. Specify the memory regions in *movable* nodes to avoid:
   Then we can use the existing mem_avoid handling. But if the memory in
   one movable node is separated by a memory hole, or different movable
   nodes are discontiguous, we don't know how many regions need to be
   avoided.


It is not a problem.

As you said, we should provide an interface for users later, like this:

# cat /sys/devices/system/memory/movable_node
nn@ss



Both are OK. I think outputting the memory regions of either movable or
immovable nodes is reasonable, so an interface for either method would be
useful. Once we decide which policy kaslr uses, we can then add the /sys
interface.



Actually, I prefer the first one. Are you ready to post the patches
for the first policy?

Thanks,
dou.

Thanks,
Chao Fan



Thanks,
dou.

   OTOH, we must avoid all of the movable memory; otherwise, kaslr may
   choose the wrong place.
2. Specify the memory regions in *immovable* nodes to select:
   Only 4 regions are supported in this parameter. That lets the user
   specify at least two nodes for kaslr to choose from, which is enough
   for extracting the kernel. At the same time, since we need only 4 new
   mem_vector entries, the memory usage here is not too big.
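For reference, the mem_avoid-style overlap test that policy 1 relies on boils down to a check like the one below, a userspace sketch mirroring the mem_overlaps() helper quoted elsewhere in the thread:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified copy of the mem_vector used in kaslr.c. */
struct mem_vector {
	unsigned long long start;
	unsigned long long size;
};

/*
 * Two half-open regions [start, start + size) overlap unless one
 * ends at or before the point where the other begins.
 */
static bool mem_overlaps(const struct mem_vector *one,
			 const struct mem_vector *two)
{
	/* Item one is entirely before item two. */
	if (one->start + one->size <= two->start)
		return false;
	/* Item one is entirely after item two. */
	if (one->start >= two->start + two->size)
		return false;
	return true;
}
```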

PATCH 1/4 parses the extended movable_node=nn[KMG]@ss[KMG] and stores
  the memory regions.
PATCH 2/4 selects the memory regions in immovable nodes when processing
  the memmap.
PATCH 3/4 updates the documentation.
PATCH 4/4 cleans up some small problems.

Chao Fan (4):
  kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]
  kaslr: select the memory region in immovable node to process
  document: change the document for the extended movable_node
  kaslr: clean up a useless variable and some usless space

 Documentation/admin-guide/kernel-parameters.txt |   9 ++
 arch/x86/boot/compressed/kaslr.c| 140 +---
 2 files changed, 131 insertions(+), 18 deletions(-)









Re: [VFS PATCH] constify more dcache.h inlined helpers.

2017-10-19 Thread Al Viro
On Fri, Oct 20, 2017 at 11:41:17AM +1100, NeilBrown wrote:
> On Wed, Aug 02 2017, NeilBrown wrote:
> 
> > Many of the inlines in dcache.h were changed to accept
> > const struct pointers in commit f0d3b3ded999 ("constify dcache.c
> > inlined helpers where possible").
> > This patch allows 'const' in a couple that were added since then.
> >
> > Signed-off-by: NeilBrown 
> 
> Ping ... should I be sending this somewhere else?  Is there a problem
> with it?

Nothing, just slipped through the cracks; July and August had been
insane (move from nc.us to ma.us after well over a decade in one
apartment; 'nuff said...)

Applied.
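For context, the kind of constification being applied can be illustrated with stand-in types (these are not the real dcache.h definitions): an inline helper that only reads from the structure can take a const pointer, so callers holding a const pointer may use it too.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct dentry; illustration only. */
struct example_dentry {
	unsigned int d_flags;
};

#define EXAMPLE_MANAGED_DENTRY 0x1

/*
 * Because the helper never writes through the pointer, accepting
 * 'const struct example_dentry *' lets const-qualified callers use
 * it without casts.
 */
static inline bool example_managed(const struct example_dentry *dentry)
{
	return dentry->d_flags & EXAMPLE_MANAGED_DENTRY;
}
```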



Re: [PATCH 1/4] kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Chao Fan
On Fri, Oct 20, 2017 at 11:04:42AM +0800, Dou Liyang wrote:
>Hi Chao,
>
Hi Dou-san,

>At 10/19/2017 06:02 PM, Chao Fan wrote:
>> Extend the movable_node to movable_node=nn[KMG]@ss[KMG].
>> In the current code, kaslr may choose a memory region in a hot-pluggable
>> node, so allow specifying regions in immovable nodes, and store those
>> regions in immovable_mem.
>> 
>
>I guess you may mean this:
>
>In current Linux with KASLR, the kernel may choose a memory region in a
>movable node for extracting the kernel code, which makes that node
>impossible to hot-remove.
>
>Solve it by choosing only regions in immovable nodes. So create
>immovable_mem to store those regions, and choose memory only from the
>immovable_mem array.
>

Thanks for the explanation.

>> Multiple regions can be specified, comma-delimited.
>> Considering memory usage, only 4 regions are supported.
>> 4 regions cover at least 2 nodes, which is enough for the kernel to
>> extract into.
>> 
>> Signed-off-by: Chao Fan 
>> ---
>>  arch/x86/boot/compressed/kaslr.c | 63 
>> +++-
>>  1 file changed, 62 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/boot/compressed/kaslr.c 
>> b/arch/x86/boot/compressed/kaslr.c
>> index 17818ba6906f..3c1f5204693b 100644
>> --- a/arch/x86/boot/compressed/kaslr.c
>> +++ b/arch/x86/boot/compressed/kaslr.c
>> @@ -107,6 +107,12 @@ enum mem_avoid_index {
>> 
>>  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>> 
>> +/* Only supporting at most 4 immovable memory regions with kaslr */
>> +#define MAX_IMMOVABLE_MEM   4
>> +
>> +static struct mem_vector immovable_mem[MAX_IMMOVABLE_MEM];
>> +static int num_immovable_region;
>> +
>>  static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>>  {
>>  /* Item one is entirely before item two. */
>> @@ -167,6 +173,28 @@ parse_memmap(char *p, unsigned long long *start, 
>> unsigned long long *size)
>>  return -EINVAL;
>>  }
>> 
>> +static int parse_immovable_mem(char *p,
>> +   unsigned long long *start,
>> +   unsigned long long *size)
>> +{
>> +char *oldp;
>> +
>> +if (!p)
>> +return -EINVAL;
>> +
>> +oldp = p;
>> +*size = memparse(p, &p);
>> +if (p == oldp)
>> +return -EINVAL;
>> +
>> +if (*p == '@') {
>> +*start = memparse(p + 1, &p);
>> +return 0;
>> +}
>> +
>
>Here you don't consider that if @ss[KMG] is omitted.

Yes, will add. Many thanks.

>
>> +return -EINVAL;
>> +}
>> +
>>  static void mem_avoid_memmap(char *str)
>>  {
>>  static int i;
>> @@ -206,6 +234,36 @@ static void mem_avoid_memmap(char *str)
>>  memmap_too_large = true;
>>  }
>> 
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +static void mem_mark_immovable(char *str)
>> +{
>> +int i = 0;
>> +
>
>you have use num_immovable_region, 'i' is useless. just remove it.

Using num_immovable_region makes the code too long. Using i is clearer
and keeps lines shorter than 80 characters.

>
>> +while (str && (i < MAX_IMMOVABLE_MEM)) {
>> +int rc;
>> +unsigned long long start, size;
>> +char *k = strchr(str, ',');
>> +
>
>Why do you put this definition here? IMO, moving it out is better.
>
>> +if (k)
>> +*k++ = 0;
>> +
>> +rc = parse_immovable_mem(str, &start, &size);
>> +if (rc < 0)
>> +break;
>> +str = k;
>> +
>> +immovable_mem[i].start = start;
>> +immovable_mem[i].size = size;
>> +i++;
>
>Replace it with num_immovable_region

Ditto...
Why do you care so much about this little variable?

>
>> +}
>> +num_immovable_region = i;
>
>Just remove it.

ditto.

>
>> +}
>> +#else
>> +static inline void mem_mark_immovable(char *str)
>> +{
>> +}
>> +#endif
>> +
>>  static int handle_mem_memmap(void)
>>  {
>>  char *args = (char *)get_cmd_line_ptr();
>> @@ -214,7 +272,8 @@ static int handle_mem_memmap(void)
>>  char *param, *val;
>>  u64 mem_size;
>> 
>> -if (!strstr(args, "memmap=") && !strstr(args, "mem="))
>> +if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
>> +!strstr(args, "movable_node="))
>>  return 0;
>> 
>>  tmp_cmdline = malloc(len + 1);
>> @@ -239,6 +298,8 @@ static int handle_mem_memmap(void)
>> 
>>  if (!strcmp(param, "memmap")) {
>>  mem_avoid_memmap(val);
>> +} else if (!strcmp(param, "movable_node")) {
>> +mem_mark_immovable(val);
>
>AFAIK, handle_mem_memmap() is invoked in mem_avoid_init(), which is used to
>avoid mem. But, here the value of immovable node is the memory
>you want to mark and use, it is better that we split it out.

This is existing and useful code, so it's better to reuse it than to
rewrite it.

>
>BTW, using movable_node to store the memory of immovable nodes is
>strange and confusing. How about adding a new command option?
>

I have 

Re: [PATCH 4/4] kaslr: clean up a useless variable and some usless space

2017-10-19 Thread Dou Liyang

Hi Chao,

At 10/19/2017 06:02 PM, Chao Fan wrote:

There are two variables named "rc" in this function: one inside the loop
and one outside. The outer one is never used, so drop it.

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/kaslr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 22330cbe8515..8a33ed82fd0b 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -198,7 +198,6 @@ static int parse_immovable_mem(char *p,
 static void mem_avoid_memmap(char *str)
 {
static int i;
-   int rc;

if (i >= MAX_MEMMAP_REGIONS)
return;


Seems it is redundant too,

Thanks,
dou.


@@ -277,7 +276,7 @@ static int handle_mem_memmap(void)
return 0;

tmp_cmdline = malloc(len + 1);
-   if (!tmp_cmdline )
+   if (!tmp_cmdline)
error("Failed to allocate space for tmp_cmdline");

memcpy(tmp_cmdline, args, len);
@@ -423,7 +422,7 @@ static void mem_avoid_init(unsigned long input, unsigned 
long input_size,
cmd_line |= boot_params->hdr.cmd_line_ptr;
/* Calculate size of cmd_line. */
ptr = (char *)(unsigned long)cmd_line;
-   for (cmd_line_size = 0; ptr[cmd_line_size++]; )
+   for (cmd_line_size = 0; ptr[cmd_line_size++];)
;
mem_avoid[MEM_AVOID_CMDLINE].start = cmd_line;
mem_avoid[MEM_AVOID_CMDLINE].size = cmd_line_size;







Re: [PATCH 2/4] kaslr: select the memory region in immovable node to process

2017-10-19 Thread Dou Liyang

Hi Chao

At 10/19/2017 06:02 PM, Chao Fan wrote:

The relationship between e820/EFI entries and the memory regions in
immovable_mem varies: one memory region in a node may contain several
e820/EFI entries, and one e820/EFI entry may span memory in different
nodes. So pass the intersection as the region given to
process_mem_region(). This may split one node or one entry into several
regions.
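The intersection step described here can be sketched in userspace as follows. This is illustrative only; the names differ from the patch's select_immovable_node():

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified copy of the mem_vector used in kaslr.c. */
struct mem_vector {
	unsigned long long start;
	unsigned long long size;
};

/*
 * Clamp an e820/EFI entry to one immovable region: write the part of
 * 'entry' that falls inside 'immovable' to 'out'. Returns false when
 * the two half-open ranges [start, start + size) do not intersect.
 */
static bool clamp_to_region(const struct mem_vector *entry,
			    const struct mem_vector *immovable,
			    struct mem_vector *out)
{
	unsigned long long e_end = entry->start + entry->size;
	unsigned long long r_end = immovable->start + immovable->size;

	if (e_end <= immovable->start || entry->start >= r_end)
		return false;

	out->start = entry->start > immovable->start ?
		     entry->start : immovable->start;
	out->size = (e_end < r_end ? e_end : r_end) - out->start;
	return true;
}
```

Looping this over every immovable region, and handing each non-empty result to process_mem_region(), is what splits one entry (or one node) into several regions.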

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/kaslr.c | 72 
 1 file changed, 58 insertions(+), 14 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 3c1f5204693b..22330cbe8515 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -563,6 +563,7 @@ static void process_mem_region(struct mem_vector *entry,
end = min(entry->size + entry->start, mem_limit);
if (entry->start >= end)
return;
+
cur_entry.start = entry->start;
cur_entry.size = end - entry->start;


Above code has nothing to do with this patch. remove it.



@@ -621,6 +622,52 @@ static void process_mem_region(struct mem_vector *entry,
}
 }

+static bool select_immovable_node(unsigned long long start,
+ unsigned long long size,
+ unsigned long long minimum,
+ unsigned long long image_size)
+{
+   struct mem_vector region;
+   int i;
+
+   if (num_immovable_region == 0) {


Seems it more better:

#ifdef CONFIG_MEMORY_HOTPLUG
for (i = 0; i < num_immovable_region; i++) {
  ...
}
#else
...
process_mem_region(&region, minimum, image_size);
...
#endif


+   region.start = start;
+   region.size = size;
+   process_mem_region(&region, minimum, image_size);
+
+   if (slot_area_index == MAX_SLOT_AREA) {
+   debug_putstr("Aborted memmap scan (slot_areas full)!\n");
+   return 1;
+   }
+   } else {
+   for (i = 0; i < num_immovable_region; i++) {
+   unsigned long long end, select_end;
+   unsigned long long region_start, region_end;
+
+   end = start + size - 1;
+   region_start = immovable_mem[i].start;
+   region_end = region_start + immovable_mem[i].size - 1;
+
+   if (end < region_start || start > region_end)
+   continue;
+
+   region.start = start > region_start ?
+  start : region_start;
+   select_end = end > region_end ? region_end : end;
+
+   region.size = select_end - region.start + 1;
+
+   process_mem_region(&region, minimum, image_size);
+
+   if (slot_area_index == MAX_SLOT_AREA) {
+   debug_putstr("Aborted memmap scan (slot_areas full)!\n");
+   return 1;
+   }
+   }
+   }
+   return 0;
+}
+
 #ifdef CONFIG_EFI
 /*
  * Returns true if mirror region found (and must have been processed
@@ -631,7 +678,6 @@ process_efi_entries(unsigned long minimum, unsigned long 
image_size)
 {
struct efi_info *e = &boot_params->efi_info;
bool efi_mirror_found = false;
-   struct mem_vector region;
efi_memory_desc_t *md;
unsigned long pmap;
char *signature;
@@ -664,6 +710,8 @@ process_efi_entries(unsigned long minimum, unsigned long 
image_size)
}

for (i = 0; i < nr_desc; i++) {
+   unsigned long long start, size;
+
md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);

/*
@@ -684,13 +732,11 @@ process_efi_entries(unsigned long minimum, unsigned long 
image_size)
!(md->attribute & EFI_MEMORY_MORE_RELIABLE))
continue;

-   region.start = md->phys_addr;
-   region.size = md->num_pages << EFI_PAGE_SHIFT;
-   process_mem_region(&region, minimum, image_size);
-   if (slot_area_index == MAX_SLOT_AREA) {
-   debug_putstr("Aborted EFI scan (slot_areas full)!\n");
+   start = md->phys_addr;
+   size = md->num_pages << EFI_PAGE_SHIFT;
+
+   if (select_immovable_node(start, size, minimum, image_size))


Why you replace the region with two parameters? Just use region.


break;
-   }
}
return true;
 }
@@ -706,22 +752,20 @@ static void process_e820_entries(unsigned long minimum,
 unsigned long image_size)
 {
int i;
-   struct mem_vector region;
struct boot_e820_entry *entry;

/* Verify potential e820 positions, appending to slots list. */

Re: [RFC PATCH] fs: fsnotify: account fsnotify metadata to kmemcg

2017-10-19 Thread Amir Goldstein
On Fri, Oct 20, 2017 at 12:20 AM, Yang Shi  wrote:
> We observed that some misbehaving user applications might silently consume a
> significant amount of fsnotify slabs. It'd be better to account those slabs to
> kmemcg so that we can get a heads-up before misbehaving applications use too
> much memory silently.

In what way do they misbehave? Create a lot of marks? Create a lot of events?
Not reading events in their queue?
The latter case is more interesting:

Process A is the one that asked to get the events.
Process B is the one that is generating the events and queuing them on
the queue that is owned by process A, who is also to blame if the queue
is not being read.

So why should process B be held accountable for memory pressure
caused by, say, an FAN_UNLIMITED_QUEUE that process A created and
doesn't read from?

Is it possible to get an explicit reference to the memcg's events cache
at fsnotify_group creation time, store it in the group struct and then allocate
events from the event cache associated with the group (the listener) rather
than the cache associated with the task generating the event?

Amir.
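The allocation policy suggested here, taking a reference to the listener's memcg when the group is created and charging every queued event to it, can be sketched with a toy userspace model (Python; the names are illustrative only, and a real kernel change would have to pin a memcg reference in struct fsnotify_group):

```python
class Group:
    """Toy fsnotify group: queued events are charged to the owner
    (the listener that created the group), never to the producer."""
    def __init__(self, owner, charges):
        self.owner = owner
        self.charges = charges        # shared "memcg" accounting table
        self.queue = []

    def queue_event(self, event):
        # Charge the listener's account, regardless of who produced it.
        self.charges[self.owner] = self.charges.get(self.owner, 0) + 1
        self.queue.append(event)

charges = {}
g = Group("listener-A", charges)      # process A creates the group
for i in range(3):                    # process B generates the events
    g.queue_event(f"ev{i}")
print(charges)                        # {'listener-A': 3}
```

The point of the model: the producer loop never touches the accounting key, so an unread FAN_UNLIMITED_QUEUE shows up against the listener that owns it.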

>
> Signed-off-by: Yang Shi 
> ---
>  fs/notify/dnotify/dnotify.c| 4 ++--
>  fs/notify/fanotify/fanotify_user.c | 6 +++---
>  fs/notify/fsnotify.c   | 2 +-
>  fs/notify/inotify/inotify_user.c   | 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
> index cba3283..3ec6233 100644
> --- a/fs/notify/dnotify/dnotify.c
> +++ b/fs/notify/dnotify/dnotify.c
> @@ -379,8 +379,8 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
>
>  static int __init dnotify_init(void)
>  {
> -   dnotify_struct_cache = KMEM_CACHE(dnotify_struct, SLAB_PANIC);
> -   dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC);
> +   dnotify_struct_cache = KMEM_CACHE(dnotify_struct, SLAB_PANIC|SLAB_ACCOUNT);
> +   dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
>
> dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops);
> if (IS_ERR(dnotify_group))
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 907a481..7d62dee 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -947,11 +947,11 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group,
>   */
>  static int __init fanotify_user_setup(void)
>  {
> -   fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC);
> -   fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC);
> +   fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
> +   fanotify_event_cachep = KMEM_CACHE(fanotify_event_info, SLAB_PANIC|SLAB_ACCOUNT);
>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> fanotify_perm_event_cachep = KMEM_CACHE(fanotify_perm_event_info,
> -   SLAB_PANIC);
> +   SLAB_PANIC|SLAB_ACCOUNT);
>  #endif
>
> return 0;
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 0c4583b..82620ac 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -386,7 +386,7 @@ static __init int fsnotify_init(void)
> panic("initializing fsnotify_mark_srcu");
>
> fsnotify_mark_connector_cachep = KMEM_CACHE(fsnotify_mark_connector,
> -   SLAB_PANIC);
> +   SLAB_PANIC|SLAB_ACCOUNT);
>
> return 0;
>  }
> diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
> index 7cc7d3f..57b32ff 100644
> --- a/fs/notify/inotify/inotify_user.c
> +++ b/fs/notify/inotify/inotify_user.c
> @@ -785,7 +785,7 @@ static int __init inotify_user_setup(void)
>
> BUG_ON(hweight32(ALL_INOTIFY_BITS) != 21);
>
> -   inotify_inode_mark_cachep = KMEM_CACHE(inotify_inode_mark, SLAB_PANIC);
> +   inotify_inode_mark_cachep = KMEM_CACHE(inotify_inode_mark, SLAB_PANIC|SLAB_ACCOUNT);
>
> inotify_max_queued_events = 16384;
> init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
> --
> 1.8.3.1
>



Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()

2017-10-19 Thread Wei Wei
Sry. Here it is.

Unable to handle kernel paging request at virtual address 80005bfb81ed
Mem abort info:
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x0033
CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000
[80005bfb81ed] *pgd=beff7003, *pud=00e88711
Internal error: Oops: 9621 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3
Hardware name: linux,dummy-virt (DT)
task: 800074409e00 task.stack: 800033db
PC is at __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 
(discriminator 4) /net/core/skbuff.c:873 (discriminator 4)) 
LR is at __skb_clone (/net/core/skbuff.c:861 (discriminator 4)) 
pc : lr : pstate: 1145
 
sp : 800033db33d0
x29: 800033db33d0 x28: 298ac378 
x27: 16a860e1 x26: 167b66b6 
x25: 8000743340a0 x24: 800035430708 
x23: 80005bfb80c9 x22: 800035430710 
x21: 0380 x20: 800035430640 
x19: 8000354312c0 x18:  
x17: 004af000 x16: 2845e8c8 
x15: 1e518060 x14: d8316070 
x13: d8316090 x12:  
x11: 16a8626f x10: 16a8626f 
x9 : dfff2000 x8 : 0082009000900608 
x7 :  x6 : 800035431380 
x5 : 16a86270 x4 :  
x3 : 16a86273 x2 :  
x1 : 0100 x0 : 80005bfb81ed 
Process syz-executor0 (pid: 4725, stack limit = 0x800033db)
Call trace:
Exception stack(0x800033db3290 to 0x800033db33d0)
3280:   80005bfb81ed 0100
32a0:  16a86273  16a86270
32c0: 800035431380  0082009000900608 dfff2000
32e0: 16a8626f 16a8626f  d8316090
3300: d8316070 1e518060 2845e8c8 004af000
3320:  8000354312c0 800035430640 0380
3340: 800035430710 80005bfb80c9 800035430708 8000743340a0
3360: 167b66b6 16a860e1 298ac378 800033db33d0
3380: 29705cfc 800033db33d0 29705f50 1145
33a0: 8000354312c0 800035430640 0001 800074334000
33c0: 800033db33d0 29705f50
__skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 (discriminator 4) 
/net/core/skbuff.c:873 (discriminator 4)) 
skb_clone (/net/core/skbuff.c:1286) 
arp_rcv (/./include/linux/skbuff.h:1518 /net/ipv4/arp.c:946) 
__netif_receive_skb_core (/net/core/dev.c:1859 /net/core/dev.c:1874 
/net/core/dev.c:4416) 
__netif_receive_skb (/net/core/dev.c:4466) 
netif_receive_skb_internal (/net/core/dev.c:4539) 
netif_receive_skb (/net/core/dev.c:4564) 
tun_get_user (/./include/linux/bottom_half.h:31 /drivers/net/tun.c:1219 
/drivers/net/tun.c:1553) 
tun_chr_write_iter (/drivers/net/tun.c:1579) 
do_iter_readv_writev (/./include/linux/fs.h:1770 /fs/read_write.c:673) 
do_iter_write (/fs/read_write.c:952) 
vfs_writev (/fs/read_write.c:997) 
do_writev (/fs/read_write.c:1032) 
SyS_writev (/fs/read_write.c:1102) 
Exception stack(0x800033db3ec0 to 0x800033db4000)
3ec0: 0015 829985e0 0001 8299851c
3ee0: 82999068 82998f60 82999650 
3f00: 0042 0036 00406608 82998400
3f20: 82998f60 d8316090 d8316070 1e518060
3f40:  004af000  0036
3f60: 20004fca 2000 0046ccf0 0530
3f80: 0046cce8 004ade98  395fa6f0
3fa0: 82998f60 82998560 00431448 82998520
3fc0: 0043145c 8000 0015 0042
3fe0:    
el0_svc_naked (/arch/arm64/kernel/entry.S:853) 
Code: f9406680 8b01 91009000 f9800011 (885f7c01) 
All code

   0:   80 66 40 f9 andb   $0xf9,0x40(%rsi)
   4:   00 00   add%al,(%rax)
   6:   01 8b 00 90 00 91   add%ecx,-0x6eff7000(%rbx)
   c:   11 00   adc%eax,(%rax)
   e:   80 f9 01cmp$0x1,%cl
  11:   7c 5f   jl 0x72
  13:*  88 00   mov%al,(%rax)   <-- trapping 
instruction
  15:   00 00   add%al,(%rax)
...

Code starting with the faulting instruction
===
   0:   01 7c 5f 88 add%edi,-0x78(%rdi,%rbx,2)
   4:   00 00   add%al,(%rax)
...
---[ end trace 261e7ac1458ccc0a ]---

Thanks,
Wei

> On 19 Oct 2017, at 10:53 PM, Eric Dumazet  wrote:
> 
> On Thu, Oct 19, 2017 at 7:16 PM, Wei Wei  wrote:


Re: [PATCH 1/4] kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Dou Liyang

Hi Chao,

At 10/19/2017 06:02 PM, Chao Fan wrote:

Extend the movable_node to movable_node=nn[KMG]@ss[KMG].
Since in the current code, kaslr may choose a memory region in hot-pluggable
nodes, we can specify regions in immovable nodes, and store those
regions in immovable_mem.



I guess you may mean that:

In current Linux with KASLR, the kernel may choose a memory region in a
movable node for extracting kernel code, which makes that node
impossible to hot-remove.

Solve it by specifying only regions in immovable nodes. So create
immovable_mem to store the regions of immovable nodes, and only choose
memory from the immovable_mem array.


Multiple regions can be specified, comma-delimited.
Considering memory usage, only 4 regions are supported.
4 regions cover at least 2 nodes, which is enough for the kernel to
extract.

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/kaslr.c | 63 +++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 17818ba6906f..3c1f5204693b 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -107,6 +107,12 @@ enum mem_avoid_index {

 static struct mem_vector mem_avoid[MEM_AVOID_MAX];

+/* Only supporting at most 4 immovable memory regions with kaslr */
+#define MAX_IMMOVABLE_MEM  4
+
+static struct mem_vector immovable_mem[MAX_IMMOVABLE_MEM];
+static int num_immovable_region;
+
 static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 {
/* Item one is entirely before item two. */
@@ -167,6 +173,28 @@ parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
return -EINVAL;
 }

+static int parse_immovable_mem(char *p,
+  unsigned long long *start,
+  unsigned long long *size)
+{
+   char *oldp;
+
+   if (!p)
+   return -EINVAL;
+
+   oldp = p;
+   *size = memparse(p, &p);
+   if (p == oldp)
+   return -EINVAL;
+
+   if (*p == '@') {
+   *start = memparse(p + 1, &p);
+   return 0;
+   }
+


Here you don't consider the case where @ss[KMG] is omitted.


+   return -EINVAL;
+}
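For reference, the nn[KMG]@ss[KMG] grammar can be sketched in userspace Python. Defaulting the start to 0 when @ss is omitted is an assumption addressing the comment above, not what the patch does:

```python
SUFFIX = {'K': 1 << 10, 'M': 1 << 20, 'G': 1 << 30}

def memparse(s):
    """Parse nn[KMG] like the kernel's memparse(); return (value, rest)."""
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    if i == 0:
        raise ValueError("no digits in %r" % s)
    val = int(s[:i])
    if i < len(s) and s[i].upper() in SUFFIX:
        val *= SUFFIX[s[i].upper()]
        i += 1
    return val, s[i:]

def parse_immovable_mem(arg):
    """Parse 'nn[KMG]@ss[KMG],...' into (start, size) pairs.
    Unlike the patch above, a missing '@ss' is taken as start=0."""
    out = []
    for tok in arg.split(','):
        size, rest = memparse(tok)
        start = memparse(rest[1:])[0] if rest.startswith('@') else 0
        out.append((start, size))
    return out

print(parse_immovable_mem("1G@4G,512M"))
# [(4294967296, 1073741824), (0, 536870912)]
```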
+
 static void mem_avoid_memmap(char *str)
 {
static int i;
@@ -206,6 +234,36 @@ static void mem_avoid_memmap(char *str)
memmap_too_large = true;
 }

+#ifdef CONFIG_MEMORY_HOTPLUG
+static void mem_mark_immovable(char *str)
+{
+   int i = 0;
+


You already have num_immovable_region; 'i' is useless, just remove it.


+   while (str && (i < MAX_IMMOVABLE_MEM)) {
+   int rc;
+   unsigned long long start, size;
+   char *k = strchr(str, ',');
+


Why do you put this definition here? IMO, moving it out is better.


+   if (k)
+   *k++ = 0;
+
+   rc = parse_immovable_mem(str, &start, &size);
+   if (rc < 0)
+   break;
+   str = k;
+
+   immovable_mem[i].start = start;
+   immovable_mem[i].size = size;
+   i++;


Replace it with num_immovable_region


+   }
+   num_immovable_region = i;


Just remove it.


+}
+#else
+static inline void mem_mark_immovable(char *str)
+{
+}
+#endif
+
 static int handle_mem_memmap(void)
 {
char *args = (char *)get_cmd_line_ptr();
@@ -214,7 +272,8 @@ static int handle_mem_memmap(void)
char *param, *val;
u64 mem_size;

-   if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+   if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+   !strstr(args, "movable_node="))
return 0;

tmp_cmdline = malloc(len + 1);
@@ -239,6 +298,8 @@ static int handle_mem_memmap(void)

if (!strcmp(param, "memmap")) {
mem_avoid_memmap(val);
+   } else if (!strcmp(param, "movable_node")) {
+   mem_mark_immovable(val);


AFAIK, handle_mem_memmap() is invoked in mem_avoid_init(), which is used
to mark memory to avoid. But here the value for the immovable node is memory
you want to mark and use, so it is better to split it out.

BTW, using movable_node to store the memory of immovable nodes is
strange and confuses me. How about adding a new command-line option?

Thanks,
dou.

} else if (!strcmp(param, "mem")) {
char *p = val;







[PATCH] gpio: Fix loose spelling

2017-10-19 Thread Andrew Jeffery
Literally.

I expect "lose" was meant here, rather than "loose", though you could feasibly
use a somewhat uncommon definition of "loose" to mean what would be meant by
"lose": "Loose the hounds" for instance, as in "Release the hounds".
Substituting in "value" for "hounds" gives "release the value", and makes some
sense, but futher substituting back to loose gives "loose the value" which
overall just seems a bit anachronistic.

Instead, use modern, pragmatic English and save a character.

Cc: Russell Currey 
Signed-off-by: Andrew Jeffery 
---
 drivers/gpio/gpiolib-of.c   | 4 ++--
 drivers/gpio/gpiolib.c  | 6 +++---
 drivers/gpio/gpiolib.h  | 2 +-
 include/dt-bindings/gpio/gpio.h | 2 +-
 include/linux/gpio/machine.h| 2 +-
 include/linux/of_gpio.h | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c
index bfcd20699ec8..e0d59e61b52f 100644
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -153,8 +153,8 @@ struct gpio_desc *of_find_gpio(struct device *dev, const char *con_id,
*flags |= GPIO_OPEN_SOURCE;
}
 
-   if (of_flags & OF_GPIO_SLEEP_MAY_LOOSE_VALUE)
-   *flags |= GPIO_SLEEP_MAY_LOOSE_VALUE;
+   if (of_flags & OF_GPIO_SLEEP_MAY_LOSE_VALUE)
+   *flags |= GPIO_SLEEP_MAY_LOSE_VALUE;
 
return desc;
 }
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index eb80dac4e26a..a56b29fd8bb1 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2890,7 +2890,7 @@ bool gpiochip_line_is_persistent(struct gpio_chip *chip, unsigned int offset)
if (offset >= chip->ngpio)
return false;
 
-   return !test_bit(FLAG_SLEEP_MAY_LOOSE_VALUE,
+   return !test_bit(FLAG_SLEEP_MAY_LOSE_VALUE,
 &chip->gpiodev->descs[offset].flags);
 }
 EXPORT_SYMBOL_GPL(gpiochip_line_is_persistent);
@@ -3268,8 +3268,8 @@ int gpiod_configure_flags(struct gpio_desc *desc, const char *con_id,
set_bit(FLAG_OPEN_DRAIN, &desc->flags);
if (lflags & GPIO_OPEN_SOURCE)
set_bit(FLAG_OPEN_SOURCE, &desc->flags);
-   if (lflags & GPIO_SLEEP_MAY_LOOSE_VALUE)
-   set_bit(FLAG_SLEEP_MAY_LOOSE_VALUE, &desc->flags);
+   if (lflags & GPIO_SLEEP_MAY_LOSE_VALUE)
+   set_bit(FLAG_SLEEP_MAY_LOSE_VALUE, &desc->flags);
 
/* No particular flag request, return here... */
if (!(dflags & GPIOD_FLAGS_BIT_DIR_SET)) {
diff --git a/drivers/gpio/gpiolib.h b/drivers/gpio/gpiolib.h
index d003ccb12781..799208fc189a 100644
--- a/drivers/gpio/gpiolib.h
+++ b/drivers/gpio/gpiolib.h
@@ -201,7 +201,7 @@ struct gpio_desc {
 #define FLAG_OPEN_SOURCE 8 /* Gpio is open source type */
 #define FLAG_USED_AS_IRQ 9 /* GPIO is connected to an IRQ */
 #define FLAG_IS_HOGGED 11  /* GPIO is hogged */
-#define FLAG_SLEEP_MAY_LOOSE_VALUE 12  /* GPIO may loose value in sleep */
+#define FLAG_SLEEP_MAY_LOSE_VALUE 12   /* GPIO may lose value in sleep */
 
/* Connection label */
const char  *label;
diff --git a/include/dt-bindings/gpio/gpio.h b/include/dt-bindings/gpio/gpio.h
index c5074584561d..70de5b7a6c9b 100644
--- a/include/dt-bindings/gpio/gpio.h
+++ b/include/dt-bindings/gpio/gpio.h
@@ -30,6 +30,6 @@
 
 /* Bit 3 express GPIO suspend/resume persistence */
 #define GPIO_SLEEP_MAINTAIN_VALUE 0
-#define GPIO_SLEEP_MAY_LOOSE_VALUE 8
+#define GPIO_SLEEP_MAY_LOSE_VALUE 8
 
 #endif
diff --git a/include/linux/gpio/machine.h b/include/linux/gpio/machine.h
index ba4ccfd900f9..5e9f294c29eb 100644
--- a/include/linux/gpio/machine.h
+++ b/include/linux/gpio/machine.h
@@ -10,7 +10,7 @@ enum gpio_lookup_flags {
GPIO_OPEN_DRAIN = (1 << 1),
GPIO_OPEN_SOURCE = (1 << 2),
GPIO_SLEEP_MAINTAIN_VALUE = (0 << 3),
-   GPIO_SLEEP_MAY_LOOSE_VALUE = (1 << 3),
+   GPIO_SLEEP_MAY_LOSE_VALUE = (1 << 3),
 };
 
 /**
diff --git a/include/linux/of_gpio.h b/include/linux/of_gpio.h
index ca10f43564de..1fe205582111 100644
--- a/include/linux/of_gpio.h
+++ b/include/linux/of_gpio.h
@@ -31,7 +31,7 @@ enum of_gpio_flags {
OF_GPIO_ACTIVE_LOW = 0x1,
OF_GPIO_SINGLE_ENDED = 0x2,
OF_GPIO_OPEN_DRAIN = 0x4,
-   OF_GPIO_SLEEP_MAY_LOOSE_VALUE = 0x8,
+   OF_GPIO_SLEEP_MAY_LOSE_VALUE = 0x8,
 };
 
 #ifdef CONFIG_OF_GPIO
-- 
2.11.0





Re: [PATCH 0/4] kaslr: extend movable_node to movable_node=nn[KMG]@ss[KMG]

2017-10-19 Thread Chao Fan
On Fri, Oct 20, 2017 at 10:37:52AM +0800, Dou Liyang wrote:
>Hi Chao,
>
Hi Dou-san,

>Cheer! I have some concerns below.

Thanks for your reply.

>
>At 10/19/2017 06:02 PM, Chao Fan wrote:
>> Here is a problem:
>> Here is a machine with several NUMA nodes and some of them are hot-pluggable.
>> It's not good for the kernel to be extracted in the memory region of a
>> movable node. But in the current code, I print the address chosen by kaslr
>> and found it may be placed in a movable node sometimes. To solve this
>> problem, it's better to limit the memory regions chosen by kaslr to
>> immovable nodes in kaslr.c. But the memory information about whether it's
>> hot-pluggable is stored in the ACPI SRAT table, which is parsed after the
>> kernel is extracted. So we can't get the detailed memory information
>> before extracting the kernel.
>> 
>> So extend movable_node to movable_node=nn@ss, in which nn means the size
>> of memory in an *immovable* node, and ss means the start position of this
>> memory region. Then limit kaslr to choose memory in these regions.
>
>Yes, great. Here we should remember that the situation of
>'movable_node=nn@ss' is rare; the normal situation is 'movable_node=nn'.
>
>So, we should consider our code tendencies for the normal situation. ;-)

Yes, it's normal. But you cannot make sure the special situation will
never happen. If it happens, we can make sure the code works well, right?

We cannot make sure that the movable nodes are contiguous, and even if
the movable nodes are contiguous, we cannot make sure their memory
addresses are contiguous.

It is easy to avoid the memory region in a movable node. But if we can
handle more special situations, and at the same time make the kernel
safer, why not?

>
>> 
>> There are two policies:
>> 1. Specify the memory region in the *movable* node to avoid:
>>Then we can use the existing mem_avoid to handle it. But if the memory of
>>one movable node is separated by a memory hole, or different movable nodes
>>are discontiguous, we don't know how many regions need to be avoided.
>
>It is not a problem.
>
>As you said, we should provide an interface for users later, like that:
>
># cat /sys/device/system/memory/movable_node
>nn@ss
>

Both are OK. I think outputting the memory regions of either movable_node
or immovable_node is reasonable, so the interface for both methods will be
useful. And after we decide which policy is used in kaslr, we can add the
/sys interface.

Thanks,
Chao Fan

>
>Thanks,
>   dou.
>>OTOH, we must avoid all of the movable memory; otherwise, kaslr may
>>choose the wrong place.
>> 2. Specify the memory region in the *immovable* node to select:
>>Only support 4 regions in this parameter. Then users can use at least
>>two nodes for kaslr to choose from; that's enough for the kernel to
>>extract. At the same time, because we need only 4 new mem_vectors, the
>>memory usage here is not too big.
>> 
>> PATCH 1/4 parses the extended movable_node=nn[KMG]@ss[KMG], then
>>stores the memory regions.
>> PATCH 2/4 selects the memory region in the immovable node when processing
>>memmap.
>> PATCH 3/4 updates the documentation.
>> PATCH 4/4 cleans up some little problems.
>> 
>> Chao Fan (4):
>>   kaslr: parse the extended movable_node=nn[KMG]@ss[KMG]
>>   kaslr: select the memory region in immovable node to process
>>   document: change the document for the extended movable_node
>>   kaslr: clean up a useless variable and some useless space
>> 
>>  Documentation/admin-guide/kernel-parameters.txt |   9 ++
>>  arch/x86/boot/compressed/kaslr.c| 140 
>> +---
>>  2 files changed, 131 insertions(+), 18 deletions(-)
>> 




Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()

2017-10-19 Thread Eric Dumazet
On Thu, Oct 19, 2017 at 7:16 PM, Wei Wei  wrote:
> Hi all,
>
> I have fuzzed v4.14-rc3 using syzkaller and found a bug similar to that
> one [1]. But the call trace isn't the same. The atomic_inc() might be
> handling a corrupted sk_buff.
>
> The logs and config have been uploaded to my github repo [2].
>
> [1] https://lkml.org/lkml/2017/10/2/216
> [2] https://github.com/dotweiba/skb_clone_atomic_inc_bug
>
> Thanks,
> Wei
>
>  Unable to handle kernel paging request at virtual address 80005bfb81ed
>  Mem abort info:
>Exception class = DABT (current EL), IL = 32 bits
>SET = 0, FnV = 0
>EA = 0, S1PTW = 0
>  Data abort info:
>ISV = 0, ISS = 0x0033
>CM = 0, WnR = 0
>  swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000
>  [80005bfb81ed] *pgd=beff7003, *pud=00e88711
>  Internal error: Oops: 9621 [#1] PREEMPT SMP
>  Modules linked in:
>  CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3
>  Hardware name: linux,dummy-virt (DT)
>  task: 800074409e00 task.stack: 800033db
>  PC is at __skb_clone+0x430/0x5b0
>  LR is at __skb_clone+0x1dc/0x5b0
>  pc : [] lr : [] pstate: 1145
>  sp : 800033db33d0
>  x29: 800033db33d0 x28: 298ac378
>  x27: 16a860e1 x26: 167b66b6
>  x25: 8000743340a0 x24: 800035430708
>  x23: 80005bfb80c9 x22: 800035430710
>  x21: 0380 x20: 800035430640
>  x19: 8000354312c0 x18: 
>  x17: 004af000 x16: 2845e8c8
>  x15: 1e518060 x14: d8316070
>  x13: d8316090 x12: 
>  x11: 16a8626f x10: 16a8626f
>  x9 : dfff2000 x8 : 0082009000900608
>  x7 :  x6 : 800035431380
>  x5 : 16a86270 x4 : 
>  x3 : 16a86273 x2 : 
>  x1 : 0100 x0 : 80005bfb81ed
>  Process syz-executor0 (pid: 4725, stack limit = 0x800033db)
>  Call trace:
>  Exception stack(0x800033db3290 to 0x800033db33d0)
>  3280:   80005bfb81ed 0100
>  32a0:  16a86273  16a86270
>  32c0: 800035431380  0082009000900608 dfff2000
>  32e0: 16a8626f 16a8626f  d8316090
>  3300: d8316070 1e518060 2845e8c8 004af000
>  3320:  8000354312c0 800035430640 0380
>  3340: 800035430710 80005bfb80c9 800035430708 8000743340a0
>  3360: 167b66b6 16a860e1 298ac378 800033db33d0
>  3380: 29705cfc 800033db33d0 29705f50 1145
>  33a0: 8000354312c0 800035430640 0001 800074334000
>  33c0: 800033db33d0 29705f50
>  [] __skb_clone+0x430/0x5b0
>  [] skb_clone+0x164/0x2c8
>  [] arp_rcv+0x120/0x488
>  [] __netif_receive_skb_core+0x11e8/0x18c8
>  [] __netif_receive_skb+0x30/0x198
>  [] netif_receive_skb_internal+0x98/0x370
>  [] netif_receive_skb+0x1c/0x28
>  [] tun_get_user+0x12f0/0x2e40
>  [] tun_chr_write_iter+0xbc/0x140
>  [] do_iter_readv_writev+0x2d4/0x468
>  [] do_iter_write+0x148/0x498
>  [] vfs_writev+0x118/0x250
>  [] do_writev+0xc4/0x1e8
>  [] SyS_writev+0x34/0x48
>  Exception stack(0x800033db3ec0 to 0x800033db4000)
>  3ec0: 0015 829985e0 0001 8299851c
>  3ee0: 82999068 82998f60 82999650 
>  3f00: 0042 0036 00406608 82998400
>  3f20: 82998f60 d8316090 d8316070 1e518060
>  3f40:  004af000  0036
>  3f60: 20004fca 2000 0046ccf0 0530
>  3f80: 0046cce8 004ade98  395fa6f0
>  3fa0: 82998f60 82998560 00431448 82998520
>  3fc0: 0043145c 8000 0015 0042
>  3fe0:    
>  [] el0_svc_naked+0x24/0x28
>  Code: f9406680 8b01 91009000 f9800011 (885f7c01)
>  ---[ end trace 261e7ac1458ccc0a ]---

Please provide proper file:line information in this trace.

You can use scripts/decode_stacktrace.sh

Thanks.


[Part2 PATCH v6 00/38] x86: Secure Encrypted Virtualization (AMD)

2017-10-19 Thread Brijesh Singh
This part of the Secure Encrypted Virtualization (SEV) patch series focuses on
the KVM changes required to create and manage SEV guests.

SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have
their pages (code and data) secured such that only the guest itself has access
to the unencrypted version. Each encrypted VM is associated with a unique
encryption key; if its data is accessed by a different entity using a
different key, the encrypted guest's data will be incorrectly decrypted,
leading to unintelligible data. This security model ensures that the
hypervisor is no longer able to inspect or alter any guest code or data.

The key management of this feature is handled by a separate processor known as
the AMD Secure Processor (AMD-SP), which is present on AMD SoCs. The SEV Key
Management Specification (see below) provides a set of commands which can be
used by the hypervisor to load virtual machine keys through the AMD-SP driver.

The patch series adds a new ioctl to the KVM driver (KVM_MEMORY_ENCRYPT_OP).
The ioctl will be used by qemu to issue the SEV guest-specific commands defined
in the Key Management Specification.

The following links provide additional details:

AMD Memory Encryption whitepaper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

Secure Encrypted Virtualization Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf

SEV Guest BIOS support:
  SEV support has been integrated into the EDKII/OVMF BIOS
  https://github.com/tianocore/edk2

SEV Part 1 patch series: https://marc.info/?l=kvm=150816835817641=2

--
The series is based on kvm/master commit : cc9085b68753 (Merge branch 
'kvm-ppc-fixes')

Complete tree is available at:
repo: https://github.com/codomania/kvm.git
branch: sev-v6-p2

TODO:
* Add SEV guest migration command support

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Joerg Roedel 
Cc: Borislav Petkov 
Cc: Tom Lendacky 
Cc: Herbert Xu 
Cc: David S. Miller 
Cc: Gary Hook 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-cry...@vger.kernel.org

Changes since v5:
 * split the PSP driver support into multiple patches
 * multiple improvements from Boris
 * remove mem_enc_enabled() ops

Changes since v4:
 * Fixes to address kbuild robot errors
 * Add 'sev' module params to allow enable/disable SEV feature
 * Update documentation
 * Multiple fixes to address v4 feedbacks
 * Some coding style changes to address checkpatch reports

Changes since v3:
 * Re-design the PSP interface support patch
 * Rename the ioctls based on the feedbacks
 * Improve documentation
 * Fix i386 build issues
 * Add LAUNCH_SECRET command
 * Add new Kconfig option to enable SEV support
 * Changes to address v3 feedbacks.

Changes since v2:
 * Add KVM_MEMORY_ENCRYPT_REGISTER/UNREGISTER_RAM ioctl to register encrypted
   memory ranges (recommended by Paolo)
 * Extend kvm_x86_ops to provide new memory_encryption_enabled ops
 * Enhance DEBUG DECRYPT/ENCRYPT commands to work with more than one page \
(recommended by Paolo)
 * Optimize LAUNCH_UPDATE command to reduce the number of calls to AMD-SP driver
 * Changes to address v2 feedbacks


Borislav Petkov (1):
  crypto: ccp: Build the AMD secure processor driver only with AMD CPU
support

Brijesh Singh (34):
  Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization
(SEV)
  KVM: SVM: Prepare to reserve asid for SEV guest
  KVM: X86: Extend CPUID range to include new leaf
  KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl
  KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl
  crypto: ccp: Define SEV userspace ioctl and command id
  crypto: ccp: Define SEV key management command id
  crypto: ccp: Add Platform Security Processor (PSP) device support
  crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support
  crypto: ccp: Implement SEV_FACTORY_RESET ioctl command
  crypto: ccp: Implement SEV_PLATFORM_STATUS ioctl command
  crypto: ccp: Implement SEV_PEK_GEN ioctl command
  crypto: ccp: Implement SEV_PDH_GEN ioctl command
  crypto: ccp: Implement SEV_PEK_CSR ioctl command
  crypto: ccp: Implement SEV_PEK_CERT_IMPORT ioctl command
  crypto: ccp: Implement SEV_PDH_CERT_EXPORT ioctl command
  KVM: X86: Add CONFIG_KVM_AMD_SEV
  KVM: SVM: Add sev module_param
  KVM: SVM: Reserve ASID range for SEV guest
  KVM: Define SEV key management command id
  KVM: SVM: Add KVM_SEV_INIT command
  KVM: SVM: VMRUN should use associated ASID when SEV is enabled
  KVM: SVM: Add support for KVM_SEV_LAUNCH_START command

[Part2 PATCH v6 02/38] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

2017-10-19 Thread Brijesh Singh
From: Tom Lendacky 

Update the CPU features to include identifying and reporting on the
Secure Encrypted Virtualization (SEV) feature.  SEV is identified by
CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR).  Only show the SEV feature
as available if reported by CPUID and enabled by the BIOS.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: k...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  2 ++
 arch/x86/kernel/cpu/amd.c  | 66 ++
 arch/x86/kernel/cpu/scattered.c|  1 +
 4 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 2519c6c801c9..759d29c37686 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -197,6 +197,7 @@
 #define X86_FEATURE_HW_PSTATE  ( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
 #define X86_FEATURE_SME        ( 7*32+10) /* AMD Secure Memory Encryption */
+#define X86_FEATURE_SEV        ( 7*32+11) /* AMD Secure Encrypted Virtualization */
 
 #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
 #define X86_FEATURE_INTEL_PT   ( 7*32+15) /* Intel Processor Trace */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 17f5c12e1afd..e399d68029a9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -378,6 +378,8 @@
 #define MSR_K7_PERFCTR3        0xc0010007
 #define MSR_K7_CLK_CTL         0xc001001b
 #define MSR_K7_HWCR            0xc0010015
+#define MSR_K7_HWCR_SMMLOCK_BIT 0
+#define MSR_K7_HWCR_SMMLOCK    BIT_ULL(MSR_K7_HWCR_SMMLOCK_BIT)
 #define MSR_K7_FID_VID_CTL     0xc0010041
 #define MSR_K7_FID_VID_STATUS  0xc0010042
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index d58184b7cd44..c1234aa0550c 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -556,6 +556,51 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
}
 }
 
+static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
+{
+   u64 msr;
+
+   /*
+* BIOS support is required for SME and SEV.
+*   For SME: If BIOS has enabled SME then adjust x86_phys_bits by
+*the SME physical address space reduction value.
+*If BIOS has not enabled SME then don't advertise the
+*SME feature (set in scattered.c).
+*   For SEV: If BIOS has not enabled SEV then don't advertise the
+*SEV feature (set in scattered.c).
+*
+*   In all cases, since support for SME and SEV requires long mode,
+*   don't advertise the feature under CONFIG_X86_32.
+*/
+   if (cpu_has(c, X86_FEATURE_SME) || cpu_has(c, X86_FEATURE_SEV)) {
+   /* Check if memory encryption is enabled */
+   rdmsrl(MSR_K8_SYSCFG, msr);
+   if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+   goto clear_all;
+
+   /*
+* Always adjust physical address bits. Even though this
+* will be a value above 32-bits this is still done for
+* CONFIG_X86_32 so that accurate values are reported.
+*/
+   c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
+
+   if (IS_ENABLED(CONFIG_X86_32))
+   goto clear_all;
+
+   rdmsrl(MSR_K7_HWCR, msr);
+   if (!(msr & MSR_K7_HWCR_SMMLOCK))
+   goto clear_sev;
+
+   return;
+
+clear_all:
+   clear_cpu_cap(c, X86_FEATURE_SME);
+clear_sev:
+   clear_cpu_cap(c, X86_FEATURE_SEV);
+   }
+}
+
 static void early_init_amd(struct cpuinfo_x86 *c)
 {
u32 dummy;
@@ -627,26 +672,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_E400);
 
-   /*
-* BIOS support is required for SME. If BIOS has enabled SME then
-* adjust x86_phys_bits by the SME physical address space reduction
-* value. If BIOS has not enabled SME then don't advertise the
-* feature (set in scattered.c). Also, since the SME support requires
-* long mode, don't advertise the feature under CONFIG_X86_32.
-*/
-   if (cpu_has(c, X86_FEATURE_SME)) {
-   u64 msr;
-
-   /* Check if SME is enabled */
-   rdmsrl(MSR_K8_SYSCFG, msr);
-
