Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc

2016-09-26 Thread Reza Arbab

On Tue, Sep 27, 2016 at 07:15:41AM +1000, Benjamin Herrenschmidt wrote:

> What is that business with a command line argument? Does that mean that
> we'll need some magic command line argument to properly handle LPC memory
> on CAPI devices or GPUs? If yes, that's bad... kernel arguments should
> be a last resort.


Well, movable_node is just a boolean, meaning "allow nodes which contain 
only movable memory". It's _not_ like "movable_node=10,13-15,17", if 
that's what you were thinking.



> We should have all the information we need from the device-tree.
>
> Note also that we shouldn't need to create those nodes at boot time;
> we need to add the ability to create the whole thing at runtime. We may know
> that there's an NPU with an LPC window in the system, but we won't know if it's
> used until it is, and for CAPI we simply don't know until some PCI device
> gets turned into CAPI mode and starts claiming LPC memory...


Yes, this is what is planned for, if I'm understanding you correctly.

In the dt, the PCI device node has a phandle pointing to the memory 
node. The memory node describes the window into which we can hotplug at 
runtime.


--
Reza Arbab

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node

2016-09-26 Thread Reza Arbab

On Tue, Sep 27, 2016 at 07:12:31AM +1000, Benjamin Herrenschmidt wrote:
> In any case, if the memory hasn't been hotplugged, this shouldn't be
> necessary as we shouldn't be considering it for allocation.


Right. To be clear, the background info I put in the commit log refers 
to x86, where the SRAT can describe movable nodes which exist at boot.  
They're trying to avoid allocations from those nodes before they've been 
identified.


On power, movable nodes can only exist via hotplug, so that scenario 
can't happen. We can immediately go back to top-down allocation. That is 
the missing call being added in the patch.
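
For illustration only, the two allocation directions can be sketched with a
toy user-space model. This is not the memblock code: toy_region and toy_alloc
are hypothetical names, and a single free region stands in for the real
memblock region lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one free region [base, base + size). Hypothetical names;
 * this is not the kernel's memblock implementation. */
struct toy_region {
	size_t base;
	size_t size;
	bool bottom_up;		/* allocation direction */
};

/* Returns the start of the allocated range, or (size_t)-1 on failure. */
static size_t toy_alloc(struct toy_region *r, size_t len)
{
	size_t addr;

	if (len == 0 || len > r->size)
		return (size_t)-1;

	if (r->bottom_up) {
		addr = r->base;			/* carve from the low end */
		r->base += len;
	} else {
		addr = r->base + r->size - len;	/* carve from the high end */
	}
	r->size -= len;
	return addr;
}
```

While bottom_up is set, allocations come from the low end of the region;
once it is cleared, they come from the high end again, which mirrors the
"go back to top-down allocation" step described above.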


--
Reza Arbab



Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

2016-09-26 Thread Waiman Long

On 09/23/2016 09:02 AM, Thomas Gleixner wrote:

On Thu, 22 Sep 2016, Waiman Long wrote:

Locking was done mostly by lock stealing. This is where most of the
performance benefit comes from, not optimistic spinning.

What does the lock latency distribution of all this look like, and how fair
is the whole thing?

The TO futexes are unfair as can be seen from the min/max thread times listed
above. It took the fastest thread 0.07s to complete all the locking
operations, whereas the slowest one needed 2.65s. However, the situation
reverses when I changed the critical section to a 1us sleep. In this case,

1us sleep is going to add another syscall and therefore scheduling, so what?

Or did you just extend the critical section busy time?


The 1us sleep will cause the spinning to stop and make all the waiters 
sleep. This simulates the extreme case where TO futexes may not have 
a performance advantage.





there will be no optimistic spinning. The performance results for 100k locking
operations were listed below.

               wait-wake futex   PI futex   TO futex
               ---------------   --------   --------
max time            0.06s          9.32s      4.76s

    


Yes, wait-wake futex is the unfair one in this case.


min time            5.59s          9.36s      5.62s
average time        3.25s          9.35s      5.41s

In this case, the TO futexes are fairer but perform worse than the wait-wake
futexes. That is because the lock handoff mechanism limits the amount of lock
stealing in the TO futexes while the wait-wake futexes have no such
restriction. When I disabled lock handoff, the TO futexes would then perform
similarly to the wait-wake futexes.

So the benefit of these newfangled futexes is only there for extremely short
critical sections and a gazillion threads fighting for the same futex,
right?


Not really. Lock stealing will help performance when a gazillion 
threads are fighting for the same futex. Optimistic spinning will help to 
reduce the lock transfer latency because the waiter isn't sleeping, no 
matter the number of threads. One set of data that I haven't shown so 
far is that the performance delta between wait-wake and TO futexes 
actually increases as the critical section is lengthened. This is 
because, for a short critical section, the waiters of a wait-wake futex may 
not actually go to sleep because of the latency introduced by the code 
that has to run before they do a final check to see if the futex 
value has changed before going to sleep. The longer the critical section, 
the higher the chance that they actually sleep, and hence the worse their 
performance gets relative to the TO futexes.


For example, with the critical section of 50 pause instructions instead 
of 5, the performance gain is about 5X instead of about 1.6X in the 
latter case.
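
As a rough illustration of the difference between lock stealing and lock
handoff discussed above, here is a toy user-space model using C11 atomics.
It is a sketch under simplifying assumptions, not the TO futex code:
toy_lock, toy_trylock_steal and toy_unlock are hypothetical names, and
waiter queues, sleeping and spinning are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock word: 0 means free, any other value is the owner's id. */
struct toy_lock {
	atomic_int owner;
};

/*
 * Lock stealing: whoever finds the lock free may take it at once,
 * regardless of how long other waiters have been queued.
 */
static bool toy_trylock_steal(struct toy_lock *l, int tid)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->owner, &expected, tid);
}

/*
 * Unlock. With next_waiter == 0 the lock becomes free and can be stolen.
 * With a nonzero next_waiter, ownership is handed directly to that waiter,
 * so the lock is never observed as free and nobody can steal it.
 */
static void toy_unlock(struct toy_lock *l, int next_waiter)
{
	atomic_store(&l->owner, next_waiter);
}
```

In this model a plain unlock favours whichever thread gets to the lock word
first (higher throughput, less fairness), while a handoff favours the chosen
waiter (more fairness, less stealing).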



I really wonder how the average programmer should pick the right flavour,
not to mention any useful decision for something like glibc to pick the
proper one.


I would say that TO futexes will have better performance in most cases. 
Of course, I still need to run some real world benchmarks to quantify 
the effect of the new futexes. I am hoping to get suggestions on what 
would be a good set of benchmarks to run.


Cheers,
Longman


Re: [PATCH -tip] locking/rtmutex: Reduce top-waiter blocking on a lock

2016-09-26 Thread Waiman Long

On 09/23/2016 09:28 PM, Davidlohr Bueso wrote:


+#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER
+static bool rt_mutex_spin_on_owner(struct rt_mutex *lock,
+				   struct task_struct *owner)
+{
+	bool ret = true;
+
+	/*
+	 * The last owner could have just released the lock,
+	 * immediately try taking it again.
+	 */
+	if (!owner)
+		goto done;
+
+	rcu_read_lock();
+	while (rt_mutex_owner(lock) == owner) {
+		/*
+		 * Ensure we emit the owner->on_cpu, dereference _after_
+		 * checking lock->owner still matches owner. If that fails,
+		 * owner might point to freed memory. If it still matches,
+		 * the rcu_read_lock() ensures the memory stays valid.
+		 */
+		barrier();
+		if (!owner->on_cpu || need_resched()) {
+			ret = false;
+			break;
+		}
+
+		cpu_relax_lowlatency();
+	}
+	rcu_read_unlock();
+done:
+	return ret;
+}
+


One issue that I saw is that the spinner may no longer be the top waiter 
while spinning. Should we also check this condition in the spin loop?
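
To make the question concrete, here is a toy user-space sketch of one spin
iteration with such a check added. It only models the decision logic; the
toy_* types and names are hypothetical stand-ins rather than the kernel
structures, need_resched() is omitted, and having a demoted spinner block is
an assumed policy, not something the patch specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real types. */
struct toy_task {
	bool on_cpu;			/* is the task currently running? */
};

struct toy_waiter {
	int prio;
};

struct toy_rt_mutex {
	struct toy_task *owner;		/* NULL when the lock is free */
	struct toy_waiter *top_waiter;	/* current highest-priority waiter */
};

enum toy_spin_result {
	TOY_SPIN_CONTINUE,	/* keep spinning */
	TOY_SPIN_TRY_LOCK,	/* owner changed or released: try to lock */
	TOY_SPIN_BLOCK,		/* stop spinning and go to sleep */
};

/* One iteration of the spin loop, with the extra top-waiter check. */
static enum toy_spin_result toy_spin_step(const struct toy_rt_mutex *lock,
					  const struct toy_task *owner,
					  const struct toy_waiter *self)
{
	/* Check the owner first so that owner is only dereferenced if it matches. */
	if (!owner || lock->owner != owner)
		return TOY_SPIN_TRY_LOCK;
	if (!owner->on_cpu)
		return TOY_SPIN_BLOCK;
	if (lock->top_waiter != self)
		return TOY_SPIN_BLOCK;	/* assumed policy: demoted spinners block */
	return TOY_SPIN_CONTINUE;
}
```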


Cheers,
Longman


Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc

2016-09-26 Thread Benjamin Herrenschmidt
On Sun, 2016-09-25 at 13:36 -0500, Reza Arbab wrote:
> To create a movable node, we need to hotplug all of its memory into
> ZONE_MOVABLE.
> 
> Note that to do this, auto_online_blocks should be off. Since the memory
> will first be added to the default zone, we must explicitly use
> online_movable to online.
> 
> Because such a node contains no normal memory, can_online_high_movable()
> will only allow us to do the onlining if CONFIG_MOVABLE_NODE is set.
> Enable the use of this config option on PPC64 platforms.

What is that business with a command line argument? Does that mean that
we'll need some magic command line argument to properly handle LPC memory
on CAPI devices or GPUs? If yes, that's bad... kernel arguments should
be a last resort.

We should have all the information we need from the device-tree.

Note also that we shouldn't need to create those nodes at boot time,
we need to add the ability to create the whole thing at runtime, we may know
that there's an NPU with an LPC window in the system but we won't know if it's
used until it is and for CAPI we just simply don't know until some PCI device
gets turned into CAPI mode and starts claiming LPC memory...

Ben.

> Signed-off-by: Reza Arbab 
> ---
>  Documentation/kernel-parameters.txt | 2 +-
>  mm/Kconfig                          | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index a4f4d69..3d8460d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2344,7 +2344,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			that the amount of memory usable for all allocations
>  			is not too small.
>  
> -	movable_node	[KNL,X86] Boot-time switch to enable the effects
> +	movable_node	[KNL,X86,PPC] Boot-time switch to enable the effects
>  			of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>  
>  	MTD_Partition=	[MTD]
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11..4b19cd3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -153,7 +153,7 @@ config MOVABLE_NODE
>  	bool "Enable to assign a node which has only movable memory"
>  	depends on HAVE_MEMBLOCK
>  	depends on NO_BOOTMEM
> -	depends on X86_64
> +	depends on X86_64 || PPC64
>  	depends on NUMA
>  	default n
>  	help


Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node

2016-09-26 Thread Reza Arbab

On Mon, Sep 26, 2016 at 09:17:43PM +0530, Aneesh Kumar K.V wrote:

>> +	/* bottom-up allocation may have been set by movable_node */
>> +	memblock_set_bottom_up(false);
>> +
>
> By then we have done a few memblock allocations, right?


Yes, some allocations do occur while bottom-up is set.

> IMHO, we should do this early enough in prom.c, after we do
> parse_early_param, with a comment there explaining that we don't
> really support hotplug memblock, and that when we do, this should be
> moved to a place where we can handle memblock allocation such that we
> avoid spreading memblock allocations to the movable node.


Sure, we can do it earlier. The only consideration is that any potential 
calls to memblock_mark_hotplug() happen before we reset to top-down.  
Since we don't do that at all on power, the call can go anywhere.


--
Reza Arbab



Re: [PATCH 0/2] Moving runnable code from Documentation (last 2 patches)

2016-09-26 Thread Kees Cook
On Mon, Sep 26, 2016 at 11:40 AM, Shuah Khan  wrote:
> This patch series contains the last 2 patches to complete moving runnable
> code from Documentation to selftests, samples, and tools.
>
> The first patch moves blackfin gptimers-example to samples and removes
> CONFIG_BUILD_DOCSRC.
>
> The second one updates 00-INDEX files under Documentation to reflect the
> move of runnable code from Documentation.

Looks good to me!

Reviewed-by: Kees Cook 

-Kees

-- 
Kees Cook
Nexus Security


[PATCH 2/2] Doc: update 00-INDEX files to reflect the runnable code move

2016-09-26 Thread Shuah Khan
Update 00-INDEX files with the current file list to reflect the runnable
code move.

Signed-off-by: Shuah Khan 
---
 Documentation/00-INDEX             | 2 --
 Documentation/arm/00-INDEX         | 2 --
 Documentation/filesystems/00-INDEX | 2 --
 Documentation/networking/00-INDEX  | 2 --
 Documentation/spi/00-INDEX         | 2 --
 Documentation/timers/00-INDEX      | 2 --
 6 files changed, 12 deletions(-)

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index cb9a6c6..b79d661 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -45,8 +45,6 @@ IRQ.txt
 	- description of what an IRQ is.
 Intel-IOMMU.txt
 	- basic info on the Intel IOMMU virtualization support.
-Makefile
-	- some files in Documentation dir are actually sample code to build
 ManagementStyle
 	- how to (attempt to) manage kernel hackers.
 RCU/
diff --git a/Documentation/arm/00-INDEX b/Documentation/arm/00-INDEX
index dea011c..b6e69fd 100644
--- a/Documentation/arm/00-INDEX
+++ b/Documentation/arm/00-INDEX
@@ -8,8 +8,6 @@ Interrupts
 	- ARM Interrupt subsystem documentation
 IXP4xx
 	- Intel IXP4xx Network processor.
-Makefile
-	- Build sourcefiles as part of the Documentation-build for arm
 Netwinder
 	- Netwinder specific documentation
 Porting
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 9922939..f66e748 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -2,8 +2,6 @@
 	- this file (info on some of the filesystems supported by linux).
 Locking
 	- info on locking rules as they pertain to Linux VFS.
-Makefile
-	- Makefile for building the filsystems-part of DocBook.
 9p.txt
 	- 9p (v9fs) is an implementation of the Plan 9 remote fs protocol.
 adfs.txt
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 415154a..98f3d4b 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -10,8 +10,6 @@ LICENSE.qlge
 	- GPLv2 for QLogic Linux qlge NIC Driver
 LICENSE.qlcnic
 	- GPLv2 for QLogic Linux qlcnic NIC Driver
-Makefile
-	- Makefile for docsrc.
 PLIP.txt
 	- PLIP: The Parallel Line Internet Protocol device driver
 README.ipw2100
diff --git a/Documentation/spi/00-INDEX b/Documentation/spi/00-INDEX
index 4644bf0..8e4bb17 100644
--- a/Documentation/spi/00-INDEX
+++ b/Documentation/spi/00-INDEX
@@ -1,7 +1,5 @@
 00-INDEX
 	- this file.
-Makefile
-	- Makefile for the example sourcefiles.
 butterfly
 	- AVR Butterfly SPI driver overview and pin configuration.
 ep93xx_spi
diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX
index ee212a2..6ee117b 100644
--- a/Documentation/timers/00-INDEX
+++ b/Documentation/timers/00-INDEX
@@ -8,8 +8,6 @@ hpet_example.c
 	- sample hpet timer test program
 hrtimers.txt
 	- subsystem for high-resolution kernel timers
-Makefile
-	- Build and link hpet_example
 NO_HZ.txt
 	- Summary of the different methods for the scheduler clock-interrupts management.
 timekeeping.txt
-- 
2.7.4



[PATCH 1/2] samples: move blackfin gptimers-example from Documentation

2016-09-26 Thread Shuah Khan
Move blackfin gptimers-example to samples and remove it from Documentation
Makefile. Update samples Kconfig and Makefile to build gptimers-example.

blackfin is the last CONFIG_BUILD_DOCSRC target in Documentation/Makefile,
hence this patch also includes changes to remove CONFIG_BUILD_DOCSRC from
Makefile and lib/Kconfig.debug.

Signed-off-by: Shuah Khan 
---
 Documentation/Makefile                    |  1 -
 Documentation/blackfin/00-INDEX           |  4 --
 Documentation/blackfin/Makefile           |  5 --
 Documentation/blackfin/gptimers-example.c | 91 ---
 Makefile                                  |  3 -
 lib/Kconfig.debug                         |  9 ---
 samples/Kconfig                           |  6 ++
 samples/Makefile                          |  2 +-
 samples/blackfin/Makefile                 |  1 +
 samples/blackfin/gptimers-example.c       | 91 +++
 10 files changed, 99 insertions(+), 114 deletions(-)
 delete mode 100644 Documentation/Makefile
 delete mode 100644 Documentation/blackfin/Makefile
 delete mode 100644 Documentation/blackfin/gptimers-example.c
 create mode 100644 samples/blackfin/Makefile
 create mode 100644 samples/blackfin/gptimers-example.c

diff --git a/Documentation/Makefile b/Documentation/Makefile
deleted file mode 100644
index 8435965..000
--- a/Documentation/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-subdir-y := blackfin
diff --git a/Documentation/blackfin/00-INDEX b/Documentation/blackfin/00-INDEX
index c54fcdd..265a1ef 100644
--- a/Documentation/blackfin/00-INDEX
+++ b/Documentation/blackfin/00-INDEX
@@ -1,10 +1,6 @@
 00-INDEX
 	- This file
-Makefile
-	- Makefile for gptimers example file.
 bfin-gpio-notes.txt
 	- Notes in developing/using bfin-gpio driver.
 bfin-spi-notes.txt
 	- Notes for using bfin spi bus driver.
-gptimers-example.c
-	- gptimers example
diff --git a/Documentation/blackfin/Makefile b/Documentation/blackfin/Makefile
deleted file mode 100644
index 6782c58..000
--- a/Documentation/blackfin/Makefile
+++ /dev/null
@@ -1,5 +0,0 @@
-ifneq ($(CONFIG_BLACKFIN),)
-ifneq ($(CONFIG_BFIN_GPTIMERS),)
-obj-m := gptimers-example.o
-endif
-endif
diff --git a/Documentation/blackfin/gptimers-example.c b/Documentation/blackfin/gptimers-example.c
deleted file mode 100644
index 283eba9..000
--- a/Documentation/blackfin/gptimers-example.c
+++ /dev/null
@@ -1,91 +0,0 @@
-/*
- * Simple gptimers example
- * http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:drivers:gptimers
- *
- * Copyright 2007-2009 Analog Devices Inc.
- *
- * Licensed under the GPL-2 or later.
- */
-
-#include 
-#include 
-
-#include 
-#include 
-
-/* ... random driver includes ... */
-
-#define DRIVER_NAME "gptimer_example"
-
-#ifdef IRQ_TIMER5
-#define SAMPLE_IRQ_TIMER IRQ_TIMER5
-#else
-#define SAMPLE_IRQ_TIMER IRQ_TIMER2
-#endif
-
-struct gptimer_data {
-	uint32_t period, width;
-};
-static struct gptimer_data data;
-
-/* ... random driver state ... */
-
-static irqreturn_t gptimer_example_irq(int irq, void *dev_id)
-{
-	struct gptimer_data *data = dev_id;
-
-	/* make sure it was our timer which caused the interrupt */
-	if (!get_gptimer_intr(TIMER5_id))
-		return IRQ_NONE;
-
-	/* read the width/period values that were captured for the waveform */
-	data->width = get_gptimer_pwidth(TIMER5_id);
-	data->period = get_gptimer_period(TIMER5_id);
-
-	/* acknowledge the interrupt */
-	clear_gptimer_intr(TIMER5_id);
-
-	/* tell the upper layers we took care of things */
-	return IRQ_HANDLED;
-}
-
-/* ... random driver code ... */
-
-static int __init gptimer_example_init(void)
-{
-	int ret;
-
-	/* grab the peripheral pins */
-	ret = peripheral_request(P_TMR5, DRIVER_NAME);
-	if (ret) {
-		printk(KERN_NOTICE DRIVER_NAME ": peripheral request failed\n");
-		return ret;
-	}
-
-	/* grab the IRQ for the timer */
-	ret = request_irq(SAMPLE_IRQ_TIMER, gptimer_example_irq,
-			IRQF_SHARED, DRIVER_NAME, &data);
-	if (ret) {
-		printk(KERN_NOTICE DRIVER_NAME ": IRQ request failed\n");
-		peripheral_free(P_TMR5);
-		return ret;
-	}
-
-	/* setup the timer and enable it */
-	set_gptimer_config(TIMER5_id,
-			WDTH_CAP | PULSE_HI | PERIOD_CNT | IRQ_ENA);
-	enable_gptimers(TIMER5bit);
-
-	return 0;
-}
-module_init(gptimer_example_init);
-
-static void __exit gptimer_example_exit(void)
-{
-	disable_gptimers(TIMER5bit);
-	free_irq(SAMPLE_IRQ_TIMER, &data);
-	peripheral_free(P_TMR5);
-}
-module_exit(gptimer_example_exit);
-
-MODULE_LICENSE("BSD");
diff --git a/Makefile b/Makefile
index 1a8c8dd..de5136a 100644
--- a/Makefile
+++ b/Makefile
@@ -926,9 +926,6 @@ vmlinux_prereq: $(vmlinux-deps) FORCE
 ifdef CONFIG_HEADERS_CHECK

[PATCH 0/2] Moving runnable code from Documentation (last 2 patches)

2016-09-26 Thread Shuah Khan
This patch series contains the last 2 patches to complete moving runnable
code from Documentation to selftests, samples, and tools.

The first patch moves blackfin gptimers-example to samples and removes
CONFIG_BUILD_DOCSRC.

The second one updates 00-INDEX files under Documentation to reflect the
move of runnable code from Documentation.

Shuah Khan (2):
  samples: move blackfin gptimers-example from Documentation
  Doc: update 00-INDEX files to reflect the runnable code move

 Documentation/00-INDEX                    |  2 -
 Documentation/Makefile                    |  1 -
 Documentation/arm/00-INDEX                |  2 -
 Documentation/blackfin/00-INDEX           |  4 --
 Documentation/blackfin/Makefile           |  5 --
 Documentation/blackfin/gptimers-example.c | 91 ---
 Documentation/filesystems/00-INDEX        |  2 -
 Documentation/networking/00-INDEX         |  2 -
 Documentation/spi/00-INDEX                |  2 -
 Documentation/timers/00-INDEX             |  2 -
 Makefile                                  |  3 -
 lib/Kconfig.debug                         |  9 ---
 samples/Kconfig                           |  6 ++
 samples/Makefile                          |  2 +-
 samples/blackfin/Makefile                 |  1 +
 samples/blackfin/gptimers-example.c       | 91 +++
 16 files changed, 99 insertions(+), 126 deletions(-)
 delete mode 100644 Documentation/Makefile
 delete mode 100644 Documentation/blackfin/Makefile
 delete mode 100644 Documentation/blackfin/gptimers-example.c
 create mode 100644 samples/blackfin/Makefile
 create mode 100644 samples/blackfin/gptimers-example.c

-- 
2.7.4



Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc

2016-09-26 Thread Aneesh Kumar K.V
Reza Arbab  writes:

> To create a movable node, we need to hotplug all of its memory into
> ZONE_MOVABLE.
>
> Note that to do this, auto_online_blocks should be off. Since the memory
> will first be added to the default zone, we must explicitly use
> online_movable to online.
>
> Because such a node contains no normal memory, can_online_high_movable()
> will only allow us to do the onlining if CONFIG_MOVABLE_NODE is set.
> Enable the use of this config option on PPC64 platforms.
>

Reviewed-by: Aneesh Kumar K.V 

> Signed-off-by: Reza Arbab 
> ---
>  Documentation/kernel-parameters.txt | 2 +-
>  mm/Kconfig                          | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index a4f4d69..3d8460d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2344,7 +2344,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			that the amount of memory usable for all allocations
>  			is not too small.
>
> -	movable_node	[KNL,X86] Boot-time switch to enable the effects
> +	movable_node	[KNL,X86,PPC] Boot-time switch to enable the effects
>  			of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>
>  	MTD_Partition=	[MTD]
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11..4b19cd3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -153,7 +153,7 @@ config MOVABLE_NODE
>  	bool "Enable to assign a node which has only movable memory"
>  	depends on HAVE_MEMBLOCK
>  	depends on NO_BOOTMEM
> -	depends on X86_64
> +	depends on X86_64 || PPC64
>  	depends on NUMA
>  	default n
>  	help
> -- 
> 1.8.3.1



Re: [RFC PATCH 00/11] pci: support for configurable PCI endpoint

2016-09-26 Thread Kishon Vijay Abraham I
Hi Arnd,

On Thursday 22 September 2016 07:04 PM, Arnd Bergmann wrote:
> On Thursday, September 15, 2016 2:03:05 PM CEST Kishon Vijay Abraham I wrote:
>> On Wednesday 14 September 2016 06:55 PM, Arnd Bergmann wrote:
>>> On Wednesday, September 14, 2016 10:41:56 AM CEST Kishon Vijay Abraham I wrote:
>>> I've added the drivers/ntb maintainers to Cc, given that there is
>>> a certain degree of overlap between your work and the existing
>>> code, I think they should be part of the discussion.
>>>  
>>>> Known Limitation:
>>>> 	*) Does not support multi-function devices
>>>
>>> If I understand it right, this was a problem for USB and adding
>>> it later made it somewhat inconsistent. Maybe we can at least
>>> try to come up with an idea of how multi-function devices
>>> could be handled even if we don't implement it until someone
>>> actually needs it.
>>
>> Actually IMO multi-function devices in PCI should be much simpler than they
>> are for USB. In the case of USB, all the functions in a multi-function device
>> will share the same *usb configuration*. (A USB device can have multiple
>> configurations but only one can be enabled at a time.) A multi-function USB
>> device will still have a single vendor-id/product-id/class... So I think a
>> separate library (composite.c) in USB makes sense.
> 
> Ok, makes sense.
> 
>> But in the case of PCI, every function can be treated independently since all
>> the functions have their own 4KB configuration space. Each function can be
>> configured independently. Each can have its own vendor-id/product-id/class...
>> I'm not sure if we'll need a separate library for PCI like we have for USB.
> 
> I think it depends on whether we want to add the software multi-function
> support you mention.
> 
>> Now the restriction for not allowing multi-function device is because of the
>> following structure definition.
>>
>> struct pci_epc {
>>  ..
>> struct pci_epf *epf;
>>  ..
>> };
>>
>> EPC has a single reference to EPF and it is used *only* to notify the
>> function driver when the link is up. (If this can be changed to use a
>> notification mechanism, multi-function devices can be supported here.)
>>
>> One more place where this restriction arises is in designware driver
>>
>> struct dw_pcie_ep {
>>  ..
>> u8 bar_to_atu[6];
>>  ..
>> };
>>
>> We use a single ATU window to configure a BAR (in BAR). If there are multiple
>> functions, then this should also be modified since each function has 6 BARs.
>>
>> This can be fixed without much effort unless some other issue crops up.
> 
> Ok.
> 
>>>
>>> Is your hardware able to make the PCIe endpoint look like
>>> a device with multiple PCI functions, or would one have to
>>> do this in software inside of a single PCI function if we
>>> ever need it?
>>
>> The hardware I have doesn't support multiple PCI functions (like having a
>> separate configuration space for each function). It has a dedicated space for
>> configuration space supporting only one function. [Section 24.9.7.3.2
>> PCIe_SS_EP_CFG_DBICS Register Description in  [1]].
>>
>> Yeah, it has to be done in software (but that won't be a multi-function
>> device in PCI terms).
>>
>> [1] -> http://www.ti.com/lit/ug/spruhz6g/spruhz6g.pdf
> 
> Ok, so in theory there can be other hardware (and quite likely is)
> that supports multiple functions, and we can extend the framework
> to support them without major obstacles, but your hardware doesn't,
> so you kept it simple with one hardcoded function, right?

Right, PCIe can have up to 8 functions, so the issues with the current framework
have to be fixed. I don't expect major obstacles with this as of now.
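
As a rough sketch of one direction this could take, the single bar_to_atu[6]
array quoted earlier could become a per-function table. The model below is
user-space C for illustration only: toy_pcie_ep, toy_ep_init, toy_ep_set_bar
and TOY_ATU_NONE are hypothetical names, not part of the posted series or of
the designware driver.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_FUNCS	8	/* functions per device, as noted above */
#define TOY_NUM_BARS	6	/* BARs per function */
#define TOY_ATU_NONE	0xff	/* hypothetical "no window assigned" marker */

/* Per-function BAR to ATU window mapping (illustrative layout only). */
struct toy_pcie_ep {
	uint8_t bar_to_atu[TOY_MAX_FUNCS][TOY_NUM_BARS];
};

/* Mark every BAR of every function as having no ATU window yet. */
static void toy_ep_init(struct toy_pcie_ep *ep)
{
	for (unsigned int f = 0; f < TOY_MAX_FUNCS; f++)
		for (unsigned int b = 0; b < TOY_NUM_BARS; b++)
			ep->bar_to_atu[f][b] = TOY_ATU_NONE;
}

/* Record which ATU window backs a BAR; returns 0 on success, -1 on bad index. */
static int toy_ep_set_bar(struct toy_pcie_ep *ep, unsigned int func,
			  unsigned int bar, uint8_t atu_index)
{
	if (func >= TOY_MAX_FUNCS || bar >= TOY_NUM_BARS)
		return -1;
	ep->bar_to_atu[func][bar] = atu_index;
	return 0;
}
```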
> 
> Seems completely reasonable to me.
> 
>>>> TODO:
>>>> 	*) access buffers in RC
>>>> 	*) raise MSI interrupts
>>>> 	*) Enable user space control for the RC side PCI driver
>>>
>>> The user space control would end up just being one of several
>>> gadget drivers, right? E.g. gadget drivers for standard hardware
>>> (8250 uart, ATA, NVMe, some ethernet) could be done as kernel
>>> drivers while a user space driver can be used for things that
>>> are more unusual and that don't need to interface to another
>>> part of the kernel?
>>
>> Actually I didn't mean that. It was more with respect to the host side PCI
>> test driver (drivers/misc/pci_endpoint_test.c). Right now it validates BAR, irq
>> itself. I wanted to change this so that the user controls which tests to run.
>> (Like for USB gadget zero tests, testusb.c invokes ioctls to perform various
>> tests). Similarly I want to have a userspace program invoke pci_endpoint_test
>> to perform various PCI tests.
> 
> Ok, I see. So what I described above would be yet another function
> driver that can be implemented, but so far, you have not planned
> to do that because there was no need, right?

right. I felt pci_endpoint_test is the generic function that would be of
interest to all the vendors. Any new function can be added by taking